Monitor Your Serverless Apps Like a Formula 1 Engineer

September 24, 2019

“I love not patching the OS on my EC2 instances anymore, but with this API Gateway and Lambda setup on AWS, I can’t tell where stuff is going wrong. Where is a request failing? Where is a request slowing down? With hundreds of microservices, how can I see all this stuff in one place?” This is not an actual quote, but a summary of sentiments we’ve heard going around.

If you’re only here because of Formula 1, then check out how huge their monitoring setup on AWS is. If you’re here because you’d like to instrument your app a bit more, so that you too can make faster decisions, then read on.

Rise of Serverless Monitoring

In late 2016 and early 2017, AWS started rolling out X-Ray, a service that traces requests from the end user as they pass through distributed applications. A bunch of startups sprang up on the heels of the newly released AWS X-Ray API, improving on it with more user-friendly interfaces, query options, pre-built dashboards, and actionable interpolated data points.

Products for serverless insights fall into two main categories: startups and heavyweights. The startups, like Thundra, Epsagon, IOPipe, Lumigo, and Dashbird, came out shortly after X-Ray. IOPipe was founded earlier, in April 2016, but seems to have added tracing to its product line only after the X-Ray announcement, according to snapshots of their website on the Wayback Machine.

The heavyweights are big companies like Datadog, New Relic, and AppDynamics. They have pockets that go tens or hundreds of millions of dollars deep. Probably as a result of being big and successful, they didn’t jump on announcing a serverless tracing product right away. Two years later, in and around 2019, we’re starting to see these bigger companies invest heavily in serverless too. The race is on.

There are a couple of monitoring solutions that we wanted to spotlight, as we found their features to be unique. Note: this isn’t paid publicity. These are just companies we think particularly stand out!

Thundra

If you like using SQL queries to sort through metrics, Thundra translates your point-and-click filters into SQL queries that you can edit on the fly. Check it out in their live demo (no sign-up required).

Thundra can automatically plug into your Lambda functions using a Serverless Framework plugin they developed. With a promise to delete all monitoring data within 24 hours if requested, Thundra seems to have one of the strongest privacy policies.

Thundra also lets customers forward monitoring data directly to their own on-premises servers or private cloud, which means the data never hits the Thundra backend.

Datadog

In the past few months, Datadog has been making strides into the serverless monitoring space to meet some of the needs the serverless community has been facing.

It’s early on for Datadog. However, they have a large customer base distributed throughout the world, and that should bode well for future releases from their team.

Tracing

Practically every company mentioned so far offers tracing to some degree.

Serverless Framework Enterprise

Although it’s early on, the Serverless Framework has some useful functionality around labeling cold starts, catching errors, tracking invocations, and pulling in CloudWatch logs. Serverless Framework Enterprise also works with existing Serverless Framework projects with only two lines of code in your serverless.yml.

This is possible because Serverless Framework Enterprise automatically wraps your AWS Lambda code during deployments. This is a common pattern, and as a result we’ve seen a fair number of monitoring companies roll out a Serverless Framework plugin that handles wrapping your Lambda code automatically.
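
To make the wrapping pattern concrete, here is a minimal Python sketch of what such a wrapper might do. It is not the code that Serverless Framework Enterprise or any vendor plugin actually injects. The names `monitored`, `emit`, and `RECORDS` are hypothetical, and a real wrapper would ship each record to a monitoring backend instead of printing it.

```python
import functools
import json
import time

RECORDS = []  # hypothetical in-memory sink, kept only for illustration

# Module-level state survives between warm invocations of the same container,
# so it is True only for the first call after the module is loaded.
_cold_start = True


def emit(record):
    """Hypothetical reporter: a real wrapper would send this to a backend."""
    RECORDS.append(record)
    print(json.dumps(record))


def monitored(handler):
    """Wrap a Lambda-style handler and record basic invocation data."""

    @functools.wraps(handler)
    def wrapper(event, context):
        global _cold_start
        record = {
            "function": handler.__name__,
            "cold_start": _cold_start,
            "error": None,
        }
        _cold_start = False
        started = time.perf_counter()
        try:
            return handler(event, context)
        except Exception as exc:
            # Note the error type, then re-raise so behavior is unchanged.
            record["error"] = type(exc).__name__
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - started) * 1000
            emit(record)

    return wrapper
```

Decorating a handler with `@monitored` leaves its return value and exceptions untouched while one record per invocation is emitted. A deployment plugin can apply the same idea without any change to your source by pointing the function's entry point at a generated wrapper module.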

If you just want to use straight-up AWS X-Ray without the prettified tracing, New Relic, AppDynamics, and the AIOps provider Moogsoft all offer integrations with AWS X-Ray.

If you are interested in taking a magnifying glass to each API request and function in your code and are working with Java, then check out Thundra’s “line-by-line” tracing. Line-by-line tracing gives you feedback on how each line of a method is performing. However, this functionality is only supported for Java, not for Python or Node.js.

Alerts

Every company mentioned has some type of alerting, but a few features are worth calling out.

If you’re looking for advanced alerts that go beyond the existing CloudWatch alerts, Thundra can do “aggregate” alerting. It gives developers the ability to alert when the health of a function drops below a certain threshold (e.g. 70%, 85%, etc).
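
As a rough illustration of threshold-based health alerting, here is a small Python sketch. How Thundra actually computes function health isn’t covered here, so this assumes health is simply the percentage of successful invocations in a time window. `WindowStats`, `health_percent`, and `should_alert` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Invocation counts for one function over one time window (assumed shape)."""
    invocations: int
    errors: int


def health_percent(stats: WindowStats) -> float:
    """Assumed definition of health: share of invocations that did not error."""
    if stats.invocations == 0:
        return 100.0  # no traffic, so nothing is known to be unhealthy
    return 100.0 * (stats.invocations - stats.errors) / stats.invocations


def should_alert(stats: WindowStats, threshold_percent: float = 85.0) -> bool:
    """Alert when health drops below the chosen threshold."""
    return health_percent(stats) < threshold_percent
```

Under these assumptions, a function with 200 invocations and 40 errors sits at 80% health, so it would trigger an alert at an 85% threshold but not at a 70% one.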

The other alert worth mentioning from Thundra is alerting based on slow API requests, which helps a developer pin down, at a granular level, where performance problems are coming from more quickly.

If you’re looking to “only get alerts that matter”, you could check out Datadog. They give you the ability to “fine tune” your alarms and “recognize acceptable failure levels”. Datadog also helps with a common problem at larger organizations, where planned downtime throws a slew of alerts at your team. Datadog handles this by letting you mute alerts on a schedule that matches your pre-planned downtime. That keeps Slack clean!
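
The idea behind scheduled downtime can be sketched in a few lines of Python. This is a generic illustration, not Datadog’s API: `DowntimeWindow` and `should_notify` are hypothetical names, and the sketch assumes each window is a fixed start and end time in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class DowntimeWindow:
    """A planned maintenance window; start is inclusive, end is exclusive."""
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def should_notify(alert_time: datetime, windows: Iterable[DowntimeWindow]) -> bool:
    """Suppress the notification if the alert fired inside any planned window."""
    return not any(window.covers(alert_time) for window in windows)


# Example: a two-hour maintenance window starting at 02:00 UTC.
maintenance = DowntimeWindow(
    start=datetime(2019, 9, 24, 2, 0, tzinfo=timezone.utc),
    end=datetime(2019, 9, 24, 4, 0, tzinfo=timezone.utc),
)
```

An alert that fires at 03:00 UTC falls inside `maintenance` and would be muted, while one that fires at 04:00 UTC or later would go out as usual.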

Datadog also covers a wide range of integrations, which lets your team be notified quickly and respond in real time.

On-prem Monitoring

The last area we’ll talk about is monitoring your on-premises applications alongside your serverless applications. AWS CloudWatch has the CloudWatch agent, which is meant to be deployed on servers either in the cloud or on-premises, and some of the monitoring providers mentioned above have their own on-premises agents too.

Datadog is one of the heavy-hitter monitoring providers worth noting here. It provides comprehensive on-premises monitoring through an agent that is installed on your desired server. New Relic and AppDynamics also have agents that can be installed in a similar way.

In a recent development, Epsagon released agentless tracing, a way to connect your cloud functions, containers, and virtual machines together even if those containers or virtual machines are running on-premises or in entirely different cloud environments, all without installing an agent on the host machine.

Conclusion

If you’re still deciding which monitoring solution to use for your company or use case, we recommend signing up for the free trials and giving all of them a shot!

If you want to learn more about the differences between logging, tracing, and monitoring, check out this article by BMC.

