Deep Dive Into Serverless

February 7, 2023
Ryan Jones
5 minutes to read

CloudFront can be simply defined as a CDN (Content Delivery Network) that caches your static assets in a datacenter nearer to your viewers. But CloudFront is a lot more complex and versatile than this simple definition.
CloudFront is a “pull” CDN, which means that you don’t push your content to the CDN. Instead, each piece of content is pulled from the origin into the CDN edge the first time it is requested.

In addition to the traditional pull and cache usage, CloudFront can also be used as:

  • A Networking Router
  • A Firewall
  • A Web Server
  • An Application Server

Why is using a CDN relevant?

The main reason is to improve the speed of delivery of static content. By caching the content on the CDN edge, you not only reduce the download time from a few seconds to a few milliseconds, but you also reduce the load on your backend and the number of requests reaching it (Network, IO, CPU, Memory, …).


Static content can be defined as content not changing between two identical requests done in the same time frame.

Identical can be as simple as the same URI, or as fine-grained as the same authentication header. The time frame can range from 1 second to 1 year.
The most common case is caching resources like JavaScript or CSS and serving the same file to all users forever. But caching a JSON response tailored to a user (Authentication header) for a few seconds reduces the backend calls when the user has the well-known “frenetic browser reload syndrome”.
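To make the idea of "identical requests in a time frame" concrete, here is a minimal in-memory sketch. It is not how CloudFront is implemented or configured; `buildCacheKey` and `TtlCache` are hypothetical names used only for illustration. The cache key is either the URI alone or the URI plus the Authorization header, and each entry expires after a fixed TTL.

```javascript
// Hypothetical illustration only: these are not CloudFront APIs.

// Two requests are "identical" when they produce the same cache key.
function buildCacheKey(request, { includeAuth = false } = {}) {
  const parts = [request.uri];
  if (includeAuth) {
    // Per-user caching: the Authorization header becomes part of the key.
    parts.push(request.headers?.authorization ?? "");
  }
  return parts.join("|");
}

// A tiny cache whose entries expire after `ttlSeconds` (the "time frame").
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired or missing: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

With this model, a stylesheet would use the URI-only key and a very long TTL, while a per-user JSON response would use `includeAuth: true` and a TTL of a few seconds.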

Edges, Mid-Tier Caches, and Origins

CloudFront isn’t “just” some servers in datacenters around the world. The service is a layered network of Edge Locations and Regional Edge Caches (or Mid-Tier Caches).

Edge Locations are distributed around the globe with more than 400 points of presence in over 90 cities across 48 countries. Each Edge Location is connected to one of the 13 Regional Edge Caches.

Regional Edge Caches are transparent to you and your visitors: you can’t configure them or access them directly. Your visitors will interact with the nearest Edge Location, which will connect to the attached Regional Edge Cache and finally to your origin. Therefore, in this article, we will refer to CloudFront as the combination of Edge Locations and Regional Edge Caches.
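The layered lookup described above can be modeled in a few lines. This is a simplified sketch under the assumption that every tier is a plain pull-through cache; `createTier` is a hypothetical helper and not part of any AWS SDK.

```javascript
// Hypothetical model of a layered pull CDN; not an AWS API.
// Each tier answers from its own store, or pulls from the tier above it.
function createTier(name, fetchFromUpstream) {
  const store = new Map();
  return function fetchThroughTier(uri) {
    if (store.has(uri)) {
      return { body: store.get(uri), servedBy: name };
    }
    // Cache miss: pull the content from upstream and keep a copy.
    const upstream = fetchFromUpstream(uri);
    store.set(uri, upstream.body);
    return upstream;
  };
}
```

Chaining an origin function, one regional tier, and two edge tiers built with `createTier` shows why the middle layer matters: once the first edge has pulled a file, a second edge can get it from the regional tier without another call to the origin.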

What Have We Learned?

CloudFront is more than just a simple “pull-cache-serve” service:

  • You improve delivery speed to your visitors
  • You can increase resilience by always using a healthy backend
  • You improve overall speed to your backend by leveraging AWS’s backbone
  • You can modify any request to tailor the response to your visitor’s device or region
  • You don’t always need a backend
  • You protect your backend by reducing the number of calls reaching it


Safeguarding Your Messages: Hello AWS SQS Dead Letter Queues!


Introduction

In today's fast-paced world of cloud computing, where data flows ceaselessly between services and systems, ensuring the reliable delivery of messages has become paramount. Amazon Web Services (AWS) recognizes the importance of seamless messaging, which is why Amazon Simple Queue Service (SQS) has become a go-to choice for developers.

But what happens when messages encounter roadblocks at their destination, causing disruptions in the flow of information? The answer lies in SQS Dead-Letter Queues (DLQs).

In this article, we will see how to set up an AWS SQS queue with a Dead-Letter Queue in your Serverless Framework project using Infrastructure as Code (IaC) and understand the value of this feature for fault-tolerant systems.

This article covers:

  • What is a SQS DLQ?
  • When will you want to avoid SQS DLQ?
  • The main benefits of SQS DLQs.
  • How to set up a SQS DLQ with SLS.
  • Bonus: Handling Batch Errors easily in Lambda with SQS

What is a SQS DLQ?

The Problem

Consider an architecture that involves queues, such as a high-volume write flow like the one below:

The "External Request" component that calls SNS does not wait for the write to finish. However, the communication between SQS and Lambda is synchronous.

Now consider the scenario where processing takes place after message delivery and the receiving component (the Lambda function in the diagram above) gets an unexpected data format, or simply hits a scenario that wasn't covered by unit tests or even considered. When this happens, a bug can emerge, and the processing in your Lambda function, or whatever you use to receive the message, may fail.

In certain situations with Lambda, the failure might be momentary, like a network failure. For such cases, retry mechanisms are available, which have been discussed in another article written by Samuel Lock.

However, if it's something unexpected and persistent, or even intermittent, it becomes necessary to isolate these failed messages to analyze the cause as soon as possible. The development team can then implement a hotfix and, if applicable, devise a retry strategy for these isolated messages, but only after the hotfix has been applied.

The Solution

AWS services can act as building blocks, like Lego bricks. In this case, we can connect an SQS queue to a DLQ without changing the code of the current workflow, effectively designating the DLQ as the repository for messages that have encountered processing failures.

The DLQ will route problem messages to the components subscribed to it. In the example from the diagram above, we can connect this DLQ to a Lambda function. Instead of performing the processing that resulted in the failure, the Lambda function can simply store the messages in a data store like DynamoDB. Compared to leaving messages in SQS DLQs, storing them in a data store provides more flexibility and options for handling complex error scenarios, enabling queries to understand the inputs that caused errors. This process may involve writing new unit test cases, fixing the error in the Lambda code that triggered the exception, and, if it aligns with your processing type, attempting manual or automatic retries of these messages.

Additionally, instead of using DynamoDB, you can connect it to any other component, such as saving the error-causing inputs in an S3 bucket as JSON files and querying them using Athena. Once you can isolate error messages in a DLQ, the possibilities for precisely how you can handle them are limitless.
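As a rough sketch of the DLQ consumer idea, the handler below only persists each failed message for later analysis. `createDlqHandler` and `saveFailedMessage` are hypothetical names: the storage function is injected, so it could wrap a DynamoDB write, an S3 upload, or anything else. The record fields used (`messageId`, `body`, `attributes.ApproximateReceiveCount`) are part of the standard SQS event that Lambda receives.

```javascript
// Sketch of a DLQ consumer. `saveFailedMessage` is a hypothetical, injected
// function (for example a wrapper around a DynamoDB put or an S3 upload).
function createDlqHandler(saveFailedMessage) {
  return async function handler(event) {
    for (const record of event.Records) {
      // Do not re-run the failing business logic here: only persist the input.
      await saveFailedMessage({
        messageId: record.messageId,
        body: record.body,
        receiveCount: Number(record.attributes?.ApproximateReceiveCount ?? 0),
        storedAt: new Date().toISOString(),
      });
    }
  };
}
```

Injecting the storage function keeps the handler easy to unit test and makes it simple to swap DynamoDB for S3 later.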

When will you want to avoid SQS DLQ?

We have two primary use cases where you want to avoid using a DLQ:

  • With standard queues when you want to be able to keep retrying the transmission of a message “indefinitely”.
  • With a FIFO queue if you don't want to break the exact order of messages or operations.

However, even in these two scenarios, a DLQ may still be worth considering. It's most appropriate when:

  • You need to troubleshoot incorrect message transmission.
  • You aim to reduce the number of messages in your queue.
  • You want to minimize the risk of exposing your system to poison-pill messages (messages that can be received but cannot be processed).

Unless you know very well why you don't want to use it, I still recommend using a DLQ, so you don't take the risk of losing your messages permanently.

The main benefits of DLQs

  • Message Integrity. By isolating messages that have failed to process correctly, SQS DLQs help maintain data integrity.
  • Enhanced Reliability. This feature helps ensure that failed messages are not lost, even when unexpected errors occur in your system.
  • Customized Error Handling. DLQs offer flexibility in handling failed messages. You can connect DLQs to various components, such as AWS Lambda, databases, or storage services like Amazon DynamoDB or Amazon S3, to implement customized error handling and analysis procedures. This flexibility allows you to choose the most suitable approach for your specific use case.

How to set up a SQS DLQ with Serverless Framework

As a prerequisite, you must have the Serverless Framework and Node.js installed.

First and foremost, you need to initiate a new project. There are several templates created by the ServerlessGuru team on GitHub at this link. For instance, you can find the Webpack template, which helps you achieve a smaller bundle size. Another widely used option is the Serverless Framework's own templates, which can be utilized with the following command, for example: 'serverless create --template aws-nodejs-ecma-script --path'.

Once the project is created, proceed to define the necessary resource syntax. Follow these steps to configure a Dead Letter Queue (DLQ). I will use names of my choice and recommend using the same ones for learning purposes. After it's up and running, you can make changes as needed.

1. Define the Dead Letter Queue (DLQ): In your resources section, inside “resources: Resources:”, define the Dead Letter Queue (DLQ) for your SQS queue as 'DeadLetterQueueSubscribeNews'. Just remember that the DLQ is where messages that couldn't be successfully processed will be sent.

2. Configure the main SQS Queue to use DLQ: In the 'SubscribeNewsQueue' resource definition, you will specify the 'RedrivePolicy', which is used to configure the main SQS queue to send messages to the DLQ when they fail processing. Here's what you have:
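A minimal sketch of what steps 1 and 2 could look like in serverless.yml, using the resource names and values this article describes (maxReceiveCount of 3, one hour of retention, a 30 second visibility timeout, and 20 seconds of long polling). The `QueueName` values are assumptions added for illustration.

```yaml
resources:
  Resources:
    # Step 1: the Dead Letter Queue that receives messages that failed processing
    DeadLetterQueueSubscribeNews:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: subscribe-news-dlq # assumed name

    # Step 2: the main queue, pointing at the DLQ through RedrivePolicy
    SubscribeNewsQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: subscribe-news-queue # assumed name
        MessageRetentionPeriod: 3600
        VisibilityTimeout: 30
        ReceiveMessageWaitTimeSeconds: 20
        RedrivePolicy:
          deadLetterTargetArn:
            Fn::GetAtt: [DeadLetterQueueSubscribeNews, Arn]
          maxReceiveCount: 3
```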

For our DLQ purposes, the most important of these keys is “RedrivePolicy”, but you should know what each one does:

  • RedrivePolicy: This defines a dead-letter target queue named 'DeadLetterQueueSubscribeNews' and sets 'maxReceiveCount' to 3. Messages failing processing three times will be moved to the dead-letter queue for isolation and handling.
  • MessageRetentionPeriod: This is set to 3600 seconds (1 hour). Messages will be stored in the queue for up to 1 hour before automatic deletion.
  • VisibilityTimeout: It's configured for 30 seconds. Messages become "invisible" to other consumers for this duration after being picked up for processing.
  • ReceiveMessageWaitTimeSeconds: This parameter is set to 20 seconds, enabling long-polling to reduce unnecessary API requests during message retrieval.

3. Set up your Lambda function to use the main SQS Queue: Your Lambda function, 'subscribe', will be configured to be triggered by the 'SubscribeNewsQueue' SQS queue through the 'events' section. Before looking at the configuration below, it's worth mentioning that I'm using a widely used plugin, serverless-iam-roles-per-function, to define the IAM role inside the function configuration. If you don't want to use the traditional IAM role statements, install this plugin before continuing:

This means that your Lambda function will consume messages from the 'SubscribeNewsQueue'.
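A sketch of how step 3 might be written, assuming the serverless-iam-roles-per-function plugin is installed. The handler path and `batchSize` value are assumptions, and the explicit SQS permissions are included as an illustration of a per-function role rather than as a confirmed requirement of the original project.

```yaml
plugins:
  - serverless-iam-roles-per-function

functions:
  subscribe:
    handler: src/functions/subscribe.handler # assumed path
    events:
      - sqs:
          arn:
            Fn::GetAtt: [SubscribeNewsQueue, Arn]
          batchSize: 5 # assumed value
    # Per-function role statements, provided by the plugin above
    iamRoleStatements:
      - Effect: Allow
        Action:
          - sqs:ReceiveMessage
          - sqs:DeleteMessage
          - sqs:GetQueueAttributes
        Resource:
          Fn::GetAtt: [SubscribeNewsQueue, Arn]
```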

4. Set up ANOTHER Lambda function to use the DLQ: Nothing special here. It's almost the same configuration as the previous Lambda function, but it references the DLQ.

After these steps, you will have a working DLQ. If you have any doubts about how to handle the messages, I published a fully functional example in this repository.
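For illustration, a second function consuming the DLQ might be declared as follows. The function name and handler path are hypothetical; only the 'DeadLetterQueueSubscribeNews' resource name comes from this article.

```yaml
functions:
  handleFailedSubscriptions: # hypothetical function name
    handler: src/functions/handleFailedSubscriptions.handler # assumed path
    events:
      - sqs:
          arn:
            Fn::GetAtt: [DeadLetterQueueSubscribeNews, Arn]
```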

Handling Batch Errors Easily in Lambda with SQS

You may have noticed the batch configuration in Lambda, so let's look at this topic more closely. By default, when you use batching in AWS Lambda with SQS as a trigger, if one message in a batch fails, the entire batch is retried, including the messages that were processed successfully. This default behavior can be inefficient, especially for large batches.

Not so long ago, the challenge was handling partial failures without either retrying the whole batch or swallowing the errors and marking the entire batch as successful. Serverless Framework version 2.67.0 introduced the 'functionResponseType' option with the value 'ReportBatchItemFailures' to address this.

When 'functionResponseType' is set to 'ReportBatchItemFailures':

  • Only the specific failed message is retried.
  • Successfully processed messages in the batch are not retried.

For example, suppose you have one batch of 5 messages where 3 fail and 2 are processed successfully. When the retry occurs, only the 3 failed messages are delivered to the function again, not the initial 5.
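As a configuration sketch, the option sits on the SQS event of the function. The handler path and `batchSize` are assumed values; `functionResponseType` is the Serverless Framework option named above.

```yaml
functions:
  subscribe:
    handler: src/functions/subscribe.handler # assumed path
    events:
      - sqs:
          arn:
            Fn::GetAtt: [SubscribeNewsQueue, Arn]
          batchSize: 5 # assumed value
          functionResponseType: ReportBatchItemFailures
```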

After setting 'functionResponseType' to 'ReportBatchItemFailures', the only thing left to do is change your handler to return a list containing the IDs of the messages that failed, like the example below:
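A minimal sketch of such a handler follows. `createBatchHandler` and `processMessage` are hypothetical names standing in for your own code; the response shape, an object with a `batchItemFailures` array of `{ itemIdentifier }` entries, is the one Lambda expects for partial batch responses from SQS.

```javascript
// `processMessage` is a hypothetical stand-in for your business logic.
function createBatchHandler(processMessage) {
  return async function handler(event) {
    const batchItemFailures = [];

    for (const record of event.Records) {
      try {
        // JSON.parse is inside the try block, so a malformed body
        // fails only this message instead of the whole batch.
        await processMessage(JSON.parse(record.body));
      } catch (error) {
        console.error(`Message ${record.messageId} failed:`, error);
        // Report only this message as failed so that only it is retried.
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }

    // An empty list tells Lambda that the whole batch succeeded.
    return { batchItemFailures };
  };
}
```

Messages reported this way return to the queue and are retried on their own. Once one of them exceeds 'maxReceiveCount', SQS moves it to the DLQ.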

In order to be able to assimilate a complete flow, I added another 2 code examples that deal with the ID of the messages that failed:

If you want to use another approach with predefined wrappers and configure 'functionResponseType' in CloudFormation, check out Lambda Powertools. It has a utility to help with this type of batch processing.


This setting improves efficiency and accuracy in handling errors during batch processing.

Conclusion

In this article, you learned how to prevent message loss when unexpected situations occur in your system.

AWS SQS Dead Letter Queues are vital components for ensuring the reliability and fault tolerance of your cloud-based applications. They enable easier troubleshooting of messages and enhance the integrity of your system.

Adding this skill to your toolkit will help you build robust serverless systems that can withstand the unpredictability of the digital landscape.

If you have any questions about this topic, you can reach me by opening an issue on my GitHub or by contacting Serverless Guru on social media (Twitter, LinkedIn).
