CloudFront can be simply defined as a CDN (Content Delivery Network): it caches your static assets in a data center closer to your viewers. But CloudFront is far more complex and versatile than this simple definition suggests. CloudFront is a “pull” CDN, which means that you don’t push your content to the CDN. Instead, content is pulled into the CloudFront edge from the origin on the first request for each piece of content.
In addition to the traditional pull-and-cache usage, CloudFront can also be used as:
A Networking Router
A Firewall
A Web Server
An Application Server
Why is using a CDN relevant?
The main reason is to improve the speed of delivery of static content. By caching the content on the CDN edge, you not only reduce the download time from a few seconds to a few milliseconds, but you also reduce the load on your backend (network, I/O, CPU, memory, …) and the number of requests reaching it.
Static content can be defined as content that does not change between two identical requests made within the same time frame.
“Identical” can be as simple as the same URI, or as fine-grained as matching down to the authentication header. The time frame can range from 1 second to 1 year. The most common case is caching resources like JavaScript or CSS and serving the same file to all users forever. But caching a JSON response tailored to a user (keyed on the Authorization header) for a few seconds reduces backend calls when the user has the well-known “frenetic browser reload syndrome”.
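The two caching styles above boil down to very different `Cache-Control` headers. Here is a minimal sketch of how an origin might choose them; the header values and the `cacheHeaders` helper are illustrative, not from the article:

```javascript
// Hypothetical helper: pick Cache-Control headers for the two caching
// styles described above. Values are illustrative.
function cacheHeaders(kind) {
  if (kind === 'immutable-asset') {
    // Fingerprinted JS/CSS: safe to cache "forever" for every user
    return { 'Cache-Control': 'public, max-age=31536000, immutable' };
  }
  if (kind === 'per-user-json') {
    // User-tailored JSON: cache for a few seconds; the CloudFront cache
    // policy must also include the Authorization header in the cache key
    return { 'Cache-Control': 'private, max-age=5' };
  }
  return { 'Cache-Control': 'no-store' };
}
```

Note that for the per-user case the header alone is not enough: the CloudFront cache policy has to include the Authorization header in the cache key, otherwise one user’s cached response could be served to another.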
Edges, Mid-Tier Caches, and Origins
CloudFront isn’t “just” some servers in data centers around the world. The service is a layered network of Edge Locations and Regional Edge Caches (or Mid-Tier Caches).
Edge Locations are distributed around the globe with more than 400 points of presence in over 90 cities across 48 countries. Each Edge Location is connected to one of the 13 Regional Edge Caches.
Regional Edge Caches are transparent to you and your visitors; you can’t configure them or access them directly. Your visitors interact with the nearest Edge Location, which connects to the attached Regional Edge Cache and finally to your origin. Therefore, in this article, we will refer to CloudFront as the combination of Edge Locations and Regional Edge Caches.
What Have We Learned?
CloudFront is more than just a simple “pull-cache-serve” service
You improve delivery speed to your visitors
You can increase resilience by always using a healthy backend
You improve overall speed to your backend by leveraging AWS’s backbone
You can modify any request to tailor the response to your visitor’s device or region
You don’t always need a backend
You protect your backend by reducing the number of calls reaching it
*NOTE: This article demonstrates how to get to 10k RPS; the assumption is that by increasing the provisioned concurrency further we would reach the 30k RPS number in the title.*
GOAL
Test Lambda’s Ability to Handle 10000 RPS
APPLICATION
The application used for these tests is a NodeJS application with a single GET endpoint, deployed with AWS API Gateway, Lambda, and DynamoDB.
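A minimal sketch of the kind of handler under test: one GET endpoint that looks an item up by id. The real application reads from DynamoDB; here the lookup is stubbed with an in-memory object so only the handler shape is shown, and all names are hypothetical:

```javascript
// Stand-in for a DynamoDB table (the real handler would call GetItem)
const items = { '42': { id: '42', name: 'example' } };

// API Gateway proxy-style Lambda handler for a single GET endpoint
const handler = async (event) => {
  const id = (event.queryStringParameters || {}).id || 'default';
  const item = items[id]; // real code: DynamoDB GetItem
  return {
    statusCode: item ? 200 : 404,
    body: JSON.stringify(item || { error: 'not found' }),
  };
};
```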
This config will try to generate 500 user requests per second and ramp up to 10000 RPS over a period of 5 minutes.
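An artillery config for that ramp might look like the following; the target URL and request path are placeholders, not the actual endpoints used in the tests:

```yaml
# Hypothetical artillery config for the 500 -> 10000 RPS ramp
config:
  target: "https://<api-id>.execute-api.us-east-1.amazonaws.com"
  phases:
    - duration: 300      # 5 minutes
      arrivalRate: 500   # start at 500 new arrivals/second
      rampTo: 10000      # ramp up to 10000/second
scenarios:
  - flow:
      - get:
          url: "/prod/items"
```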
Result
The above is the CloudWatch dashboard for the application. All counts shown in the graphs are aggregated over 1 minute.
For example, the first graph shows the API Gateway call count. At the highest peak, API Gateway received about 475K requests in a minute, which means 475000/60 ≈ 7916 requests per second.
As we can see in the concurrent executions graph, it hits the regional concurrency limit and flattens out after reaching 1000 concurrent executions.
At this point Lambda starts throttling requests, as we can see in the Throttles graph; almost the same number of 5XX errors appears in the API Gateway 5XX graph.
This test generated a peak throughput of 7916 RPS, of which 6416 requests were throttled.
With default limits, Lambda can only serve 1000 concurrent executions; requests beyond that are throttled.
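Note that RPS and concurrent executions are not the same thing: the concurrency a workload needs is roughly requests per second multiplied by the average execution duration. A small sketch of that relationship (the durations used in the comments are hypothetical, not measured in these tests):

```javascript
// Required concurrency ≈ requests/second × average duration in seconds
function requiredConcurrency(rps, avgDurationSeconds) {
  return Math.ceil(rps * avgDurationSeconds);
}

// At a 100 ms average duration, 10000 RPS needs ~1000 concurrent
// executions -- exactly the default regional limit. At a 1 s average
// duration, the same 10000 RPS needs ~10000 concurrent executions.
```

This is why a fast function can sustain a much higher RPS under the same concurrency limit than a slow one.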
Load Testing With Increased Regional Soft Limits
I increased the regional concurrency limit of the us-east-1 region to 20000 via a Service Quotas limit increase, expecting that with the increased limit Lambda could process 10000 RPS.
Scenario 1
Region: us-east-1
Concurrent execution limit: 20000 (shared across all the functions in the region)
This config will try to generate 2500 user requests per second and ramp up to 10000 RPS over a period of 15 minutes.
Result
As we can see in the above dashboard, requests started to get throttled once concurrent executions went above 3000, even though the traffic was increasing gradually.
The number 3000 is AWS Lambda's burst concurrency limit in the us-east-1 region.
After the initial burst, your functions' concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or until a concurrency limit is reached. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
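The scaling behaviour described above can be sketched as a simple capacity model (illustrative only; actual scaling also depends on the traffic shape):

```javascript
// Rough model of Lambda's available capacity under the limits quoted
// above: a 3000-instance initial burst (us-east-1), then +500 instances
// per minute, capped at the regional concurrency limit.
function maxConcurrency(minutesElapsed, burst = 3000, perMinute = 500, regionalLimit = 20000) {
  return Math.min(burst + perMinute * Math.floor(minutesElapsed), regionalLimit);
}
```

Under this model, reaching 10000 concurrent executions takes (10000 − 3000) / 500 = 14 minutes of sustained load, which is consistent with the throttling seen during the ramp.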
This test generates a quick burst of traffic: 10000 users within a span of 2 minutes.
Result
Here we can see artillery started by generating a quick load of around 4700 RPS, and Lambda started 3000+ instances to serve it, then began throttling the remaining requests.
Artillery generated a peak of 10000 requests, of which 600 were throttled.
So in both scenarios (gradual increase and quick increase in traffic), Lambda was not able to process all the requests it received, because of the burst concurrency limit and the time it needs to scale (500/min); during the scaling period after the initial burst, some requests are throttled.
Load Testing With Provisioned Concurrency
For this test I enabled provisioned concurrency (10000) on the Lambda function, so that 10000 Lambda instances are available at all times to process any traffic up to 10000 RPS.
This config will try to generate 2500 user requests per second and ramp up to 9250 RPS over a period of 15 minutes. I kept it at 9250 because I wanted to see what the graph would look like without using 100% of the provisioned concurrency.
Result
Some info on provisioned concurrency CloudWatch metrics:
ProvisionedConcurrentExecutions – concurrent executions using provisioned concurrency
ProvisionedConcurrencyUtilization – fraction of provisioned concurrency in use, i.e. ProvisionedConcurrentExecutions / total amount of provisioned concurrency allocated
ProvisionedConcurrencyInvocations – number of invocations using provisioned concurrency
ProvisionedConcurrencySpilloverInvocations – number of invocations above provisioned concurrency
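How the utilization and spillover metrics relate to the allocated amount can be sketched as follows (the helper names are mine, not CloudWatch API calls):

```javascript
// ProvisionedConcurrencyUtilization = in-use / allocated (1.0 == 100%)
function utilization(provisionedConcurrentExecutions, allocated) {
  return provisionedConcurrentExecutions / allocated;
}

// Invocations above the allocated amount spill over to on-demand
// scaling (and may hit cold starts)
function spillover(concurrentExecutions, allocated) {
  return Math.max(0, concurrentExecutions - allocated);
}
```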
On the graph we can see artillery generated a load of 9250 requests per second, and Lambda was able to execute all of those requests without throttling a single one ✌️✌️✌️
There are some 5XX errors thrown by API Gateway, which I believe is because some of the Lambdas timed out or failed to read from DynamoDB. I didn't dig deeper, because the goal here was to check whether Lambda could process all of the given requests without throttling.
This config will try to generate 5000 user requests per second and ramp up to 10000 RPS over a period of 3 minutes.
Result
Here artillery generated traffic of 10000 RPS and held it steady for some time. As we can see, Lambda was able to process all of the requests without throttling. ✌️✌️✌️
We can also see around 350 requests in the ProvisionedConcurrencySpilloverInvocations graph. These invocations happen when ProvisionedConcurrencyUtilization goes above 100% (a count of 1 in the graph represents 100%); these requests are served by Lambda's on-demand scaling and may experience cold starts.
Provisioned concurrency can also be scaled with AWS Application Auto Scaling. I tried to use it and it did not work as expected, and there are not many resources available online about autoscaling provisioned concurrency. I will dig deeper into this soon and try to update this doc with the results.
Conclusion
All these tests give us answers to a couple of questions:
Is AWS Lambda as scalable as a traditional EC2/container-based architecture? YES
Can Lambda serve 30000 RPS? YES
But it can be difficult.
With default AWS regional limits, Lambda cannot serve more than 1000 concurrent executions.
With an increased concurrent execution limit, there is still one more limit: the burst concurrency limit. This limits Lambda to 3000 concurrent executions at a time; if it receives more than 3000 concurrent requests, some of them will be throttled while Lambda scales by 500 instances per minute.
By enabling provisioned concurrency and allocating the required amount to a function, we can scale the function without any throttling.