CloudFront can be simply defined as a CDN (Content Delivery Network) that caches your static assets in a data center nearer to your viewers. But CloudFront is far more complex and versatile than this simple definition suggests. CloudFront is a “pull” CDN, which means you don’t push your content to the CDN; instead, content is pulled into the CDN edge from the origin on the first request for each piece of content.
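The pull model is essentially a read-through cache. A minimal sketch in Python to illustrate the idea (the origin callable, TTL handling, and class name are simplifications for illustration, not CloudFront's actual implementation):

```python
import time

class PullCdnEdge:
    """Toy read-through cache illustrating the pull-CDN model."""

    def __init__(self, origin, ttl_seconds=60):
        self.origin = origin   # callable: path -> content (stands in for the origin server)
        self.ttl = ttl_seconds
        self.cache = {}        # path -> (content, expiry timestamp)

    def get(self, path):
        entry = self.cache.get(path)
        if entry and entry[1] > time.time():
            return entry[0], "Hit from edge"
        # Cache miss: pull the object from the origin, then cache it.
        content = self.origin(path)
        self.cache[path] = (content, time.time() + self.ttl)
        return content, "Miss from edge"

# The origin is only contacted on the first request for each path.
edge = PullCdnEdge(origin=lambda path: f"<body of {path}>")
print(edge.get("/logo.png"))  # ('<body of /logo.png>', 'Miss from edge')
print(edge.get("/logo.png"))  # ('<body of /logo.png>', 'Hit from edge')
```

Subsequent requests within the TTL never touch the origin, which is where both the latency win and the backend-load reduction come from.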
In addition to the traditional pull-and-cache usage, CloudFront can also be used as:
A Networking Router
A Web Server
An Application Server
Why is using a CDN relevant?
The main reason is to improve the speed of delivery of static content. By caching content at the CDN edge, you not only reduce download time from a few seconds to a few milliseconds, but you also reduce the load and number of requests on your backend (network, IO, CPU, memory, …).
Static content can be defined as content that does not change between two identical requests made within the same time frame.
Edges, Mid-Tier Caches, and Origins
CloudFront isn’t “just” some servers in data centers around the world. The service is a layered network of Edge Locations and Regional Edge Caches (or Mid-Tier Caches).
Edge Locations are distributed around the globe with more than 400 points of presence in over 90 cities across 48 countries. Each Edge Location is connected to one of the 13 Regional Edge Caches.
Regional Edge Caches are transparent to you and your visitors: you can’t configure them or access them directly. Your visitors interact with the nearest Edge Location, which connects to the attached Regional Edge Cache and finally to your origin. Therefore, in this article, we will refer to CloudFront as the combination of Edge Locations and Regional Edge Caches.
What Have We Learned?
CloudFront is more than just a simple “pull-cache-serve” service
You improve delivery speed to your visitors
You can increase resilience by always using a healthy backend
You improve overall speed to your backend by leveraging AWS’s backbone
You can modify any request to tailor the response to your visitor’s device or region
You don’t always need a backend
You protect your backend by reducing the number of calls reaching it
This config will try to generate 2,500 user requests/second and ramp up to 10,000 RPS over a period of 15 minutes.
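A load phase like this can be expressed in an Artillery config along these lines (the target URL and endpoint are placeholders, and the exact values are reconstructed from the description above, not the original config):

```yaml
config:
  target: "https://example.execute-api.us-east-1.amazonaws.com"  # placeholder API Gateway URL
  phases:
    # Ramp the arrival rate from 2500 to 10000 new virtual users/second
    # over 15 minutes (900 seconds).
    - duration: 900
      arrivalRate: 2500
      rampTo: 10000
      name: "Gradual ramp"
scenarios:
  - flow:
      - get:
          url: "/prod/items"  # placeholder endpoint
```

Run with `artillery run config.yml`; each arriving virtual user executes the scenario once, so a steady arrival rate approximates a steady RPS against the endpoint.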
As we can see in the dashboard above, requests started to get throttled when concurrent executions went above 3,000/second, even though traffic was increasing gradually.
The number 3,000 is AWS Lambda's burst concurrency limit in the us-east-1 region.
After the initial burst, your functions' concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or until a concurrency limit is reached. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
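The burst-then-linear scaling described above can be worked through numerically. A small sketch assuming the us-east-1 burst limit of 3,000 and the documented +500 instances/minute (real scaling reacts to actual traffic, so this is an upper-bound model, not a simulation):

```python
def available_concurrency(minutes_elapsed, burst_limit=3000,
                          per_minute=500, account_limit=10000):
    """Upper bound on Lambda concurrent executions after the initial burst."""
    return min(burst_limit + per_minute * int(minutes_elapsed), account_limit)

# How long until Lambda can absorb 10,000 concurrent requests?
for minute in (0, 5, 10, 14):
    cap = available_concurrency(minute)
    print(f"t={minute:2d} min: up to {cap} concurrent executions")
# At a steady 10,000 concurrent requests, anything above the cap is throttled
# with a 429 until scaling catches up: (10000 - 3000) / 500 = 14 minutes.
```

This is why the 15-minute ramp still saw throttling: demand outpaces the 500/minute scale-up rate for most of the run.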
This test will try to generate quick traffic of 10,000 users within a time span of 2 minutes.
Here we can see Artillery started by generating a quick load of around 4,700 RPS, and Lambda started 3,000+ containers to serve them and began throttling requests.
Artillery generated a peak traffic of 10,000 requests, of which around 600 were throttled.
So in both scenarios (gradual and quick increase in traffic), we can see Lambda was not able to process all the requests received because of the burst concurrency limit and the time needed to scale (500/minute); during the scale-up period after the initial burst, it throttles some of the requests.
Load Testing With Provisioned Concurrency
For this test I enabled provisioned concurrency (10,000) on the Lambda function, assuming that 10,000 Lambda instances are available at all times to process any traffic up to 10,000 RPS.
This config will try to generate 2,500 user requests/second and ramp up to 9,250 RPS over a period of 15 minutes. I chose 9,250 because I wanted to see what the graph would look like without using 100% of the provisioned concurrency.
Some info on provisioned concurrency CloudWatch metrics:
ProvisionedConcurrentExecutions – concurrent executions using provisioned concurrency
ProvisionedConcurrencyUtilization – fraction of provisioned concurrency in use, i.e. ProvisionedConcurrentExecutions / total provisioned concurrency allocated
ProvisionedConcurrencyInvocations – number of invocations using provisioned concurrency
ProvisionedConcurrencySpilloverInvocations – number of invocations above the provisioned concurrency
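The relationship between these metrics can be sketched with a small helper. The arithmetic follows the definitions above; the function name and sample numbers are illustrative, not taken from the test:

```python
def provisioned_metrics(concurrent_executions, provisioned):
    """Derive utilization and spillover from a concurrency snapshot.

    Utilization is the fraction of the provisioned allocation in use;
    anything above the allocation spills over to on-demand scaling
    (and those invocations may see cold starts).
    """
    in_use = min(concurrent_executions, provisioned)
    utilization = in_use / provisioned
    spillover = max(0, concurrent_executions - provisioned)
    return utilization, spillover

# 9,250 concurrent executions against 10,000 provisioned: no spillover.
print(provisioned_metrics(9250, 10000))   # (0.925, 0)
# 10,350 against 10,000: utilization pegged at 100%, 350 spillover invocations.
print(provisioned_metrics(10350, 10000))  # (1.0, 350)
```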
On the graph we can see Artillery generated a load of 9,250 requests per second, and Lambda was able to execute all of those requests without throttling a single one ✌️✌️✌️
There are some 5XX errors thrown by API Gateway, which I believe happened because some of the Lambdas timed out or failed to read from DynamoDB. I didn't dig in deep because the goal here was to check whether Lambda could process all of the given requests without throttling.
This config will try to generate 5,000 user requests/second and ramp up to 10,000 RPS over a period of 3 minutes.
Here Artillery generated traffic of 10,000 RPS and held it level for some time. As we can see, Lambda was able to process all of the requests without throttling. ✌️✌️✌️
We can also see around 350 requests in the ProvisionedConcurrencySpilloverInvocations graph. These invocations happen when ProvisionedConcurrencyUtilization goes above 100% (a count of 1 in the graph represents 100%); these requests are served by Lambda's on-demand scaling and may experience cold starts.
Provisioned concurrency can also be scaled with AWS autoscaling. I tried to use it and it did not work as expected, and there are not many resources available online about autoscaling provisioned concurrency. I will dig deeper into this soon and try to update this doc with the results.
All these tests give us answers to a couple of questions:
Is AWS Lambda as scalable as a traditional EC2/container-based architecture? YES
Can Lambda serve 30,000 RPS? YES, but it can be difficult:
With default AWS regional limits, Lambda cannot serve more than 1,000 concurrent executions.
With an increased concurrent execution limit, there is still one more limit: the burst concurrency limit. This limits Lambda to serving only 3,000 concurrent requests at a time. If it receives more than 3,000 concurrent requests, some of them will be throttled until Lambda scales up by 500 per minute.
By enabling provisioned concurrency and allocating the required amount of concurrency to a function, we can scale the function without any throttling.