Scaling AWS Lambda to 30k Requests Per Second

February 9, 2021

*NOTE: This article demonstrates how to get to 10k RPS; the assumption is that by increasing the provisioned concurrency further, we would reach the 30k RPS number in the title.

GOAL

Test Lambda’s ability to handle 10,000 requests per second (RPS).

APPLICATION

The application used for these tests is a single GET endpoint: a NodeJS application deployed with AWS API Gateway, Lambda, and DynamoDB.
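The article does not include the application's source, but a minimal sketch of such a handler might look like the following. The handler shape, the table name ("items"), and the injected DocumentClient are assumptions for illustration; injecting the client lets the handler be exercised without AWS credentials:

```javascript
// Hypothetical sketch of the GET endpoint under test (the real source is not
// shown in the article). "items" is a placeholder table name.
// The DynamoDB DocumentClient is injected so the handler is testable offline.
const makeHandler = (docClient) => async (event) => {
  const id = event.queryStringParameters && event.queryStringParameters.id;
  if (!id) {
    return { statusCode: 400, body: JSON.stringify({ error: "missing id" }) };
  }
  // DocumentClient#get returns a request object with a .promise() method (SDK v2 style)
  const result = await docClient.get({ TableName: "items", Key: { id } }).promise();
  return { statusCode: 200, body: JSON.stringify(result.Item || null) };
};

module.exports = { makeHandler };
```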

LOAD TESTING TOOLS USED

Artillery.io

  • Written in NodeJS
  • Lightweight
  • Easy Installation
  • Uses JSON/YML/JS scripts
  • NO GUI
  • Load generation limited by the host system's memory and CPU utilization

JMeter

  • Runs on JVM
  • XML Configuration
  • GUI Available
  • Loads of Plugins
  • Better logs
  • Load generation limited by the host system's memory and CPU utilization

Serverless-Artillery

  • Written in NodeJS
  • Runs on AWS Lambda
  • Lightweight
  • Easy Installation
  • Can generate higher throughput with simple configurations
  • Load generation time limited by Lambda's maximum timeout of 15 minutes

The load testing tool I used here is Serverless-Artillery, because on a MacBook Pro (8GB, 2.4GHz) I was not able to generate test loads of more than 2000 RPS.

Load Testing With Regional Soft Limits

Region: us-east-1

Concurrent execution limit: 1000 (shared across all the functions in the region)

The goal here is to find out how many requests Lambda can handle with the default soft limits applied by AWS in the us-east-1 region.

Load Test configuration

  
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    -
      duration: 300
      arrivalRate: 500
      rampTo: 10000
scenarios:
  -
    flow:
      -
        get:
          url: "/dev/get?id=erewqed"
  


This config will generate 500 user requests/second and ramp up to 10,000 RPS over a period of 5 minutes.
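As a rough sanity check on the configuration above, the total number of requests a linear ramp phase produces is approximately the area under the rate curve (Artillery's actual pacing will differ slightly):

```javascript
// Approximate request volume of an Artillery ramp phase: a linear ramp from
// arrivalRate to rampTo over `duration` seconds averages out to the midpoint rate.
const rampTotal = (arrivalRate, rampTo, duration) =>
  ((arrivalRate + rampTo) / 2) * duration;

console.log(rampTotal(500, 10000, 300)); // 1575000 requests over the 5-minute phase
```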

Result

The above is the CloudWatch dashboard for the application. All counts shown in the graphs are aggregated over 1 minute.

For example, the first graph shows the API Gateway call count. At the highest peak, API Gateway received about 475K requests in a minute, which means 475000/60 ≈ 7916 requests per second.

As we can see in the concurrent executions graph, it hits the regional concurrency limit and flattens out after reaching 1000 concurrent executions.

At this point Lambda starts throttling requests, as we can see in the Throttles graph; almost the same number of 5XX errors can be seen in the API Gateway 5XX graph.

This test generated a throughput of 7916 RPS at its peak, and of that, 6416 requests got throttled.

With the default limits, Lambda can only serve 1000 concurrent executions; requests beyond that will be throttled.
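The 1000-execution ceiling translates into a maximum sustainable request rate via Little's law: concurrency ≈ RPS × average function duration. The durations below are illustrative assumptions, since the test does not report the function's latency:

```javascript
// Lambda concurrency needed to sustain a given request rate (Little's law):
// concurrency = requests_per_second * average_duration_seconds.
// The durations here are assumed for illustration.
const requiredConcurrency = (rps, avgDurationSec) => rps * avgDurationSec;

console.log(requiredConcurrency(7916, 0.1)); // ≈ 792 — would fit under the 1000 limit
console.log(requiredConcurrency(7916, 0.5)); // 3958 — far above it, so requests throttle
```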

Load Testing With Increased Regional Soft Limits

I increased the regional concurrency limit of the us-east-1 region to 20,000 via a Service Quotas limit increase, expecting that with the raised limit Lambda could process 10,000 RPS or more.

Scenario 1

Region: us-east-1

Concurrent execution limit: 20,000 (shared across all the functions in the region)

Load Test configuration

  
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    -
      duration: 900
      arrivalRate: 2500
      rampTo: 10000
scenarios:
  -
    flow:
      -
        get:
          url: "/dev/get?id=erewqed"
  


This config will generate 2500 user requests/second and ramp up to 10,000 RPS over a period of 15 minutes.

Result

As we can see in the above dashboard, requests started to get throttled when concurrent executions went above 3000, even though the traffic was increasing gradually.

The number 3000 is AWS Lambda's burst concurrency limit in us-east-1 region.

After the initial burst, your functions' concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or until a concurrency limit is reached. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
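The scaling behavior described above can be modeled as a simple function of time. The figures are the documented values for us-east-1 (3000 burst, +500/min); actual scaling behavior may vary:

```javascript
// Concurrency available t minutes into a traffic spike in us-east-1:
// an immediate burst of 3000 instances, then +500 per minute,
// capped at the account's regional concurrency limit.
const availableConcurrency = (minutes, burst = 3000, perMinute = 500, limit = 20000) =>
  Math.min(limit, burst + perMinute * Math.floor(minutes));

console.log(availableConcurrency(0));  // 3000 — the initial burst
console.log(availableConcurrency(10)); // 8000 — ten minutes of +500/min
console.log(availableConcurrency(60)); // 20000 — capped at the raised regional limit
```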

Scenario 2

Load Test configuration

  
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    -
      duration: 120
      arrivalRate: 10000
      rampTo: 10000
scenarios:
  -
    flow:
      -
        get:
          url: "/dev/get?id=erewqed"
  


This test will generate a quick burst of traffic: 10,000 user requests per second sustained over a span of 2 minutes.

Result

Here we can see Artillery started by generating a quick load of around 4700 RPS, and Lambda started 3000+ containers to serve it, then began throttling requests.

Artillery generated traffic of 10,000 requests per second at its peak, and of that, around 600 requests got throttled.

So in both scenarios (a gradual increase and a quick increase in traffic), Lambda was not able to process all the requests it received, because of the burst concurrency limit and the time it needs to scale (500/min); during the scaling period after the initial burst, it throttles some of the requests.

Load Testing With Provisioned Concurrency

For this test I enabled provisioned concurrency (10,000) on the Lambda function, on the assumption that 10,000 Lambda instances would be available at all times to process any traffic up to 10,000 RPS.

Scenario 1

Load Test configuration

  
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    -
      duration: 600
      arrivalRate: 2500
      rampTo: 9250
scenarios:
  -
    flow:
      -
        get:
          url: "/dev/get?id=erewqed"
  


This config will generate 2500 user requests/second and ramp up to 9250 RPS over a period of 10 minutes. I kept it at 9250 because I wanted to see what the graph would look like without using 100% of the provisioned concurrency.

Result

Some info on the provisioned concurrency CloudWatch metrics:

  1. ProvisionedConcurrentExecutions – concurrent executions using provisioned concurrency
  2. ProvisionedConcurrencyUtilization – fraction of provisioned concurrency in use, i.e. ProvisionedConcurrentExecutions / total amount of provisioned concurrency allocated
  3. ProvisionedConcurrencyInvocations – number of invocations using provisioned concurrency
  4. ProvisionedConcurrencySpilloverInvocations – number of invocations above provisioned concurrency
In the graph we can see Artillery generated a load of 9250 requests per second, and Lambda was able to execute all of those requests without throttling a single one ✌️✌️✌️

There are some 5XX errors thrown by API Gateway, which I believe is because some of the Lambdas timed out or failed to read from DynamoDB. I didn't dig deeper, because the goal here was to check whether Lambda could process all of the given requests without throttling.

Scenario 2

Load Test configuration

  
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    -
      duration: 180
      arrivalRate: 5000
      rampTo: 10000
scenarios:
  -
    flow:
      -
        get:
          url: "/dev/get?id=erewqed"
  


This config will generate 5000 user requests/second and ramp up to 10,000 RPS over a period of 3 minutes.

Result

Here Artillery generated traffic of 10,000 RPS and held it steady for some time. As we can see, Lambda was able to process all of the requests without throttling. ✌️✌️✌️

We can also see some numbers in the ProvisionedConcurrencySpilloverInvocations graph, around 350 requests. These invocations happen when ProvisionedConcurrencyUtilization goes above 100% (a count of 1 in the graph represents 100%); these requests are served by Lambda's on-demand scaling and may experience cold starts.

Provisioned concurrency can also be scaled with AWS Application Auto Scaling. I tried to use it and it did not work as expected, and there are not many resources available online about autoscaling provisioned concurrency. I will dig into this soon and try to update this doc with the results.

Conclusion

All these tests give us answers to a couple of questions:

Is AWS Lambda as scalable as a traditional EC2/container-based architecture? YES

Can Lambda serve 30,000 RPS? YES

  • But it can be difficult.
  • With the default AWS regional limits, Lambda cannot serve more than 1000 concurrent executions.
  • With an increased concurrent execution limit, there is still one more limit: the burst concurrency limit. This limits Lambda to serving only 3000 concurrent requests at a time; if it receives more than that, some requests will be throttled while Lambda scales up at 500 per minute.
  • By enabling provisioned concurrency and allocating the required amount to a function, we can scale the function without any throttling.
