AWS Serverless Concurrency
Concurrency
Concurrency is essential to understanding how serverless apps scale.
Concurrency is the number of Lambda invocations running at the same time.
It is calculated by multiplying the request rate by the function's average duration.
Requests are throttled if they exceed the account-level or function-level concurrency limit.
The following affects your concurrency:
- The event source's invocation model
- AWS service quotas and limits
Each invocation model interacts with the Lambda service differently.
How Concurrency Works
Requests are throttled once they exceed your available concurrency.
For example, if your function runs for 20 seconds on average and receives 50 requests per second, your concurrency is 50 * 20 = 1000.
If your available concurrency is less than 1000, the excess requests are throttled.
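As a minimal sketch of this calculation (the function name, the available-concurrency figure, and the numbers are illustrations, not an AWS API):

```python
def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    # Concurrency = request rate * average duration.
    return requests_per_second * avg_duration_seconds

# The example above: 50 requests per second * 20 seconds average duration.
print(required_concurrency(50, 20))  # 1000.0

# Demand beyond the available concurrency is throttled.
available = 800  # assumed available concurrency, for illustration only
print(max(0.0, required_concurrency(50, 20) - available))  # 200.0 executions throttled
```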
A failed or throttled request will get two retries with an asynchronous event source.
For a synchronous event source, there are no built-in retries; the caller must handle retry logic itself.
Concurrency Limits
For streaming event sources like Kinesis Data Streams, concurrency is measured in shards.
By default, the limit is one concurrent Lambda invocation per shard.
Most streaming services will keep retrying a record until it is processed or the retention time has expired.
If one record in a batch fails, the whole batch is blocked until it is processed successfully or the retention period expires.
That's why it's critical to add code to handle partial failures.
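One common way to handle partial failures is to report only the failed records back to Lambda. The sketch below assumes a Kinesis event source mapping with ReportBatchItemFailures enabled; process_record is a hypothetical stand-in for your own logic:

```python
import base64
import json

def handler(event, context):
    # Collect the sequence numbers of records that could not be processed.
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process_record(payload)  # hypothetical business logic
        except Exception:
            # Report only this record; the rest of the batch is treated as successful.
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})

    # With ReportBatchItemFailures enabled, Lambda retries only the reported
    # records instead of blocking and retrying the whole batch.
    return {"batchItemFailures": failures}

def process_record(payload):
    # Placeholder: replace with real record handling.
    print(payload)
```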
The table below compares the Lambda execution models and their concurrency measures.

| Lambda Execution Model | Concurrency Measure |
|---|---|
| Synchronous | Request rate * average duration |
| Asynchronous | Request rate * average duration (events pass through Lambda's internal queue) |
| Polling, stream | By default, one execution per shard. Use the parallelization factor to run several concurrent batches per shard. |
| Polling, queue | 5 polls per second, rising with queue depth |
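You can also manage the limit per function by reserving concurrency. A minimal sketch with boto3, assuming a function named my-stream-processor (a placeholder):

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for this function.
# Reserved concurrency both guarantees this capacity to the function and
# caps it there, so excess requests are throttled rather than scaling further.
lambda_client.put_function_concurrency(
    FunctionName="my-stream-processor",  # placeholder function name
    ReservedConcurrentExecutions=100,
)
```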
Related reads:
Managing Concurrency for a Lambda Function