AWS Serverless Concurrency
Concurrency
Concurrency is essential to understanding how serverless apps scale.
Concurrency is the number of Lambda invocations running at the same time.
It is calculated by multiplying the request rate by the function's average duration.
Requests are throttled if they exceed the account-level or function-level concurrency limit.
The following affects your concurrency:
- The event source's invocation model
- AWS service quotas and limits
Each invocation model interacts with the Lambda service differently.
How Concurrency Works
Requests are throttled once they exceed your available concurrency.
For example, if your function runs for 20 seconds on average and receives 50 requests per second, your concurrency is 50 * 20 = 1000.
If your available concurrency is less than 1000, the excess requests are throttled.
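As a minimal sketch of this calculation (the function name, the available-concurrency figure, and the numbers are illustrations, not an AWS API):

```python
def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    # Concurrency = request rate * average duration.
    return requests_per_second * avg_duration_seconds

# The example above: 50 requests per second * 20 seconds average duration.
print(required_concurrency(50, 20))  # 1000.0

# Demand beyond the available concurrency is throttled.
available = 800  # assumed available concurrency, for illustration only
print(max(0.0, required_concurrency(50, 20) - available))  # 200.0 executions throttled
```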
A failed or throttled request will get two retries with an asynchronous event source.
For a synchronous event source, there are no built-in retries; the caller must handle retry logic itself.
Concurrency Limits
For streaming event sources like Kinesis Data Streams, concurrency is measured in shards.
By default, the limit is one concurrent Lambda invocation per shard.
Most streaming services will keep retrying a record until it is processed or the retention time has expired.
If one record in a batch fails, the whole batch is blocked until it is processed successfully or the retention period expires.
That's why it's critical to add code to handle partial failures.
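One common way to handle partial failures is to report only the failed records back to Lambda. The sketch below assumes a Kinesis event source mapping with ReportBatchItemFailures enabled; process_record is a hypothetical stand-in for your own logic:

```python
import base64
import json

def handler(event, context):
    # Collect the sequence numbers of records that could not be processed.
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process_record(payload)  # hypothetical business logic
        except Exception:
            # Report only this record; the rest of the batch is treated as successful.
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})

    # With ReportBatchItemFailures enabled, Lambda retries only the reported
    # records instead of blocking and retrying the whole batch.
    return {"batchItemFailures": failures}

def process_record(payload):
    # Placeholder: replace with real record handling.
    print(payload)
```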
The table below compares the Lambda execution models and their concurrency measures.

| Lambda Execution Model | Concurrency Measure |
|---|---|
| Synchronous | Request rate * average duration |
| Asynchronous | Request rate * average duration (events pass through Lambda's internal queue) |
| Polling, stream | By default, one execution per shard. Use the parallelization factor to run several concurrent batches per shard. |
| Polling, queue | 5 polls per second, rising with queue depth |
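You can also manage the limit per function by reserving concurrency. A minimal sketch with boto3, assuming a function named my-stream-processor (a placeholder):

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for this function.
# Reserved concurrency both guarantees this capacity to the function and
# caps it there, so excess requests are throttled rather than scaling further.
lambda_client.put_function_concurrency(
    FunctionName="my-stream-processor",  # placeholder function name
    ReservedConcurrentExecutions=100,
)
```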
Related reads:
Managing Concurrency for a Lambda Function