AWS Lambda Concurrency Limits: Understanding and Managing Function Throttling

Lambda concurrency is the number of function instances running simultaneously. Each concurrent execution consumes one unit from your account's concurrency limit. The default limit is 1,000 concurrent executions per region — which sounds generous until you have dozens of Lambda functions, a microservices architecture where one API call triggers a cascade of downstream invocations, or an event-driven pipeline that processes bursts of work in parallel.

When the account-level concurrency limit is exhausted, new Lambda invocations are throttled. The throttling behavior varies by invocation type: synchronous invocations (API Gateway calling a Lambda function) fail immediately with a 429 error (TooManyRequestsException); asynchronous invocations (S3 events, SNS messages) are retried for up to 6 hours by default, after which the event goes to a dead-letter queue or on-failure destination if one is configured and is discarded otherwise; event source mappings (SQS, Kinesis) back off and retry, so messages wait in the queue or stream until concurrency becomes available or they expire. None of these failure modes is obvious without monitoring.
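For the synchronous case, the caller has to decide what to do with a 429. A minimal sketch of capped exponential backoff with jitter follows. `ThrottledError` and the `invoke` callable are hypothetical stand-ins for whatever your client raises and calls, and AWS SDKs generally ship their own retry handling, so treat this as an illustration of the idea rather than code you necessarily need to write.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 / TooManyRequestsException response."""


def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1, max_delay=5.0,
                        sleep=time.sleep):
    """Call invoke() and retry with capped exponential backoff when throttled.

    invoke: zero-argument callable that raises ThrottledError when throttled.
    Returns invoke()'s result, or re-raises ThrottledError after max_attempts.
    """
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential delay,
            # so many throttled callers do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in normal use the default `time.sleep` applies.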

Account-Level vs. Function-Level Concurrency

The account-level limit is a shared pool. Every Lambda function in the region draws from the same 1,000-unit pool by default. This means a single function experiencing a traffic spike can consume the entire pool and throttle all other functions in the account — a noisy neighbor problem within your own account.

Reserved concurrency solves this by allocating a dedicated pool to specific functions. A function with reserved concurrency set to 100 always has up to 100 concurrent executions available, regardless of what other functions are doing. It also caps that function at 100 — it cannot consume more than its reserved allocation, which prevents it from starving other functions. The reserved allocation is subtracted from the unreserved pool available to other functions.

Assign reserved concurrency to your most critical functions (the ones handling user-facing API requests) and your most variable functions (event processors that can spike dramatically). Leave adequate unreserved concurrency for background and batch functions that can tolerate throttling without user impact. Lambda itself requires at least 100 units to remain unreserved, so the most you can reserve across all functions is the account limit minus 100.
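The arithmetic is worth checking before you hand out reservations. The sketch below uses hypothetical function names and reservation sizes; the 1,000 default comes from the text above, and the 100-unit floor is the minimum AWS documents as having to stay unreserved.

```python
ACCOUNT_LIMIT = 1000    # default regional limit cited above
MIN_UNRESERVED = 100    # floor that AWS documents must remain unreserved


def unreserved_pool(account_limit, reserved):
    """Return the concurrency left in the shared pool after reservations.

    reserved: mapping of function name -> reserved concurrency.
    Raises ValueError if the plan would leave less than MIN_UNRESERVED.
    """
    remaining = account_limit - sum(reserved.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"only {remaining} unreserved units left; need at least {MIN_UNRESERVED}"
        )
    return remaining


# Hypothetical reservation plan for three functions.
reservations = {"checkout-api": 300, "image-resizer": 200, "order-events": 150}
print(unreserved_pool(ACCOUNT_LIMIT, reservations))  # 350
```

Every function without a reservation shares whatever this returns, so a plan that leaves only the minimum is a plan in which background functions throttle first.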

Provisioned Concurrency for Cold Start-Sensitive Functions

Provisioned concurrency keeps function instances initialized and warm, eliminating cold start latency for the specified number of concurrent executions. It's distinct from reserved concurrency — you can have both, with reserved concurrency defining the maximum concurrent executions and provisioned concurrency pre-warming a subset of them.

Provisioned concurrency is billed for the configured amount for as long as it is enabled, regardless of whether the pre-warmed instances are invoked. For functions with strict latency requirements and predictable traffic patterns, this cost is justified. For functions whose traffic varies, consider auto-scaling the provisioned amount: AWS supports Application Auto Scaling for Lambda provisioned concurrency, which adjusts the warm pool on a schedule or based on actual utilization metrics.
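A target-tracking setup needs two pieces: a scalable target and a scaling policy. The sketch below only builds the request parameters as plain dictionaries, using the field names of the Application Auto Scaling `RegisterScalableTarget` and `PutScalingPolicy` APIs. The function name, alias, capacities, policy name, and the 0.7 target are illustrative assumptions, not recommendations from this article.

```python
def provisioned_concurrency_scaling(function_name, alias, min_capacity,
                                    max_capacity, target_utilization=0.7):
    """Build (scalable_target, scaling_policy) parameter dicts for one alias."""
    # Local sanity checks only; AWS applies its own validation on top of these.
    if not 0 < target_utilization < 1:
        raise ValueError("target_utilization must be a fraction between 0 and 1")
    if min_capacity > max_capacity:
        raise ValueError("min_capacity cannot exceed max_capacity")

    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    scalable_target = {**common, "MinCapacity": min_capacity,
                       "MaxCapacity": max_capacity}
    scaling_policy = {
        **common,
        "PolicyName": f"{function_name}-{alias}-pc-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


target, policy = provisioned_concurrency_scaling("checkout-api", "live", 5, 50)
```

With boto3 installed, these would be passed to an `application-autoscaling` client as `register_scalable_target(**target)` and `put_scaling_policy(**policy)`.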

Monitoring Concurrency Utilization

Lambda publishes concurrency metrics to CloudWatch under the AWS/Lambda namespace:

ConcurrentExecutions: current simultaneous executions across all functions in the account
UnreservedConcurrentExecutions: executions from the shared pool (not assigned reserved concurrency)
Throttles: count of throttled invocations per function
ProvisionedConcurrencyUtilization: fraction of allocated provisioned concurrency in use, reported between 0 and 1 (0.5 means 50% is in use)

Create alarms on ConcurrentExecutions at 70% and 85% of your account limit. Also create per-function alarms on Throttles; any non-zero throttle count on a production function warrants investigation. Throttles on event-driven functions are particularly dangerous because they're silent: invocations back off, messages accumulate in the queue, and nothing surfaces as an error, so the system looks healthy unless you are also watching queue depth and message age.
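The alarm definitions can be generated rather than hand-typed. The sketch below builds parameter dictionaries using the field names of the CloudWatch `PutMetricAlarm` API. The alarm names, periods, and evaluation counts are assumptions to adjust, and the function name in the usage line is hypothetical.

```python
def concurrency_alarms(account_limit, percents=(70, 85)):
    """Account-wide ConcurrentExecutions alarms at the given percentages."""
    return [
        {
            "AlarmName": f"lambda-account-concurrency-{pct}pct",
            "Namespace": "AWS/Lambda",
            "MetricName": "ConcurrentExecutions",  # no dimensions: all functions
            "Statistic": "Maximum",
            "Period": 60,
            "EvaluationPeriods": 3,
            # Integer arithmetic avoids floating-point rounding in thresholds.
            "Threshold": account_limit * pct // 100,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        }
        for pct in percents
    ]


def throttle_alarm(function_name):
    """Per-function alarm that fires on any non-zero throttle count."""
    return {
        "AlarmName": f"{function_name}-throttles",
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no datapoints means no throttles
    }


alarms = concurrency_alarms(1000) + [throttle_alarm("checkout-api")]
print([a["Threshold"] for a in alarms])  # [700, 850, 0]
```

With boto3, each dictionary would be passed to a CloudWatch client as `put_metric_alarm(**alarm)`, typically with an `AlarmActions` entry added for your notification topic.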

Requesting Concurrency Limit Increases

Lambda account-level concurrency can be increased beyond 1,000 through Service Quotas. Submit a request through the Service Quotas console under Lambda: select "Concurrent executions" and specify your target value. Modest increases for established accounts are often approved automatically; larger increases typically require a brief justification explaining the workload. New accounts may start with a quota below 1,000, so check your applied quota value rather than assuming the default.

For serious serverless architectures, 3,000-10,000 concurrent executions is a reasonable production limit. Request increases proactively based on your growth trajectory rather than waiting for throttles to appear in production. Monitor quota utilization across all services to catch approaching limits before they cause incidents.

Burst Concurrency Limits

In addition to the account-level concurrency limit, Lambda limits how quickly the concurrent execution count can grow. Under the original burst model, a function scaling up from 0 could add an initial burst of instances immediately (between 500 and 3,000 depending on the region, with 3,000 in the largest regions) and then 500 more per minute. Since late 2023, AWS instead documents a per-function scaling rate of 1,000 additional execution environments every 10 seconds, applied to each function independently. Either way, a sudden spike from 0 to 5,000 concurrent executions doesn't happen instantaneously; there's a scaling ramp, roughly four minutes under the older model and tens of seconds under the current one, and requests that arrive faster than the ramp are throttled.
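The length of the ramp is simple arithmetic. The sketch below is a rough model (an immediate initial allowance, then a fixed step per interval), parameterized so it covers both the burst-plus-500-per-minute figures and the rate of 1,000 per 10 seconds per function that AWS documents for current Lambda scaling. Treat both sets of numbers as assumptions to verify against the quotas page for your region.

```python
import math


def ramp_seconds(target, initial, step, interval_seconds):
    """Seconds until capacity reaches target.

    Model: `initial` instances are available immediately, then `step` more
    are added at the end of every `interval_seconds`.
    """
    if target <= initial:
        return 0
    intervals = math.ceil((target - initial) / step)
    return intervals * interval_seconds


# Burst model: 3,000 immediately, then 500 per minute.
print(ramp_seconds(5000, 3000, 500, 60))    # 240 (four minutes)

# Rate-based model: 1,000 immediately, then 1,000 every 10 seconds.
print(ramp_seconds(5000, 1000, 1000, 10))   # 40
```

Whichever model applies, invocations that arrive during the ramp and exceed the capacity available at that moment are throttled, so the figure is a lower bound on how long a cold spike stays degraded.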

Design event-driven architectures with burst behavior in mind. SQS queue consumers with Lambda benefit from configuring the batch size and maximum concurrency to avoid hitting burst limits. Spreading load across multiple SQS queues processed by separate functions effectively distributes the burst budget across multiple independent scaling contexts.
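For the SQS case, batch size and maximum concurrency are both set on the event source mapping. The sketch below builds the parameters using the field names of Lambda's `CreateEventSourceMapping` API. The queue ARN, function name, and the chosen values are hypothetical, and the 2 to 1,000 range check reflects the documented bounds for `MaximumConcurrency`.

```python
def sqs_mapping_params(queue_arn, function_name, batch_size=10,
                       max_concurrency=50):
    """Build event source mapping parameters that cap an SQS consumer."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("max_concurrency must be between 2 and 1000")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        # Caps how many concurrent invocations this one queue can drive.
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


# Hypothetical queue and function names for illustration.
params = sqs_mapping_params(
    "arn:aws:sqs:us-east-1:123456789012:orders",
    "order-events",
    batch_size=10,
    max_concurrency=50,
)
```

With boto3, the dictionary would be passed to a Lambda client as `create_event_source_mapping(**params)`. Capping each queue separately is what lets several queues share the account without one backlog consuming the whole pool.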

FAQ

What's the difference between Lambda throttling and Lambda errors?

Throttling occurs when the concurrency limit is exceeded — the invocation never starts. Errors occur when the function starts but fails during execution. Both appear in CloudWatch metrics but require different responses: throttling requires concurrency management or quota increases; errors require application debugging. Lambda's Errors metric doesn't include throttles, so monitor both Errors and Throttles separately for complete visibility.

Can I set per-function concurrency limits to prevent one function from starving others?

Yes. Setting reserved concurrency on a function caps it at that value — the function cannot exceed its reserved allocation even if unreserved concurrency is available. Setting a function to reserved concurrency of 0 effectively disables it (all invocations are throttled). This is useful for disabling a malfunctioning function without deleting it while you investigate.

Does Lambda concurrency work differently for container image functions?

No. Concurrency works the same way regardless of whether your function uses a ZIP deployment package or a container image. The concurrency model, reserved concurrency, and provisioned concurrency all apply identically. Container image functions can have longer cold starts, particularly before the image has been cached, which makes provisioned concurrency more valuable for latency-sensitive container image functions.

Written by Vigilare Engineering

Platform Team

The Vigilare platform team. We write about the AWS security, compliance, and cost signals behind account suspensions, and the practical steps to stay ahead of them.