
5 Techniques for Addressing Cold Start Issues in Production Serverless Applications


Cold starts remain one of the most challenging obstacles for teams running serverless applications at scale. This article breaks down five proven techniques that reduce initialization latency and improve response times, drawing on insights from engineers who have solved these problems in production environments. These strategies range from optimizing package size to strategically managing resource initialization.

Shrink Package and Split Layers

We hit cold start issues hard when we moved our app's image processing pipeline to AWS Lambda. Users would upload a profile photo, and the first request after an idle period would take 8-12 seconds to respond because the Lambda function had to spin up a new container, load our image processing dependencies, and initialize the connection to S3. For a feature that should feel instant, that was unacceptable.

The technique that proved most effective was provisioned concurrency combined with dependency layer optimization. We split our Lambda deployment into layers, keeping the heavy image processing libraries in a separate layer that AWS caches more aggressively. Then we set provisioned concurrency to keep 3 warm instances running at all times during peak hours using a scheduled scaling policy. During off-peak we dropped it to 1.
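The scheduled scaling described above can be driven through AWS Application Auto Scaling, which treats a Lambda alias's provisioned concurrency as a scalable target. The sketch below builds the request parameters for `put_scheduled_action`; the function name, alias, cron expressions, and capacities are hypothetical stand-ins, and in production these dicts would be passed to a boto3 `application-autoscaling` client.

```python
# Hedged sketch: scheduled provisioned-concurrency scaling for a Lambda alias.
# Function name, alias, schedules, and capacities below are hypothetical; in
# production, pass each dict to
# boto3.client("application-autoscaling").put_scheduled_action(**params).

def scheduled_concurrency_action(function_name, alias, schedule, capacity, action_name):
    """Build put_scheduled_action parameters for Lambda provisioned concurrency."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "ScheduledActionName": action_name,
        "Schedule": schedule,  # cron expression evaluated in UTC
        # Pin capacity by setting min and max to the same value.
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }

# Keep 3 warm instances during peak hours, drop to 1 off-peak.
peak = scheduled_concurrency_action(
    "resize-image", "live", "cron(0 8 * * ? *)", 3, "scale-up-peak")
off_peak = scheduled_concurrency_action(
    "resize-image", "live", "cron(0 20 * * ? *)", 1, "scale-down-off-peak")
```

Note that provisioned concurrency applies to a published version or alias, not `$LATEST`, so the `ResourceId` must name an alias that is also registered as a scalable target first.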

But the real win came from reducing our deployment package size. We stripped out every unnecessary dependency, switched from a full image processing library to a lightweight alternative that did only what we needed, and moved from Python to a compiled language for the core processing function. That alone dropped our cold start from 8 seconds to about 1.5 seconds even without provisioned concurrency. With provisioned concurrency on top, users never experience a cold start during normal usage. The monthly cost increase for provisioned concurrency was about $45, which was trivial compared to the user experience improvement. My advice is to optimize your package size first because it's free and permanent, then layer on provisioned concurrency for the remaining gap.

Adopt Provisioned Concurrency and Slash Latency

I am a full-stack developer, and I found that provisioned concurrency was the only way to stop slow start times from ruining my rental app. When I first launched my API on AWS Lambda, new visitors had to wait 2.8 seconds for the page to load. This cold start was so slow that users were leaving the site immediately. The main problem was that my code took over 2 seconds just to connect to the database, and during busy evening hours those delays would sometimes stretch to 5 seconds.

I tried a few different fixes. First I pinged the function every few minutes to keep it "warm," but that only improved things by 38% and was unreliable. Then I trimmed the deployment package, which gave another 28% boost, but it still wasn't enough. In the end, I enabled provisioned concurrency to keep 12 instances of my code warm and ready to go at all times.

The results were a total game changer. Our average wait time dropped from 2.8 seconds to just 112 milliseconds.
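Before and after a change like this, it helps to verify which invocations are actually paying the cold-start penalty. A common trick is a module-level flag: module scope runs once per container, so the first invocation sees the flag still set. This is a minimal illustrative sketch, not the author's instrumentation.

```python
# Minimal cold-start detector: module scope executes once per Lambda
# container, so only the first invocation in a container sees _cold_start
# still True. Log this alongside response times to separate cold and warm
# latency in your metrics.
import time

_cold_start = True
_initialized_at = time.monotonic()

def handler(event, context=None):
    """Report whether this invocation paid the cold-start penalty."""
    global _cold_start
    was_cold = _cold_start
    _cold_start = False  # every later invocation in this container is warm
    return {
        "cold_start": was_cold,
        "container_age_s": time.monotonic() - _initialized_at,
    }
```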

Prune Dependencies and Embrace Microservices

Cold starts are an obstacle inherent to serverless architectures. However, after working with multiple serverless providers and seeing this struggle play out in real projects, I attribute the problem less to the cloud providers and more to excessive, unneeded dependencies being packaged with the code. Aggressively tree-shaking your code down to the minimum necessary to execute, combined with migrating toward a more microservices-focused architecture, keeps deployment package sizes minimal, and smaller packages directly translate into faster cold start times. In most production environments, that alone eliminates the need for complex "warm-up" pings.

To overcome this architectural issue, you should focus on moving functionality into smaller and more precise functions rather than using a large monolithic application. This will directly address the cold start problem, which many high-volume applications experience. Additionally, if you do not optimize your build process to remove unused libraries, you will be paying for performance that you do not actually receive.
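The monolith-to-focused-functions move can be sketched schematically. In the hypothetical example below, a single dispatcher forces every cold start to carry every route's dependencies; splitting each route into its own function means each deployment package ships only what that route needs. The handler names and routes are illustrative, not from the original.

```python
# Hypothetical sketch: splitting a monolithic Lambda into focused functions.

# BEFORE: one handler serves every route, so its package (and cold start)
# carries every route's dependencies, even for requests that use only one.
def monolith_handler(event, context=None):
    route = event["route"]
    if route == "resize":
        return {"status": "resized"}   # an image library would load here
    if route == "report":
        return {"status": "reported"}  # a PDF library would load here
    return {"status": "unknown"}

# AFTER: each route is deployed as its own function with its own, smaller
# package, so each cold start initializes only one route's dependencies.
def resize_handler(event, context=None):
    # only the image-processing dependency ships with this function
    return {"status": "resized"}

def report_handler(event, context=None):
    # only the reporting dependency ships with this function
    return {"status": "reported"}
```

The trade-off is more functions to deploy and monitor, which is why this pairs naturally with the build-time pruning of unused libraries mentioned above.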

Finally, while there is a constant struggle between cost-effectiveness and response time for users, a calm and rational approach to micro-optimizing is what separates a stable system from one that will keep you awake at night.

Defer Initialization and Target Hot Paths

Cold start issues in serverless are one of those problems that feel theoretical until you are running something where latency actually matters. I hit this hard while building API infrastructure at a Fortune 100 healthcare technology company where provisioning response times directly affected clinical workflows. The first thing I learned is that cold start is not one problem, it is three: initialization time of the runtime, initialization time of your application code, and initialization time of your dependencies. Most people optimize for the first and ignore the second and third, which is where most of the latency actually lives.

The technique that proved most effective was a combination of provisioned concurrency for the critical path functions and aggressive lazy loading for everything else. The functions that sat on the path between a clinician request and a database response got provisioned concurrency because a 2 to 3 second cold start in that context is clinically unacceptable. Everything else got lazy initialization, meaning dependencies that were not needed for the first invocation were deferred until they were actually called. That combination cut our worst case cold start latency by roughly 70 percent without the cost of provisioning concurrency across every function in the service.

The less obvious thing that made the biggest difference was auditing what we were actually loading at initialization time. We had accumulated a set of SDK imports and configuration loaders that ran on every cold start regardless of which code path was actually being executed. Stripping that down to only what was genuinely needed for initialization, and moving everything else to lazy loading, was cheaper to implement than provisioned concurrency and had a comparable impact on the functions where cold start was a real problem. If you are hitting cold start issues the first question I would ask is not how do I keep the function warm, it is what am I doing at initialization that I do not actually need to do at initialization.

Ayush Raj Jha, Senior Software Engineer, Oracle Corporation

Decouple Inference and Keep Routers Warm

In production serverless applications handling heavy AI workloads, cold starts present a dual challenge: the infrastructure latency of initializing inference functions and the algorithmic sparsity of new data. To mitigate the infrastructure bottleneck, the most effective technique I've implemented relies on asynchronous orchestration. By decoupling heavy AI inference and retrieval layers into event-driven background tasks while keeping lightweight routing and state-management functions consistently warm, we can effectively mask initialization latency from the end user while preserving the cost and elasticity benefits of a serverless architecture.
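The decoupling described above can be sketched with a lightweight router that acknowledges the request immediately and hands the heavy work to a queue-driven worker. The in-memory `queue.Queue` below stands in for a real message broker (SQS, Pub/Sub, etc.), and the handler names are hypothetical.

```python
# Hedged sketch of asynchronous orchestration: a kept-warm router enqueues
# work and returns fast, while a background worker absorbs the cold start.
# queue.Queue stands in for a real broker such as SQS or Pub/Sub.
import queue

jobs = queue.Queue()

def router_handler(event):
    """Lightweight, kept-warm function: validate, enqueue, respond at once."""
    job = {"job_id": f"job-{event['id']}", "payload": event["payload"]}
    jobs.put(job)
    # 202-style acknowledgement; the user never waits on model init
    return {"status": "accepted", "job_id": job["job_id"]}

def inference_worker():
    """Event-driven consumer: its cold start is invisible to the end user."""
    job = jobs.get_nowait()
    # heavy model loading and inference would happen here; upper() is a
    # placeholder for the actual inference step
    return {"job_id": job["job_id"], "result": job["payload"].upper()}
```

The client then polls or receives a callback for the result keyed by `job_id`, so the only latency on the synchronous path is the warm router.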

On the algorithmic side, cold start issues in these systems typically manifest as new items or new users lacking historical engagement data. To address "item cold starts," we utilize content-based feature extraction to evaluate the intrinsic visual and textual properties of the content itself, rather than relying on a history of user interactions. For "user cold starts," we apply generalized learning models trained on platform-wide patterns to provide a strong baseline. This dual approach ensures immediate, high-quality recommendations until sufficient user-specific data is gathered for fine-tuning.
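The item cold-start approach can be illustrated with a plain cosine-similarity ranking over content features: a brand-new item with zero interactions can still be matched against the catalog by its intrinsic feature vector. This is a minimal sketch of the general idea, not the production model, and the feature vectors are invented for illustration.

```python
# Minimal sketch of content-based item cold-start handling: rank existing
# catalog items by feature similarity to a new item that has no interaction
# history. Feature vectors here are illustrative stand-ins for real visual
# or textual embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_content(new_item_features, catalog):
    """Rank catalog items by similarity -- no engagement data required."""
    scored = [(item_id, cosine(new_item_features, feats))
              for item_id, feats in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

As user-specific interaction data accumulates, these content-based scores can be blended with, and eventually superseded by, collaborative signals.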

Debanshu Das, Engineer & Technical Lead

