Optimizing Costs for Container-as-a-Service Architectures Using Zero Scaling

In container-as-a-service (CaaS) architectures, balancing product performance and cost optimization hinges on eliminating unnecessary cloud expenditure. One method to reduce such spending is to deactivate unused containers and reactivate them as needed, a process called zero scaling. This article examines scaling to zero within the CaaS framework, explains how it contrasts with autoscaling, and uncovers how it can be used to optimize your CaaS costs.

What Is Zero Scaling?

Zero scaling, or scaling to zero, involves reducing resources to zero when idle and scaling to the required number of replicas as needed. This approach allows for the optimization of cloud resources and their associated costs, ensuring that containers only run when necessary.

Why Is Zero Scaling Essential for Businesses Using CaaS?

Since CaaS providers typically operate on a pay-per-use model, zero scaling of unused containers is prudent, particularly as data and traffic demands fluctuate over time. Without scaling to zero, idle containers with no demand continue to use computing resources, leading to unnecessary costs for unused capabilities.

Here’s an analogy to illustrate the value of scaling to zero: Imagine an examination body that conducts biannual tests. Its website is active only during the weeks surrounding each test period. If this website is left running throughout the year, or even if it is scaled down to a single replica, the idle container(s) for the registration service will still accrue costs, eating into the enterprise’s cloud budget. Zero scaling solves this problem by ensuring the enterprise only operates and incurs costs for the containers twice a year, during the active test periods.

Autoscaling vs. Zero Scaling

Although scaling to zero and autoscaling to one replica both aim for cost optimization, they differ in their approach, efficacy, and implementation. While autoscaling does generate some cost savings, zero scaling does so more effectively.

The table below gives a bird’s-eye view of the similarities, differences, upsides, and downsides of each approach.

Parameters | Autoscaling | Zero Scaling
Description | Allows you to scale down/up or in/out automatically based on preconfigured schedules (date/time), user request metrics, or artificial intelligence predictions. | Lets you scale in (to zero) and out by tracking and predicting inbound requests, and increasing or removing the number of container instances accordingly.
Differences | Keeps at least one container invocation running during idle periods. Supports both vertical and horizontal scaling. | Deletes all containers when idle. Supports horizontal scaling only.
Use case | Applications with regular periods of use, but with highly varied peak and non-peak usage. | Applications with regular periods of disuse.
Benefits | No latency or performance glitches once inbound requests start to arrive. | Lower baseline utilization and cost; you do not pay for idle resources at all. You fully maximize resource utilization and save on cost.
Restrictions | Comparatively higher cost; since at least one container invocation is left running you continue to pay for compute resources that are not being used. | Some lag time during which new pods are created and booted up, usually 1–2 seconds.

Figure: The differences between zero scaling and autoscaling (vertical scale up/down and horizontal scale in/out autoscaling vs. scale to zero)

How Does Zero Scaling Work in a CaaS Architecture?

Zero scaling for CaaS first requires a decision about whether to use the manual or automatic method. A four-step process is then required to set up zero scaling.

Manual vs. Automatic

Zero scaling can be implemented either manually or automatically. The manual method involves toggling the ON/OFF icon for each container by hand as inbound requests come and go. This approach may be impractical because of the significant effort it demands.

In the automatic approach, the architecture is configured to detect new requests automatically and activate the necessary number of containers. Four key steps are involved in zero scaling: configuration, tracking, triggering, and scaling. Although most CaaS providers offer autoscaling by default, you can configure scaling to zero yourself.

Four-Step Process

Once you have decided between manual and automatic implementation, you can set up zero scaling in four steps.

Figure: How zero scaling works (the process includes configuration, tracking, triggering, and scaling)

Step 1: Configuration

In this step, you define your scaling limits, rules, and behavior. Zero scaling requires that you set minimum and maximum target replica counts. Typically, each limit has a default value, which varies across CaaS providers. For example, in the following table, the default minimum is 1 replica (ordinary autoscaling), while setting the minimum to its lowest allowed value, 0, enables scaling to zero.

Scale Property | Description | Default value | Minimum value | Maximum value
minReplicas | Minimum replica count | 1 | 0 | 30
maxReplicas | Maximum replica count | 10 | 1 | 30

When setting the scale limit to zero, keep in mind that there’s usually a default cooldown period ranging from 30 to 300 seconds. During this time, it is not possible to launch or terminate any replicas. The cooldown period prevents autoscaling groups (ASGs), a key component of zero scaling in CaaS, from adding or removing replicas before previously scheduled scaling activities on the replicas have completed.
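The cooldown behavior described above can be sketched as a small gate that refuses scaling actions until enough time has passed since the last one. This is only an illustration: the `CooldownGate` class, its method names, and the 300-second default are assumptions made for the example, not any provider's API.

```python
from dataclasses import dataclass


@dataclass
class CooldownGate:
    """Hypothetical helper: blocks scaling actions during the cooldown period."""

    cooldown_seconds: float = 300.0
    # No scaling action has happened yet, so the first one is always allowed.
    last_scale_time: float = float("-inf")

    def can_scale(self, now: float) -> bool:
        """Return True once the cooldown since the last action has elapsed."""
        return now - self.last_scale_time >= self.cooldown_seconds

    def record_scale(self, now: float) -> None:
        """Remember when replicas were last launched or terminated."""
        self.last_scale_time = now
```

A caller would check `can_scale(now)` before launching or terminating replicas and call `record_scale(now)` afterwards, with `now` expressed in seconds from any monotonic clock.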

You can set scale rules using one of these three methods:

  • HTTP requests: When defining scale rules based on HTTP requests, you configure your CaaS provider to collect HTTP metrics every 15 seconds and launch a new replica when a specific number of HTTP requests is reached. For example, if you set the maximum value of HTTP requests per replica at 100 and the minimum to 0, a new replica will be launched once HTTP requests hit 101. If no requests are incoming and the cooldown period has passed, all existing replicas will be terminated.
  • TCP connections: This method is similar to HTTP request-based scaling, but it relies on the number of concurrent TCP connections.
  • Event-driven or custom metrics: Custom or event-driven metrics are application-specific metrics collected via an API. This method is more complex because it typically requires integrating a third-party autoscaler, such as Knative or KEDA. Not all custom metrics are suitable for every application type.
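As a rough illustration of the HTTP rule above, the replica count can be derived from the number of requests and the per-replica limit, then clamped to the configured minimum and maximum. The function name and parameters below are hypothetical; the defaults (100 requests per replica, 0 to 30 replicas) are taken from the example values in this section, and the cooldown check is deliberately left out.

```python
import math


def desired_replicas(
    http_requests: int,
    per_replica: int = 100,   # maximum HTTP requests one replica should handle
    min_replicas: int = 0,    # 0 enables scaling to zero
    max_replicas: int = 30,
) -> int:
    """Return how many replicas are needed for the observed request count."""
    if http_requests <= 0:
        needed = 0
    else:
        # Round up so that no replica exceeds its request limit.
        needed = math.ceil(http_requests / per_replica)
    # Clamp the result to the configured scale limits.
    return max(min_replicas, min(needed, max_replicas))
```

With these defaults, 100 requests fit on one replica, the 101st request calls for a second replica, and zero requests yield zero replicas. Setting `min_replicas=1` gives the ordinary autoscaling behavior of always keeping one replica running.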

Step 2: Tracking

In zero scaling, software acts as an intermediary between containers and inbound requests, monitoring traffic flow and collecting metrics based on the scale rules you set in step 1.
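One simple way to picture the metric collection in this step is a sliding window that counts the requests seen in the most recent interval. The sketch below is an assumption-laden example, not a description of any particular provider's collector: `RequestWindow` is a hypothetical class, and the 15-second window mirrors the collection interval mentioned in step 1.

```python
from collections import deque


class RequestWindow:
    """Hypothetical collector: counts requests seen in the last window."""

    def __init__(self, window_seconds: float = 15.0) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # request arrival times, oldest first

    def record(self, now: float) -> None:
        """Register one inbound request arriving at time `now` (seconds)."""
        self._timestamps.append(now)

    def count(self, now: float) -> int:
        """Return the number of requests that arrived within the window."""
        cutoff = now - self.window_seconds
        # Drop arrivals that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)
```

The value returned by `count` is the kind of metric the triggering step would then compare against the configured scale limits.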

Step 3: Triggering

Also known as scale behavior, triggering uses an algorithm to compare the collected metrics with your preset scale limits. It thereby determines the appropriate number of replicas to operate at any time. The algorithm must also detect new requests in zero-scaled containers, initiating a new replica to avoid request timeouts or HTTP 500 errors.
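The scale-from-zero case can be sketched as an intermediary that holds requests while no replica is ready, flags that a replica is needed, and releases the held requests once one becomes ready. Everything here is hypothetical and simplified for illustration: the `Activator` class, its attributes, and the use of plain strings as requests are assumptions, and a real implementation would also need timeouts and concurrency control.

```python
from collections import deque


class Activator:
    """Hypothetical intermediary that buffers requests during a cold start."""

    def __init__(self) -> None:
        self.ready_replicas = 0
        self.buffer = deque()            # requests waiting for a replica
        self.scale_up_requested = False  # signal for the scaling step

    def handle(self, request: str) -> str:
        """Forward the request if a replica is ready, otherwise hold it."""
        if self.ready_replicas == 0:
            self.buffer.append(request)
            self.scale_up_requested = True
            return "buffered"
        return "forwarded"

    def on_replica_ready(self) -> list:
        """Called when a new replica is marked ready; release held requests."""
        self.ready_replicas += 1
        self.scale_up_requested = False
        released = list(self.buffer)
        self.buffer.clear()
        return released
```

Holding requests instead of rejecting them is what lets the first caller after an idle period see a slow response rather than an error.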

Step 4: Scaling

This step adjusts the number of required replicas, scaling in or out to achieve the precise quantity needed.

Zero Scaling Pros and Cons

Let’s look at the pros and cons of zero scaling.

Advantages of Zero Scaling

Beyond the clear cost-saving benefits, zero scaling offers several additional advantages:

  • Waste elimination: Cloud resources are billed by the hour or second for as long as they run, regardless of actual use. An idle instance costing $0.0105 per second could lead to $163,296 in wasted expenses over six months (180 days). Zero scaling ensures you pay only for the computing load you use, although minimal maintenance fees may still apply.
  • Budgeting accuracy and better investments: Zero scaling eliminates unnecessary expenses by terminating idle instances, thereby reducing bills. Staying within budget allows for better resource allocation, enhanced return on investment (ROI), and the opportunity to invest in innovation with the saved funds.
  • Improved security: Idle containers can become backdoors for cybersecurity attacks. With zero scaling, all containers, including potentially vulnerable ones, are deleted, thus shrinking the attack surface. Also, launching a new replica typically involves using a freshly pulled image, unless configured otherwise, which helps to eliminate existing vulnerabilities.
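The waste figure quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that six months means 180 days of continuous running; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def idle_cost(rate_per_second: float, days: int) -> float:
    """Cost of an instance left running continuously for `days` days."""
    return rate_per_second * SECONDS_PER_DAY * days


# $0.0105 per second is about $907.20 per day, or $163,296 over 180 days.
six_month_waste = idle_cost(0.0105, 180)
print(f"${six_month_waste:,.2f}")
```

The same function can be used to estimate savings for other idle periods, for example by passing the number of days per year that a seasonal service receives no traffic.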

Disadvantages of Zero Scaling

Zero scaling has one significant drawback: the cold start time, which is the time it takes for a new or previously terminated app or service to boot up from scratch. In well-optimized CaaS and Kubernetes environments, this cold start period typically lasts 1–2 seconds, during which new containers do not receive requests because they have not yet been marked “Ready”. This can be a problem because web requests are normally expected to be processed in less than 100 milliseconds.

To mitigate the cold start-time issue, consider the following strategies:

  • Prefetch and cache images locally or at the edge, rather than pulling them from a remote registry for each replica launch. This can significantly reduce start times.
  • Use zero scaling wisely, understanding when and how it’s best applied. For example, with a situation like the exam registration portal mentioned earlier, preemptively sending a few requests to the container just before peak times can help avoid cold start delays for users.
  • In development/testing environments, implement scheduled zero scaling by turning off idle containers during non-working hours, such as nights, weekends, or holidays, and reactivating them when the team returns.
  • Where zero scaling isn’t feasible, employ a keep-warm strategy. This involves maintaining one running instance to handle immediate requests, with the capacity to scale up or out in response to increased traffic, similar to autoscaling.
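The scheduled approach for development and testing environments can be sketched as a function that returns the minimum replica count for a given moment. The working hours (08:00 to 19:00, Monday to Friday) and all names below are assumptions made for the example; holidays are not handled and would need a separate calendar.

```python
from datetime import datetime, time

# Assumed working hours; adjust to the team's actual schedule.
WORK_START = time(8, 0)
WORK_END = time(19, 0)


def scheduled_min_replicas(now: datetime) -> int:
    """Return 1 during working hours on weekdays, otherwise 0."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_working_hours = WORK_START <= now.time() < WORK_END
    return 1 if (is_weekday and in_working_hours) else 0
```

Returning 1 during working hours combines scheduled zero scaling with the keep-warm strategy: the environment is warm when the team is active and scaled to zero on nights and weekends.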

Gcore for CaaS Zero Scaling

Gcore CaaS offers an intuitive scale-to-zero feature alongside the default autoscaling option, so you can decide which of your containers should be scaled to zero and which must keep a minimum of one replica.

Figure: Gcore Customer Portal UI for setting scale rules (autoscaling interface in Gcore CaaS)

Gcore CaaS delivers high availability and 24/7 technical support to ensure a seamless zero-scaling process.


Zero scaling is based on the fact that not all applications or services receive constant traffic. Allowing container resources to remain active without traffic leads to unnecessary expenditure, eating into an organization’s budget and diminishing ROI. For applications that don’t have constant activity, zero scaling can result in substantial cost savings within the CaaS framework, where customers pay for running resources regardless of actual use. However, the cold start time is a notable trade-off, which can be managed by selectively applying zero scaling to appropriate applications while scaling others to a minimum of one.

Gcore CaaS enables easy configuration of both zero scaling and autoscaling with just a few clicks. Get started for free to explore how it works.
