In container-as-a-service (CaaS) architectures, balancing product performance and cost optimization hinges on eliminating unnecessary cloud expenditure. One method to reduce such spending is to deactivate unused containers and reactivate them as needed, a process called zero scaling. This article examines scaling to zero within the CaaS framework, explains how it contrasts with autoscaling, and shows how it can be used to optimize your CaaS costs.
What Is Zero Scaling?
Zero scaling, or scaling to zero, involves reducing resources to zero when idle and scaling to the required number of replicas as needed. This approach allows for the optimization of cloud resources and their associated costs, ensuring that containers only run when necessary.
Why Is Zero Scaling Essential for Businesses Using CaaS?
Since CaaS providers typically operate on a pay-per-use model, zero scaling of unused containers is prudent, particularly as data and traffic demands fluctuate over time. Without scaling to zero, idle containers with no demand continue to use computing resources, leading to unnecessary costs for unused capabilities.
Here’s an analogy to illustrate the value of scaling to zero: Imagine an examination body that conducts biannual tests. Its website is active only during the weeks surrounding each test period. If this website is left running throughout the year, or even if it is scaled down to a single replica, the idle container(s) for the registration service will still accrue costs, eating into the enterprise’s cloud budget. Zero scaling solves this problem by ensuring the enterprise only operates and incurs costs for the containers twice a year, during the active test periods.
Autoscaling vs. Zero Scaling
Although scaling to zero and autoscaling to one replica both aim for cost optimization, they differ in their approach, efficacy, and implementation. While autoscaling does generate some cost savings, zero scaling does so more effectively.
The table below offers a bird’s eye view of the similarities, differences, upsides, and downsides of each approach.

| Aspect | Autoscaling to one replica | Zero scaling |
|---|---|---|
| Idle cost | One replica keeps running and accruing charges | No replicas run, so no compute charges |
| Cold start | None; the warm replica responds immediately | First request waits for a new replica to boot |
| Best fit | Latency-sensitive services with steady traffic | Intermittent workloads with long idle periods |
How Does Zero Scaling Work in a CaaS Architecture?
Implementing zero scaling in CaaS starts with a decision between the manual and the automatic method; a four-step process then sets it up.
Manual vs. Automatic
Zero scaling can be implemented either manually or automatically. The manual scaling method involves toggling the ON/OFF icon for each container manually, based on inbound requests. This approach may be impractical due to the significant effort it demands.
In the automatic approach, the architecture is configured to detect new requests automatically and activate the necessary number of containers. Four key steps are involved in zero scaling: configuration, monitoring, triggering, and scaling. Although most CaaS providers offer autoscaling by default, you can configure scaling to zero yourself.
Four-Step Process
Once you have chosen between manual and automatic implementation, you can set up zero scaling using the following four-step process.
Step 1: Configuration
In this step, you define your scaling limits, rules, and behavior. Zero scaling requires that you set minimum and maximum target replica counts. Typically, there is a default number, which varies across different CaaS providers. For example, in the following table, the default is autoscaling to 1 replica, while the newly defined minimum value is scaling to zero.
| Scale property | Description | Default value | Minimum value | Maximum value |
|---|---|---|---|---|
| minReplicas | Minimum replica count | 1 | 0 | 30 |
| maxReplicas | Maximum replica count | 10 | 1 | 30 |
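As a hypothetical illustration, the limits in the table above could be represented and validated like this; the key names (minReplicas, maxReplicas) follow the table, but exact configuration keys vary by provider:

```python
# Hypothetical scale limits mirroring the table above.
# Key names vary across CaaS providers.
scale_config = {
    "minReplicas": 0,   # 0 enables scaling to zero when idle
    "maxReplicas": 10,  # upper bound reached during traffic peaks
}

def validate_limits(config, hard_min=0, hard_max=30):
    """Reject replica bounds outside the provider's allowed 0-30 range."""
    lo, hi = config["minReplicas"], config["maxReplicas"]
    if not (hard_min <= lo <= hi <= hard_max):
        raise ValueError(f"invalid replica bounds: {lo}..{hi}")
    return config

validate_limits(scale_config)
```

Setting minReplicas to 0 is the single change that turns ordinary autoscaling into zero scaling.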
When setting the scale limit to zero, keep in mind that there’s usually a default cooldown period ranging from 30 to 300 seconds. During this time, it is not possible to launch or terminate any replicas. This cooldown period prevents autoscaling groups (ASGs)—a key component of zero scaling in CaaS—from adding or removing replicas before completing the previously scheduled activities on the replicas.
You can set scale rules using one of these three methods:
- HTTP requests: When defining scale rules based on HTTP requests, you configure your CaaS provider to collect HTTP metrics every 15 seconds and launch a new replica when a specific number of HTTP requests is reached. For example, if you set the maximum value of HTTP requests per replica at 100 and the minimum to 0, a new replica will be launched once HTTP requests hit 101. If no requests are incoming and the cooldown period has passed, all existing replicas will be terminated.
- TCP connections: This method is similar to HTTP request-based scaling, but it relies on the number of concurrent TCP connections.
- Event-driven or custom metrics: Custom or event-driven metrics are application-specific metrics collected via an API. This method usually requires integrating a third-party autoscaler, such as KEDA or Knative, which makes it more complex to set up. Not all custom metrics are suitable for every application type.
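To make the HTTP-request rule above concrete, here is a minimal sketch of the replica calculation it implies, assuming the example limits of 100 requests per replica; real providers apply equivalent logic internally:

```python
import math

def replicas_for_http_load(requests_in_window, max_per_replica=100):
    """Replica count implied by an HTTP-request scale rule.

    0 requests  -> 0 replicas (scale to zero once the cooldown passes)
    100 requests -> 1 replica
    101 requests -> 2 replicas
    """
    if requests_in_window <= 0:
        return 0
    return math.ceil(requests_in_window / max_per_replica)
```

The ceiling division is what launches a new replica the moment requests exceed a multiple of the per-replica maximum.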
Step 2: Monitoring
In zero scaling, software acts as an intermediary between containers and inbound requests, monitoring traffic flow and collecting metrics based on the scale rules you set in step 1.
Step 3: Triggering
Also known as scale behavior, triggering uses an algorithm that compares the collected metrics with your preset scale limits and determines the appropriate number of replicas to run at any time. The algorithm must also detect new requests arriving at zero-scaled containers and initiate a new replica promptly, to avoid request timeouts or HTTP 500 errors.
Step 4: Scaling
This step adjusts the number of required replicas, scaling in or out to achieve the precise quantity needed.
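Steps 2 through 4 can be sketched as a single reconciliation pass. The function below is a toy model, not any provider's actual algorithm, showing how the collected metric, the scale limits, and the cooldown period interact:

```python
import math

def reconcile(metric, current_replicas, last_change_at, now,
              target_per_replica=100, min_replicas=0,
              max_replicas=10, cooldown_s=60):
    """One pass of a toy scale controller.

    metric -- e.g. HTTP requests observed in the last window (step 2).
    Returns the updated (replicas, last_change_at) pair.
    """
    # Step 3 (triggering): compare the metric against the preset limits.
    desired = 0 if metric <= 0 else math.ceil(metric / target_per_replica)
    desired = max(min_replicas, min(desired, max_replicas))

    # Step 4 (scaling): apply the change, but only after the cooldown.
    if desired != current_replicas and now - last_change_at >= cooldown_s:
        return desired, now
    return current_replicas, last_change_at
```

Note how the cooldown check prevents the controller from launching or terminating replicas before the previously scheduled change has settled, as described in step 1.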
Zero Scaling Pros and Cons
Let’s look at the pros and cons of zero scaling.
Advantages of Zero Scaling
Beyond the clear cost-saving benefits, zero scaling offers several additional advantages:
- Waste elimination: Cloud resources are billed by the hour or second, regardless of actual use. An idle instance costing $0.0105 per second would waste $163,296 over six months (180 days). Zero scaling ensures you pay only for the computing load you actually use, though minimal maintenance fees may still apply.
- Budgeting accuracy and better investments: Zero scaling eliminates unnecessary expenses by terminating idle instances, thereby reducing bills. Staying within budget allows for better resource allocation, enhanced return on investment (ROI), and the opportunity to invest in innovation with the saved funds.
- Improved security: Idle containers can become backdoors for cyberattacks. With zero scaling, idle containers, including potentially vulnerable ones, are deleted, shrinking the attack surface. Also, launching a new replica typically involves pulling a fresh image, unless configured otherwise, which helps eliminate existing vulnerabilities.
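The waste figure in the first point can be reproduced with quick arithmetic, taking six months as 180 billable days:

```python
rate_per_second = 0.0105        # idle instance price from the example
seconds_per_day = 24 * 60 * 60  # 86,400 billable seconds every day
waste = rate_per_second * seconds_per_day * 180
print(f"${waste:,.0f}")         # prints $163,296
```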
Disadvantages of Zero Scaling
Zero scaling has one significant drawback: the cold start time, which is the time it takes for a new or previously terminated app or service to boot up from scratch. In well-optimized CaaS and Kubernetes environments, this cold start period lasts 1–2 seconds because new containers do not receive requests until they are marked “Ready”. This can be a problem because web requests normally need to be processed in less than 100 milliseconds.
To mitigate the cold start-time issue, consider the following strategies:
- Prefetch and cache images locally or at the edge, rather than pulling them from a remote registry for each replica launch. This can significantly reduce start times.
- Use zero scaling wisely, understanding when and how it’s best applied. For example, in a situation like the exam registration portal mentioned earlier, preemptively sending a few requests to the container just before peak times can help users avoid cold start delays.
- In development/testing environments, implement scheduled zero scaling by turning off idle containers during non-working hours, such as nights, weekends, or holidays, and reactivating them when the team returns.
- Where zero scaling isn’t feasible, employ a keep-warm strategy. This involves maintaining one running instance to handle immediate requests, with the capacity to scale up or out in response to increased traffic, similar to autoscaling.
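The scheduled approach for dev/test environments can be sketched as a simple time-window check. The working hours below are illustrative assumptions; the returned value would feed into your provider's API as the minimum replica count:

```python
from datetime import datetime

def min_replicas_for(now: datetime, workdays=range(0, 5),
                     start_hour=8, end_hour=19):
    """Scheduled zero scaling: 1 replica during assumed working hours
    (Mon-Fri, 08:00-19:00), 0 replicas on nights and weekends."""
    working = now.weekday() in workdays and start_hour <= now.hour < end_hour
    return 1 if working else 0
```

A cron job or scheduler would call this periodically and push the result to the container service, so idle containers are switched off outside working hours without manual toggling.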
Gcore for CaaS Zero Scaling
Gcore CaaS offers an intuitive scale-to-zero feature alongside the default autoscaling option, letting you decide which of your containers scale to zero and which keep a minimum of one replica.
Gcore CaaS delivers high availability and 24/7 technical support to ensure a seamless zero-scaling process.
Conclusion
Zero scaling rests on the fact that not all applications or services receive constant traffic. Leaving container resources active without traffic leads to unnecessary expenditure, hurting an organization’s bottom line and diminishing ROI. For applications without constant activity, zero scaling can deliver substantial cost savings in the CaaS framework, where customers pay for running containers whether or not they serve traffic. The cold start time is a notable trade-off, but it can be managed effectively by applying zero scaling selectively to suitable applications while scaling others to a minimum of one.
Gcore CaaS enables easy configuration of both zero scaling and autoscaling with just a few clicks. Get started for free to explore how it works.