Speed is everything in business. To stay competitive in a fast-moving tech space, businesses need to roll out new software versions as soon as they are ready for release, without disrupting active users. Containers have become the preferred deployment unit for software, and many enterprises have migrated their workloads to Kubernetes. It's safe to say that containers and Kubernetes have become the gold standard of modern infrastructure.

This article explains how to orchestrate a rolling update on a Kubernetes cluster and avoid downtime during Pod deployments. I will discuss how to build on rolling updates to achieve zero-downtime deployments without breaking or losing a single in-flight request.

The Setup

By default, Kubernetes Deployments use a strategy called rolling update. This strategy tries to avoid downtime by coordinating the termination of existing pods with the creation of new ones during the deployment window.

It keeps at least some instances up and running at any point in time while performing the update. The API exposes some core primitives that let you specify exactly how Kubernetes juggles multiple replicas during the update.

The Challenge

Depending on the workload and the available compute resources, we might want to configure how many instances we over- or under-provision at any time. Given three desired replicas, there are several update strategies at your disposal.

This code snippet configures one such strategy:

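apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # no extra pods above the desired count
      maxUnavailable: 1  # at most one pod down at a time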

maxSurge declares how many additional pods can be created above the desired number of replicas, while maxUnavailable declares how many pods can be unavailable relative to the desired number of replicas during the update.

There are three main update strategies that you can implement using the maxSurge and maxUnavailable primitives:
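
Both fields accept either an absolute number or a percentage of the desired replica count. The fragment below is a minimal sketch of the percentage form; the values are chosen for illustration (they also happen to be the defaults Kubernetes applies when the fields are omitted). When resolving percentages, Kubernetes rounds maxSurge up and maxUnavailable down.

```yaml
# Fragment of a Deployment manifest; the percentage values are illustrative.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # 25% of 3 = 0.75, rounded up to 1 extra pod
      maxUnavailable: 25%  # 25% of 3 = 0.75, rounded down to 0 unavailable pods
```

With three replicas, these percentages resolve to one surge pod and zero unavailable pods.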

Deploy by adding a pod, then remove an old one

Allow only one additional pod (maxSurge = 1) above the desired number of 3, and never let the number of available pods drop below 3 (maxUnavailable = 0).
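
As a sketch, the strategy block for this add-then-remove approach could look like the fragment below. It belongs under the spec of a Deployment; the rest of the manifest is omitted.

```yaml
# Fragment of a Deployment manifest: add one new pod, then remove an old one.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # up to 4 pods may exist during the update
      maxUnavailable: 0  # never fewer than 3 available pods
```

This approach needs spare capacity in the cluster for the one extra pod.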

Deploy by removing a pod, then adding a new one

Allow no additional pods (maxSurge = 0) and allow a single pod at a time to be unavailable (maxUnavailable = 1).

Deploy by updating pods as fast as possible

Finally, allow one additional pod (maxSurge = 1) as well as one unavailable pod (maxUnavailable = 1). This is the fastest of the three strategies because Kubernetes can start a new pod and stop an old one at the same time.
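
A minimal sketch of this fastest variant, again as a fragment that belongs under the spec of a Deployment:

```yaml
# Fragment of a Deployment manifest: surge and remove in parallel.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # up to 4 pods may exist during the update
      maxUnavailable: 1  # as few as 2 pods may be available
```

The trade-off is reduced serving capacity while the update is in progress.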

There is yet another update strategy called Recreate, which terminates all the old pods and only then spins up the updated ones. This saves time and is ideal for a development environment, but it cannot deliver a zero-downtime deployment, because no pod is serving requests between the two phases.
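
For comparison, selecting the Recreate strategy could be sketched as follows. Note that maxSurge and maxUnavailable do not apply here, since they belong to the rolling update strategy only.

```yaml
# Fragment of a Deployment manifest: stop all old pods, then start new ones.
spec:
  replicas: 3
  strategy:
    type: Recreate   # no rollingUpdate block is allowed with this type
```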

The Event

Now let’s say that I want to deploy a new version of my application. I’ve built a new version of my Docker container image and pushed it to Docker Hub, and now I want to update my Deployment to use that new image. The switch from the old to the new version will not be perfectly smooth, especially for services handling a lot of requests per second. Some requests will fail during the switch, and under heavy load that can mean a large number of failed requests.

The Root Cause

At a certain point during this update, some requests will be routed to pods that are being taken out of service. This happens because the Ingress controller and kube-proxy are not told in advance that a rolling update is in progress; they only learn about it after the pod has already begun terminating. A pod can therefore keep receiving traffic after it has started shutting down, and those requests fail. It takes a second or more for the endpoints controller to see the change and remove the pod IP from the list of valid endpoints, and for the kube-proxy daemon to then remove the pod IP from its iptables rules.

Also, the time it takes for a newly created pod to be able to handle the workload is not negligible. Unless told otherwise, Kubernetes will start sending traffic to the container as soon as it is running, even though the application may not yet be able to serve requests.

The Fix

To counter this behavior, you need to implement checks that tell Kubernetes when your application is ready to serve traffic and when it should be left alone. A readiness probe checks whether your container is ready, so that it has some time to warm up before getting hit with requests from the Service. While this check fails, the pod is removed from the Service's endpoints and receives no new traffic.

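The readiness probe, together with a preStop hook for graceful shutdown, can be added to the container spec with the following snippet:

containers:
- name: nginx
  image: nginx:latest
  readinessProbe:
    exec:
      command:          # the container is ready when this exits with code 0
      - cat
      - /tmp/test_readiness.txt
    successThreshold: 1
    failureThreshold: 2
    periodSeconds: 5
    initialDelaySeconds: 5
  lifecycle:
    preStop:            # runs before the container receives SIGTERM
      exec:
        command: ["sh", "-c", "sleep 10 && /usr/sbin/nginx -s quit"]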

Here Kubernetes runs a command inside the container. The command prints the contents of /tmp/test_readiness.txt to stdout. If it exits with code 0, the container is marked as ready. initialDelaySeconds is set because the application needs a little time to get ready before the first check.

The sleep 10 in the preStop hook gives the pod termination event time to propagate, so that the pod is taken out of the endpoints before nginx is told to quit.
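
One thing to keep in mind is that the time spent in the preStop hook counts against the pod's termination grace period, which defaults to 30 seconds. If the hook plus the application's own shutdown takes longer than that, the container is killed forcibly. The fragment below is a sketch of raising the limit; the value 45 is an assumption chosen for illustration, not a recommendation.

```yaml
# Fragment of a Deployment manifest; 45 seconds is an illustrative value.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 45  # must cover the 10s sleep plus shutdown
      containers:
      - name: nginx
        image: nginx:latest
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 10 && /usr/sbin/nginx -s quit"]
```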


In summary, when taking down running pods, you ideally want to account for the time needed to clean up for a graceful shutdown. When provisioning new pods, you want to account for the time your application needs to start up, using a readiness check. You also want to avoid shutting down all the pods at the same time.
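
Putting these three points together, a complete manifest might look like the sketch below. The name and label nginx-demo are hypothetical, and the probe assumes that something in the container creates /tmp/test_readiness.txt once the application is ready; with a stock nginx image that file does not exist, so the pods would never be reported as ready.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo               # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-demo            # hypothetical label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # add a new pod first
      maxUnavailable: 0          # never drop below 3 available pods
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      terminationGracePeriodSeconds: 30   # Kubernetes default, shown explicitly
      containers:
      - name: nginx
        image: nginx:latest
        readinessProbe:
          exec:
            # Assumes the application creates this file when it is ready.
            command: ["cat", "/tmp/test_readiness.txt"]
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 2
        lifecycle:
          preStop:
            exec:
              # Wait for endpoint removal to propagate, then stop gracefully.
              command: ["sh", "-c", "sleep 10 && /usr/sbin/nginx -s quit"]
```

Applied with kubectl apply -f, a change to the image field would then roll out one pod at a time while the old pods drain.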

You can simplify the process of rolling updates in your cluster using Kalc to eliminate any guesswork. At Kalc, we believe that every move in your Kubernetes setup should be a calculated move. This is why we set out to roll out the Kubernetes calculator. Use it to estimate the cost of your next deployment. It will provide intelligent reports and alerts on potential issues with your intended changes.