At Kalc we have a lot of experience, gained either from customers or from our time at Fortune 500 companies, and we are concerned about the mistakes you might make with Kubernetes. In part, this is motivated by the fact that we have all these new schedulers and this immutable-style infrastructure that we are all striving towards. We all want to be cloud native, yet we still find ourselves making critical mistakes.
From security to availability, to assumptions based on the past, we will propose solutions to these issues rooted in lessons learned from small-scale and large-scale deployments. The goal is to share these experiences and help all of us avoid as much pain as possible.
At its core, Kubernetes schedules your workloads to the best available place. You give it a container and a bunch of computers, and it decides the best place to run it. If the workload fails there, Kubernetes reschedules it onto another computer. Kubernetes is not just about running containers, though; you can also schedule VMs with it.
So, if we have this shiny, intent-based platform, why isn't it working in production? What are the top six gotchas that application development and DevOps teams run into as they move into production? You can schedule anything with Kubernetes, but how do you get the HA that you truly desire? It is important to consider these questions during the design process.
- A Reliable Cluster with a SPOF
This is all about putting all your eggs in one basket, in other words creating a single point of failure (SPOF). etcd is a good example: what matters is how many replicas of etcd are running at the same time.
- What is your Backup Strategy?
You need a backup strategy because at any point you may have to recover both data and configuration. This is something you have to plan for from the beginning.
Some of the most popular tools for this purpose are:
- Allowing * in your Ingress
If you put a * in your Ingress, Kubernetes will forward all the traffic entering the cluster to your container, so that a single container receives all of the cluster's traffic.
- host: *
- path: /testpath
- Large Image
If you deploy a hundred images of 5GB each and then run kubectl get pods, the response is going to take roughly 5 seconds, but if the images are 50MB each, the response comes back in milliseconds. Now if you have thousands of developers, each one of them deploying a 5GB image, it is going to slow the cluster down.
- Externally Hosted Images
Deploying to production from images hosted on Docker Hub is a terrible idea because, unless you pay for it, Docker Hub is a free service that you do not control. Imagine you are deploying to prod on the free tier and then suddenly:
- Docker Hub rate limits you
- Image gets deleted
- Image gets compromised
- Privileged Containers
Privileged containers will not take your cluster down by themselves, but they will compromise your security. When you run a container in privileged mode, the container is no longer confined and can reach into other containers' namespaces, so one container can read other containers' processes and data. Not only is this a security problem; a privileged container can also, by mistake, trigger a process that takes down the cluster.
- Single Point of Failure - Use a multi-master cluster
It’s incredibly important that you deploy your resources in such a way that if an availability zone goes down, you will still be able to run your app in another availability zone. HA is all about setting up Kubernetes and its supporting components in such a way that there is no single point of failure. It's very easy for a single-master cluster to fail. A multi-master cluster, on the other hand, uses multiple master nodes, and each master has access to all the worker nodes.
For a single-master cluster, the mission-critical core components like the Kubernetes API server and the controller manager live only on the single master node, and in the event that it fails you cannot create more services, pods, etc. However, when you use a Kubernetes HA environment, these key components are replicated on multiple masters, and should one master fail, the other masters keep the cluster up and running.
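How many masters are enough depends largely on etcd, which needs a majority of its members (a quorum) to accept writes. The sketch below works through that arithmetic. It assumes a stacked topology in which every master also runs one etcd member, which is our assumption for illustration and not the only way to lay out a cluster; the function names are made up for this example.

```python
def etcd_quorum(members: int) -> int:
    """Smallest number of members that must agree before a write is committed."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    # A majority of the members: more than half.
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while the remaining ones still form a quorum."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(
            f"{n} member(s): quorum={etcd_quorum(n)}, "
            f"tolerates {failures_tolerated(n)} failure(s)"
        )
```

Note that two members tolerate no more failures than one, and four no more than three, which is why odd member counts such as three or five are the usual choice.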
- Backup Strategy
When you are using Kubernetes, it is good practice to develop your backup strategy. If your cluster crashes, you will need a backup to go back to the previous stable state of the Kubernetes cluster.
A backup will help you to:
1.1. Recover from disasters: like someone accidentally deleting the namespace where your deployments reside.
1.2. Replicate the environment: for example, replicating your production environment to staging environment before a major update.
1.3. Migrate your Kubernetes cluster from one environment to another.
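One building block of such a strategy is a regular snapshot of etcd, where Kubernetes keeps its cluster state. The sketch below only assembles a timestamped etcdctl snapshot save command; it does not run it. The certificate paths are the kubeadm defaults and the endpoint is the local etcd member, both of which are assumptions you would need to adapt, and snapshot_command is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Assumed kubeadm default certificate locations; adjust for your cluster.
ETCD_CA = "/etc/kubernetes/pki/etcd/ca.crt"
ETCD_CERT = "/etc/kubernetes/pki/etcd/server.crt"
ETCD_KEY = "/etc/kubernetes/pki/etcd/server.key"


def snapshot_command(
    backup_dir: str,
    now: datetime,
    endpoint: str = "https://127.0.0.1:2379",  # assumed local etcd member
) -> list[str]:
    """Build (but do not execute) an etcdctl command that saves a snapshot."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = f"{backup_dir.rstrip('/')}/etcd-{stamp}.db"
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={ETCD_CA}",
        f"--cert={ETCD_CERT}",
        f"--key={ETCD_KEY}",
        "snapshot",
        "save",
        target,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before wiring it into a scheduler.
    print(" ".join(snapshot_command("/var/backups", datetime.now(timezone.utc))))
```

An etcd snapshot captures the cluster's API objects only. Data stored in persistent volumes needs its own backup.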
- Avoid using * in your Ingress controller
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. An Ingress resource using * will route all the traffic to a single container, and this can easily take down a cluster. You should avoid using * in your Ingress resources.
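A check for this can be automated before a manifest ever reaches the cluster. The sketch below takes an Ingress manifest that has already been parsed into a Python dict and reports rules that are not tied to a specific hostname. The function name and the example manifests are hypothetical, and treating a bare * the same as a missing host is our simplification.

```python
from __future__ import annotations


def catch_all_rules(ingress: dict) -> list[str]:
    """Describe every Ingress rule that is not restricted to one hostname."""
    findings = []
    rules = ingress.get("spec", {}).get("rules", [])
    for index, rule in enumerate(rules):
        host = rule.get("host", "")
        if host in ("", "*"):
            # A rule without a host applies to all inbound HTTP traffic.
            findings.append(f"rule {index}: no specific host, matches all traffic")
        elif host.startswith("*."):
            # A wildcard host matches every name one level below the suffix.
            findings.append(f"rule {index}: wildcard host {host!r}")
    return findings


# Hypothetical manifests used only to exercise the check.
RISKY = {"kind": "Ingress", "spec": {"rules": [{"host": "*"}, {"host": "*.example.com"}]}}
SAFE = {"kind": "Ingress", "spec": {"rules": [{"host": "shop.example.com"}]}}

if __name__ == "__main__":
    for finding in catch_all_rules(RISKY):
        print(finding)
```

Running a check like this in CI keeps a catch-all rule from slipping into production unnoticed.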
- Large Images - Convert large images to smaller sizes
For example, convert a 5GB Docker image into a 46MB image. Large image size = “large attack surface”. Large image size = “slower deployment”.
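To put a rough number on "slower deployment", the sketch below estimates how long a node needs just to transfer an image. The 100 Mbit/s link speed is a hypothetical figure, 1GB is counted as 1000MB, and the estimate ignores layer compression and layers that are already cached on the node.

```python
def pull_seconds(image_mb: float, bandwidth_mbit_s: float) -> float:
    """Rough transfer time: megabytes times 8 bits, divided by the link speed."""
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_mb * 8 / bandwidth_mbit_s


if __name__ == "__main__":
    link = 100.0  # assumed link speed in Mbit/s
    for label, size_mb in (("5GB image", 5000.0), ("46MB image", 46.0)):
        print(f"{label}: about {pull_seconds(size_mb, link):.1f} s on a {link:.0f} Mbit/s link")
```

Under these assumptions the large image costs minutes per node on a cold pull, and that cost is paid again on every node that has not cached it.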
- Externally Hosted Images
Use a managed container registry that makes it easier for developers to store, manage, and deploy images. This eliminates the need to operate your own container repositories or worry about scaling the underlying infrastructure, allowing you to reliably deploy containers for your applications. A popular option is Amazon Elastic Container Registry (ECR), which is integrated with Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS).
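Once you have a registry of your own, it helps to verify that nothing in production still pulls from somewhere else. The sketch below extracts the registry host from an image reference using Docker's naming convention (no host means Docker Hub) and lists images outside an allowlist. The function names, the account ID, and the region in the example are hypothetical.

```python
from __future__ import annotations

DOCKER_HUB = "docker.io"


def registry_of(image: str) -> str:
    """Return the registry host of an image reference."""
    first, slash, _rest = image.partition("/")
    # The first path component is a registry only if it looks like a host:
    # it contains a dot or a port, or it is "localhost".
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    return DOCKER_HUB


def untrusted_images(images: list[str], trusted: set[str]) -> list[str]:
    """List the images whose registry is not on the allowlist."""
    return [image for image in images if registry_of(image) not in trusted]


if __name__ == "__main__":
    # Hypothetical private registry and workload images.
    private = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
    images = ["nginx:1.25", f"{private}/shop/api:2.3.1", "bitnami/redis:7"]
    for image in untrusted_images(images, {private}):
        print(f"pulled from outside the allowlist: {image}")
```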
- Privileged Containers
Running in privileged mode gives the container all the capabilities of the host machine and it also lifts all the limitations enforced by the device cgroup controller. An attacker can use this as a starting point to exploit your whole system. To have a more secure container:
1.1. Run as a non-root user, using the Dockerfile’s USER instruction.
1.2. Drop as many Linux capabilities as you can (ideally all of them) when you run your container.
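These rules can also be checked against the container's securityContext in the pod spec. The sketch below inspects one container entry, already parsed into a Python dict, for the privileged, runAsNonRoot, and capabilities.drop fields. The function name and the sample containers are hypothetical, and for simplicity the check ignores settings inherited from the pod-level securityContext.

```python
from __future__ import annotations


def container_findings(container: dict) -> list[str]:
    """Report risky security settings for one container entry of a pod spec."""
    name = container.get("name", "<unnamed>")
    ctx = container.get("securityContext", {})
    findings = []
    if ctx.get("privileged", False):
        findings.append(f"{name}: runs in privileged mode")
    if not ctx.get("runAsNonRoot", False):
        findings.append(f"{name}: runAsNonRoot is not enabled")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        findings.append(f"{name}: does not drop ALL capabilities")
    return findings


# Hypothetical containers used only to exercise the check.
RISKY = {"name": "app", "securityContext": {"privileged": True}}
HARDENED = {
    "name": "app",
    "securityContext": {
        "privileged": False,
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
    },
}

if __name__ == "__main__":
    for finding in container_findings(RISKY):
        print(finding)
```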
The greatest challenge we are facing right now is the mismatch between expectations and reality when running Kubernetes in production. Kalc is addressing this problem with a revolutionary new product called kubectl-val that scans your cluster configuration and evaluates it against a large database of known vulnerabilities.
Many thanks to Jérôme Petazzoni for his feedback:
Single Point of Failure
“Use a multi-master cluster. I rarely recommend people to run a multi-master cluster themselves, because that's a lot of work and requires a lot of knowledge of k8s internals. Instead, I remind them that managed Kubernetes (AKS, EKS, GKE...) provides HA without having to deal with multi-master (the cloud provider takes care of it). Another option for folks running on-prem is to use a single master, but with a technology like VMotion, making the VM itself highly available.”
Large Images
“Convert large images to smaller sizes. That's good advice (and coincidentally, I'm about to publish a series of articles explaining how to reduce image size, so of course I will agree that this is relevant :-)), but I'm not sure that "kubectl get pods" would get slower when images are bigger. I did a quick test on a 4-node cluster, and "kubectl get pods" took exactly the same amount of time (0.07s) before and after adding a pod using a 1 GB image on each node. However, the time it takes to run the pods is longer, because of the image pull, of course.”