Docker makes building containers remarkably easy. The downside of this simplicity is that it's easy to build huge containers full of things you don't need, including security holes. By using a smaller, specialized base image such as Alpine, you can significantly reduce the attack surface.
Recently there have been many Kubernetes user bug reports about Alpine and DNS lookups from Kubernetes Pods. This issue appears to be inconsistent and difficult to reproduce, but involves some DNS lookups being really slow.
In this post, I will explain the root causes for such delays, discuss some mitigations and present the fixes.
In Kubernetes, the most common way for a Pod to access a DNS server (kube-dns / CoreDNS) is via the Service abstraction.
You begin by scheduling a DNS Pod and Service on the Kubernetes cluster, and configuring the kubelets to tell individual containers to use the DNS Service IP to resolve DNS names.
The DNS server watches the Kubernetes API for new Services and creates a set of DNS records for each Service.
Similarly, it also creates a set of DNS records for each Pod in the cluster.
"Recently, we had a recurring situation where DNS resolution was failing for Kubernetes-external services that took a full day to diagnose and remedy." Adam Margherio, Software Engineer @ Toyota Connected
Services importing and using Toyota's Redis-dependent library reported service crashes on start-up. The applications were failing to resolve an Azure Redis hostname during bean initialization.
Azure Cache for Redis is a fully managed, open source-compatible in-memory data store. What the engineers at Toyota Connected were facing was a DNS problem where the Pod container could not resolve the external DNS name of the Azure Redis service.
There is an open issue in Kubernetes: #64924.
The Root Cause
We've experienced similar issues with Alpine and DNS lookups in Kubernetes. In Alpine 3.4 and later, a name resolution issue was introduced with the switch to musl libc, causing problems with certain deployments of Kubernetes. One common issue with musl libc is its parallel querying of name servers.
The problem arises when your first name server has a different DNS view (such as service discovery through DNS). For example, when you start your Docker daemon with --dns=18.104.22.168 --dns=10.10.10.10, where the first address is a local DNS server that resolves names for service discovery and 10.10.10.10 is for external DNS resolution, you cannot guarantee that the first server will always be queried first. This leads to sporadic failures.
search ns.svc.cluster.local svc.cluster.local cluster.local
When making a query for the Redis service, e.g. azure-redis-service-url.com, the DNS client should query the name and eventually return the correctly resolved IP address. A DNS lookup request from a Pod is sent to 10.10.10.10, which is the ClusterIP (a virtual IP) of the kube-dns Service. When querying multiple search domains, the Alpine DNS client stops further lookups as soon as one search domain returns something unexpected.
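To make the search-domain step concrete, here is a minimal sketch of how a resolver might expand a name against the search list shown above. It assumes the conventional resolv.conf ordering rules and ndots:5 (the usual value for Pods using cluster DNS, which this post does not state); candidate_names is a hypothetical helper, not part of any libc, and musl's real behaviour differs in details.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a resolver would try, in order."""
    # A trailing dot marks the name as already fully qualified.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    # Names with fewer dots than ndots go through the search list first.
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded


# Search list taken from the resolver configuration line above.
search = ["ns.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("azure-redis-service-url.com", search):
    print(candidate)
```

Under these assumptions the external hostname is only tried as-is after three cluster-internal candidates, so a client that gives up on an unexpected answer from an earlier candidate never reaches the name that would have resolved.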
This problem occurs when two UDP packets are sent via the same socket at the same time from different threads, which is exactly what happens in this case: musl libc performs the A and AAAA DNS lookups in parallel. One of the UDP packets might get dropped by the kernel due to the race condition. The client then re-sends it after a timeout, usually 5 seconds, which is not ideal. The problem is not specific to Kubernetes: any multi-threaded Linux process sending UDP packets in parallel is prone to it.
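Because the issue is intermittent, one way to confirm it is to time repeated lookups from inside a Pod and look for durations clustered around the retry timeout. The following is only a sketch: time_lookup and slow_lookups are hypothetical helpers, and the 4.5 second threshold is an assumption chosen to sit just below the 5 second retry mentioned above.

```python
import socket
import time


def time_lookup(host, resolve=socket.getaddrinfo):
    """Return how long one lookup of `host` takes, in seconds."""
    start = time.monotonic()
    try:
        resolve(host, None)
    except socket.gaierror:
        pass  # a failed lookup still tells us how long the client waited
    return time.monotonic() - start


def slow_lookups(host, attempts=20, threshold=4.5, resolve=socket.getaddrinfo):
    """Time `attempts` lookups and return the durations at or above `threshold`."""
    durations = [time_lookup(host, resolve) for _ in range(attempts)]
    return [d for d in durations if d >= threshold]
```

Calling slow_lookups("azure-redis-service-url.com") from an affected container would be expected to return a few values near 5 seconds, while a healthy one should return an empty list. The resolve parameter exists only so the resolver can be swapped out when testing.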
There are many workarounds for this issue. You could disable parallel lookups, disable IPv6 to avoid AAAA lookups, use TCP for lookups, or set the real IP address of a DNS server in a Pod's resolver configuration file instead of the Service IP. Another workaround is to run a DNS server instance on each node in the cluster and make each Pod query the DNS server running on its local node.
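As an illustration of the "real IP address" workaround, the sketch below builds the Pod spec fragment that replaces the default resolver settings, using the Kubernetes dnsPolicy and dnsConfig fields. The helper name dns_override and the address 192.0.2.53 are assumptions for the example (the address is a documentation-range placeholder); substitute the address of a DNS server that is actually reachable from your Pods.

```python
import json


def dns_override(nameserver, searches):
    """Build a Pod spec fragment that points the resolver at `nameserver`."""
    return {
        "dnsPolicy": "None",  # ignore the cluster default resolver settings
        "dnsConfig": {
            "nameservers": [nameserver],
            "searches": list(searches),
        },
    }


# 192.0.2.53 is a documentation-range placeholder, not a real DNS server.
fragment = dns_override(
    "192.0.2.53",
    ["ns.svc.cluster.local", "svc.cluster.local", "cluster.local"],
)
print(json.dumps({"spec": fragment}, indent=2))
```

The printed JSON can be merged into a Pod or Deployment template. With dnsPolicy set to None, the kubelet writes only the listed name servers and search domains into the container's resolver configuration, so lookups no longer go through the virtual Service IP.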
At Kalc, we started using Alpine when the Docker images of other distros were rather large, but you'd be surprised what the sizes are now: ubuntu:19.04, for example, is 30MB (instead of 300MB+). Admittedly, this doesn't beat alpine:3.8, which is 2MB. My recommendation is not to use Alpine for packaging containers (for now); realistically, this is the low-hanging fruit.
To completely eliminate the problem in all cases, it is important to have a failure prevention strategy. Most companies with early-stage Kubernetes deployments neglect adopting failure prevention strategies. At Kalc, our failure prevention approach covers substantially more risk control scenarios than are possible to define manually. Our AI-first approach covers DNS resolution failures among other known issues.
We have approached this problem by replicating Kubernetes' behavior in AI. Now we can train our Kubernetes AI with the most common failure scenarios and let developers test their cluster configs against these scenarios. This leads to minimized outages, increased deployment pipeline stability and visibility.
First, I showed you the underlying details of why DNS lookups can take longer to resolve or, in some cases, time out with an error. I revealed the culprit: Alpine's musl libc, which is inherently racy.
Next, I presented the fixes which eliminate the relevant race conditions.
Finally, I emphasized that it's important to know your risk lines and to prevent failures by verifying and measuring the impact of your Kubernetes changes with kubectl-val.