Intro

Docker and Kubernetes have transformed the way businesses deliver software to their end users. Containers have become the new unit of software delivery, and Kubernetes has evolved to the point where it can power a cloud that uses containers alone for application delivery.

As Verizon, AT&T, and other large telecommunications companies continue building their 5G infrastructure, it's becoming clear that 5G is going to run on Kubernetes. AT&T is using the OpenStack-Helm project to orchestrate containers across a bare-metal Kubernetes cluster. The initial use case for AT&T’s containerized Network Cloud will be to power VNFs (Virtual Network Functions) for its emerging 5G network.

The Setup

AT&T chose Kubernetes and OpenStack over VMware to provide the flexibility and agility required for a cutting-edge, continent-spanning 5G network.

OpenStack has positioned itself as the leading option for building private clouds.

Using one control plane, you can manage compute, networking, and storage.

OpenStack and container technology form the backbone of AT&T's massively distributed service, which serves millions of users.

Software Layers of the AT&T Network Cloud Reference Design

  1. The OS / OCI-Compliant Container Runtime
    The Linux OS and the container runtime form the foundation for containerization on the platform.
  2. Kubernetes https://kubernetes.io/
    They use Kubernetes for container orchestration.
  3. Helm https://helm.sh/
    Helm is used as a Kubernetes package manager. It helps in packaging, releasing, installing, and upgrading Kubernetes applications.
  4. Calico https://www.projectcalico.org/
    Calico provides a Layer 3 software-defined networking (SDN) framework for the control plane.
  5. Ceph https://ceph.io/
    Ceph is used as a software-defined storage (SDS) backend for the control plane.

On top of these foundational open source container and orchestration layers, the platform is leveraging other open source software stacks that help operators manage their cloud delivery.

The Challenge

The changes Docker and Kubernetes introduce usually make a lot of sense, especially as new versions bring new features. The problem is that those changes don't always suit organizations running workloads based on Open Infrastructure. Low-level components such as Open vSwitch and libvirt can become incredibly hard to manage.

The Event

AT&T faced one such issue with Kubernetes 1.10.0 running on kernel 4.15.0-34-generic and systemd 229.

A dangling cgroup was created whenever a Pod spec included a volume referencing a secret. The volume didn't even have to be mounted anywhere (i.e., "volumeMounts" didn't have to be present in the Pod spec). This led to a buildup of a very high number of stale, orphaned systemd "Kubernetes transient mount" entries, which drove up the kubelet's CPU and memory usage.
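
To make the trigger concrete, here is a minimal sketch of a Pod that declares a secret-backed volume without ever mounting it. It is written as a shell function that prints the manifest. The Pod name, container name, image, and secret name are all hypothetical, and whether such a Pod actually leaves a cgroup behind depends on the Kubernetes, kernel, and systemd versions in play.

```shell
#!/usr/bin/env bash
# Print a minimal Pod manifest that declares a secret-backed volume
# but never mounts it. Every name below is a hypothetical placeholder.
print_demo_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cgroup-leak-demo
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "sleep 5"]
      # The container deliberately has no mounts for the volume below.
  volumes:
    - name: unused-secret
      secret:
        secretName: demo-secret
EOF
}
```

Assuming a secret named demo-secret already exists in the namespace, the manifest could be submitted with `print_demo_pod | kubectl apply -f -`.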

CronJobs accentuated the problem because they create a lot of Pods, each of which would effectively leak cgroups. The leaked cgroups then caused huge spikes in CPU and memory load for the kubelet, especially when it scraped cAdvisor metrics, which include CPU and memory usage for every cgroup present on the system.

The issue built up slowly and could take quite some time to manifest itself.
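
Because the buildup is gradual, one way to notice it early is to track two numbers over time: how many "Kubernetes transient mount" units systemd reports, and how many cgroup "tasks" files sit under system.slice. The helpers below are a rough sketch of that idea, not a tool AT&T is known to have used. The function names are made up for illustration, and the "tasks" file layout assumes a cgroup v1 hierarchy like the one the cleanup commands in this article operate on.

```shell
#!/usr/bin/env bash
# Count "Kubernetes transient mount" units in `systemctl list-units`
# output supplied on stdin. Prints 0 when there are none.
count_transient_mounts() {
  grep -c 'Kubernetes transient mount' || true
}

# Count cgroup "tasks" files under <root>/*/system.slice.
# <root> defaults to /sys/fs/cgroup (cgroup v1 layout assumed).
count_cgroup_task_files() {
  root="${1:-/sys/fs/cgroup}"
  find "$root"/*/system.slice -name tasks 2>/dev/null | wc -l | tr -d ' '
}
```

A periodic check could run `systemctl list-units --state=running | count_transient_mounts` alongside `count_cgroup_task_files` and log both values. Numbers that keep climbing while the Pod count stays flat would point toward this kind of leak.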

The Root Cause

The team at AT&T found it challenging to arrive at a root cause for these leaks. It was just as hard to figure out which environments were affected.

In the end, they deduced that neither the kernel version nor the systemd version alone was the root cause; rather, it was the combination of the two that caused the problem.
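
Since the problem hinges on a pairing of versions, a quick first step on any node is to record both values together. The sketch below compares a host against the one combination named in this article (kernel 4.15.0-34-generic with systemd 229). The function names are hypothetical, and a non-match does not prove a host is safe, because the article does not say which other combinations are affected.

```shell
#!/usr/bin/env bash
# Extract the numeric systemd version from the first line of
# `systemctl --version` output supplied on stdin.
systemd_version_from() {
  sed -n '1s/^systemd \([0-9][0-9]*\).*/\1/p'
}

# Return 0 only for the single kernel/systemd pairing reported in
# this article. Other pairings are simply unknown, not cleared.
matches_reported_combo() {
  kernel="$1"
  systemd="$2"
  [ "$kernel" = "4.15.0-34-generic" ] && [ "$systemd" = "229" ]
}
```

On a live node this could be used as `matches_reported_combo "$(uname -r)" "$(systemctl --version | systemd_version_from)" && echo "matches the reported combination"`.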

The Fix

In such situations, you need to decide whether it's necessary to upgrade Kubernetes, the kernel, or systemd. It is not always easy to put such an upgrade into practice immediately to solve the problem.

Upgrading the releases without impacting the workloads is sometimes pretty challenging. When upgrading Kubernetes for example, you will encounter changes in the way Kubernetes manages things. This is especially relevant when trying to run VNF workloads since they can't afford to drop packets. When performance falls because there are other processes contending for time on those CPUs, users will experience mysterious drops in calls.

Therefore, when upgrading from one release to another, it is critical to implement rigorous regression testing, which is normally time-consuming. In this case, the team ended up running temporary scripts on the system to clean up the stale entries until they were sure it was safe to move forward with a new version of the kernel, Kubernetes, or systemd.

# Count cgroup task files before the cleanup
find /sys/fs/cgroup/*/system.slice -name tasks | wc -l
# Stop transient mount units whose pod directory no longer exists
systemctl list-units --state=running | \
  sed -rn '/Kubernetes.transient.mount/s,(run-\S+).+(/var/lib/kubelet/pods/.+),\1 \2,p' | \
  xargs -r -l1 sh -c 'test -d "$2" || echo "$1"' -- | \
  xargs -r -tl1 systemctl stop |& wc -l
# Count again to confirm the number dropped
find /sys/fs/cgroup/*/system.slice -name tasks | wc -l
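
Before stopping anything on a production node, it can be reassuring to see which units the cleanup would touch. The function below is a hedged dry-run sketch of the same filtering logic: it reads `systemctl list-units` output on stdin and prints the transient mount units whose target path is no longer a directory, without stopping them. The function name is hypothetical, and the sketch assumes each unit description has the form "Kubernetes transient mount for <path>".

```shell
#!/usr/bin/env bash
# Dry run: read `systemctl list-units` output on stdin and print the
# names of "Kubernetes transient mount" units whose mount path no
# longer exists as a directory. Nothing is stopped or modified.
list_stale_mount_units() {
  # Reduce each matching line to "<unit> <path>", then test the path.
  sed -En 's,^[[:space:]]*(run-[^[:space:]]+).+Kubernetes transient mount for (/.+),\1 \2,p' |
    while read -r unit path; do
      [ -d "$path" ] || printf '%s\n' "$unit"
    done
}
```

Running `systemctl list-units --state=running | list_stale_mount_units` prints one candidate unit per line. Reviewing that list first makes the later `systemctl stop` step less of a leap of faith.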

Conclusion

The successful Kubernetes deployment of AT&T's 5G service will demonstrate the relevance of OpenStack and containers in a massively distributed production environment. As you adapt your Kubernetes cluster to the needs of your work environment, it is recommended that you carry out rigorous regression tests. Kalc.io automates this process by leveraging AI and machine learning models to pinpoint bad configurations in your cluster. These self-maintaining tests require less time and effort from the operations team, allowing it to concentrate on other important tasks.