Kubernetes

Private DockerHub Registry: Failed To Pull Image

Toni Kurya
March 25, 2020

Every company is looking for a digital transformation by optimizing application development for the cloud. Kubernetes patterns are widely preferred for creating cloud-native apps with Kubernetes as a runtime platform. They eliminate most of the manual processes involved in deploying and scaling containerized applications.

#Kubernetes outage
Kubernetes

Tencent Gaming—Running Gaming Workloads in Kubernetes

Toni Kurya
March 19, 2020

In the early days, there was no virtualization. Everyone was on bare metal still using physical machines. One of the reasons that Tencent wanted virtualization was because of resource utilization. They had an average of 10% CPU utilization in their Data Centers. They wanted a system that could offer elastic resource utilization.

#resources utilization
Kubernetes

Kubelet CPU/Memory Usage linearly increasing when using CronJob

Toni Kurya
March 11, 2020

Docker and Kubernetes have transformed the way businesses deliver software to their end-users. Containers have risen to become the new unit of delivering software. Kubernetes has evolved with capabilities that can deliver a cloud that solely uses containers for application delivery.

Kubernetes

To Replicas Or Not To Replicas In Kubernetes Deployments

Toni Kurya
March 3, 2020

Modern advertising requires a modern ad platform that enables advertisers to engage the best audiences with a consistent and relevant ad experience. Adobe Advertising Cloud is a platform that manages advertising across traditional TV and digital formats. Like most Fortune 500 companies, Adobe uses Kubernetes for running ...

#resources utilization
Kubernetes

Paybase: Debugging a Service Mesh

Toni Kurya
February 24, 2020

As we all know, running a distributed system can be a messy business. Even as more and more organizations are moving towards microservices architecture systems, debugging these systems can be a major problem. A good platform gives us visibility on what’s deployed, which Endpoints are called, and full visibility on Distributed Traces.

#Kubernetes pods
Kubernetes

Zero Downtime Rolling Updates With Kubernetes

Toni Kurya
February 18, 2020

Speed is everything in business. For businesses to stay competitive in a fast-moving tech space, the latest software versions should be rolled out as soon as they are ready for release, without disrupting active users. Docker and container runtimes have become the preferred deployment units of software and many enterprises have ...

#resources utilization
Kubernetes

Overcoming a Data Center Outage

Toni Kurya
February 12, 2020

In today's cloud world, you are no longer just worried about machine failure but you also have to account for data center failure. UPS system failure, cybercrime, human error, natural disasters, and faulty generators all rank as the top 5 culprits of data center outages. In November 2019, AWS, Microsoft Azure and Google Cloud all experienced ...

#Kubernetes outage
Kubernetes

How To Delete A Namespace Stuck At Terminating State

Toni Kurya
February 8, 2020

Among the issues most users encounter every day with Kubernetes, the most common ones include poor performance, slow response, server breakdown, and network connection failure. However, the issue that seems insignificant but racks our brains is deletion failure, for example, process ending failure, file deletion failure, and driver ...

#Kubernetes cluster
Kubernetes

I Can't Believe I have All The Node's Resources

Toni Kurya
February 3, 2020

AWS and other leading public cloud companies have made infrastructure as a service and its subscription-based model quite attractive. As more and more companies continue to buy into the benefits of this business model, they are taking the next step, which is to migrate their applications from datacenters composed of ...

#resources utilization
Kubernetes

How ARP Cache Overflows and DNS Timeouts Brought Tinder Down

Toni Kurya
January 28, 2020

Kubernetes is the latest and greatest technology for taking containers to the next level. If you have been using Docker and Kubernetes for a little while and your website or application makes it to the big leagues and suddenly draws a lot of traffic, you need a way to scale up really fast.

#resources issues
Kubernetes

A Postmortem of a Service Mesh Speeding Accident

Toni Kurya
January 24, 2020

Cloud-native applications are often designed as a batch of distributed microservices, which run in containers. Today Kubernetes has become the go-to solution for deploying and orchestrating containerized applications. It has a rich set of APIs that abstract away the underlying hardware infrastructure by acting as a distributed operating system.

#resources issues
Kubernetes

Data on Persistent Volume wiped after kubelet restart

Toni Kurya
January 20, 2020

Kubernetes adoption is exploding and this is due to it being a great platform for running your applications. Kubernetes itself is a stateful platform - so there is data associated with that. As with all data processing applications, you need to provide some data protection capabilities.

#Kubernetes pods
Kubernetes

A Strange Case of Data Loss During Migration

Toni Kurya
January 14, 2020

In Kubernetes, managing storage is a distinct problem from managing compute. This is because pods are ephemeral: they come and go quite often. Therefore, on-disk files in a container are also ephemeral. So what happens if you have data that you must persist even though the pod itself goes down?

#Kubernetes outage
Kubernetes

How a Cluster Autoscaler might actually save your life?

Toni Kurya
January 10, 2020

In Kubernetes we have the concept of Pods. A Pod is a group of one or more containers (such as Docker containers) on the same host. A good example is when you have an application server in a Pod and also have monitoring and logging containers co-located on the same host.

#cpu issues
Kubernetes

Having DNS lookup failures for services in your cluster?

Toni Kurya
January 5, 2020

When we look at customers and the problems they encounter using Kubernetes, one of the most prominent issues is related to CoreDNS (the DNS server). Before CoreDNS we had kube-dns. CoreDNS went GA in 1.11, so if you created clusters after 1.11 you are getting CoreDNS by default.

Kubernetes DNS
Kubernetes

Is your pod status shown as pending?

Toni Kurya
December 30, 2019

When we talk to customers about containerizing and modernizing their applications, we always ask them why they want to use Kubernetes. Most of the time Kubernetes ends up being the answer, but we want to emphasize that Kubernetes is not a golden hammer. In this article, I intend to cover the number one way in which Kubernetes on AWS EKS has failed.

#Kubernetes pods
Kubernetes

How To Break a Cassandra Cluster

Toni Kurya
December 24, 2019

Apache Cassandra is a highly scalable free and open source NoSQL database, achieving great performance on multi-node setups with no single point of failure. Cassandra supports replication across multiple data centers and offers lower latency for users and the ability to survive regional outages.

#Kubernetes cluster
Kubernetes

What Happens When Something Goes Wrong With EKS IAM Roles

Toni Kurya
December 11, 2019

The rapid adoption of Kubernetes has led to an increase in outages covering entire company operations. Recently SourceClear experienced a Kubernetes outage that lasted two days and affected multiple teams. SourceClear is a software security company that uses data science and machine learning to help developers use open source safely by analyzing the libraries

#Kubernetes outage
Kubernetes

Where Did All My Pods Go?

Toni Kurya
November 15, 2019

Enterprises using Kubernetes often need to autoscale their resources based on more than just CPU usage, for example concurrent persistent connections or queue length. This post walks you through an incident where one of our customers enabled autoscaling for their application and one day all their Pods disappeared.

#cpu issues
Kubernetes

The Case of the Infected Cluster

Toni Kurya
November 10, 2019

Today's distributed systems need to be resilient. Resilience, in short, means that ideally a user does not notice a random failure at all, or can at least continue to use the degraded application. On Monday 9 July 2018,

#Kubernetes cluster
Kubernetes

CoreDNS Autopath Failure For External Name Services

Toni Kurya
November 4, 2019

Datadog is a monitoring service for cloud-based workflows offering Kubernetes insights through metrics, traces, logs, dashboards, etc. They are positioned as a Cloud Native service provider. Through their SaaS-based analytics platform,

#Kubernetes cluster
Kubernetes

Kubernetes Jobs and the Sidecar Problem

Toni Kurya
October 31, 2019

Imagine that you have a large computation to perform and, once it is done, you want Kubernetes to stop the Pods automatically. Simply put, we are talking about running Pods temporarily until a Job is completed.

#sidecarcontainer
Kubernetes

Zalando's Total DNS outage in Kubernetes cluster

Nisar Ahmad
October 24, 2019

Zalando is an e-commerce store that provides lifestyle and fashion products to customers in seventeen European markets. Zalando is considered the starting point for fashion in Europe, and it currently offers more than 300,000 products, with 2,000 different brands in fashion and lifestyle.

#Kubernetes cluster
Kubernetes

Job being constantly recreated despite RestartPolicy: Never

Toni Kurya
October 19, 2019

Universe.com, a division within Ticketmaster, is shaping the future of the event industry using Kubernetes. They provide meaningful, real-life experiences to people around the globe through a world-class event ticketing platform.

#KubernetesJob
Kubernetes

Debugging DNS Failure On Pods Looking Up External Resources

Toni Kurya
October 13, 2019

Docker makes building containers remarkably easy. The downside of this simplicity is that it's easy to build huge containers full of things you don't need - including security holes. By using a smaller, specialized base image such as Alpine, you can significantly minimize the attack surface.

#Kubernetes pods
Kubernetes

Challenges With Running PostgreSQL On Kubernetes

Toni Kurya
October 8, 2019

Containers have become the next big thing in infrastructure software. However, for you to take full advantage of containers you need to be conversant on how to turn them into production services. This is where Kubernetes shines — as an orchestrator of your containerized applications.

#run PostgreSQL
Kubernetes

The Developer Guide to Taking a Kubernetes Cluster Down

Toni Kurya
October 3, 2019

At Kalc we have a lot of experience, gained either from customers or from our time at Fortune 500 Companies, and we are concerned about all the mistakes you might make with Kubernetes. In some ways, this is motivated by the fact that we have all these new schedulers, this immutably styled infrastructure that we are all striving towards

Kubernetes

Managing Kubernetes Clusters on AWS Using Kops

Toni Kurya
September 27, 2019

Containers are a well-established way of packaging an application. Kubernetes has also gotten out of the early-adopters phase. Today it is a widely held view that Kubernetes is a cost-effective, ready-made solution that enterprise customers can trust.

#Kubernetes cluster
Kubernetes

How JetStack's simple admission webhook led to a Kubernetes cluster outage?

Nisar Ahmad
September 19, 2019

Jetstack is a fast growing Kubernetes professional services company that helps startups, SMBs, and enterprises to modernize their cloud-native Kubernetes infrastructure. They have been building, operating, and contributing to the Kubernetes ecosystem since 2015.

#cpu issues
Kubernetes

How to solve the strange case of kube-api pods constantly restarting

Nisar Ahmad
September 13, 2019

NRE Labs is a site for teaching network automation in the browser using real, interactive, compelling virtual environments. Its main aim is to democratize interactive, dependency-free learning. The Labs are powered by the Antidote project, which provides a platform for representing curriculum-as-code.

Kubernetes

How Pivotal Caused an Application Outage on Kubernetes?

Nisar Ahmad
August 29, 2019

Pivotal offers business transformation, a cloud-native platform, microservices, containers, developer tools, and consulting services to help enterprise-level businesses to build and run their applications. VMware recently showed intentions to acquire Pivotal for $2.7 bn.

#Pod Disruption Budget
Kubernetes

How was Grafana’s Production Outage Caused Using Kubernetes Pod Priorities?

Nisar Ahmad
August 7, 2019

Grafana is the leading open source metric suite for analytics and visualization that is commonly used for analyzing time series data. It can also be used in many other domains, such as home automation, industrial sensors, process control, weather, etc.

#resources issues
Kubernetes

How Moonlight Fixed Application Outage Issues on Kubernetes?

Nisar Ahmad
August 1, 2019

Moonlight is a professional community of software developers and designers where you can find and work with quality candidates based on their experience and location.

#cpu issues
Kubernetes

Setting Up Your EKS Cluster for Scale

Toni Kurya
July 25, 2019

Many organizations are modernizing their existing applications to become more agile and innovate faster. Architectural patterns like microservices enable teams to independently test services and continuously push applications to delivery environments.

#resources utilization
Kubernetes

How Blue Matador Recovered Kubernetes Node OOM?

Nisar Ahmad
August 16, 2018

Blue Matador is a platform that monitors your AWS infrastructure and compute resources, understands the baselines, manages thresholds, and sends actionable alerts. It is considered a check engine light for your public cloud infrastructure and effortlessly keeps a pulse on everything in the cloud environment.

#cpu issues