Blog

Building a Reliable Kubernetes Cluster in the Amazon Cloud

Sep 7, 2016 | by Oleg Chunikhin

As a technology company focused on complex project integrations that unify legacy systems as well as modular solutions that ensure lasting scalability, we work on a multitude of projects that involve custom software development; packaged, open source, and SaaS software integration; infrastructure setup; and production operations and maintenance.

From a technology standpoint, our approach is always agnostic. We work with Java and .Net backends, web and mobile (all platforms), Amazon and Azure cloud services and infrastructure, and even on-premises deployments.

Containerization has been a de-facto standard for us for quite some time as a way to manage complex systems and processes, but with so much complexity and so many technologies at play, we are always seeking new ways to improve the efficiency of our work, reuse what we do, and focus our team on the unique business requirements of each project.

One way to do this is through the application of a flexible and reliable platform for managing complex multi-component clustered containerization software – building reusable components for various DevOps needs, and supporting production operation and reuse.

 


Among the requirements for the platform we identified the following:

The Path to the Solution

Several frameworks exist that could serve as a basis for the solution, but the following three made the list of realistic contenders:

After some research and prototyping we identified Kubernetes as the main candidate for our standard DevOps and cluster orchestration platform – for a number of reasons.

Kubernetes – The Pros

It’s not the goal of this post to describe in detail how we compared the tools, but I'd like to give a brief summary of where Kubernetes really shines:

All in all, in my opinion, Kubernetes strikes the right balance between "too much abstraction, need to write a lot of boilerplate code" and "too little abstraction, the system is not flexible".


Kubernetes – The Cons

Unfortunately, even the sun has dark spots: Kubernetes is notoriously difficult to set up for use in production.


Our requirements for the platform setup process were mainly derived from general platform requirements; we wanted to do the following:

There are many ways to set up a Kubernetes cluster; some of them are even part of the official documentation and distribution. But looking into each of them, we saw different issues preventing them from becoming a standard for EastBanc Technologies’ projects. As a result, we designed and built a Kubernetes cluster setup and configuration process that would work for us.

Kubernetes Deployment Re-Imagined

For our Kubernetes deployment procedure we decided to rely on cloud provider tools for IaaS resource management, namely Cloud Formation for AWS and Resource Manager for Azure.


To create a cluster, you don’t need to set up anything on your machine; just use the Cloud Formation template and the AWS console to create a new stack. The Kubernetes cluster Cloud Formation template we implemented creates several resources, as described in the following diagram:

[Diagram: AWS resources created by the Kubernetes cluster Cloud Formation template]
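Creating the stack itself takes a single AWS CLI call (or a few clicks in the console). The stack name, template location, and key-pair parameter below are hypothetical placeholders, and the command is shown behind an `echo` so the sketch is safe to run as-is:

```shell
#!/bin/sh
# Hypothetical names; substitute your own stack name, template location,
# and EC2 key pair.
STACK_NAME=kube-cluster
TEMPLATE_URL=https://s3.amazonaws.com/my-bucket/kubernetes-cluster.template

# Shown behind "echo" so the sketch is safe to run; remove the echo
# to actually create the stack.
echo aws cloudformation create-stack \
  --stack-name "$STACK_NAME" \
  --template-url "$TEMPLATE_URL" \
  --capabilities CAPABILITY_IAM \
  --parameters ParameterKey=KeyName,ParameterValue=my-key-pair
```

The `CAPABILITY_IAM` flag is required because the template creates IAM roles for the master and node instances.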

Let’s take a look at these resources in a little more depth:

To configure the Kubernetes software components running on the master and the nodes, we used the portable multi-node cluster configuration approach described in the Kubernetes documentation.

The following diagram shows the resulting software configuration:

[Diagram: resulting Kubernetes software configuration on the master and nodes]

The cluster initialization steps are split into three categories:

  1. A Packer script that prepares the AMI for the cluster.
  2. A Cloud Formation template that creates or updates AWS resources for the cluster.
  3. A bootstrap script that runs as the last step of the master or node instance boot process.

AMI Preparation

We built a customized AMI for the cluster based on the official Kubernetes AMI k8s-debian-jessie, which is in turn just a standard Debian Jessie image with some additional packages installed.

AMI preparation is implemented as a Packer script that performs the following steps:

  1. Update installed packages.
  2. Create a docker-bootstrap service in addition to the docker service that is already configured in the base image.
  3. Update the docker systemd service configuration so that the flanneld overlay network can be configured on service startup.
  4. Pull the etcd, flanneld, and Kubernetes hyperkube Docker images to ensure fast and reliable node startup.
  5. Create the /etc/kubernetes/bootstrap script and add its execution to the /etc/rc.local script so that it runs as the last step of the OS boot sequence.
  6. Prepare static pod manifest files and Kubernetes configuration files in /etc/kubernetes.
  7. Prepare other auxiliary tools used during instance bootstrap (such as the safe_format_and_mount.sh script).
  8. Ensure that the /srv/kubernetes directory is mounted as tmpfs (to provide safe storage of secret keys and certificates).
  9. Clean up temporary and log files.
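As one concrete illustration, step 5 above can be sketched in shell. The ROOT variable is an assumption introduced here so the sketch can be tried against a scratch directory instead of the image root that the real Packer provisioner would touch:

```shell
#!/bin/sh
# ROOT defaults to a scratch directory so the sketch is safe to try;
# in the real Packer provisioner this would be the image root (/).
ROOT=${ROOT:-$(mktemp -d)}
mkdir -p "$ROOT/etc/kubernetes"

# Step 5: create the bootstrap script...
cat > "$ROOT/etc/kubernetes/bootstrap" <<'EOF'
#!/bin/sh
# Per-instance initialization logic goes here.
EOF
chmod +x "$ROOT/etc/kubernetes/bootstrap"

# ...and hook it into /etc/rc.local so it runs as the last step of boot.
printf '%s\n' '/etc/kubernetes/bootstrap' >> "$ROOT/etc/rc.local"
```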

Cloud Formation Template

The Cloud Formation template creates and initializes AWS resources as shown in the first diagram above. As a part of this configuration, it creates launch configuration objects for Kubernetes master and node instances, and associates them with master and node auto scaling groups.

Both master and node launch configurations include AWS User Data scripts that create an /etc/kubernetes/stack-config.sh file in which several environment variables are set.

These environment variables are used by the /etc/kubernetes/bootstrap script to acquire context information about the environment it is running in.

In particular, the master EIP, the instance role (whether this is a Kubernetes master or node instance), and the S3 bucket name are passed this way.
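A minimal sketch of what such a stack-config.sh might contain follows. The variable names and values are illustrative assumptions rather than the exact names our template uses, and CONFIG points at a temp file so the sketch can run anywhere (on a real instance the file is /etc/kubernetes/stack-config.sh):

```shell
#!/bin/sh
# Sketch of the file written by the AWS User Data script.
CONFIG=${CONFIG:-$(mktemp)}

cat > "$CONFIG" <<'EOF'
MASTER_EIP=203.0.113.10        # Elastic IP of the Kubernetes master
INSTANCE_ROLE=node             # "master" or "node"
S3_BUCKET=kube-cluster-config  # bucket holding cluster config and keys
EOF

# The bootstrap script simply sources the file to learn its context:
. "$CONFIG"
```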

Instance Bootstrap Script

The instance bootstrap script runs as the last step of the instance boot sequence. The script works slightly differently on the master and on the nodes. The following steps are performed as part of this process:

On all nodes:

On master only:

On nodes only:

On all nodes:

After kubelet is started on the master, it takes care of starting the other Kubernetes components (such as apiserver, scheduler, controller-manager, etc.) in pods, as defined in the static manifest files, and then keeps them running. The kubelet started on a node only starts kube-proxy in a pod and then connects to the master for further instructions.
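The overall shape of the bootstrap script is a single branch on the instance role. The function below is a hypothetical simplification; the comments follow the outline above, while the real per-role steps are considerably more involved:

```shell
#!/bin/sh
# Hypothetical simplification of /etc/kubernetes/bootstrap.
bootstrap() {
  role=$1   # "master" or "node"; read from stack-config.sh on a real instance

  case "$role" in
    master)
      # Master-only: e.g. generate certificates, publish the client
      # configuration to S3, and lay down static pod manifests for
      # apiserver, scheduler, and controller-manager.
      echo "initializing master"
      ;;
    node)
      # Node-only: e.g. fetch credentials from S3 and point the kubelet
      # at the master EIP.
      echo "initializing node"
      ;;
  esac

  # Common final step: start kubelet, which launches everything defined
  # in the static pod manifests and keeps it running.
  echo "starting kubelet"
}

bootstrap node
```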

Working with the New Cluster

As soon as the master is started and fully initialized, the administrator can download the Kubernetes client configuration file from the S3 bucket. The files in the bucket are accessible only by the master EC2 instance role, the node EC2 instance role, and the AWS account administrator.

The cluster REST API is available via HTTPS on a standard port on the master EIP.
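Assuming the AWS CLI and kubectl are installed locally, connecting to a fresh cluster might look like the following. The bucket name and object key are hypothetical, and the commands are shown behind `echo` so the sketch runs without credentials:

```shell
#!/bin/sh
# Hypothetical bucket name and object key.
S3_BUCKET=kube-cluster-config

# Shown behind "echo" so the sketch runs without AWS credentials;
# remove the echoes to execute for real.
echo aws s3 cp "s3://$S3_BUCKET/kubeconfig" "$HOME/.kube/config"
echo kubectl get nodes   # the API answers over HTTPS on the master EIP
```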

Security, Reliability, and Scalability as Standard

As a result of our efforts, we now have a simple way to set up a reliable, production-ready Kubernetes cluster on AWS.

 


The Cloud Formation template may be used as-is or customized further to meet specific project needs, such as adding AWS resources (an RDS instance, for example) or changing the region or availability zones in which the cluster runs. We can also easily customize which add-ons run on the cluster.

From a security perspective, the new cluster is secure by default, thanks to the following features:

The new cluster is also reliable:

The cluster is also scalable:

We also made sure that we are not limiting our options:

Next Steps and Future Work

We have achieved the minimal set of features required to run a Kubernetes cluster in production, but there is still room for improvement:

Currently, the cluster is vulnerable to a failure of the availability zone where the master node is running. The master auto-scaling group is intentionally limited to a single availability zone due to an AWS EBS limitation: an EBS volume cannot be used in an availability zone different from the one in which it was initially created. There are two ways of overcoming this issue:

We are planning to implement both.

Even with the improvements described above, the cluster will still be vulnerable to whole region failures. Because of this, we are planning to introduce cluster federation as an option, and entertain different automated disaster recovery strategies for inter-region and hybrid deployments.

Security may also be improved with EBS encryption, by embedding tools such as HashiCorp Vault, and potentially by changing the secrets distribution strategy.