Sailing into Infinity: Seamlessly managed serverless containers using Kubernetes and AWS Fargate
Serverless computing promises rapid deployment and speedy, unlimited scaling while eliminating the need to manage servers, operating systems and even container runtimes. As the infrastructure team at Contentful, we want to enable our development teams to use this new paradigm, and also to reap the benefits ourselves in the form of reduced operational overhead.
For greenfield projects this is easy: if you're starting from scratch, you can simply pick a Serverless / Function-as-a-Service provider and adapt to its operational model. But like many organisations, we've invested in tools for deploying, troubleshooting and monitoring our apps and for collecting their logs, allowing us to operate all of them with a unified toolset. As Ben Kehoe from iRobot puts it in his article, a Function-as-a-Service platform like Lambda is "DifferentOps": you still need to operate your application, but your tools for doing so are radically different.
At Contentful, our production infrastructure runs on Kubernetes on AWS, with metrics, alerting and logging pipelines built on top of Kubernetes primitives. So any "integrated serverless" solution would ideally run inside the same networking environment (an AWS VPC) and be managed with our existing Kubernetes-based tools.
At re:Invent 2017 AWS announced Fargate, a service built on their Elastic Container Service that lets you run container workloads without having to manage clusters of virtual machines.
Think of it as Containers-as-a-Service: you submit a definition of the containers you wish to run to a Fargate Cluster and it schedules them. Fargate Clusters are virtual and dimensionless; they represent an "infinite" amount of compute capacity. So serverless, right? You can develop your application in a Docker environment and then run it at scale with Fargate. (Note: at the time of writing, Fargate is available only in the US East region.)
Fargate is also slated to support AWS's new managed Kubernetes service (EKS) later this year, but EKS is not yet generally available. While we wait for AWS to play catch-up, we decided to experiment with virtual-kubelet, a project started by the Azure team. It aims to provide "serverless within Kubernetes for all serverless providers", but did not yet support ECS or Fargate.
The Kubelet is the Kubernetes "node agent", sitting between the Kubernetes Scheduler and a container runtime like Docker on each node. It also provides hooks for other Kubernetes components to fetch metrics, status and container logs. What the virtual-kubelet provides is a virtual node, in this case one with "infinite" capacity backed by Fargate:
```
$ kubectl describe node virtual-kubelet
Name:               virtual-kubelet
Roles:              agent
Labels:             alpha.service-controller.kubernetes.io/exclude-balancer=true
                    beta.kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    type=virtual-kubelet
Annotations:        node.alpha.kubernetes.io/ttl=0
Taints:             aws.amazon.com/fargate:NoSchedule
CreationTimestamp:  Fri, 23 Mar 2018 08:50:13 +0100
Conditions:
  Type                Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----                ------  -----------------                ------------------               ------                      -------
  Ready               True    Fri, 23 Mar 2018 13:52:12 +0100  Fri, 23 Mar 2018 13:52:12 +0100  KubeletReady                kubelet is ready.
  OutOfDisk           False   Fri, 23 Mar 2018 13:52:12 +0100  Fri, 23 Mar 2018 13:52:12 +0100  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure      False   Fri, 23 Mar 2018 13:52:12 +0100  Fri, 23 Mar 2018 13:52:12 +0100  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure        False   Fri, 23 Mar 2018 13:52:12 +0100  Fri, 23 Mar 2018 13:52:12 +0100  KubeletHasNoDiskPressure    kubelet has no disk pressure
  NetworkUnavailable  False   Fri, 23 Mar 2018 13:52:12 +0100  Fri, 23 Mar 2018 13:52:12 +0100  RouteCreated                RouteController created a route
Addresses:
  InternalIP:  100.71.202.231
Capacity:
 cpu:     20
 memory:  100Gi
 pods:    20
Allocatable:
 cpu:     20
 memory:  100Gi
 pods:    20
System Info:
 Machine ID:
 System UUID:
 Boot ID:
 Kernel Version:
 OS Image:
 Operating System:           Linux
 Architecture:               amd64
 Container Runtime Version:
 Kubelet Version:            v1.8.3
 Kube-Proxy Version:
PodCIDR:     100.96.42.0/24
ExternalID:  virtual-kubelet
Non-terminated Pods:         (11 in total)
  Namespace  Name                         CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----                         ------------  ----------  ---------------  -------------
  default    helloworld-7cd454f7c7-5kqc5  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-9pjjn  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-c6xdf  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-cj857  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-cjc8d  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-fbfpt  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-ft94n  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-gmkbc  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-ng4t2  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-qjpdq  100m (0%)     0 (0%)      0 (0%)           0 (0%)
  default    helloworld-7cd454f7c7-tfwcz  100m (0%)     0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  1100m (5%)    0 (0%)      0 (0%)           0 (0%)
```
And the AWS Console for Fargate showing the running Tasks:
For the AWS implementation of the virtual-kubelet, we decided to map each Pod scheduled onto our virtual node to an ECS Task Definition and Task, which is then executed on a Fargate Cluster. As the virtual-kubelet is stateless, we had to create a two-way mapping which can also restore the state of the virtual node by fetching the currently running ECS Tasks and their Task Definitions from the Cluster.
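For illustration, a Pod with a single container might be translated into a task definition roughly like the following sketch. The field names follow the ECS API, but the concrete values and the role ARN are made up, and the exact mapping in our fork differs in detail:

```json
{
  "family": "default_helloworld",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/fargate-task-execution-role",
  "containerDefinitions": [
    {
      "name": "helloworld",
      "image": "nginx:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80 }]
    }
  ]
}
```

Encoding the Kubernetes namespace and Pod name into the task definition family is one way to make the mapping reversible, so a restarted virtual-kubelet can rebuild its view of the node from the cluster alone.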
Container logs are delivered by Fargate to CloudWatch Logs and fetched on demand by the virtual-kubelet when you run `kubectl logs -f my-container-running-on-fargate`.
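Log delivery to CloudWatch is configured through the awslogs log driver in each container definition. A sketch of the relevant fragment looks like this (the log group name, region and stream prefix are illustrative, not what our fork actually generates):

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/fargate/helloworld",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "virtual-kubelet"
  }
}
```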
To avoid breaking our existing setup, we added a Taint to the virtual-kubelet node, so you have to explicitly add a matching Toleration to your pod specs to be able to schedule pods there.
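The node's taint is `aws.amazon.com/fargate:NoSchedule` (visible in the `kubectl describe node` output above), so a pod spec needs a toleration along these lines:

```yaml
# Pod spec excerpt: tolerate the virtual-kubelet taint so the
# scheduler is allowed to place this pod on the virtual node
tolerations:
  - key: aws.amazon.com/fargate
    operator: Exists
    effect: NoSchedule
```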
The virtual-kubelet also needs permission to manage ECS resources. In our Kubernetes-on-AWS infrastructure we use the fabulous kube2iam project, which lets you attach IAM roles to Kubernetes Pods by adding annotations.
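With kube2iam in place, granting the virtual-kubelet Pod the rights to manage ECS comes down to a single annotation in its pod template (the role name here is made up):

```yaml
# Pod template excerpt: kube2iam intercepts calls to the EC2 metadata
# API and serves temporary credentials for the annotated role
metadata:
  annotations:
    iam.amazonaws.com/role: virtual-kubelet-ecs
```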
From a high-level overview, the setup looks like this:

1. Create a Fargate cluster
2. Add IAM roles for the virtual-kubelet and for the Fargate task executor
3. Deploy the virtual-kubelet fork to your Kubernetes cluster
4. Run a sample Deployment
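As a sketch of that last step, a sample Deployment pinned to the virtual node could look like the following (we use `apps/v1beta2` to match the Kubernetes 1.8 cluster shown above; the image and names are illustrative):

```yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: helloworld
spec:
  replicas: 3
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      # Pin the pods to the virtual node via its "type" label...
      nodeSelector:
        type: virtual-kubelet
      # ...and tolerate its taint so they may be scheduled there
      tolerations:
        - key: aws.amazon.com/fargate
          operator: Exists
          effect: NoSchedule
      containers:
        - name: helloworld
          image: nginx:latest
          resources:
            requests:
              cpu: 100m
```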
Going into the details of each of those steps would make this blog post overly long; you can find the detailed steps and commands in https://github.com/johanneswuerbach/virtual-kubelet/blob/aws/providers/aws/README.md while we work on merging our changes upstream (https://github.com/virtual-kubelet/virtual-kubelet/pull/127).
While the virtual-kubelet for AWS Fargate doesn't support all Pod features yet (e.g. volumes and metrics), we considered this experiment a success and integrated this concept of serverless containers into our toolbox of architecture patterns.
Looking at the price of Fargate containers, you see roughly the same cost per hour if you are able to run at full utilization:
| Option | vCPU | Memory | Price |
| --- | --- | --- | --- |
| 1 × EC2 m4.xlarge | 4 vCPU | 16 GB | $0.20 per hour |
| 8 × Fargate containers | 0.5 vCPU each ($0.0506 per vCPU per hour) | 2 GB each ($0.00000353 per GB per second) | ~$0.20 per hour |
When you factor in the overhead of managing the nodes themselves - OS and application patching, dealing with node failures and so on - Fargate comes out on top. Serverless ftw!
Looking at our current stack, we are considering moving our Jenkins Kubernetes agents to Fargate, as those are generally highly utilized during their lifespan and would benefit from the fast start times.
In the mid term, we could also imagine using Fargate as a kind of overflow buffer that allows us to bring in capacity quickly during a request peak, and then either scale down as the load decreases or replace the buffer with cheaper nodes if the load sustains.
For us, this proof-of-concept hack showed a lot of potential for providing faster and easier scaling while allowing us to keep our established workflows on top of one of the leading application deployment frameworks in the world today.
We would be interested to hear your thoughts if you try this out, or if you have other insights or use cases for the approach described above. Please reach out to us on the Contentful Community Slack!