Serverless computing promises rapid deployment and fast, virtually unlimited scaling, while eliminating the need to manage servers, operating systems and even container runtimes. As the Infrastructure Team at Contentful, we would like to enable our development teams to use this new paradigm, and also to reap its benefits ourselves in the form of reduced operational overhead.
For greenfield projects this is easy: if you are starting from scratch, you can simply pick a Serverless / Function-as-a-Service provider and adapt to its operational model. However, like many organisations, we have invested in tools for deploying, troubleshooting and monitoring our apps and collecting their logs, which lets us operate all of them with a single, unified toolset. As Ben Kehoe from iRobot puts it in his article, Function-as-a-Service offerings like Lambda are "DifferentOps": you still need to operate your application, but the tools for doing so are radically different.
At Contentful, our production infrastructure runs on Kubernetes on AWS, with metrics, alerting and logging pipelines built on top of Kubernetes primitives. So any "integrated serverless" solution would ideally run inside the same networking environment (AWS VPC) and be managed with our existing Kubernetes-based tools.
At re:Invent 2017, AWS announced Fargate, a service built on its Elastic Container Service (ECS) that lets you run container workloads without having to manage clusters of virtual machines.
Think of it as Containers-as-a-Service: you submit a definition of the containers you wish to run to a Fargate Cluster, and Fargate schedules them. Fargate Clusters are virtual and dimensionless; they represent an "infinite" amount of compute capacity. So serverless, right? You can develop your application in a Docker environment and then run it at scale with Fargate. (Note: Fargate is available only in the US East region at the moment.)
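To make "a definition of the containers you wish to run" more concrete, the following Go sketch builds a minimal ECS task definition for Fargate and prints it as JSON. It uses only the standard library and covers a small subset of the real task-definition fields (family, requiresCompatibilities, networkMode, cpu, memory, containerDefinitions); the family name and image are placeholders. Fargate requires the awsvpc network mode and task-level CPU and memory, which ECS takes as strings in CPU units and MiB.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContainerDefinition is a small subset of the fields ECS accepts per container.
type ContainerDefinition struct {
	Name      string `json:"name"`
	Image     string `json:"image"`
	Essential bool   `json:"essential"`
}

// TaskDefinition is a small subset of an ECS task definition. Fargate
// requires the "awsvpc" network mode and task-level cpu/memory, which ECS
// takes as strings (CPU units and MiB).
type TaskDefinition struct {
	Family                  string                `json:"family"`
	RequiresCompatibilities []string              `json:"requiresCompatibilities"`
	NetworkMode             string                `json:"networkMode"`
	CPU                     string                `json:"cpu"`
	Memory                  string                `json:"memory"`
	ContainerDefinitions    []ContainerDefinition `json:"containerDefinitions"`
}

// fargateTaskDefinition describes a single-container task sized at
// 0.25 vCPU (256 units) and 512 MiB, one of the pairs Fargate accepts.
func fargateTaskDefinition(family, image string) TaskDefinition {
	return TaskDefinition{
		Family:                  family,
		RequiresCompatibilities: []string{"FARGATE"},
		NetworkMode:             "awsvpc",
		CPU:                     "256",
		Memory:                  "512",
		ContainerDefinitions: []ContainerDefinition{
			{Name: family, Image: image, Essential: true},
		},
	}
}

// encode renders the task definition as compact JSON.
func encode(td TaskDefinition) (string, error) {
	b, err := json.Marshal(td)
	return string(b), err
}

func main() {
	out, err := encode(fargateTaskDefinition("hello-fargate", "nginx:alpine"))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The printed JSON could be saved to a file and registered with aws ecs register-task-definition --cli-input-json; actually running it on Fargate additionally needs aws ecs run-task with the FARGATE launch type and subnet and security-group settings for the VPC.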
Fargate is due to support the new Kubernetes service (AWS EKS) later this year, but that support is not yet generally available. While we wait for AWS to play catch-up, we decided to experiment with a project from the Azure team called virtual-kubelet. This project aims to provide "serverless within Kubernetes for all serverless providers", but does not yet support ECS or Fargate.
The Kubelet is the Kubernetes "node agent": it runs the Pods that the Kubernetes Scheduler assigns to its node using a container runtime such as Docker, and it provides hooks for other Kubernetes components to fetch metrics, status and container logs. The virtual-kubelet instead provides a virtual node, in this case with "infinite" capacity backed by Fargate:
And the AWS Console for Fargate showing the running Tasks:
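The virtual-node idea can be sketched as an interface: the virtual kubelet registers a node with Kubernetes and forwards Pod lifecycle, log and capacity queries to a backend. The Provider interface, Pod type and in-memory backend in the following Go sketch are hypothetical simplifications for illustration, not virtual-kubelet's actual provider API. A Fargate backend would call ECS where this one updates a map, and the capacity figures are arbitrary placeholders chosen to look "infinite" to the scheduler.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Pod is a pared-down, hypothetical stand-in for a Kubernetes Pod.
type Pod struct {
	Namespace, Name, Image string
}

// Provider is a hypothetical, simplified set of hooks a virtual kubelet
// could delegate to a backend: Pod lifecycle, logs and advertised capacity.
type Provider interface {
	CreatePod(p Pod) error
	DeletePod(namespace, name string) error
	GetPods() []Pod
	GetContainerLogs(namespace, name string) (string, error)
	Capacity() map[string]string
}

// memProvider keeps Pods in a map; a Fargate backend would call ECS instead.
type memProvider struct {
	pods map[string]Pod
}

// Compile-time check that memProvider satisfies Provider.
var _ Provider = (*memProvider)(nil)

func newMemProvider() *memProvider { return &memProvider{pods: map[string]Pod{}} }

func podKey(namespace, name string) string { return namespace + "/" + name }

func (m *memProvider) CreatePod(p Pod) error {
	k := podKey(p.Namespace, p.Name)
	if _, exists := m.pods[k]; exists {
		return errors.New("pod already exists: " + k)
	}
	m.pods[k] = p
	return nil
}

func (m *memProvider) DeletePod(namespace, name string) error {
	k := podKey(namespace, name)
	if _, exists := m.pods[k]; !exists {
		return errors.New("pod not found: " + k)
	}
	delete(m.pods, k)
	return nil
}

// GetPods returns the Pods sorted by key, since map iteration order is random.
func (m *memProvider) GetPods() []Pod {
	out := make([]Pod, 0, len(m.pods))
	for _, p := range m.pods {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool {
		return podKey(out[i].Namespace, out[i].Name) < podKey(out[j].Namespace, out[j].Name)
	})
	return out
}

// GetContainerLogs would fetch logs from the backend's log store.
func (m *memProvider) GetContainerLogs(namespace, name string) (string, error) {
	if _, exists := m.pods[podKey(namespace, name)]; !exists {
		return "", errors.New("pod not found: " + podKey(namespace, name))
	}
	return "", nil // no log store in this in-memory sketch
}

// Capacity advertises deliberately large, arbitrary numbers so the scheduler
// treats the virtual node as effectively unbounded.
func (m *memProvider) Capacity() map[string]string {
	return map[string]string{"cpu": "1000", "memory": "4Ti", "pods": "1000"}
}

func main() {
	var p Provider = newMemProvider()
	if err := p.CreatePod(Pod{Namespace: "default", Name: "web", Image: "nginx:alpine"}); err != nil {
		panic(err)
	}
	fmt.Println(len(p.GetPods()), p.Capacity()["pods"])
}
```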
For the AWS implementation of the virtual-kubelet, we decided to map each Pod scheduled to our virtual node to an ECS Task Definition and Task, which is then executed on a Fargate Cluster. As the virtual-kubelet is stateless, we had to create a two-sided mapping that can also restore the state of the virtual node by fetching the currently running ECS Tasks and their Task Definitions from the Cluster.
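One way to get such a two-sided mapping without any extra state is to encode the Pod's namespace and name in the Task Definition family, so the Pod can be recovered from whatever is running in the cluster. The following Go sketch shows this under a hypothetical naming scheme (vk__namespace__name) and a hypothetical RunningTask type standing in for the ECS API response; it is not necessarily how our fork does it. It relies on naming rules: Kubernetes Pod names and namespaces never contain underscores, while ECS family names do not allow dots, so dots are swapped for underscores and back.

```go
package main

import (
	"fmt"
	"strings"
)

// familyPrefix marks task definitions owned by this virtual node
// (hypothetical value; it must not contain "__").
const familyPrefix = "vk"

// familyForPod encodes a Pod as an ECS task definition family. ECS families
// allow only letters, digits, '-' and '_', so '.' in Pod names becomes '_'.
// Real code would also need to respect the 255-character family limit.
func familyForPod(namespace, name string) string {
	return familyPrefix + "__" + namespace + "__" + strings.ReplaceAll(name, ".", "_")
}

// podForFamily reverses familyForPod. Namespaces contain no '_' and valid Pod
// names never contain "..", so the first "__" after the prefix is the separator.
func podForFamily(family string) (namespace, name string, ok bool) {
	rest := strings.TrimPrefix(family, familyPrefix+"__")
	if rest == family {
		return "", "", false // not created by this node
	}
	parts := strings.SplitN(rest, "__", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], strings.ReplaceAll(parts[1], "_", "."), true
}

// RunningTask is a hypothetical, minimal view of an ECS DescribeTasks result.
type RunningTask struct {
	TaskARN           string
	TaskDefinitionARN string // e.g. arn:aws:ecs:region:account:task-definition/family:3
}

// familyFromARN strips the ARN prefix and the ":<revision>" suffix.
func familyFromARN(arn string) string {
	s := arn[strings.LastIndex(arn, "/")+1:]
	if i := strings.LastIndex(s, ":"); i >= 0 {
		s = s[:i]
	}
	return s
}

// restore rebuilds the "namespace/name" -> task ARN side of the mapping from
// the tasks currently running, ignoring tasks this node did not create.
func restore(tasks []RunningTask) map[string]string {
	byPod := make(map[string]string)
	for _, t := range tasks {
		if ns, name, ok := podForFamily(familyFromARN(t.TaskDefinitionARN)); ok {
			byPod[ns+"/"+name] = t.TaskARN
		}
	}
	return byPod
}

func main() {
	fam := familyForPod("default", "web.v1")
	fmt.Println(fam)
	fmt.Println(restore([]RunningTask{{
		TaskARN:           "arn:aws:ecs:us-east-1:123456789012:task/abc123",
		TaskDefinitionARN: "arn:aws:ecs:us-east-1:123456789012:task-definition/" + fam + ":1",
	}}))
}
```

Deriving everything from names keeps the virtual-kubelet itself stateless, at the cost of constraining how families are named.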
Container logs are delivered by Fargate to CloudWatch Logs and fetched on demand by the virtual-kubelet when you run:
kubectl logs -f my-container-running-on-fargate
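Serving kubectl logs from CloudWatch involves two steps: finding the log stream of a container and paging through it. With the awslogs log driver, ECS names each stream prefix/container-name/task-id. The following Go sketch derives that name from a task ARN and drains a stream page by page. The fetchFunc type is a hypothetical stand-in for a CloudWatch Logs GetLogEvents call so the sketch runs without AWS access, and the stream prefix "vk" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// streamName returns the stream the awslogs driver writes to for a container:
// "<stream prefix>/<container name>/<task ID>".
func streamName(prefix, container, taskARN string) string {
	// The task ID is the last path segment of the task ARN, for both the
	// "task/<id>" and "task/<cluster>/<id>" ARN formats.
	taskID := taskARN[strings.LastIndex(taskARN, "/")+1:]
	return prefix + "/" + container + "/" + taskID
}

// fetchFunc stands in for one GetLogEvents call: given a token it returns a
// page of events and the next forward token (hypothetical signature).
type fetchFunc func(token string) (events []string, next string)

// drain pages forward until the service hands back the token it was given,
// which GetLogEvents does once the end of the stream is reached. A follow
// mode (kubectl logs -f) would sleep and call drain again from that token.
func drain(fetch fetchFunc, token string) ([]string, string) {
	var all []string
	for {
		events, next := fetch(token)
		all = append(all, events...)
		if next == token {
			return all, next
		}
		token = next
	}
}

func main() {
	fmt.Println(streamName("vk", "web", "arn:aws:ecs:us-east-1:123456789012:task/default/0123abcd"))

	// A fake two-page stream: "" -> "t1", and "t1" is the end.
	pages := map[string][]string{"": {"hello"}, "t1": {"world"}}
	next := map[string]string{"": "t1", "t1": "t1"}
	events, _ := drain(func(tok string) ([]string, string) { return pages[tok], next[tok] }, "")
	fmt.Println(events)
}
```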
To avoid breaking our existing setup, we added a Taint to the virtual-kubelet node, so a Pod has to explicitly declare a matching Toleration in its spec before it can be scheduled there.
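Kubernetes only places a Pod on a tainted node if, for every NoSchedule or NoExecute taint, one of the Pod's tolerations matches it. The following Go sketch reimplements that matching rule with local types so it runs without the Kubernetes client libraries. The taint key and value (virtual-kubelet.io/provider=aws) are assumptions for illustration and may differ from what the virtual node actually sets. The Toleration fields correspond to the key, operator, value and effect fields of a Pod spec's tolerations entries.

```go
package main

import "fmt"

// Taint and Toleration mirror the relevant Kubernetes fields.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates follows Kubernetes' matching rule: an empty effect matches any
// effect, an empty key matches any key, "Exists" ignores the value and
// "Equal" (the default operator) requires the values to match.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "", "Equal":
		return t.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// schedulable reports whether every hard (NoSchedule/NoExecute) taint on a
// node is tolerated; PreferNoSchedule is only a soft preference.
func schedulable(taints []Taint, tols []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect != "NoSchedule" && taint.Effect != "NoExecute" {
			continue
		}
		matched := false
		for _, t := range tols {
			if tolerates(t, taint) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical taint on the virtual node.
	node := []Taint{{Key: "virtual-kubelet.io/provider", Value: "aws", Effect: "NoSchedule"}}
	fmt.Println(schedulable(node, nil)) // ordinary Pods stay off the node
	fmt.Println(schedulable(node, []Toleration{{
		Key: "virtual-kubelet.io/provider", Operator: "Equal", Value: "aws", Effect: "NoSchedule",
	}}))
}
```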
The virtual-kubelet needs to be able to manage ECS resources. In our Kubernetes-on-AWS infrastructure we use the fabulous kube2iam project, which lets you assign IAM roles to Kubernetes Pods via annotations.
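With kube2iam, the role a Pod may assume is named in its iam.amazonaws.com/role annotation, and that role needs permissions to manage ECS tasks. The following Go sketch prints a candidate IAM policy and the annotation for the virtual-kubelet Pod. The action list is an assumption of what a Fargate-backed virtual node needs, not an audited minimum; iam:PassRole is only required if the generated task definitions reference task or execution roles, and the role name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kube2iamRoleAnnotation is the Pod annotation kube2iam reads to decide
// which IAM role the Pod may assume.
const kube2iamRoleAnnotation = "iam.amazonaws.com/role"

type Statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

type PolicyDocument struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// virtualKubeletPolicy is an assumed, unaudited permission set for managing
// Fargate tasks and reading their logs; scope Resource down in practice.
func virtualKubeletPolicy() PolicyDocument {
	return PolicyDocument{
		Version: "2012-10-17",
		Statement: []Statement{{
			Effect: "Allow",
			Action: []string{
				"ecs:RegisterTaskDefinition",
				"ecs:DeregisterTaskDefinition",
				"ecs:DescribeTaskDefinition",
				"ecs:RunTask",
				"ecs:StopTask",
				"ecs:ListTasks",
				"ecs:DescribeTasks",
				"logs:GetLogEvents",
				"iam:PassRole", // only if task definitions reference IAM roles
			},
			Resource: "*",
		}},
	}
}

// podAnnotations returns the annotations for the virtual-kubelet's own Pod.
func podAnnotations(roleName string) map[string]string {
	return map[string]string{kube2iamRoleAnnotation: roleName}
}

func main() {
	b, err := json.MarshalIndent(virtualKubeletPolicy(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	fmt.Println(podAnnotations("virtual-kubelet")) // hypothetical role name
}
```

As with any kube2iam-managed role, its trust policy must also allow the worker nodes' role to assume it.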
At a high level, the setup looks like this:
Going into the details of each of those steps would make this blog post overly long. You can find the detailed steps and commands in https://github.com/johanneswuerbach/virtual-kubelet/blob/aws/providers/aws/README.md, while we work on merging our changes upstream (https://github.com/virtual-kubelet/virtual-kubelet/pull/127).
While the virtual-kubelet for AWS Fargate doesn’t support all Pod features yet (e.g. volumes, metrics), we considered this experiment a success and integrated this concept of serverless containers into our toolbox of architecture patterns.
Looking at Fargate's pricing, you pay roughly the same per hour as for self-managed nodes, provided you are able to run those nodes at full utilization:
When you factor in the overhead of managing the nodes themselves - OS and application patching, dealing with node failures and so on - Fargate comes out on top. Serverless ftw!
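The comparison turns on utilization: a node is paid for whether or not it is busy, so its effective price per used resource is its list price divided by its utilization, while a Fargate task is billed for the vCPU and memory it requests. The following Go sketch works through that arithmetic with made-up round numbers (a hypothetical 2 vCPU / 8 GB node at $0.20/hour, and Fargate-style rates of $0.05 per vCPU-hour and $0.0125 per GB-hour), not actual AWS prices; substitute current prices for your region.

```go
package main

import "fmt"

// fargateHourly is the hourly price of a task requesting vcpu and memGB.
func fargateHourly(vcpu, memGB, perVCPUHour, perGBHour float64) float64 {
	return vcpu*perVCPUHour + memGB*perGBHour
}

// effectiveNodeHourly spreads a node's price over the share actually used:
// at 50% utilization every used resource effectively costs twice as much.
func effectiveNodeHourly(nodeHourly, utilization float64) float64 {
	return nodeHourly / utilization
}

// breakEvenUtilization is the node utilization below which running the same
// work on Fargate is cheaper (a value >= 1 means Fargate always wins).
func breakEvenUtilization(nodeHourly, fargateEquivalentHourly float64) float64 {
	return nodeHourly / fargateEquivalentHourly
}

func main() {
	// Placeholder prices, not AWS list prices.
	const nodeHourly = 0.20
	f := fargateHourly(2, 8, 0.05, 0.0125)
	for _, u := range []float64{1.0, 0.75, 0.5} {
		fmt.Printf("utilization %.0f%%: node $%.3f/h vs Fargate $%.3f/h\n",
			u*100, effectiveNodeHourly(nodeHourly, u), f)
	}
	fmt.Printf("break-even utilization: %.0f%%\n", breakEvenUtilization(nodeHourly, f)*100)
}
```

With these placeholder numbers both options cost the same at full utilization, and at 50% utilization the node effectively costs twice as much per used resource, before any operational overhead is counted.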
Looking at our current stack, we are considering moving our Jenkins Kubernetes agents to Fargate, as they are generally highly utilized during their lifespan and benefit from fast start times.
In the medium term, we could also imagine using Fargate as a kind of overflow buffer that brings in capacity quickly during a request peak, and then either scales down as load decreases or is replaced with cheaper nodes if the peak is sustained.
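The overflow-buffer idea boils down to a small decision rule. The following Go sketch is a hypothetical policy for illustration only: demand and capacity are counted in the same abstract units (for example, Pods of a fixed size), and the number of minutes after which an overflow counts as "sustained" is a made-up parameter.

```go
package main

import "fmt"

// Action is what the hypothetical overflow policy recommends.
type Action string

const (
	Hold         Action = "hold"          // demand fits on the regular nodes
	BurstFargate Action = "burst-fargate" // place the overflow on the Fargate-backed node
	AddNodes     Action = "add-nodes"     // overflow is sustained: replace the buffer with cheaper nodes
	DrainFargate Action = "drain-fargate" // load has dropped: scale the buffer back down
)

// decide compares demand with node capacity (same abstract units) and how
// long the overflow has lasted, in minutes.
func decide(demand, nodeCapacity, fargatePods, overflowMinutes, sustainMinutes int) Action {
	overflow := demand - nodeCapacity
	switch {
	case overflow <= 0 && fargatePods > 0:
		return DrainFargate
	case overflow <= 0:
		return Hold
	case overflowMinutes >= sustainMinutes:
		return AddNodes
	default:
		return BurstFargate
	}
}

func main() {
	fmt.Println(decide(12, 10, 0, 0, 15))  // burst-fargate
	fmt.Println(decide(12, 10, 2, 20, 15)) // add-nodes
	fmt.Println(decide(8, 10, 2, 0, 15))   // drain-fargate
}
```

Once new nodes absorb a sustained peak, the overflow disappears and the same rule drains the remaining Fargate Pods, which is the "replace the buffer" behaviour described above.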
We would be interested to know your thoughts if you try this out, or if you have other insights or use cases for the approach described above.
For us, this proof-of-concept hack showed a lot of potential for providing faster and easier scaling while allowing us to keep our established workflows on top of one of the leading application deployment frameworks in the world today.