Last year we were really excited about the release of Personal Access Tokens, or PATs. In a nutshell, they are tokens bound to your Contentful user that you can use to perform actions such as calling the Content Management API with all your roles and permissions applied (if you want to know more about PATs, take a look at our knowledge base page).
Creating them is easy: a couple of clicks in Contentful's web app or an API call, and voilà, you have a fresh token to use right away. But because they're so easy to generate, we sometimes forget about them and leave them behind in our source code. Then we check that source code into source control, which is already poor security practice. It becomes a bigger problem when tools like GitHub make it really easy to search across thousands of public code repositories for them.
Leaking your Contentful PATs is dangerous and should be avoided at all costs. Remember that they have the same permissions as your Contentful user. This means that if you're a space admin and your token is leaked, anyone could use it to, for example, delete all your content. And things would only get worse if you were an organization admin.
It's important to think of these tokens as passwords. You would never write your password in a source code file, so why not treat your PATs the same way? As a rule of thumb, any time you are dealing with credentials it's better practice to read them from environment variables. This reduces the chances of accidentally leaking them.
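As a minimal sketch of that rule of thumb in Python: the variable name `CONTENTFUL_MANAGEMENT_TOKEN` and the helper `load_token` are hypothetical names chosen for illustration, not something Contentful prescribes.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
TOKEN_ENV_VAR = "CONTENTFUL_MANAGEMENT_TOKEN"


def load_token(env=os.environ):
    """Return the PAT from the environment, failing loudly if it is missing."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return token
```

The token then lives only in the shell or the deployment configuration, for example `export CONTENTFUL_MANAGEMENT_TOKEN=...`, and never in a file that gets committed.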
But we can go one step further and build some tooling to help us quickly identify PATs leaked in our organization. And that's what we did at our last hackathon! The idea was to have a cron job that runs every day and finds all the PATs leaked in GitHub repos belonging to our organization and its users. With the data gathered by the tool, we can then go in and at least revoke those tokens, even if we don't fix the code. What follows is a brief description of how we implemented it. You can also find all the code in its GitHub repo: https://github.com/madtrick/cfpat-audit
First of all, we need to write the script that queries GitHub for files with leaked tokens. Once we have all the offending files, we have to check whether they belong to users in our GitHub organization. You can use GitHub's organization members and code search APIs to do this. The repo includes an executable that you can run locally to find leaked tokens in your org.
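The two steps above can be sketched roughly as follows. This is not the code from the repo, just an illustration in Python using GitHub's REST endpoints `/search/code` and `/orgs/{org}/members`. The search term `CFPAT-` is an assumption about the prefix PATs start with, and all function names are hypothetical. Pagination, rate limiting, and any restrictions GitHub places on code search queries are left out.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"
SEARCH_TERM = "CFPAT-"  # assumption: the prefix Contentful PATs start with


def github_get(path, token, params=None):
    """GET a GitHub REST API path and return the decoded JSON body."""
    url = f"{API}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def org_member_logins(org, token):
    """First page (up to 100) of the organization's member logins."""
    members = github_get(f"/orgs/{org}/members", token, {"per_page": 100})
    return {member["login"] for member in members}


def search_code(token):
    """First page of code search hits for the token prefix."""
    params = {"q": SEARCH_TERM, "per_page": 100}
    return github_get("/search/code", token, params)["items"]


def leaks_for_org(items, org, member_logins):
    """Keep only hits whose repository is owned by the org or one of its members."""
    owners = {org.lower()} | {login.lower() for login in member_logins}
    return [
        {
            "repo": item["repository"]["full_name"],
            "path": item["path"],
            "url": item["html_url"],
        }
        for item in items
        if item["repository"]["owner"]["login"].lower() in owners
    ]
```

Keeping `leaks_for_org` free of network calls makes the filtering step easy to test with canned search results.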
OK, so we have a script that we can use to get the list of files that are leaking PATs. We want to run it regularly so we can react quickly to any incident and revoke the leaked tokens. But running it as a cron job means at least setting up a machine, deploying the code there, and then of course making sure that the machine is up and running 24/7.
That can seem like a lot of work for such a small script. So we decided to be like the cool kids and use serverless computing: run a small script on a regular basis without having to worry about all the infrastructure requirements. Since we use AWS at Contentful, the choice was clear: we were going to use AWS Lambda functions. Think of Lambda functions as event handlers that react to different triggers: API calls, CRUD operations on S3, ..., or scheduled events. Our Lambda function is simple and small.
It finds the leaked tokens for the org and then saves them in a file in S3. Additionally, though not described in this post, we set up an alert so we get notified each time a file is created in the bucket.
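A handler along those lines could look like the sketch below. It assumes a Python runtime and the `boto3` S3 client. The bucket name, the key layout, and the `find_leaks` placeholder are all hypothetical. Writing a file only when something was found is also an assumption, made so that the "file created" alert fires only on real leaks.

```python
import datetime
import json

BUCKET = "example-leaked-pat-reports"  # hypothetical bucket name


def build_report(leaks, now):
    """Return the S3 key and JSON body for one audit run."""
    key = f"reports/{now:%Y-%m-%d}.json"
    body = json.dumps({"generated_at": now.isoformat(), "leaks": leaks}, indent=2)
    return key, body


def find_leaks():
    """Placeholder for the GitHub search step; returns a list of leaked files."""
    return []


def handler(event, context):
    """Entry point invoked by the scheduled event."""
    leaks = find_leaks()
    if leaks:
        import boto3  # available in the AWS Lambda Python runtime

        now = datetime.datetime.now(datetime.timezone.utc)
        key, body = build_report(leaks, now)
        boto3.client("s3").put_object(
            Bucket=BUCKET, Key=key, Body=body.encode("utf-8")
        )
    return {"leaks": len(leaks)}
```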
Getting your code up and running on AWS Lambda requires some initial effort: uploading the code, setting up the right roles, configuring logging. That sounds like exactly the kind of work we were hoping to get rid of by using Lambda functions. Thankfully, there are frameworks like Serverless that abstract all of this away and help you a lot along the way. So, unsurprisingly, that's what we used.
This is the serverless.yml file which we used to deploy and set up the function in AWS.
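As a rough idea of what such a configuration involves, here is a minimal sketch of a serverless.yml for a daily scheduled function, not the actual file from the repo. The service name, runtime, handler path, and bucket name are all assumptions made for illustration.

```yaml
# Hypothetical service name
service: cfpat-audit-example

provider:
  name: aws
  runtime: python3.12  # assumption: matches a Python handler
  iam:
    role:
      statements:
        # Allow the function to write reports to the (hypothetical) bucket
        - Effect: Allow
          Action:
            - s3:PutObject
          Resource: arn:aws:s3:::example-leaked-pat-reports/*

functions:
  audit:
    handler: handler.handler  # file handler.py, function handler
    events:
      # Run once a day
      - schedule: rate(1 day)
```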
So the only thing left is to deploy it and wait for those tokens to come your way.
Writing this small script was fun and interesting. Lambda functions are great for dealing with event-based workflows, and frameworks like Serverless make them a breeze to use.