How to use serverless functions as cronjobs to keep your Personal Access Tokens secure

Last year we were really excited about the release of the Personal Access Tokens (PATs) feature. In a nutshell, these are tokens bound to your Contentful user which you can use to perform actions, such as calls to the Content Management API, with all your roles and permissions applied (if you want to know more about PATs, take a look at our knowledge base page).

Creating them is easy: a couple of clicks in Contentful's web app, or an API call, and voilà, you get yourself a fresh token to use right away. But perhaps because they are so easy to generate, we sometimes forget about them and leave them behind in our source code. Then we check that source code into source control, which is already bad security practice. The problem gets bigger still when tools like Github make it really easy to search for tokens across thousands of public code repositories.

Searching for tokens

Leaking your Contentful PATs is dangerous and should be avoided at all costs. Remember that a PAT has the same permissions as your Contentful user. This means that if you're a space admin and your token is leaked, anyone could use it to, for example, delete all your content. And things would only get worse if you were an organization admin.

It's important to treat these tokens like passwords. You would never write your password into a source code file, so treat your PATs the same way. As a rule of thumb, any time you are dealing with credentials it's best practice to access them through environment variables. This reduces the chances of accidentally leaking them.
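
In Node.js that could look like the following sketch (the CONTENTFUL_MANAGEMENT_TOKEN variable name is just an illustration; pick whatever fits your setup):

'use strict';

// Read the PAT from the environment instead of hard-coding it.
// CONTENTFUL_MANAGEMENT_TOKEN is a made-up name for this example.
const accessToken = process.env.CONTENTFUL_MANAGEMENT_TOKEN;

if (!accessToken) {
  throw new Error('CONTENTFUL_MANAGEMENT_TOKEN is not set');
}

// Pass accessToken to whatever client needs it; the token itself
// never ends up in your source code or in source control.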

But we can go one step further and build some tooling to help us quickly identify PATs leaked in our organization. And that's what we did during our last hackathon! The idea was to have a cronjob that runs every day and finds all the PATs leaked in Github repos belonging to our organization and its users. With the data gathered by the tool, we could then go and at least revoke those tokens until the offending code is fixed. What follows is a brief description of how we implemented it. You can also find all the code in its Github repo: https://github.com/madtrick/cfpat-audit

First of all, we need to write the script that queries Github for files with leaked tokens. Once we have all the offending files, we have to check whether they belong to users in our Github organization. You can use Github's organization members and code search APIs to do this (a sketch of those calls follows the example output below). Included in the repo is an executable that you can run locally to find leaked tokens in your org:

$ ORGANIZATION=ZZZ AUDIT_BUCKET=YYY GITHUB_ACCESS_TOKEN=XXX node bin/audit

[
  { file: '...', user: 'john-doe' },
  ...
]
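
To make the approach concrete, here's a minimal sketch of the two Github API calls such a script combines. It's not the actual implementation from the repo: the githubGet helper is invented for the example, and the search term assumes Contentful PATs keep their CFPAT- prefix.

'use strict';

const https = require('https');
const Bluebird = require('bluebird');

// Hypothetical helper: a promisified GET against Github's REST API v3.
function githubGet (path) {
  return new Promise((resolve, reject) => {
    https.get({
      host: 'api.github.com',
      path: path,
      headers: {
        'User-Agent': 'cfpat-audit-sketch',
        Authorization: `token ${process.env.GITHUB_ACCESS_TOKEN}`
      }
    }, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(JSON.parse(body)));
    }).on('error', reject);
  });
}

const findLeaks = Bluebird.coroutine(function * (organization) {
  // List the members of the organization...
  const members = yield githubGet(`/orgs/${organization}/members`);
  const leaks = [];

  for (const member of members) {
    // ...and search each member's public code for the Contentful PAT prefix.
    const query = encodeURIComponent(`CFPAT- user:${member.login}`);
    const result = yield githubGet(`/search/code?q=${query}`);

    result.items.forEach((item) => {
      leaks.push({ file: item.html_url, user: member.login });
    });
  }

  return leaks;
});

findLeaks(process.env.ORGANIZATION).then(console.log);

In practice you'd also want to handle pagination and Github's search rate limits, which the sketch leaves out.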

OK, so we have a script that we can use to get the list of files that are leaking PATs, and we want to run it regularly so that we can react quickly to any incident and revoke the leaked tokens. But running it as a cronjob means, at the very least, setting up a machine, deploying the code there and, of course, making sure that this machine is up and running 24/7.

That seems like a lot of work for such a small script. So we decided to be like the cool kids and use serverless computing: run a small script on a regular basis without having to worry about any of the infrastructure requirements. Since we use AWS at Contentful the choice was clear: we were going to use lambda functions. Think of lambda functions as event handlers that react to different triggers: API calls, CRUD operations on S3, …, or scheduled events. Our lambda function is simple and small:

'use strict';

const Bluebird = require('bluebird');

const findOrganizationCPATS = require('./lib/find-organization-cpats');
const storeAuditLog = require('./lib/aws/store-audit-log');

module.exports.run = (event, context, callback) => {
  Bluebird.coroutine(function * () {
    try {
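      // Find Contentful PATs leaked by members of our Github organization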
      const tokens = yield findOrganizationCPATS();
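      // Persist the findings to S3 as an audit log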
      yield storeAuditLog(JSON.stringify(tokens));

      callback(null);
    } catch (e) {
      callback(e);
    }
  })();
};

It finds the leaked tokens for the org and then saves them in a file in S3. Additionally, though not described in this post, we set up an alert so that we get notified each time a file is created in the bucket.
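
One possible way to set up such an alert (a sketch, not necessarily our exact setup) is an S3 event notification that publishes to an SNS topic whenever an object is created. The topic ARN below is hypothetical, and the topic's policy must allow S3 to publish to it:

aws s3api put-bucket-notification-configuration \
  --bucket "$AUDIT_BUCKET" \
  --notification-configuration '{
    "TopicConfigurations": [{
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:cfpat-audit-alerts",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'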

Getting your code up and running on AWS Lambda requires some initial effort: uploading the code, setting up the right roles, configuring logging, and so on. That sounds like exactly the kind of work we were trying to get rid of by using lambda functions in the first place. Thankfully there are frameworks like Serverless which abstract all of this away and help you a lot along the way. So, unsurprisingly, that's what we did.

This is the serverless.yml file which we used to deploy and set up the function in AWS:

service: cfpat-audit-v7

provider:
  name: aws
  runtime: nodejs6.10
  timeout: 300
  region: us-east-1
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:PutObject
      Resource:
        - arn:aws:s3:::${env:AUDIT_BUCKET}/*

package:
  include:
    - lib/**

functions:
  audit:
    handler: handler.run
    events:
      - schedule: rate(1 day)
    environment:
      GITHUB_ACCESS_TOKEN: ${env:GITHUB_ACCESS_TOKEN}
      GITHUB_ORGANIZATION_ID: ${env:GITHUB_ORGANIZATION_ID}
      AUDIT_BUCKET: ${env:AUDIT_BUCKET}

So the only thing left is to deploy it and wait for those tokens to come your way.

$ AUDIT_BUCKET=XXX GITHUB_ORGANIZATION_ID=YYY GITHUB_ACCESS_TOKEN=ZZZ $(npm bin)/serverless deploy --verbose
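
If you don't want to wait a day for the first scheduled run, you can also invoke the deployed function by hand with the Serverless CLI (using the function name from the serverless.yml above):

$ $(npm bin)/serverless invoke --function audit --log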

Writing this small script was fun and interesting. Lambda functions are great for dealing with event-based workflows, and pairing them with frameworks like Serverless makes them a breeze to use.
