Combining APIs using GraphQL schema stitching: Part 2

20181227 dda and graphql 01

You can stitch APIs together thanks to GraphQL. It’s an incredibly useful approach that let’s you have unified, transparent access to multiple GraphQL APIs. This can make it easier to access data that is split across multiple APIs without understanding where exactly it’s located. In part one, we looked at the concept and purpose behind combining APIs. In this sequel, we will delve into a working example of GraphQL schema stitching to appreciate how it all works.

Mashing up the two sources of data

Let’s dive into how we can build the stitching proxy that mashes up two data sources.

A remote schema

The first step is to link up to a remote schema. To create such a schema, we need to create an HTTP link which goes to GitHub’s GraphQL API. Authorization is also taken care of in this step by passing a GitHub token with the necessary access privileges we need for this project. The context manipulates every request before going out to that link and adds the necessary token into the header.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
const {ApolloServer} = require('apollo-server');
const {HttpLink} = require('apollo-link-http');
const {setContext} = require('apollo-link-context');
const {
 introspectSchema,
 makeRemoteExecutableSchema
} = require('graphql-tools');
const fetch = require('node-fetch');

const {GITHUB_TOKEN} = require('./config.json');

const gitHubLink = setContext((request) => ({
 headers: {
   'Authorization': `Bearer ${GITHUB_TOKEN}`,
 }
})).concat(new HttpLink({uri: 'https://api.github.com/graphql', fetch}));

async function startServer() {
 const gitHubRemoteSchema = await introspectSchema(gitHubLink);

 const gitHubSchema = makeRemoteExecutableSchema({
   schema: gitHubRemoteSchema,
   link: gitHubLink,
 });

 const server = new ApolloServer({ schema: gitHubSchema });

 return await server.listen();
}

startServer().then(({ url }) => {
 console.log(`🚀 Server ready at ${url}`);
});

We’ll execute an introspection query via that link which will download the schema from the API and make it available locally to manipulate. Following that we will make the schema executable, which in this case means that any call to a query or type in the schema is deferred to this link and passed along. Starting our server with a local proxy for the GitHub API takes care of authentication.

Two in one

We need 2 APIs to make a stitched GraphQL one. So in our second step, we’re doing the same thing above but with Contentful. We’re getting the executable schemas for both these APIs and merging them together in its default configuration. GraphQL Tools simply combines everything into one big schema, but intelligently remembers which object belongs to which API and delegates them correctly.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
'use strict';

const {ApolloServer} = require('apollo-server');
const {HttpLink} = require('apollo-link-http');
const {setContext} = require('apollo-link-context');
const {
 introspectSchema,
 makeRemoteExecutableSchema,
 mergeSchemas
} = require('graphql-tools');
const fetch = require('node-fetch');

const {GITHUB_TOKEN, CONTENTFUL_SPACE, CONTENTFUL_TOKEN} = require('./config.json');

const gitHubLink = setContext((request) => ({
 headers: {
   'Authorization': `Bearer ${GITHUB_TOKEN}`,
 }
})).concat(new HttpLink({uri: 'https://api.github.com/graphql', fetch}));

const contentfulLink = setContext((request) => ({
 headers: {
   'Authorization': `Bearer ${CONTENTFUL_TOKEN}`,
 }
})).concat(new HttpLink({uri: `https://graphql.contentful.com/content/v1/spaces/${CONTENTFUL_SPACE}`, fetch}));

async function startServer() {
 const gitHubRemoteSchema = await introspectSchema(gitHubLink);

 const gitHubSchema = makeRemoteExecutableSchema({
   schema: gitHubRemoteSchema,
   link: gitHubLink,
 });

 const contentfulRemoteSchema = await introspectSchema(contentfulLink);

 const contentfulSchema = makeRemoteExecutableSchema({
   schema: contentfulRemoteSchema,
   link: contentfulLink,
 });

 const schema = mergeSchemas({
   schemas: [
     gitHubSchema,
     contentfulSchema
   ]
 });

 const server = new ApolloServer({schema});

 return await server.listen();
}

startServer().then(({ url }) => {
 console.log(`🚀 Server ready at ${url}`);
});

At this point, things exist in each schema (a GitHub schema with data from GitHub and a Contentful equivalent with data from Contentful) without any references between them.

Conflicting schemas

The problem here is that these schemas conflict since they both have the same type with the name “Repository”. Every type and query in a GraphQL schema has to be unique. With this setup, we will only be able to query one of those APIs instead of both.

Schema transformation

The GraphQL specifications include the definition of an Abstract Syntax Tree (AST), which can be used to manipulate the schema programmatically.

There’s more code involved with the Contentful schema. We set it up like we did previously before passing it to transformSchema. This provides an array of transformations to schema that turn it into a format that works for my use case.

In this case, we rename all types that contain the word repository and replace them with repositoryMetadata. We also take care of the top-level queries called root fields. Both APIs have root fields called repository. We’re also going to go ahead and rename those to repositoryMetadata.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
async function getContentfulSchema() {
 const contentfulRemoteSchema = await introspectSchema(contentfulLink);

 const contentfulSchema = makeRemoteExecutableSchema({
   schema: contentfulRemoteSchema,
   link: contentfulLink,
 });

 return transformSchema(contentfulSchema, [
   new RenameTypes((name) => {
     if (name.includes('Repository')) {
       return name.replace('Repository', 'RepositoryMetadata');
     }

     return name;
   }),
   new RenameRootFields((operation, fieldName) => {
     if (fieldName.includes('repository')) {
       return fieldName.replace('repository', 'repositoryMetadata');
     }

     return fieldName;
   })
 ]);
}

With the above, we still have the exact same schema from GitHub, but we’ve modified the Contentful schema to rename every type and query with the word repository to become repositoryMetadata. GraphQL tools is sufficiently advanced to seamlessly pass the data on to the right API, despite the type repositoryMetadata not existing within Contentful’s GraphQL API.

Stitching two schemas together

Time to get stitching: We have the two schemas living in the same API and we can query against them without knowing what the URL is. However, the data isn’t linked and we still need to stitch it up.

To do that, we need to write some new type definitions — extending types of existing schemas (or the one single schema we have now) to build the links between objects. For that, we can use the extend keyword that allows you to manipulate existing types. In this case, we add a field contentfulMetadata to GitHub’s repository type and extend Contentful’s repositoryMetadata type with a field named gitHubRepository that contains the other object.

We also need to write a resolver to resolve the fields for those objects. Let’s first take the Contentful repository metadata object. We defined it such that the field gitHubRepository unwrapped object will be resolved by this function. The first item we’re adding is a fragment, which is a small piece of reusable GraphQL query, that should be included whenever we query for the gitHubRepository field. In this case, the fragment queries for the GitHub name and login, which is necessary because these will be used to ask the GitHub API.

To request for this data, we use mergeInfo.delegateToSchema to instruct the server that the information sought by the user lives in another schema that we’ve delegated to it (by having queried the repository query with the GitHub name and login of the relevant repository, retrieved by the earlier fragment).

All of this is straightforward but what if the APIs don’t fit so neatly together, the data is not directly available in the format we want, or not a clear query to request for the data we need. This is the exact problem when querying Contentful for data — Contentful has a quirk where we can only ask for a collection of objects, not an individual object by one of its fields.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Repository: {
  contentfulMetadata: {
    fragment: '... on Repository { name owner { login } }',
    resolve(repository, args, context, info) {
      return info.mergeInfo.delegateToSchema({
        schema: transformedContentfulSchema,
        operation: 'query',
        fieldName: 'repositoryMetadataCollection',
        args: {
          where: {
            gitHubLogin: repository.owner.login,
            gitHubName: repository.name
          },
          limit: 1
        },
        context,
        info,
        transforms: [
          new WrapQuery(
            ['repositoryMetadataCollection'],
            (subtree) => ({
                kind: Kind.FIELD,
                name: {
                  kind: Kind.NAME,
                  value: 'items'
                },
                selectionSet: subtree,
            }),
            result => result && result.items && result.items[0]
          )
        ]
      });
    }
  }
}

So let’s define a fragment which states that we need a name and the owner’s log in (which is the organization name) whenever we query for Contentful metadata. Again, we are delegating a schema using a query but as you can see, we are querying for a collection instead of an individual object — this is a problem because our email says Contentful metadata will be an individual object, not a collection of them.

When we delegate to a schema, we can add a new transformation to the query and result that we get from that. This is what we’ve done and wrapQuery allows me to modify the query for repository metadata connection to add a new level in between, so instead of query directly for the data, we are querying for the items, then data. By doing this, we would also be modifying results to simply return the first entry in the items array.

Schemas stitched

Schemas stitched

Now that we have my schemas stitched with the references between the two, there’s more we can do with these tools. We can also infer new data — one of my favorite API endpoints in GitHub’s API is the "read a file from my GitHub repository and return it to the API endpoint". To GraphQL, it’s just another field to query for. But what if we don’t just want to read this data and manipulate it in our API — what if we could take information contained in these files and expose them as fields in my modified GitHub API.

For this use case, we need to know if this repository has a TravisCI YAML file because if it does, we would be able to query TravisCI for the basis of this project. So we’ll expand the type repository that comes from the GitHub GraphQL API, add a new field hasTravisCi (it will be a boolean and will never be null).

To write a resolver, we first need to have the fragment. We query GitHub object in the master branch for the file master:.travis.yml which we are supposed to treat like a text block (a UTF-8 encoded string). Then, we take that file and check if it exists.

Conclusion

The main takeaway here is to stop treating your APIs as silos — there are relationships between them. When you start making use of those relationships programmatically, you can seamlessly access data across these APIs to make the life of your developers easier.

To rehash, here are key improvements you stand to benefit from when working with GraphQL and schema stitching:

  • When you query for data, you simply get the results in JSON
  • While working with data from one API (such as the Repository object from GitHub in the example above), you can also ask for information that another API (Contentful in this instance) holds (such as which team maintains the repository)
  • It may take a little longer since it has to go through two APIs, but you will get all the information (which team and department owns the repo) seamlessly in the same response as the information from GitHub
  • You can query if this repository has unit tests on TravisCI

Plus you get all the usual benefits of a GraphQL API: - An autocomplete feature lets you easily figure out names when you ask for fields on an object (e.g., the URL or SSH on a URL), - The ability to browse the schema to see all possible queries available on the stitched schema if you have a complicated request for the AP

Watch the recording of my GitHub Universe talk about GraphQL schema stitching to see this example in action.

Updates in your inbox

Subscribe to receive the most important updates. We send eNewsletters once a month that cover trending topics and feature updates.