How ASTs power the GraphQL schema handling

At Contentful, we’re currently working hard on our new GraphQL API, which is already available in alpha phase 🎉. When you read about GraphQL today, you’ll quickly discover the main strong points it offers that the average GraphQL API user can take advantage of:

  • You can query what you want and save requests!
  • You’re dealing with a strongly-typed schema!
  • It offers a rich, evolving ecosystem for you to enjoy!

Implementing a GraphQL API is a different story though. For implementers, you’ll most likely come across the following advice:

"Map your database schema 1-to-1 to your types."

On our side, however, it’s not as easy as that since our content infrastructure lets the users define the structure of their content freely. This means we could be serving a particular user a very flat data entry structure while delivering complete content trees reaching several levels deep to another user. This flexibility means we deal with data structures of all kinds, making support for GraphQL trickier since we now have to create GraphQL schemas on the fly and deal with domain objects based on abstract syntax trees instead of simply mapping a database schema to GraphQL. If this sounds complicated, don't worry – this article will cover everything in detail.

Author’s Note: This article is based on a meetup talk I gave; a recording of the talk is linked at the end of this article.

The GraphQL abstract syntax tree - Dawn of a Schema

The foundation of any GraphQL API is a so-called abstract syntax tree which is heavily used server side to deal with schema definitions and parsing of the actual GraphQL query.

But what is an abstract syntax tree?

For me, the word abstract syntax tree (AST) is just a fancy way of describing deeply nested objects that hold all the information about about some source code—or in our case, GraphQL queries.

For example, let’s take Babel, a very popular JavaScript compiler that lets you write JavaScript that is not yet widely supported and convert it into older syntax. Babel transforms all the source code you throw at it into an abstract syntax tree, and then executes transformations on this tree. Afterwards, the updated and transformed tree is used to generate source code that not only works in the latest and greatest browsers but also browsers that haven’t seen updates in a while.

What’s included in the abstract syntax tree?

A great tool to inspect abstract syntax trees is AST Explorer. The site lets you quickly paste code from JavaScript to PHP to TypeScript and even GraphQL queries into the UI and then provides the resulting abstract syntax tree.

When we look at the following GraphQL query…

1
2
3
4
5
6
7
{
  course(id: "1toEOumnkEksWakieoeC6M") {
    fields {
      title
    }
  }
}

…the resulting abstract syntax tree (don’t worry too much about it) looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
{
  "kind": "Document",
  "definitions": [
    {
      "kind": "OperationDefinition",
      "operation": "query",
      "name": null,
      "variableDefinitions": null,
      "directives": [],
      "selectionSet": {
        "kind": "SelectionSet",
        "selections": [
          {
            "kind": "Field",
            "alias": null,
            "name": {
              "kind": "Name",
              "value": "course",
              "loc": {
                "start": 4,
                "end": 10
              }
            },
            "arguments": [
              {
                "kind": "Argument",
                "name": {
                  "kind": "Name",
                  "value": "id",
                  "loc": {
                    "start": 11,
                    "end": 13
                  }
                },
                "value": {
                  "kind": "StringValue",
                  "value": "1toEOumnkEksWakieoeC6M",
                  "loc": {
                    "start": 15,
                    "end": 39
                  }
                },
                "loc": {
                  "start": 11,
                  "end": 39
                }
              }
            ],
            "directives": []
            ...
            ...
            ...
          }
        ],
        "loc": {
          "start": 0,
          "end": 79
        }
      },
      "loc": {
        "start": 0,
        "end": 79
      }
    }
  ],
  "loc": {
    "start": 0,
    "end": 79
  }
}

The AST includes a lot of metadata, such as location in the source, or identifiers, such as argument names; and thanks to this deeply-nested JSON object, we now have all the power we need to work with GraphQL schemas and queries. All that meta information comes in handy when developing your own GraphQL server; for instance, from that, we can tell you the line of your query that’s causing problems easily.

For the schema, these POJOs (Plain Old JSON Objects) are usually translated into so-called domain objects. They encapsulate the information contained in the AST, but are enriched with methods and are proper instances of the GraphQL base types. For example, every type that has fields to select from will be created as a GraphQLObjectType instance. Now you can define a function on it how data should be fetched.

Lets say your API gives you location data in cartesian and geographic values as "location". For your GraphQL Location type you always want to serve back geographic coordinates, so you define a makeLocationFieldResolver like the following:

1
2
3
4
5
6
7
8
9
10
11
const resolverRoot = {
  cartesian: {},
  geographic: {
    latitude: 52.501817,
    longitude: 13.411247
  }
}

function makeLocationFieldResolver (field) {
  return (root) => root.geographic[field]
}

If our type definitions are available in the System Definition Language (SDL) format, we can construct the AST from it and assign resolvers to fields by using a nested object that has functions as its leaf-most values:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// graphql-tools

const typeDefs = `
  type Location {
    lat: Float!
    lon: Float!
  }
`

const resolvers = {
  Location: {
    lat: makeLocationFieldResolver('latitude'),
    lon: makeLocationFieldResolver('longitude')
  }
}

const executableSchema = makeExecutableSchema({
  typeDefs,
  resolvers,
});

Of course it has to be a little different at Contentful, given that we don’t have a System Definition Language (SDL) at hand that we can parse. So what we do is simply creating those domain objects "by hand", based on the content model we obtain from the database.

1
2
3
4
5
6
7
8
9
// graphql-js

const locationType = new GraphQLObjectType({
  name: 'Location',
  fields: {
    lat: { type: GraphQLFloat, resolve: makeLocationFieldResolver('latitude') },
    lon: { type: GraphQLFloat, resolve: makeLocationFieldResolver('longitude') }
  }
})

"What about the line numbers for my errors? 😱" I hear you asking. Luckily, we only need to do that for the schema generation - we can fully leverage the usual GraphQL flow for query documents that you send us, from the string you send us down all the way to the response JSON.

The two sides of GraphQL – type system definition language and query document

To make GraphQL work, there are two main parts you have to focus on:

  • Server implementation of the GraphQL API endpoint has to provide a schema in a so-called type system definition language which defines what data is available at this endpoint.
  • On the client-side, a developer can then make requests that include a query document defining what data should be contained in the response.

SDL - the type system definition language

One of the strengths of GraphQL is that it’s based on strongly-typed schema definitions. These type definitions define how the data should look like and what queries are actually allowed with your GraphQL API. A type definition looks as follows:

1
2
3
4
type AssetFile {
  contentType: String
  fileName: String
}

The definition above defines that the type AssetFile has exactly two fields (contentType and fileName), with both being type String. The cool thing about that definition is now that we can use it inside of other type definitions.

1
2
3
type Person {
  image: AssetFile
}

The SDL makes it possible to define a complete data set:

  • What’s included in an entry?
  • How does entries relate to each other?
  • What can be accessed and where?

When you use tools like GraphiQL, an in-browser IDE to explore GraphQL endpoints, you may have noticed that you can easily discover the data available at the API endpoint by opening the docs section. The docs section includes all information based on the schema that was written in the SDL you defined.

Using a tool like GraphiQL, an in-browser IDE to explore GraphQL endpoints

Sidenote: The folks from Prisma also built a tool called GraphQL Playground which sits on top of GraphiQL adding a few extra features and a "more up to date" UI

The way these GraphQL tools work is that they send one initial request on startup – a so-called IntrospectionQuery, which is a standard GraphQL request that uses POST and includes a GraphQL query in the request payload. The requests performed by a GraphQL user can differ based on the use of different query types.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
query IntrospectionQuery {
  __schema {
    queryType { name }
    mutationType { name }
    subscriptionType { name }
    types {
      ...FullType
    }
    directives {
      name
      description
      locations
      args {
        ...InputValue
      }
    }
  }
}

The response to this introspection query provides all the schema information that is needed to provide API documentation, make auto-completion possible, and give the client-side developer all the guidance to happily query whatever data she is interested in.

The client side of things – the query document

Now that we have defined the available data schema, what’s missing is the GraphQL request that includes a query document. The query document is the actual GraphQL query that you already saw at the beginning of this article.

1
2
3
4
5
6
7
{
  course(id: "1toEOumnkEksWakieoeC6M") {
    fields {
      title
    }
  }
}

The query document is basically a string value that’s included in the payload hitting our GraphQL endpoint. The tools GraphiQL and GraphQL Playground will help in writing your first queries easily.

The combination of the query document and the SDL

So why are ASTs so important for GraphQL?

When a request hits our GraphQL endpoint, the schema written in SDL and the query document included in the request payload will be read and transformed into ASTs. If parsing succeeds, we can be sure that both the query and schema are valid; otherwise, we can display errors detailing where something is syntactically incorrect.

Then we visit each field name in the query document to check if a corresponding type definition is present in the schema and if they’re compatible—do they have the same amount of arguments and are these of the same types?

If these validations pass, we can proceed to respond to the request by resolving the resources requested in the query. Resolvers are a topic we won’t cover in this article but in case you’re interested, you can read Prisma’s introduction "GraphQL Server Basics: GraphQL Schemas, TypeDefs & Resolvers Explained"—it’s an excellent read!

Interaction of query documents and the SDL

Easy language processing thanks to abstract syntax trees

GraphQL’s power lies in its schema and type definitions which move API development to a completely new level. Thanks to the rich ecosystem, the tools and the concept of abstract syntax trees, it’s good fun to develop our new GraphQL endpoint at Contentful.

Moreover, it’s not only about developer experience but rather about a whole set of new possibilities. With ASTs, you can easily transform the resulting schema definition—this is, for example, what makes schema stitching easily possible.

Schema stitching is the idea that you can take two or more GraphQL schemas, and merge them into a single endpoint that can pull data from all of them.

Think about that for a moment—with GraphQL, we can very easily combine several APIs into a single powerful one. Combine this with the power of serverless technologies and API development as you currently know it will be something of the past. Be ready! ;)

Learn more about getting started with GraphQL and Contentful. Start by creating a free Contentful account, if you don't already have one, and find out how effortless our content infrastructure works with your code and static site projects.

Sidenote: Nikolas Burg also gave an excellent presentation on how to do schema stitching using Contentful and Prisma at our previous Contentful meetup in Berlin. It’s worth watching!

Recording of the talk

If reading is not your jam, I also spoke about this exact topic at one of our Contentful User Meetups. Check it out here:

Blog posts in your inbox

Subscribe to receive most important updates. We send emails once a month.