GraphQL dynamic schema generation for changing data models

Published on December 21, 2018

20181219 BLOG Dynamic Schema Generation-01

Written by

Stephan Schneider

JavaScript Backend Developer

Inspiration for your inbox

Subscribe and stay up-to-date on best practices for delivering modern digital experiences.

Every Contentful user has their own content model, which consists of varying content types. With no two users’ content models being the same, each user’s data model is used to generate a GraphQL schema that is tailored to them.

This means we have a slightly different set of challenges compared to the average user when they use GraphQL or build GraphQL APIs. Dynamic schema generation works with the GraphQL API to generate GraphQL schemas that address different, changing models.

In this post, you will learn more about this fresh approach and how it works to your benefit.

Reflecting on how GraphQL APIs are built

I came about this topic when we attended the GraphQL EU conference where Leanne Shapton from Shopify talked about how to tackle API design. Subsequently, I stumbled across Kyle from GitHub who spoke on internal changes at GitHub which account for their GraphQL API while building it.

While I was listening to these talks, I thought they were wonderful but their approaches were very different from how we work in Contentful.

Shopify is a good example for a traditional GraphQL API. They have a lot of customers but their data model is shared across all of them. This means they are able to give a lot of love and detail into schema generation, down to aspects such as how to group properties, how object types are linked to one another, and determining cases when these relations need pagination. They can analyze the potential size of the relations and based on this treat them as an array or collection. Leanne went into great detail about how they were thinking about each and every property, and how to name them and make them user friendly.

This is a typical example of how a company thinks about and designs their schema. But for organizations who are unable to give a lot of time into the details of their schema, there are many helpers in the GraphQL ecosystem.The easiest would be writing the schema in (the so-called) Schema Definition Language (SDL). For existing databases, a reflection over their structure is also possible, enabling a headstart into GraphQL land.

At Contentful, because our customers define the shape of their content, we’re not designing one API but thousands. As a result, the ecosystem out there isn’t particularly accommodating to our use case as most tooling is for single static schemas. We have to get your data model that you’ve designed and make the best out of it to generate a GraphQL schema that you can work with.

Deviation from the usual REST approach and a big shift in responsibility

In the past, we've defined how the API looks, documenting "sys" fields and collections. With only the data in "fields" being different for each customer, it looked the same for everyone. Now, this is different. We have a statically-typed setup that requires generating types upfront for everything in your content model. Documenting this lead to the team creating an example schema for use in our tutorials and documentation. It shows examples of how the API is derived and used.

The other option is letting the user describe the schema they want in SDL. Prisma is a good example of this practice, but this route comes with its own set of problems. It requires double the effort when changing the shape of your data model. To carry out the change, you would have to update both the content type in Contentful and the SDL. This tends to result in discrepancies –– especially in the time between content type change until the SDL is updated –– and difficulty in understanding user intention. A difficult task with the official SDL being available. Without creating new syntax on top of the SDL, it wouldn't have been possible. The SDL would also be specific to your SaaS, such as Contentful. The result? A vendor lock-in and a steeper learning curve. We quickly disqualified this as our first choice.

That said, it might still be an option later to perform fine-tuning of the schema to your needs. We're presently focusing on getting the best out of it without requiring you to manually write anything.

Design decisions and schema

In order to get the best developer experience out of the schema, we thought it would be useful to discuss the design decisions we made throughout the process.

1. Greenfield view layer

The first decision was that the new API should be a greenfield view layer (not to be confused with a greenfield app). The key differences that exist and that we strived to achieve include:

No need for changes to the models in a best-case scenario, because nobody really wants to change all the existing data just to be able to use a new API.
Working on the same dataset what we already have and same internal data query adapter. Internally it works much like how things are with REST and the Management API, just the final output looks slightly different and outside of GraphQL specifications.
We can break expectations where necessary and fix them since you can’t expect everything to work the same way as you would with REST. GraphQL API needs to be learnt and users will have to read documentation to discover how items are derived. This allowed us to drop maintaining backward-compatibility (though other assumptions you have with Contentful are still upheld).

2. Using Content Type IDs for type names

This sounds obvious at the beginning but we actually started with content type names. For the following example, we have name that looks much like sys id. Type names in GraphQL are generally in Pascal case (the first letter is capitalized, the rest is classic camelCase)

In this example, it’s obvious the type name should be Restaurant. But how do we deal with the following example?

There’s a different name and ID here, where the name can be changed by editors but content type ID cannot. This is a bonus for the name since we used to have automatically-generated content type IDs (and still do if you use the POST endpoint) which can be invalid if they were to ever start with a number.

We analyzed how many content type IDs and names would fail to generate, and discovered fewer content type names would fail. This led us to go with that at the beginning, but after customer research, the feedback was that customers did not prefer this option since they did not want to risk their schema breaking in the event that an editor decided to rename a content type name.

Since we were in alpha and could perform breaking changes, we decided to go with content type ID and now, we even support the auto-generated IDs that were invalid by prefixing it with ContentType. It looks weird when you query it, but at least the schema generation doesn’t fail and you can still use the GraphQL API. (pro tip: GraphQL allows field aliases, so you can rename a contentType54xct2kls4ksjf8sks4(id: “hello-world”) to blogPost in the response)

So the answer to the question above is to go with Category, and not FoodCategoryCuisine.

3. Content Type validations are reused to derive links between types

Whenever you link single or multiple entries, or assets, you can define what content types they should be so you can limit editors and prevent them from linking wrong content to the fields. We decided to piggyback that by reusing the information given by the user to limit the types that are valid for these reference fields.

Here’s our restaurant example again which implements the Entry interface and links to categories (which in this case, would be RestaurantCategoriesCollection).

When you have a multi-reference link, we perform the same collection-style linking that is done in REST or entries endpoint. That’s an advantage. Where in REST, you could only resolve into all or none of linked entities using the include parameter, here you could specify to be given, for example, the first 10 linked categories.

The items here are a very generic type Entry because no link content type is defined.
Entry is the interface every single GraphQL type generated from a content type implements. For the example above, this means an entry from any content type in your space might be linked.

The implication when using it in GraphQL is that you need to explicitly tell the type system what content type you’re interested in. That makes the query cumbersome because we have to use … on Category every time we want to use that field.

What we do want is categories with links defined. Now the GraphQL schema knows that it only can be of a type Category, allowing you to directly use the fields of the type without giving a hint which type we’re interested in.

Note how the items type did change. Even those just exploring the schema will be able to get a better idea of how your data is actually structured.

4. The generated type namespace is user-owned

In our REST API everything is in sys and fields, where sys is populated by Contentful and everything in fields was user-owed, so you could use any name you want.

We decided the sys would still be scoped so there would be a namespace where we can put our Contentful-specific information such as version and update date. But we also decided to move fields out of the fields namespace to remove an indirection. This helps prevent overwhelming long path identifiers (such as restaurant.field.categories.fields.name).

Instead, the generated GraphQL type namespace should be user-owned. We surely still need to prevent a few names like sys, but as field IDs could be changed, we did not worry about breaking things for the user here as they could account for that and adjust accordingly.
Of course, we initially performed a check to avoid frequently-used names to prevent naming clashes wherever possible. (This is in contrast to immutable content type ID. They cannot be changed so we had to figure out content type prefixes to enable everybody using GraphQL)

5. Useful error messages

But of course, when something does go wrong (like a naming clash above), we don’t want to give you vague errors such as “your schema could not be generated”. We also put a lot of love into our error messages, which can look like this:

In this case, it’s easy to diagnose the issue: schema generation failed because it contained a field called sys, a reserved name.

The GraphQL specification allows us to populate the extensions field to provide more information about the error that has occurred. We also scope that to Contentful’s namespace, so when you later use schema stitching to combine multiple GraphQL APIs together, for example, no fields get overwritten when two different parts of your API are giving errors.

In this contentful extension namespace we have a code that is unique and immutable. This is a lesson from the REST API where we couldn’t change some error messages because clients might be relying on them. For the GraphQL API, we explicitly defined that the message could be changed at any time to make it more informative to the user and for machine readability. We have the extensions field where neither the code nor details won’t change. There’s also a request ID, so if you ever reach out to customer support, you just have to present it and our developers can figure out what’s going on.

Error messages will look something such as what we have above, but of course the details will always be dependent on what kind of error you’re running into but that is also done in a machine-readable way.

6. Latest design decision

Early alpha users will know that before this change, we had the locale in the query string. That was a very simple design decision for us because we didn’t want the user to have to specify locale on every single field or reference. We’ve put that into the query string and were good to go.

This bugged our product owner because they wanted to keep everything according to the GraphQL specifications which states queries should be transport-independent. If we were to rely on the query string, that would no longer be the case.

What we came up with was to use the context. To explain, whenever you’re in a GraphQL resolver, you have parent type you’ve gotten in, some argument and an object called context defined by yourself. What we came up with was a cool way to cascade the values via the context, meaning we’re finally allowed to specify locale as the argument.

Querying with a locale could look something like this:

Let’s say you want a restaurant in English, you’ll get the description in English — if it wasn’t cascading, you would need to specify the locale on the categories field again. If something doesn’t change, you wouldn’t want to repeat it over and over.

Through the cascades and context, we can walk up the path and find the closest locale definition of a parent. As a great side effect, this means you’re now able to switch between languages (which previously wasn’t possible). A difference compared to REST is that you cannot use the asterisk to get all locales at the same time.

This isn’t possible in GraphQL because we generate the type beforehand and cannot change it later based on the locale.

Let’s say you have a localized webpage but the restaurant is a British one so you’d probably know you won’t have a description in German. So you’d like to show the description in English, but to make users happy, the categories should be in the locales of the user currently visiting the page so that you are able to get the localized names for the categories. This allows the user to filter more naturally, hence improving the user experience.

The description would be in English, the category name would be in whatever locale we pass in and everything below that stays in the same.

Once built, we recognized the same system can be reused to switch between published and preview content.

Let’s say you’re reworking how the categories are named, but someone on your team is currently working on the restaurant data, and this might break things on your frontend while reworked since we only enforce validations once you publish. So when you’re working on categories , you might want to review how the restaurant in its currently-published shape looks with the new categories.

So you ask for the restaurant description in the published form, together with a preview of how the categories look like.

There are two implications:

The entire query won’t be cached (much like how the Preview API works so changes you make are reflected immediately)
You need a preview access token. The preview token allows you to also access published data.

Don’t delay, try it today

So where can you try all of this out? If you head over to our GraphQL Content API documentation, you can read through it and get started.

If you add /content/v1/spaces/$SPACE_ID, that’s where the GraphQL endpoint is — this namespace allows us to bring up other of our organizations into GraphQL without having one massive API from the start.

We’ve also gone with explicit version naming here because we have versioning in the content type header. We didn’t enforce its use with REST and this made it hard for REST to version it — we’ve done it here so you won’t accidentally mix versions up and break your data.

If you append /explore with an access token to the above, then a graphical user interface, GraphiQL, gets spun up. You can check out how things work there, write example queries, and experiment how your very own GraphQL schema looks like.

While we’ve mentioned we don’t want to rely on the query string, there’s an exception here since you cannot send the Authorization header with your access token in the browser, so you’ll have to append the access token into the query string like so:

.../explore?access_token=$ACCESS_TOKEN