By Jakub Elzbieciak, on May 26, 2020

Building a serverless, AI image tagging app for Contentful

Person working with Contentful's AI image tagging app framework

When we launched the Contentful App Framework in February, you probably saw the various apps we made available in our marketplace, including Netlify, Optimizely and Jira. They are ready-to-use add-ons for your Contentful spaces that integrate the content editing experience with third-party tools.

But the main goal of our work on the App Framework is to create an open platform. We believe that we, as engineers at Contentful, shouldn’t be privileged over developers working for our customers. An external developer should be able to build any app that was part of the release using only public APIs, libraries and SDKs.

In this article, we’ll dive into the architecture and deployment of the AI Image Tagging app. After reading, you’ll be able to create your own version of the app or use it as the basis for a similar use case.

AI Image Tagging App

The AI Image Tagging app uses AI to automatically assign tags to images. The tags are visible and accessible through the Contentful web app entry editor, and tags are searchable in the search bar.

The app packages:

  • An API that accepts images and returns tags

  • An installation screen that creates a content type used for storing image metadata

  • An editorial widget that calls the API and updates image metadata with tags

To make it happen, we use AWS Lambda and API Gateway to serve HTTP endpoints. UIs are built with an SDK and React components from our design system, Forma 36.

Now, on to how we built it. This article will give you an architectural overview of the app. The full code is available on GitHub in our open-source repository.

API-first

At Contentful, we’re big believers in the “API-first” approach. It means that we build features first as API features and everything else follows: user interfaces, CLIs and SDKs.

We need an API that, given an image stored in Contentful, returns a list of labels describing the content of the image. Of course, the hardest part is performing the detection, but thankfully we don’t need to implement it on our own. Instead, we can use AWS Rekognition, an AI-backed image recognition service.

The DetectLabels API method requires a base64-encoded binary of the image. We could transfer the whole image to our endpoint, but it would make calling the endpoint harder, especially from a browser.

To be smarter, we can inspect the payload of the Contentful Asset entity:

As you can see, the file.url property of the asset contains a URL. Its general format is:
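As a rough sketch, assuming the URL follows the shape `//images.ctfassets.net/{space_id}/{asset_id}/{unique_id}/{file_name}`, the four identifiers can be pulled out with a small helper. The name `parseAssetUrl` and the returned property names are hypothetical and not part of any Contentful SDK.

```javascript
// Assumed general shape of an image asset URL (protocol-relative):
//   //images.ctfassets.net/{spaceId}/{assetId}/{uniqueId}/{fileName}
// parseAssetUrl is a hypothetical helper used only for illustration.
function parseAssetUrl(fileUrl) {
  // Asset URLs come without a protocol, so add one before parsing.
  const absolute = fileUrl.startsWith('//') ? `https:${fileUrl}` : fileUrl;
  const { hostname, pathname } = new URL(absolute);

  if (hostname !== 'images.ctfassets.net') {
    throw new Error(`Not an Images API URL: ${fileUrl}`);
  }

  // Drop the empty segment produced by the leading slash.
  const segments = pathname.split('/').filter(Boolean);
  if (segments.length !== 4) {
    throw new Error(`Unexpected asset path: ${pathname}`);
  }

  const [spaceId, assetId, uniqueId, fileName] = segments;
  return { spaceId, assetId, uniqueId, fileName: decodeURIComponent(fileName) };
}
```

A client could run this on `asset.fields.file[locale].url` and send only the resulting identifiers to the tagging endpoint.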

So instead of sending the whole image to our API, we could send only the four identifiers listed above and fetch the image directly from the Images API in our service. The general flow looks like this:

Flowchart describing how images get tagged by the AI image tagging API

This kind of API can be implemented using Express, node-fetch and the AWS SDK for Node.js. We managed to put it together in a couple of hours (see code) and can now call it:
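The flow described above can be sketched as a single Express-compatible handler. This is a minimal illustration under stated assumptions, not the code from the repository: `createTagsHandler`, the route shape and the response format are hypothetical, and the two dependencies are injected so the logic stays independent of node-fetch and the AWS SDK.

```javascript
// Hypothetical factory: both dependencies are injected so the flow is easy to
// follow. In a real service, fetchImage would wrap node-fetch and detectLabels
// would wrap the Rekognition DetectLabels call of the AWS SDK.
function createTagsHandler({ fetchImage, detectLabels }) {
  // Express-compatible handler for a route such as
  // GET /tags/:spaceId/:assetId/:uniqueId/:fileName (the path is an assumption).
  return async function tagsHandler(req, res) {
    const { spaceId, assetId, uniqueId, fileName } = req.params;
    const imageUrl =
      'https://images.ctfassets.net/' +
      [spaceId, assetId, uniqueId, fileName].map(encodeURIComponent).join('/');

    try {
      // Fetch the binary from the Images API, then ask for labels.
      const imageBytes = await fetchImage(imageUrl);
      const labels = await detectLabels(imageBytes); // [{ Name, Confidence }, ...]
      res.status(200).json({ tags: labels.map((label) => label.Name) });
    } catch (err) {
      // Upstream failures are reported as a gateway error.
      res.status(502).json({ message: err.message });
    }
  };
}
```

With Express, the handler could be mounted with `app.get('/tags/:spaceId/:assetId/:uniqueId/:fileName', createTagsHandler({ fetchImage, detectLabels }))`. With the AWS SDK for JavaScript v2, `detectLabels` could call `rekognition.detectLabels({ Image: { Bytes: imageBytes } }).promise()` and return the `Labels` array of the response.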

Deploying the API

We created our API as a regular Express server. Thanks to this, you can run it locally and test it as you would test any other Node.js service.

But to make this API available to users, we need to deploy it to the public internet. This is where AWS Lambda comes in handy. Using a serverless technology to deploy our services reduces both maintenance overhead (no servers to manage) and cost (you pay only for real usage, not servers idling).

Thanks to AWS Serverless Express, any Express app can be wrapped to serve as a handler for a Lambda function.

With our server exported from app.js, we need less than 10 lines of code:
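A minimal sketch of such a wrapper could look like the following. It relies on the `createServer` and `proxy` functions of the aws-serverless-express package. The helper name `createLambdaHandler` is hypothetical, and the library is passed in as an argument only to keep the snippet self-contained.

```javascript
// createLambdaHandler is a hypothetical helper used for illustration.
function createLambdaHandler(awsServerlessExpress, app) {
  // Create the server once so that warm invocations of the function reuse it.
  const server = awsServerlessExpress.createServer(app);

  // Every incoming API Gateway event is proxied to the Express app.
  return (event, context) => awsServerlessExpress.proxy(server, event, context);
}

// In the Lambda entry file this would be wired up roughly as:
//   exports.handler = createLambdaHandler(
//     require('aws-serverless-express'),
//     require('./app')
//   );
```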

exports.handler is now ready to accept requests from API Gateway while running in the AWS Lambda environment.

The last step is to deploy our service to AWS. One popular option is the Serverless Framework, which takes care of packaging and creating the required resources. You only need to create a declarative configuration file, serverless.yml, to allow your service to call DetectLabels and route all the incoming traffic to our Express app:

Once we’re ready with the configuration, type sls deploy and see your API go live!

Exposing the functionality to end users

So far so good: we can call our public API to tag any of our images stored in Contentful. To expose this functionality to editors in spaces, we need to create a user interface. With some help from our team’s designer, we prepared the following screens:

Installation screen

Space administrators use the installation screen to enable AI tagging in the entry editor. Tags are stored in a field of a special content type that references an image and stores extra metadata, including said tags. The installation screen asks for a name and an ID for the content type and creates it automatically during the installation.

The AI image tagging API installation screen
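To make the installation step more concrete, here is a sketch of the payload such a content type could be created with. The helper `buildImageContentType` and the field IDs (`title`, `image`, `imageTags`) are assumptions made for illustration. Only the general field shapes (a short text field, a link to an asset, and a list of short strings) follow from the description above.

```javascript
// Hypothetical helper: builds a content type payload from the ID and name
// entered on the installation screen. Field IDs are illustrative assumptions.
function buildImageContentType(id, name) {
  return {
    sys: { id },
    name,
    displayField: 'title',
    fields: [
      // Human-readable title of the wrapped image.
      { id: 'title', name: 'Title', type: 'Symbol', required: true },
      // Reference to the image asset being tagged.
      { id: 'image', name: 'Image', type: 'Link', linkType: 'Asset', required: true },
      // List of short strings holding the tags.
      { id: 'imageTags', name: 'Image tags', type: 'Array', items: { type: 'Symbol' } },
    ],
  };
}
```

The resulting object could then be sent to the Content Management API when the administrator confirms the installation.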

Field editor view

Once the installation process is completed, editors can use auto-tagging in the entry editor. Clicking the “Auto-tag” button calls the API we created and populates the tag input with the returned tags. It should be possible to add tags on your own, too.

Field editor view

Implementing the frontend

To implement the frontend for our app, we use the SDK for getting the value of an asset that a user wants to tag. By adding Forma 36 to the mix, we can produce native-like user interfaces that are consistent with the look and feel of Contentful.

Because our AI image tagging app consists of two views, we need to provide components for two “locations”. A location represents a place in the Contentful web app that can be controlled by an app:
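A sketch of how the two views could be dispatched is shown here. It assumes the `sdk` object passed to the SDK's `init` callback and the `locations` constants exported by the SDK. The helper `selectView` and the `views` map are hypothetical.

```javascript
// selectView is a hypothetical helper. `sdk` is the object received in the
// SDK's init callback and `locations` is the constants map exported by the SDK.
function selectView(sdk, locations, views) {
  // Installation (app configuration) screen.
  if (sdk.location.is(locations.LOCATION_APP_CONFIG)) {
    return views.appConfig;
  }
  // Widget rendered for a field in the entry editor.
  if (sdk.location.is(locations.LOCATION_ENTRY_FIELD)) {
    return views.entryField;
  }
  // The app does not implement any other location.
  return null;
}
```

Inside `init((sdk) => { ... })`, the returned component could then be rendered with React.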

Once our frontend is ready, we need to make it publicly available, just like the API. Because we’re using Express, we can use the static middleware:
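As a minimal sketch, serving the built frontend comes down to one `app.use` call with `express.static`. The helper `serveFrontend` and the `build` directory name are assumptions. `express` and `app` are passed in so the snippet does not depend on how the server module is organized.

```javascript
const path = require('path');

// serveFrontend is a hypothetical helper used for illustration.
function serveFrontend(app, express, buildDir) {
  // express.static serves the files (index.html, bundles) found in buildDir.
  app.use(express.static(path.resolve(buildDir)));
  return app;
}

// In app.js this could be called as:
//   serveFrontend(app, require('express'), path.join(__dirname, 'build'));
```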

Voilà! Now both the API and the frontend are served by our Express app, which we can deploy to AWS Lambda.

Creating an app definition

With both components deployed to AWS Lambda, the last step is to create an AppDefinition. An app definition is a Contentful entity that makes the custom app we’ve created available for installation in all space environments of an organization.

While all app-related entities can be managed using the API or the SDK, the easiest way to get started is to use the app management view inside the organization settings:

Screenshot of the AI image tagging app definition screen

In the app management view, we can name the app, provide the URL of its deployment to AWS Lambda and select the locations it implements. In our case, we implemented the app configuration screen and an entry field widget operating on lists of short strings (the list of tags, provided either manually or using the AI auto-tagging API).

Summary

In this article, we’ve presented the real-life process of architecting and building one of the 30+ integrations now available in the Contentful Marketplace. You can start using it right away here in the web app.

If you’re interested in all the technical details of implementation or want to fork the app and tweak it to fit your needs, there’s good news: all the code is open sourced on GitHub!

Jakub Elzbieciak

Staff Software Engineer and Extensibility Team Lead at Contentful.