Contentful makes smart speakers even smarter

January 17, 2023


Smart speakers like Amazon Alexa and Google Home enable users to voice simple commands to do everything from turning on the lights to updating their to-do lists.

And because over 60% of Americans used voice assistants in 2022, these devices can help you connect with your customers where it’s most convenient for them.

While each user may phrase a spoken request slightly differently, commands and responses can be captured in a structured content model that supports flexibility, scalability, and reuse across all of your digital channels. Imagine serving the same FAQ from your website to your customers as they ask, “Alexa, how can I…?”

Let’s take a look at how the Contentful® Composable Content Platform can fuel your voice-led experiences with Alexa, currently one of the most popular voice assistants, while ensuring ease of use for your editorial team.

Starting with Alexa (and pancakes)

Smart assistants like Alexa are powered by custom apps called “skills.” When we develop our own skill, we programmatically map a spoken language request to the intention behind it so we can deliver the appropriate content. 

This “intent,” in Alexa-speak, represents the action a user is trying to accomplish, such as finding a recipe for homemade buttermilk pancakes (yum!).

But different people may ask for the same recipe in different ways, which is where “utterances” come in. “What is a recipe for pancakes,” “how do I make pancakes,” and simply “pancake recipe” are all utterances with the same intent.
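To make the relationship concrete, here is a minimal sketch of how one intent and its utterances might be written down in the nesting that Alexa interaction model JSON uses. The intent name `PancakeRecipeIntent` and the invocation name `recipe helper` are made up for illustration; only the three utterances come from the text above.

```python
import json

# Hypothetical intent name; the sample utterances are the ones from the article.
PANCAKE_INTENT = {
    "name": "PancakeRecipeIntent",
    "slots": [],
    "samples": [
        "what is a recipe for pancakes",
        "how do I make pancakes",
        "pancake recipe",
    ],
}


def build_interaction_model(invocation_name, intents):
    """Wrap a list of intents in the interactionModel/languageModel nesting."""
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": invocation_name,  # assumed invocation name
                "intents": list(intents),
            }
        }
    }


model = build_interaction_model("recipe helper", [PANCAKE_INTENT])
print(json.dumps(model, indent=2))
```

All three phrases sit under a single intent, so whichever one the user says, the skill ends up handling the same request.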

With our intent, utterances, and the recipe itself in mind, we can start crafting our Contentful-Alexa interaction.

Option 1: Connect Alexa directly to Contentful

We’ll begin with a simpler, straightforward approach in which we set up our Alexa intent to query Contentful directly, and later we’ll explore how to extend this option to make it more scalable.

Diagram illustrating connecting Contentful with a website and Alexa Skill.

In this version, we set up our intent in the Alexa Developer Console to correspond to particular content in Contentful. 

We can structure our content model to deliver a recipe the same way to both smart speakers and our website, or to provide different content to each channel: recipe content that is speakable, and recipe content that is better suited to reading on screen.

A recipe content type using references to model each step in the cooking process.

Our recipe content type in Contentful consists of a text field for the title and a references field to list each step in the recipe, such as mixing the ingredients, stirring the batter, and cooking.

A step content type detailing what actions to take in the recipe.

The content for each step provides text instructions for what actions to complete, like sifting ingredients together. 

In this example, we include a Boolean field to determine whether or not the step should be read by a virtual assistant.
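As a rough sketch of this model in code, the recipe and step content types could be mirrored by two small data classes, with the Boolean field deciding which steps reach the smart speaker. The field names `instructions` and `read_aloud` are assumptions for illustration, not the actual field IDs in our space.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    instructions: str
    read_aloud: bool = True  # hypothetical name for the Boolean field


@dataclass
class Recipe:
    title: str
    steps: List[Step] = field(default_factory=list)


def steps_for_voice(recipe):
    """Only the steps flagged to be read by a virtual assistant."""
    return [s.instructions for s in recipe.steps if s.read_aloud]


def steps_for_web(recipe):
    """Every step, in order, for on-screen reading."""
    return [s.instructions for s in recipe.steps]


pancakes = Recipe(
    title="Buttermilk pancakes",
    steps=[
        Step("Sift the dry ingredients together."),
        Step("Compare your batter with the photo.", read_aloud=False),
        Step("Cook on a hot griddle until golden on both sides."),
    ],
)

print(steps_for_voice(pancakes))
```

The same recipe entry serves both channels; the flag simply filters out steps that only make sense on screen.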

Depending on your business needs, you may wish to take this a step further and model some content types for delivery only to smart speakers, while other content types may be appropriate for other channels. 

In each case, we attach a function to our intent in the Alexa Skill that will query Contentful when corresponding utterances are heard by our smart speaker. 

So when we ask, “Alexa, how do I make the best buttermilk pancakes?” Contentful will return a structured JSON object with the information we need to make our breakfast.
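Here is a minimal sketch of what that function might do, assuming a content type ID of `recipe` and field IDs `title`, `steps`, `instructions`, and `readAloud` (all hypothetical). It builds a Content Delivery API request URL and then pulls the speakable steps out of the JSON response; the HTTP call itself is left out, and the sample response is hand-written to mimic the API's `items` and `includes` layout.

```python
from urllib.parse import urlencode

CDA_BASE = "https://cdn.contentful.com"


def build_recipe_url(space_id, access_token, title, environment="master"):
    """Build a Content Delivery API URL that looks up one recipe by title."""
    params = {
        "access_token": access_token,
        "content_type": "recipe",      # assumed content type ID
        "fields.title[match]": title,  # full-text match on the assumed title field
        "include": 1,                  # resolve one level of linked entries
        "limit": 1,
    }
    path = f"/spaces/{space_id}/environments/{environment}/entries"
    return f"{CDA_BASE}{path}?{urlencode(params)}"


def speakable_steps(response):
    """Return instructions of the steps flagged for voice, in recipe order."""
    if not response.get("items"):
        return []
    # Linked step entries arrive separately under includes.Entry.
    linked = {
        entry["sys"]["id"]: entry["fields"]
        for entry in response.get("includes", {}).get("Entry", [])
    }
    steps = []
    for link in response["items"][0]["fields"].get("steps", []):
        fields = linked.get(link["sys"]["id"])
        if fields and fields.get("readAloud", True):
            steps.append(fields["instructions"])
    return steps


# Hand-written sample in the shape of a delivery response (not real data).
sample_response = {
    "items": [{
        "sys": {"id": "recipe-1"},
        "fields": {
            "title": "Buttermilk pancakes",
            "steps": [
                {"sys": {"type": "Link", "linkType": "Entry", "id": "step-1"}},
                {"sys": {"type": "Link", "linkType": "Entry", "id": "step-2"}},
            ],
        },
    }],
    "includes": {"Entry": [
        {"sys": {"id": "step-1"},
         "fields": {"instructions": "Sift the dry ingredients together.", "readAloud": True}},
        {"sys": {"id": "step-2"},
         "fields": {"instructions": "Compare your batter with the photo.", "readAloud": False}},
    ]},
}

url = build_recipe_url("space123", "token", "buttermilk pancakes")
print(speakable_steps(sample_response))
```

Keeping the URL building and the response parsing in separate functions makes each piece easy to check without touching the network.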

A view of our pancake recipe intent in the Amazon Developer Console showing different utterances, or spoken phrases, users may say to invoke Alexa.

Now, whenever we need to update our recipe, our editors can do so in Contentful, where the content for all of our digital channels lives, and where they can enjoy the best editing experience.

This method does the job, but we’ll need to create a new intent with associated utterances for every new recipe we add in Contentful. 

And if anything changes with our existing intents, we may need to update the request logic in the function to fetch the right content. 

The amount of work involved grows with the size of our content, so in our next approach we leverage the power of composable content to manage our recipes at scale.

Option 2: Scale using a search engine

Instead of creating new intents for every recipe, it would be great if we could templatize the utterances to work with any recipe. 

“Slots” allow us to do just that. They behave like variables in our utterances that allow us to make them more generic, so an utterance can transform from “how do I make pancakes” to simply “how {query}.”
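As a sketch of how this might look, the intent below declares a single `query` slot, and a small helper reads the slot's value out of the JSON request that Alexa sends to the skill function. The intent name `RecipeSearchIntent` is hypothetical, and using the built-in `AMAZON.SearchQuery` slot type is our assumption about a reasonable fit for free-form phrases.

```python
# Hypothetical intent using a slot instead of hard-coded recipe names.
SEARCH_INTENT = {
    "name": "RecipeSearchIntent",
    "slots": [{"name": "query", "type": "AMAZON.SearchQuery"}],
    "samples": ["how {query}"],
}


def extract_query(alexa_request):
    """Return the spoken value of the `query` slot, or None if absent."""
    request = alexa_request.get("request", {})
    if request.get("type") != "IntentRequest":
        return None
    slots = request.get("intent", {}).get("slots") or {}
    return (slots.get("query") or {}).get("value")


# Trimmed-down example of an incoming request for "how do I make pancakes".
sample_request = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "RecipeSearchIntent",
            "slots": {"query": {"name": "query", "value": "do I make pancakes"}},
        },
    },
}

print(extract_query(sample_request))
```

Whatever follows “how” lands in the slot, so the same intent can serve pancakes today and banana bread tomorrow.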

The updated intent which uses a generalized query.

We then introduce a search engine like Algolia to index all of our recipe content, and thanks to the Algolia app we can integrate it with Contentful in about five minutes. Our skill function passes the contextual query (utterance) to Algolia, which responds with the relevant content.
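The flow can be sketched as follows. To avoid guessing at a search client's API, the search step is passed in as a plain function; `toy_search` is an in-memory stand-in for illustration only, and a real skill would call the search engine there instead. The record fields `title` and `steps` are assumptions about what the index holds.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON structure an Alexa skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def answer_query(query, search):
    """Pass the spoken query to `search` and speak the top hit, if any."""
    hits = search(query)
    if not hits:
        return build_speech_response("Sorry, I couldn't find a recipe for that.")
    top = hits[0]
    return build_speech_response(f"Here is {top['title']}. " + " ".join(top["steps"]))


# Hand-written records standing in for the search index (not real data).
RECORDS = [
    {"title": "Buttermilk pancakes",
     "steps": ["Sift the dry ingredients together.", "Cook until golden."]},
    {"title": "Banana bread",
     "steps": ["Mash the bananas.", "Bake for one hour."]},
]


def toy_search(query, records=RECORDS):
    """Stand-in search: rank records by words shared with the title."""
    words = set(query.lower().split())
    scored = [(len(words & set(r["title"].lower().split())), r) for r in records]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [record for score, record in ranked if score > 0]


response = answer_query("do I make pancakes", toy_search)
print(response["response"]["outputSpeech"]["text"])
```

Because the skill function only depends on “something that returns hits,” the search backend can be swapped or tuned without touching the voice logic.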

Diagram illustrating use of a search index to scale the use of Contentful to power voice assistant content.

Our authors now can work entirely within Contentful without needing to create a new intent every time they add a new recipe, or worse, having to contact the dev team to add the intent for them. 

Thanks to the Algolia app, the search index updates automatically whenever recipe content changes. And finally, we could add the utterances as a field on our recipe content type, and attach further metadata using Contentful tags, to improve the relevance of Algolia’s results.
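To illustrate the idea, here is a hedged sketch of how a recipe entry with an utterances field and tags might be flattened into a search record. The `utterances` field ID is hypothetical, and the record shape is our assumption: `objectID` and `_tags` follow Algolia's naming for record IDs and tags, but the records the Algolia app actually produces may look different.

```python
def to_search_record(entry):
    """Flatten a Contentful-style entry into a flat, searchable record."""
    fields = entry["fields"]
    # Contentful tags appear as links under the entry's metadata.
    tags = [tag["sys"]["id"] for tag in entry.get("metadata", {}).get("tags", [])]
    return {
        "objectID": entry["sys"]["id"],
        "title": fields["title"],
        "utterances": fields.get("utterances", []),  # assumed field ID
        "_tags": tags,
    }


# Hand-written entry for illustration (not real data).
sample_entry = {
    "sys": {"id": "recipe-1"},
    "metadata": {"tags": [{"sys": {"type": "Link", "linkType": "Tag", "id": "breakfast"}}]},
    "fields": {
        "title": "Buttermilk pancakes",
        "utterances": ["how do I make pancakes", "pancake recipe"],
    },
}

record = to_search_record(sample_entry)
print(record)
```

With the utterances stored alongside the title, the phrases people actually say become searchable text that editors can maintain themselves.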

Moving forward

Looking ahead at voice delivery of your content, the logic outlined in this post could, in theory, be applied to a Google Action or an Apple HomeKit app.

Google is experimenting with an addition to structured data called Speakable (beta) that allows you to identify content within a web page that is meant for audio playback, and that can be served by Google Assistant devices, like Google Home.

As an added bonus, including structured content in the JSON-LD markup of your webpage could not only enable smart speakers to deliver your content, but also improve searchability and SEO, and enable special search features. Win-win!
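As a rough sketch, Speakable markup could be generated from structured content like this. The schema.org vocabulary (`speakable`, `SpeakableSpecification`, `cssSelector`) is real, while the page name, URL, and CSS selectors below are placeholders we made up; whether a given page type qualifies for Google's beta is something to check against Google's own guidelines.

```python
import json


def speakable_json_ld(name, url, css_selectors):
    """Build JSON-LD that marks parts of a page as suitable for audio playback."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
    }


markup = speakable_json_ld(
    "Buttermilk pancakes",
    "https://example.com/recipes/buttermilk-pancakes",  # placeholder URL
    [".recipe-title", ".recipe-summary"],               # placeholder selectors
)

# This string would go inside a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```

Since the title and summary already live as separate fields in the content model, pointing the selectors at them takes very little extra work.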

Wrapping up

By some estimates, over 120 million adults in the United States used voice assistants at least once a month in 2022, and that number is expected to grow. 

Contentful not only enables you to easily structure and manage your content for omnichannel delivery, but its composable architecture also empowers you to integrate with the devices and systems you and your customers prefer.

And while it’s convenient when your hands are covered in pancake batter to ask Alexa for the next step in the recipe, we can also use structured content to meet all of our customers — like those who rely on voice assistants for accessibility reasons — where they are most comfortable.

Contentful can make integrations like this a lot of fun, and a lot less daunting than they may have seemed at first. Still looking for some help? Our professional services team would be happy to chat. Drop us a line.
