
The Contentful SEO guide

Technical SEO elements


Joshua Lohr, Rob Remington, Nick Switzer, Allan White

Updated: December 6, 2023

This is chapter 5 of the series, The Contentful SEO guide.

Summary

“Technical SEO” is an area of responsibility that spans both your content and your code. It is where your structured content is transformed into machine-readable markup that search engines can “spider” effectively and accurately. This is achieved through content model design, field validations, and clear coordination between your content model in Contentful and your web application.

Understanding the necessary outputs, structures, and best practices that search engines expect is essential for success in your overall SEO efforts. Common roles involved are content modelers (often admins), frontend developers, and devops or engineers responsible for site generation and delivery. One design goal is to keep the editorial experience as simple as possible for editorial teams; content creators shouldn’t have to understand these highly technical, specialist subjects to do their jobs.

In this chapter, we’ll review the most common technical elements required, and examine how your content models can be designed in Contentful to support these elements.

XML sitemap

XML sitemaps are structured data files, hosted on your site, that help search engines index and understand your content. Google describes XML sitemaps as:

A file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to crawl your site more efficiently. A sitemap tells Google which pages and files you think are important in your site, and also provides valuable information about these files. For example, when the page was last updated and any alternate language versions of the page. You can use a sitemap to provide information about specific types of content on your pages, including video, image, and news content.

These sitemaps help search engines maintain a site’s placement in search results and discover new and refreshed content. XML sitemap files must contain all indexable pages, have no errors or redirects, and be deployed at the domain’s root. If you have a large, complex site, or sections that are managed by different teams with different kinds of content, using a sitemap index file will let you manage multiple sitemap files.
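For reference, here is what a minimal sitemap index file looks like under the sitemaps.org protocol; each file it references is a standard sitemap, and the file names and dates below are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
    <lastmod>2023-12-06</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
    <lastmod>2023-12-06</lastmod>
  </sitemap>
</sitemapindex>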

Optimization & validation tips

  • The XML sitemap file (or XML sitemap index file) should be deployed at domain root: https://www.website.com/sitemap.xml
  • The XML sitemap should be as current as possible, auto-generated when new content is published or any metadata in the index is changed (such as title, description, or URL).
  • The XML sitemap file should contain all indexable pages.
  • If a page is set to “noindex” with the meta robots page-level element, then the URL should be automatically excluded from the XML sitemap.
  • If a page is set to redirect (301 or 302 status codes) or returns an error (4xx or 5xx status codes), then the URL should be automatically excluded from the XML sitemap.
  • Omit canonicalized URLs. If a page is “canonicalized,” meaning the page specifies a canonical link element different from the page’s URL, then the URL should be automatically excluded from the XML sitemap. These pages will already be included through their natural path. This also prevents the addition of URLs that are canonicalized to other domains. (A sketch of these exclusion rules follows this list.)
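Here is a minimal sketch of those exclusion rules, assuming a hypothetical Page shape with url, noindex, statusCode, and canonicalUrl values resolved at build time:

// Hypothetical page data assembled by your site generator; field names are
// illustrative, not a Contentful API.
interface Page {
  url: string;
  noindex?: boolean;      // true when the meta robots noindex tag is set
  statusCode?: number;    // 200, 301/302 redirects, 4xx/5xx errors
  canonicalUrl?: string;  // canonical link element, if it differs from url
}

// Return only the URLs that belong in the XML sitemap.
function sitemapUrls(pages: Page[]): string[] {
  return pages
    .filter((p) => !p.noindex)                                  // exclude noindex pages
    .filter((p) => (p.statusCode ?? 200) === 200)               // exclude redirects and errors
    .filter((p) => !p.canonicalUrl || p.canonicalUrl === p.url) // exclude canonicalized URLs
    .map((p) => p.url);
}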

Robots directives

We’re about to talk about robots, specifically the robots.txt file and meta robots tags. Both set controls for and give directives to bots (also known as search engine spiders), shaping how they behave on your website and whether they index your site’s content. Each has its own set of controls and use cases; they can be used in conjunction to send strong signals about how you want a bot to behave on your website, but they should be thought of independently. Let’s dive into both.

Robots.txt

While not managed in Contentful, the robots.txt file is crucial: it tells search engine crawlers what to crawl and what not to crawl, using directives that instruct them on how to behave on your site. This can have a significant impact on content accessibility and visibility in organic search results. Additionally, the robots.txt file tells crawlers where to find the XML sitemap(s).

Directives can include:

  • which site pages and folders crawlers should or shouldn’t crawl,
  • which user agents (crawlers) can or can’t access the site, and
  • a path to the XML sitemap.

Optimization tips

  • Create and deploy the robots.txt file at the root of the domain, e.g., https://www.website.com/robots.txt
  • We recommend using the default directive, which allows all crawlers to crawl the site:
User-agent: *
Allow: /
  • You may want to limit access for certain crawlers, but many crawlers will simply ignore this directive and crawl regardless, or cloak themselves as Googlebot, so it’s rarely worth trying to control them with this method.
  • Include a path to your XML sitemap(s) in this format:
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-2.xml
  • These days, there’s little need to block much of anything, and you should avoid unnecessary directives. If you need to remove or block a page from Google’s index, it’s best to use the meta robots tag.
  • Avoid “hiding” pages, folders, or admin paths using “disallow” directives, as this is a security and competitive risk: anyone can access this file and see these “hidden” paths. Check your competitors’ robots.txt files and see what they are “hiding.” It’s fun, and can be enlightening!
  • Do not block asset folders such as JavaScript or CSS folders using “disallow” directives, per Google’s guidance, as search engines need these files to properly render the page. Blocking these assets can result in your content being improperly rendered and even considered not mobile-friendly (since viewports rely on CSS), which can be a very damaging signal for visibility and ranking in search results.
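Putting these tips together, the simple default recommended above amounts to a robots.txt like this (the sitemap URL is a placeholder):

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml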

Meta robots tags

Meta robots tags are HTML snippets used at the page level to control the indexation of an individual page. They are typically the most effective way to direct search engine crawlers not to index an individual page, and are more effective than canonicalization. Here’s what the snippet looks like in the source code:

<meta name="robots" content="noindex" />

Optimization tips

Here are some common use cases for meta robots “noindex” tags:

  • Preventing duplicate content (used in conjunction with canonical tags pointing at the preferred URL).
  • Hiding campaign-specific content, such as paid search landing pages, or pages not ideal or necessary for indexation, such as admin or thank you pages.
  • Preventing parameter indexation, such as internal search results.
  • Securing staging environments (not protected by authentication) from indexation.

Validation & help text

Example Contentful UI settings for the meta robots noindex tag.

Validation

  • Provide content creators the ability to toggle “yes/no” for the addition of a meta robots noindex tag on individual pages in the “Building Block: Meta Information” content model.
  • When content creators employ a meta robots noindex tag, this should also trigger an exclusion of the page from the XML sitemap.
  • Note, there’s no need to include an “index” tag when content creators toggle “no” and leave a page indexable. It’s unnecessary code: indexing is implied whenever a page does not include a noindex tag.
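As a rough sketch, the rendering and sitemap logic behind this toggle could look like the following, assuming a hypothetical boolean noindex field on the “Building Block: Meta Information” entry linked from each page:

// Hypothetical shape of the "Building Block: Meta Information" entry's fields.
interface MetaInformation {
  noindex?: boolean;
}

// Only emit the meta robots tag when noindex is requested; an explicit
// "index" directive is implied and doesn't need to be rendered.
function robotsMetaTag(meta: MetaInformation): string {
  return meta.noindex ? '<meta name="robots" content="noindex" />' : '';
}

// Pages flagged noindex are also excluded from the XML sitemap.
function belongsInSitemap(meta: MetaInformation): boolean {
  return !meta.noindex;
}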

Help text

Here’s an example of what you can put in the “Help text” field, which will appear below the noindex field to help content creators select the appropriate option:

“Selecting yes will keep the page from showing up in organic search results.”

Additionally, a “nofollow” tag is used to prevent search engine crawlers from following the links on a page. You may also want a toggle for this if your content strategy often employs guest posts, whose external links are all meant to carry a nofollow tag. See Google’s guidance on nofollow tags for more detail.
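For reference, a page-level nofollow uses the same meta robots element, either on its own or combined with noindex in a single tag:

<meta name="robots" content="nofollow" />
<meta name="robots" content="noindex, nofollow" />

Individual guest-post links can also carry a link-level rel="nofollow" attribute instead of a page-level tag.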

Google Search Console & Bing/Baidu/Yandex Webmaster Tools verification

There are a few ways to verify Google Search Console and the other Webmaster Tools accounts to show that you are indeed the owner of the website. Verification provides valuable access to organic search analytics, indexation controls and insights, visibility warnings and errors, and more.

One of the ways you can verify ownership is by adding a meta tag to your site’s <head>, which will look something like this:

<meta name="google-site-verification" content="......." />

This is often handled in your site code, but you can also manage these values in Contentful, for example as part of a Site Settings content type that contains fields used globally by your website. These could include contact and social media information, along with fields for each platform you want to verify.

Example of a Site Settings content type containing content used globally for a website.
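As a rough sketch, a site could read these values with the Contentful delivery SDK and render the corresponding tags; the content type and field IDs below (siteSettings, googleSiteVerification, bingSiteVerification) are illustrative, not the exact model shown above:

// Fetch the single Site Settings entry and build verification meta tags.
import { createClient } from 'contentful';

const client = createClient({
  space: process.env.CONTENTFUL_SPACE_ID!,        // assumed environment variables
  accessToken: process.env.CONTENTFUL_DELIVERY_TOKEN!,
});

export async function verificationMetaTags(): Promise<string[]> {
  const { items } = await client.getEntries({ content_type: 'siteSettings', limit: 1 });
  const fields = (items[0]?.fields ?? {}) as Record<string, string | undefined>;

  const tags: string[] = [];
  if (fields.googleSiteVerification) {
    tags.push(`<meta name="google-site-verification" content="${fields.googleSiteVerification}" />`);
  }
  if (fields.bingSiteVerification) {
    // Bing Webmaster Tools uses the "msvalidate.01" meta tag name.
    tags.push(`<meta name="msvalidate.01" content="${fields.bingSiteVerification}" />`);
  }
  return tags;
}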

Up next

Next, we’ll talk about SEO best practices for URLs and redirects, including details around URL slugs, canonical link elements, breadcrumbs, and redirect management.

Written by


Joshua Lohr

Josh is the SEO Lead at Contentful. With 15 years of experience working directly in SEO for global brands and agencies, he gets his kicks playing a variety of instruments and appreciating the nature of his adopted home in Scotland.


Rob Remington

Practice Architect

Rob Remington is a Practice Architect at Contentful, where he collaborates across all areas of the company to ensure our technical teams have the resources they need to help our customers succeed.


Nick Switzer

Senior Solution Engineer

Nick is a technical people-person who lives for solving business problems with technology. He is a solution engineer at Contentful with 14 years of experience in the CMS space and a background in enterprise web development.


Allan White

Senior Solution Engineer

Allan White is a Senior Enterprise Solutions Engineer at Contentful. A former Contentful customer, he has 25 years of experience in design, video, UX and web development. You can catch him on Twitter or in the Contentful Slack Community.
