Published on August 5, 2025
In the late 19th century, companies across North America started copying the inventions of locomotive engineer Elijah McCoy. Eventually, with so many copies of McCoy’s work circulating, disgruntled railway bosses were struggling to work out which were the originals — and starting to make clear that what they really wanted was “The real McCoy.”
Or so the story goes. The point is, over a century later, distinguishing originals is still a business concern, but one that has shifted to the digital landscape — and onto search engines and generative AI (GenAI) engines. Now, search algorithms have to crawl millions of webpages and distinguish relevance from vast amounts of similar content, in order to deliver the results that they think will fulfill users’ requests.
And that isn’t a trivial concern. Like humans, search engines and GenAI engines appreciate efficiency. If you make them sift through vast amounts of similar content to identify the “real McCoy” version of a page, they're not going to be happy with you, and might end up penalizing you by dropping your ranking in search engine results pages (SERPs). That means losing engagement, clicks, and, ultimately, customers.
Fortunately, digital marketers have an advantage that old-timey railway tycoons didn’t: we can help search engines find what they’re looking for by using canonical link elements — also known as canonical URL tags, or more succinctly, canonicals.
In this post, we’re going to discuss canonicals and their value to search engine optimization (SEO) — and, by extension, generative engine optimization (GEO). We’ll talk about how they work, why they matter, and how Contentful helps you use them.
In the broadest sense, a canonical is an HTML tag (strictly speaking a link element, although it is often called a meta tag) that brands can add to the code of a webpage in order to speak to search engines and improve SEO performance.
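Canonicals are added to the <head> of the HTML code and are typically written as a link element in the form <link rel="canonical" href="https://www.example.com/preferred-page/" />, where the href value is the URL of the preferred page.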
The canonical URL tag helps the search engine determine whether the webpage is the authoritative version of the page — or whether the page should be overlooked in favor of that authoritative version.
But, how does it do that?
When search engines crawl a webpage, they look for the canonical URL tag. If the tag matches the webpage’s URL (if it “self-references”), then the search engine understands that the page is authoritative and should be indexed so that it can appear in search engine results pages (SERPs).
If, on the other hand, the tag doesn't match the URL, that indicates that the page is canonicalized and that the search engine should ignore it in favor of the page at the specified URL.
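The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular search engine works: the page markup and the "pepperoni-low-tops" URL are made up for the example, and real crawlers treat the canonical as a strong hint alongside other signals.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attr_map.get("href")


def classify_page(page_url, html):
    """Return 'self-referencing', 'canonicalized', or 'missing' for a page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return "missing"
    # Simplification: ignore a trailing slash when comparing the two URLs.
    if finder.canonical.rstrip("/") == page_url.rstrip("/"):
        return "self-referencing"
    return "canonicalized"


# Hypothetical page markup, for illustration only.
sample_html = """
<html><head>
  <link rel="canonical" href="https://www.pizza-pieds.com/pepperoni-low-tops" />
</head><body></body></html>
"""

print(classify_page("https://www.pizza-pieds.com/pepperoni-low-tops", sample_html))
print(classify_page("https://www.pizza-pieds.com/pepperoni-low-tops?size=9", sample_html))
```

The first call reports a self-referencing page; the second reports a canonicalized one, because the tag points somewhere other than the URL that was crawled.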
Why do we need to point a search engine away from one page and toward another?
It’s essentially a strategy to make the search engine’s job easier, and, in so doing, boost the performance of your content in SERPs and in the answers delivered by GenAI chatbots.
We typically use canonicals in contexts where there are multiple pages with very similar content, or even multiple pages that are exactly the same. That might encompass ecommerce stores with very similar products or blogs with hundreds of posts that lead to extensive pagination.
Let’s look at a specific example of duplication: a shoe brand that offers hundreds of variations of the same shoe. In this example, the brand’s website has hundreds (if not thousands) of URLs for the same shoe, varying by size, color, style, and so on.
When a customer searches for a particular shoe, the search engine has to decide which shoe page to serve the user in its search results, choosing from all the variations of the same shoe: red, yellow, wide, narrow, size eight, size nine, and so on. All these variations essentially represent near exact duplicates of each other, with very small variations.
And so, without the canonical, the search results would be crowded with dozens or hundreds of variations of the same page. That would create internal competition — or cannibalization — for the same search result, and provide a poor experience for the user.
The resulting glut of near-duplicate pages competing for a place in the search engine's index is known as “index bloat,” and it’s bad news for SEO and GEO.
Index bloat doesn’t do any favors for the brand’s SEO because it makes it harder for the search engine to identify the thing that the searcher wants. These similar or duplicate pages can confuse the search engine about what content to rank — to the point that it might even put pages in direct competition with each other. And this cannibalization can result in reduced visibility in organic search results.
That’s if the search or generative engine is even able to find your content. The crawler bot might be overwhelmed by all the similar (or duplicate) content that it’s having to sort through and end up missing some important or unique aspect of it — and, consequently, fail to show your page to a customer because it’s confused or doesn’t know if it is relevant.
A message in Google Search Console, such as “Duplicate without user-selected canonical,” might indicate there's a problem, but let’s not pack it all in just yet because canonicals can help.
By adding the canonical tag to the head of the page, the brand tells the search engine exactly which page it would prefer to be indexed instead of the (potential) hundreds of similar or duplicate pages that the search engine might find, and have to crawl.
Thanks to the tag pointing to the canonical version, the search engine now only has to index one page. In the example above, the shoe brand might use a canonical link element that points to its “pepperoni-low-tops” page to consolidate similar or duplicate URLs.
That tag tells the search engine it only needs to index the “pepperoni-low-tops” page, where the brand has included a filter for the user to select the precise size and other specifications that they want for their shoe.
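As a rough sketch of that consolidation, the snippet below maps variant URLs to a single canonical URL and renders the tag for it. It assumes, purely for illustration, that variations such as color and size are expressed as query parameters on a hypothetical "pepperoni-low-tops" URL; a real store's URL structure may differ.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_for_variant(variant_url):
    """Drop the query string and fragment so every variant maps to one product URL."""
    parts = urlsplit(variant_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(canonical_url):
    """Render the link element that belongs in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# Hypothetical variant URLs, for illustration only.
variants = [
    "https://www.pizza-pieds.com/pepperoni-low-tops?color=red&size=8",
    "https://www.pizza-pieds.com/pepperoni-low-tops?color=yellow&width=wide",
]

# Every variant ends up with the same tag, so the set holds a single entry.
tags = {canonical_link_tag(canonical_for_variant(u)) for u in variants}
for tag in tags:
    print(tag)
```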
Canonicals make things easier for search engine and generative engine web crawlers by helping to increase their crawl efficiency, but they also help brands by protecting them from the content cannibalization we mentioned earlier. That’s a dramatic way of saying that very similar pages within a single website can dilute each other’s SEO value, inflicting a net loss in SERP performance where a single, authoritative page would have delivered a win: in other words, a well-ranked and cited result.
It’s worth noting that canonicals also make things easier for users, who get better, more relevant results, and don't have to click through multiple links for what is, essentially, the same page. That convenience and efficiency should make their digital experience and shopping journeys better.
We’ve explored canonicals’ SEO value for “very similar pages,” but what if you feel your website doesn’t have similar pages? Could you get by without worrying about canonicals?
Probably not. That’s because pages can be similar, or duplicated, in ways that might not be obvious.
Let’s look at a homepage, for example: https://www.pizza-pieds.com. That’s not the only way that particular URL could be presented and accessed; it might also be http://pizza-pieds.com, https://pizza-pieds.com, http://www.pizza-pieds.com, and so on.
So, while each URL variation would take us to the same page, a search engine would treat them as separate, duplicate pages to crawl, and ideally would like to know which is the preferred version of the URL.
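One way to picture the fix is a small helper that collapses those scheme and host variations into a single preferred URL, which is the value a canonical tag would then carry. The choice of "https" and the "www" host as the preferred form is an assumption made for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this sketch.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.pizza-pieds.com"
EQUIVALENT_HOSTS = {"pizza-pieds.com", "www.pizza-pieds.com"}


def preferred_url(url):
    """Map any scheme/host variation of the site onto the preferred form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in EQUIVALENT_HOSTS:
        return url  # Not one of our hosts: leave it untouched.
    path = parts.path or "/"  # Treat an empty path as the site root.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


homepage_variants = [
    "https://www.pizza-pieds.com",
    "http://pizza-pieds.com",
    "https://pizza-pieds.com",
    "http://www.pizza-pieds.com",
]

# All four variations collapse to one preferred URL.
print({preferred_url(u) for u in homepage_variants})
```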
If you factor in the complexity of multiple landing pages within the same site, and the application of subcategories and filters (shoe size, style, color, etc.), the possibility of duplication, and index bloat, increases dramatically.
And there are plenty of other contexts in which canonicals are necessary. For example, you might be launching a guest blog with a partner website and need to direct traffic to a specific version of that content, you might have American English and British English versions of your pages, or you might produce different versions of your page for desktop and mobile. Or you might need to include tracking code parameters (e.g., UTMs) that are appended to the end of URLs when users click certain links.
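The tracking-parameter case is easy to illustrate. The sketch below strips any query parameter whose name starts with "utm_" to recover the URL a canonical tag would point to; the example URLs are hypothetical, and a real site might need to strip other tracking parameters too.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url):
    """Remove utm_* parameters while keeping every other query parameter."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical tracked links, for illustration only.
print(strip_tracking("https://www.pizza-pieds.com/blog/launch?utm_source=newsletter&utm_medium=email"))
print(strip_tracking("https://www.pizza-pieds.com/shoes?page=2&utm_campaign=spring"))
```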
The point is, don’t underestimate the importance of adding canonicals to your pages — they could make a huge difference to their SERP rankings and to the experience that customers get when they browse.
The good news is, it’s relatively straightforward to add canonical tags to the code of webpages. In fact, it’s good practice to add a canonical tag to every page within your digital ecosystem, even if you don’t think it needs one (we’ll explain why later).
On the other hand, it’s also easy to get the tagging process wrong, and end up losing out on the valuable SEO and GEO benefits. If you only have a few pages within your digital ecosystem, it’s possible that you could add canonicals by hand, but if you have many pages to deal with, things get a little more complicated.
With that in mind, let’s take a closer look at the canonical tagging process.
You’ll need to work out which pages could be candidates for duplicate content, which means exploring your digital footprint using specialized crawling tools. In some cases, it’s going to be obvious which pages need a canonical tag, but spotting duplicate content isn’t always intuitive, so you’ll need to be thorough in your approach. Automated tools will help significantly.
Once you have your candidates, you'll need to define certain canonicalization rules. For example, our shoe brand might decide to make its “high-tops” subcategory the authoritative page because that not only makes it easier for the search algorithm, but gives customers shopping for high-tops a decent starting point on their journey. With that in mind, the brand would seek to tag all other high-tops subpages (that are applying, say, “color” and “size” filters to narrow the results) in order to point crawlers back to the “high-tops” canonical URL.
If that seems like a lot of work, don’t worry. As we mentioned, there are tools available that can crawl your site automatically and flag duplicate content for canonicalization. If you need some suggestions, check out crawling software like Screaming Frog (the current king of web crawler emulation), or an SEO software suite like Ahrefs or Semrush, both of which offer crawling and auditing as part of their toolset.
Decide on your preferred URL — the page that you’d prefer search engines to index over other similar or duplicative pages. You’ll typically be using the page that provides the most SEO value, which means factoring in the relevant performance metrics, such as bounce rate, time on page, conversion rate, and so on.
SEO value might not always be the deciding factor here. In the context of co-released partner blogs, for example, where the same content is published to two different sites, you’d typically come to some arrangement with your partner about which version should be authoritative.
The final step in the canonicalization process is to insert the canonical tag pointing to the preferred URL into the <head> code of the relevant pages.
It’s obviously going to be time-consuming to add canonical tags manually — which is why you should, again, leverage technology if you have to add tags to more than a handful of pages. In most cases, your CMS will automatically generate a canonical tag based on the rules you have determined — that’s particularly useful when dealing with dynamic page creation and the hundreds of page variations that it can result in.
For example, if our shoe brand generated a page for the search query “shoe brand, black sneakers, size 10,” then it would also automatically generate a tag for that page that pointed to its “black sneakers” canonical page.
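To make that kind of rule concrete, here is a minimal sketch of how a canonical could be derived for a dynamically generated page. It is not how any particular CMS implements this: the URL layout, the "/sneakers" path, and the rule that only the color filter defines a distinct canonical page are all assumptions for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed rule: only these filters define a distinct canonical page.
# Everything else (size, sort order, and so on) just narrows the results.
CANONICAL_PARAMS = ("color",)


def canonical_for_generated_page(url):
    """Keep only the filters that the rule treats as canonical-defining."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    kept = [(name, params[name]) for name in CANONICAL_PARAMS if name in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# A hypothetical generated page for "black sneakers, size 10".
generated = "https://www.pizza-pieds.com/sneakers?color=black&size=10"
print(canonical_for_generated_page(generated))
```

Building the kept list from the rule, rather than from the incoming query string, means the parameters always come out in the same order, so differently ordered variations resolve to one canonical URL.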
There are a few things you can do to help your canonical tagging efforts.
All pages should get a canonical tag — and this includes pages that have no similar or duplicate variations. These pages should get what we call a “self-referential” canonical tag, meaning that the tag points to itself (to its own page). This also applies to duplicate pages, where the authoritative page you select for indexation gets a self-referential canonical tag. It’s worth doing this because the self-referential tag simply reinforces the message to search engines that they’re delivering the right content to SERPs.
Homepage URLs may not seem like candidates for canonical tags, but they are vulnerable to duplication (see above), so it makes sense to add a canonical tag to your homepage as a priority. Being proactive about homepage canonicals guards against future duplication caused by tracking code parameters (such as UTMs) or the various other ways that your homepage URL can be linked.
You should audit your canonical tags regularly to make sure that they’re aligned well with what you want in the search engine’s index. This is especially important for canonical tags that are being generated dynamically. The best way to conduct your canonical audit is to review the indexing report in Google Search Console to determine which pages are being indexed, and which pages are not.
Take a robust, consistent approach to the way that you establish authoritative pages, and how you apply canonical tags. If you end up creating conflicting canonicals, where two pages point at each other, or canonical chains and loops, where each page points to the next without ever settling on a single authoritative page, search engines may disregard your canonicals and you’ll likely see your pages drop in Google SERP rankings as a result.
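Conflicts like these are straightforward to detect once you have a crawl export. The sketch below assumes a simple mapping from each page to the URL in its canonical tag (the paths are hypothetical) and follows the pointers to flag loops and multi-hop chains.

```python
def resolve_canonical(url, canonical_of):
    """Follow canonical pointers from url.

    Returns (final_url, status), where status is:
      'ok'    - the page is self-referencing or one hop from a page that is
      'chain' - more than one hop was needed to reach the final page
      'loop'  - the pointers circle back without reaching a final page
    Pages missing from the mapping are treated as self-referencing.
    """
    seen = [url]
    current = url
    while True:
        target = canonical_of.get(current, current)
        if target == current:
            hops = len(seen) - 1
            return current, "ok" if hops <= 1 else "chain"
        if target in seen:
            return None, "loop"
        seen.append(target)
        current = target


# Hypothetical crawl export: page -> URL found in its canonical tag.
canonical_of = {
    "/a": "/b", "/b": "/a",   # two pages pointing at each other
    "/c": "/d", "/d": "/e",   # a chain that takes two hops to resolve
    "/e": "/e", "/f": "/e",   # a healthy self-reference and a direct pointer
}

for page in sorted(canonical_of):
    print(page, resolve_canonical(page, canonical_of))
```

Anything reported as a chain or loop is a candidate for repointing directly at the final, self-referencing page.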
If you publish content to multiple websites, pay special attention to the content that you’ve reused in each domain and, in particular, to potential duplication issues that might affect SEO performance. It may be worth adding canonical tags pointing to authoritative versions of pages simply as a safety measure if the rate of duplication is high or excessive.
Streamline the canonical tagging process by choosing the right content management system (CMS) for your website. Modern CMSes, like Contentful, do that by eliminating the need for manual coding interventions, along with the potential for human error. They also give you the ability to “override” the canonical tag that the CMS selects, which is particularly useful in situations where you have shared content across partner websites, or if you have content in syndication.
There’s no point creating great content if search engines can’t find it. With that in mind, canonical tagging should be a priority for your content strategy.
However, in legacy CMSes, users often need a level of technical expertise to publish and edit content. These systems can limit your ability to add canonical tags, and make it much harder to respond if SEO problems arise. If you discover a duplication issue, you might need to submit a ticket to your developers, and then wait for them to get around to amending the canonical — while your content slips down the search rankings.
Contentful’s headless CMS transforms the way you work with canonicals by giving nontechnical users easy access to, and direct control over, the content publication and editing process.
In Contentful, content and marketing teams can add canonical URLs to dedicated metadata fields quickly and easily, without any need for developer intervention once you’ve defined your content model to support this. You can do that in addition to other SERP-boosting metadata features, such as SEO rich text markup. Even better, as your website grows, you can leverage dynamic tagging to add canonicals to every page within your digital footprint, across different websites and in different languages — all with logic to support your content goals.
In other words, Contentful makes it simpler and faster for search engines to find your “real McCoy” content, for customers to engage with that content, and for you to achieve outstanding SEO at scale.