When to Canonicalize, Noindex, or Do Nothing With Similar Content


Think about your content the way you think about your own belongings. Are you carrying baggage you could get rid of? Are you holding on to things you want to keep but might repurpose or present in a different light?

It’s no different when it comes to website content. Most of us have sat down with a team to decide what content to cut from our website, only to realize that every page seems to serve someone: a specific prospect, an internal team, and so on.

Just as we try to keep our websites as lean as possible for content management purposes, we want to keep them lean for search engine crawlers.

We want their visits to our websites, ideally daily, to be quick and efficient.

Those visits should show them who we are and what we do and, for any content we cannot remove, how we want it labeled.

Fortunately, search engine crawlers want to understand our content as much as we want them to. We have two main ways to help them: canonicalizing content and noindexing content.

Be careful, though: getting this wrong can cause important website content to be misunderstood by search engine crawlers or ignored entirely.


Screenshot by author, July 2022

Canonical tags are a great way to tell search engines: “Yeah, we know that content isn’t that unique or valuable, but we have to have it.”

They can also work across domains, passing credit from your page to a page on another domain, or vice versa.

Either way, the canonical tag is your chance to show crawlers how you view your website content.

To use it, place a link element with rel="canonical" and an href pointing to the preferred URL in the head section of the page’s source code.
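For illustration, here is a minimal Python sketch, using only the standard library’s html.parser, that reads a page’s source and reports any canonical tag found in its head section. The sample page and the example.com URLs are hypothetical. A check like this can confirm that the tag actually landed in the head rather than the body.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr_map = dict(attrs)
            # rel may contain several space-separated tokens.
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            if "canonical" in rel_tokens and attr_map.get("href"):
                self.canonicals.append(attr_map["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html: str) -> list:
    """Return the canonical URLs declared in the page's head section."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


# Hypothetical color-variant page pointing at its parent product page.
page = """<!doctype html>
<html>
<head>
  <title>Classic Tee in Red</title>
  <link rel="canonical" href="https://www.example.com/t-shirts/classic-tee/">
</head>
<body><h1>Classic Tee in Red</h1></body>
</html>"""

print(find_canonicals(page))  # ['https://www.example.com/t-shirts/classic-tee/']
```

If the list comes back empty, or with more than one URL, the page is sending no signal or mixed signals and is worth a closer look.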

The canonical tag can be a great way to manage content that you know is duplicate or similar but that has to exist, whether to meet the needs of users on the site or because a slow-moving site maintenance team can’t remove it.

If you think this tag is a good fit for your website, examine your site and identify sections that have separate URLs but similar content (e.g., text, images, headings, title elements).

Website audit tools such as Screaming Frog and the Site Audit tool in Semrush are a quick way to see content similarities.

If you think there might be other culprits of similar content, you can dig deeper with tools like Similar Pages Checker and Siteliner, which will scan your site for similar content.
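Those tools use their own methods, which aren’t described here. As a rough illustration of how similarity between two pages can be measured, here is a Python sketch of one common approach: breaking each page’s text into overlapping three-word “shingles” and comparing the two sets with Jaccard similarity. The sample copy is hypothetical, and any cutoff you choose for “too similar” is a judgment call.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical product copy that differs only by color.
red = "Our classic cotton crew neck t-shirt in red. Machine washable and pre-shrunk."
blue = "Our classic cotton crew neck t-shirt in blue. Machine washable and pre-shrunk."

print(round(jaccard_similarity(red, blue), 2))  # 0.6
```

On short copy like this, a single changed word moves the score a lot, so the measure is more meaningful when run over a page’s full body text.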

Now that you have a good idea of the cases of similarity, you need to decide whether this lack of uniqueness warrants canonicalization. Here are some examples and solutions:

Example 1: Your site’s pages exist in both HTTP and HTTPS versions, or in both www and non-www versions.

The solution: Point canonical tags at the page version with the most links (external and internal), until you can redirect all the duplicate pages one by one.
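A minimal Python sketch of that interim step: mapping every HTTP/HTTPS and www/non-www variant of a URL onto one preferred version and building the canonical tag for it. It assumes, hypothetically, that HTTPS on www.example.com is the version with the most links; substitute whichever version that is for your site. The redirects remain the long-term fix.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: the version with the most links is HTTPS on the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def preferred_url(url: str) -> str:
    """Rewrite http/https and www/non-www variants to the preferred version."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.removeprefix("www.") != PREFERRED_HOST.removeprefix("www."):
        return url  # a different site: leave it alone
    path = parts.path or "/"
    # Keep the query string; drop any #fragment, which search engines ignore.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(preferred_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


for variant in ("http://example.com/shirts", "https://example.com/shirts",
                "http://www.example.com/shirts"):
    print(canonical_tag(variant))
# Each line prints: <link rel="canonical" href="https://www.example.com/shirts">
```

The href is HTML-escaped so that query strings containing “&” remain valid inside the attribute. The sketch needs Python 3.9 or later for str.removeprefix.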

Example 2: You sell very similar products whose pages have no unique copy, only slight variations in name, image, price, etc. Should you point the canonical tags of the specific product pages to the product’s parent page?

The solution: My advice here is to do nothing. These pages are unique enough to be indexed. Their unique names differentiate them, which may help them rank for long-tail keywords.

Example 3: You sell t-shirts but have a separate page for each color of each shirt.

The solution: Add canonical tags to the color pages that point to the parent folder page. Each color page is not a distinct product, just a very similar variation.
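Under a hypothetical URL structure in which each color page sits one folder below its product page (for example, /t-shirts/classic-tee/red/), the rule might be sketched in Python like this. The COLOR_SLUGS set is an assumption; pages whose last folder isn’t a known color, such as a separate long-sleeve product, keep a self-referencing canonical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical color slugs used as the last folder of variant pages.
COLOR_SLUGS = {"red", "blue", "black", "white", "green"}


def variant_canonical(url: str) -> str:
    """Point color-variant pages at their parent folder; others self-canonicalize."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[-1].lower() in COLOR_SLUGS:
        # Color variant: drop the color folder to get the parent product page.
        parent_path = "/" + "/".join(segments[:-1]) + "/" if len(segments) > 1 else "/"
        return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
    # Not a color variant: the page is its own canonical (fragment dropped).
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


print(variant_canonical("https://www.example.com/t-shirts/classic-tee/red/"))
# https://www.example.com/t-shirts/classic-tee/
print(variant_canonical("https://www.example.com/t-shirts/long-sleeve/"))
# https://www.example.com/t-shirts/long-sleeve/
```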

Use Case: Canonicalized Content That Was Unique Enough to Succeed

Building on the examples above, I want to explain that content that is only slightly different may sometimes still deserve to be indexed.

What if the shirts had child pages for different types, such as long sleeves and tank tops? Each is now a different product, not just a variation. As mentioned earlier, this can be effective for long-tail searches.

Here’s a great example: a car sales site that has pages for car brands, their models, and variants of those models (2Dr, 4Dr, V8, V6, deluxe edition, etc.). The initial assumption with this site was that all the variants were simply near-duplicates of the model pages.

You might wonder why we would risk annoying search engines with this near-duplicate content when we could canonicalize these pages to point to the model page as the representative page.

We did move in this direction, despite some concern about whether these pages could still succeed, and canonically tagged each variant to its respective model page.

Suppose you canonically tag variants to the parent model page. Even though you signal content importance and hierarchy to search engines, they may still rank the canonicalized page if the search is specific enough.

So what did we see?

We saw organic traffic increase to both child and parent pages. In my opinion, when child pages are given credit, the parent page appears to gain authority because it has many child pages that now receive “credit.”

Monthly traffic to all of these pages has increased fivefold.

Since September, when we revised the canonical tags, monthly organic traffic to this area of the site has grown fivefold, with 754 pages now driving organic traffic compared with 154 the year before.

Screenshot by author with Semrush, July 2022

Don’t Make These Canonicalization Mistakes

  • Pointing canonical tags at URLs that redirect before reaching the final page does you a huge disservice. It slows search engines down, forcing them to work out the importance of the content while hopping from one URL to the next.
  • Similarly, if you point canonical tags at target URLs that return 404 errors, you’re essentially pointing them at a wall.
  • Pointing canonical markup at the wrong page version (e.g., www vs. non-www, HTTP vs. HTTPS). As discussed above, website crawler tools may reveal that you have unintended duplicate versions of your site. Don’t mistakenly give importance to the weaker page version.
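All three mistakes can be caught from a crawl export. This Python sketch assumes, hypothetically, that you already have each URL’s status code, redirect target, and canonical href (for instance, exported from a site audit tool) in a dictionary, and that HTTPS on www.example.com is the preferred version. It flags canonicals that point at redirects, error pages, or the wrong site version.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class CrawledPage:
    status: int                      # HTTP status code returned for the URL
    canonical: Optional[str] = None  # href of the page's canonical tag, if any
    location: Optional[str] = None   # redirect target, for 3xx responses


def audit_canonicals(crawl: dict, scheme: str = "https",
                     host: str = "www.example.com") -> list:
    """Flag canonical tags that hit redirects, errors, or the wrong site version."""
    problems = []
    for url, page in crawl.items():
        target = page.canonical
        if not target:
            continue
        parts = urlsplit(target)
        if parts.scheme != scheme or parts.hostname != host:
            problems.append(f"{url}: canonical uses non-preferred version {target}")
        target_page = crawl.get(target)
        if target_page is None:
            problems.append(f"{url}: canonical target {target} was not crawled")
        elif 300 <= target_page.status < 400:
            problems.append(f"{url}: canonical target {target} redirects to {target_page.location}")
        elif target_page.status >= 400:
            problems.append(f"{url}: canonical target {target} returns {target_page.status}")
    return problems


# Hypothetical crawl export: URL -> what the crawler saw.
crawl = {
    "https://www.example.com/t-shirts/": CrawledPage(200, "https://www.example.com/t-shirts/"),
    "https://www.example.com/t-shirts/red/": CrawledPage(200, "https://www.example.com/shirts/"),
    "https://www.example.com/shirts/": CrawledPage(301, location="https://www.example.com/t-shirts/"),
    "https://www.example.com/t-shirts/blue/": CrawledPage(200, "https://www.example.com/retired/"),
    "https://www.example.com/retired/": CrawledPage(404),
    "https://www.example.com/t-shirts/green/": CrawledPage(200, "http://example.com/t-shirts/"),
}

for problem in audit_canonicals(crawl):
    print(problem)
```

In this sample, the red page’s canonical hits a 301, the blue page’s hits a 404, and the green page’s points at the HTTP, non-www version.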

Noindex?

You can also use the meta robots noindex tag to exclude similar or duplicate content entirely.

Placing the noindex tag (a meta element with name="robots" and content="noindex") in the head section of a page’s source code tells search engines not to index that page.
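To check which pages actually carry the directive, here is a minimal Python sketch built on the standard library’s html.parser. It collects the content of robots meta tags (and Googlebot-specific ones) and treats “none”, which Google documents as equivalent to “noindex, nofollow”, as a noindex as well. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attribute names arrive lowercased; values keep their original case.
        name = (attr_map.get("name") or "").lower()
        if name in {"robots", "googlebot"}:
            content = attr_map.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_noindexed(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    directives = robots_directives(html)
    return "noindex" in directives or "none" in directives


page = '<head><meta name="robots" content="noindex, follow"></head>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
print(is_noindexed(page))               # True
```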

Warning: Although the meta robots noindex tag is a quick way to remove duplicate content from ranking consideration, it can be dangerous to your organic traffic if you use it incorrectly.

In the past, this tag was used to prune large sites down to only their search-critical pages, so that crawl spend was as efficient as possible.

However, you want search engines to see all relevant content on the site to understand the site’s taxonomy and page hierarchy.

That said, if this tag doesn’t scare you too much, you can use it so that search engines index only what you consider to be fresh and unique content.

Here are a few ways noindexing could be considered a solution:

Example 1: To help your customers, you provide the manufacturer’s documentation, even though the manufacturer already publishes it on its own website.

The solution: Keep providing the documentation on your site to help your customers, but noindex these pages.

The content is already owned and indexed by the manufacturer, which probably has much more domain authority than you. In other words, you probably won’t be the website that ranks for this content.

Example 2: You offer several different but similar products. The only differences are color, size, quantity, etc. You don’t want to waste crawl spend.

The solution: Resolve this with canonical tags rather than noindex. A long-tail search could still generate qualified traffic, because a given page would remain indexed and able to rank.

Example 3: You have a lot of old products that you don’t sell much anymore and that are no longer a priority.

The solution: This scenario typically surfaces during a content or sales audit. If the products do little for the company, consider retiring them.

Consider either pointing these pages’ canonical tags at relevant category pages or redirecting them to those category pages. These pages have age and trust, may have links, and may have ratings.

Use Case: Don’t Sacrifice Rankings/Traffic for Crawl Spend

When it comes to our website, we know we want to put our best foot forward for search engines.

We don’t want to waste their crawl time, and we don’t want most of our content to appear to lack uniqueness.

In the example below, to keep somewhat similar product page content out of search engines’ view, meta robots noindex tags were placed on the child product variant pages during a domain transition/relaunch.

The graph below shows the total number of keywords that moved from one domain to another.

When the meta robots noindex tags were removed, the total number of ranking terms increased by 50%.

Screenshot by author with Semrush, July 2022

Don’t Make These Meta Robots Noindex Mistakes

  • Do not place a meta robots noindex tag on a page with inbound link value. Instead, permanently redirect the page to another relevant page on the site. Noindexing it will throw away the valuable link equity it has earned.
  • If you noindex a page that is included in the main navigation, footer, or supporting navigation, make sure the directive is “noindex, follow” rather than “noindex, nofollow”, so that search engines crawling the site can still follow the links on the noindexed page.
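Both rules can be checked against a page inventory. This Python sketch assumes a hypothetical inventory in which each page records its meta robots directives, whether it is linked from the main, footer, or supporting navigation, and a count of referring domains from a backlink tool. The field names and sample URLs are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class PageInfo:
    url: str
    robots: set = field(default_factory=set)  # lowercased meta robots directives
    in_navigation: bool = False               # linked from main, footer, or supporting nav
    referring_domains: int = 0                # hypothetical backlink metric


def noindex_warnings(pages) -> list:
    """Flag the two meta robots noindex mistakes described above."""
    warnings = []
    for page in pages:
        # "none" is shorthand for "noindex, nofollow".
        noindex = "noindex" in page.robots or "none" in page.robots
        nofollow = "nofollow" in page.robots or "none" in page.robots
        if not noindex:
            continue
        if page.referring_domains > 0:
            warnings.append(f"{page.url}: has inbound links; consider a 301 redirect instead")
        if page.in_navigation and nofollow:
            warnings.append(f"{page.url}: linked from navigation; use 'noindex, follow'")
    return warnings


pages = [
    PageInfo("https://www.example.com/manuals/x100/", {"noindex", "follow"}),
    PageInfo("https://www.example.com/legacy-widget/", {"noindex"}, referring_domains=12),
    PageInfo("https://www.example.com/shipping/", {"noindex", "nofollow"}, in_navigation=True),
]

for warning in noindex_warnings(pages):
    print(warning)
```

Here the noindexed manual page passes, while the linked legacy page and the navigation page using “nofollow” are flagged.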


Sometimes it’s hard to part with website content.

The canonical and meta robots noindex tags are a great way to preserve website functionality for all users while informing search engines.

In the end, be careful how you tag! It’s easy to lose search presence if you don’t fully understand the tagging process.

Featured Image: Jack Frog/Shutterstock


