Search engines have become smart enough to identify duplicate or near-duplicate content on a website and exclude it from indexing. They index only the master web page, denoted by a canonical tag.
This is not a new concept, but it deserves attention: the canonical tag has been part of Google's algorithm since 2009.
It’s the role of the website owner or SEO practitioner to provide the canonical URL for the master web page. If they don't, the page may be excluded from indexing with the status "Duplicate without user-selected canonical."
In other cases, Google may pick any one of the identical web pages as the canonical version, which could hurt your website’s organic reach.
What Is a Canonical Tag?
The canonical tag is also referred to as the canonical URL. It is an HTML element that helps search engine crawlers identify the master version of the content.
Many websites have duplicate or near-duplicate content, which makes it difficult for search engines to work out which version is the original. This matters because search engines index only the original copy of a web page.
A canonical URL helps search engines like Google distinguish the original web page from its duplicates.
Here is the code that declares the canonical version of a web page:
<link rel="canonical" href="https://yourdomain.com/blog/seo-strategy/" />
This is an HTML element and should be placed inside the <head>…</head> section of the page.
rel="canonical" -> tells search engines that the linked URL is the canonical version.
href="URL" -> the master (original) URL of the web page.
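To see how a crawler might read this tag, here is a minimal sketch using Python's standard html.parser module. It returns the href of the first link element whose rel includes "canonical". The names CanonicalFinder and find_canonical are hypothetical, and for brevity the sketch does not check that the tag actually sits inside <head>.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <link ... /> are routed here by the default
        # handle_startendtag, so both forms are covered.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL of an HTML page, or None if missing."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

A page with no canonical tag returns None, which corresponds to the "without user-selected canonical" case discussed below.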
How Does Google Choose the Canonical URL of a Web Page?
Indexing is a crucial part of SEO: only indexed content can rank in the SERP (search engine results page) and turn your website into a magnet for organic traffic.
Google aims to index only the master copy of any content and to exclude duplicate versions. To identify the original version, it takes the user-selected canonical (link rel="canonical") into account.
Our article on how a search engine works explains the indexing process in detail. Once crawling and rendering are complete, the search engine checks its index to determine the original (canonical) version of the web page.
Once it has identified the original, Google compares its choice with the user-selected canonical. If the user-selected canonical and Google's choice agree, the page is indexed.
If they disagree, Google excludes the page from indexing and reports it under a page indexing issue (previously known as a coverage issue): "Duplicate, submitted URL not selected as canonical."
If a page has no user-selected canonical at all, it may also be excluded from indexing, this time as "Duplicate without user-selected canonical."
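The decision flow above can be summarized as a toy Python function. This is only a simplified model of the process as described in this article, not Google's actual logic; the function name index_status and its parameters are hypothetical, and the returned strings mirror the Search Console statuses mentioned here.

```python
from typing import Optional


def index_status(page_url: str, user_canonical: Optional[str], google_canonical: str) -> str:
    """Toy model of the canonical check described above (not Google's real logic).

    page_url:         the URL being evaluated
    user_canonical:   href of the page's rel="canonical" tag, or None if missing
    google_canonical: the URL Google itself picked as canonical
    """
    if user_canonical is None:
        # No user-selected canonical: Google decides on its own.
        if google_canonical == page_url:
            return "Indexed"
        return "Duplicate without user-selected canonical"
    if user_canonical != google_canonical:
        # Google disagrees with the declared canonical.
        return "Duplicate, submitted URL not selected as canonical"
    # Both agree: the canonical page is indexed; duplicates pointing to it are not.
    if page_url == google_canonical:
        return "Indexed"
    return "Alternate page with proper canonical tag"
```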
Why Is the Canonical Tag Important in SEO?
1. Crawl Budget and Priority:
Every search engine crawler allocates a specific crawl budget to each website. Crawlers index the canonical page and exclude duplicate or similar content from indexing.
At the same time, when you provide a canonical tag for a potential page, search engines crawl it more frequently, while duplicate pages are crawled only rarely.
2. Rank a Specific Web Page for the Query:
Big websites often have several versions of their web pages to give a better user experience on different devices. For example, you can see three different UX/UI designs for the Amazon website.
So, many URLs can be associated with the same page, for example:
https://yourdomain.com/
https://m.yourdomain.com/
https://amp.yourdomain.com/
However, the page to be indexed and ranked should be the original version (https://yourdomain.com/), declared with the canonical tag.
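As an illustration, the Python sketch below maps a device-specific URL back to the main host and builds the matching canonical tag for that page's <head>. The m. and amp. subdomains are assumptions taken from the example URLs above, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed subdomains for device-specific versions (from the example URLs above).
VARIANT_PREFIXES = ("m.", "amp.")


def canonical_for_variant(url: str) -> str:
    """Map a mobile or AMP URL to the same path on the main host."""
    parts = urlsplit(url)
    host = parts.netloc
    for prefix in VARIANT_PREFIXES:
        if host.startswith(prefix):
            host = host[len(prefix):]  # drop the device-specific subdomain
            break
    # Keep path and query; drop any #fragment, which a canonical URL doesn't need.
    return urlunsplit((parts.scheme, host, parts.path or "/", parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{canonical_for_variant(url)}" />'
```

With this sketch, the AMP page https://amp.yourdomain.com/ would carry a canonical tag pointing to https://yourdomain.com/.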
3. To Pass Link Juice to the Potential Page:
When a website has duplicate or very similar content, for example dynamic URLs or HTTP/HTTPS variants, PageRank (link juice) can be split across the duplicates instead of passing to the potential web page.
When a canonical URL is set up for the master and duplicate web pages, link juice is consolidated on the master web page, helping it rank in the SERP.
4. To Avoid Coverage Issues:
There are several Google index coverage issues related to the canonical tag:
- Alternate page with proper canonical tag
- Duplicate without user-selected canonical
- Duplicate, submitted URL not selected as canonical
So, optimize your website with a proper canonical tag strategy.
Reasons for Duplicate Web Pages When the Content Isn't Duplicated:
Canonical optimization isn't needed only when you have duplicate content (copied internal content). Duplicate or near-duplicate issues can also arise for the following reasons.
1. URLs for Multiple Device Versions:
A website with a different design for each device can run into coverage issues, with pages excluded as duplicates:
- AMP and non-AMP versions of web pages
- Separate URLs for mobile, desktop, and tablet devices
2. Search Terms Appended to the Absolute URL:
Many websites offer a search box for products, services, or blog posts.
When a user searches for a term like "protein powder," the resulting URL and the absolute URL look like this:
https://yourdomain.com/?q=protein+powder (URL with search parameter)
https://yourdomain.com (Absolute URL)
So, a URL with the search parameter is a duplicate version of the absolute URL.
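One way to derive the canonical form of such a URL is to strip the search parameter. The Python sketch below assumes the site-search parameter is named q, as in the example; SEARCH_PARAMS and strip_search_params are hypothetical names. Other query parameters are kept, and the path is left as-is, so https://yourdomain.com/?q=protein+powder becomes https://yourdomain.com/.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed name of the site-search parameter ("q", as in the example above).
SEARCH_PARAMS = {"q"}


def strip_search_params(url: str) -> str:
    """Drop site-search parameters, keeping any other query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SEARCH_PARAMS
    ]
    # urlencode([]) returns "", so a URL with only a search term loses its "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```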
3. HTTP and HTTPS Versions of a Web Page:
Another cause is having both the HTTP and HTTPS versions in the crawl queue. If a user-selected canonical is not implemented, the HTTPS version could end up treated as a duplicate.
https://yourdomain.com/
http://yourdomain.com/
4. www and non-www Versions of Web Pages:
Most websites set up URL redirect resolution, so that either the www or the non-www version redirects to the absolute version of the website.
We suggest using the non-www version for your URLs.
https://www.yourdomain.com/
https://yourdomain.com/
5. Web Pages with and without Trailing Slashes:
A common issue we find is potential web pages excluded from indexing as "Alternate page with proper canonical tag." In most cases, one version of the absolute URL, with or without the trailing slash, is treated as the duplicate.
https://yourdomain.com/seo-strategy/
https://yourdomain.com/seo-strategy
6. Dynamic Web Pages Create Dynamic URLs:
A dynamic website creates multiple versions of a page as dynamic URLs, for example for products (size, color, etc.) or events (session IDs):
https://yourdomain.com/products?category=dresses&color=green
https://yourdomain.com/product/?size=medium
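For dynamic URLs, a common approach is to keep only the parameters that actually change the page content and drop the rest. The Python sketch below assumes a hypothetical allowlist containing only category, so color, size, and session parameters are treated as variants; it also sorts the kept parameters so that the same page always yields the same canonical URL regardless of parameter order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: parameters assumed to change the page's content.
# Everything else (color, size, session IDs, and so on) is treated as a variant.
CONTENT_PARAMS = {"category"}


def canonical_dynamic_url(url: str) -> str:
    """Keep only allowlisted query parameters, in a stable (sorted) order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Which parameters belong on the allowlist depends entirely on how your site is built; the list here is only an example.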
7. Multiple URLs for the Same Web Page:
Sometimes the blog's autosave setting saves two URLs (the published URL and the edited URL) before indexing. This can also cause a duplicate issue, which you need the canonical tag to solve.
https://yourdomain.com/canonical-tag/
https://yourdomain.com/what-is-canonical-tag/
How to Optimize Canonical Tags:
Setting rel=canonical in the HTML is not the only way to fix canonical issues, but it is the primary step to take when optimizing a web page.
Here are a few more methods you can use to fix canonical tag issues:
- HTTP header
- XML Sitemap
- 301 Redirect
- Internal Links
Canonical Tag in HTTP Header:
Files such as PDFs or e-books are not built in HTML, so they have no <head> section in which to set rel=canonical.
In such cases, you can implement the canonical tag in the HTTP header. You can also use the HTTP header to set the canonical for standard web pages.
Here is a sample of how a canonical tag looks in the HTTP header:
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://yourdomain.com/blog/canonical-tags/>; rel="canonical"
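As a minimal sketch of producing that header, here is a small Python WSGI application that serves a PDF with a canonical Link header. The canonical URL comes from the sample above; PDF_BYTES is a placeholder rather than a real PDF, and pdf_app and canonical_link_header are hypothetical names.

```python
from typing import Callable, Iterable

# Canonical page for the PDF (taken from the sample header above).
CANONICAL_URL = "https://yourdomain.com/blog/canonical-tags/"
# Placeholder bytes standing in for a real PDF file.
PDF_BYTES = b"%PDF-1.4\n% placeholder\n"


def canonical_link_header(url: str) -> str:
    """Format a Link header value that declares `url` as canonical."""
    return f'<{url}>; rel="canonical"'


def pdf_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app that serves the PDF with a canonical Link header."""
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(PDF_BYTES))),
        ("Link", canonical_link_header(CANONICAL_URL)),
    ])
    return [PDF_BYTES]
```

To try it locally, you could pass pdf_app to wsgiref.simple_server.make_server and inspect the response headers with curl -I. In production, the same header is usually configured in the web server or CDN instead.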
Canonical Web Pages in XML Sitemap:
The XML sitemap plays a major role in crawling and indexing web pages. Search engine crawlers treat the URLs in sitemap.xml as a signal of the master version, so you should keep only canonical web pages in sitemap.xml and exclude non-canonical ones.
Google stated, “We can’t guarantee that all the web pages in the sitemap are canonical, but when it comes to large websites, sitemaps are considered a checkpoint to validate canonical web pages.”
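A minimal Python sketch of that rule: given a hypothetical mapping from each URL on the site to the canonical URL it declares, it writes a sitemap that lists only self-canonical pages. It uses the standard xml.etree.ElementTree module and the sitemaps.org namespace; build_sitemap is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Dict[str, str]) -> str:
    """Return sitemap XML that lists only self-canonical URLs.

    `pages` maps each URL on the site to the canonical URL it declares.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, canonical in pages.items():
        if url != canonical:
            continue  # a duplicate pointing elsewhere: leave it out
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```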
301 Redirect of Duplicate Page to Canonical Page:
301 redirects are an effective way to fix duplicate issues that arise for the following reasons:
- www and non-www web pages
- HTTP and HTTPS versions
- URLs ending with or without trailing slashes
- Default content in index pages (such as index.php)
Example:
https://yourdomain.com/
https://www.yourdomain.com/
http://yourdomain.com/
https://yourdomain.com
https://yourdomain.com/index.php
Of the five URLs above, the canonical version is https://yourdomain.com/, so you should 301-redirect all the other URLs to the canonical web page.
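The redirect logic can be sketched as a URL normalizer. The Python code below assumes a hypothetical policy of HTTPS, non-www, a trailing slash on every path, and no index.php, which matches the canonical choice in the example. Because it appends a slash to every path, it would need an exception for file URLs such as PDFs.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL under an assumed policy: HTTPS, non-www, trailing slash, no index.php."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]         # www -> non-www
    path = parts.path or "/"              # bare domain -> "/"
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]  # /index.php -> /
    if not path.endswith("/"):
        path += "/"                       # add the trailing slash
    return urlunsplit(("https", host, path, parts.query, ""))  # force HTTPS


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    target = canonicalize(url)
    return None if target == url else target
```

In a web server, any request whose redirect_target is not None would be answered with status 301 and a Location header set to that URL.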
Canonical URL Optimization Using Internal Links:
Internal links help crawlers discover new web pages and help users navigate to informational pages. However, internal links can sometimes keep the canonical web page from being indexed and ranked.
When you place a link on anchor text, always point it to the canonical web page (absolute URL).
Linking to a variant of the canonical URL (for example, one missing the trailing slash, or the www or HTTP version) can cause canonical issues for the absolute URL.
So, always use the canonical URL when adding internal links to your content.
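To catch such links before publishing, you could scan a page's HTML for internal links that don't match the canonical form. The Python sketch below assumes yourdomain.com is the site's host and reuses an assumed policy of HTTPS, non-www, and a trailing slash. It skips relative and external links, and SITE_HOST, LinkCollector, and non_canonical_links are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit

SITE_HOST = "yourdomain.com"  # assumed canonical host (non-www)


def canonicalize(url: str) -> str:
    """Assumed policy: HTTPS, non-www, trailing slash (fragment kept)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", host, path, parts.query, parts.fragment))


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def non_canonical_links(html: str) -> List[Tuple[str, str]]:
    """Return (found, suggested) pairs for internal links that aren't canonical."""
    collector = LinkCollector()
    collector.feed(html)
    issues = []
    for href in collector.hrefs:
        host = urlsplit(href).netloc.lower()
        if host not in (SITE_HOST, "www." + SITE_HOST):
            continue  # skip external and relative links
        suggested = canonicalize(href)
        if suggested != href:
            issues.append((href, suggested))
    return issues
```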