What Is A Canonical URL?

Date First Published: 8th August 2022

Topic: Web Design & Development

Subtopic: SEO

Article Type: Computer Terms & Definitions

Difficulty: Medium

Difficulty Level: 7/10

CONTENTS

How To Implement Canonical URLs?
Guidelines For Canonicalisation

Learn more about what a canonical URL is in this article.

A canonical URL is the URL that search engines are told to treat as the preferred version of a page when that page is accessible by multiple URLs, such as the homepage being accessible by both ‘example.com’ and ‘example.com/index.html’. It is declared with the canonical link element, an HTML element. If the canonical URL is not specified, search engines will make the choice automatically based on factors such as HTTPS and page quality, or they may consider the URLs to be of equal importance, leading to duplicate content issues. Relying on search engines to choose the canonical URL automatically is not recommended as they might select a URL that the owner of the website does not want to be canonical. The canonical link element was introduced in February 2009 by Google, Microsoft, and Yahoo.

When the rel="canonical" link element is added to the head of an HTML page, search engines will treat the specified URL as canonical, and all other URLs for the page will be considered duplicates and crawled less frequently. The element is not shown on the rendered page and can only be seen by viewing the source code of the HTML page.

301 redirects can be used to deprecate a duplicate URL. These redirect traffic from one URL to another. For example, if a page is reachable by two URLs, with and without the 'www' prefix ('https://www.example.com' and 'https://example.com'), the URL without the 'www' prefix could be chosen as the canonical version and all requests for the 'www' version could be redirected to it. Without the redirect, these two URLs would be viewed as completely different pages by search engines even though they have the same content, which is why it is important to stick to one version and redirect the other version to it.
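The 'www' example above can be sketched in Python as follows. This is only an illustration of the decision a server would make before sending a 301 response, not a complete server configuration. The function name www_redirect_target is hypothetical, and the sketch assumes that the version without 'www' has been chosen as canonical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def www_redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 redirect should point to, or None if no redirect is needed.

    Hypothetical helper: assumes the host without the 'www' prefix is canonical.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        # Already on the canonical host, so no redirect is required.
        return None
    bare_host = host[len("www."):]
    # Keep the scheme, path, query, and fragment; only the host changes.
    return urlunsplit((parts.scheme, bare_host, parts.path, parts.query, parts.fragment))


print(www_redirect_target("https://www.example.com/page.html"))
print(www_redirect_target("https://example.com/page.html"))
```

A server using this logic would respond with status 301 and a Location header set to the returned URL whenever the result is not None.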

Note:

In CMSes, such as WordPress, canonical tags can be automatically added for users with plugins, such as Yoast SEO, making it easier for webmasters and reducing the chances of mistakes.

How To Implement Canonical URLs?

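For HTML documents, the canonical URL can be declared by adding a link element with rel="canonical" to the head of the document, with the href attribute set to the preferred URL, in the form <link rel="canonical" href="https://example.com/">.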

An example of an HTML document that uses the canonical link element inside the <head> tag can be seen below. In this example, the code could be used on a page with a URL of https://example.com/index.html to tell search engines that https://example.com without the 'index.html' is the preferred version of the webpage.

<!DOCTYPE html>

<html>

<head>

<title>HTML Canonical URL Document</title>

<link rel="canonical" href="https://example.com/">

</head>

<body>

</body>

</html>

This method can map an unlimited number of duplicate URLs, but adding the canonical link element can increase the size of the page. In addition, it only works for HTML pages since no other file types have the <head> tag.
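To check which canonical URL an HTML page declares, the link element can be read back out of the page source. The following is a minimal sketch using Python's standard html.parser module. The class and function names (CanonicalLinkParser, find_canonical_urls) are hypothetical and chosen only for this illustration.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a document."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before comparing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical_urls.append(attr_map["href"])


def find_canonical_urls(html: str) -> list:
    """Hypothetical helper: return all canonical URLs declared in the HTML source."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    parser.close()
    return parser.canonical_urls


sample = """<!DOCTYPE html>
<html>
<head>
<title>HTML Canonical URL Document</title>
<link rel="canonical" href="https://example.com/">
</head>
<body></body>
</html>"""

print(find_canonical_urls(sample))
```

The function returns a list rather than a single value so that a page declaring more than one canonical link element can be spotted.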

For non-HTML documents, such as PDF files, which have no head in which to place a canonical link element, the canonical URL can instead be set with the HTTP Link response header. It is sent with the response rather than being part of the URL. For example, if a file has a corresponding HTML page of the same name, the HTTP header returned with the file could specify the URL of that page as canonical. Whether it is working can be checked by viewing the HTTP response headers in the inspector of a web browser. Below is an example of the HTTP response headers of a PDF file with a corresponding HTML page with the URL of https://example.com/page.html.

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.com/page.html>; rel="canonical"
Content-Length: 4223

This method is beneficial as it does not increase page size and an unlimited number of duplicate URLs can be mapped. However, it can be complex to maintain the mapping on large websites.
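The value of the Link header can also be checked programmatically. The sketch below pulls the canonical URL out of a Link header value with a simplified regular expression. It does not implement the full header grammar (for example, it assumes parameters contain no commas), and the function name canonical_from_link_header is hypothetical.

```python
import re
from typing import Optional

# One link value: a target in angle brackets followed by its parameters.
_LINK_VALUE = re.compile(r"<([^>]*)>([^,]*)")
# The rel parameter, with or without surrounding double quotes.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]*)"?', re.IGNORECASE)


def canonical_from_link_header(header_value: str) -> Optional[str]:
    """Hypothetical helper: return the target marked rel="canonical", or None."""
    for match in _LINK_VALUE.finditer(header_value):
        target, params = match.group(1), match.group(2)
        rel = _REL_PARAM.search(params)
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None


header = '<https://example.com/page.html>; rel="canonical"'
print(canonical_from_link_header(header))
```

The header value passed in is the text after "Link:" in the response, as in the example headers above.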

Guidelines For Canonicalisation

The following guidelines should be followed for canonicalisation:

Do not use the robots.txt file for canonicalisation purposes as blocking search engines from crawling a URL will make them unable to see any canonical tags on pages.
Do not block the canonicalised URL from being indexed by using the 'noindex' tag. This tag is designed to exclude a page from the search results rather than marking it as canonical.
Specify absolute (full) URLs rather than relative (partial) URLs in the canonical link element (e.g. specify 'https://example.com/page/page-1.html' instead of '/page/page-1.html').
Do not point to a canonical URL that triggers a 4xx or 5xx error. Search engines do not index URLs that return HTTP errors as they do not work. As a result, they will ignore a canonical tag pointing to a page that triggers an error and choose a canonical URL automatically, which may not be the one intended. Also, do not point to a canonical URL that redirects to another URL.
Do not use multiple canonical link tags as it is likely that having more than one of them will cause search engines to ignore them. This is often caused by tags added at several different points, such as manually, by CMSes, and by plugins.
Only list canonical URLs in sitemap files. Do not list non-canonical pages in sitemaps as sitemaps are a useful method of notifying search engines of important pages of a website, not unimportant pages that should not be indexed by search engines. It is not guaranteed that search engines will consider the sitemap URLs to be canonical, but it gives them a hint.
Do not have a non-secure HTTP version of a page and then specify the secure HTTPS version as canonical. Instead, implement a 301 redirect from HTTP to HTTPS. Users should not be able to access both HTTP and HTTPS versions of a website.
Avoid canonical chains. These occur when a page specifies a canonical URL whose own page is then canonicalised to yet another page. For example, Page A is canonicalised to Page B, and Page B is in turn canonicalised to Page C. Page A should instead be canonicalised directly to Page C.
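Two of the guidelines above, using absolute URLs and avoiding canonical chains, can be checked with a short script. The following is a rough sketch only. It assumes the canonical declarations of a site have already been collected into a dictionary that maps each page URL to the canonical URL it declares, and the names is_absolute and canonical_chain are hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute(url: str) -> bool:
    """True if the URL has both a scheme and a host, e.g. 'https://example.com/page'."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


def canonical_chain(start: str, canonical_of: dict) -> list:
    """Follow canonical declarations from `start` and return the URLs visited in order.

    `canonical_of` is an assumed mapping of page URL -> declared canonical URL.
    A result longer than two entries indicates a canonical chain.
    """
    chain = [start]
    while True:
        target = canonical_of.get(chain[-1])
        # Stop at a page with no declaration or one that points to itself.
        if target is None or target == chain[-1]:
            return chain
        if target in chain:
            # The declarations loop back on themselves; record the repeat and stop.
            chain.append(target)
            return chain
        chain.append(target)


declared = {
    "https://example.com/a.html": "https://example.com/b.html",
    "https://example.com/b.html": "https://example.com/c.html",
    "https://example.com/c.html": "https://example.com/c.html",
}

print(is_absolute("/page/page-1.html"))
print(canonical_chain("https://example.com/a.html", declared))
```

In this sample data, the chain for a.html has three entries, which signals that a.html should declare c.html as canonical directly.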