What Is An XML Sitemap?

Date First Published: 26th July 2022

Topic: Web Design & Development

Subtopic: SEO

Article Type: Computer Terms & Definitions

Difficulty: Medium

Difficulty Level: 6/10

Learn more about what an XML sitemap is in this article.

Not to be confused with a hierarchical sitemap that helps visitors navigate a website.

An XML sitemap, also spelt site map, is an XML file that provides search engines with a list of a website's important pages, images, and videos. This helps search engines discover and crawl these pages more efficiently, and it is especially useful for pages that are isolated from the rest of the website's content. Additional information can be included about each URL, such as its priority, change frequency, and last modified date.

The Sitemaps protocol was introduced by Google in June 2005 and is also supported by other search engines, such as Bing and Yahoo. Anyone can create a sitemap by writing an XML file with the necessary information, uploading it to a web server, and then submitting it to Google Search Console or Bing Webmaster Tools.

If a website uses a CMS (e.g. WordPress), SEO plugins, such as Yoast, can update the sitemap file automatically, so that when a new page is added to the website, its URL is automatically added to the sitemap. Without a CMS, sitemaps can be created manually, but this is a time-consuming process; a sitemap generator, such as XML-Sitemaps.com, will save a lot of time. Sitemap generators work by scanning the links of a website and then generating an XML sitemap file containing those URLs.
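The link-scanning step that sitemap generators perform can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular generator (including XML-Sitemaps.com) is implemented: the `fetch` callback is a hypothetical stand-in for downloading a page, only `<a href>` links on the same domain are followed, and writing the collected URLs out as XML is not covered here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(
    start_url: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns HTML, or None on failure
    max_pages: int = 50_000,
) -> list[str]:
    """Breadth-first crawl of same-domain links; returns the URLs found, in discovery order."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # pages that cannot be fetched are left out of the sitemap
        found.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            # Resolve relative links against the current page and drop any #fragment.
            absolute, _ = urldefrag(urljoin(url, href))
            parts = urlparse(absolute)
            if parts.scheme in ("http", "https") and parts.netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found


if __name__ == "__main__":
    # A tiny in-memory "website" stands in for real HTTP requests.
    site = {
        "http://example.com/": '<a href="/page-2.html">Page 2</a> <a href="articles/page-3.html">Page 3</a>',
        "http://example.com/page-2.html": '<a href="http://example.com/">Home</a>',
        "http://example.com/articles/page-3.html": '<a href="https://other.example/">External</a>',
    }
    for found_url in crawl_site("http://example.com/", site.get):
        print(found_url)
```

Injecting `fetch` keeps the sketch testable without network access; a real generator would also need to respect robots.txt, handle redirects, and throttle its requests.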

An example of a sitemap file can be seen below.

```xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/articles/page-3.html</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://example.com/page-2.html</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://example.com/page-3.html</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
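A file like the example above can also be produced with Python's standard `xml.etree.ElementTree` module. The sketch below assumes each page is described by a dictionary with a required 'loc' key and optional 'lastmod', 'changefreq', and 'priority' keys; the function name `build_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
URL_TAGS = ("loc", "lastmod", "changefreq", "priority")  # child tags, in the usual order


def build_sitemap(entries: list[dict[str, str]]) -> str:
    """Return sitemap XML text for a list of page descriptions."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in URL_TAGS:
            if tag in entry:  # optional tags are only written when a value is given
                ET.SubElement(url, tag).text = entry[tag]
    ET.indent(urlset)  # pretty-print with two-space indentation (Python 3.9+)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        {"loc": "http://example.com/articles/page-3.html", "changefreq": "weekly", "priority": "0.5"},
        {"loc": "http://example.com/page-2.html", "changefreq": "weekly", "priority": "0.5"},
        {"loc": "http://example.com/page-3.html", "changefreq": "weekly", "priority": "0.5"},
    ]
    print(build_sitemap(pages))
```

One benefit of building the file with an XML library rather than by joining strings is that special characters in URLs, such as '&', are escaped automatically, which the Sitemaps protocol requires.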

The different XML tags of a sitemap can be seen in the table below.

XML Tag | Required or Optional | Purpose
<urlset> | Required | The sitemap file opens and closes with this tag, and its xmlns attribute references the current protocol standard.
<url> | Required | Each URL entry opens and closes with this tag; the remaining tags are nested inside it.
<loc> | Required | Contains the URL of the webpage. The URL must begin with 'http' or 'https' and include the domain name. Relative URLs, such as '/example/example.html', cannot be used.
<lastmod> | Optional | Informs search engines of the date the page was last modified. The date should be in the W3C Datetime format; the time can be omitted, leaving YYYY-MM-DD (e.g. 2022-07-26).
<changefreq> | Optional | Informs search engines how frequently the page is likely to change. This value does not control how often they crawl the page; it is only a hint. Search engines may use this information, but it is generally believed that they pay little attention to it and instead crawl pages based on how often they detect changes.

Setting a higher frequency than necessary will not result in a penalty. For example, if the sitemap file says that the contact page changes daily, but it only changes once per year, search engines will check it less often once they detect that the content has not changed over their last few visits. The 'changefreq' tag can be set to one of seven values: 'never', 'yearly', 'monthly', 'weekly', 'daily', 'hourly', and 'always'. Pages marked 'hourly' may be crawled less often than that, and marking a page 'never' does not stop search engines from crawling it; they may still crawl it occasionally.

<priority> | Optional | A number that tells search engines how important a page is compared with other pages on the same website. It ranges from 0.0 to 1.0, and the higher the number, the higher the priority. A website's homepage is generally given the highest priority of 1.0, and most other content is given 0.5. If no value is set, the default priority of 0.5 applies, but it is still recommended to set the values manually.

A page's priority has no effect on how it ranks against other websites in the search results, because search engines do not compare priority values between websites. Instead, search engines may use the priority value when choosing between URLs on the same website.
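The rules in the table can be turned into a simple check for a single entry. The sketch below is a hypothetical helper (`validate_entry`), not an official validator: it checks that <loc> is an absolute http or https URL, that <lastmod> has the shape of a W3C Datetime value starting with YYYY-MM-DD, that <changefreq> is one of the seven allowed values, and that <priority> lies between 0.0 and 1.0.

```python
import re
from datetime import date
from typing import Optional, Union
from urllib.parse import urlparse

CHANGEFREQ_VALUES = {"never", "yearly", "monthly", "weekly", "daily", "hourly", "always"}
# W3C Datetime: YYYY-MM-DD, optionally followed by a time and a time zone (e.g. T10:30:00+01:00).
W3C_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?")


def validate_entry(
    loc: str,
    lastmod: Optional[str] = None,
    changefreq: Optional[str] = None,
    priority: Union[str, float, None] = None,
) -> list[str]:
    """Return a list of problems with one <url> entry (an empty list means none were found)."""
    problems = []
    parts = urlparse(loc)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("loc must be an absolute http(s) URL that includes the domain name")
    if lastmod is not None:
        valid = W3C_DATETIME.fullmatch(lastmod) is not None
        if valid:
            try:
                date.fromisoformat(lastmod[:10])  # rejects impossible dates such as 2022-13-01
            except ValueError:
                valid = False
        if not valid:
            problems.append("lastmod should be a W3C Datetime value, e.g. 2022-07-26")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq must be one of: " + ", ".join(sorted(CHANGEFREQ_VALUES)))
    if priority is not None:
        try:
            value = float(priority)
        except (TypeError, ValueError):
            value = float("nan")
        if not 0.0 <= value <= 1.0:  # NaN fails this comparison, so it is also reported
            problems.append("priority must be a number from 0.0 to 1.0")
    return problems


if __name__ == "__main__":
    print(validate_entry("http://example.com/page-2.html", "2022-07-26", "weekly", "0.5"))
    print(validate_entry("/example/example.html", changefreq="fortnightly", priority="1.5"))
```

The date check only looks at the shape of the value and whether the calendar date exists; it does not validate the time-zone offset range.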

Is A Sitemap Required?

It is not strictly necessary to have a sitemap file. Search engines can crawl and discover most pages of a website as long as the pages are properly linked, meaning that they can be reached through internal links (e.g. from the website's navigation bar) and are not hidden. Listing URLs in a sitemap does not guarantee that search engines will crawl and index those webpages, but having a sitemap is still beneficial. A sitemap is especially useful for:

  • A large website with thousands of pages, as search engines might miss some new or recently modified pages.
  • A new website with no backlinks. Search engines may not discover a website if no other sites link to it, as most web crawlers find pages by following links from one page to another.
  • A website with a large number of media files, such as videos and images, that need to be crawled by search engines. If a sitemap lists them, search engines can discover and crawl these media files so that they can appear in the search engine results pages.
  • A website with isolated pages that are not internally linked and can only be reached by manually typing the URL.
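The last point can be illustrated by modelling a crawler that only follows internal links. In the sketch below, the link graph, the page paths, and the function name `find_isolated_pages` are hypothetical; any page that cannot be reached from the homepage by following links is reported as isolated, and such pages are the ones a sitemap helps search engines find.

```python
from collections import deque


def find_isolated_pages(links: dict[str, list[str]], homepage: str) -> set[str]:
    """Return the pages in `links` that cannot be reached from `homepage` by following links."""
    reachable = {homepage}
    queue = deque([homepage])
    while queue:  # breadth-first search, like a crawler following internal links
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return set(links) - reachable


if __name__ == "__main__":
    # Hypothetical website: each page maps to the pages it links to.
    site_links = {
        "/": ["/about.html", "/blog.html"],
        "/about.html": ["/"],
        "/blog.html": ["/blog/post-1.html"],
        "/blog/post-1.html": ["/blog.html"],
        "/landing-page.html": ["/"],  # links out, but nothing links to it
    }
    print(find_isolated_pages(site_links, "/"))  # the landing page is isolated
```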

Maximum Sitemap Size

Sitemaps can be no larger than 50 MB (52,428,800 bytes) uncompressed and cannot contain more than 50,000 URLs. If a sitemap exceeds either limit, webmasters will have to split the URLs into separate sitemaps. These limits are intended to prevent web servers from being overloaded by serving very large files.
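Splitting a large list of URLs can be sketched as follows. The Sitemaps protocol lets multiple sitemap files be listed in a sitemap index file (a <sitemapindex> element containing one <sitemap> entry per file), so that only the index needs to be submitted. The file names, the `base_url` parameter, and the `split_sitemaps` function are assumptions for illustration; the chunk size defaults to the 50,000-URL limit, and each file is checked against the 50 MB limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB = 52,428,800 bytes, measured uncompressed


def to_bytes(root: ET.Element) -> bytes:
    """Serialise an element as a UTF-8 XML document with an XML declaration."""
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


def split_sitemaps(urls: list[str], base_url: str, max_urls: int = MAX_URLS) -> dict[str, bytes]:
    """Split URLs into sitemap files of at most `max_urls` entries, plus a sitemap index.

    The file names and `base_url` (where the files will be hosted) are hypothetical.
    """
    files: dict[str, bytes] = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number, start in enumerate(range(0, len(urls), max_urls), start=1):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        data = to_bytes(urlset)
        if len(data) > MAX_BYTES:
            raise ValueError(f"sitemap {number} exceeds 50 MB; use a smaller max_urls")
        name = f"sitemap-{number}.xml"
        files[name] = data
        # The index lists each child sitemap by its absolute URL.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    files["sitemap-index.xml"] = to_bytes(index)
    return files


if __name__ == "__main__":
    page_urls = [f"https://example.com/page-{i}.html" for i in range(1, 120_001)]
    for name, data in split_sitemaps(page_urls, "https://example.com/").items():
        print(name, len(data), "bytes")
```

With 120,000 URLs, this produces three sitemap files (50,000, 50,000, and 20,000 URLs) and one index file that points to them.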

