Date First Published: 26th July 2022
Topic: Web Design & Development
Subtopic: SEO
Article Type: Computer Terms & Definitions
Difficulty: Medium
Difficulty Level: 6/10
Learn more about what an XML sitemap is in this article.
Not to be confused with a hierarchical sitemap, which is used to provide navigation for a website.
An XML sitemap, also spelt 'site map', is an XML file that provides search engines with a list of the important pages, images, and videos of a website. This helps search engines discover and crawl these pages more efficiently and is especially useful for pages that are isolated from the rest of the website's content. Additional information can be included about each URL, such as its priority, change frequency, and last modified date.
Although the Sitemaps protocol was introduced by Google in June 2005, other search engines, such as Bing and Yahoo, also support XML sitemaps. Anyone can create a sitemap by writing an XML file with the necessary information, uploading it to a web server, and then submitting it to Google Search Console or Bing Webmaster Tools. If using a CMS (e.g. WordPress), SEO plugins, such as Yoast, can be used to automatically update sitemap files so that when a new page is added to a website, a link to that page is automatically added to the sitemap file. If not using a CMS, sitemaps can be created manually, but this is a time-consuming process. Using a sitemap generator, such as XML Sitemaps.com, will save a lot of time. Sitemap generators work by scanning the links of a website and then generating an XML sitemap file with those URLs.
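The link-scanning step of a sitemap generator can be sketched roughly as follows. This is a minimal illustration rather than how any particular generator works: the `LinkCollector` class, the `internal_links` function, and the example.com page content are all assumptions made for the example, and a real generator would also download each page and follow the links it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return the absolute, same-host URLs linked from one page, in order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


# Hypothetical page content; a real generator would fetch this over HTTP.
sample_html = """
<a href="/about.html">About</a>
<a href="contact.html#form">Contact</a>
<a href="https://other.example.org/">Elsewhere</a>
<a href="mailto:hello@example.com">Email</a>
"""
print(internal_links("https://www.example.com/index.html", sample_html))
```

Links to other hosts and non-web links (such as `mailto:`) are skipped, because a sitemap lists pages of the website itself.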
An example of a sitemap file can be seen below.
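A minimal sketch of such a file is embedded in the Python string below. The example.com URLs, the date, and the tag values are placeholders chosen for illustration; the surrounding code only parses the string to confirm that it is well-formed and to list the URLs it contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content; the URLs, dates and values are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2022-07-26</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/contact.html</loc>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
"""

# Every sitemap tag lives in the protocol's XML namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SAMPLE_SITEMAP.encode("utf-8"))
locations = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(locations)
```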
The different XML tags of a sitemap can be seen in the table below.
XML Tag | Required or Optional | Purpose |
---|---|---|
<urlset> | Required | The sitemap file opens and closes with this tag and it references the current protocol standard. |
<url> | Required | Each URL entry starts and closes with this tag. |
<loc> | Required | The URL of the webpage is located in this tag. The URL must begin with 'http' or 'https' and contain the domain name. Relative URLs, such as '/example/example.html', cannot be used. |
<lastmod> | Optional | Informs search engines of the last modified date of the page. The entered date should be in the W3C Datetime format, which allows a date on its own as YYYY-MM-DD (e.g. 2022-07-26) or a full timestamp that includes the time and time zone. |
<changefreq> | Optional | Informs search engines of how frequently the page is likely to change. This value is only a hint and does not control how often they crawl the page. Search engines may use this information, but it is generally believed that they pay little attention to it and instead crawl pages based on how often they detect changes. Using a higher frequency than necessary will not cause penalisation. For example, if the sitemap file says that the contact page changes daily, but it only changes once per year, search engines will check it less often once they detect that the content has not changed over their last few visits. The 'changefreq' tag can be set to one of seven values: 'never', 'yearly', 'monthly', 'weekly', 'daily', 'hourly', and 'always'. Pages marked as 'hourly' may be crawled less often than that, and marking a page as 'never' does not block search engines from crawling it, so they might still crawl it occasionally. |
<priority> | Optional | A numerical value that tells search engines the priority of a page relative to other pages on the same website. It can range between 0.0 and 1.0. The higher the number, the higher the priority. The homepage of a website generally has the highest priority of 1.0 and most content will be 0.5. If no value is assigned, the protocol's default of 0.5 applies, but it is recommended to set priorities manually. The priority of a page has no effect on how a website ranks against other websites in the search results, as search engines do not compare these values between websites. Instead, they use the priority value when selecting between URLs on the same website. |
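The rules in the table above can be sketched as a small sitemap builder. This is an illustrative sketch only: the `build_sitemap` function, its dictionary input format, and the example.com URLs are assumptions made for the example, not part of the Sitemaps protocol or of any specific tool. It uses `xml_declaration`, which requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Build a <urlset> document from a list of dicts and return it as bytes.

    Each dict needs a 'loc' key; 'lastmod', 'changefreq' and 'priority' are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        # <loc> must be an absolute http(s) URL that includes the domain name.
        parts = urlparse(entry["loc"])
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"loc must be an absolute URL: {entry['loc']!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in CHANGEFREQ_VALUES:
                raise ValueError(f"unknown changefreq: {entry['changefreq']!r}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority must be between 0 and 1: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages used purely for illustration.
xml_bytes = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2022-07-26",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://www.example.com/contact.html", "changefreq": "yearly"},
])
print(xml_bytes.decode("utf-8"))
```

Passing a relative URL such as '/example/example.html' as `loc` raises a `ValueError` in this sketch, which mirrors the requirement that relative URLs cannot be used.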
It is not absolutely necessary to have a sitemap file. Search engines can crawl and discover most pages of a website as long as they are properly linked. This means that they can be found through internal links on a website (e.g. through the navigation bar of a website) and are not hidden. Listing URLs in a sitemap does not guarantee that search engines will crawl and index those webpages, but it is still beneficial to have a sitemap. A sitemap is most useful for pages that are isolated from the rest of the website's content and cannot be reached through internal links.
A sitemap file can be no larger than 50 MB (uncompressed) and cannot contain more than 50,000 URLs. If a sitemap exceeds either limit, webmasters will have to split the URLs into separate sitemaps. These limits are to ensure that web servers do not get overloaded by serving very large files.
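Splitting a long list of URLs to stay within these limits could look roughly like the sketch below. The `split_urls` and `within_size_limit` helpers and the 120,000 example.com URLs are hypothetical and serve only to illustrate the arithmetic.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB


def split_urls(urls, max_urls=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most max_urls each."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def within_size_limit(xml_bytes, max_bytes=MAX_BYTES):
    """Return True if a serialised sitemap is within the byte limit."""
    return len(xml_bytes) <= max_bytes


# Hypothetical website with 120,000 pages.
urls = [f"https://www.example.com/page-{n}.html" for n in range(120_000)]
chunks = split_urls(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file, and `within_size_limit` can be applied to the serialised bytes of each file as a final check.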