What is an XML sitemap?

An XML sitemap is a file that lists your site’s key URLs. Think of it as a map for search engines. It guides their crawlers to find and index your pages better. An XML sitemap is made for bots, not for people. An HTML sitemap, on the other hand, is for human visitors.

In addition, your sitemap can hold useful metadata. For example, it can show when a page was last changed. It can also list other language versions of a page. Plus, it can provide details about images and videos.

Why is an XML sitemap important for SEO?

A good sitemap is key for technical SEO. It is not a direct ranking factor, however. Its main job is to improve how search engines crawl your site.

Big websites often have “orphaned” pages. These pages have no internal links pointing to them, so crawlers that follow links may never reach them. A sitemap gives search engines a direct list of these URLs. It also helps them discover new websites and new pages much faster.

Furthermore, a sitemap helps you guide a search engine’s crawl budget. You should only include your valuable, indexable pages. This tells search engines where to focus their time. It helps get your most vital pages indexed quickly.

When should you use an XML sitemap?

Most websites can benefit from a sitemap. However, it becomes vital in certain cases.

  • Your site is large. Google suggests using one for sites with over 500 pages. On large sites, some pages might get missed by crawlers. A sitemap makes this less likely.
  • Your site is new. New sites have few external links. Search bots find pages by following links. A sitemap gives them a direct path to your new content.
  • Your site has rich media. Do you use many videos and images? A sitemap helps Google find and index this content correctly. This is also true for news sites.

You might not need one if your site is small (roughly 500 pages or fewer) and has great internal linking. Moreover, many modern content systems like WordPress or Wix create one for you.

The Anatomy of an XML Sitemap

To manage a sitemap, you must understand its code. An XML sitemap is a text file. It has a specific syntax that search engines must be able to read.

Core Structure

Every sitemap starts with an XML declaration. This line states the XML version and the character encoding, which must be UTF-8.

<?xml version="1.0" encoding="UTF-8"?>

Next is the <urlset> tag. This is the main container for all your page URLs. It also has a namespace that points to the sitemap protocol.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>

Essential Tags: <url> and <loc>

Inside the <urlset>, each page gets its own <url> tag. The <url> tag is a parent container for one URL’s information.

Inside each <url> tag, the <loc> tag is required. It holds the location, or URL, of the page. The URL must be the full, canonical version. For example, use https://www.example.com/page-name. Do not use relative URLs like /page-name/.

A simple entry looks like this:

<url>
  <loc>https://www.example.com/foo.html</loc>
</url>

The <lastmod> Tag

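The <lastmod> tag is optional but highly recommended. It tells search engines when the page’s content last changed. The value must use the W3C Datetime format. A plain date (YYYY-MM-DD) is the simplest form, and a full timestamp with a time zone offset is also valid.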

<url>
  <loc>https://www.example.com/foo.html</loc>
  <lastmod>2022-06-04</lastmod>
</url>

Accuracy is vital here. Google only uses this tag if it is consistently correct. If you update this date daily without changing the content, Google will learn to ignore it. The <lastmod> tag should reflect real content updates.
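
Putting the pieces above together, a complete minimal sitemap might look like the following sketch. The URLs and dates are placeholders. The second entry uses the longer timestamp form (date, time, and time zone offset), which the sitemap protocol also accepts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <loc> is the only required child -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2022-06-04</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/foo.html</loc>
    <!-- Full timestamp with a time zone offset -->
    <lastmod>2022-06-04T09:30:00+00:00</lastmod>
  </url>
</urlset>
```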

The Debate Over <priority> and <changefreq>

Two other tags, <priority> and <changefreq>, often cause confusion.

  • <priority> suggests a page’s importance from 0.0 to 1.0.
  • <changefreq> suggests how often a page changes (e.g., daily, weekly).

However, Google has stated that it ignores both of these tags.

These tags are leftovers from an older protocol. Modern crawlers are much smarter. They use hundreds of other signals to judge a page’s importance. Many tools still include these tags. This leads some people to think they are important. Focusing on them is a waste of time. Instead, focus on clean URLs and accurate <lastmod> dates.

A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to more intelligently crawl your site. It tells the crawler which files you think are important in your site and also provides valuable information about these files, such as when the page was last updated, how often the page is changed, and any alternate language versions of a page.
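
As a sketch of how alternate language versions can be declared, the example below uses xhtml:link elements inside each <url> entry. The page URLs and the English/German pairing are assumptions for illustration. Each language version gets its own entry, and every entry lists all versions, including itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/english/page.html</loc>
    <xhtml:link rel="alternate" hreflang="en"
                href="https://www.example.com/english/page.html"/>
    <xhtml:link rel="alternate" hreflang="de"
                href="https://www.example.com/deutsch/page.html"/>
  </url>
  <url>
    <loc>https://www.example.com/deutsch/page.html</loc>
    <xhtml:link rel="alternate" hreflang="en"
                href="https://www.example.com/english/page.html"/>
    <xhtml:link rel="alternate" hreflang="de"
                href="https://www.example.com/deutsch/page.html"/>
  </url>
</urlset>
```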

Types of Sitemaps

While the most common sitemap is for the pages on a website, you can create specialized sitemaps to help search engines discover specific types of content. These are all based on the standard XML sitemap format but include extra tags for specific content types.

Standard XML Sitemap

This is the most common type of sitemap. It lists the URLs of a website’s pages. It can also include optional metadata for each URL, such as:

  • <loc>: The URL of the page. (Required)
  • <lastmod>: The date the page was last modified.
  • <changefreq>: How frequently the page is likely to change (e.g., always, hourly, daily).
  • <priority>: The priority of this URL relative to other URLs on your site, on a scale from 0.0 to 1.0.

Video Sitemap

A video sitemap provides details about the video content on your pages. This helps search engines find and understand your videos, potentially showing them in video search results. Additional tags include:

  • <video:thumbnail_loc>: The URL of a thumbnail image for the video.
  • <video:title>: The title of the video.
  • <video:description>: A description of the video.
  • <video:duration>: The duration of the video in seconds.
  • <video:publication_date>: The date the video was published.
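
A minimal sketch of a video entry using the tags above is shown below. All URLs, titles, and values are placeholders. Two elements here are not in the list: the <video:video> wrapper that groups the video tags, and <video:content_loc>, which points to the media file. Google expects either <video:content_loc> or <video:player_loc> for each video.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <!-- The page that hosts the video -->
    <loc>https://www.example.com/videos/grilling-steaks.html</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/grilling.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>A short walkthrough of grilling steaks.</video:description>
      <!-- Location of the media file itself (placeholder) -->
      <video:content_loc>https://www.example.com/media/grilling.mp4</video:content_loc>
      <!-- Duration in seconds -->
      <video:duration>600</video:duration>
      <video:publication_date>2022-06-04T09:30:00+00:00</video:publication_date>
    </video:video>
  </url>
</urlset>
```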

Google News Sitemap

If you run a news publication, a Google News sitemap can help Google discover your news articles quickly. This is crucial for time-sensitive content. It has specific tags like:

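  • <news:publication>: The publication the article appears in. It contains a <news:name> tag for the publication’s name and a <news:language> tag for its language.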
  • <news:publication_date>: The date the article was published.
  • <news:title>: The title of the news article.

Only URLs for articles published in the last two days should be included, and the sitemap can contain no more than 1,000 URLs.
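
A news entry could be sketched as follows. The publication name, article URL, title, and date are placeholders. In a real file the publication date would fall within the last two days. The <news:news> wrapper and the <news:name> and <news:language> children of <news:publication> are part of the Google News sitemap format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/business/article55.html</loc>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <!-- Placeholder date; use the article's real publication date -->
      <news:publication_date>2022-06-04</news:publication_date>
      <news:title>Companies A and B in merger talks</news:title>
    </news:news>
  </url>
</urlset>
```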

Image Sitemap

An image sitemap helps search engines discover images on your site, which can lead to better visibility in image search results. For each page URL, you can list the images it contains. Important tags include:

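  • <image:loc>: The URL of the image. (Required)
  • <image:caption>: The caption for the image. (Deprecated by Google in 2022.)
  • <image:geo_location>: The geographic location of the image. (Deprecated by Google in 2022.)
  • <image:title>: The title of the image. (Deprecated by Google in 2022.)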
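
The sketch below shows one page that contains two images. The URLs are placeholders. Only <image:loc> is used because it is the one required image tag, and to my understanding Google has deprecated the caption, geo_location, and title tags. Each image sits in its own <image:image> wrapper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- The page on which the images appear -->
    <loc>https://www.example.com/sample1.html</loc>
    <image:image>
      <image:loc>https://www.example.com/images/photo.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://www.example.com/images/diagram.png</image:loc>
    </image:image>
  </url>
</urlset>
```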

Sitemap Index

A sitemap index is a file that lists multiple sitemap files. Think of it as a sitemap of sitemaps. 🗺️

You would use a sitemap index file in the following situations:

  • Large Websites: A single sitemap file has a limit. It cannot contain more than 50,000 URLs and must be no larger than 50MB when uncompressed. If your site exceeds these limits, you must break your URLs into multiple sitemap files. The sitemap index file is then used to list all these individual sitemaps.
  • Organization: You might want to organize your sitemaps by content type. For instance, you could have one sitemap for your blog posts, another for product pages, and a third for your videos. A sitemap index allows you to submit just one file to search engines, which then points them to all your organized sitemap files.

A sitemap index file uses a similar XML format to a standard sitemap but with different tags:

  • <sitemapindex>: The parent tag for the file.
  • <sitemap>: The parent tag for each sitemap listed in the index.
  • <loc>: The location (URL) of the individual sitemap file.
  • <lastmod>: The last modification date of the individual sitemap file.
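
A sitemap index built from these tags might look like the sketch below. The file names (sitemap-posts.xml and so on) are hypothetical and simply illustrate the split by content type described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <sitemap> entry per child sitemap file -->
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
    <lastmod>2022-06-04</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2022-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-videos.xml</loc>
    <lastmod>2022-05-20</lastmod>
  </sitemap>
</sitemapindex>
```

You would then submit only the index file, and search engines follow it to the individual sitemaps.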

How to Create an XML Sitemap

There are a few ways to create a sitemap. Choose a method that works for your site and skill level.

  1. Use a WordPress SEO Plugin. This is the easiest method for WordPress sites. Plugins like Yoast SEO or All in One SEO create dynamic sitemaps. They update automatically when you add or change content.
  2. Use an Online Generator. Tools like XML-Sitemaps.com are good for small, static sites. You enter your URL, and it creates a file for you to download. The main drawback is that the file is static. You must remake it after any site changes.
  3. Use a Desktop Crawler. Advanced users can use tools like Screaming Frog. These offer total control. You can crawl your site and create a clean sitemap. You can exclude pages based on many rules. This is ideal for large, complex websites.
  4. Manual Creation. You can write the code by hand in a text editor. This is only for tiny sites or developers. It is very easy to make mistakes. Automated methods are almost always better.

Submitting Your Sitemap

A great sitemap is useless if search engines don’t find it. You need to tell them where it is.

First, submit it to Google Search Console (GSC).

  1. Log in to your GSC account.
  2. Go to the “Sitemaps” report under the “Indexing” section.
  3. Enter your sitemap’s URL (e.g., sitemap.xml).
  4. Click “Submit.”

Second, add it to your robots.txt file. This file is at the root of your domain. Add this line, using your own sitemap’s full URL:

Sitemap: https://www.example.com/sitemap.xml

This acts as a signpost for all search engines, not just Google.

After you submit, check the Sitemaps report in GSC. It will show if Google fetched it successfully or if there are errors.

Common Sitemap Mistakes to Avoid

A bad sitemap can be worse than no sitemap. It can waste crawl budget and send mixed signals. Here are common errors.

  • Including the Wrong URLs. This is the biggest mistake. Your sitemap should be a clean list of your best pages. Never include non-canonical URLs, redirects (3xx), or error pages (4xx). Also, exclude pages with a “noindex” tag or pages blocked by robots.txt.
  • Size and Format Errors. A single sitemap can hold at most 50,000 URLs and must be no larger than 50MB uncompressed. For larger sites, use a sitemap index file to split it into smaller parts.
  • Fetch Errors. If GSC reports it “Couldn’t fetch” your sitemap, check your robots.txt file. Make sure you are not blocking access to the sitemap file itself.
  • Syntax Errors. Simple coding mistakes can break the whole file. Use an XML sitemap validator tool to check your code before you submit it.
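
One frequent syntax error is an unescaped special character in a URL. The sitemap protocol requires entity escaping, so an ampersand in a query string must be written as &amp;. The sketch below uses a made-up URL to show the valid form, with the broken form kept in a comment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Invalid if written with a raw ampersand:
         https://www.example.com/view?item=12&color=red -->
    <!-- Valid: the ampersand is escaped as an XML entity -->
    <loc>https://www.example.com/view?item=12&amp;color=red</loc>
  </url>
</urlset>
```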

Fixing sitemap errors is important. They often point to deeper technical problems on your website. Use the GSC report as a tool to find and fix these larger issues.

Key Takeaways

Follow these best practices for your sitemap.

  • Keep it clean. Only include your best, canonical, indexable URLs.
  • Keep it fresh. Use a dynamic method to update it automatically.
  • Use a sitemap index for large sites.
  • Use specialized sitemaps for vital image and video content.
  • Submit your sitemap to GSC and add it to your robots.txt.
  • Monitor the sitemap report in GSC for any errors.
  • Focus on what matters. Do not waste time on <priority> or <changefreq>.
