Building an XML Sitemap

An XML sitemap feed lists all of the pages on your website that you want the search engines to know about. In theory, search engines should be able to find all of your pages by following links, but a sitemap still helps make sure nothing is missed, and it lets you take advantage of the reporting that the search engines’ webmaster tools offer.

For SEO purposes, it is essential that you (a) build an XML sitemap and (b) keep it up-to-date in order to help improve spiderability and ensure that all the important pages on your site are crawled and indexed. XML sitemaps give the search engines a complete list of the pages you want indexed, along with supplemental information about those pages, including how frequently the pages are updated. This does not guarantee that all pages will be crawled or indexed, but it can help.

It’s worth pointing out that an XML sitemap is different from the HTML sitemap that you include on your site. XML sitemaps are feeds designed for search engines, not for people: they are simply lists of URLs, with some optional metadata about each one, meant to be read by search engine spiders. An HTML sitemap, on the other hand, is a web page on your site that is designed to be viewed by visitors and contains links to help them navigate your site.

XML sitemaps were designed to help sites that historically could not be crawled by the search engines (sites with dynamic content, Flash or Ajax) get their content spidered and listed in the index. That’s not to say that using an XML sitemap is a way around building a spiderable website, however, since all it does is hand a list of available URLs to the search engines. When creating a new site, you want to make sure that you are creating it from a sound search engine optimization standpoint. Creating a sitemap will not pass on any link popularity, nor will it help with subject theming.

An XML sitemap is written in XML (Extensible Markup Language), a markup language commonly used on the web that lets you define your own tags to describe and share structured information. The required XML tags are <urlset>, <url>, and <loc>: <urlset> wraps the entire file, <url> wraps the entry for each page, and <loc> contains that page’s URL.
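
As an illustration of that required structure, the following Python sketch uses only the standard library's xml.etree.ElementTree to check that a sitemap's root is <urlset> in the sitemaps.org namespace and that each <url> contains exactly one absolute <loc>. The function name check_required_tags is hypothetical, and this is a quick sanity check, not a full schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_required_tags(xml_text):
    """Return a list of problems with the required <urlset>/<url>/<loc> tags."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"root element is {root.tag}, expected urlset"]
    problems = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if not urls:
        problems.append("no <url> entries found")
    for number, url in enumerate(urls, start=1):
        locs = url.findall(f"{{{SITEMAP_NS}}}loc")
        if len(locs) != 1:
            problems.append(f"<url> #{number} has {len(locs)} <loc> tags, expected 1")
        elif not (locs[0].text or "").strip().startswith(("http://", "https://")):
            problems.append(f"<url> #{number} <loc> is not an absolute URL")
    return problems


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.bruceclay.com/in/</loc></url>"
    "</urlset>"
)
print(check_required_tags(sample))  # []
```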

Optional meta data tags are:

  • <lastmod> – last modified date
  • <changefreq> – how often the page changes (such as hourly, daily, monthly, never)
  • <priority> – how important the page is from 0 (the lowest) to 1 (the highest)

Site owners aren’t required to use these tags, but the engines may consult them when deciding how often they should re-crawl pages. Google states in their Webmaster Guidelines that while they take these tags into consideration, they do not base their spidering decisions on them, and <priority> does not have any influence on rankings. Use these tags accurately to help the search engines spider your site more intelligently. Pages that you are optimizing should be set to a higher priority. If you have archived pages that haven’t been updated in years, then they can be set to a low priority with a <changefreq> of “never”.
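
The sitemaps.org protocol spells out the allowed values for these optional tags: <changefreq> takes always, hourly, daily, weekly, monthly, yearly or never; <priority> is a number from 0.0 to 1.0; and <lastmod> uses the W3C Datetime format. Below is a minimal Python sketch of a value check. check_optional_tags is a hypothetical helper, and its date pattern is deliberately simplified. Passing the check only means the values are well formed; the engines still decide for themselves how much weight to give them.

```python
import re

# <changefreq> values defined by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Simplified W3C Datetime check: a full date (YYYY-MM-DD), optionally followed by
# a time with a timezone, e.g. 2008-01-01 or 2008-01-01T09:30:00+05:30.
# (The W3C format also allows YYYY and YYYY-MM, which this sketch rejects.)
LASTMOD_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def check_optional_tags(lastmod=None, changefreq=None, priority=None):
    """Return a list of problems with the optional tag values (empty if all look valid)."""
    problems = []
    if lastmod is not None and not LASTMOD_RE.fullmatch(lastmod):
        problems.append(f"<lastmod> {lastmod!r} is not a W3C date")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append(f"<changefreq> {changefreq!r} is not an allowed value")
    if priority is not None:
        try:
            value = float(priority)
        except (TypeError, ValueError):
            problems.append(f"<priority> {priority!r} is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append(f"<priority> {priority!r} is outside 0.0 to 1.0")
    return problems


print(check_optional_tags("2008-01-01", "monthly", "1.0"))  # []
print(len(check_optional_tags("01/01/2008", "sometimes", "1.5")))  # 3
```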

A complete XML sitemap containing a single URL entry looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.bruceclay.com/in/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
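
If you would rather generate the file than write it by hand, the following Python sketch builds this kind of listing with the standard library's xml.etree.ElementTree. The entry dictionaries and the build_sitemap name are hypothetical; the element names and namespace are the ones the protocol defines. ElementTree also escapes characters such as & inside URLs, which the sitemaps protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Build sitemap XML from dicts with a required "loc" key and optional tag keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required tag
        for tag in OPTIONAL_TAGS:
            if tag in entry:  # optional tags are written only when supplied
                ET.SubElement(url, tag).text = str(entry[tag])
    ET.indent(urlset)  # pretty-print (Python 3.9+)
    # tostring() escapes characters such as "&" in URLs.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


print(build_sitemap([
    {"loc": "https://www.bruceclay.com/in/", "lastmod": "2008-01-01",
     "changefreq": "monthly", "priority": "1.0"},
]))
```

ET.indent only affects whitespace; on Python versions older than 3.9 you can drop that line and the output is still valid, just unindented.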

If you don’t want to type that out for each of your site’s pages, fear not. There are quite a few sitemap generators that can spider your site and automatically build an XML sitemap file for you.

Be careful to configure the sitemap generator properly so that it does not include pages you do not want indexed.
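
One way to keep unwanted pages out is to filter the crawled URL list before writing the sitemap. The Python sketch below assumes a hypothetical list of exclusion patterns (admin, internal search and cart pages) and also skips off-site links, URLs with query strings and duplicates. Those rules are illustrative assumptions, so adapt them to your own site; urls_for_sitemap is likewise a hypothetical name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical exclusion rules: replace with the sections of your own site
# that should stay out of the sitemap.
EXCLUDE_PATH_PATTERNS = ["/admin/*", "/search*", "/cart*"]


def urls_for_sitemap(crawled_urls, allowed_host):
    """Keep on-site, parameter-free, non-excluded URLs, in crawl order, without duplicates."""
    kept, seen = [], set()
    for url in crawled_urls:
        parts = urlsplit(url)
        if parts.netloc != allowed_host:
            continue  # link to another host
        if parts.query:
            continue  # assumption: parameterized URLs are duplicates or non-canonical
        if any(fnmatchcase(parts.path, pattern) for pattern in EXCLUDE_PATH_PATTERNS):
            continue  # a page you do not want indexed
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


crawled = [
    "https://www.example.com/",
    "https://www.example.com/admin/users",
    "https://www.example.com/products?page=2",
    "https://other.example.org/",
    "https://www.example.com/about/",
    "https://www.example.com/",
]
print(urls_for_sitemap(crawled, "www.example.com"))
# ['https://www.example.com/', 'https://www.example.com/about/']
```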

For very large websites, your XML sitemap feed should be broken up into multiple files: Google limits each sitemap file to 50,000 URLs and 50MB (uncompressed). Once you have created the sitemap file, upload it to the root of your website (for example, http://www.your-domain-name.com/sitemap.xml). Then it’s time to let the search engines know about it. One way to do that is to specify your XML sitemap in your robots.txt file by adding a line with “sitemap:” followed by the full URL. It should look something like this:

User-agent: *
sitemap: http://www.your-domain-name.com/sitemap.xml
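
Large sites that need more than one file can tie them together with a sitemap index file, which the sitemaps.org protocol defines: a <sitemapindex> element containing one <sitemap> entry, with its own <loc>, per child sitemap. The Python sketch below splits a URL list into groups of at most 50,000, checks the 50MB (52,428,800-byte) uncompressed limit, and builds the index. The page URLs, the sitemap-N.xml file names and the helper names are hypothetical. The robots.txt sitemap: line can then point at the index file instead of a single sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000        # per-file URL limit
MAX_BYTES_PER_FILE = 52_428_800   # 50MB, uncompressed


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a URL list into consecutive groups of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def fits_size_limit(xml_text):
    """True if the uncompressed UTF-8 file stays within the size limit."""
    return len(xml_text.encode("utf-8")) <= MAX_BYTES_PER_FILE


def build_sitemap_index(sitemap_locations):
    """Build a <sitemapindex> file that points at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = location
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 pages and hypothetical sitemap-N.xml file names.
pages = [f"https://www.your-domain-name.com/page-{n}" for n in range(120_000)]
groups = chunk_urls(pages)
index_xml = build_sitemap_index(
    [f"https://www.your-domain-name.com/sitemap-{i}.xml" for i in range(1, len(groups) + 1)]
)
print([len(g) for g in groups])  # [50000, 50000, 20000]
```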

Google and Bing also offer engine-specific ways for you to alert them to your XML sitemap.

Google:

You can submit your sitemap through Google Search Console (formerly known as Webmaster Tools). This will allow you to see when Google last downloaded your sitemap and any errors that may have occurred. Once you have verified your site, you can also view information such as Crawl Errors (including pages that were not found or timed out), Search Analytics (top search queries, click-through statistics, etc.), and Links to Your Site. There are also other useful tools, like the robots.txt Analyzer.

Google Search Console is an incredibly valuable source of information that can help diagnose potential problems and give you a glimpse into the way Google views your website. Google now offers specialized sitemaps for Video, Mobile, News, and Code Search. These allow you to tell Google about news articles, videos, pages designed for mobile devices and publicly accessible source code on your website.

Bing:

Microsoft Bing has its own webmaster tools called Bing Webmaster Tools. Similar to Google’s, it allows you to add your XML sitemap feed and, once your site is validated, to view loads of information about your website.

Yahoo:

When you submit your XML sitemap feed to Bing Webmaster Tools, you are effectively submitting it to Yahoo as well, since Yahoo currently receives its search results from Bing.

Ask:

Ask supports sitemaps but requires that you reference your sitemap in your robots.txt file so that its crawler can find it. There is no tool for submitting an XML sitemap to Ask manually.
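
Because Ask (like any crawler that reads robots.txt) relies on that sitemap line, it is worth confirming the line parses as intended. Python's standard urllib.robotparser module exposes the declared sitemaps through RobotFileParser.site_maps() (Python 3.8+). A minimal sketch, assuming you already have the robots.txt text in hand; declared_sitemaps is a hypothetical helper name:

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs listed in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no sitemap line is present.
    return parser.site_maps() or []


robots_txt = """User-agent: *
sitemap: http://www.your-domain-name.com/sitemap.xml
"""
print(declared_sitemaps(robots_txt))  # ['http://www.your-domain-name.com/sitemap.xml']
```

To check the live file instead, RobotFileParser.set_url() followed by read() fetches and parses it before you call site_maps().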

There are many benefits to creating an XML sitemap. If you launch a new site, roll out a redesign or perform a large update, submitting a sitemap is a good way to alert the search engines to the new pages and potentially get them indexed sooner than if you just waited for the spiders to find them. Another benefit is the webmaster tools that Google and Bing have built around sitemaps. These tools can give you valuable information about how the search engines see your site and help diagnose any potential problems that could hinder your rankings.

Once you create your XML sitemap and let the search engines know about it, make sure to keep it up-to-date. If you add or remove pages, make sure your sitemap reflects that. You should also check Google Search Console frequently to ensure that Google is not finding any errors in your sitemap.
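
Keeping the sitemap in step with the site amounts to comparing two sets of URLs: the pages that exist now and the <loc> values in the current file. The Python sketch below assumes you can produce the list of live pages yourself (for example from your CMS or a crawl); sitemap_drift is a hypothetical helper that reports pages missing from the sitemap and stale entries that should be removed.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for ElementTree's namespace-aware find/findall.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the set of <loc> URLs listed in a sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def sitemap_drift(xml_text, current_pages):
    """Compare a sitemap with the pages that currently exist on the site."""
    listed = sitemap_locs(xml_text)
    current = set(current_pages)
    return {
        "missing": sorted(current - listed),  # live pages not yet in the sitemap
        "stale": sorted(listed - current),    # sitemap entries for removed pages
    }


sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/old-page/</loc></url>"
    "</urlset>"
)
live_pages = ["https://www.example.com/", "https://www.example.com/new-page/"]
print(sitemap_drift(sitemap_xml, live_pages))
# {'missing': ['https://www.example.com/new-page/'], 'stale': ['https://www.example.com/old-page/']}
```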

The goal of XML sitemaps is to help search engines crawl your site efficiently and thoroughly, so use these tags accurately to help them understand how best to crawl it. You can find more information about the Sitemaps protocol and XML schema at http://www.sitemaps.org.



BRUCE CLAY INDIA PVT LTD
BHive, 94. Ishwar Nagar, Shambhu Dayal Bagh,
Baghpur, Okhla, New Delhi – 110020,
INDIA