BACK TO BASICS: Building an XML Sitemap
When we're speaking with new clients about search engine optimization best practices, one of the issues we discuss with them is the importance of building a Sitemap XML feed. What this feed does is list all of the pages on your Web site that you want the search engines to know about. While theoretically the search engines should be able to find all of your pages by following links, it still helps to have a Sitemap in place for completeness and to take advantage of the benefits that the search engines' webmaster tools offer.
For SEO purposes, it is essential that you (a) build an XML Sitemap and (b) keep it up-to-date in order to help improve spiderability and ensure that all the important pages on your site are crawled and indexed. Sitemaps give the search engines a complete list of the pages you want indexed, along with supplemental information about those pages, including how frequently the page is updated. This does not guarantee that all pages will be crawled or indexed, but it can help get your site spidered.
It's worth pointing out that an XML Sitemap is different from the standard site map that you include on your site. XML Sitemaps are feeds designed for search engines; they're not for people. They are merely lists of URLs with some optional metadata about them, meant to be spidered by a search engine. A site map, on the other hand, is a Web page on your site that is designed to be viewed by visitors and contains links to help them navigate your site.
Sitemaps were designed to help sites that historically could not be crawled by the search engines (sites with dynamic content, Flash or Ajax) get their content spidered and listed in the index. That's not to say that using an XML Sitemap is a way around building a spiderable Web site, however, since all it does is hand a list of available URLs to the search engines. When creating a new site, you want to make sure that you are building it from a sound search engine optimization standpoint. Creating an XML Sitemap will not pass on any link popularity, nor will it help with subject theming.
A Sitemap is created using XML (Extensible Markup Language), which is a type of markup language commonly used on the Web where tags can be created to share information. The required XML tags are: <urlset>, <url>, and <loc>. <urlset> and <url> are for formatting the XML, and <loc> is the URL.
Optional metadata tags are:
- <lastmod> - last modified date.
- <changefreq> - how often the page changes (such as hourly, daily, monthly, never).
- <priority> - how important the page is from 0 (the lowest) to 1 (the highest).
Site owners aren't required to use these tags, but the engines may consult them when deciding how often they should re-crawl pages. Google states in their Webmaster Guidelines that while they take these tags into consideration, they do not base their spidering decisions on <changefreq> and that <priority> does not have any influence on your rankings. Use these tags accurately to help the search engines spider your site more intelligently. Pages that you are optimizing should be set to a higher priority. If you have archived pages that haven't been updated in years, then they can be set to a low priority with a <changefreq> of never.
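Because these tags only help when they are filled in accurately, it can be worth checking them before publishing. The following is a minimal sketch, not part of any official tool: the function name check_metadata is made up for illustration, the list of <changefreq> values is the one defined by the Sitemaps protocol, and the <lastmod> check assumes the simple YYYY-MM-DD form only (the protocol also allows full timestamps, which this sketch would reject).

```python
from datetime import date

# changefreq values defined by the Sitemaps protocol
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_metadata(lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one URL's optional metadata.

    Hypothetical helper: all three arguments are optional, mirroring
    the optional <lastmod>, <changefreq> and <priority> tags.
    """
    problems = []
    if lastmod is not None:
        try:
            # Assumption: only the date-only form (YYYY-MM-DD) is checked.
            date.fromisoformat(lastmod)
        except ValueError:
            problems.append(f"lastmod is not a YYYY-MM-DD date: {lastmod!r}")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append(f"priority must be between 0 and 1: {priority!r}")
    return problems
```

An empty list means nothing suspicious was found; anything else is a human-readable description of what to fix.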
A Sitemap XML listing for a URL looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.bruceclay.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
If you don't want to have to type that out for each of your sites' pages, fear not. There are quite a few Sitemap Generators which will spider your site and build it for you. Some of our favorites include:
Be careful to set up the Sitemap Generator tool properly to avoid spidering pages you do not want indexed.
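If you already have a list of your pages (from a CMS export, for example), a short script can also produce the feed without spidering anything. This is only a sketch under stated assumptions: build_sitemap and the dictionary layout are invented for illustration, and the script does not crawl your site or decide which pages belong in the feed. It uses Python's standard xml.etree.ElementTree module, which takes care of escaping characters such as & in URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a Sitemap XML string from a list of page dicts.

    Hypothetical input format: each dict needs a "loc" key;
    "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit tags in the same order as the listing format.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {
            "loc": "http://www.bruceclay.com/",
            "lastmod": "2008-01-01",
            "changefreq": "monthly",
            "priority": 1.0,
        },
    ]))
```

The output is written on a single line rather than indented, which is fine for search engines since the feed is not meant for human readers.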
For very large Web sites, your Sitemap XML feed should be broken up into multiple files, as Google has set a limit of 50,000 URLs and a file size of 10 MB per Sitemap file. Once you have created the Sitemap file(s), upload them to the root of your Web site (e.g. http://www.your-domain-name.com/sitemap.xml). Once this is done, it's time to let the search engines know about it. One way you can do that is to specify your Sitemap in your robots.txt file by simply putting sitemap: and the URL. It should look something like this:
User-agent: *
sitemap: http://www.your-domain-name.com/sitemap.xml
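To double-check that the line made it into your robots.txt as intended, you can pull the Sitemap entries back out of the file. The sketch below is a simplified reader, not a full robots.txt parser: sitemap_urls is a made-up name, and the function only looks for sitemap: lines, matching the directive name regardless of capitalization.

```python
def sitemap_urls(robots_txt):
    """Return the URLs declared on sitemap: lines in robots.txt text.

    Hypothetical helper; it ignores every other directive in the file.
    """
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


if __name__ == "__main__":
    example = "User-agent: *\nsitemap: http://www.your-domain-name.com/sitemap.xml\n"
    print(sitemap_urls(example))
```

An empty result means the file has no Sitemap line for the search engines to pick up.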
Google, Yahoo and MSN also offer other engine-specific ways for you to alert them to your Sitemap feed.
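For the large-site case mentioned earlier, splitting a long URL list into files that each stay within the 50,000-URL limit is a simple chunking job. This is a rough sketch: split_into_files and the sitemap-1.xml naming pattern are assumptions for illustration, and it only enforces the URL count, not the file-size limit, which you would need to check separately.

```python
MAX_URLS_PER_FILE = 50_000  # URL limit per Sitemap file quoted in this article


def split_into_files(urls, limit=MAX_URLS_PER_FILE):
    """Yield (filename, chunk) pairs, each chunk holding at most `limit` URLs.

    The file names are hypothetical; use whatever naming suits your site.
    """
    for index, start in enumerate(range(0, len(urls), limit), start=1):
        yield f"sitemap-{index}.xml", urls[start:start + limit]


if __name__ == "__main__":
    # Made-up URLs, just to show how a 120,000-page site would be split.
    urls = [f"http://www.your-domain-name.com/page-{i}" for i in range(120_000)]
    for name, chunk in split_into_files(urls):
        print(name, len(chunk))
```

Each chunk can then be written out as its own Sitemap file and listed on its own sitemap: line in robots.txt.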
There are many benefits to creating an XML Sitemap. If you launch a new site, issue a redesign or perform a large update, Sitemaps are a good way to alert the search engine to the new pages and potentially get these pages indexed sooner. Another benefit to Sitemaps is the webmaster tools Google and MSN have built around them. These tools can give you valuable information about how the search engines see your site and help diagnose any potential problems that could hinder your rankings.
Once you have created your XML Sitemap and let the search engines know about it, make sure to keep it up-to-date. If you add or remove a page, make sure your Sitemap reflects that. You should also check Google Webmaster Tools frequently to ensure that Google is not finding any errors in your Sitemap.
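One way to spot a Sitemap that has fallen out of date is to compare its <loc> entries against the pages that currently exist on the site. The sketch below assumes you can supply that list of live URLs yourself (from your CMS or a crawl); sitemap_drift and the keys of the result are invented names, and URLs are compared as exact strings, so differences such as a trailing slash count as mismatches.

```python
import xml.etree.ElementTree as ET

# Namespace used by Sitemap files, needed to look up <url> and <loc>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_drift(sitemap_xml, live_urls):
    """Compare <loc> entries in a Sitemap with the URLs currently on the site.

    Hypothetical helper: `live_urls` is whatever list of current pages
    you can produce; this function does not crawl anything.
    """
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }
    live = set(live_urls)
    return {
        "missing_from_sitemap": sorted(live - listed),  # pages added since
        "stale_in_sitemap": sorted(listed - live),      # pages removed since
    }
```

If both lists come back empty, the Sitemap and the site agree; otherwise the result tells you which entries to add or remove.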
The goal of Sitemaps is to help search engines crawl smarter. So help them by using these tags appropriately to help them understand how to best crawl your site. You can find more information about the Sitemaps protocol and XML schema at http://www.sitemaps.org.