BACK TO BASICS: An SEO's Guide to Using the Canonical Element
In early 2009, the three search engine giants, Google, Yahoo! and Microsoft, came together in a rare collaboration to create the canonical element, a useful new search engine optimization tool for a webmaster's toolbox. The canonical element lets a webmaster specify which Web page URL should be indexed as the original, or canonical, version of a page that may have multiple URLs within a single domain. However, search engines treat this information only as a suggestion, and for this reason the canonical element is not a cure-all for duplicate content.
Solving Duplicate Content Problems
The new canonical element (or "canonical tag," as it's generally referred to) addresses a common problem: Web sites that just cannot avoid having duplicate pages within their domain. Identical content pages referenced by several URLs within a site can cause SEO problems for the site. Search engines may crawl all of the pages separately and then decide on their own which one should be delivered in search results. If there are links pointing to all of the various URLs, the page's link popularity value is split between several weaker URLs, rather than concentrated on a single page. This dilution can hurt a Web site's rankings in search results. In addition, search results pages may display long URLs that are not user-friendly, which reduces the likelihood that searchers will click the listing.
What causes duplicate pages within a site? The problem may result from poor site design, such as when a page is copied into more than one directory. More commonly, though, the culprit is a non-SEO-friendly content management system (CMS). A CMS may construct long URL strings containing parameters or other variables that all identify the same page of content. Or, the CMS may build separate pages that have nearly identical content, showing only slight variations due to the sorting or category choices the user may have made.
For example, a site that sells power tools wants to have only one page describing a particular Makita cordless drill. But a wayward CMS might deliver that page with a variety of different URLs depending on how the user navigated to the page and other variables. Or the CMS may build multiple versions of that page which contain all the same content except for a few words. The list of possible URLs for the page could go on and on:
The best way to eliminate duplicate pages on a Web site is to not create them in the first place. The SEO best practice is to have just one URL for each content page, and have that URL be as clean as possible (i.e., free of all non-essential parameters). So ideally, each page's URL should be normalized at the source, by setting up the CMS to build user-friendly, non-variable URLs. However, SEO takes a backseat with most content management systems, and the webmaster often does not have this level of control.
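As a rough illustration of what normalizing a URL at the source can mean, the Python sketch below strips non-essential query parameters so that every variant collapses to one clean URL. It assumes, hypothetically, that only an `item` parameter identifies the product and that parameters such as `sessionid` or `sort` are non-essential; a real CMS would need its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the only query parameter that actually identifies a product page.
ESSENTIAL_PARAMS = {"item"}


def normalize_url(url):
    """Return url with non-essential parameters and any #fragment removed."""
    parts = urlsplit(url)
    # Keep only essential parameters, sorted so that parameter order can never
    # produce two different URLs for the same page.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in ESSENTIAL_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # host names are case-insensitive
        parts.path,
        urlencode(kept),
        "",  # drop the fragment, which is never sent to the server anyway
    ))
```

For example, `normalize_url("http://WWW.SeriousPowerTools.com/product.php?sessionid=abc&item=MakitaDrill1234&sort=price#reviews")` returns `http://www.seriouspowertools.com/product.php?item=MakitaDrill1234`.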
Search engines give site owners several ways to resolve duplicate page URLs. For one, sites can submit their list of preferred URLs to the search engines through an XML Sitemap. Primarily, the Sitemap makes the search engines more aware of the full content of a Web site, which should lead to more complete spidering of its pages. It also helps to clarify which URLs the site wants to have indexed, since they appear in the Sitemap while other duplicate URLs do not. This method is only an aid, however; search engines may or may not use the information to influence their indexing decisions.
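A minimal sketch of generating such a Sitemap in Python, listing only the preferred URLs and leaving the duplicates out. It uses the `urlset`, `url` and `loc` elements and namespace from the Sitemaps protocol; optional elements such as `lastmod` are omitted, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls):
    """Return XML Sitemap text that lists only the preferred (canonical) URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in preferred_urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" as "&amp;", which the protocol requires.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling `build_sitemap(["http://www.seriouspowertools.com/product.php?item=MakitaDrill1234"])` yields a one-entry Sitemap; none of the drill page's duplicate URLs appear in it.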
The more trustworthy solution is to set up 301 redirects pointing duplicate page URLs to the main, or canonical, version of that page. For example, sites typically use a 301 redirect to point the non-www version of their domain (e.g., seriouspowertools.com) to the www version (www.seriouspowertools.com). This technique seamlessly sends users and search engine spiders that request a particular URL to the correct version of the page. The redirect happens at the server, so they never see any content at the originally requested URL. This makes a 301 redirect extremely reliable for search engine optimization, since it clearly tells the search engine that the page has been permanently moved. The engines respond by transferring the old URL's link popularity and rankings to the new page location.
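Many sites configure this redirect in the Web server itself; the Python sketch below shows the same logic at the application level as WSGI middleware, under the assumption that the bare domain `seriouspowertools.com` should always go to `www.seriouspowertools.com`. The host constants and the `www_redirect` name are hypothetical. Other hosts, such as a news subdomain, pass through untouched.

```python
# Hypothetical host names, following the article's example domain.
BARE_HOST = "seriouspowertools.com"
CANONICAL_HOST = "www.seriouspowertools.com"


def www_redirect(app):
    """Wrap a WSGI app so requests for the bare domain get a 301 to the www host."""

    def wrapper(environ, start_response):
        # HTTP_HOST may carry a port ("seriouspowertools.com:80"); compare the name only.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == BARE_HOST:
            # Rebuild the same path and query string on the preferred host.
            path = (environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            query = environ.get("QUERY_STRING", "")
            location = "http://" + CANONICAL_HOST + path + ("?" + query if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Requests already on the preferred host are served normally.
        return app(environ, start_response)

    return wrapper
```

To try it locally, the wrapped app can be served with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, www_redirect(my_app))`, where `my_app` stands for the site's existing WSGI application.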
A 301 redirect still offers the best way to consolidate extra or outdated URLs on a site. However, sometimes webmasters cannot create 301s for every variation of their page URLs any more than they can force their CMS to follow the desired single-URL-per-page standard. The canonical tag offers a "last resort" solution for those cases.
Using a Canonical Tag
The canonical tag is a tag that a webmaster can add to the Head section of a Web page. This line of HTML code simply specifies the preferred, or canonical, URL for that page. The canonical tag has this format:
<link rel="canonical" href="http://www.mydomain.com/page" />
Using the examples given in the previous section, if the preferred URL for the Makita cordless drill page is:
http://www.seriouspowertools.com/product.php?item=MakitaDrill1234
then each of the other URL versions of the page could have the following tag inserted in its Head section:
<link rel="canonical" href="http://www.seriouspowertools.com/product.php?item=MakitaDrill1234" />
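Because the same tag has to appear on every duplicate, it is usually easier to have the page template emit it than to add it by hand. Here is a hypothetical Python template helper as a sketch; the `PREFERRED_URL` pattern is an assumption based on the example URL above. It builds the tag from the product ID alone, so the requested URL's extra parameters never leak into it.

```python
from html import escape
from urllib.parse import quote

# Assumed URL pattern for the preferred product page (from the example above).
PREFERRED_URL = "http://www.seriouspowertools.com/product.php?item={item}"


def canonical_link_tag(item):
    """Return the canonical <link> element for a product's Head section.

    The tag is built from the product ID alone, not from the URL that was
    requested, so every duplicate URL of the same product renders the same tag.
    """
    href = PREFERRED_URL.format(item=quote(item, safe=""))
    # Escape the URL for use inside an HTML attribute value.
    return '<link rel="canonical" href="{}" />'.format(escape(href, quote=True))
```

`canonical_link_tag("MakitaDrill1234")` produces exactly the tag shown above.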
Limitations to the Canonical Tag
When a search engine reads the canonical tag, it immediately knows what the site's preferred URL is for that page. Though this is similar to a 301 redirect, the canonical tag is not as foolproof. It occurs at the page level, not preemptively at the server, so the search engine spider sees it after arriving at the requested URL. Search engines consider the canonical tag to be a suggestion, not an absolute redirection. Matt Cutts, head of Google's Web spam team, said in a search industry presentation, "This is a hint, not a directive/mandate/requirement. Search engines choose when to use the suggestion."
Therefore, it remains the engine's choice whether to index only the URL shown in the canonical tag, or to go ahead and index more than one URL. However, any opportunity a site owner has to influence how search engines will index their pages is a good thing.
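To see why the tag only takes effect once the spider has already fetched the requested URL, here is a rough sketch of how a crawler might read it, using Python's standard `html.parser`. The class and function names are hypothetical, and resolving a relative `href` against the page URL is an assumption of this sketch. It only extracts the hint; whether the engine honors it remains the engine's decision.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # A self-closing <link ... /> also arrives here: the default
        # handle_startendtag calls handle_starttag.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(page_url, html):
    """Return the page's suggested canonical URL, or None if it has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return None
    return urljoin(page_url, finder.canonical)
```

Fed the HTML of any duplicate drill page carrying the tag shown earlier, `find_canonical` returns `http://www.seriouspowertools.com/product.php?item=MakitaDrill1234`.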
Another limitation is that the canonical element resolves duplicate content problems within a domain, not between domains, though it can be used across subdomains. So a canonical element works for news.mydomain.com and www.mydomain.com, but it would not apply to duplicate pages on different sites, such as www.mydomain.com and www.otherdomain.com.
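The same-domain rule can be expressed as a simple check. The Python sketch below is a deliberate simplification: it treats the last two labels of a host name as the domain, which is wrong for suffixes such as .co.uk (a real check would consult the Public Suffix List). The function names are hypothetical.

```python
from urllib.parse import urlsplit


def registered_domain(host):
    """Naively keep the last two labels: news.mydomain.com -> mydomain.com.

    Simplification: wrong for suffixes like .co.uk; a real check would use
    the Public Suffix List.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def canonical_within_domain(page_url, canonical_url):
    """True if the canonical URL stays inside the page's domain (subdomains allowed)."""
    page_host = urlsplit(page_url).hostname
    canonical_host = urlsplit(canonical_url).hostname
    if not page_host or not canonical_host:
        return False  # not an absolute URL with a host
    return registered_domain(page_host) == registered_domain(canonical_host)
```

With this check, a canonical from news.mydomain.com to www.mydomain.com passes, while one from www.mydomain.com to www.otherdomain.com does not.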
For more information on how to apply the canonical tag, the following articles, and particularly the many questions and answers in their comments, are helpful:
- Google's official blog post announcing the feature
- Follow-up article by Matt Cutts, head of Google's Web spam team, containing many clarifying points
- Video of Matt Cutts explaining the canonical tag
- Yahoo!'s blog post announcement titled Fighting Duplication
- Microsoft's article on the canonical element
- Ask.com's announcement that they also support the canonical element