Pagination & Canonicalization for the Pros – SMX Advanced 2012
What’s this techy session about? Here’s the description on the agenda:
Using the pagination tag with optional parameters, sort orders, and filters. Are there still reasons to use robots.txt or noindex? Can a canonical tag really replace a 301? How do you keep your IIS-based site from infinite redirect loops when you canonicalize default page names? And what about rel=alternate hreflang? We’ll go through the issues step by step so you can clear up the clutter on your site, maximize crawling and indexing, and eliminate duplicate content risks.
Moderator: Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)
Q&A Moderator: Eric Enge, CEO, Stone Temple Consulting (@stonetemple)
Adam Audette, President, RKG (@audette)
Jeff Carpenter, SEO Manager, PETCO (@SanDiegoSEO)
Maile Ohye, Senior Developer Programs Engineer, Google Inc. (@maileohye)
Vanessa welcomes the audience and says there will be lots of question time. Adam Audette starts off the presentations. He’s really excited to geek out. It’s cute.
Pagination Dos and Don’ts
The best way to think about it: “Everything should be made as simple as possible, but not simpler.” – Einstein
Example: Zales, a large ecommerce site. You’ve got different sorts and pages of products. Based on sort, page view and page there’s tons of opportunity for confusion in the crawl. Over 100 duplicate results, easily.
Is this a big deal? Yes, especially after Panda – no likey duplication.
How to handle this?
Noindex pagination method: pages 2-N are marked noindex, follow. The problem is getting them crawled.
pages 2-N annotated with noindex, follow
pages 2-N self referencing rel canonical
pages 2-N contain unique titles, URLs and meta data
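The three requirements above can be sketched as a small helper that builds the head tags for one page of a listing. This is only an illustration: the function name, the `?page=N` URL pattern, and the example.com URL are assumptions, not anything shown in the session.

```python
from html import escape


def noindex_page_head(base_url: str, page: int, category: str) -> list[str]:
    """Build <head> tags for one page of a paginated listing.

    Assumes pages are addressed as base_url?page=N (hypothetical pattern).
    """
    url = base_url if page == 1 else f"{base_url}?page={page}"
    # Unique title per page so pages 2-N do not share metadata with page 1.
    title = category if page == 1 else f"{category} - Page {page}"
    tags = [
        f"<title>{escape(title)}</title>",
        # Self-referencing canonical: each page points at its own URL.
        f'<link rel="canonical" href="{escape(url, quote=True)}">',
    ]
    if page >= 2:
        # Pages 2-N stay out of the index but their links are still followed.
        tags.append('<meta name="robots" content="noindex, follow">')
    return tags


for tag in noindex_page_head("https://www.example.com/rings", 3, "Rings"):
    print(tag)
```

Page 1 gets no robots meta tag here, so it remains the only indexable page of the listing.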
Rel prev/next Pagination Technique: a little harder to implement. Where noindex sort of passes equity to page 1 so it becomes the ranking candidate, rel next/prev rolls the pages together into a series.
Deeper pages are still in the index, they can be pulled out with a site: search and they show up when Google considers it a relevant result. When rel canonical is self referencing, that’s appropriate, but when used to point to page 1 there’s a conflicting signal.
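A rough sketch of the rel next/prev markup for a series, with a self-referencing canonical on each page rather than one pointing at page 1. The function name and the `?page=N` URL pattern are assumptions made for the example.

```python
def series_head_links(base_url: str, page: int, last_page: int) -> list[str]:
    """Build canonical, prev and next <link> tags for one page of a series.

    Assumes pages are addressed as base_url?page=N (hypothetical pattern).
    """

    def page_url(n: int) -> str:
        return base_url if n == 1 else f"{base_url}?page={n}"

    # The canonical is self-referencing; pointing it at page 1 would conflict
    # with the prev/next series signal.
    links = [f'<link rel="canonical" href="{page_url(page)}">']
    if page > 1:
        links.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        links.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return links


for link in series_head_links("https://www.example.com/rings", 2, 3):
    print(link)
```

The first page of the series emits only rel next and the last page only rel prev, so the chain has clear ends.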
View All Pagination Requirements
pages 2-N specify View All as rel canonical target
An elegant solution
View Alls tend to convert better
Vanessa explains that an AJAX infinite scroll to display the products is a good user experience also.
Quora and Twitter both do this continually loading and refreshing method. Googlebot gets the first 500 words. That’s a potential gotcha if you want content crawled.
When you have a great View All, that’s the elegant way to go.
When View All isn’t an option, use rel next/prev
Two more options: append parameters to the URL with a #hash, and progressive rendering as users scroll.
SEO for Faceted Navigations
If a facet is selected, categorize it as important for users but not SEO or important for SEO. Treat differently for each situation. Force same canonical path for URL regardless of how they’re selected.
solves nothing for decreasing crawl overheads
labor intensive and error prone
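Forcing the same canonical path regardless of how facets were selected might look something like the sketch below. The facet names in `SEO_FACETS`, the query-string URL format, and the function name are all hypothetical; a real site would choose its own list of facets that matter for SEO.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facets judged important for SEO, in their one canonical order.
# Any other facet is treated as useful for users only and is dropped.
SEO_FACETS = ("brand", "category")


def canonical_facet_url(url: str) -> str:
    """Return the rel canonical target for a faceted URL."""
    parts = urlsplit(url)
    # If a facet repeats, the last value wins in this simple sketch.
    params = dict(parse_qsl(parts.query))
    # Keep only SEO facets, always in the same fixed order, so that every
    # click path to the same selection yields an identical canonical URL.
    kept = [(name, params[name]) for name in SEO_FACETS if name in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


a = canonical_facet_url("https://www.example.com/dogs?color=red&brand=acme&category=toys")
b = canonical_facet_url("https://www.example.com/dogs?category=toys&brand=acme")
print(a)
print(a == b)
```

As noted above, this only consolidates signals: the non-canonical facet URLs still exist and still cost crawl time.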
Common rel canonical gotcha: a non-canonical duplicate page carries a self-referencing rel canonical even though an actual canonical version exists. This is the most common issue they come across.
Use rel canonical to signal the preferred URL, not as a shortcut
Internal link signals should be consistent
Careful with self-referencing rel canonical
Jeff Carpenter is up next. He’s got a case study in Petco.com. Large amount of duplication based on categorization. Each sub-category had lots of refinement options. A site redesign recategorized categories and navigation and URL structure changed.
Reduce refinement options. Reviewed analytics to see the refinements that are used and not used. They went from 50 refinements to 12.
Cross-department education. Education across departments led to unified URL formats being advertised.
Implement canonical tags to match on-site dynamically generated navigation. It created uniform URL formats and improved analytics data.
Utilize noindex, follow on all pagination pages, reducing the potential for on-site duplicate content issues.
13+% increase in conversion rate from natural search in 6 months
Reduced amount of low value pages in SERPs
Overall rankings increased – approx 20% improvement across monitored phrases in 2 months
Direct SERP traffic to product list pages
Maile is going to give a group hug, explaining how the conference has given her and her team helpful feedback. Speaking here has been beneficial to them at Google. In 2009 she had a session about duplication and worked through issues of PageRank sculpting – fun. In 2010 a panel brought up faceted navigation issues. In 2011 they launched the improved URL Parameters tool.
In 2011 a panel with REI brought up pagination issues, trying to use rel canonical for non-duplicate content, which wasn’t what they intended it for. Google rel next/prev support was released 5 months later. It helps Google identify more sequences than it can detect itself.
URL Parameters in Webmaster Tools
She apologizes for the blog post and Help Center article not being as thorough as they could have been.
Assist understanding parameters to crawl site more efficiently
Crawl your site more efficiently
Helps more unique fresh content to be indexed
For removals, go to URL Removals in WMT
Page level markup applied separately after page is crawled and still taken into consideration
URL parameters can be a helpful hint and are not directives
It’s an advanced feature. Sometimes sites already have high crawl coverage as determined by Google. Improper actions can result in pages not appearing in search results.
Issue: inefficient crawling
Eligible URLs: key=value&key2=value2
Step 1: Specify parameters that do not change content
1. Do I have parameters that do not affect page content (sessionID, affiliateID, trackingID)?
Likely mark as “does not change content”.
Step 2a: Specify parameters that change content
Step 2b: Specify Googlebot’s preferred behavior
Sort parameter changes the order content is presented.
1. Is the sort parameter optional throughout the entire site?
2. Can Googlebot discover everything useful when the sort parameter isn’t displayed?
If yes to both, likely that with your sort parameter you can specify “Crawl No URLs.”
Verify examples displayed aren’t canonical and that the canonical can be reached by navigation.
Or, same sort values site-wide?
1. Are same sort values used consistently for every category?
2. When a user changes the sort value is the total number of items unchanged?
If yes, likely that with your sort parameter you can specify “only URLs with value x” where x is one of the sorting values used sitewide.
Narrows parameter filters the content on the page by showing a subset of the total items.
If the narrows parameter shows less useful content that’s a subset of the content from the URL without the narrows parameter, you might be able to specify “Crawl No URLs.”
Double-check by verifying that the URLs shown in the example provide redundant content.
Specifies parameter determines the content displayed on a page.
Translates parameter: unless you want to exclude certain languages from being crawled or available in search results, specify “Crawl every URL.” It’s best practice to place languages in a subdirectory or subfolder rather than a parameter, to help search engines more easily understand site structure.
Paginates parameter displays a component page of a multipage sequence. Use “Crawl every URL.”
What about multiple parameters in one URL? Imagine all URLs begin as eligible for crawling, then apply each setting as a process of elimination, not inclusion.
If any parameter in the URL matches a URL Parameters setting and that setting specifies “Crawl No URLs,” the URL is not crawled. If the URL makes it all the way through, it’s crawled.
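The process of elimination for URLs with multiple parameters can be modeled roughly as below. This is a sketch of the logic as described in the session, not Google’s actual implementation; the parameter names, the behavior labels, and the `SETTINGS` table are invented for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL Parameters settings: parameter name -> (behavior, value).
SETTINGS = {
    "narrow": ("no_urls", None),            # "Crawl No URLs"
    "sort": ("only_value", "price_asc"),    # "only URLs with value x"
    "page": ("every_url", None),            # "Crawl every URL"
}


def is_crawl_eligible(url: str, settings: dict = SETTINGS) -> bool:
    """Start from eligible, then let each matching setting eliminate the URL."""
    for name, value in parse_qsl(urlsplit(url).query):
        rule = settings.get(name)
        if rule is None:
            continue  # no setting for this parameter, nothing to eliminate
        behavior, allowed_value = rule
        if behavior == "no_urls":
            return False
        if behavior == "only_value" and value != allowed_value:
            return False
    # The URL made it all the way through, so it stays eligible.
    return True


for candidate in (
    "https://www.example.com/dogs?page=2&sort=price_asc",
    "https://www.example.com/dogs?page=2&sort=name",
    "https://www.example.com/dogs?narrow=small&page=2",
):
    print(candidate, is_crawl_eligible(candidate))
```

A single eliminating setting is enough to drop the URL, which is why a “Crawl every URL” setting on one parameter never rescues a URL that another parameter has ruled out.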
Internal links should only include canonical URLs
List canonicals in Sitemaps
Helps with canonical promotion
Provides more accurate index counts
On page indexing markup is still helpful. rel canonical, rel next/prev can be used in tandem.
Utilize URL parameters for more efficient crawling
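For the Sitemaps recommendation in the list above, a minimal sketch of writing a Sitemap that lists only canonical URLs. The function name and the example URLs are assumptions; the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls) -> str:
    """Serialize a minimal Sitemap listing each canonical URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between runs.
    for url in sorted(set(canonical_urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://www.example.com/dogs",
    "https://www.example.com/cats",
    "https://www.example.com/dogs",  # duplicate entry, emitted only once
]))
```

The caller is expected to pass in URLs that are already canonical, for example the output of whatever logic generates the site’s rel canonical targets, so internal links, canonicals and the Sitemap all agree.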