Dealing With Domain Names, URLs, Parameters & All That Jazz – Technical SEO Tactics
Moderator: Vanessa Fox, Contributing Editor, Search Engine Land
Jonah just asked if any livebloggers were in the audience and I raised my hand. I’m being rewarded with his slides! Yay!
Maile’s up first with the lay of the land.
- Understand domains and URL structure
- Learn options for paginated content
- Use standard encodings in URLs
- Set preferences in webmaster tools
- Roll like a winner
Geotargeting priorities with Google:
- ccTLD, like .fr, .co.uk, will automatically be geo-targeted to the appropriate region
- manual geo-targeting set in Webmaster Tools for gTLDs (if set)
- geographic words in the sub-domain or subdirectory
- requires about a week to take effect
Neither is really preferred for the Google index; they work equally well. It just depends on how you want to serve your customers.
Magic signals can override some of the above. One example: if Google has extracted a physical address from your home page, it might override the other settings, but this is extremely rare.
The next slide is crazy but she says they’re going to have a blog post on the Google Webmaster Blog that explains it all in detail:
Next is some jazz…
Existing “view all” pages can have a rel=”canonical” from paginated URLs:
- All linking properties, etc., to “View All”
- Only “View All” displayed in SERPs
- Consider user experience
- Test load time
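A quick sketch of that setup (the example.com URLs are made up): each paginated page points its canonical at the view-all page.

```html
<!-- On a paginated URL such as http://www.example.com/products?page=2 -->
<head>
  <link rel="canonical" href="http://www.example.com/products/view-all" />
</head>
```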
Next is parameters. Google advocates standard encodings, meaning name/value pairs. Create algorithmically easy-to-understand name/value pairs for dynamic URLs; duplicates can be detected this way, and crawling is done more efficiently if Google understands it can throw out a parameter. On the other side of the same coin, avoid maverick encodings: similarity between URLs is difficult to detect, and that means duplicates will be crawled.
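To illustrate why name/value pairs help (the URLs and the "sessionid" parameter are made-up examples), a crawler that understands standard encodings can strip a known-irrelevant parameter and spot duplicates:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Parameters the site owner says don't change the page content (hypothetical).
IGNORABLE = {"sessionid"}

def normalize(url):
    """Drop ignorable parameters and sort the rest for comparison."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORABLE]
    kept.sort()  # make comparison order-insensitive
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = "http://www.example.com/shoes?color=red&sessionid=123"
b = "http://www.example.com/shoes?sessionid=999&color=red"
print(normalize(a) == normalize(b))  # -> True: the two URLs collapse to one
```

With a maverick encoding like `/shoes/red-123-session`, there's no reliable way to do this stripping, so both URLs get crawled.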
Set your parameter preferences in Webmaster Tools. You definitely don’t want to do it wrong because you could tell Google to ignore something important.
Roll Like a Winner: An Example
January 3, 2010
Web site with standard encodings:
- Eligible for URL parameter tool
- Compatible with crawling algorithms
Google is aware of 1,587,811 URLs on the site
Google attempted to crawl 885,482 URLs
On January 4, 2010
The webmaster set their preferred parameters in Webmaster Tools. After this configuration, Google was aware of 887,203 URLs, not 1.5 million. They attempted to crawl 799,000ish. That means crawl coverage is at 90 percent, versus 56 percent previously.
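Checking the arithmetic on those coverage numbers (crawl coverage = URLs attempted ÷ URLs known; the 799,000 figure is approximate, per the "799,000ish" in the talk):

```python
# Before the parameter settings: attempted / known
before = 885_482 / 1_587_811
# After: attempted (approx.) / known
after = 799_000 / 887_203
print(round(before * 100), round(after * 100))  # -> 56 90
```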
The index selection increased by thousands. Likely fewer filtered results and more unique content.
Next up is Richard.
Load Balanced Hosting – What and Why
What: Load-balanced hosting enables site traffic to be distributed to a site that is hosted on more than one redundant server.
Why: Sites that generate high amounts of traffic often require advanced hosting solutions to maintain site stability. Overloading a server with large amounts of traffic can cause it to crash.
SEO Challenges from Load-Balanced Hosting
In many instances you’ll see the server number displayed in the URL (www1, www2, and so on). The main issue with this is duplicate content, which can impact rankings. End-users and bots should see the same version of the URL at all times.
Always display the same sub-domain to all audiences. Some options:
- Enable the hosting system to always display the same server name no matter where the user / bot is directed.
- URL rewrite/proxy of the sub-domain to the root: www1, www2, or www3 rewrites/displays as www. This can happen at the load-balancer level rather than the server level
- 301 redirect to www or preferred version
- Hardware/TCP round robin: load balancing performed at the TCP/IP address level
  - Displays the preferred URL to all audiences
  - Requires complex technology implementation
  - Can be difficult to retro-fit
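For the 301-redirect option, a minimal Apache mod_rewrite sketch (example.com and the www-numbering scheme are placeholders) might look like:

```apache
# Redirect any numbered load-balancer host (www1, www2, ...) to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www[0-9]+\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```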
Search Engine Tag Protocol
Leverage the rel=”canonical” tag to direct bots to the preferred version. This is easy to implement and is supported by all major search engines. However, it’s not guaranteed, it can take a long time to take effect, and link equity from duplicate URLs doesn’t always pass.
Search engine webmaster tools can relieve a lot of the technical implementation burden. If you request URL removal this way, you must also have the corresponding removal (disallow) command in the robots.txt file.
- Avoid SEO performance issues by solving for duplicate content at the sub-domain level
- Create an identical experience for bots and consumers
- Start early! Plan for any modifications to a new load-balancing platform early to avoid costly retrofitting
- Leverage shared search engine protocols to reactively solve for sub-domain duplicate content
- Leverage search engine webmaster tools to expedite and minimize the cost of implementation
Next is Sean. The domestic US market is mostly where our clients are, but in Google, the English-speaking world stretches from New Zealand to the UK. There are considerations for organizing multilingual and multinational web site content; the two dimensions are language and country. Ways to organize content, in Sean’s preferred order:
- top level domains (TLDs)
- directories (folders) on the Web server
- URL parameters (not recommended)
- Chance (a.k.a. no organization – not recommended)
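To make the options concrete, hypothetical URL structures for French content (example.com is a placeholder) might look like:

```
http://www.example.fr/            ccTLD
http://www.example.com/fr/        directory (folder)
http://www.example.com/?lang=fr   URL parameter (not recommended)
```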
Considerations driving a domain/URL strategy
Search engines try to guess the intended market a site is trying to reach. They’ll figure this out with country codes or server IP for generic domains.
Influence of incoming links: search engines will likely consider the “location” of sites providing incoming links when ranking a page for a particular market
Users scan search result URLs
Multiple studies have noted users scan URLs when deciding which result to select. Users probably prefer their country specific domain in search results.
Jonah is next with pagination decisions you should not ignore. There are going to be some assumptions up front:
- PageRank means SOMETHING
- Higher PR is better within domain
- PR0 is better than graybar
- Graybar PR ~ supplemental index
- Not indexed, not going to rank
PageRank doesn’t flow out of a page equally. It flows to the links that Google thinks the page likes most; where the links are placed in the page does matter.
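A toy sketch of that idea (the three-page graph, the link weights, and the damping factor are all invented for illustration): each page splits its rank across its outlinks in proportion to a weight instead of evenly, so the favored link accumulates more PageRank.

```python
# Hypothetical internal link graph: page -> [(target, weight), ...]
# A prominent in-content link gets more weight than a footer link.
links = {
    "home":  [("page1", 0.7), ("footer-archive", 0.3)],
    "page1": [("home", 1.0)],
    "footer-archive": [("home", 1.0)],
}

def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            total = sum(w for _, w in outs)
            for target, w in outs:
                # rank flows in proportion to link weight, not equally
                new[target] += damping * rank[p] * (w / total)
        rank = new
    return rank

r = pagerank(links)
print(r["page1"] > r["footer-archive"])  # -> True: the favored link gets more
```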
Pagination can fail, and here are some examples. Even Yahoo! results pages drop PageRank from page one to page two.
GreatSchools.org Case Study
They do unbiased school ratings for public and charter schools. They recently changed from .net to .org, but it doesn’t seem to have screwed anything up. They have a geo-driven hierarchy and pagination of browse pages. In 2009 they had 10 entries per page, and the pattern was an odd-shaped drop around the page-37 mark. Then they moved to 25 entries per page, and there’s a pretty clean step-down graph. By sorting the results differently you can see that most PageRank was being passed closest to the browse page.
There’s an alternative way to push PageRank through a site. Hierarchical distribution with links in footer.
- Use unique titles on pagination pages
- Avoid using parameters for pagination
- Fewer links per page means each link is more valuable
- Hierarchical distribution works
- Internal Links only pass 2-3 deep
- Multiple sitemaps make diagnostics easier
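On the multiple-sitemaps point: splitting sitemaps by site section lets webmaster tools report indexing stats per file. A minimal sitemap-index sketch (the file names are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one sitemap per section, so indexing can be diagnosed per section -->
  <sitemap><loc>http://www.example.com/sitemap-browse-pages.xml</loc></sitemap>
  <sitemap><loc>http://www.example.com/sitemap-profiles.xml</loc></sitemap>
</sitemapindex>
```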
Jonah’s made his presentation on pagination available to all!
Vanessa asks Maile a question about AJAX crawling, which just launched yesterday. Check it out on Google Code.