Diagnosing Web Site Architecture Issues
Back from lunch. Carrot cake is delicious and Vanessa Fox is now moderating the Developer’s track with speakers David Golightly (Zillow), Jonathan Hochman (Hochman Consultants) and Chris Silver Smith (Netconcepts).
Vanessa says we’re going to talk through the kinds of things they look for when they’re going through sites and trying to locate issues. There is also a case study because case studies are delicious. Like carrot cake. Or maybe 3 mini carrot cakes. Don’t judge me.
Vanessa says you have to look at the things that really matter and prioritize. You want to hit the big stuff first.
What really matters: The pages should be accessible and discoverable. You want to know whether you're found in the results and whether users are staying on your site. Are you offering searchers something that makes them want to click through?
She’s zipping through slides without ever actually stopping on one. I think she doesn’t want me liveblogging. I’ll be here drinking my water.
Chris Silver Smith is up to talk about basic stuff because he figured there would be a lot of newbies in the audience. He didn’t get the “Advanced” memo.
Diagnosing Crawling Issues:
How Big Are You: Do a [site:yourdomain.com] search to see how many pages are indexed in the various engines.
Query for Session IDs: [inurl:sessionid] will help you spot these in the results. The same page indexed multiple times with different session IDs can cause duplicate content.
Check the Robots.txt exclusions in Webmaster Tools. You don’t want to accidentally block your site from being indexed.
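A quick way to sanity-check this yourself is Python's stdlib robots.txt parser. Here's a minimal sketch; the rules and URLs below are made-up examples, not from the session:

```python
# Sketch: verify a robots.txt isn't accidentally blocking key URLs.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Spot-check the URLs you actually want crawled.
for url in ["/homes/seattle/", "/search/results", "/about"]:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'allowed' if allowed else 'BLOCKED'}")
```

If a page you care about comes back BLOCKED, that's your "accidentally blocked from indexing" problem right there.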
If you have a redirect going on, you want to make sure the bots can hop through it. Check the headers that are returned by the server. The Firefox Header Spy extension is good for checking status codes.
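The logic of the check is simple enough to sketch. This toy version walks a simulated redirect chain (the URL map is hypothetical); in practice you'd issue real requests and read the status code and Location header:

```python
# Sketch: make sure a bot can hop through a redirect chain and land on a 200.
# hypothetical redirect map: url -> (status, location)
responses = {
    "http://example.com/old":   (301, "http://example.com/new"),
    "http://example.com/new":   (302, "http://example.com/final"),
    "http://example.com/final": (200, None),
}

def follow(url, max_hops=5):
    """Return the list of (url, status) hops a crawler would see."""
    hops = []
    for _ in range(max_hops):
        status, location = responses[url]
        hops.append((url, status))
        if status not in (301, 302) or location is None:
            break
        url = location
    return hops

chain = follow("http://example.com/old")
```

If the chain never reaches a 200, or runs past your hop limit, bots may give up before they get to the content.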
Is Content Crawlable: Check your pages in a text browser like Lynx, or with a simulator like http://seebot.org. You can also use Firefox's Developer Toolbar to view your pages like a bot.
He also recommends Firefox’s link counter extension. It shows you how many links are going out on a page, how many are nofollow’d, etc. Helps you analyze how much PageRank you’re passing off to other sites, as compared to how much you’re retaining.
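What that extension computes is easy to reproduce. Here's a rough sketch using Python's stdlib HTML parser; the sample markup is invented for illustration:

```python
# Sketch of a link counter: total outbound links on a page and
# how many of them carry rel="nofollow".
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "href" in attrs:
            self.total += 1
            if "nofollow" in (attrs.get("rel") or ""):
                self.nofollow += 1

html = """
<a href="/about">About</a>
<a href="http://partner.example" rel="nofollow">Partner</a>
<a href="http://other.example">Other</a>
"""

counter = LinkCounter()
counter.feed(html)
print(counter.total, "links,", counter.nofollow, "nofollow'd")
```

Total links minus nofollow'd links gives you a rough feel for how much PageRank the page is passing along versus retaining.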
Acxiom’s Marketleap – Benchmarking Link Popularity: Type in your top competitors and it will tell you how many pages are linking to each competitor and the link popularity over time.
Use Google Sets to identify your competitors as the search engines see them, and to check whether your site is being categorized appropriately. Your competitor may not be who you think it is. You can see who Google thinks is equivalent to you.
Next up is Jonathan Hochman.
Googlebar (Not Google Toolbar): One click for Google's cached pages. Highlights search terms. You can run any Google search. Looking at the Gillette site, he found that the page wasn't being cached, which makes him ask why. He opens up his next tool…
Live HTTP Headers: Will expose redirects.
- Replace HTML content with rich media content by manipulating the Document Object Model. Open source solutions for Flash: SWFObject 2.0 or sIFR
- For Silverlight, create your own search-engine-optimization-friendly insertion code. Better yet, nag Microsoft to provide a ready-made function.
SWFObject: Hosted on Google Code, so it's okay to use.
Xenu Link Sleuth: A free spider that crawls a href links just like search engine bots. Generates a list of broken links and outputs a site map using each page’s Title tag. Use site map to look for missing pages, bad titles and duplicate content. You can check for broken links before deployment.
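The core of what Xenu does fits in a few lines. This sketch crawls a simulated site (a made-up dict of page → outbound links, standing in for real fetching and parsing) and reports links whose targets don't exist:

```python
# Minimal sketch of a link checker: crawl href links breadth-first
# and report the broken ones.
from collections import deque

site = {
    "/": ["/listings", "/about"],
    "/listings": ["/listings/1", "/listings/2"],
    "/listings/1": ["/"],
    "/listings/2": [],
    "/about": ["/team"],   # /team doesn't exist -> broken link
}

def find_broken_links(start="/"):
    seen, broken = set(), []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in seen:
            continue
        seen.add(page)
        for href in site.get(page, []):
            if href not in site:
                broken.append((page, href))   # link target is missing
            elif href not in seen:
                queue.append(href)
    return broken

broken = find_broken_links()
```

Run something like this against a staging build and you catch the broken links before deployment, exactly as Hochman suggests.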
Watch out for search problems with frames, iframes, Flash and Silverlight. Each object is treated as a separate thing, not as part of the host page. This may hinder external linking to deep content, and you cannot add a unique title and description. Someone can navigate into a frame and find no navigation at all because the menu is in a different frame. It creates orphan pages.
Up is David Golightly to talk about his experiences at Zillow. He’s the case study.
Zillow’s Search Interface
A front end for Zillow’s powerful distributed search engine, serving a database of 80 million homes.
Goals for the interface:
- Highly configurable for different data sets (For Sales listings, Recently Sold, Most Popular, Regions…).
- Responsive to a range of user actions (Filtering, sorting, map interactions…)
- Dynamic back-button support
- Bookmarkable URLs (cross-visit state preservation)
- Offload presentation-layer processing cycles to the user's machine.
The server provides config and initial search results as JSON text embedded in the initial HTML.
The browser builds everything (filters, map control, result list, breadcrumbs, etc.) based on the server-provided config, using client-side templating.
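To make the pattern concrete, here's a rough sketch. Python stands in for the browser-side templating code, and the markup and field names are hypothetical, not Zillow's actual format:

```python
# Sketch: server embeds config + initial results as JSON inside the HTML;
# the client extracts it and builds the UI from it.
import json
import re

html = """
<script id="search-config" type="application/json">
{"filters": ["For Sale", "Recently Sold"],
 "results": [{"address": "123 Main St", "price": 450000},
             {"address": "456 Oak Ave", "price": 620000}]}
</script>
"""

match = re.search(r'<script id="search-config"[^>]*>(.*?)</script>', html, re.S)
config = json.loads(match.group(1))

# "Template" each result the way client-side code would render a list item.
rendered = [f"{r['address']} - ${r['price']:,}" for r in config["results"]]
```

The catch, as the 2008 numbers below show, is that if the results only ever exist as JSON plus client-side rendering, bots that don't execute JavaScript never see them.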
Result as of 1/2008:
- Of 80,000,000 homes, only 200,000 were indexed in Google. Only 20 percent of search referrals did NOT contain Zillow-branded keywords.
- Of top industry keywords (real estate, homes for sale, etc), Zillow didn’t rank in the top 10 pages of Google results.
They hadn't rolled out their new search UI yet, but it was obvious what needed to be done.
By doing this they would gain accessibility for non-JS-enabled user agents and decrease their page load time.
Guiding the Bots:
- Footer and site map are entry points to their search results.
- Provide a top down navigation tree
- Link laterally from each home detail page to other detail pages
- Provide a transparent URL structure
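A transparent URL structure basically means putting the search state in a readable path instead of opaque query parameters. A tiny sketch of the idea (the URL scheme and helper names are hypothetical, not Zillow's actual code):

```python
# Sketch: build readable, crawlable URLs from listing attributes.
import re

def slugify(text):
    """Lowercase, strip punctuation, hyphenate: 'Seattle, WA' -> 'seattle-wa'."""
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")

def listing_url(city, state, status):
    return f"/homes/{slugify(status)}/{slugify(city + ' ' + state)}/"

url = listing_url("Seattle", "WA", "For Sale")
# e.g. a "for sale in Seattle" page instead of /search?st=1&loc=98101
```

URLs like these double as keyword-bearing entry points for the navigation tree above.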
- Bots reward accessible application design with better rankings and more thorough indexing
- Don't do in the browser what you can't do on the server
- Duplicating code on both browser and server is sometimes a necessary cost
- SEO should work in concert with great UX
- AJAX on top, not on bottom
The next generation: Microformats.