Diagnosing Web Site Architecture Issues

Back from lunch. Carrot cake is delicious and Vanessa Fox is now moderating the Developer’s track with speakers David Golightly (Zillow), Jonathan Hochman (Hochman Consultants) and Chris Silver Smith (Netconcepts).

Vanessa says we’re going to talk through the kinds of things they look for when they’re going through sites and trying to locate issues. There is also a case study because case studies are delicious. Like carrot cake. Or maybe 3 mini carrot cakes. Don’t judge me.

Vanessa says you have to look at the things that really matter and prioritize. You want to hit the big stuff first.

What really matters: The pages should be accessible and discoverable. You want to know whether or not you’re found in the results and if users are staying on your site. Are you offering searchers something that makes users want to click through?

She’s zipping through slides without ever actually stopping on one. I think she doesn’t want me liveblogging. I’ll be here drinking my water.

Chris Silver Smith is up to talk about basic stuff because he figured there would be a lot of newbies in the audience. He didn’t get the “Advanced” memo.

Diagnosing Crawling Issues:

How Big Are You: Run a site: query on your domain (for example, [site:example.com]) in each of the engines to see how many of your pages are indexed.

Query for Session Ids: [inurl:sessionid] will help you spot these in the results. The same page indexed multiple times with different session IDs can cause duplicate content.
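If you already have a list of your own URLs (from a crawl or a log export), a minimal Python sketch like the one below could show how many of them collapse into the same page once the session ID is removed. The parameter names in SESSION_PARAMS are assumptions, not something from the talk; swap in whatever your platform actually emits.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust for your own platform.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return the URL with any session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each cleaned URL to the original URLs that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_session_id(url)].append(url)
    # Only keep cleaned URLs that more than one original URL maps to.
    return {clean: found for clean, found in groups.items() if len(found) > 1}
```

Any cleaned URL that comes back from group_duplicates is a page the engines may be seeing more than once.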

Check the Robots.txt exclusions in Webmaster Tools. You don’t want to accidentally block your site from being indexed.
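Outside of Webmaster Tools, one way to sanity-check exclusions is Python's standard urllib.robotparser module. This is only a sketch: the SAMPLE rules and the paths you test are made up for illustration, and the standard-library parser does not necessarily match every engine's handling of wildcards or extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```

Feed it the pages you most want indexed; anything that comes back in the list deserves a second look.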

If you have a redirect going on, you want to make sure the bots can hop through it. Check the headers that are returned by the server. The Firefox Header Spy extension is good for checking status codes.
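If you would rather script the header check, here is a rough Python sketch that follows a redirect chain one hop at a time and records each status code. The function names are my own. trace_redirects takes any fetch callable, so it can be tried without touching the network, and head_fetch is one possible fetcher built on the standard http.client module.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects by hand, returning [(url, status), ...] for each hop.

    fetch is any callable that returns (status_code, location_header_or_None).
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return hops


def head_fetch(url):
    """Issue a single HEAD request without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling trace_redirects(your_url, head_fetch) lists every hop. A long chain, a 302 where you meant a 301, or a chain that never reaches a 200 are the things to look for.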

Is Content Crawlable: Check your pages in a Lynx-style text browser, like the one at http://seebot.org. You can also use Firefox’s Developer Toolbar to view your pages like a bot.
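For a quick approximation of that text-only view, the Python sketch below drops script and style blocks and keeps whatever text is left in the markup. It is a simplification (real bots and Lynx do more than this), and the class and function names are my own.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collect the text a non-JavaScript client would find in the markup."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>, and pure whitespace.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    view = TextOnlyView()
    view.feed(html)
    view.close()
    return " ".join(view.chunks)
```

If visible_text comes back empty or nearly empty for an important page, the content is probably being built by script and may not be crawlable.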

He also recommends Firefox’s link counter extension. It shows you how many links are going out on a page, how many are nofollow’d, etc. Helps you analyze how much PageRank you’re passing off to other sites, as compared to how much you’re retaining.
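The same kind of tally can be sketched in a few lines of Python. This is not the extension's logic, just an illustration: it counts internal links, external links and nofollow'd links for one page, treating any link to a different host as external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCounter(HTMLParser):
    """Tally internal, external and nofollow'd <a href> links on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlsplit(page_url).netloc
        self.counts = {"internal": 0, "external": 0, "nofollow": 0}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        target = urlsplit(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # skip mailto:, javascript: and similar links
        if "nofollow" in (attrs.get("rel") or "").lower().split():
            self.counts["nofollow"] += 1
        kind = "internal" if target.netloc == self.host else "external"
        self.counts[kind] += 1


def count_links(html, page_url):
    counter = LinkCounter(page_url)
    counter.feed(html)
    return counter.counts
```

A nofollow'd link is counted both under nofollow and under internal or external, so you can compare followed external links against the total.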

Acxiom’s Marketleap – Benchmarking Link Popularity: Type in your top competitors and it will tell you how many pages are linking to each competitor and the link popularity over time.

Use Google sets to identify your competitors as the search engines see it and also to see if your site is being categorized appropriately. Your competitor may not be who you think it is. You can see who Google thinks is equivalent to you.

Next up is Jonathan Hochman.

Essential Tools:

NoScript: When activated, it blocks all client side scripts like JavaScript, AJAX, Flash and Silverlight. You can safely view pages with malicious code. See what pages look like to bots. See if content is accessible.

He brings up the SMX site and shows that with NoScript turned on the page doesn’t render correctly. It won’t affect rankings but it may affect people’s impressions of the page. Only 2 percent of people surf with JavaScript turned off. It’s not a high percentage, but they may be an influential percentage.

He brings up the Gillette Web site which is all in Flash. They use a JavaScript function called swfobject that switches on HTML content if Flash is off. It’s good but they do it in an ugly way.

Googlebar (Not Google Toolbar): One click for Google’s cached pages. Highlights search terms. You can run any Google search. Back to the Gillette site he found that the page wasn’t being cached so that makes him ask why. He opens up his next tool…

Live HTTP Headers: Will expose redirects.

Optimizing Rich Internet Applications: Feed the bots something they can understand. Add (X)HTML content to pages with content generated by JavaScript, AJAX, Flash or Silverlight. (X)HTML content can be generated by server side scripts accessing the same database as the rich media application. This ensures consistency and avoids the appearance of cloaking.
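To make the "same database" point concrete, here is a small Python sketch in which one set of records feeds both the JSON a rich client would consume and the HTML a bot would read. The LISTINGS data and the function names are invented for illustration; nothing here comes from the speakers' own code.

```python
import json
from html import escape

# Hypothetical records standing in for rows from the shared database.
LISTINGS = [
    {"address": "123 Main St", "city": "Seattle", "price": 450000},
    {"address": "9 Oak Ave", "city": "Tacoma", "price": 310000},
]


def render_json(listings):
    """Payload for the JavaScript, Flash or Silverlight front end."""
    return json.dumps(listings)


def render_html(listings):
    """Plain HTML version of the same records for bots and non-JS users."""
    items = "\n".join(
        "  <li>{}, {}: ${:,}</li>".format(
            escape(item["address"]), escape(item["city"]), item["price"]
        )
        for item in listings
    )
    return '<ul id="listings">\n' + items + "\n</ul>"
```

Because both functions read the same list, the HTML and the rich version cannot drift apart, which is the consistency argument made above.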

Coding Options

  1. Replace HTML content with rich media content by manipulating the Document Object Model. Open source solutions for Flash: SWFObject 2.0 or sIFR.
  2. For JavaScript/AJAX, modify the DOM to replace HTML content, or use noscript tags.
  3. For Silverlight, create your own search engine optimization-friendly insertion code. Better yet, nag Microsoft to provide a ready-made function.

SWFobject: Part of Google Code so it’s okay to use.

Xenu Link Sleuth: A free spider that crawls a href links just like search engine bots. Generates a list of broken links and outputs a site map using each page’s Title tag. Use site map to look for missing pages, bad titles and duplicate content. You can check for broken links before deployment.
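The core idea behind a spider like this can be sketched in Python: start at one page, follow a href links on the same host, note anything that does not return 200, and keep each page's title. This is a simplified illustration, not how Xenu itself works. It only follows internal links, and fetch is a callable you supply that returns (status, html).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class PageScan(HTMLParser):
    """Collect a href targets and the <title> text from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch):
    """Breadth-first crawl of one host. Returns (titles, broken_links)."""
    host = urlsplit(start_url).netloc
    titles, broken = {}, []
    seen, queue = {start_url}, deque([(start_url, None)])
    while queue:
        url, referrer = queue.popleft()
        status, html = fetch(url)
        if status != 200:
            broken.append((url, status, referrer))
            continue
        scan = PageScan()
        scan.feed(html)
        titles[url] = scan.title.strip()
        for href in scan.links:
            target = urldefrag(urljoin(url, href)).url  # drop #fragments
            if urlsplit(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, url))
    return titles, broken
```

Pages that share a title in the titles dictionary are candidates for the bad-title and duplicate-content checks, and broken tells you which page linked to each dead URL.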

Firefox Web Developer: You can disable/enable JavaScript. Report JavaScript errors. Disable CSS. Edit CSS or HTML. View alt attributes on images. Look for missing or inaccurate alt attributes.

Watch out for search problems with frames, iframes, Flash and Silverlight. Each object is treated as a separate thing, not as part of the host page. This may hinder external linking to deep content. You cannot add a unique title and description. Someone can navigate into a frame and find no navigation because the menu is in a different frame. It creates orphan pages.

Up is David Golightly to talk about his experiences at Zillow. He’s the case study.

Zillow’s Search Interface

A front end for Zillow’s powerful distributed search engine, serving a database of 80 million homes.

Goals for the interface:

  • Highly configurable for different data sets (For Sale listings, Recently Sold, Most Popular, Regions…).
  • Responsive to a range of user actions (Filtering, sorting, map interactions…)
  • Dynamic back-button support
  • Bookmarkable URLs (cross-visit state preservation)
  • Offload presentation-layer processing cycles to user’s machine.
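The bookmarkable URL goal comes down to being able to write the search state into the URL and read it back later. Here is a minimal Python sketch of that round trip. The state keys (beds, region, sort) are hypothetical and this is not Zillow's actual format.

```python
from urllib.parse import parse_qs, urlencode


def state_to_query(state):
    """Serialize search state with sorted keys so equal states give equal URLs."""
    return urlencode(sorted(state.items()))


def query_to_state(query):
    """Rebuild the state dictionary from a query string (first value wins)."""
    return {key: values[0] for key, values in parse_qs(query).items()}
```

Sorting the keys means two users who apply the same filters in a different order still end up with the same URL, which helps bookmarks and avoids extra duplicate URLs.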

Implementation: AJAX

Server provides config and initial search results as JSON text embedded in initial HTML.
The browser builds everything (filters, map control, result list, breadcrumbs, etc.) from the server-provided config using client-side templating.

The interface they created was very heavy and complicated JavaScript. Without JavaScript, users and bots saw nothing. No support for users without JavaScript or Flash including screen readers, text-based browsers or search engine bots. They also had really cryptic URLs.

Result as of 1/2008:

  • Of 80,000,000 homes, only 200,000 were indexed in Google. Only 20 percent of search referrals did NOT contain Zillow-branded keywords.
  • Of top industry keywords (real estate, homes for sale, etc), Zillow didn’t rank in the top 10 pages of Google results.

They haven’t rolled out their new search UI, but it was obvious what should be done.
Start by using some semantic HTML. Start with a basic, usable Web site using page refreshes, build page structure with semantic HTML. Then, use JavaScript (where available) to enhance the HTML baseline.

By doing this they would gain accessibility for non-JS-enabled user agents and decrease their page load time.

Guiding the Bots:

  • Footer and site map are entry points to their search results.
  • Provide a top down navigation tree
  • Link home detail pages laterally to each other
  • Provide a transparent URL structure
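As an illustration of a transparent URL structure that doubles as a top-down navigation tree, here is a Python sketch. The /homes/state/city/address pattern is hypothetical (it is not Zillow's real scheme), as are the function names.

```python
import re


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def listing_url(state, city, street_address, home_id):
    """Build a readable URL whose path mirrors the navigation hierarchy."""
    return "/homes/{}/{}/{}-{}/".format(
        slugify(state), slugify(city), slugify(street_address), home_id
    )


def breadcrumb_urls(url):
    """Return every ancestor path of the URL, from the top level down."""
    parts = url.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]
```

Each breadcrumb is a page a bot can land on and crawl down from, so the URL itself tells both users and engines where a home sits in the site.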


  • Bots reward accessible application design with better rankings and more thorough indexing.
  • Don’t do in the browser what you can’t do on the server.
  • Duplicating code on both browser and server is sometimes a necessary cost.
  • SEO should work in concert with great UX.
  • AJAX on top, not on bottom.
  • The next generation: Microformats.

Lisa Barone is a writer, content marketer & VP of strategy at Overit Media. She's also a very active Twitterer, much to the dismay of the rest of the world.
