The Really Complicated Technical SEO Infrastructure Issues – SMX Advanced
After a very long and delicious lunch courtesy of our friends at AimClear (there was cake!), I’ve had a sufficient amount of rest to dive into this panel. I hope.
How hard can a panel called “The Really Complicated Technical SEO Infrastructure Issues” be to liveblog, right? It’s not like the panel is full of totally smart people who can think rings around the rest of us or anything…
Moderator: Vanessa Fox, Contributing Editor, Search Engine Land
Q&A Moderator: Alex Bennert, In House SEO, Wall Street Journal
Jonathon Colman, Internet Marketing Manager, REI
Kavi Goel, Product Manager, Google
Steven Macbeth, Group Program Manager, Bing Search Quality, Microsoft
Todd Nemet, Director, Technical Projects, Nine By Blue
Maile Ohye, Senior Developer Programs Engineer, Google Inc.
…oh. Wait. That’s totally what it is. Well, okay, maybe this is going to be a tough and wild ride. Hold onto your hats and let’s hope my fingers can keep up.
Vanessa is super chipper and attempts to get everyone’s attention. The people next to me just keep chatting. Shhh.
She brings up the Schema.org announcement and that’s going to be our first topic. Steven and Kavi are going to explain what’s going on with this.
How did they come together on this? Nine months ago, they started a dialogue around building a common set of vocabularies for the web. Rich Snippets, Yahoo SearchMonkey and Bing Tiles were all playing in sort of the same space, so they began to talk about how they could drive a standard around those things.
Schema.org is about standardizing the additional tags that you can add to pages in order to present enhanced search results. This is their 0.9 release and they’re still looking for a lot of feedback. There’s a lot of content in there about the kinds of information you can mark up, including reviews, movies, articles and other media objects. It doesn’t cover everything people might want to talk about, but it’s a start.
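For anyone who wants to see what this kind of markup looks like, here is a minimal sketch in Python that renders a flat schema.org item as HTML5 microdata. The helper name `microdata_item` and the sample movie are made up for illustration; only the `itemscope`, `itemtype` and `itemprop` attributes come from the microdata format itself, and real pages would usually attach these attributes to their existing markup rather than generate new tags.

```python
from html import escape


def microdata_item(item_type, properties):
    """Render a flat schema.org item as an HTML5 microdata <div>.

    item_type:  a schema.org type name such as "Movie" (assumed to exist).
    properties: dict mapping property names to plain-text values.
    """
    type_url = "http://schema.org/" + escape(item_type, quote=True)
    lines = [f'<div itemscope itemtype="{type_url}">']
    for name, value in properties.items():
        # Escape values so that text like "Tom & Jerry" stays valid HTML.
        lines.append(
            f'  <span itemprop="{escape(name, quote=True)}">{escape(value)}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example item, not taken from the panel.
    print(microdata_item("Movie", {"name": "Avatar", "genre": "Science fiction"}))
```

Whatever you generate, run it through the rich snippet testing tool before trusting it.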
They wanted to build a core schema but also allow people to extend the schema for their own organization. The Association of Educational Publishers is adopting the schema.org plan. (They show a video, here’s a press release.)
Vanessa asks: The industries that have adopted the schema are doing so because they feel it will help their content be better found, can you speak to that?
Kavi: We’ve had rich snippets for a couple years now and we did a lot of testing to find out what people are looking for and how they can better find things. There’s stuff he can’t talk about yet coming from the various companies.
Right now they support display information like rich snippets, including things like recipes. Because schema.org just launched, nothing is supported yet that they didn’t already support before.
Bing tiles is more of a ‘logo plus’ type thing.
Why should people go through all the effort of doing this?
Google has no secret plan to blow up your website if you don’t implement this immediately, but Kavi thinks that people are going to start using it and that sites should begin to adopt it. Steven says they tried to make it as easy as possible and to think of webmasters and designers when they built it. Both think this is something to take into account during site redesigns and to incorporate into your CMS.
They want to roll this into SharePoint but they haven’t had any discussions yet with CMS providers.
Schema.org is in English but the markup can already be used worldwide.
Are all rich snippets automatic or is there some kind of white list?
Kavi…doesn’t really answer the question. Yes, they crawl but no it’s not automatic. You don’t have to email anyone to get them to show up.
Vanessa reminds everyone to use the rich snippet testing tool.
Question: More tags = more spam? Won’t this lead to abuse?
The engines have spam teams to look for this stuff for one thing. And beyond that it’s actually just annotations on the current page, not hidden tags or anything.
Google announced rel=”author” this morning. Should you use that or should you use the schema.org standard?
Schema.org is based around HTML5 microdata. The rel tags aren’t that. Either way works fine from Google’s point of view. If you just want something simple, go with rel. If you have more to say, use schema.org.
Question: What do you want us to do with this? What’s your goal? You just want the big stuff, right?
Kavi says to use your judgment. Put in what’s useful and don’t go crazy (by which he means overboard.) And then he totally pawns the rest of the question off on Matt. Hee.
Steven says take the first step. Figure out what most applies. Don’t apply something to every single line.
And now more from the presenters.
Maile is up now to talk about international concerns (i.e., to actually do what this panel was originally about.)
Considerations for international expansion of your site?
First: Do you really have time to
- create and review content?
- build and maintain a new site?
- develop relevant business relationships?
- support customers in a new region and/or language?
In your expansion, what’s the key factor?
1. Language: often informational sites such as encyclopedias or medical journals.
2. Country/Region (and therefore often language too): commerce, government regulated
- Perhaps simpler to place all languages on gTLD and use subdomains. (Like Wikipedia)
- Language still remains a factor (multiple languages in a country, phrasing)
- Commerce issues.
- may need the ccTLDs. If you do, are they affordable?
- And if there are multiple languages, what do you do? Use subdomains or subcategories. (see United Airlines for example)
Each URL should be shareable (the URL gives the same information regardless of user IP or language preference)
-no cookies to pick the language of content-rich pages
-indicate the language in the URL itself.
-match URL structure
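One way to read those URL tips, sketched in Python: keep the same path for every language and put the language code in the URL itself, either as a subdomain or as a leading subdirectory. The function name `localized_url` and the example.com URLs are hypothetical, and the sketch assumes the base URL is a bare domain without a `www.` prefix.

```python
from urllib.parse import urlsplit, urlunsplit


def localized_url(url, lang, style="subdirectory"):
    """Return the URL of the same page in another language.

    style="subdirectory": http://example.com/page -> http://example.com/de/page
    style="subdomain":    http://example.com/page -> http://de.example.com/page
    The path is left unchanged so every language shares one URL structure.
    """
    parts = urlsplit(url)
    if style == "subdomain":
        netloc = f"{lang}.{parts.netloc}"
        path = parts.path
    elif style == "subdirectory":
        netloc = parts.netloc
        path = f"/{lang}{parts.path or '/'}"
    else:
        raise ValueError(f"unknown style: {style!r}")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for lang in ("de", "fr"):
        print(localized_url("http://example.com/tents/dome", lang))
```

Because the language lives in the URL and not in a cookie, anyone who gets the link sees the same page.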
Provide ability for user to navigate in their desired language and site (ccTLD)
-alternative country/lang link clearly visible
-linking to equivalent page
-consider the tourists (English available on a German site)
Geo metatags are not used by Google in search.
rel=”alternate” is not for complete translations. It’s more often for user generated sites (where the template is translated but the main content isn’t).
On different ccTLDs, duplicate content shouldn’t be a problem. (e.g., Two English versions (.com/.co.uk) are okay but even more customizations are better for users.)
Your site won’t automatically rank. You need to be as smart as possible about the technical details and make sure to add value for users.
Jonathon is up next! He’s a friend of the blog! He got his start in the Peace Corps and that made him a team player. We’re too focused on being rock stars. He’s not a rock star, he sucks as a rock star. He would rather be part of an agile, awesome team. He’s going to talk about agile development.
To solve his duplicate content issue, REI canonicalized their results pages to just the first page (of, say, 20 total pages). To deal with the fact that there are then 19 more pages that don’t get crawled, they developed faceted navigation that they directed Google not to ignore. They have seen indexing jump and a 96% decrease in duplicate content. They also improved their page load time and site performance, which upped their crawl performance.
Vanessa does a quick tutorial on the RIGHT way to use the canonical tag (it’s not Jonathon’s way).
1. You have 100 products, 10 products on each page and a view all page. The view all page can be the canonical page.
2. You have 100 products, 10 products on each page and no view all page. You CANNOT make page 1 the canonical for pages 2-10 because page 1 doesn’t have the tents that are on pages 2-10.
So….yeah. A canonical page needs to have all the information that’s on the pages pointing to it. Maile is passionate about this.
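Vanessa's rule boils down to a subset check: a page can only be the canonical for another page if it contains everything that page contains. Here is a small Python sketch of that check. The function name `can_be_canonical` and the product IDs are hypothetical; this is just the rule from the tutorial written down, not how any search engine evaluates the tag.

```python
def can_be_canonical(canonical_items, page_items):
    """Return True if the candidate canonical page lists every item
    that appears on the page pointing to it."""
    return set(page_items).issubset(canonical_items)


if __name__ == "__main__":
    products = list(range(1, 101))           # 100 hypothetical product IDs
    pages = [products[i:i + 10] for i in range(0, 100, 10)]  # 10 per page
    view_all = products                      # the view all page lists everything

    # Case 1: the view all page covers page 2, so it can be the canonical.
    print(can_be_canonical(view_all, pages[1]))
    # Case 2: page 1 is missing the products on page 2, so it cannot.
    print(can_be_canonical(pages[0], pages[1]))
```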
Jonathon defends himself! You can iterate on things you’ve done. Make improvements and see results. Agile development is the best way! (Maile says ‘which means that on, say, Wednesday, you could REMOVE that rel canonical on page two!’ Hee.)
You can view Jonathon’s full slide deck presentation here.
Todd is up now and he already warned me he’s going fast. I’m just going to start crying now.
Stuff that he’s seen:
Really long redirect chains
- They hurt page speed, affect relative links and give crawlers more URLs to check. Crawlers will give up after too many redirects, and of course AdWords adds even more redirects.
Solution: Redirect in one step. Rationalize your rewrite rules.
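To make "redirect in one step" concrete, here is a minimal Python sketch that takes a table of redirects and rewrites it so every old URL points straight at its final destination. The function name `collapse_redirects` and the sample paths are hypothetical, and the sketch assumes you can export your rewrite rules as simple source-to-target pairs, which real rule sets with patterns will not always allow.

```python
def collapse_redirects(redirects):
    """Map every source URL directly to its final destination.

    redirects: dict of source URL -> target URL, possibly forming chains.
    Raises ValueError if the rules contain a loop.
    """
    collapsed = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that is not redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        collapsed[source] = target
    return collapsed


if __name__ == "__main__":
    chain = {"/old": "/older", "/older": "/new", "/new": "/final"}
    for src, dst in collapse_redirects(chain).items():
        print(src, "->", dst)
```

The loop check matters because a chain that circles back on itself would otherwise never finish.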
1. by referring URL
2. IIS browscap “cloaking”
3. robots.txt issues – character encoding issues
IIS: Error Page handling
- 302 to 404 which returns a 200. Fail.
Error pages are among the most frequently crawled, which means crawl inefficiency.
Solution: change the redirectMode property.
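The "302 to a 404 page that returns a 200" pattern is easy to spot once you write down the status codes you get back when requesting a URL that doesn't exist. Here is a minimal Python sketch of that check. The function name `check_missing_page_response` is hypothetical, and the sketch takes the observed status codes as input instead of making any HTTP requests, so how you collect them is up to you.

```python
def check_missing_page_response(status_chain):
    """Check the status codes seen when requesting a nonexistent URL.

    status_chain: list of HTTP status codes in the order received,
                  e.g. [302, 200] for a redirect to an error page.
    Returns a list of problems; an empty list means the handling looks fine.
    """
    if not status_chain:
        raise ValueError("status_chain must contain at least one status code")
    problems = []
    # A missing page should answer directly, without a redirect hop first.
    if any(300 <= status < 400 for status in status_chain[:-1]):
        problems.append("redirects before the error page")
    # The final response should say the page is gone, not that all is well.
    if status_chain[-1] not in (404, 410):
        problems.append(f"final status is {status_chain[-1]}, not 404 or 410")
    return problems


if __name__ == "__main__":
    print(check_missing_page_response([302, 200]))  # the failure Todd describes
    print(check_missing_page_response([404]))       # the desired behaviour
```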
[You have to get his slides. He’s very fast and there’s a lot of code here]
Bonus: IDS blocking Googlebot
Technically NIDS blocking Googlebot (network intrusion detection system) because they think it’s an attack. Monitor your logs and WMT.
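If you want a starting point for the log monitoring, here is a minimal Python sketch that counts requests per day from anything calling itself Googlebot, so that a sudden drop stands out. It assumes access logs in the common Apache combined format with a `[07/Jun/2011:10:00:00 -0700]` style timestamp, and it only matches on the user agent string, which can be spoofed. The function name `googlebot_hits_per_day` is made up for this example.

```python
import re
from collections import Counter

# Matches the date part of a timestamp such as [07/Jun/2011:10:00:00 -0700].
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(log_lines):
    """Count log lines per day whose text mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical log lines for illustration only.
    sample = [
        '66.249.66.1 - - [07/Jun/2011:10:00:00 -0700] "GET /tents HTTP/1.1" 200 5120 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '10.0.0.5 - - [07/Jun/2011:10:00:01 -0700] "GET /tents HTTP/1.1" 200 5120 '
        '"-" "Mozilla/5.0 (Windows NT 6.1)"',
    ]
    for day, count in googlebot_hits_per_day(sample).items():
        print(day, count)
```

A day where the count falls toward zero while regular traffic looks normal is a hint that something upstream is turning the crawler away.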
There’s some chatter about pagination and the fact that in Vanessa’s opinion, there’s a difference between product results and multipage articles. Maile disagrees. Content pages are content pages and they should rank if they’re the best result.