Ask the Search Engines: SMX West
Danny Sullivan, our fearless leader, moderates this panel; he’s already in casual mode. Google’s rep, Matt Cutts, is here. The other search engine reps, Keith Hogan (Ask.com), Sasi Parthasarathy (Live Search, Microsoft) and Priyank Garg (Yahoo!), are ready to sit quietly while Matt gets a billion questions. Too harsh? I’m just trying to be accurate.
Wow, big news.
Right before the session started, the big three announced a new link element, rel=canonical. Coverage is popping up everywhere and I won’t try to duplicate it, so just get yourself over to the Google Webmaster Central blog and read about it. When I talked to Matt, he emphasized that this is not a substitute for a proper redirect. If you can redirect, do that first. Vanessa Fox covered it for Search Engine Land. Joost de Valk does a good job of explaining what it means in his post “Canonical URL Links,” and he already has plugins for WordPress, Magento and Drupal. Can you say on the ball?
Matt’s going to do the official introduction now, so if those links aren’t enough, you can try to follow along here.
Duplicate content is the bane of a lot of people’s existence. There are many preferable ways to fix it: fix your CMS, link consistently, make all non-canonical URLs redirect, etc. However, that doesn’t mean you’re going to catch every single instance. Even Her Majesty the Queen has dupe content issues.
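So they’ve come up with a very simple link element that you put in the head section to identify that clean, pretty, preferred URL. It’s a link element with rel="canonical" and an href pointing at the preferred URL, e.g. <link rel="canonical" href="http://www.example.com/preferred-page" />.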
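For the curious, here’s a rough Python sketch (mine, not anything the engines showed) of how a tool might pull that hint out of a page: it only accepts a rel="canonical" link that sits inside the head section. It uses the standard library’s html.parser; the example.com URL is made up.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical page for illustration.
page = """<html><head>
<title>Swedish Fish</title>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

A self-closing `<link ... />` still reaches handle_starttag, because HTMLParser’s default handle_startendtag calls it.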
They’d rather you not have to resort to this, so do everything else first. This is a hint, not a mandate; the engines make the final choice. And they reserve the right to treat spammy uses of it as spam.
It only works within the same domain, though it does work across subdomains (hosts), and you can use it for https versus http. The pages don’t have to be 100 percent identical, but the differences should be slight. You can use relative or absolute URLs, but they’d prefer absolute URLs. Google can follow a chain of canonicals, but… you know, don’t do that.
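To make the relative-vs-absolute and same-domain rules concrete, here’s a hedged Python sketch. Relative hrefs resolve against the page’s own URL via urllib.parse.urljoin. The domain check is deliberately naive (my assumption, not how any engine does it): it compares the last two labels of the host name, so it gets suffixes like .co.uk wrong. A real check would need the Public Suffix List.

```python
from urllib.parse import urljoin, urlsplit


def base_domain(host):
    # NAIVE assumption: treat the last two labels as "the domain" (wrong for .co.uk etc.).
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def resolve_canonical(page_url, href):
    """Return the absolute canonical URL, or None if it points to another domain."""
    absolute = urljoin(page_url, href)  # a relative href resolves against the page URL
    page_host = urlsplit(page_url).hostname or ""
    target_host = urlsplit(absolute).hostname or ""
    if base_domain(page_host) != base_domain(target_host):
        return None  # cross-domain: not supported, per the panel
    # Different subdomains and http vs https are allowed through.
    return absolute


print(resolve_canonical("https://shop.example.com/item?id=7&sort=price", "/item?id=7"))
```

Emitting the absolute URL in the first place sidesteps any ambiguity about what a relative href resolves to, which is presumably why they prefer absolute URLs.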
What if I point to a 404, create an infinite loop, point to an uncrawled URL or have a non-www/www conflict? Don’t cross the streams. They’ll handle it as best they can.
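Here’s a small Python sketch of why chains and loops are a pain: a consumer has to follow canonical hints hop by hop and bail out on a loop. The canonical_of dict and the max_hops cap are hypothetical. Falling back to the original URL is just my guess at a sane policy, not what Google does.

```python
def follow_canonicals(start, canonical_of, max_hops=5):
    """Follow rel=canonical hints from `start` until a URL names itself or nothing.

    canonical_of: hypothetical dict mapping a URL to the URL its canonical link names.
    Returns the end of the chain, or `start` itself on a loop or an overly long chain.
    """
    seen = {start}
    current = start
    hops = 0
    while True:
        target = canonical_of.get(current)
        if target is None or target == current:
            return current  # end of the chain
        if target in seen or hops == max_hops:
            return start  # loop or too many hops: ignore the hint
        seen.add(target)
        current = target
        hops += 1


chain = {"/a?x=1": "/a", "/a": "/a"}
print(follow_canonicals("/a?x=1", chain))
```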
Thanks go to Joachim Kupke for doing the heavy lifting at Google to implement this.
Okay, ready? Time for questions.
What are you doing about the problems related to deep crawling?
Matt: We try to crawl as best we can. We look at PageRank and backlinks and if you have a higher PageRank, we’ll try to crawl deeper. If you’re not getting crawled, you might need more links. Also, check for duplicate content and make sure you have a simple site structure.
Priyank: Having a clear site map helps Yahoo. The link canonical will work very similarly to a 301. Be careful; don’t mark unique content with a link canonical.
Sasi: Those two pretty much summed it up. Ease of navigation and site architecture are important. At the end of the day, if it’s good content, we want to crawl it.
Keith: We go a step further and look at the content. If it’s good content, we’ll crawl faster. If it’s bad content, we’re going to slow down and not crawl as much.
Should the new canonical tag be used on pages where we don’t think there’s a problem?
Matt: Yeah, you can, but be careful about it. Think about what the user is going to want. It’s just a hint, though, so if you pick a URL we think isn’t as good, we reserve the right to pick the one we think is best.
Priyank: Be careful. We all want to emphasize that. Use it with caution.
If someone under a penalty 301s to you, they can pass the penalty along. How do you protect against that?
Matt: We do look at it. We try to be fair.
Does Yahoo have a hard time with special characters like pipes and ampersands in Title tags?
Priyank: That’s a very specific question but no, I don’t think so.
How is the canonical tag different than IP-based cloaking?
Matt: One is a mini 301 and the other is not in any way.
Why doesn’t the canonical tag work across domains?
Matt: You could use it across domains but it would be hard and it could be a little unsafe, so we decided not to implement it like that.
Will the new tag slow down crawling?
Matt: It won’t slow it down, but we don’t follow 301s immediately anyway, so it’ll just get stuck at the end of the queue.
Priyank: Same. We also have dynamic URL rewriting in Site Explorer, so you can tell us if you have a session ID or some other parameter that can be dropped while crawling, and that will speed up the crawl.
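Under the hood, “this parameter can be dropped” boils down to something like the Python sketch below. The parameter names in DROPPABLE_PARAMS are hypothetical; you’d list whatever your own CMS appends. It uses only urllib.parse from the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session-style parameters; a real site would list its own.
DROPPABLE_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_params(url, droppable=DROPPABLE_PARAMS):
    """Drop query parameters that don't change the content, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in droppable
    ]
    # urlunsplit leaves off the "?" entirely when no parameters remain.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_params("http://www.example.com/list?cat=5&sid=ab12&page=2"))
# → http://www.example.com/list?cat=5&page=2
```

Note that urlencode re-encodes the surviving values, so unusual escaping in the original query string may come back normalized.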
Let’s revisit nofollow. Do nofollowed comment links have no impact on you, or do they help?
Sasi: It’s not helpful from a search engine perspective, but it may be from a user perspective. We just ignore them.
Matt: We drop it completely off the link graph, nothing flows, no PageRank.
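To picture “dropped completely off the link graph,” here’s a toy Python sketch using the textbook PageRank iteration (not Google’s actual implementation). Nofollow links never become edges, so no rank can flow along them. The page names and the 0.85 damping factor are illustrative assumptions.

```python
def build_link_graph(links):
    """links: iterable of (source, target, rel) tuples; rel is e.g. "" or "nofollow"."""
    graph = {}
    for source, target, rel in links:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if "nofollow" in rel.lower().split():
            continue  # no edge at all, so nothing can flow along this link
        graph[source].add(target)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Textbook power iteration; dangling pages spread their rank evenly."""
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in graph}
        for page, outlinks in graph.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                for target in graph:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page site: the blog nofollows a link to a spammy page.
links = [("blog", "home", ""), ("home", "blog", ""), ("blog", "spam", "nofollow")]
print(pagerank(build_link_graph(links)))
```

Flip that last rel to "" and the spam page’s score goes up, because the blog now passes it a share of its rank.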
My competitor is spammy. What is going to happen to them?
Matt: White text on white background is definitely spam. Catch me afterward, I’d love to hear examples. [There's much laughter and Danny uses this to segue into...]
Google Japan has been buying links. What the heck, Matt?
Matt: I just want to take a minute and apologize for it. They didn’t think about how it was going to affect search engines, but that doesn’t excuse it, so Google.co.jp’s PageRank was lowered from 9 to 5 to reflect the decreased trust. They have to file a reconsideration request, just like anyone else. A lot of people at Google were angry about that, but it was the right thing to do.
How does the message of what you can and can’t do get out?
Matt: I think we took a very clear position that this is bad. I’ve been thinking a lot about this over the last few days. Google can’t just assume people know; we have to keep talking about it, and we still have a lot of communicating to do.
How do you determine what’s natural and what’s not?
Keith: We look at each individual page, at the content.
Sasi: We use neutralization as well as penalization.
Priyank: We run analysis on every page. Cloaking is remarkably ineffective.
Matt: All of the above, particularly what Sasi said. Google’s more likely to penalize sellers than buyers because it’s usually more clear-cut, but we will penalize buyers too if we have a high degree of confidence.
Matt: The new spam technique of 2009 is going to be hacking sites. You need to keep your servers patched, and watch and monitor your open redirects. Watch your server load. Keep your software patched. If you don’t have a recent version of WordPress, you’re going to get hacked. Stay patched.
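On “watch your open redirects”: an open redirect is a URL on your site that will bounce visitors to any address passed in a parameter, which spammers love. Here’s a minimal Python sketch of one common defense, allowing only same-site paths or allowlisted hosts. ALLOWED_REDIRECT_HOSTS and the fallback path are hypothetical, and this is one approach among several, not a complete security review.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; list the hosts your own site legitimately redirects to.
ALLOWED_REDIRECT_HOSTS = {"www.example.com", "example.com"}


def safe_redirect_target(target, fallback="/"):
    """Return `target` only if it stays on this site or an allowlisted host."""
    # Backslashes and control characters get normalized oddly by some browsers.
    if "\\" in target or any(ord(ch) < 32 for ch in target):
        return fallback
    # "//host" and "///host" are treated by browsers as links to another host.
    if target.startswith("//"):
        return fallback
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # Only accept absolute paths like "/account" on this site.
        return target if target.startswith("/") else fallback
    if parts.scheme in ("http", "https") and (parts.hostname or "") in ALLOWED_REDIRECT_HOSTS:
        return target
    return fallback  # javascript:, other hosts, user@host tricks, etc.


print(safe_redirect_target("https://evil.example.net/"))
```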
Priyank: I agree with Matt. The onus is on you.
Sasi: Security should be a primary focus for webmasters. It’s your page, not anyone else’s. You’ll end up getting the penalty.
Matt: Danny had a really good point about “craphats” and how it’s like racist jokes — they’re that out of fashion. Don’t put up with it.
My competitor is coming up as a “did you mean” for my site! How do you stop that?
Matt: We do pushes every few months so sometimes that fixes it. OR you should do a blog post and call Google an idiot for getting it wrong. [Hee!!!] You can also report it via a feed.
All others: We want to know about it and want to fix it. Give us feedback.
Will you notify webmasters if you see them get hacked?
Yahoo does already, and so does Google. Ask has a partnership with Norton for notification. If your site gets flagged in Yahoo and it isn’t serving malware, there’s a link right there in the warning that you can use to respond. Google and Microsoft will also tell you if you’ve been penalized.
Matt: I want to know from those in the audience. Would you want a notification in case of hacking or penalization? Do you want an email? (The audience says YES.) Should it be a check box? (Another YES!)
Priyank: How many people target content at a state, regional or city level? Language? Country? [I can't see the responses.]
Matt: Would people want a fetch as googlebot feature? (The audience says YES.)
Priyank: How many people would like to have their content crawled at a specific time of day? [No one seems to care. Except Hilton.com who doesn't want to be crawled during the day. Um... yeah.]