You&A with Matt Cutts
Our first session of the conference is the hotly anticipated question-and-answer with everyone’s favorite Google engineer, Matt Cutts. It’s a good thing Matt’s been on vacation, because he’s going to need all his energy for this crowd. Lisa and I slide in late because she had to blow-dry her hair. Mine’s still wet.
Matt and Danny are up on stage but I can’t see them because I’m sitting on the floor at the back. They’ve already started so let’s jump right in. I didn’t catch everyone’s names–comment if you asked a question and want credit.
Someone wants to know how closely Google can determine where someone is. Matt says they’re able to pinpoint it about 80 or 90% of the time. IPs are a pretty good indicator of where someone is. (Except for AOL, where everyone is from Virginia.) Matt mentions that he checked his IP at the hotel and found that it placed him within just a few blocks.

When you’re trying to figure out where you are on a map by IP, you can get close to a fair degree. Matt thinks it’s a good thing overall–when the bath is overflowing you just type plumber, not Seattle plumber, so Google tries to figure out where you are in order to return relevant local results.
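The idea of mapping an IP to a place can be sketched with a toy longest-prefix lookup. This is purely illustrative: the prefix table below is invented (the ranges are documentation-reserved addresses), while real geolocation relies on large commercial databases with millions of ranges.

```python
import ipaddress

# Invented prefix-to-city table; real systems use big GeoIP databases.
PREFIX_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "Seattle, WA",
    ipaddress.ip_network("198.51.100.0/24"): "Reston, VA",  # an AOL-style proxy block
}

def locate(ip: str) -> str:
    """Return the city for the most specific matching prefix, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, city in PREFIX_TABLE.items():
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, city)
    return best[1] if best else "unknown"
```

With a table like this, `locate("203.0.113.42")` resolves to Seattle, and anything outside the known ranges falls back to "unknown" – which is roughly why Matt’s 80-90% figure leaves a residue of misses.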
Question: Pat (BC blog commenters, represent!) wants to know about the Google guidelines. Where is the future of the guidelines? Why are they so brief and do they plan to expand them, get more detailed? (Danny: Don’t ruin it!)
Matt: I’ll explain the philosophy behind the guidelines. Matt mentions group theory, and I think longingly of the coffee I’m going to need to understand this. Basically, there are four rules that, taken together, underpin a whole branch of modern mathematics. They tried to do the same with the guidelines: have a base set of principles. Even kept that basic, people split hairs about it, arguing over what commas mean and what the definition of ‘is’ is; people get too nuanced. Essentially they’re trying to avoid losing sight of the forest for the trees.
The guidelines amount to ‘avoid these sorts of things; you should be able to use common sense on the whole.’ They didn’t want to say ‘this is bad’ and then have to go back and adjust every time something new but not kosher came up.
Matt thinks they might be due for an update, and it might be good to get some examples in there. More detail rather than just ‘avoid link schemes’. He mentions Pat’s site Feedthebot.com, as a resource for more explanation about the guidelines. Yay, Pat!
Question: Someone saw a job ad looking for an in-house SEM with “Expertise in buying links.” Are paid links the death of the algorithm?
Matt: When people advertise for buying links, the ads get forwarded to Google by people who are all too happy to be offended. Lots of people want to report paid links, just as lots of people find reporting them really lame. The webmaster console might have a link-reporting form in the future. Webmasters want to have good content, and Google wants to present good content.

When they introduced the spam report form in November 2001 there was the same kind of uproar. They have algorithms, but they’re not averse to some manual intervention. They try to approach things algorithmically and to make things scalable.
You can do what you want to your site but Google is going to choose the sites that are best for their index.
Question: How many links out should I have? Will it hurt you to link out too much?
The idea is that more links into the site from other domains is good. Links out are good for your users, and that’s good for search engines; there are no particular guidelines on how many. Sites that are good for users get bookmarked more often and visited again more often. (Danny: I think you just said that if you link to SEL, that’s good.)
Question: I have 50,000 products; how do I prevent problems across several different categories? (Also something about internal search pages.)
Matt, on indexing search pages in general: the technical guidelines say ‘not good’; the webmaster spam guidelines don’t, because SERPs generally don’t add value for the user. It’s not spam, but it’s a poor user experience. A common complaint is “I did a search for something and I only got places to buy, or only shopping sites,” with no reviews of anything. Users hate that.
What’s the value? If they’re just search results, they’re not valuable; but if the page has some value added, it might be useful. Take a step back, pretend you’re a competitor looking at the page, and try to decide whether you’d complain about it if it weren’t yours.
Question: What is the clickthrough impact?
Matt: We haven’t said whether it affects regular search. If you did use it, the data would be really noisy, so you’d have to be careful. He mentions programming the toolbar and how you’d then get “happy face rings.” The noise level would make using clickthrough as a barometer really hard.
MSN has said they use it. Google hasn’t and probably won’t ever say if they do or not.
Question: Why does Google love Wikipedia and when will you break up?
Danny: I think Matt wanted to tell them privately.
Matt: The first thing you need to learn when you go to Google is that you aren’t a regular user. Asking “Why do you rank Wikipedia above accurate sites?” is like asking “Have you stopped beating your wife yet?” As an example, he mentions wanting to find out what order to read Terry Pratchett’s Discworld series in. The first result is Terry Pratchett’s official site, and it’s wrong. The second result is Wikipedia: it’s there, it’s in order, and it’s right.
Sometimes Wikipedia isn’t the best result, but for lots of queries it’s a good one. Regular users wouldn’t type the searches that experts complain about. Someone from Edmunds.com says, ‘We have tens of thousands of accurate pages on autos by make and model, and Wikipedia still ranks better.’ Why? Matt says he’ll take that feedback back, because it’s a good thing to know.
Question: During site reviews, Matt checks the other domains a webmaster owns. What business is that of his, and do the domains you own affect each other?

Matt: I want to know what kind of person that webmaster is. A webmaster with two sites is different from one with 1,500 sites; they do things at a different level. Your gambling site won’t necessarily hurt your sweater site. Domains by Proxy is a little bit of a flag, but not a strike. Are you a habitual domain buyer, or just a mom-and-pop who didn’t know something was wrong?
Follow up: if you think one site is spam will that cast aspersions on the others?
Matt: If you have 200 sites that are spamming, wouldn’t you look at the 201st more closely? There are people who say I have a hundred domains that are my trademark and they’re 301ed, why can’t I have them to protect my trademark? You can, it’s not a problem. That’s good. Owning domains isn’t a strike against you.
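The trademark-protection pattern Matt blesses – many variant domains all 301ing to one canonical site – can be sketched as a tiny routing rule. The domain names below are hypothetical placeholders, not anything from the session:

```python
# Hypothetical: variant domains registered to protect a trademark,
# all permanently redirected (HTTP 301) to one canonical host.
CANONICAL_HOST = "www.example.com"
VARIANT_HOSTS = {"example.net", "example.org", "examp1e.com"}  # invented names

def redirect_for(host: str, path: str):
    """Return (status, Location) for a variant host, or None if already canonical."""
    bare = host.lower().removeprefix("www.")
    if bare in VARIANT_HOSTS:
        return 301, f"https://{CANONICAL_HOST}{path}"
    return None  # serve the page normally
```

A request for `example.net/products` would get a 301 to `https://www.example.com/products`, so all the variants consolidate onto one site instead of looking like 200 separate ones.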
Question: Jason Calacanis says there’s too much spam in regular search engines. He thinks Mahalo is better. Someone wants to know what Matt thinks.
Matt: Lots of people are trying different things, and he’s glad, because it shows that Google isn’t just algorithms; it’s algorithms that evaluate what humans do. “Scalable and robust techniques” are the focus. There’s a human at the base of all Google algorithms; it’s not just machines.
Question: Categorization when the number of permutations is in the millions, and competition with the resellers; how do you focus?
Matt: Try to figure out a product’s primary category rather than placing it in 30 different categories. Matt uses a complicated analogy involving Play-Doh and describes siloing, though he doesn’t call it that. He says to look at your resellers and see how they’re structuring things; do competitor analysis and figure out how you can do the same.
Question: What’s the story with Googlebombs? Is it algo, human tweaking, etc?
Matt: Googlebomb handling is completely algorithmic. The algorithm hadn’t changed when the “Greatest Living American” bomb was created and defused; it just only runs every 2-4 months. They pressed the button to run the algorithm; they didn’t change it at all, it simply hadn’t been run in a while. They do reserve the right to intervene manually on spam. “It’s not the sort of thing that’s worth us really caring that much.” People assumed they had an editorial agenda (on the “miserable failure” Googlebomb), so they fixed it, but it really doesn’t matter too much to them.
Question: Image results in the main SERPs; how will that evolve? Search “George Bush” and get Jimmy Carter.

Matt: For the OneBox, you want to return the best results. Sunsets, rainbows, J. Lo: people want pictures for those queries. “Fix a sink” might be better served by a video. There’s an underlying parameter that tries to assess intent. It’s hard to do image search well, but they’re getting better. If an inaccurate result comes up in the OneBox, they can edit it to be more accurate.
Question: Theming, LSI, similar words, keyword grouping; how has that progressed?

Matt: Theme across the website, don’t theme across the website; what do you do? Try it and see what works for you. You end up with lots of related words, and if you’re using the synonyms yourself then Google doesn’t have to. Use synonyms in a natural way; don’t force keywords. Bio and biography are synonyms, but apple and apples aren’t.
Google does a lot of semantic understanding underneath the hood.
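A toy sketch of what synonym handling looks like from the engine’s side – expanding query terms through a synonym map before matching. The map below is hand-made for illustration (following Matt’s bio/biography example), not anything Google has published:

```python
# Toy synonym map; real engines learn these relationships from data.
# Note the deliberate absence of apple <-> apples, per Matt's caveat.
SYNONYMS = {
    "bio": {"biography"},
    "biography": {"bio"},
}

def expand(query_terms):
    """Return the set of terms a query can match after synonym expansion."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded
```

A query for “matt bio” would then also match pages that only say “biography” – which is why, if your copy already uses synonyms naturally, the engine doing this work matters less for you.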
Last question comes from Matt: What do you guys want from the console?
- Realtime reports (Matt: what EASY things do you want?)
- The 200 factors in the algo
- Errors without having to click through to each domain (red flagged when there are errors)
- Spider traps
- Shared logins
- Make it searchable
- RSS output of reports and links (Matt: that’s hard too!)
- Emailed reports
- 404 reports (where did this link come from? How did they get to this error?)
- More data on the query