Morning Keynote: Andrew Tomkins
It’s starting to feel a bit like Groundhog Day with all these early morning keynotes, isn’t it? Another keynote. Another cup of coffee. Another mysterious headache that feels like there are Oompa Loompas dancing on my head.
I’m actually curious to see what the turnout will be for this one. I’m pretty sure half of the attendees are still stumbling to make their way home from SearchBash. I’m also curious why there is loud, angry music on at 8:30am when half the crowd is surely hungover. That just seems mean.
Okay, we’re starting.
Kevin Ryan says that Mike Grehan called Andrew the smartest guy in the business. He understands the world of search and everything in it. He can also explain it to dumb people like us.
Andrew wants to walk through where he sees search going. What are some of the trends in the industry? He’s going to give us a detailed look. That means lots of typing. Yay.
The Internet has firmly moved from a curiosity to a substrate for life activity. Content is growing, changing, diversifying, and fragmenting, and search is evolving in response. Value is migrating to the ecosystem, and the semantics of content unlock that value.
Getting Things Done
No one really goes online to search. It’s just a tool that they use to do what they really want to do.
For example: I want to book a vacation in Tuscany. I start off by going to Google, which leads me to Yahoo FareChase. I find a site specific to Tuscany. I order a rental car. I’m weaving my way in and out of search, going for both information and services.
I go to Tuscany and come back. I loved the vacation and want to make that sweet Italian coffee at home, so I search for how to make great espresso at home. I find an enthusiast site and come across a really detailed informational article. I’m moving from the broad landscape into specific details. I decide I want that machine, so I get pricing information and then look for a merchant. My search helps me find one.
This process can take months or years.
Dawn of search: Navigational queries and pockets of information
Today: Increasing migration of content online. New forms of media only available online. Infrastructure for payments and reputations sufficient for many years.
Things to Notice:
- Long-running user goals
- Search as a hub – start there, return for resource discovery and at task boundaries, traverse Web broadly to complete task
- Web services integrated into task
Search is going to be less about social integration and entertainment, and more about hardcore productivity: going online to get done the things you need to get done. The Internet is for the important life stuff.
Published content – 3-4 GB a day
Professional Web content – ~2 GB a day
User-generated content – 8-10 GB a day
Private text content – ~3 TB a day
Upper bound on typed content – ~700 TB a day
Anchor text – ~100 MB of metadata a day
Tags – 40 MB a day
Page views – 180 GB a day
Reviews – ~10 MB a day
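To put those numbers in proportion, here is a quick back-of-the-envelope sketch. It assumes every figure is a per-day number, replaces each quoted range with its midpoint, reads the tag figure as 40 MB, and uses decimal units; the dictionary and function names are just illustrative.

```python
# Rough daily volumes from the keynote, in bytes (decimal units).
# Assumptions: ranges are replaced by their midpoints, and every
# figure is treated as a per-day number.
MB = 10**6
GB = 10**9
TB = 10**12

DAILY_VOLUMES = {
    "published content": 3.5 * GB,       # quoted as 3-4 GB
    "professional web content": 2 * GB,
    "user-generated content": 9 * GB,    # quoted as 8-10 GB
    "private text content": 3 * TB,
    "typed content (upper bound)": 700 * TB,
    "anchor text": 100 * MB,
    "tags": 40 * MB,
    "page views": 180 * GB,
    "reviews": 10 * MB,
}


def ratio_to(baseline: str, volumes: dict[str, float]) -> dict[str, float]:
    """Express every volume as a multiple of the baseline category."""
    base = volumes[baseline]
    return {name: size / base for name, size in volumes.items()}


if __name__ == "__main__":
    ratios = ratio_to("published content", DAILY_VOLUMES)
    # Print smallest to largest, relative to published content.
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:30s} {r:12.3f}x published content")
```

Under those assumptions, private text comes out at roughly 860 times the volume of published content, while user-generated content is only about 2.6 times larger.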
Consumption is fragmented. Nobody owns more than 10 percent of the Web’s page views, and no single place will own all the content. Best-of-breed processing will operate on the Web version. Value transitions to the ecosystem.
Content consumption is fragmented across users. They studied the interests LiveJournal users list for themselves, broken out by age. LJ users aged 1 to 3 (really, blogs people create for their baby or pet) are interested in treats, catnip, daddy, mommy, playing, etc. The over-57 demographic is interested in death, cheese and photography. Heh.
The thing that stood out in the study was that you can be cohesive within your demographic group and really experience that as the universe without needing to be bombarded by the larger set of topics going on out there.
Content access is fragmenting: He looks at Facebook. We’re not used to dealing with access control on that level. We’re used to info either being private or public. Facebook gives us more segments through networks.
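The move from a simple public/private switch to network-scoped visibility can be sketched as a small access check. This is a hypothetical model for illustration only, not how Facebook actually implements privacy: each item is public, private, or visible to a set of named networks, and a viewer can see it if they own it, if it is public, or if they share at least one of its networks. All names here (`User`, `Item`, `can_view`, the network strings) are invented.

```python
from __future__ import annotations

from dataclasses import dataclass

PUBLIC = "public"
PRIVATE = "private"


@dataclass(frozen=True)
class User:
    name: str
    networks: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Item:
    owner: User
    # Either PUBLIC, PRIVATE, or a frozenset of network names allowed to see it.
    audience: str | frozenset[str] = PRIVATE


def can_view(viewer: User, item: Item) -> bool:
    """Owners always see their items; otherwise apply the item's audience rule."""
    if viewer == item.owner or item.audience == PUBLIC:
        return True
    if item.audience == PRIVATE:
        return False
    # Network-scoped: visible if the viewer belongs to any listed network.
    return bool(viewer.networks & item.audience)


if __name__ == "__main__":
    alice = User("alice", frozenset({"state-u"}))
    bob = User("bob", frozenset({"state-u", "acme-corp"}))
    carol = User("carol", frozenset({"other-college"}))
    post = Item(alice, frozenset({"state-u"}))
    print(can_view(bob, post), can_view(carol, post))
```

The point of the sketch is the third case: visibility depends on group membership rather than on a single flag, which is exactly the extra segmentation a crawler or search engine has to respect.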
Content itself is changing. It used to be that you go to a page, you open it, you parse it and you index it. Now, Web pages are increasingly based on AJAX. It’s like a Choose Your Own Adventure novel: it’s all little fragments of XHTML. Crawling it is a hard thing to think about.
The Search Interface
We saw very few changes in search through 2005. Now we’re entering a period of massive change to handle more complex content. Rich media, aggregation, simple task analysis, etc. Moving beyond the stateless query/response paradigm because users need it. Personalize theory.
Rich Media and Search Assistance
On Yahoo, you type in "the game plan" and you get all sorts of neat stuff: the Search Assist player, a movie shortcut (which shows task-level ambiguity: what do they want to know?), etc.
Andrew shows how Google handles simple task-focused queries like flight times and definitions. He also shows Microsoft’s product search results.
Structured databases power the vast majority of pages on the Web: certainly e-commerce catalogs, but also UGC. Content owners are open to exposing structure but don’t see how or why. Microformat adoption is at an all-time high, yet far more is produced than is consumed.
Experiments with "pure" structured data aggregation have met with limited success.
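On the producer side, exposing structure can be as simple as carrying database fields through to the HTML as microformat class names. The sketch below assumes a hypothetical `reviews` table in an in-memory SQLite database and renders each row with hReview class names (`hreview`, `item`, `fn`, `rating`, `reviewer`, `description`); the table, its columns, and the function names are made up for illustration.

```python
import sqlite3
from html import escape


def render_review(product: str, reviewer: str, rating: int, body: str) -> str:
    """Render one review row as HTML annotated with hReview class names."""
    return (
        '<div class="hreview">'
        f'<span class="item"><span class="fn">{escape(product)}</span></span> '
        f'rated <span class="rating">{int(rating)}</span>/5 by '
        f'<span class="reviewer vcard"><span class="fn">{escape(reviewer)}</span></span>'
        f'<p class="description">{escape(body)}</p>'
        "</div>"
    )


def render_page(conn: sqlite3.Connection) -> str:
    """Build a page fragment straight from the (hypothetical) reviews table."""
    rows = conn.execute(
        "SELECT product, reviewer, rating, body FROM reviews ORDER BY id"
    )
    return "\n".join(render_review(*row) for row in rows)


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database holding one sample review."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (id INTEGER PRIMARY KEY, product TEXT, "
        "reviewer TEXT, rating INTEGER, body TEXT)"
    )
    conn.execute(
        "INSERT INTO reviews (product, reviewer, rating, body) VALUES (?, ?, ?, ?)",
        ("Home espresso machine", "coffee_fan", 4, "Great crema & easy cleanup."),
    )
    return conn


if __name__ == "__main__":
    print(render_page(demo_connection()))
```

The structure already exists in the database; the template just stops throwing it away, which is the "how" that content owners reportedly weren't seeing.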
The data Web needs a killer app.
What we have announced:
- The Killer App is search
- Wide-ranging support for semantic Web standards
- Vocabulary to surface structure and semantics
- Community tools to evolve standards and vocabulary.
Search as the Killer App: Publishers and search engines are going to collaborate. Users will see a richer search experience and accomplish their tasks faster and more effectively.
Andrew shows some search results of the future. They basically look very blended.
The industry needs comprehensive support for emerging Web standards. That includes:
- Microformats: hCard, hEvent, hReview, hAtom and more as they get adopted
- RDFa and eRDF markup
- OpenSearch
- Atom/RSS feeds
After a site does this, there will be richer information about them in the search engines.
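On the consuming side, here is a minimal sketch, using only Python's standard-library `html.parser`, of how an indexer might pull simple hCard properties out of a page. It is not Yahoo's parser: it only collects a few text-valued properties (`fn`, `org`, `title`, `tel`), whereas full hCard parsing also covers nested `n`/`adr` structures, the value-class pattern, and attribute values like `href`. The names `HCardParser` and `parse_hcards` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Text-valued hCard properties this sketch collects (a deliberate subset).
PROPERTIES = {"fn", "org", "title", "tel"}
# Elements that never receive an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect simple text properties from elements marked class="vcard"."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: list[dict[str, str]] = []
        # One entry per open element: (tag, index of enclosing vcard, property).
        self._stack: list[tuple[str, int | None, str | None]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        card = self._stack[-1][1] if self._stack else None
        if "vcard" in classes:
            self.cards.append({})
            card = len(self.cards) - 1
        prop = None
        if card is not None:
            prop = next((c for c in classes if c in PROPERTIES), None)
        self._stack.append((tag, card, prop))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                return

    def handle_data(self, data):
        # Attribute text to the innermost enclosing hCard property, if any.
        for _, card, prop in reversed(self._stack):
            if prop is not None:
                values = self.cards[card]
                values[prop] = values.get(prop, "") + data
                return


def parse_hcards(markup: str) -> list[dict[str, str]]:
    """Return one dict of property -> text per vcard found in the markup."""
    parser = HCardParser()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace inside each collected value.
    return [{k: " ".join(v.split()) for k, v in card.items()}
            for card in parser.cards]
```

For example, feeding it `<div class="vcard"><span class="fn">Jane <b>Doe</b></span>, <span class="org">Example Roasters</span></div>` yields one card with `fn` set to "Jane Doe" and `org` set to "Example Roasters"; text outside any vcard is ignored.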
Yahoo Open Search does not modify rankings. Richer abstracts may give users more information and draw more, and higher-quality, clicks. They want rich abstracts that give users a better experience.
The Whole Story:
User needs are becoming more complex. Content is growing, changing, and diversifying. Search is responding by becoming more sophisticated.