Semantic Search: How Will It Change Our Lives
Personal favorite Kevin Ryan is moderating with Nagaraju Bandaru (BooRah), Amit Kumar (Yahoo! Search), Erik Collier (Ask.com), Scott Prevost (Powerset) and Kartal Guner (hakia.com). This session is packed. And I promised Erik that I would say nothing bad about him. However, I can’t promise that I will not trash Ask.com. Mostly because they deserve it.
Kevin starts us off saying that semantic search is an area where lots of people are asking questions (pun intended?). And there are definitely more questions than answers at this point. He asks how many people understand the concept of semantic search and not too many people raise their hands. Kevin calls semantic search the next generation of relevance.
Up first is Amit Kumar. Everyone please wave. Thank you.
He’s going to convince us that semantic search is getting closer and closer to reality. You can participate and there are good reasons for you to do so – higher click rates, making the world a little better, etc.
What does Semantic Search Mean to Us?
Understand underlying structured data. Search engines and many other companies have recognized that getting to the core is very important. Applying semantic understanding to search helps users get from “to do” to “done”.
Amit is going to run us through some potential semantic search user experiences.
[sam pullara san Francisco]: You want to know what this guy looks like and what his interests are. He pulls up a sample result page. We see his Facebook page, then his LinkedIn profile, etc. For [definitely, maybe] you get the IMDb listing for the movie, the movie Web site and the Wikipedia page for the Oasis album. For [can pigs fly] you get the Yahoo Answers page.
It’s about bringing relevant structured data together so that the results are more valuable.
All of the examples he shows us are live today. Structured data is being pulled out of the places where it has been hiding, and it’s providing great value.
Building it Together
You can help Yahoo perform better in structured search.
- Expose structured data: Microformats, data feeds, etc.
- Build SearchMonkey applications: Use Yahoo’s easy-to-use Developer Tool. Submit your application to the Gallery.
- They promote your applications to your users: Sometimes to all users via Yelp, LinkedIn, Yahoo Answers, etc.
In return for helping Yahoo, you get more clicks. They’ve seen a 15 percent increase in clicks for those that create and promote applications. The result is a more engaged user experience.
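For anyone wondering what "exposing structured data" looks like in practice, here is a minimal sketch of reading hCard microformat markup out of a page. It is not Yahoo's or SearchMonkey's code. The class names (vcard, fn, org, locality, tel) are real hCard properties, but the sample markup, the business and the HCardParser helper are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with hCard class names.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn org">Example Pizza</span>
  <span class="adr"><span class="locality">San Francisco</span></span>
  <span class="tel">555-0100</span>
</div>
"""

# The hCard properties this sketch cares about (an arbitrary subset).
WANTED = {"fn", "org", "locality", "tel"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is a wanted hCard property.

    Assumes well-formed markup with no void tags such as <br> or <img>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open tag: the wanted classes on it

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self._stack.append(classes & WANTED)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Text belongs to the innermost open element.
        for name in self._stack[-1]:
            self.fields[name] = text


def extract_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.fields


print(extract_hcard(SAMPLE_HTML))
```

The point is only that a few agreed-upon class names let a crawler pick out the name, city and phone number without guessing from keywords.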
[Kevin is making some witty banter while the Mac guy gets his computer set up. The man can work a room.]
Nagaraju Bandaru is next up.
He’s going to start off with some theme music that has been semantically matched. But then no music plays. Oh well, no music for us. Imagine music on your own. And if that’s not a commentary on semantic search, I don’t know what is.
Nagaraju works for BooRah. BooRah is a local search company that uses natural language processing to extract sentiment from user reviews and blogs and summarizes multiple sources to tell you what’s good and bad about a place.
They’re looking to increase the relevance of search results through structured data. They’re using tags, keywords and information you already know.
Next Generation Search
Google is continuing with its approach of indexing smart content and using keywords to get some of the same information. They have some behind the scenes semantics that will give you Related Searches or categories of neighborhoods or romantic restaurants for those of you with active social lives.
Yahoo’s taken a more open approach. They have opened up and started integrating content from publishers via microformats. He talks about SearchMonkey, which takes content from verticals and uses it to enhance your search results.
Both of these approaches rely on some sort of statistical analysis to determine what the keywords are and how relevant they are.
Enhancing Search Results
- NLP Search for Long Tail Queries: 5 percent of searches have natural language intent. “What movies did Bruce Willis star in?” Companies like hakia and Powerset understand the query and match it up with keywords it may be relevant for.
- Recommendations for Better Navigation/Discovery
- Enhanced Local Listings for Local Search: You want to know if the restaurant you’re thinking of is good or not. How do you get the granularity you need for a local search?
- Does Search Behavior Change the Consumer Experience?
The keyword search is a fairly strong paradigm. But users are going to find more results through the semantic framework because they can search with long tail queries and get results that are related to them. Once they can search for those things more, it will change how they search. We’ll see more long tail searches helping people get to what they want much faster.
Sentiment Extraction to Enhance Search
- Summarize gobs of content: Category specific scores, normalize different data sources, easy search & sort.
- Inferred Meta Data: Leverages existing content – reviews, blogs, message boards, etc
- Increased Relevance & Context: LocationAware Mobile Applications. Avoid keyword spam.
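To make the "category specific scores" idea concrete, here is a toy sketch that turns review text into per-category scores. It is not BooRah's method, which was not shown in any detail. The word lists, the categories, the 0 to 5 scale and the function names are all assumptions, and a real system would use proper natural language processing rather than word matching.

```python
from collections import defaultdict

# Toy sentiment lexicon and category keywords; every entry is an assumption.
POSITIVE = {"great", "delicious", "friendly", "cozy"}
NEGATIVE = {"slow", "rude", "bland", "noisy"}
CATEGORIES = {
    "food": {"pizza", "pasta", "food", "dessert"},
    "service": {"service", "waiter", "staff"},
    "ambience": {"room", "ambience", "music"},
}


def score_sentence(sentence):
    """Return (category, polarity) pairs for one sentence."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    polarity = len(words & POSITIVE) - len(words & NEGATIVE)
    if polarity == 0:
        return []  # neutral or mixed sentence: no signal
    sign = 1 if polarity > 0 else -1
    return [(cat, sign) for cat, keys in CATEGORIES.items() if words & keys]


def summarize(reviews):
    """Average polarity per category, mapped from [-1, 1] to a 0-5 scale."""
    totals = defaultdict(list)
    for review in reviews:
        for sentence in review.split("."):
            for cat, pol in score_sentence(sentence):
                totals[cat].append(pol)
    return {cat: round(2.5 + 2.5 * sum(p) / len(p), 2) for cat, p in totals.items()}


reviews = [
    "The pizza was delicious. The waiter was rude.",
    "Great pasta. Friendly staff but slow service.",
]
print(summarize(reviews))
```

Note that the mixed sentence ("Friendly staff but slow service") cancels out and contributes nothing, which is exactly the kind of case where word counting falls short.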
Kevin asks if blog content deserves to be mixed into the “real” SERP content. He says that by nature, people are idiots and maybe shouldn’t be given that much power in the search results.
Amit says that blogs do contribute. They add value, so yes, Kevin, they do deserve to be there.
Erik Collier is next.
Humans understand rephrased queries naturally, machines do not. Erik says that if the meaning of your query is the same, you should get the same answers. It doesn’t matter where the content comes from. Web, structured content, blogs, etc., it’s all the same. He shows some examples that prove that’s not true today. People don’t get the same content.
At Ask, structured content comes first. They use classification, parse out the concept and then vertically tag.
TV Listing Test Case
They started with a structured TV content feed that contained all listings for a one week period. Matching used a combo of business rules and other techniques. They targeted users by where they were from.
When is Seinfeld on TV next?
“Next” means the next time it’s on. Ask shows them the next showing.
Is CSI on TV this Wednesday afternoon?
It understands that CSI is a television show and what the date is.
What movies is Will Ferrell in on TV next week?
Lists shows only within the next week. It identifies the time period.
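Here is a rough sketch of how the first two queries could be handled with simple business rules over a one-week feed. This is not Ask's implementation. The listings data, the parse_query and answer helpers, the noon-to-6pm definition of "afternoon" and the fixed "now" are all assumptions. The Will Ferrell query is left out because it also needs cast data.

```python
from datetime import datetime, timedelta

# Hypothetical one-week listings feed: (show, start time). All data is made up.
LISTINGS = [
    ("Seinfeld", datetime(2008, 8, 18, 19, 0)),  # Monday evening
    ("CSI", datetime(2008, 8, 20, 14, 0)),       # Wednesday afternoon
    ("Seinfeld", datetime(2008, 8, 21, 23, 0)),  # Thursday night
]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
NOW = datetime(2008, 8, 19, 9, 0)  # pretend it is Tuesday morning


def parse_query(query, now):
    """Rule-based parse: return (show, window_start, window_end, first_only)."""
    q = query.lower().rstrip("?")
    show = next((name for name, _ in LISTINGS if name.lower() in q), None)
    # Default window: from now until the end of the one-week feed.
    start, end = now, now + timedelta(days=7)
    for i, day in enumerate(WEEKDAYS):
        if "this " + day in q:
            offset = (i - now.weekday()) % 7
            start = (now + timedelta(days=offset)).replace(
                hour=0, minute=0, second=0, microsecond=0)
            end = start + timedelta(days=1)
    if "afternoon" in q:
        midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + timedelta(hours=12)
        end = midnight + timedelta(hours=18)
    # A trailing "next" means the user wants only the soonest showing.
    first_only = q.endswith(" next")
    return show, start, end, first_only


def answer(query, now):
    show, start, end, first_only = parse_query(query, now)
    hits = sorted(t for name, t in LISTINGS if name == show and start <= t < end)
    return hits[:1] if first_only else hits


print(answer("When is Seinfeld on TV next?", NOW))
print(answer("Is CSI on TV this Wednesday afternoon?", NOW))
```

The Monday showing of Seinfeld is skipped because it is already in the past relative to the pretend "now", which is the whole point of understanding "next".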
That was short, but kind of impressive. And I didn’t even have time to be mean to Erik. :)
Kevin comments that over the years we’ve taught people to search in cavemen speak. [Japan, population] How are people going to adopt this new form of interaction? How are we going to change the minds of people?
Erik says they’re going to hit people over the head with a club caveman style, heh. They already over-index in that area so they have a leg up. You can also surface up refinement tools with search suggestions that are phrased as questions. Expose that information as much as you can through relations. If you do a search for [Tom Cruise], they’ll show relevant information.
Kartal Guner is up.
Semantic technology embodies cognitive knowledge and operates on concept relations. It paves the way to text-meaning-representation and conversational aptitude.
Challenges: Know-how and time constraints/scalability
Current Web search suffers from the limitations of statistical methods operating on keywords:
- Dependence on link referrals, behavior tracking, corpus selection etc.
- Long tail phenomena
- May be vulnerable to external manipulation
Semantic search operations at hakia: generalization, parallelization, question type, categorization, compression, content characterization, disambiguation.
Generalization example: What drug treats urinary tract infection?
hakia understands the context, so it picks the correct sense of “drug” and accurately displays the drug’s name in the result.
He compares that same query to a traditional result which requires you to perform a second search to find the same information.
Elevating expectations:
- Searchers: higher relevance, freedom to search in natural language and a new search experience.
- Advertisers: higher relevance in contextual advertising and automated advertising systems to monetize the long tail.
- SEOs: A new way of Web site optimization with a focus on content design vs keyword design.
Scott Prevost is next.
Powerset is about bringing structure to unstructured data. Search is not a solved problem. Keyword techniques involve shallow representations of document meaning and user intent. Better relevance can be achieved through improved models of the meaning of documents and queries. Better understanding of document content and queries enables new features that impact the entire end-to-end search process.
Powerset’s Vision for Search:
- Improve search relevance
- Apply deep natural language processing to extract semantic features from text and encode them in the index.
- Extract semantic features from queries.
- Retrieve and rank documents based on semantic, keyword and other features.
The Impact on Relevance
Improving recall through word & phrase variation: Synonyms, hypernyms and anaphors. E.g.: Knowing that Barack Obama can be referred to as “Obama” or “he” in an article.
Improving precision through linguistic structure and context: Connections between words are important. Word sense and disambiguation in context.
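The recall point can be illustrated with a tiny sketch that compares plain keyword matching against matching on known variants of a name. This is not how Powerset does it. The variant table, the documents and the function names are assumptions, the matching is crude substring search, and resolving anaphors like "he" is not attempted.

```python
# Hypothetical variation table; a real system would derive this linguistically.
VARIANTS = {
    "barack obama": {"barack obama", "obama", "senator obama"},
    "car": {"car", "automobile", "vehicle"},  # "vehicle" is a hypernym of "car"
}

# Made-up documents for illustration.
DOCS = {
    1: "Obama spoke in Denver on Thursday.",
    2: "The senator from Illinois gave a speech.",
    3: "Barack Obama accepted the nomination.",
}


def keyword_match(term, docs):
    """Return ids of documents containing the literal query term."""
    return sorted(i for i, text in docs.items() if term in text.lower())


def expanded_match(term, docs):
    """Return ids of documents containing any known variant of the term."""
    forms = VARIANTS.get(term, {term})
    return sorted(i for i, text in docs.items()
                  if any(form in text.lower() for form in forms))


print(keyword_match("barack obama", DOCS))   # literal match only
print(expanded_match("barack obama", DOCS))  # also finds the bare surname
```

Document 2 is still missed in both cases, since nothing in this sketch knows that "the senator from Illinois" is the same person. That gap is where the deeper linguistic analysis would have to earn its keep.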
The Impact on User Experience
More natural and flexible querying: Keywords, topics, phrases and questions.
Improved presentation of results.
Dossier and answers on the SERP.
We get a quick look at how Powerset works and how they deal with question queries, dossiers with disambiguation, etc.