Patent Reveals Insight into How Google Generates Answer Boxes via Content Scores
Optimizing a page in hopes of capturing an answer box (or featured snippet) is a trending SEO specialty. A newly filed patent application offers insights into how Google approaches answer boxes. (Thanks to Google patent expert Bill Slawski for surfacing the document.)
The U.S. patent application filed by Google on Jan. 12 gives search marketers a look into the search engine’s plans for answer boxes. The patent application, titled “Generating Elements of Answer-Seeking Queries and Elements of Answers,” covers a lot of technical ground. For digital marketers engaged in SEO for answer boxes, two key insights and one question emerge:
- Content will receive a score, and the content with the highest score earns the answer box.
- Search queries will not need to use question words to generate an answer box.
- Could answer boxes be compiled from multiple sources?
Read on to discover more about each of these coming developments.
What is an answer box? Answer boxes are direct answers to queries that appear above the search results. Answer boxes are displayed for queries that Google algorithmically determines are “answer-seeking.” The content inside an answer box is pulled from one of the top ten results on a search engine results page (SERP) and appears in a light grey box at the top of the SERP. Why does this matter? “Ranking zero” with an answer box can drive more traffic to your site than a No. 1 ranking.
Here’s how Google describes the process of displaying an answer box, straight from the patent: “When the search system receives a query having elements that are characteristic of an answer-seeking query, the search system can identify a corresponding answer that has characteristic elements of an answer to an answer-seeking query. The search system can then generate a presentation that prominently displays an answer to the answer-seeking query.”
Answer boxes are sometimes referred to as featured snippets, direct answers, and zero rankings, among other terms.
Answer Box Scores
The patent outlines the process of generating the answer box, and includes this step:
“(Compute) a respective score for each of one or more passages of text occurring in each document identified by the search results, wherein the score for each passage of text is based on how many of the one or more answer types match the passage of text.”
Earlier in the patent, an answer type was defined as “a group of answer elements that collectively represent the characteristics of a proper answer to an answer-seeking query.”
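To make the scoring step concrete, here is a minimal Python sketch of counting how many answer types match a passage. The patent does not spell out how a match is decided, so the `AnswerType` class, the rule that every element must appear as a word in the passage, and the sample answer types are all assumptions made for illustration, not Google's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerType:
    """Hypothetical stand-in for a group of answer elements."""
    name: str
    elements: frozenset  # lowercase terms that must all appear in a passage


def matches(answer_type, passage):
    """Assumed matching rule: every element appears as a word in the passage."""
    words = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return answer_type.elements <= words


def score_passage(passage, answer_types):
    """Score a passage by how many answer types match it."""
    return sum(1 for answer_type in answer_types if matches(answer_type, passage))


# Made-up answer types and passages, for illustration only.
answer_types = [
    AnswerType("duration", frozenset({"minutes"})),
    AnswerType("temperature", frozenset({"degrees"})),
    AnswerType("instruction", frozenset({"bake", "until"})),
]
passage_a = "Bake at 400 degrees for 45 minutes until tender."
passage_b = "Potatoes are a popular side dish."

print(score_passage(passage_a, answer_types))
print(score_passage(passage_b, answer_types))
```

In this toy example the first passage matches all three hypothetical answer types and scores 3, while the second matches none and scores 0.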
What does this mean for digital marketers optimizing content for answer boxes? The patent raises the question: does content with more answer types win the answer box?
For example, if Page A has a chart, two text paragraphs, an image and a bullet list that all qualify as answer elements, does that page receive a higher score than Page B, which has text paragraphs alone? Even if the text of Page B scores higher than the text paragraphs on Page A?
In other words, does a page with more answer types fare better than a page with equal or greater relevance of content and a single answer type? Based on the patent alone, this seems to be a logical conclusion.
If so, optimizing for answer boxes means publishing content that covers multiple answer types for the targeted question.
The patent goes on to state that a passage's score must meet a threshold to be considered for inclusion in an answer box. Even if your content is the best of the candidate answers, it still must reach a certain level of quality to be considered.
Search Queries Will Be Identified as Answer-Seeking without Use of Question Words
No need for the searcher to include “who,” “what,” “when,” “where,” “how” or “why” in a query to trigger an answer box. Google wants to identify queries as “answer-seeking” regardless of inclusion of question words. Here’s what Google had to say, straight from the patent (emphasis ours):
“A search system may consider a query to be an answer-seeking query because its terms match a predetermined question type. However, the query need not be expressed in the form of a question, and the query need not include a question word, e.g., ‘how,’ ‘why,’ etc.”
Google provides this example of how it should work:
Figure 1 from Google patent application “Generating Elements of Answer-Seeking Queries and Elements of Answers.” This figure shows how queries need not include question words to be classified as “answer-seeking.” As the patent states, “In this example, the search system provides the answer box in response to the query even though the query is not phrased as a question and even though the query does not include a question word.”
Google also notes: “In this example the answer box is identified as a good answer to the query even though the answer does not include the term ‘cooking,’ which occurred in the query and even though the answer does not occur in a document referenced by a highest-ranked search result. Rather, the answer in the answer box is identified as a good answer because the search system has determined that the question type matching the query is often associated with an answer type that matches text of the document referenced by the search result.”
That’s another key insight: Your content does not have to rank No. 1 to earn the answer box. That’s something answer box researchers already knew, but it’s always good to have statements directly from Google that support the current understanding of how the search engine works.
While you do not have to rank No. 1, you do need to rank in the top ten results. Our own research at Bruce Clay, Inc., as well as research by other SEO agencies, points to the fact that you must be in the top ten if you want a chance to rank zero.
The trigger isn’t that you’re using a question word, but rather that you’re searching for an answer-seeking entity. After this article was published, Bill Slawski pointed out the relationship between answer boxes and the Google Knowledge Graph:
“The patent doesn’t specifically state that it is connected to Google’s Knowledge Graph, but it does say that it might look for information about entities when gathering answers for answer boxes. There’s a good chance that Google will look at its Knowledge Graph for entities that it might collect answers about when it is performing the process described in this patent.”
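The idea that a query can be answer-seeking without a question word can be sketched in a few lines of Python. The patent says a query qualifies when its terms match a predetermined question type, and that a question type is often associated with an answer type. Everything else here is an assumption for illustration: the `QUESTION_TYPES` and `ASSOCIATED_ANSWER_TYPES` tables, their contents, the all-terms-present matching rule and the sample queries are hypothetical.

```python
import re

# Hypothetical question types: a query matches when it contains every term.
QUESTION_TYPES = {
    "cooking_duration": frozenset({"cooking", "time"}),
    "entity_height": frozenset({"height"}),
}

# Hypothetical association between question types and answer type names.
ASSOCIATED_ANSWER_TYPES = {
    "cooking_duration": ["duration"],
    "entity_height": ["measurement"],
}


def matching_question_types(query):
    """Return the names of question types whose terms all occur in the query."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [name for name, required in QUESTION_TYPES.items() if required <= terms]


def is_answer_seeking(query):
    """A query is treated as answer-seeking if any question type matches.

    Question words such as "how" or "why" are never checked.
    """
    return bool(matching_question_types(query))


for query in ["cooking time baked potato", "eiffel tower height", "bruce clay blog"]:
    types = matching_question_types(query)
    expected = [a for t in types for a in ASSOCIATED_ANSWER_TYPES[t]]
    print(query, is_answer_seeking(query), expected)
```

None of the sample queries contains a question word, yet the first two match a question type and would be treated as answer-seeking, while the third would not.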
Could Answer Boxes Be Compiled from Multiple Sources?
The patent explains that after “determining that the one or more passages of text have respective scores that satisfy a threshold” the search engine will select “one or more passages of text having respective scores that satisfy the threshold for inclusion in the presentation.”
Let’s say Page A has the highest scoring paragraph for a query. Page B has the highest scoring image that answers that same query. Page C has the highest scoring table, and Page D has the highest scoring video. Is there any reason that, in the future, answer boxes could not be compiled from multiple web pages? In reading the patent, we don’t see one; Google’s statement that “one or more passages of text” will be included does not describe those passages as being on the same page.
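The threshold step quoted above can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `ScoredPassage` tuple, the page names, the scores and the threshold value are made up, and "satisfy a threshold" is read as "greater than or equal to," which the patent language does not confirm.

```python
from typing import NamedTuple


class ScoredPassage(NamedTuple):
    """Hypothetical record: a passage of text, its source page and its score."""
    page: str
    text: str
    score: int


def select_passages(candidates, threshold):
    """Keep passages whose score satisfies the threshold, highest score first."""
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Made-up candidates drawn from three different pages.
candidates = [
    ScoredPassage("Page B", "Wrap each potato in foil before baking.", 2),
    ScoredPassage("Page A", "Bake at 400 degrees for 45 minutes.", 3),
    ScoredPassage("Page C", "Potatoes are a popular side dish.", 1),
]

selected = select_passages(candidates, threshold=2)
print([p.page for p in selected])

# If no passage satisfies the threshold, nothing is selected.
print(select_passages(candidates, threshold=5))
```

Nothing in this filter requires the selected passages to share a page, which is the point raised above: here passages from Page A and Page B both qualify. With a threshold that no passage meets, the result is empty and, under this reading, no answer box would be generated.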
What Next Steps Should SEOs Take When Optimizing for Answer Boxes?
Given that the document is a patent application, we can’t take the statements in it to be fact. There is a good chance, of course, that answer box scores will become a reality. In fact, they could already be a reality or in beta at this moment. This patent application, nonetheless, provides valuable insight into how Google is thinking about answer boxes and what answer boxes might look like in the future.
As digital marketers, we always seek to stay several steps ahead, anticipating the coming algorithm and search feature changes so that we’re prepared when they happen. Patent applications take us behind the Google curtain and can help us understand what’s coming down the pike.
Want more ways to stay ahead of the search curve? If you’re obsessed with winning at web traffic, you don’t want to miss Bruce Clay’s Advanced SEO Workshop at Search Marketing Expo (SMX) West in San Jose on March 20.
This full day of master SEO training will equip you with cutting-edge SEO techniques. Bruce will tackle answer boxes, RankBrain, voice search, AMP and more. Learn how to help raise your rankings and visibility in search engines. Save 10% with our exclusive discount code: BRUCECLAYSMXW17.
2 Replies to “Patent Reveals Insight into How Google Generates Answer Boxes via Content Scores”
Very interesting blog. Lots of useful insights. The way these answer boxes are fetched remains mystical, and even the most popular SEO blogs can’t find enough hacks to bring out the best practices to achieve them. I am quite impressed with your pick on multiple content types on a single page to bring up the higher score and the lack of importance of including similar questions in the page. It will be a great amusement to see how Google displays each content type from multiple pages if that is really going to happen.
Thanks, Bharathi! Anything we can use to “demystify” the process of optimizing for answer boxes is something I want to explore. I really hope this helps digital marketers on their quest to rank zero :)