Bill Slawski has been performing deep research on Google since before I optimized my first title tag. As Director of SEO Research at Go Fish Digital, Bill has been influencing others with his blog SEO by the Sea since 2005. His research on Google patents and whitepapers attempts to help the community gain a closer understanding of Google’s best-kept secrets: its algorithms. While Bill admits to still knowing very little about the actual workings of Google’s recipe (as does everyone else who has never worked directly on the algorithms), he has found great value in studying some of the patents Google uses to protect its ideas. Such patents can be indicative of what is, or what may eventually become, part of Google’s algorithmic mystery box.
*Disclaimer: Although the following information indicates possible incorporation of certain patents into Google algorithms, it does not guarantee whether, or to what degree, these patents are actually being used in Google’s algorithms.
Phrase-based indexing
Bill began his presentation with a story about Anna Patterson.
Anna, a former Director of Engineering at Google, is listed as the inventor of several patents owned by Google. Perhaps the most notable of these are her patents on “phrase-based indexing.”
Anna, Bill says, was hired as Google’s Director of Engineering following a paper she wrote in 2004 called “Why Writing Your Own Search Engine is Hard.” She then stayed with Google until 2007, when she left to start Cuil, a search engine of her own making.
It failed.
Two weeks after it failed, Google brought Anna back as VP of Engineering where she began filing patents on phrase-based indexing.
Moral of the story?
Google must really like Anna’s phrase-based indexing inventions.
What is phrase-based indexing?
I’m still learning more about phrase-based indexing myself, so I’ll do my best to describe the general premise as I understand it. This patent’s official description is:
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are the indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule based on the phrases is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
In simpler terms, I would describe phrase-based indexing as a method for search engines to understand commonly used phrases that relate to keywords and queries, then apply this information to improve search query results. I view this as an indexing system that goes beyond keywords, synonyms, and close variants to understand the relevancy of a given page to a user’s search query. For a better explanation, see Bill’s post here.
Bill explained that a phrase-based indexing system looks for phrases that are “complete meaning phrases that predict.” He used the following examples of complete meaning and predictive phrases:
• Incomplete: “President of the”
• Not Meaningful: “Top of the Morning”
• Predictive (Whitehouse): “Oval Office,” “Rose Garden,” “Secretary of State,” “President of the United States”
Optimize for a phrase-based index using top ranked pages
Once we know a bit about phrase-based indexing, how do we optimize for it?
Bill’s advice is a tried and true method: study the top ranked pages for your keywords, look for co-occurring phrases on those pages, and think about how to incorporate them on your page.
So, if you were to study the top-ranking pages for a keyword like “Whitehouse” and found that a substantially high number of top-ranked pages also include “President of the United States” and “Oval Office,” you may be able to improve the relevancy of your own page by incorporating these predictive and complete meaning phrases into your optimizations.
This is a tactic I already use often in my page optimization projects, but it’s great to understand a little more about the history and context behind it.
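The research step Bill describes (collect the top ranked pages, then look for phrases they have in common) can be roughed out in a few lines of Python. This is only a sketch for doing the manual comparison faster, not a description of how Google identifies phrases. The function names, the 2 to 5 word phrase lengths, and the 60% cutoff are all assumptions made for illustration, and it expects you to supply the visible text of each page yourself.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the set of n-word phrases found in a block of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(pages, sizes=(2, 3, 4, 5), min_share=0.6):
    """List phrases that appear on at least `min_share` of the pages.

    `pages` is a list of plain-text strings, one per top ranked page.
    `sizes` and `min_share` are arbitrary illustrative defaults.
    """
    doc_freq = Counter()
    for text in pages:
        seen = set()
        for n in sizes:
            seen |= ngrams(text, n)
        # Count each phrase once per page, however often it repeats there.
        doc_freq.update(seen)
    threshold = min_share * len(pages)
    return sorted(
        (phrase for phrase, count in doc_freq.items() if count >= threshold),
        key=lambda phrase: (-doc_freq[phrase], phrase),
    )
```

Calling shared_phrases() with the text of the top ten results for your keyword returns the phrases most of them share. Expect plenty of noise such as “of the” in the output; deciding which phrases are complete and meaningful is still a human job.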
Annotation text – an updated anchor text patent
Like phrase-based indexing, this was another familiar concept to me. However, I learned a lot more about the origins and specifics of this concept through Bill’s research.
Here, Bill pointed to a patent labeled, “anchor tag indexing in a web crawler system.” This is a patent whereby search engines use what they call “annotation text” to understand and assign value to links. Annotation text is described in the patent as being “text within a predetermined distance of an outbound link to a target document.”
What this means for SEO is that annotation text (text surrounding a link) can be used in addition to the link’s actual anchor text to help the search engine understand the link’s relevancy and value.
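To see what the text “within a predetermined distance” of your own links looks like, a small script can pull out the words on either side of each anchor. The sketch below uses Python’s standard html.parser module. Measuring distance in words and using a window of five are assumptions; the patent language quoted above does not say how the distance is measured, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects page words in order and records where each link sits."""

    def __init__(self):
        super().__init__()
        self.words = []    # every text word, in document order
        self.links = []    # (href, start, end) indexes into self.words
        self._open = None  # (href, start) for the <a> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, len(self.words))

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, start = self._open
            self.links.append((href, start, len(self.words)))
            self._open = None

    def handle_data(self, data):
        # Note: this sketch does not skip <script> or <style> contents.
        self.words.extend(data.split())


def link_contexts(html, window=5):
    """Return anchor text plus `window` words before and after each link."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    results = []
    for href, start, end in parser.links:
        results.append({
            "href": href,
            "anchor": " ".join(parser.words[start:end]),
            "before": " ".join(parser.words[max(0, start - window):start]),
            "after": " ".join(parser.words[end:end + window]),
        })
    return results
```

Running link_contexts() over a page’s HTML gives a quick audit of whether the words around each outbound link actually describe the page being linked to.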
Semantic Topic Modeling
My takeaways on this portion of Bill’s presentation were much more brief. Rather than attempting to summarize this portion of Bill’s presentation, I’ll go ahead and link to a few resources below for further reading.
• Moz Whiteboard Friday – What SEOs Need to Know About Topic Modeling & Semantic Connectivity
• MarketMuse – Topic Modeling for SEO Explained
• Go Fish Digital – Semantic Topic Modeling for Search Queries at Google
Additional Takeaways
• Asked where SEOs can spend more time researching, Bill recommended spending more time on Google’s Developer pages.
• Bill mentioned that although Schema markup is not correlated with getting content to appear in featured snippets, his research indicates that this may be changing when it comes to answer-based information such as that found on FAQ pages. He says Google may eventually use Schema as a way of discovering this content, so it might be a good idea to mark up FAQ portions of a website.
• Mentions don’t count as links, but links count as mentions. When a website, organization, or brand appears on the web without the use of a link, this is considered a “mention.” Asked about how Google treats mentions, Bill said, “Mentions don’t count as links, but links count as mentions.” In other words, these mentions are still valuable, but they aren’t treated the same way a link to your site will be treated.
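For the FAQ takeaway above, one way to mark up question-and-answer content is schema.org’s FAQPage type expressed as JSON-LD. The helper below is a minimal sketch that builds that JSON from a list of question and answer pairs; the function name and sample content are hypothetical, and whether Google uses the markup in the way Bill anticipates is not something this example can confirm.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) tuples."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical sample content for illustration only.
print(faq_jsonld([
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]))
```

The printed JSON would go inside a script tag of type application/ld+json on the FAQ page.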