Google Search’s Internal Documentation Leak Points to New SEO Insights

Recently, the search engine optimization (SEO) community has been shaken up by the news of a significant leak of Google Search’s internal documents. In an effort led by industry experts Rand Fishkin and Michael King, SEOs have been analyzing these documents, which offer unprecedented insights into how Google Search might operate. Some of these revelations even challenge long-held assumptions based on public statements made by Google representatives over the years. 

While there is no way to know if the systems described are actually part of Google’s ranking algorithm, this glimpse into Google’s “black box” offers search professionals an incredible amount of information, allowing them to explore some of the intricacies of Google’s ranking algorithms. 

What Was Revealed about Google’s Ranking Algorithm 

The leak, which was shared by an anonymous source and has since been confirmed as real by Google, originates from Google’s internal Content API Warehouse. Thousands of documents detailing Google’s purported ranking features and signals were released, offering a comprehensive look at how Google Search might function behind the scenes. 

This leak has serious potential to impact the SEO landscape. Many SEO professionals are undoubtedly looking for information that might help bolster their strategies, even though the use of any of these systems in the ranking algorithm is still, and will likely remain, unconfirmed. 

While the data provided is dense, there are some key points from the leaked documents that Fishkin and King have shed some light on. These include:

  • Ranking Features and Attributes: The documentation lists 2,596 modules with 14,014 attributes. However, it does not specify the weighting of these features.
  • Twiddlers: These are described as functions that can adjust the “information retrieval score” or ranking of a given document.
  • Demotions: There are various reasons that can lead to content demotion, such as mismatched links, SERP signals indicating user dissatisfaction, and product reviews.
  • Linking: Links continue to be a ranking factor, with diversity and freshness of the links mattering, and relevance being key. But anchor text links “don’t seem to be as crucial or omnipresent as I’d have expected from my earlier years in SEO,” according to Fishkin. 
  • Successful Clicks: Metrics like badClicks, goodClicks, lastLongestClicks, and unsquashedClicks may be used to measure user interactions. Longer documents may be truncated, while shorter content is scored based on originality. Your Money or Your Life (YMYL) continues to play a role here as well.
  • Brand Importance: Establishing a strong, well-recognized brand outside of Google Search appears to be critical for SEO success. Fishkin emphasizes that brand presence and recognition matter, so SEOs should work on building a popular brand in their space that earns positive reviews.
  • Authorship: Google stores author information. This indicates that authorship and entity recognition may play a role in ranking.
  • Considering Chrome Data: Data from the Chrome browser may be used in ranking considerations, suggesting that user behavior in the browser could influence search results.
  • Whitelists: Certain domains, especially those related to elections and COVID-19, can be whitelisted. This means that specific sites may be given preferential treatment in search results.
  • Dates and Freshness: Google may consider various date indicators, such as byline dates, URL dates, and on-page content dates, to determine content freshness, which it seems to prefer.
  • Even Page Titles and Font Sizes: Page titles and the average weighted font size of terms in documents may be ranking factors.
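
To make the twiddler idea from the list above more concrete, here is a minimal Python sketch of a re-ranking step that adjusts an initial score after retrieval. Everything in it is an assumption made for illustration: the Doc fields, the freshness_boost and stale_demotion functions, the multipliers, and the day thresholds are hypothetical and do not come from the leaked documentation, which does not specify weights or implementations.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Doc:
    url: str
    score: float            # stand-in for an "information retrieval score"
    days_since_update: int  # hypothetical freshness signal


# A twiddler here is any function that returns a multiplier for a document's score.
Twiddler = Callable[[Doc], float]


def freshness_boost(doc: Doc) -> float:
    # Hypothetical rule: small boost for content updated in the last 30 days.
    return 1.2 if doc.days_since_update <= 30 else 1.0


def stale_demotion(doc: Doc) -> float:
    # Hypothetical rule: demote content untouched for more than a year.
    return 0.5 if doc.days_since_update > 365 else 1.0


def rerank(docs: Iterable[Doc], twiddlers: Iterable[Twiddler]) -> List[Doc]:
    """Apply every twiddler to every document, then sort by adjusted score."""
    twiddlers = list(twiddlers)
    adjusted = []
    for doc in docs:
        score = doc.score
        for twiddle in twiddlers:
            score *= twiddle(doc)
        adjusted.append(replace(doc, score=score))
    return sorted(adjusted, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    results = [
        Doc("example.com/old-guide", 1.00, 400),
        Doc("example.com/new-post", 0.90, 10),
        Doc("example.com/evergreen", 0.95, 100),
    ]
    for doc in rerank(results, [freshness_boost, stale_demotion]):
        print(f"{doc.url}: {doc.score:.2f}")
```

In this toy example the recently updated page overtakes the page with the highest initial score, which is the general behavior the leaked documents attribute to twiddlers: the initial score is only a starting point that later functions can raise or lower.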

Understanding the Context of the Leaked Google Documents

While the leaked documents provide valuable insights, it’s crucial to understand the context in which this data exists. The documents are related to Google’s Document AI Warehouse, a cloud-based platform for storing, searching, organizing, and analyzing documents and their metadata. This platform is used for analyzing both structured and unstructured data, extracting AI-generated metadata, and assigning tags. This context is essential to avoid misinterpretations and overreactions. The documents offer a glimpse into Google’s complex algorithms but do not necessarily reflect the entire picture or the current state of Google’s ranking system. Google has urged caution, stating that the leaked information may be out-of-context, outdated, or incomplete. 

We know that won’t stop those looking to find an ace in the hole among the documents. Ongoing analysis of the documentation seems to focus on understanding the broader implications and verifying the authenticity and relevance of the documents. Posts on social media have debated whether the data was “leaked” or “discovered,” with some suggesting it was accidentally included in a code review and pushed live from Google’s internal code base. Google has since confirmed the authenticity of the documents, though it cautioned search experts against investing too heavily in anything found there.

New Possible Ranking Factors: The Implications for SEO

The appearance of metrics like siteAuthority and references to Chrome data in the documents is at odds with Google’s past claims that such signals are not ranking factors. Taking these possible ranking factors into account, including link diversity and user behavior, adds new considerations to the challenge of developing effective SEO campaigns. 

For SEO professionals, these leaks point to the importance of continuing to focus on fundamental principles: creating quality content and ensuring a positive user experience. According to King, successful SEO strategies should still be built on these values to help drive qualified traffic and earn diverse links. This aligns with the overall theme of the leaked documents, which also emphasize the importance of user interactions in ranking. The recently uncovered metrics, like goodClicks and lastLongestClicks, suggest that Google’s algorithms may prefer engaging content that meets and anticipates users’ needs. 

The newly emphasized role of brand and entity recognition helps to reinforce the need for a holistic approach to SEO. As we’ve known, building a strong brand presence, both online and offline, can enhance visibility and credibility in search results. The importance of authorship and the presence of whitelists for specific domains continue to highlight the need for transparency and authority in content creation (on our end and hopefully on Google’s part as well).

Moving Forward in a Brave New Search World

As the SEO community digests these insights, it is important to keep a realistic perspective about what comes next. The leaked documents provide valuable information, sure, but they’re just one piece of the puzzle. 

There will always be uncertainty about how the Google ranking algorithm works, but that does not stop us from building strategies around what we know is best for users. It will continue to be relevant (and necessary) to create nuanced yet helpful, human-first content that anticipates and aligns with the needs of searchers. Continuous learning, open-minded adaptation, and a healthy respect for best practices will help us navigate the ever-changing waters of Google search.
