Google won’t comment on the potentially massive leak of search algorithm documentation

Google’s search algorithm is perhaps the most consequential system on the Internet, determining which sites live and die and what content on the Web looks like. But exactly how Google ranks websites has long been a mystery, pieced together by journalists, researchers, and people working in search engine optimization.

Now, an explosive leak that reportedly shows thousands of pages of internal documents appears to offer an unprecedented look under the hood of how Search works — and suggests that Google hasn’t been entirely truthful about it for years. Google has not yet responded to several requests for comment on the legitimacy of the documents.

Rand Fishkin, who has worked in SEO for more than a decade, says the source shared 2,500 pages of documents with him in the hope that reporting the leak would debunk the “lies” shared by Google employees about how the search algorithm works. The documents outline Google’s search API and break down what information is available to employees, according to Fishkin.

The details shared by Fishkin are dense and technical, likely more legible to developers and SEO experts than to laypeople. The leaked content also doesn’t necessarily prove that Google uses the specific data and signals it mentions for search rankings. Rather, the leak outlines what data Google collects from web pages, sites, and searchers, and offers SEO experts indirect hints about what Google appears to care about, SEO expert Mike King wrote in his review of the documents.

The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google singles out for sensitive topics like elections, how Google treats small websites, and more. According to Fishkin and King, some of the information in the documents appears to contradict public statements made by Google representatives.

“‘Lied’ is harsh, but it’s the only accurate word to use here,” King writes. “While I don’t necessarily blame Google’s public representatives for protecting their proprietary information, I do not agree with their efforts to actively discredit people in the worlds of marketing, technology, and journalism who have presented reproducible discoveries.”

Google did not respond to The Verge’s requests for comment regarding the documents, including a direct request to refute their legitimacy. Fishkin told The Verge in an email that the company has not disputed the veracity of the leak, but that an employee asked him to change some language in the post regarding how an event was characterized.

Google’s secret search algorithm has spawned an entire industry of marketers who closely follow Google’s public guidelines and implement them for millions of companies around the world. These pervasive, often annoying tactics have fed a general narrative that Google Search results are deteriorating, cluttered with junk that website operators feel obligated to produce in order to get their sites seen. In response to The Verge’s past reporting on SEO tactics, Google representatives have often fallen back on a familiar defense: that’s not what Google’s guidelines say.

But some details in the leaked documents call into question the accuracy of Google’s public statements about how Search works.

One example cited by Fishkin and King is whether Google Chrome data is used in ranking at all. Google representatives have repeatedly said the company does not use Chrome data to rank pages, but Chrome is specifically mentioned in sections about how websites appear in Search. In the screenshot below, which I captured as an example, the links shown under the main vogue.com URL may be generated in part using Chrome data, according to the documents.

Chrome is mentioned in the section on creating additional links.

Image: Google

Another question raised is what role, if any, EEAT plays in ranking. EEAT stands for Experience, Expertise, Authoritativeness, and Trustworthiness, a metric Google uses to rate the quality of results. Google representatives have previously stated that EEAT is not a ranking factor. Fishkin notes that he did not find much in the documents mentioning EEAT by name.

However, King detailed how Google appears to collect author data from a page and has a field for whether an entity on the page is the author. A portion of the documents shared by King states that the field was “mainly developed and tuned for news articles… but is also populated for other content (e.g. scientific articles).” While this doesn’t confirm that bylines are an explicit ranking metric, it does show that Google is at least tracking this attribute. Google representatives have previously insisted that author bylines are something website owners should add for readers, not for Google, because they don’t affect rankings.

While the documents aren’t exactly a smoking gun, they provide an in-depth, unfiltered look at the heavily guarded black box system. The US government’s antitrust lawsuit against Google — which revolves around Search — has also led to internal documentation becoming public, offering further insight into how the company’s flagship product works.

Google’s general secrecy about how Search works has led to websites looking alike as SEO marketers try to outsmart Google based on the hints the company offers. Fishkin also calls out publications that credulously repeat Google’s public claims as true without further analysis.

“Historically, some of the search industry’s loudest voices and most prolific publishers like to uncritically repeat Google’s public statements. They write headlines like ‘Google says XYZ is true’ rather than ‘Google claims XYZ; The evidence suggests otherwise,’” Fishkin writes. “Please do better. If this leak and the lawsuit with the Department of Justice can bring about only one change, I hope it’s that.”
