Google is said to have confirmed the authenticity of the thousands of internal documents that were reportedly leaked earlier in May. The data reportedly includes information about how Search works and about the user data Google collects for web page ranking. The company initially declined to comment on the leak but has now reportedly acknowledged it, while cautioning against “making inaccurate assumptions”.
Google confirms Search leak
In an email to The Verge, Google spokesperson Davis Thompson said, “We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information”. Thompson also claimed that Google is working to protect the integrity of the search results from manipulation, adding that the company has “shared extensive information about how Search works and the types of factors that our systems weigh.”
The issue reportedly came to light when search engine optimisation (SEO) experts Rand Fishkin and Mike King published analyses of 14,014 attributes from internal API documents that were leaked from inside Google’s Search division and shared with them by a source.
These documents are reportedly part of the “Content API Warehouse” that the company’s employees use as a repository. It is further reported that the documents’ code was uploaded to GitHub on March 27 and was not removed from the platform until May 7.
Contradictory information
In a blog post, Fishkin said that the information provided by the source contradicts many claims Google has made over the years. The contradictions concern, for example, whether clickthrough rate (CTR) is considered as a ranking signal and whether subdomains are treated as separate entities.
In another example of contradiction, the documents reportedly mention Chrome data in connection with ranking websites on Search. However, the tech giant has repeatedly said that it does not use Chrome data to rank web pages.
According to Fishkin, many of the claims in the documents also overlap with what Google revealed in its testimony during the US Department of Justice antitrust case, while others suggest insider knowledge. Although most of the information would be better understood by SEO professionals, Fishkin’s analysis reveals what data Google actually collects from searches, web pages, and sites.