Russian search engine ranking factors are revealed by a massive Yandex code leak

The foundations of many applications and services offered by Russian IT giant Yandex have been exposed in roughly 45GB of source code files that were reportedly stolen by a former employee. The leak also discloses the main ranking factors used by Yandex’s search engine, including some that are seldom discussed in public.

On January 25, a torrent file claiming to contain the “Yandex git sources” was uploaded. The data appear to have been captured in July 2022 and to date back to February 2022. A software engineer named Arseniy Shestakov says he has confirmed with current and former Yandex workers that some archives “definitely include recent source code for corporate services.” Speaking to security site BleepingComputer, a Yandex representative said emphatically that “Yandex was not hacked” and that the leak originated with a departing employee. After investigating, Yandex found “no danger to user data or platform performance.”

The documents notably date from 2022, the year Russia launched its full-scale invasion of Ukraine. According to BleepingComputer, a former executive at Yandex described the leak as “political,” adding that the ex-employee had not attempted to sell the code to Yandex rivals. The anti-spam code was also not included in the leak.

The release of 1,922 ranking criteria from Yandex’s search algorithm has caused quite a stir, although it is unclear whether the leak has any security or structural repercussions. Martin MacDonald, an SEO specialist, said on Twitter that the breach was “perhaps the most exciting thing to have occurred in SEO in years” (as noted by Search Engine Land). In a thread outlining some of the most notable criteria, Alex Buraks says “there is a lot of good information for Google SEO as well.”

Four former Google workers have reportedly found work at Yandex, the world’s fourth most popular search engine. Yandex closely follows many of the ranking parameters in Google’s code in order to better compete with Google. After being cut off from its bank accounts and payment systems, Google’s Russian subsidiary has filed for bankruptcy. According to Buraks, the first entry in Yandex’s list of ranking factors, “PAGE RANK,” appears to be tied to the original algorithm developed by Google’s founders.

Buraks explains (in two posts) that the Yandex search engine appears to favour pages that:

  • are not too old
  • gain a large share of their traffic from unpaid sources (unique visitors) and less through paid search engine listings
  • have a shorter URL with fewer digits and fewer slashes
  • use PR=0 and optimised code in place of “hard pessimization”
  • are housed on dependable servers
  • are either articles on Wikipedia or linked from Wikipedia
  • are accessible through a domain’s top-level pages
  • have relevant terms included in the URL (up to three)
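To make the URL-related items above concrete, here is a minimal Python sketch of how such signals could be measured for a given address. It is only an illustration: the function name `url_features`, the returned field names, and the decision to cap keyword matches at three are assumptions drawn from the list above, not anything taken from the leaked Yandex code, and the article does not say how the real engine weighs these signals.

```python
from urllib.parse import urlparse


def url_features(url, keywords):
    """Measure simple URL-shape signals (hypothetical helper, not Yandex code)."""
    parsed = urlparse(url)
    path = parsed.path
    # Search for keywords in both the host name and the path, case-insensitively.
    haystack = (parsed.netloc + path).lower()

    digits = sum(ch.isdigit() for ch in path)   # number of digits in the path
    slashes = path.count("/")                   # rough measure of path depth
    matched = [kw for kw in keywords if kw.lower() in haystack]

    return {
        "length": len(url),
        "digits": digits,
        "slashes": slashes,
        # Assumption: matches beyond three add nothing, mirroring "up to three".
        "keyword_hits": min(len(matched), 3),
    }


if __name__ == "__main__":
    terms = ["yandex", "ranking", "factors", "leak"]
    print(url_features("https://example.com/yandex-ranking-factors-leak", terms))
    print(url_features("https://example.com/blog/2022/07/15/post-48213", terms))
```

Under these assumptions, the first example URL would look better on every count than the second: it is shallower, contains no digits, and carries relevant terms.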

Rob Ousbey has built a search tool that lets you search and browse all of the criteria. More than 200 ranking criteria are labelled “TG UNUSED” and almost a thousand “TG DEPRECATED.” Because the leaked code appears to date from 2022, Yandex’s search algorithm has very probably been updated since then. Even so, the breach offers a rare glimpse into the inner workings of a search engine that serves the biggest country in the world.

This leak marks the second time that Yandex has seen its search engine code leave the building. In 2015, a former employee attempted to sell Yandex’s search engine code on the dark web for $28,000 to finance his own business. The shockingly low price he asked for the core code made it clear he was unaware of the true worth of Yandex’s flagship product. The worker received a two-year jail term that was later suspended, and the code was never made public.