From the pillars of advanced machine learning and GenAI to the best approach to training search engines, Dmitry Masyuk, Director of the Search and Advertising Technologies Business Group at Yandex, outlines the challenges facing the industry and explains how Yandex is empowering its users by combining large language models with its search engine.
Integrating AI into search engines has marked a significant shift in the industry. How is Yandex tailoring its AI strategies to meet the unique demands of web users?
Around a year and a half ago, when the first widely available language models such as ChatGPT were released, there was a misconception that they might challenge major search engines. However, this was a myth. Whatever their type, language models and neural networks on their own lack specific knowledge about the world. While they can speak impressively on general topics, they falter when asked about specific details or recent events.
Seeing this, we recognised that we were uniquely positioned. After all, we have a search engine that knows almost every detail about the world, since it continuously takes in new information from the Internet. Thanks to our advanced search engine, we have real-time access to the most recently uploaded online information.
At first, we were unsure about comparing ourselves with OpenAI. However, we are now confident that we can develop similarly sophisticated language models and Generative AI.
Our recent release, Neuro, is a unique hybrid product that combines our proprietary language model, YandexGPT, with our own search engine, and it is free for our approximately 100 million monthly users. Currently available in Russian, Neuro lets users ask questions in natural language and receive detailed, up-to-date responses. If the information is available online, Neuro provides a single, summarised answer that cites all of its sources.
This is the biggest search-related update that Yandex has rolled out in the last 20 years. The launch of Neuro is a significant milestone for us and for the rest of the world.
What challenges does Yandex face when ensuring the accuracy and reliability of AI-generated information?
The fundamental challenge here is clear. In response to a user query, a traditional search engine provides 10 relevant sources on the first page. In this case, the search engine is not really responsible for the content of the sources it provides. However, it’s a whole other story when you’re giving a direct answer to a user’s question. This is why we’re seeing many emerging AI products avoid answering sensitive questions.
Our aim is to maintain the universality of search products while ensuring accuracy, reliability and ethics in our responses. From the outset, our product development has focused on these principles. Technically, this involves two components.
Firstly, our search engine, drawing on 27 years of experience, filters out low-quality resources. In hybrid search, the search engine supplies the source material, which the LLM then summarises into a single answer.
Secondly, with the help of a team of specialised editors, we train our model to provide balanced, ethical and accurate responses. During the training process, multiple individuals review the material to ensure alignment and balance. In addition, Neuro's answers are based on information found online, and every answer includes links to the sources used.
In summary, the first component involves providing relevant and high-quality sources, a common practice in any search engine. The second component utilises a team of cross-checking editors to train the model to provide accurate and balanced answers.
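The two components described above follow the general retrieve-then-summarise pattern often called retrieval-augmented generation. The sketch below is a minimal, hypothetical illustration of that general pattern, not Yandex's implementation: `Source`, its `quality` score, `filter_sources`, `build_grounded_prompt` and `cited_urls` are invented names, the quality scores are assumed to come from the search engine, and the call to the language model itself is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    text: str
    quality: float  # hypothetical quality score in [0, 1], assumed to come from the search engine


def filter_sources(sources, min_quality=0.5, top_k=10):
    """Drop low-quality sources and keep the best top_k, highest quality first."""
    kept = [s for s in sources if s.quality >= min_quality]
    kept.sort(key=lambda s: s.quality, reverse=True)
    return kept[:top_k]


def build_grounded_prompt(question, sources):
    """Number each source so the model can cite it as [1], [2], ..."""
    blocks = [f"[{i}] {s.url}\n{s.text}" for i, s in enumerate(sources, start=1)]
    return (
        "Answer the question using only the sources below, citing them as [n].\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )


def cited_urls(answer, sources):
    """Map [n] markers in the model's answer back to source URLs, ignoring invalid ones."""
    urls = []
    for marker in re.findall(r"\[(\d+)\]", answer):
        idx = int(marker) - 1
        if 0 <= idx < len(sources) and sources[idx].url not in urls:
            urls.append(sources[idx].url)
    return urls


# Usage with made-up search results; the answer string stands in for LLM output.
results = [
    Source("https://a.example", "Fact A.", 0.9),
    Source("https://b.example", "Spam.", 0.2),
    Source("https://c.example", "Fact C.", 0.7),
]
top = filter_sources(results)
prompt = build_grounded_prompt("What is A?", top)
print(cited_urls("A is true [1], see also [2].", top))
# prints ['https://a.example', 'https://c.example']
```

Keeping the citation mapping outside the model means every link shown to the user can be traced back to a source the search engine actually returned.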
How does Yandex differentiate itself in a competitive market?
We are something of a technological miracle: we are managing to compete strongly against Google.
Approximately 65% of Internet users in Russia choose Yandex as their search engine, a well-known and publicly available figure. This shows our ability to compete with global companies.
Paradoxically, our size is an advantage. Being significantly smaller than other global giants allows us to be more dynamic and agile. Over the past five years, we've nearly doubled our staff, growing to 27,000 employees. Even so, we've managed to keep a startup mentality, making quick decisions and staying nimble in a rapidly changing market.
So, how are we differentiating ourselves from other global players? First of all, especially when entering new markets, we place a huge focus on product localisation. The first thing we did when we started actively developing in the CEE region was a massive push to raise the quality of Yandex Search in the Kazakh language. I'm not saying that other global companies don't care about localisation, but for us it is the number one priority.
We make sure that our AI and search solutions perform better in local languages, including but not limited to Russian, and we regularly compare thousands of cases to verify this.
Another crucial aspect is our talent pool. Russian engineers consistently excel in programming competitions such as the ICPC, where Russian teams have won 14 of the last 20 years, showcasing our country's remarkable engineering talent. Obviously, we couldn't build any of our cutting-edge technologies without hundreds of talented professionals, and we're constantly hiring new ones. We now have 1.5 times as many ML engineers as we did before 2023.
What future developments can we expect from Yandex in the AI and search engine fields?
We are at the beginning of a new era: a technological revolution that could last five to 10 years. Just as smartphones and the Internet transformed our lives, AI has the potential to do the same, if not more. We're only one and a half years into this AI renaissance, but the potential of GenAI is already staggering.
Regarding Yandex's plans, I'm personally inspired by OpenAI's advancements, and it is important for us to meet the new AI standard it has set. Our general strategy is to ensure that our fundamental AI technologies are on par with, or even better than, those of global companies in specific domains.
But there is a challenge that's being widely discussed within the professional community: monetisation. The question is, how do you make a great product, which is always the first thing on your mind, and also turn it into a profitable business? There are traditional monetisation approaches, such as advertising, and many companies rely on subscriptions. But we aim to distribute our general-purpose products, like Neuro, for free. We might adopt a monetisation approach at some point, but it's too early to say.
Speaking of advertising, we've been quite successful in ad tech, an area where we have years of experience and strong expertise. To put things into perspective, Yandex Direct, our platform for placing contextual and banner ads, has more than 400,000 advertisers who place an average of 4.5 billion ads a day. Around 25 different neural networks are involved in delivering ad impressions to users, and our entire Yandex Advertising Network has more than 55,000 partner platforms.
Beyond that, Yandex has quite a few plans for the international market. I believe this year will be marked by a significant expansion into the non-Russian-speaking world.
In essence, our focus is on improving fundamental technologies, developing sustainable business models, evolving our products and expanding internationally. We are excited about the journey ahead.
What is the potential of AI and what global trends are you seeing that could be key to its advancement?
AI is fascinating because of its potential to revolutionise technology. AI essentially makes intelligence cheaper and faster, much as the Internet made information more accessible.
The efficiency gains from AI are substantial. Over the next five to seven years, we can expect a 3% to 5% increase in daily productivity for the average Internet user. The gains will be particularly pronounced in fields such as software engineering, customer support and the legal profession, where AI can streamline tasks by up to 10%.
Contrary to concerns about job losses, AI is expected to enhance productivity and create new jobs and businesses. Software engineers, for instance, will receive substantial support from AI systems, enabling them to work more efficiently. AI solutions will also extend beyond B2C applications. We already offer access to our models via an API, allowing companies to integrate our AI seamlessly into their systems.
Ultimately, AI will democratise access to information and services, making them more efficient and accessible globally. Whether it's offering medical advice to remote communities or streamlining business processes, AI will transform industries and improve lives.
What are the latest developments in voice tech in regard to search and AI?
Voice technology is a fascinating area that is often overlooked in discussions. It holds immense promise, particularly in real-time translation, which is not yet fully developed, although the market is making significant progress. I'm confident that within a few years there will be technologies capable of providing real-time translation for conversation.
We are already working in this field and have successfully implemented real-time, AI-powered video translation in Yandex Browser.
We have made significant progress in voice recognition and speech synthesis. The quality of real-time voice-to-text conversion on mobile devices is impressive, thanks to advanced AI and machine learning systems.
While voice recognition and speech synthesis have come a long way, there is still much to achieve, particularly in understanding emotions. Smart assistants still cannot convey empathy effectively or read emotional content aloud with appropriate intonation.
One of our goals in the Machine Learning department is to improve the emotional recognition capabilities of our voice assistant, Alice. (Its technology was also used to create Yasmina, a bilingual AI assistant that speaks both Arabic and English.) By the end of the year, we aim to improve its ability to interpret emotional overtones and make its responses more relatable. It is already remarkably human-like, capable of making jokes and offering an engaging conversational experience. It is simply enjoyable to interact with; however, we still need to improve its emotional understanding.
Broadly, the trend is towards humanising technology. We've already introduced features like whispering, which allows the virtual assistant to respond softly when spoken to in a whisper, and to speak louder or quieter depending on how far away the speaker is.
The field of voice tech is clearly shifting towards a more human-centred approach. It is not only about intonation, but also about other aspects of human interaction, some of which can be very subtle. There is still a gap between technology and natural human interaction that needs to be bridged. Despite this, I believe that the voice tech domain will be fully explored within the next three to five years.