State of Voice Search in 2020: All about Voice Search Technology
Understanding how voice search came into existence and the challenges it currently faces.
We live in an age of seismic technological shifts. From facial-recognition unlock to voice search to Tesla’s electric cars, the 21st century has kept us at the edge of revolutionary innovation. However, very few of these innovations continue to transform our lives as much as they did when they were first introduced. Voice search is one of them. From helping us “Call Mom” to answering queries like “What’s the current score in El Clasico?”, settling arguments about “Who is the greatest actor of our times?”, and delivering “Stock Market Alerts,” voice search has found its way into nearly every part of our daily lives.
When Apple’s Siri hit the market in 2011, it attracted impressive attention from tech enthusiasts, yet no one was certain that voice search would evolve into a necessity. Fast forward to today, and we have Google Assistant, Amazon Alexa, and many more. Things took a turn when smart speakers such as Google Home, Amazon Echo, and Apple HomePod went mainstream around 2017. All of this suggests that the technology is only beginning to show what it can do.
While voice recognition reached the mainstream in 2011 with the release of Siri on the iPhone 4s, the technology was first developed nearly 60 years before Apple’s brainchild. In 1952, Davis, Biddulph, and Balashek of Bell Laboratories built the Automatic Digit Recognition machine, nicknamed AUDREY. The machine occupied a six-foot-high relay rack and worked by recognizing phonemes, the fundamental units of speech sound. AUDREY could recognize a spoken digit (zero to nine) with more than 90% accuracy, at least when the digit was uttered by its developer, K. H. Davis.
Then came IBM’s ‘Shoebox’ machine, shown at the 1962 World’s Fair. It could understand 16 spoken English words. In 1972, DARPA, the research agency of the US Department of Defense, funded a five-year Speech Understanding Research program that aimed for a minimum vocabulary of 1,000 words. Several companies and academic institutions, including IBM, Carnegie Mellon University (CMU), and Stanford Research Institute, took part in the initiative, which led to the birth of ‘Harpy.’ One of Harpy’s distinguishing characteristics was that it could recognize entire sentences. With 1,011 words, it had roughly the vocabulary of an average three-year-old and recognized it with remarkable accuracy.
Decades later, when neural networks and machine learning algorithms became popular, Google released its Google Voice Search app for the iPhone in 2008, using cloud computing to process the audio the app received. In 2010, Google added “personalized recognition” to Voice Search on Android phones, and in mid-2011 it brought Voice Search to its Chrome browser. Then, in 2012, Google launched Google Now for its Android operating system. Meanwhile, Apple had introduced its own assistant, Siri, in 2011. Later, Microsoft and Amazon jumped on the bandwagon, unveiling Cortana and Alexa, respectively, in 2014.
How does it all work?
The voice search function is based on speech recognition technology. It starts when the device or browser captures the user’s voice through a microphone. The analog audio signal is converted to digital form and sent to speech recognition software. There it is broken down into phones (individual speech sounds) and phonemes (the sound units that distinguish one word from another in a language). For example, if we say the word “cat,” the software breaks it into three sounds, roughly ‘k,’ ‘a,’ and ‘t,’ for better recognition. The software then combines these units into candidate letters and words and uses the surrounding context to predict which words were most likely said. It converts the result to text, runs the search by connecting to external data sources such as search engines, and relays the results back to the user. This is, in brief, how voice search works.
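The steps above can be sketched as a toy pipeline. This is only an illustration under loud assumptions, not how any real assistant is implemented: the lexicon, the canned answers, and the function names `digitize`, `decode_words`, and `voice_search` are all hypothetical, and the hard part (an acoustic model that turns digital audio into phonemes) is skipped, so the phoneme sequence is passed in directly. The phoneme symbols follow the ARPAbet convention.

```python
import math

# Hypothetical toy lexicon: ARPAbet-style phoneme sequences -> words.
LEXICON = {
    ("K", "AE", "T"): "cat",
    ("D", "AO", "G"): "dog",
    ("S", "AW", "N", "D"): "sound",
}

# Hypothetical stand-in for an external data source such as a search engine.
ANSWERS = {
    "cat sound": "Cats meow.",
    "dog sound": "Dogs bark.",
}


def digitize(signal, num_samples, sample_rate_hz, bits=8):
    """Analog-to-digital step: sample a continuous signal and quantize it.

    `signal` is a function of time (seconds) returning a value in [-1, 1].
    Returns signed integers that fit in `bits` bits.
    """
    max_level = 2 ** (bits - 1) - 1
    return [round(signal(i / sample_rate_hz) * max_level)
            for i in range(num_samples)]


def decode_words(phonemes):
    """Turn a phoneme sequence into words by greedy longest match.

    Real recognizers score many candidate word sequences with statistical
    models; a dictionary lookup is used here only to show the idea.
    """
    longest = max(len(key) for key in LEXICON)
    words, i = [], 0
    while i < len(phonemes):
        for length in range(min(longest, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + length])
            if key in LEXICON:
                words.append(LEXICON[key])
                i += length
                break
        else:
            raise ValueError(f"no word matches the phonemes at index {i}")
    return words


def voice_search(phonemes):
    """Convert phonemes to a text query, look it up, and return both."""
    query = " ".join(decode_words(phonemes))
    return query, ANSWERS.get(query, "No result found.")


if __name__ == "__main__":
    # A 440 Hz tone stands in for microphone input; 80 samples at 8 kHz.
    samples = digitize(lambda t: math.sin(2 * math.pi * 440 * t), 80, 8000)
    print(len(samples), "samples captured")
    # Phonemes an acoustic model might output for the spoken query.
    print(voice_search(["K", "AE", "T", "S", "AW", "N", "D"]))
```

The sketch keeps the stages separate on purpose: digitizing, decoding, and searching are independent steps, which is why the search stage can be delegated to an external data source.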
Despite its surging popularity, several hurdles are preventing this technology from reaching its full potential. They range from the difficulty of optimizing content so that it appears in voice-activated results outside the SERPs (search engine results pages) to substantial errors in directory content that make sites hard to reach through voice search. Other concerns include privacy, limited language support, and regional accents. Google also does not currently allow filtering of voice searches. There are misunderstandings as well: when the software fails to detect what the user is actually saying, it can produce hilarious yet risky interpretations. Moreover, to date, most web browsers still lack built-in voice search functionality.
Another challenge that needs immediate attention is automatic speech recognition in noisy environments. This is known as the cocktail-party problem: unlike human ears, voice recognition software struggles to pick out a single speaker at a crowded, noisy party. Even with wake words such as “Ok Google,” “Alexa,” or “Hey Siri,” one still needs to speak clearly and loudly when making a request.
Voice search has come a long way since AUDREY, and 2020 promises new trends in this field. We can now have more streamlined conversations with voice assistants, while their makers are working to improve compatibility and integration with other devices. Voice search is also entering the e-commerce and retail industries to improve customer interaction and engagement and to offer personalized experiences. As voice search becomes more relevant with the rise of voice assistants such as Amazon Alexa and Google Assistant, their developers must address these challenges so that the technology reaches a broader market and finds wider application in the future.