Siri and Alexa Are Hindering the Growth of Voice Assistants. But How?
Measured by monthly active users, the main use case for voice user interfaces is voice assistants such as Alexa, Google Assistant, and Siri. These assistants help users complete simple tasks such as setting alarms, playing media, or answering questions, hands-free and through the most intuitive user interface of all: voice.
Lack of Feedback
But after years of hype, few people use voice assistants for anything more complex. This is quite different from human-to-human conversation, where the parties use nonverbal cues such as gestures and facial expressions to signal that they understand or do not understand. Sometimes these signals are verbal, like a short "aha" or "eh?"
Voice assistants have none of these cues, so the interaction model is to give voice input and then wait, hoping the assistant responds as expected.
If the assistant does not understand correctly, the user has to start the experience over. Let's take a look at the flight booking task in the assistant paradigm again. The user says, "I want two tickets from Berlin to New York in business class." The assistant answers, "There are two flights available from Beirut to New York. Option one departs at 7.52 a.m. and option two at 2.33 p.m. Which one would you like to book?" and waits for user input. Because of the mishearing, the user has to reset the conversation and start from the beginning. And most critically, the assistant has already wasted several seconds speaking incorrect and irrelevant information.
In a human-to-human conversation, the salesperson would reply with something like "So, Beirut to New York, let's see," the customer could immediately correct them with "Sorry, I mean from Berlin to New York," and the conversation would continue naturally.
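This repair-as-you-go behavior can be sketched in code. Below is a minimal, hypothetical slot filler (the names `CITIES`, `find_cities`, and `update_booking` are illustrative, not any real assistant's API): because the booking slots persist across utterances, a correction simply overwrites the misheard value instead of restarting the whole dialog.

```python
CITIES = ["berlin", "beirut", "new york"]

def find_cities(text):
    """Return (position, city) pairs for known cities, in order of mention."""
    low = text.lower()
    hits = [(low.find(c), c) for c in CITIES if c in low]
    return sorted(hits)

def update_booking(slots, utterance):
    """Apply one utterance to the booking slots in place.

    The word right before a city decides which slot it fills ("from X",
    "to Y"), so a correction like "Sorry, I mean from Berlin ..." just
    overwrites the bad value; no restart needed.
    """
    low = utterance.lower()
    for pos, city in find_cities(low):
        before = low[:pos].rstrip()
        if before.endswith("from"):
            slots["origin"] = city
        elif before.endswith("to"):
            slots["destination"] = city
    return slots

slots = {}
update_booking(slots, "I want two tickets from Beirut to New York")  # misheard
update_booking(slots, "Sorry, I mean from Berlin to New York")       # repaired
# slots is now {"origin": "berlin", "destination": "new york"}
```

The design choice that matters is the persistent `slots` dictionary: the conversation's state survives a mishearing, so the cost of a recognition error is one short correction, not a full restart.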
The big difference between the two experiences is the feedback loop, and its absence is how assistants like Siri and Alexa are hindering the growth of voice interfaces as a whole. Human conversation has a fast, natural feedback loop that makes it easy to correct mishearings and misunderstandings. How could this be replicated in a voice user interface?
A Better Alternative
The key to improving the user experience of a voice user interface is to remove the natural language response and replace it with real-time visual feedback. When a user gives input to a computer by touch, mouse, or keyboard, they see in real time how their input affects the graphical user interface. The same should happen with voice input.
This real-time visual feedback lets users make more complex utterances and naturally correct themselves when something goes wrong. Voice platforms compete on the accuracy of speech recognition, but the key to a good user experience is not perfect accuracy; it is ease of modification. Think about the keyboard: we make typos all the time, yet nobody concludes that keyboard technology isn't mature enough to use, because fixing a typo is effortless.
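As a sketch of what this could look like, here is a simulated stream of interim recognition hypotheses (most streaming speech recognition APIs emit such partial results; the `render` function is a hypothetical stand-in for redrawing a text field on screen). Each partial result replaces the previous display immediately, so a mishearing is visible, and correctable, while the user is still speaking:

```python
def render(hypothesis):
    """Hypothetical stand-in for redrawing a GUI text field."""
    return f"[screen] {hypothesis}"

# Partial results as a streaming recognizer might emit them. Each one
# redraws the display instead of being spoken back after the utterance ends.
interim_results = [
    "two tickets",
    "two tickets from Beirut",             # mishearing shows up instantly
    "two tickets from Berlin",             # user repeats the word, display fixes
    "two tickets from Berlin to New York",
]

for hypothesis in interim_results:
    print(render(hypothesis))
```

The point is not the rendering itself but the loop: the user never waits through a spoken response to discover an error, which is exactly the fast feedback loop human conversation has.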
While voice user interfaces are dominated by voice assistants, it's time to abandon the conversational assistant paradigm and treat voice as another modality alongside touch and vision. Voice should be an addition that enhances the current user interface, not a replacement for it.
When voice is used as a supplemental modality, the graphical user interface gives people clues about what they can do, which eliminates the discoverability problem of voice assistant skills. The graphical user interface responds to user input in real time, fixing the feedback gap of voice assistants. This enables more complex user tasks and can finally turn voice into what it can be: the most natural and efficient input modality.