As a proud comic nerd, I've always been fascinated by the heroics of Jarvis from the Iron Man franchise of the Marvel Universe. But what strikes me even more is that we are already halfway there, not only in reel life but in real life as well.
It feels like yesterday that we had our first chat with Siri on the iPhone 4S. Today, Google Assistant, Alexa, Cortana and hundreds of other such assistants are household names. It got me thinking: how did things evolve so quickly?
So my tech-savvy brain got curious and decided to decipher the magic that lets you chant a few mantras to get search results, play your choice of music, set an alarm, and so on. Without it, each of these would be a tedious process of grabbing a screen, typing a command or toggling through menus before the job got done.
And the answer I found was: Voice User Interface technology, or VUI.
What is Voice User Interface (VUI) technology?
The name is largely self-explanatory. A VUI is an interface that allows users to interact with a system using voice commands. Some of the most common household examples of products built on VUI technology are Google Assistant, Siri, and Alexa.
A VUI removes the need to look at or touch a screen in order to interact with a system.
Like any app built on an operating system, VUI operates on three layers.
All three layers need to work in sync for voice-based interactions to function smoothly, because each layer relies on the layer below it for support. Contrary to what most people think, the voice interface lives in the upper two layers, the app and the platform, which run in the cloud and not on the device.
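The layering described above can be sketched in a few lines of Python. This is a toy illustration, not how any real assistant is built: the class names (`Device`, `Platform`, `App`), the `Intent` type, and the keyword check are all hypothetical, and treating the device as the bottom layer is an assumption drawn from the text. Typed text stands in for recorded audio.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """What the platform believes the user wants (hypothetical type)."""
    name: str
    utterance: str


class Device:
    """Bottom layer (assumed): captures what the user says."""

    def capture(self, spoken: str) -> str:
        # A real device would record audio; here text stands in for it.
        return spoken


class Platform:
    """Middle layer, in the cloud: turns an utterance into an intent."""

    def understand(self, utterance: str) -> Intent:
        # Toy keyword check standing in for real speech and language models.
        name = "set_alarm" if "alarm" in utterance.lower() else "unknown"
        return Intent(name=name, utterance=utterance)


class App:
    """Top layer, in the cloud: acts on the intent and words the reply."""

    def fulfil(self, intent: Intent) -> str:
        if intent.name == "set_alarm":
            return "Okay, your alarm is set."
        return "Sorry, I didn't catch that."


def handle(spoken: str) -> str:
    """Pass one request up through the layers, each relying on the one below."""
    device, platform, app = Device(), Platform(), App()
    return app.fulfil(platform.understand(device.capture(spoken)))


print(handle("Set an alarm for 7 am"))
```

Notice that the device only captures input; deciding what the request means and how to answer it happens in the two upper layers.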
Why is VUI becoming more popular by the day?
- Moore's Law: Moore's Law observes that the number of transistors on a chip doubles roughly every two years, so we can expect our computers to keep getting faster and more capable while costing less. According to Moore, this growth is exponential. Extending the law to VUI, what started out as an experiment is now a household technology, and more and more people are accepting and adopting it.
- Gets things done faster: It takes an average person about 6.67 minutes to speak 1,000 words, while typing the same volume takes about 25 minutes, which is nearly four times as long. This shows that using voice as an interface can make interactions between a system and its user considerably faster.
- Expands the scope of technology: Long gone are the days when home assistants could only carry out measly operations such as 'sending someone a text' or 'reading out an email'. Home assistants can now do things that would once have seemed borderline otherworldly and unbelievable. Take, for example, Google Home, which allows its users to control over a thousand smart home devices such as kettles, microwave ovens, and thermostats.
- Finally some inclusion for the specially-abled: The rise of voice interface technology has also provided some respite for the specially-abled, a demographic that has, in a manner of speaking, always been technologically isolated.
Speaking also comes more naturally to humans than typing, so the number of queries a person poses per unit of time tends to be significantly higher.
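The speaking-versus-typing figures quoted earlier (1,000 words in 6.67 minutes spoken, 25 minutes typed) can be checked with a few lines of Python. The helper name `words_per_minute` is made up for this illustration; the numbers are the ones from the article.

```python
def words_per_minute(words: int, minutes: float) -> float:
    """Average rate for producing `words` words in `minutes` minutes."""
    return words / minutes


WORDS = 1000
SPEAKING_MINUTES = 6.67   # figure quoted in the article
TYPING_MINUTES = 25.0     # figure quoted in the article

speaking_wpm = words_per_minute(WORDS, SPEAKING_MINUTES)
typing_wpm = words_per_minute(WORDS, TYPING_MINUTES)

# How many times longer typing takes than speaking the same text.
speedup = TYPING_MINUTES / SPEAKING_MINUTES

print(f"Speaking: about {speaking_wpm:.0f} words per minute")
print(f"Typing:   about {typing_wpm:.0f} words per minute")
print(f"Typing takes about {speedup:.2f} times as long")
```

The ratio comes out just under four, which is where the "approximately four times" figure comes from.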
Apart from controlling stuff, virtual assistants can now book you an appointment, order food for you, and even drive your car. Cool, isn't it?
Many VUI-powered devices have a built-in translation tool that can automatically translate what is said in a foreign language into a language you understand, helping break down language barriers across borders.
VUI-enabled devices can read out emails and messages for the visually impaired, and they can convert audiobooks and voice messages into text for people with hearing impairment, who can then read them with ease.
GUI (graphical user interface), through no fault of its own, has always held back technological progress for the specially-abled, but when integrated with VUI, it could finally be the bridge between them and technical autonomy.
Why does VUI still have a long way to go?
Despite progressing by leaps and bounds in recent years, voice user interface technology is still in its infancy and faces its fair share of challenges. Here, I've listed some of them:
- Privacy and Security: The very fact that these AI-powered machines are always on the lookout for voice commands poses a big privacy concern for their users. These worries grew louder in the wake of the Facebook-Cambridge Analytica data scandal, alongside claims that devices like Google Home and Amazon Alexa were eavesdropping on their customers' private conversations.
- Language Support: Humans have long wanted to teach computers how to comprehend language, but have had a very hard time doing so. However, with recent advancements in Natural Language Processing (NLP), machines have begun to handle individual aspects of language such as entity recognition, classification, sentiment analysis, and question answering.
On the privacy front, legal authorities around the world have since taken cognizance of the issue, and data protection laws have seen massive overhauls. Afraid of major sanctions and the negative publicity that comes with data breach lawsuits, VUI service providers have been forced to give customers easier and more comprehensive access to their own data.
On the language front, however, handling these aspects one at a time sometimes doesn't suffice in a technology where language is the key to comprehending commands.
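To see why a single language task is rarely enough on its own, here is a toy Python sketch that combines two of them: classification (what does the user want?) and entity recognition (which details did they mention?). It is purely rule-based and hypothetical; real assistants use trained NLP models, and the intent names, keyword table, and function names below are invented for this illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical keyword table standing in for a trained intent classifier.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "set_alarm": ("alarm", "wake me"),
    "play_music": ("play",),
}

# Matches times such as "7 am" or "6:30 PM".
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", re.IGNORECASE)


def classify_intent(text: str) -> str:
    """Classification: decide what the user is asking for."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "unknown"


def extract_time(text: str) -> Optional[str]:
    """Entity recognition: pull out a clock time, if one is mentioned."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour = match.group(1)
    minute = match.group(2) or "00"
    meridiem = match.group(3).lower()
    return f"{hour}:{minute} {meridiem}"


def parse_command(text: str) -> Dict[str, Optional[str]]:
    """Combine both tasks; neither result is enough to act on alone."""
    return {"intent": classify_intent(text), "time": extract_time(text)}


print(parse_command("Set an alarm for 7 am"))
```

Classification alone says an alarm is wanted but not when, and entity recognition alone finds a time without knowing what to do with it. Only the combination gives the system something it can act on.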
Google researchers have been working on Bidirectional Encoder Representations from Transformers (or BERT), a model that, after fine-tuning, achieved state-of-the-art results on 11 common NLP tasks. But even this is in its infancy, and it will take years for BERT to match the littlest intricacies of real human speech. Hence, it will take a lot of time for voice user interface powered systems to fully evolve and comprehend commands in all their aspects, the way a human listener would.
What does the future hold for VUI?
There is a very famous saying, "If you can't beat the competition, partner with it." This holds true for VUI. The current form of voice user interface has a lot of shortcomings and inadequacies, because of which it cannot fulfil the daily requirements of users on its own.
The right way forward would be to forge a partnership with Graphical User Interface (GUI) so that they can both overcome the shortcomings of each other, and provide an amazing voice assistant experience.
Who knows, at some point in the future we might all have our own personal Jarvis, 'WOW'-ing us every single second of its existence!