By Richard Quinnell, editor-in-chief
I have seen numerous product announcements from multiple vendors over the last few months that seek to simplify the development of voice control for electronic systems. Apparently, the industry hopes and believes that voice control is the next big thing in user interfaces, for consumers and industry alike, and they are all aiming for that big design that means millions of units per year. Setting aside the question of whether that belief is accurate, I have to wonder if voice control would be a boon or a bust.
There certainly has been a lot of promotion. I recently wrote about the chorus of voice preprocessing chips that have come out with development kits in support. Since then, even more announcements related to voice control have appeared. As recently as last week, Libre Wireless announced a wireless, IoT hardware platform with voice control built in that it calls MAVID. Still more are likely to appear as the year finishes up.
It’s easy to see the appeal. “Star Trek” fans like myself have long dreamed of being able to interact with their computers by just talking. And being able to get things done even if your hands are occupied or you’re out of reach of the controls can be a tremendous convenience. All you need to do is ask, and the voice control systems snap to the task. But will it be more than a fad?
I have been following speech-recognition technology since the mid-1990s, when I wrote an article for EDN entitled “Speech Recognition: no longer a dream but still a challenge.” As you might expect, the technology’s advancement has addressed many of the issues raised in that article. We now have speech recognition that is context-aware, natural-language-friendly, speaker-independent, and free from the need for noise-cancelling, near-field microphones to capture the sound. Yet some of the operational concerns I had way back then are still valid now, which leaves the future of voice-controlled systems hazy in my mind.
I will be the first to admit that I dearly love the convenience of today’s voice control technology — for some things. I have an Amazon Echo, three Dots, and even a Google Home Mini that I use around my house. It’s now such a habit when we have a quick question to simply ask “Alexa, what is…” that my wife and I find ourselves wanting to say that in our car (which has no such capability) when a question comes up in conversation. It’s also nice to be able to activate outdoor lighting when guests are coming or going simply by asking. Similarly, the bedside lights are on voice control so I don’t have to get out from under the covers to turn them on or off.
Second-generation Echo. (Source: Amazon)
But there are still some challenges. Sometimes the Echo will react during my conversations with friends when it hears a word that is vaguely like the wake-up command. Other times, it misunderstands a word, which is frustrating even when the results are amusing. And I must remember exactly what I named the various lights and groups that the system controls, as well as the specific wording that such control requires, or I get no useful response from the Echo. So there remains a bit of a requirement that the user adapt to the technology rather than the other way around.
More importantly, however, I wonder if the current excitement about voice control will end up being too much of an occasionally good thing. Vendors are looking forward to voice control applications such as eliminating the need for a TV remote, hands-free operation of appliances such as ovens, and a host of others. The problems I see with such proliferation are twofold. One is that the user training needed to operate all of these systems becomes increasingly burdensome. I already must be careful in my word choices to properly control my Echo and to avoid triggering it unintentionally. If my television, oven, door locks, coffee-maker, and numerous other devices in my home all require their own wake word and command syntax, it can quickly become unmanageable or perhaps lead to unfortunate results. And pity the poor visitor to my home who will have no idea how to turn on the bathroom light because they don’t know its name.
The second problem, and to my mind the more important one, is the increase in vocal noise that a proliferation of voice control devices will engender. If you work in a typical office environment, you can easily imagine how much the background babble will increase if everyone is talking to their phones, computers, lamps, copy machines, and the like. And when that copier hears someone talking about the most recent sports scores, will it inadvertently produce one copy for each point on the winning team?
Some of these problems can be addressed with further improvements in the base technology. The Google Home Mini already includes speaker recognition so that it can identify who is talking to it and can tailor its responses to the individual. And the Echo is beginning to recognize when multiple devices are picking up the same voice and limit the response to the device closest to the speaker. So, I can easily imagine that something like a copy machine can, once activated, respond thereafter only to that speaker’s voice for the remainder of the command and thus ignore the sports scores being discussed nearby.
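For developers who want to picture that copier behavior, here is a minimal Python sketch of the idea. It is only an illustration under stated assumptions: the `Utterance` record, the `SpeakerLockedSession` class, and the wake word "copier" are all hypothetical, and the speaker ID is assumed to come from some speaker-recognition front end that is not shown here. It does not represent how any shipping product works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    speaker_id: str  # assumed to be supplied by a speaker-recognition front end
    text: str        # assumed to be the recognized words of one utterance


class SpeakerLockedSession:
    """Hypothetical session: after the wake word, only the waking speaker is heard."""

    def __init__(self, wake_word: str = "copier"):
        self.wake_word = wake_word
        self.active_speaker: Optional[str] = None

    def handle(self, utt: Utterance) -> Optional[str]:
        """Return a command to execute, or None if the utterance is ignored."""
        words = utt.text.lower().split()
        if self.active_speaker is None:
            # Idle: react only to the wake word, and remember who said it.
            if words and words[0] == self.wake_word:
                self.active_speaker = utt.speaker_id
            return None
        if utt.speaker_id != self.active_speaker:
            # Someone else is talking nearby (sports scores, say): ignore it.
            return None
        # The waking speaker finished the command; release the lock.
        self.active_speaker = None
        return utt.text
```

In this sketch, if one person says "copier" and a colleague then mentions a score of twenty-one, the colleague's words are dropped, and only the first person's follow-up, such as "make two copies," is passed along as a command. A real design would also need a timeout so that the lock does not persist if the speaker walks away.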
But the increase in babble remains a problem. Developers thus should think about where, when, and why voice control is a benefit and not slap it on everything just to be trendy. They should also think about how the proliferation of voice control will affect individual device usability and alter the sound environment, and they should structure their designs to mitigate some of the potential negative effects. Otherwise, voice control will undermine its own success and become just another technology fad that showed a lot of potential but didn’t really catch on.