Low-power chip could make speech recognition ubiquitous in electronics

A world where voice control is practically everywhere could be here sooner than you think, thanks to MIT

Seemingly ready to kick off the age of voice-controlled electronics, MIT researchers have built a low-power chip specialized for automatic speech recognition. Compared to a cell phone running speech-recognition software that might require about 1 watt of power, this new chip requires between 0.2 and 10 milliwatts, depending on the number of words it has to recognize. 

This could translate to a power savings of 90% to 99%, which could make voice control practical for simple electronic devices. This includes power-constrained devices that have to harvest energy from their environments or go months between battery charges.

“Speech input will become a natural interface for many wearable applications and intelligent devices,” said Anantha Chandrakasan, the Vannevar Bush Professor of Electrical Engineering and Computer Science at MIT, whose group developed the chip. “The miniaturization of these devices will require a different interface than touch or keyboard. It will be critical to embed the speech functionality locally to save system-energy consumption compared to performing this operation in the cloud.”

According to Michael Price, a graduate student in electrical engineering and computer science who led the chip's design, the team didn't develop the technology for any particular application. “We have tried to put the infrastructure in place to provide better trade-offs to a system designer than they would have had with previous technology, whether it was software or hardware acceleration,” he said.

Currently, the top-notch speech recognizers are based on neural networks. Much of the new chip’s circuitry is concerned with implementing speech-recognition networks as efficiently as possible.

Of course, even the most power-efficient speech recognition systems can quickly drain a device’s battery if run without interruption. That’s why the chip includes a simpler “voice-activity detection” circuit that monitors ambient noise to determine whether it might be speech. If so, the chip fires up a larger, more complex speech-recognition circuit.
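The two-stage gating idea can be sketched in a few lines: a cheap, always-on voice-activity detector (VAD) decides when to wake the expensive recognizer. The energy-threshold check and all names here are illustrative assumptions for the sake of the sketch, not the chip's actual circuits.

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def is_speech(frame, threshold=0.01):
    """Crude energy-based VAD: flag frames louder than the threshold."""
    return frame_energy(frame) > threshold

def process_audio(frames, recognizer, threshold=0.01):
    """Run the costly recognizer only on frames the VAD flags as speech."""
    results = []
    for frame in frames:
        if is_speech(frame, threshold):        # cheap, always-on check
            results.append(recognizer(frame))  # expensive, gated path
    return results
```

The power saving comes from the asymmetry: the per-frame check is trivial, while the recognizer, which dominates energy use, runs only on the frames that pass it.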

For experimental purposes, the researchers’ chip had three different voice-activity-detection circuits, each with a different degree of complexity and power demand. Which circuit is most power efficient depends on context, but in tests simulating a wide range of conditions, the most complex of the three circuits led to the greatest power savings for the system. Although it consumed almost three times as much power as the simplest circuit, it generated far fewer false positives; the simpler circuits often chewed through their energy savings by spuriously activating the rest of the chip.

Typically, a neural network consists of thousands of processing “nodes” capable of simple computations but densely connected to each other. In the type of network commonly used for voice recognition, the nodes are arranged into layers. Voice data are fed into the bottom layer of the network, whose nodes process and pass them to the nodes of the next layer, whose nodes process and pass them to the next layer, and so on. The output of the top layer indicates the probability that the voice data represent a particular speech sound.
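The layered pass described above can be sketched as follows: data enter the bottom layer, each layer's outputs feed the next, and the top layer's outputs can be read as scores for speech sounds. The toy network sizes, weights, and sigmoid activation are made-up illustrations, not the chip's actual model.

```python
import math

def layer(inputs, weights, biases):
    """One dense layer: weighted sums squashed through a sigmoid."""
    outs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outs.append(1.0 / (1.0 + math.exp(-z)))
    return outs

def forward(x, layers):
    """Feed the data through each layer in turn, bottom to top."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# Toy network: 2 inputs -> 3 hidden nodes -> 2 output nodes
net = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2], [-0.6, 0.3, 0.9]], [0.05, -0.05]),
]
probs = forward([0.2, 0.9], net)  # one score per candidate speech sound
```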

One issue here is that a voice-recognition network is too big to fit in a chip’s onboard memory. This is a problem because going off-chip for data is much more energy intensive than retrieving it from local stores. To avoid this, the MIT researchers’ design concentrates on minimizing the amount of data that the chip has to retrieve from off-chip memory.  

A node in the middle of a neural network might receive data from a dozen other nodes and transmit data to another dozen. Each of those two dozen connections has an associated “weight,” a number that indicates how prominently data sent across it should factor into the receiving node’s computations. The first step in minimizing the new chip’s memory bandwidth is to compress the weights associated with each node. The data are decompressed only after they’re brought on-chip.  
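The compression step can be illustrated with a simple quantization scheme: store each weight as a small integer code in (simulated) off-chip memory, and decompress it back to an approximate float only once it is brought on-chip. The 8-level uniform codebook here is an assumed stand-in for illustration; the article does not specify the chip's actual compression method.

```python
def compress(weights, levels=8, lo=-1.0, hi=1.0):
    """Quantize float weights to small integer codes (3 bits for 8 levels)."""
    step = (hi - lo) / (levels - 1)
    return [round((min(max(w, lo), hi) - lo) / step) for w in weights]

def decompress(codes, levels=8, lo=-1.0, hi=1.0):
    """Recover approximate weights from the integer codes."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]

weights = [0.31, -0.74, 0.02, 0.98]
codes = compress(weights)    # compact codes cross the off-chip boundary
approx = decompress(codes)   # decompressed only after arriving on-chip
```

With 3-bit codes in place of 32-bit floats, roughly ten times less data crosses the costly off-chip memory boundary, at the price of a small approximation error in each weight.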

The team’s research was funded through the Qmulus Project, a joint venture between MIT and Quanta Computer, and the chip was prototyped through the Taiwan Semiconductor Manufacturing Company’s University Shuttle Program. 

Source: MIT
