
Audio edge processors deliver context-aware audio experiences

Dedicated audio edge processors with machine learning will deliver increasingly natural communication experiences.

The last decade, and 2020 in particular, saw a massive increase in video calling and engagement with voice-enabled devices, driven largely by the work-from-home shift created by the pandemic. However, our use of video calls isn’t restricted to work Zoom meetings. We now use video calling for teaching, exercise classes, experiencing live music, interacting at conferences, and more. This virtual engagement occurs via laptops, smartphones, tablets, home assistants, and other IoT devices such as the Amazon Echo Show, Facebook Portal, Peloton, and Tempo Studio.

One factor that limits enjoyable and engaging interactive audio, video call, or home assistant experiences is inconsistent sound quality in the presence of noise and other distractors. The device’s capability to intelligently manage sound is what makes or breaks your ability to communicate.

Increasing audio and voice sophistication requires new technology

Dubbed ‘smart sound’ by product makers, audio intelligence is the ability of a device to process sound to deliver the best user experience. As the use of voice-first devices for communication, entertainment, and health management increases, so does the demand for a seamless, low-barrier experience with increased functionality.

Users now expect devices to understand more than simple wake words or keywords (such as “Alexa”) and seek the ability to move between devices and applications with superior sound quality to enable an immersive and seamless experience, whether for professional conferencing or personal entertainment. Devices should be able to combine your voice and/or voice command with your personal preferences and environmental data to adapt sound processing to your specific context. This is known as contextual awareness.

Contextual awareness explained

Context-aware devices combine user-specific information, such as location, preferences, and ambient sensor data, to better understand what the user is asking for and to more accurately execute functions in response to a given command or trigger.

Always-listening devices use signal processing technology combined with machine learning (ML) to differentiate between sound types such as natural sounds, voices, and background interference. These sounds are often divided into “scenes” and “events”. A scene is the user’s setting, such as a noisy airport terminal or a quiet workspace, while events include someone speaking, a glass breaking, or a dog barking. A context-aware device can process these groups of sounds to correctly interpret the intent behind an action, whether it is a video call or a voice command, for an optimal experience.
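To make the scene/event split concrete, here is a minimal sketch in Python. It substitutes a simple energy heuristic for the trained ML classifier a real device would run; the frame size and threshold are arbitrary values chosen for illustration, not parameters of any actual product.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a mono signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def classify_frames(x):
    """Label each frame as an 'event' (transient) or part of the 'scene'.

    Stand-in for the ML classifier described above: frames whose energy
    jumps well above the slowly varying background are treated as events.
    """
    frames = frame_signal(x)
    energy = np.log10(np.mean(frames ** 2, axis=1) + 1e-10)
    background = np.median(energy)  # rough estimate of the scene level
    return ["event" if e > background + 1.0 else "scene" for e in energy]

# Example: a quiet noise floor with one loud transient in the middle
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(16000)           # one second at 16 kHz
sig[8000:8500] += 0.5 * rng.standard_normal(500)  # simulated glass break
labels = classify_frames(sig)
print(labels.count("event"), "event frames out of", len(labels))
```

A production system would replace the threshold test with a trained classifier operating on spectral features, but the scene/event framing is the same.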

 

Context-aware system schematic (Image: Knowles Corp.)

Why dedicated audio edge processors?

Dedicated audio edge processors with ML-optimized cores and a focus on audio fidelity are the key to supporting context-aware, high-quality audio communication devices. These processors can deliver enough compute power to process audio using traditional and ML algorithms while using a small fraction of the energy of a generic digital signal processing (DSP) implementation.

While the cloud may offer some great benefits, edge processing allows users to harness the full capability of their device at any time without the need for a high bandwidth internet connection. For example, edge audio processors enable a superior user experience in virtual communication through low latency processing of audio with contextual data while also keeping the contextual data local and secure.

A look at specialized audio edge processors

There are several features that audio edge processors must get right to deliver exceptional audio and voice command experiences.

  • Noise and distance

Beamforming uses signal processing algorithms to focus or direct sound pickup in a particular direction to enhance sound quality, while noise suppression makes conversations possible in everyday scenarios. A listening device determines the directions of the voice and noise sources, and ML classification techniques determine which beams contain voice and which contain noise.

The DSP then focuses only on the beams with voice content for further voice UI processing. For example, in a conferencing system, the device must identify the direction of the sound and track multiple speakers at all times across 360 degrees. Noise sources can also be classified for audio event detection, such as glass breaking or fire alarms, further expanding the processor’s role into audio-aware smart home systems.
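The toy sketch below illustrates the core idea with a delay-and-sum beamformer over a two-microphone array: steer a handful of fixed beams, then keep the one that scores best. The array geometry, beam angles, and the power-based scoring (standing in for the ML voice/noise classifier described above) are all simplifying assumptions.

```python
import numpy as np

SR = 16000            # sample rate (Hz)
MIC_SPACING = 0.05    # two microphones 5 cm apart (illustrative)
SPEED_OF_SOUND = 343.0

def steer(mics, angle_deg):
    """Delay-and-sum: delay mic 2 so a source at angle_deg adds coherently.

    mics has shape (2, n_samples). Whole-sample delays keep the sketch
    short; a real DSP would use fractional-delay filters.
    """
    tau = MIC_SPACING * np.sin(np.radians(angle_deg)) / SPEED_OF_SOUND
    delay = int(round(tau * SR))
    return 0.5 * (mics[0] + np.roll(mics[1], delay))

def pick_voice_beam(mics, angles=(-60, -30, 0, 30, 60)):
    """Scan fixed beams and return the strongest one.

    Raw output power stands in for the ML classifier, which would score
    each beam for speech content rather than mere loudness.
    """
    beams = {a: steer(mics, a) for a in angles}
    best = max(beams, key=lambda a: np.mean(beams[a] ** 2))
    return best, beams[best]

# Example: identical signals on both mics => source at broadside (0 degrees)
rng = np.random.default_rng(1)
talker = rng.standard_normal(SR)
best_angle, beam = pick_voice_beam(np.stack([talker, talker]))
print("selected beam:", best_angle, "degrees")
```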

Proximity detection is also essential for a dynamic listening and speaking experience. The device detects the user’s proximity to the microphone and adjusts the microphone’s gain accordingly. This feature enables active video conferencing for presentations, workouts, and learning environments. These features are central to the design of advanced video conferencing devices such as Amazon’s new Echo Show, which features a screen that rotates as the user moves so that it always faces the user.
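One simple way to realize the gain-adjustment side of proximity handling is an automatic gain control (AGC) loop that tracks the capture level as the talker moves. The sketch below shows one step of such a loop; the target level, smoothing constant, and gain ceiling are illustrative assumptions.

```python
import numpy as np

TARGET_RMS = 0.1  # desired capture level (illustrative)

def adapt_gain(frame, gain, alpha=0.1, max_gain=20.0):
    """One AGC step: nudge the gain so the frame RMS approaches
    TARGET_RMS as the talker moves nearer or farther away.
    alpha sets how quickly the gain tracks distance changes."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-9
    desired = TARGET_RMS / rms
    gain += alpha * (desired - gain)  # smooth toward the desired gain
    return min(max(gain, 0.0), max_gain)

# Example: the talker steps back and the level halves (-6 dB);
# the gain climbs to roughly 2x to compensate.
rng = np.random.default_rng(2)
gain = 1.0
for level in [0.1] * 20 + [0.05] * 60:  # near, then far
    frame = level * rng.standard_normal(512)
    gain = adapt_gain(frame, gain)
print(f"steady-state gain: {gain:.2f}")
```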

  • Latency

Humans can generally tolerate up to 200 ms of end-to-end latency on a call before we begin to talk over each other. Low-latency processing in edge processors is therefore a critical requirement for ensuring high-quality voice communication.
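A quick budget calculation shows why the on-device stages must be fast. Apart from the 200 ms ceiling cited above, every figure below is an assumption chosen for illustration:

```python
SR = 16000   # sample rate (Hz)
FRAME = 256  # processing block size (samples)

# One-way latency budget against the ~200 ms ceiling (all values assumed)
capture_ms = 1000 * FRAME / SR  # one block buffered at the mic: 16 ms
dsp_ms     = 10                 # on-device audio/ML processing
codec_ms   = 20                 # encode + decode
network_ms = 80                 # network transit
playout_ms = 40                 # jitter buffer + render

total = capture_ms + dsp_ms + codec_ms + network_ms + playout_ms
print(f"end-to-end: {total:.0f} ms of the 200 ms budget")  # 166 ms
```

In this illustrative budget, the network, codec, and buffering consume most of the allowance, leaving only a small slice for everything the processor itself does.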

  • Power consumption

Audio edge processors designed with proprietary architectures, hardware accelerators, and special instruction sets can run audio and ML algorithms optimally. These optimizations help reduce power consumption in audio-heavy use cases such as video conferencing.

  • Integration

Audio edge processors that open up their architectures and development environments accelerate innovation by providing audio application developers with the tools and support to create new devices and applications. Future audio devices will be a collaborative effort.

  • Security

Edge processing can minimize the need for cloud connectivity and provide many benefits, including increased data security. For example, most consumers are not comfortable with data from personal smart home devices being transferred continuously to the cloud for processing. These fears have proven justified, with several major breaches at leading device manufacturers in recent years.

Peace of mind can be achieved when personal data is processed for analysis or inference on the device. An excellent example of this is a smart home security device that has been trained to recognize certain event sounds, such as glass breaking, that act as a trigger to alert the homeowner. Because the processing of the sound and the alert occur on the edge processor, the device does not need a continuous connection to the cloud, which enhances the security of the system.
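As a sketch of that pattern, the loop below keeps all audio on the device and emits only a small alert record when a trigger sound is heard. The high-frequency-burst test is a placeholder for the trained glass-break model described above, and every threshold here is an assumption for illustration.

```python
import numpy as np

def detect_glass_break(frame):
    """Placeholder detector: flags frames dominated by high-frequency
    energy. A real device would run a trained model here instead."""
    spectrum = np.abs(np.fft.rfft(frame))
    half = len(spectrum) // 2
    return spectrum[half:].sum() > 5.0 * spectrum[:half].sum()

def monitor(frames):
    """On-device loop: the raw audio never leaves the processor.
    Only this small metadata record would go upstream when triggered."""
    for i, frame in enumerate(frames):
        if detect_glass_break(frame):
            yield {"event": "glass_break", "frame": i}

# Example: a low hum, then a high-frequency crash, then the hum again
t = np.arange(512) / 16000
quiet = 0.01 * np.sin(2 * np.pi * 200 * t)  # background hum
crash = np.sin(2 * np.pi * 6000 * t)        # glass-break-like burst
print(list(monitor([quiet, crash, quiet]))) # one alert, for frame 1
```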

Conclusion

Dedicated audio edge processors will define the next generation of audio and voice-enabled devices, creating more context-aware, immersive, and seamless audio communication experiences. Their ability to enable efficient, low-power, low-latency voice communication, noise reduction, context awareness, and accelerated ML inferencing of sensor inputs opens the door to an explosion of new user experiences for human-machine interfaces.

About the authors:

Raj Senguttuvan, Knowles, director, strategic marketing

Raj Senguttuvan is an accomplished innovation and business leader with an electrical engineering background. Raj has over 15 years of experience in new technology development for consumer and industrial applications, early-stage business development, and project management for several semiconductor companies.

In his role as director, strategic marketing for Knowles, Raj directs audio solutions strategy, drives venture investments and partnerships, and leads marketing for IoT and consumer technologies including audio processors, algorithms, microphones, sensors, and receivers.

Raj holds an MBA from Cornell University and a Ph.D. in Electrical Engineering from the Georgia Institute of Technology.

Vikram Shrivastava, Knowles, senior director, IoT marketing

Vikram Shrivastava has almost 30 years of experience in product marketing, strategy, and management in the semiconductor and technology industry. Vikram’s educational background in electrical engineering, specifically in control systems and silicon design, gives him the ability to understand, execute, and communicate marketing strategy that fits the technical needs of engineers, developers, and OEMs.

In his current role as senior director, IoT marketing, for Knowles, Vikram develops strategies and products to enable IoT platforms with intelligent voice capability.

Vikram holds an MBA from the University of California, Berkeley’s Haas School of Business.
