Remote work. Video calls with family. Taking an important phone call in a busy café. Today, more than ever, we are looking for ways to block out environmental background noise, keep focus and minimize distractions. For example, many of us have been on a work or important personal call when our dog barks, interrupting the flow of the discussion.
Traditional noise cancellation, such as what you may find in higher-end headphones, analyzes incoming sound waves and emits a phase-inverted copy to "cancel" out the incoming signal. While this can be effective in some circumstances, it is not selective about what it cancels: it cannot differentiate between speech and background noise, so when it is enabled, the volume of everything is reduced indiscriminately.
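To make the limitation concrete, here is a minimal sketch (not production headphone code) of the phase-inversion idea behind traditional active noise cancellation: the anti-noise signal is simply the incoming waveform with its sign flipped, so summing the two cancels everything, speech included.

```python
import math

# Illustrative sketch: classic active noise cancellation plays back an
# inverted (phase-opposite) copy of the incoming sound.
def invert(samples):
    """Generate the 'anti-noise' signal: same amplitude, opposite sign."""
    return [-s for s in samples]

# A 440 Hz tone sampled at 16 kHz stands in for ambient noise.
noise = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
anti_noise = invert(noise)

# Summing the two at the eardrum yields silence -- but the same inversion
# would be applied to speech and noise alike, so everything is dimmed.
residual = [a + b for a, b in zip(noise, anti_noise)]
```

The key point: the inversion has no notion of *what* it is cancelling, which is exactly the gap AI-based approaches address.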
Leveraging the power of AI, you can greatly improve speech clarity and minimize listener fatigue by separating speech from noise and then outputting only the speech portion. This is called AI noise suppression (AINS).
AINS technology is specifically crafted to provide noise-suppression capabilities in devices such as handsets, headsets, conference systems, automotive telematics, wireless microphones and audiovisual systems. It can be implemented across a wide array of hardware platforms, from standard desktop CPU/GPU systems to edge systems, such as audio headsets. Additionally, it can be seamlessly integrated into most existing audio-processing paths without introducing additional latency.
AINS has started to gain popularity on platforms where voice communication is essential, including videoconferencing and gaming chatrooms. Even the mobile phone industry has jumped on board. In short, many people prefer clean, clear audio when communicating via voice, and tech platforms are taking notice.
As we continue to increase the use of digital communication platforms in our everyday lives, noise-suppression technology has gone from a luxury feature to a critical one.
How AI noise-suppression technology works
AI-powered technologies represent a leap in performance because instead of relying on a hand-engineered solution, a neural network learns the optimal solution from example data. Training on a diverse range of speech and noise signals also ensures the model remains effective across many environments.
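As a rough illustration of how such training data is typically assembled (this is a generic sketch, not the actual Skyworks pipeline), clean speech is mixed with noise clips at chosen signal-to-noise ratios; the network then learns to map the noisy mixture back to the clean target.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

random.seed(0)
# A 200 Hz tone stands in for clean speech; white noise for the background.
speech = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
noise = [random.uniform(-1.0, 1.0) for _ in range(1600)]

noisy = mix_at_snr(speech, noise, snr_db=5)   # one (input, target) training pair
```

Sweeping the SNR and varying the noise types (café chatter, barking dogs, keyboards) is what gives the trained model its robustness across environments.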
In addition, audio-processing technology works best with as little delay as possible. Rather than routing audio over Bluetooth or out to cloud servers or desktop PCs, all processing in this setting is performed locally on the device; in our tests, this has yielded an average latency of less than 10 ms. Local processing has a second benefit: because the audio never leaves the device, privacy is fully preserved.
Improved audio experiences for entertainment
Beyond implementation in commonly used communication platforms and mobile phones, AINS technology can also be integrated into non-traditional applications, such as entertainment systems.
How many times have you watched a movie and there’s an explosion, crash or other loud noise that hurts your ears? You turn the volume down, but then when the dialogue picks back up, you can’t hear anything. It’s a frustrating experience, especially in our advanced technological world. Why haven’t we fixed this yet?
AINS can be integrated into soundbars, speakers or almost any device that outputs sound. We have a test version in our office that ensures unpleasantly loud noises, such as explosions or breaking glass, have low volume, while dialogue and other speech sounds are kept at a normal level. This creates a much more pleasant user experience and demonstrates that the technology has applications beyond phone or video calls.
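The ducking behavior described above can be sketched as follows. This is a hypothetical simplification, not the test version in our office: the per-frame speech/non-speech flags stand in for an AINS model's classification, and the threshold and gain values are illustrative assumptions.

```python
# Hypothetical sketch of speech-aware loudness control for entertainment.
LOUD_THRESHOLD = 0.8   # assumed peak level above which effects get ducked
DUCK_GAIN = 0.25       # assumed attenuation applied to loud non-speech frames

def duck_loud_effects(frames):
    """frames: list of (peak_level, is_speech) tuples, one per audio frame."""
    out = []
    for level, is_speech in frames:
        if not is_speech and level > LOUD_THRESHOLD:
            out.append(level * DUCK_GAIN)   # tame the explosion
        else:
            out.append(level)               # leave dialogue untouched
    return out

# Dialogue, a loud blast, then dialogue again.
frames = [(0.3, True), (0.95, False), (0.4, True)]
levels = duck_loud_effects(frames)
```

Because the decision hinges on *what* the sound is rather than volume alone, dialogue stays intelligible while the explosion no longer hurts your ears.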
Customization
Sometimes, context matters. A cough during a conference call with hundreds of colleagues is a disruption, but that same cough may be important on a telehealth call with your doctor. AINS models can be trained and customized to exclude or allow certain noises, making sure noise cancellation behaves how the user wants.
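One way this customization could surface to applications, sketched here as a hypothetical interface (not an actual AINS API), is a per-application allowlist of noise classes layered on top of the model's sound classification: the same cough is stripped from a conference call but preserved on a telehealth call.

```python
# Hypothetical sketch: context-dependent filtering of classified sound events.
def filter_events(events, allowed_classes):
    """Keep speech, plus any noise classes the application opted in to."""
    return [e for e in events if e == "speech" or e in allowed_classes]

events = ["speech", "cough", "keyboard", "speech"]

# Conference call: suppress every non-speech sound.
conference = filter_events(events, allowed_classes=set())

# Telehealth call: the cough is diagnostically relevant, so let it through.
telehealth = filter_events(events, allowed_classes={"cough"})
```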
What’s next
AINS represents a groundbreaking solution that caters to the requirements of a wide range of industries. Its capacity to provide high-quality noise suppression while simultaneously minimizing CPU load and latency in flexible use cases is a valuable asset for professionals engaged in audio/video production and playback. The software’s versatility and ease of use are poised to transform our everyday audio experiences, and we’re just getting started in exploring its many applications.
About the author
Alexander Kain joined Skyworks in 2020 as the lead of the artificial intelligence team. Before that, he was a professor of biomedical computer science at the Oregon Health & Science University for 13 years. He has over 25 years of experience in machine learning, with over 60 peer-reviewed publications. He was principal investigator on eight grants from the National Institutes of Health and the National Science Foundation and participated as key personnel in many other grants.
Learn more about Skyworks Solutions