Machine learning model could improve human speech recognition


Physics 15, 38

A tool that predicts how many words per sentence a listener understands could one day allow companies to make custom hearing aids with enhanced features.

Credit: Yuli/stock.adobe.com

A model that predicts how well a hearing-impaired person understands speech in different acoustic environments could be used to develop the next generation of speech enhancement algorithms for hearing aids.

In 2019, 7.1% of the US population ages 45 and older used hearing aids. But these aids are far from perfect. Researchers think they can improve the devices by integrating them with speech-processing models that predict how people with varying degrees of hearing loss distinguish words in noisy environments. In a move that could enable more individualized hearing restoration, Jana Roßbach, Bernd Meyer and their colleagues from the Carl von Ossietzky University of Oldenburg in Germany have now developed a machine learning model that can correctly predict speech intelligibility for a wide range of listening conditions [1]. They say a future version of their model could be integrated into hearing aids to improve speech intelligibility for the hearing-impaired.

Modern hearing aids convert incoming sound waves into numerical codes and then send amplified versions of those waves through a speaker into the ear. The codes contain information about the frequencies of the waves and their amplitudes. But hearing is more complex than simply detecting sound waves.
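The frequency-and-amplitude codes described above are conceptually similar to a short-time spectral analysis of the incoming signal. The following Python sketch illustrates that idea with a synthetic test tone; the sampling rate, frame length, and hop size are illustrative assumptions, not values taken from any real hearing-aid specification.

```python
import numpy as np

# Illustrative parameters (assumptions, not a real hearing-aid spec)
SAMPLE_RATE = 16_000   # samples per second
FRAME_LEN = 512        # samples per analysis frame (~32 ms)
HOP = 256              # step between successive frames

# Synthetic input: a 440 Hz tone with a little background noise
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(t.size)

def frame_to_code(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert one windowed frame into (frequencies, amplitudes)."""
    windowed = frame * np.hanning(frame.size)
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(frame.size, d=1 / SAMPLE_RATE)
    amps = np.abs(spectrum)
    return freqs, amps

# Encode the whole signal frame by frame
codes = []
for start in range(0, signal.size - FRAME_LEN, HOP):
    freqs, amps = frame_to_code(signal[start:start + FRAME_LEN])
    codes.append(amps)

codes = np.array(codes)              # shape: (num_frames, num_frequency_bins)
print(freqs[np.argmax(codes[0])])    # the dominant bin sits near 440 Hz
```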

The ability to distinguish phonemes—the units of sound that make up words—is a key component of hearing. This ability is often limited in hearing-impaired people. Hearing aids help mitigate this loss by using signal processing algorithms to improve speech recognition. However, the development and validation of these algorithms typically require time-consuming listening experiments that test the capabilities of the algorithms under myriad acoustic conditions.
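One classic building block of such signal-processing algorithms is spectral subtraction, in which an estimate of the noise spectrum is removed from the signal before it is re-synthesized. The snippet below is a minimal sketch of that general idea only; the parameters and the noise-estimation shortcut are assumptions for illustration, not the algorithms used in any particular hearing aid.

```python
import numpy as np

def spectral_subtraction(noisy: np.ndarray, noise_estimate: np.ndarray,
                         frame_len: int = 512) -> np.ndarray:
    """Very simplified spectral subtraction over non-overlapping frames."""
    noise_mag = np.abs(np.fft.rfft(noise_estimate[:frame_len]))
    out = np.zeros_like(noisy)
    for start in range(0, noisy.size - frame_len + 1, frame_len):
        frame = noisy[start:start + frame_len]
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the estimated noise magnitude, flooring at zero
        clean_mag = np.maximum(mag - noise_mag, 0.0)
        out[start:start + frame_len] = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                                    n=frame_len)
    return out

# Toy usage: a tone buried in white noise; in practice the noise spectrum
# would have to be estimated from pauses in the speech, not known in advance.
rng = np.random.default_rng(0)
t = np.arange(16_000) / 16_000
noise = 0.3 * rng.standard_normal(t.size)
noisy_speech = np.sin(2 * np.pi * 300 * t) + noise
enhanced = spectral_subtraction(noisy_speech, noise)
```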

To solve this problem, Roßbach, Meyer and colleagues developed a machine learning model that determines the acoustic conditions a listener is experiencing and then estimates how well that listener can identify words in that environment. To make this estimation, the model uses an automated speech recognition system based on machine learning.
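The central quantity such an approach works with is the word error rate of a recognizer's transcript relative to the sentence that was actually spoken. The team's model is not reproduced here; the sketch below only shows the standard scoring step, with a made-up recognized string standing in for an ASR output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard dynamic-programming edit distance (substitutions, insertions, deletions)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / max(len(ref), 1)

# Hypothetical example: recognizer output vs. the sentence that was played
reference = "the boy carried the heavy box upstairs"
recognized = "the boy carried heavy fox upstairs"
print(f"WER = {word_error_rate(reference, recognized):.2f}")  # about 0.29
```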

The researchers trained and tested their model with recordings of sentences that were degraded to mimic how people with different types of hearing impairments perceive speech in different noisy environments. The team then played these recordings to normal-hearing and hearing-impaired listeners. They asked participants to write down the words they heard for each track. From these responses, the team determined the noise threshold (in decibels) that resulted in a 50% word error rate for each listener and for each environment, finding good agreement with the model predictions.
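That noise threshold is commonly called the speech reception threshold: the signal-to-noise ratio at which half the words are missed. Given word error rates measured at a few noise levels, it can be estimated by interpolation, as in this sketch; the data points below are invented for illustration and are not the study's measurements.

```python
import numpy as np

# Hypothetical measurements: word error rate at several signal-to-noise ratios (dB)
snr_db = np.array([-12.0, -8.0, -4.0, 0.0, 4.0])
word_error_rate = np.array([0.92, 0.74, 0.48, 0.21, 0.08])

# Estimate the SNR at which the word error rate crosses 50%.
# np.interp needs increasing x-values, so interpolate SNR as a function of
# word error rate with both arrays reversed (WER falls as SNR rises).
srt_db = np.interp(0.5, word_error_rate[::-1], snr_db[::-1])
print(f"Estimated 50% threshold: {srt_db:.1f} dB SNR")  # about -4.3 dB here
```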

Roßbach, Meyer and the rest of their team hope that a future version of their model could end up in hearing aids. But before that can happen, they need to fix some issues with the current version. One of those problems is that the model “needs information about what’s actually being spoken,” says Meyer. But this information is not available in real-world listening situations. The team is working to fix this and other issues with the goal of creating a machine learning model that can maximize speech intelligibility for all hearing-impaired individuals, Meyer says.

Torsten Dau, a researcher in hearing technology at the Technical University of Denmark, says Roßbach’s model is an important step towards a “non-intrusive” method of improving speech recognition for the hearing-impaired. He notes that the model “performs very well” under the acoustic conditions used by the team. “It will be exciting to see how this approach generalizes [to other] acoustic conditions,” he says.

– Rachel Berkowitz

Rachel Berkowitz is a Corresponding Editor for Physics Magazine based in Vancouver, Canada.

References

  1. J. Roßbach et al., “A Deep Learning-Based Speech Recognition Model for the Hearing Impaired,” J. Acoust. Soc. Am. 151, 1417 (2022).
