Written by: AudioTelligence
5 ways to help people hear better in noisy environments
There’s a great deal of technology on the market already to help people hear better in noisy environments: beamforming, digital noise reduction, AI. What measures can we use not only to compare those audio processing techniques, but also to improve their performance and solve the problem of hearing in noise?
1. Improve the intelligibility of the speech, not its volume
Before we look at solutions, let’s define the problem of hearing in noise, often called the “Cocktail Party Problem”. What exactly is it? When our hearing is healthy, our brains are really good at extracting single sound sources out of a complex mix. If we are at a party, we are able to follow a conversation even though there are a lot of people babbling around us. This is usually not a problem for young people, who are able to hear the speech they want to listen to, even in a complex mixture of sound: background noise, competing speech signals, music and so on. But as we age, this ability declines, often as the first sign of age-related hearing loss, and it gets worse as hearing loss progresses. And that’s what we call the “Cocktail Party Problem”: the inability to hear speech clearly, and understand it, in noisy environments.
Solving the “Cocktail Party Problem” is therefore not simply a matter of increasing the volume of the speech. We have to increase its intelligibility, that is, how well a listener can understand it. To be able to increase intelligibility, we first have to know how to measure it.
2. Map the Speech Reception Threshold instead of measuring sound pressure levels
Which brings us to the second thing: measuring intelligibility. One commonly used measure is the Speech Reception Threshold (SRT). This is defined as the sound pressure level necessary to understand 50% of the words in a sentence. Just think about that for a moment: at this threshold, the listener misses half of the words in any conversation. Below it, they are just totally lost in the noise.
So when talking about this measure, we don’t usually talk directly about sound pressure level, because that isn’t very helpful. What we really want to measure is the Speech-to-Noise Ratio, because we modify the volume of our speech depending on the noise around us: we tend to raise our voice or even shout when there’s a lot of noise, and talk more quietly when it’s very quiet around us. It is therefore more useful to express the Speech Reception Threshold as a Speech-to-Noise Ratio, a speech-to-noise threshold, instead of as a sound pressure level.
Looking at the speech-to-noise threshold gives us the opportunity to compare different technologies and see how AudioTelligence’s technology improves intelligibility at different Speech-to-Noise Ratios (SNRs).
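As a rough illustration of the 50% point described in this section, the sketch below reads a Speech Reception Threshold off a set of listening-test scores by linear interpolation. The function name `speech_reception_threshold` and the scores are invented for this example; they are not AudioTelligence measurements, and real tests typically fit a smooth psychometric curve rather than joining the points with straight lines.

```python
def speech_reception_threshold(snrs_db, fraction_correct, target=0.5):
    """Interpolate the SNR (in dB) at which `target` of the words are understood."""
    points = sorted(zip(snrs_db, fraction_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Find the pair of neighbouring measurements that brackets the target score.
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target score is not bracketed by the measurements")


# Hypothetical scores (fraction of words understood) at each SNR, for illustration only.
snrs = [-9.0, -6.0, -3.0, 0.0, 3.0, 6.0]
scores = [0.05, 0.15, 0.30, 0.45, 0.65, 0.85]

srt = speech_reception_threshold(snrs, scores)
print(f"Estimated SRT: {srt:.2f} dB SNR")
```

With these made-up scores the 50% point falls between 0 dB and 3 dB, at about 0.75 dB SNR. A technology that helps intelligibility shifts the whole curve to the left, so the same 50% score is reached at a lower SNR.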
3. Increase the SNR levels
The third thing to realise is that studies have shown that the Speech Reception Threshold for hearing-impaired individuals is about 1.6 dB: that’s the Speech-to-Noise Ratio necessary to understand 50% of the words in a sentence. In a restaurant or a busy café, the Speech-to-Noise Ratio is about -3 dB, which means that the speech is actually quieter than the noise. It then becomes very challenging for most people to hear in these situations, especially if they are already experiencing hearing loss, suffer from the Cocktail Party Problem or are losing their hearing due to age.
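To make the decibel figures above concrete, here is a small sketch of the arithmetic. It only uses the two numbers quoted in this section (an SRT of about 1.6 dB and a café SNR of about -3 dB); the variable and function names are illustrative.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a ratio of signal powers."""
    return 10 ** (db / 10)


cafe_snr_db = -3.0  # typical restaurant or busy cafe, as quoted above
srt_db = 1.6        # SRT for hearing-impaired listeners, as quoted above

# At -3 dB the speech carries roughly half the power of the noise.
speech_to_noise_power = db_to_power_ratio(cafe_snr_db)

# The shortfall is how much the SNR must rise just to reach 50% of words understood.
shortfall_db = srt_db - cafe_snr_db
required_power_gain = db_to_power_ratio(shortfall_db)

print(f"Speech power / noise power at {cafe_snr_db} dB: {speech_to_noise_power:.2f}")
print(f"SNR improvement needed to reach the SRT: {shortfall_db:.1f} dB "
      f"(about {required_power_gain:.1f}x in power)")
```

In other words, a hearing intervention has to improve the SNR by roughly 4.6 dB in this setting before the listener even reaches the 50% mark.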
4. Challenge the effectiveness of current hearing solutions
In order to be effective and truly enhance the user’s experience, any hearing intervention needs to improve the Speech-to-Noise Ratio. So now we come to the fourth point: how effective are currently available solutions?
Digital noise reduction
Many studies have shown that noise reduction techniques make minimal, if any, improvements to intelligibility and often make it worse. They tend to eliminate the parts of the speech that were masked by the noise along with the noise. So while the noise is reduced, no extra signal is revealed to make understanding easier.
However, there is strong evidence that good noise reduction techniques can help with listening fatigue: if you could already understand the noisy speech, your brain has to make less effort to achieve the same level of understanding with the denoised version.
Audiologists tend to recommend assistive listening devices for those who suffer from the “Cocktail Party Problem”. But although some of these devices may improve the Speech-to-Noise Ratio slightly, they don’t solve the problem for hearing-impaired individuals. Beamforming (the technique used in most assistive listening devices) uses spatial filters and multiple microphones to focus on one direction, chosen as the direction of the desired sound source. The main problem with beamforming is that it selects all of the sounds from this direction. If you are listening to one person but there’s a noisy source behind them, a beamforming device is going to amplify the voice as well as the noise behind it, which is exactly what we want to remove. Beamformers also pick up a significant amount of sound from other directions, so the amount by which they can raise the signal relative to the background noise is limited.
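The two limitations described above can be seen in a toy delay-and-sum beamformer, the simplest spatial filter. This is a minimal sketch under strong assumptions, not a model of any particular device: the six-microphone line array, the whole-sample arrival delays and the white-noise test signal are all invented for illustration, and real café noise is neither white nor uncorrelated between microphones.

```python
import math
import random


def delayed(signal, samples):
    """Return the signal delayed by a whole number of samples (zero-padded)."""
    return [0.0] * samples + signal[:len(signal) - samples]


def delay_and_sum(mic_signals, steering_delays):
    """Undo the expected per-microphone delays, then average the channels."""
    usable = len(mic_signals[0]) - max(steering_delays)
    return [
        sum(x[t + d] for x, d in zip(mic_signals, steering_delays)) / len(mic_signals)
        for t in range(usable)
    ]


def power(x):
    return sum(v * v for v in x) / len(x)


def output_gain_db(arrival_delays, steering_delays, source):
    """Gain applied by the beamformer to a source arriving with the given delays."""
    mics = [delayed(source, d) for d in arrival_delays]
    out = delay_and_sum(mics, steering_delays)
    return 10 * math.log10(power(out) / power(source[:len(out)]))


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for an unwanted source

talker_delays = [0, 1, 2, 3, 4, 5]       # hypothetical delays for the talker's direction
off_axis_delays = [0, 3, 6, 9, 12, 15]   # hypothetical delays for another direction

# Noise coming from the same direction as the talker passes straight through.
same_direction_db = output_gain_db(talker_delays, talker_delays, noise)
# Noise from elsewhere is reduced, but only by a limited amount.
off_axis_db = output_gain_db(off_axis_delays, talker_delays, noise)

print(f"Noise behind the talker: {same_direction_db:+.1f} dB")
print(f"Noise from another direction: {off_axis_db:+.1f} dB")
```

Under these assumptions, noise arriving from the talker's direction is not attenuated at all, while noise from the other direction is reduced to roughly one sixth of its power (in the region of 8 dB) because six misaligned copies are averaged. Practical beamformers are more sophisticated, but the same trade-off applies.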
We also hear a great deal about AI-based solutions now. They do a good job of trying to isolate the sound source, but they still struggle with interfering speech. And they tend to add a lot of artefacts to the signal they produce.
AI often uses Deep Neural Networks (DNNs) to separate speech from noise. There are two issues with DNNs. The first is latency: using a DNN to process all the sound risks introducing large compute latency, and large latency causes lip sync issues which are not acceptable for real-life use cases. The second is that, although a DNN can be trained to separate speech from noise, it is not effective in separating one voice from another.
5. Discover AudioTelligence’s aiso™ for Hearing – the solution to the “Cocktail Party Problem”
Finally, the fifth thing to understand is how AudioTelligence’s solution, aiso™ for Hearing, which combines Blind Source Separation and Low Latency Noise Suppression, can solve the “Cocktail Party Problem”.
Blind Source Separation (BSS) tries to imitate what the brain is actually doing: when you are in a busy environment, with many people talking at the same time, the brain will extract the different sound sources and then focus on the one of interest. And that’s exactly what BSS tries to do. It separates the mix of sounds in the acoustic scene into different channels. Each channel will contain one source separated from the others so the user can then decide which one they want to listen to, without the interference of the other speakers.
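To give a feel for what "separating a mix into channels" means, here is a toy two-microphone, two-source example. It is emphatically not the aiso™ algorithm, whose details are not described here: it assumes an instantaneous mixture with no echoes or delays, uses a sine wave and uniform noise as stand-ins for two talkers, and applies a textbook independent component analysis recipe (whiten the recordings, then search for the rotation that makes the outputs least Gaussian). Real rooms need far more capable methods. All names are illustrative.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def standardise(x):
    """Shift to zero mean and scale to unit variance."""
    m = mean(x)
    sd = math.sqrt(mean([(v - m) ** 2 for v in x]))
    return [(v - m) / sd for v in x]


def rotate(a, b, angle):
    """Rotate the two-channel signal (a, b) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * p + s * q for p, q in zip(a, b)],
            [-s * p + c * q for p, q in zip(a, b)])


def excess_kurtosis(x):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return mean([v ** 4 for v in x]) - 3.0


def separate_two_sources(x1, x2, steps=90):
    """Blindly unmix two instantaneous mixtures of two independent sources."""
    x1 = [v - mean(x1) for v in x1]
    x2 = [v - mean(x2) for v in x2]
    # Whitening: rotate onto the principal axes, then scale each axis to unit variance.
    a = mean([v * v for v in x1])
    c = mean([v * v for v in x2])
    b = mean([p * q for p, q in zip(x1, x2)])
    u1, u2 = rotate(x1, x2, 0.5 * math.atan2(2 * b, a - c))
    z1, z2 = standardise(u1), standardise(u2)
    # Grid search for the rotation that maximises non-Gaussianity of both outputs.
    best = None
    for k in range(steps):
        y1, y2 = rotate(z1, z2, k * (math.pi / 2) / steps)
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def correlation(x, y):
    return mean([p * q for p, q in zip(standardise(x), standardise(y))])


rng = random.Random(1)
n = 2000
source_a = [math.sin(2 * math.pi * 0.013 * t) for t in range(n)]  # stand-in "talker"
source_b = [rng.uniform(-1.0, 1.0) for _ in range(n)]             # stand-in "talker"

# Each microphone hears a different blend of the two sources (weights are arbitrary).
mic1 = [0.8 * p + 0.6 * q for p, q in zip(source_a, source_b)]
mic2 = [0.3 * p + 0.9 * q for p, q in zip(source_a, source_b)]

out1, out2 = separate_two_sources(mic1, mic2)
for name, src in (("source_a", source_a), ("source_b", source_b)):
    match = max(abs(correlation(src, out1)), abs(correlation(src, out2)))
    print(f"{name}: best |correlation| with a separated channel = {match:.3f}")
```

The separation never sees the original sources or the mixing weights, which is what makes it "blind". Each output channel should end up strongly correlated with exactly one source, although the order and sign of the channels are arbitrary.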
To prove aiso™ for Hearing works, we took a normal array of six microphones and measured speech intelligibility using an objective measure called Short-Time Objective Intelligibility (STOI). We made separate recordings of clean speech and of background noise: real babble noise in a busy café. Then we mixed the signals at different SNRs so that we could evaluate the intelligibility with STOI, which correlates very well with intelligibility tests done with real people. This gave us a really good idea of how our technology performs compared to not adding any processing.
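The mixing step described above, combining separately recorded speech and noise at a chosen SNR, can be sketched as follows. This is an illustration of the general technique rather than AudioTelligence's test harness: the function names are hypothetical, a sine wave and Gaussian noise stand in for the recordings, and the intelligibility scoring itself is left out.

```python
import math
import random


def power(x):
    return sum(v * v for v in x) / len(x)


def snr_db(speech, noise):
    """Speech-to-noise ratio in decibels."""
    return 10 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so the mixture has the requested SNR, then add it to the speech."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * v for v in noise]
    mixture = [s + v for s, v in zip(speech, scaled_noise)]
    return mixture, scaled_noise


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 0.01 * t) for t in range(8000)]  # stand-in for clean speech
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]                # stand-in for cafe babble

for target in (-10, -5, 0, 5):
    mixture, scaled_noise = mix_at_snr(speech, noise, target)
    # Each mixture would then be passed, with and without processing, to an
    # intelligibility measure such as STOI.
    print(f"target {target:+d} dB -> achieved {snr_db(speech, scaled_noise):+.2f} dB")
```

Keeping the speech level fixed and scaling only the noise means the clean reference stays the same across every SNR, which is convenient when the scores are later compared.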
aiso™ for Hearing can improve the SRT by 16 dB. At an SNR of -5 dB, a typical hearing aid would improve intelligibility by around 50%, and one of the existing assistive listening devices would improve it by around 80%, but aiso™ for Hearing can actually improve speech intelligibility by up to 98%. All of these figures are results from testing the devices not in a lab, but in a real-world situation.
Helping people hear better in noise is a complicated issue that is about the intelligibility of speech, not its volume. Measuring speech intelligibility is complex, and improving it significantly is a hard problem. Commonly used audio processing techniques don’t provide a solution, but a combination of BSS and low latency noise suppression can do the job.