Written by: AudioTelligence
Why AI should team up with blind source separation
The news that Microsoft is using artificial intelligence (AI) to improve the sound quality of meetings in its Teams communication and collaboration tool comes as no surprise. AI has been used in VoIP conferencing systems like Webex and Zoom for some time.
It’s one of the reasons I’m often asked why AudioTelligence’s technology is better than AI when it comes to noise reduction. But that is the wrong question. The two technologies address different problems and – more importantly – are completely cross-compatible with each other.
Our blind source separation (BSS) technology is the grown-up successor to beamforming. A beamformer is a form of spatial filter that uses a microphone array to focus in a particular direction. Traditional beamformers need to know all sorts of information about the acoustic scene – such as the target source direction and the microphone geometry – and the more sophisticated ones need precisely calibrated microphones.
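To make that dependence concrete, here is a minimal sketch of the simplest traditional beamformer, a delay-and-sum design. It is an illustration only, not AudioTelligence's code: the function name `delay_and_sum`, the parameter names and the far-field, linear-array, identical-microphone model are all assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delay_and_sum(signals, mic_x, angle_deg, fs):
    """Steer a linear microphone array toward angle_deg (0 = broadside).

    signals: array of shape (n_mics, n_samples)
    mic_x:   microphone positions along the array axis, in metres
    fs:      sample rate in Hz
    Assumes a far-field (plane wave) source and identical microphones.
    """
    n_mics, n_samples = signals.shape
    # Arrival delay of a plane wave from angle_deg at each microphone.
    # This is where the geometry and the target direction must be known.
    delays = mic_x * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Undo each microphone's delay with a phase shift, then average.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Both `mic_x` and `angle_deg` have to be supplied by the caller. Give the function the wrong angle or the wrong spacing and the channels no longer line up, so the target is attenuated rather than enhanced.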
AudioTelligence’s BSS works its magic by learning from the data. For each acoustic source in the scene, it learns a spatial filter that focuses on the region containing the source and optimally eliminates all the other sources in the scene. This means we get excellent interference rejection – without knowing anything about the positions of the sources, the microphone geometry or the calibration of the microphones.
It also means we don’t need any training to learn the array characteristics for any deployment. Our BSS does still pick up ambient noise coming from the same region as the target source, but interference signals are rejected and ambient noise is reduced.
This is where AI noise reduction comes in. It is simply the latest incarnation of a long line of noise reduction technologies going all the way back to simple spectral subtraction. It uses a completely different principle from a spatial filter – it analyses the signal in the time-frequency domain and tries to identify which components are due to signal and which components are due to noise.
The advantage of this approach is that it can work with just a single microphone. But the big problem with this technique is that it extracts the signal by dynamically gating the time-frequency content – and this gating can lead to unpleasant artefacts when the signal-to-noise ratio is poor.
We’ve all heard mangled VoIP calls where the other person sounds like they’re underwater – that’s the gating eating into the voice. We simply don’t get these sorts of artefacts when using spatial filters.
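The gating idea can be sketched in a few lines. This is a deliberately crude, non-AI stand-in (a fixed threshold against a measured noise floor, close in spirit to spectral subtraction) rather than what Teams, Webex or Zoom actually run. The function name `spectral_gate`, its parameters and the assumption that a noise-only clip is available are all hypothetical choices for the example.

```python
import numpy as np


def spectral_gate(noisy, noise_only, frame=512, threshold=2.0):
    """Single-microphone noise reduction by binary time-frequency gating.

    noisy:      1-D signal to clean
    noise_only: 1-D clip containing only noise (at least one frame long)
    threshold:  how far above the noise floor a cell must be to survive
    """
    hop = frame // 2
    # Periodic Hann window: copies at 50% overlap sum to one,
    # so overlap-add reconstructs the interior of the signal.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)

    def frames(x):
        starts = range(0, len(x) - frame + 1, hop)
        return np.array([x[s:s + frame] * window for s in starts])

    # Average noise magnitude in each frequency bin.
    noise_mag = np.abs(np.fft.rfft(frames(noise_only), axis=1)).mean(axis=0)

    spectra = np.fft.rfft(frames(noisy), axis=1)
    # The gate: keep a time-frequency cell only if it clearly exceeds the noise floor.
    mask = np.abs(spectra) > threshold * noise_mag
    gated = np.fft.irfft(spectra * mask, n=frame, axis=1)

    # Overlap-add the gated frames back into a waveform.
    out = np.zeros(len(noisy))
    for i, chunk in enumerate(gated):
        out[i * hop:i * hop + frame] += chunk
    return out
```

When the voice is only just above the noise floor, the mask starts throwing away cells that contain speech as well as noise, which is the source of the artefacts described above. The first and last half-frames are tapered by the window in this sketch, so only the interior of the output is meaningful.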
Now for the big secret… the two technologies work really well together. Put a microphone array in front of your VoIP call and BSS will give you a signal with the interference rejected and the ambient noise reduced. That significantly improves the signal-to-noise ratio, so those AI noise reduction technologies will find it much easier to identify the residual ambient noise and get rid of it without all those unpleasant artefacts. Sounds like a winning combination to me.