Amber Technology Blog

Exploring Yamaha Human Voice Activity Detection (HVAD)

Written by Karin Cahill | 20/04/22 04:00

Create better conference calls with Human Voice Activity Detection

Conference calls are now a part of everyday working life, but meeting rooms often don’t have proper acoustic treatment (like Primacoustic) or high-quality AV equipment, which can make it difficult to run a call smoothly. General background noise, softly spoken participants, and indiscernible speech are just a few factors that can interfere with the effectiveness of a conference call with clients or colleagues. One way to address these issues and enhance conference calls is through voice activity detection.

Voice Activity Detection Explained

Voice activity detection technology detects human speech even when background noise is present. Products such as the YVC-1000, YVC-330, and YVC-200 use voice activity detection to significantly enhance the accuracy of their audio processing whenever their microphones pick up a signal. Voice activity detection can also help to save network bandwidth and computation, since it prevents unnecessary coding of silence in Voice over Internet Protocol (VoIP) applications.
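To make the bandwidth point concrete, here is a minimal sketch of a generic energy-threshold detector that decides which audio frames are worth encoding. It is not Yamaha's HVAD, whose internals this article doesn't describe; the function names, the 160-sample frame length, and the 0.02 threshold are all assumptions chosen for illustration.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def frames_to_encode(samples, frame_len=160, threshold=0.02):
    """Return the indices of frames whose RMS level exceeds the threshold.

    Frames at or below the threshold are treated as silence and skipped,
    which is where the bandwidth and computation saving comes from.
    The threshold and frame length are illustrative assumptions.
    """
    active = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_rms(samples[start:start + frame_len]) > threshold:
            active.append(start // frame_len)
    return active

# Hypothetical signal: silence, then a 200 Hz tone sampled at 8 kHz, then silence.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
signal = silence + tone + silence

print(frames_to_encode(signal))  # only the middle frame is kept: [1]
```

A plain energy threshold like this would also pass a loud fan or a slammed door, which is exactly the weakness that a detector tuned to human voices is meant to address.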

Voice activity detection plays an important role alongside three signal-processing capabilities:

  • Noise reduction
  • Automatic tracking
  • Automatic gain control

Voice Activity Detection and Noise Reduction

The noise reduction sound-processing function detects steady background noises (like air conditioning units) and minimises or eliminates those noises from the sound pickup signals. Conventional commercial systems can reduce constant noises, but only when there are no voices. These systems might also misidentify steady human voices (like a drawn-out “umm”) as noise components and eliminate them.

The YVC-1000 leverages Human Voice Activity Detection (HVAD) to achieve much better noise reduction and a higher signal-to-noise ratio than conventional commercial systems. Yamaha’s HVAD can filter steady noises not only from the background but also within the speech bandwidth range.
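One way to see why the voice decision matters for noise reduction is a sketch in which the noise-floor estimate is only updated on frames flagged as non-voice. This is a generic textbook-style approach under stated assumptions, not Yamaha's algorithm: the per-frame levels, the voice flags, the smoothing factor `alpha`, and both function names are hypothetical.

```python
def track_noise_floor(frame_levels, is_voice, alpha=0.9):
    """Smooth a noise-floor estimate, updating it only on non-voice frames.

    frame_levels: one level per frame (illustrative values, arbitrary units).
    is_voice: one boolean per frame, assumed to come from a voice detector.
    """
    floor = None
    floors = []
    for level, voiced in zip(frame_levels, is_voice):
        if not voiced:
            # First non-voice frame seeds the estimate; later ones are smoothed in.
            floor = level if floor is None else alpha * floor + (1 - alpha) * level
        floors.append(floor)
    return floors

def reduce_noise(frame_levels, floors):
    """Subtract the estimated floor from each frame level, clamping at zero."""
    return [max(level - (floor or 0.0), 0.0)
            for level, floor in zip(frame_levels, floors)]

# Hypothetical frames: steady noise at 0.1, with a voice on frames 2 and 3.
levels = [0.1, 0.1, 0.6, 0.6, 0.1]
voice_flags = [False, False, True, True, False]

floors = track_noise_floor(levels, voice_flags)
cleaned = reduce_noise(levels, floors)
print(floors)
print(cleaned)
```

If the voiced frames were wrongly flagged as noise (for example, a drawn-out “umm”), the floor would climb towards the voice level and the subtraction would start removing the voice itself, which is the failure mode described above.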

Voice Activity Detection and Automatic Tracking

Automatic tracking is a sound-processing capability that detects a speaker’s location within a room and steers the microphone pickup towards that voice, which is a highly productive solution in noisy conference rooms. The YVC-1000 picks up the audio source location using the in-built microphone’s array control function, which features three microphone elements. The HVAD technology embedded in the YVC-1000 dramatically enhances the accuracy of detecting the speaker’s location. When sound sources are detected, the technology can tell whether they’re human voices or not. HVAD uses those results to identify where any steady noises or isolated sounds are coming from, which minimises inaccurate identifications. So even if there’s a fan blowing or papers shuffling, the microphones will not lose track of the speaker’s location.
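The idea of only following sounds that are judged to be voices can be sketched as below. This is a deliberately simplified illustration: it picks the loudest of three microphone elements rather than doing real array processing, and the function name, levels, and voice flags are all assumptions rather than anything taken from the YVC-1000.

```python
def track_speaker(frames, is_voice, start=0):
    """Return the index of the focused microphone element for each frame.

    frames: per-frame lists of levels, one level per microphone element.
    is_voice: one boolean per frame, assumed to come from a voice detector.
    The focus only moves on voiced frames; other sounds are ignored.
    """
    focus = start
    history = []
    for levels, voiced in zip(frames, is_voice):
        if voiced:
            # Hypothetical rule: point at the element with the highest level.
            focus = max(range(len(levels)), key=lambda i: levels[i])
        history.append(focus)
    return history

# Hypothetical meeting: speaker near element 0, papers shuffled near element 2,
# the same speaker again, then a second speaker near element 1.
frames = [
    [0.5, 0.1, 0.1],
    [0.1, 0.1, 0.9],
    [0.4, 0.2, 0.1],
    [0.1, 0.6, 0.1],
]
voice_flags = [True, False, True, True]

print(track_speaker(frames, voice_flags))  # [0, 0, 0, 1]
```

Without the voice flags, the loud paper shuffle in the second frame would pull the focus to element 2 and away from the person speaking.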

Voice Activity Detection and Automatic Gain Control

The automatic gain control function automatically adjusts and normalises the level of a speaker’s voice. It adjusts for people who speak softly, talk too loudly, or are further away from the microphone. Conventional commercial systems can’t distinguish well between noise and quiet voices, making it difficult to raise the volume of quiet voices. The YVC-1000 uses HVAD to increase the accuracy of voice determinations and distinguish between human voices and steady noises. As a result, automatic gain control is stabilised and the voice output to the far end stays at a consistent level for participants.
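A rough sketch of voice-gated gain control follows, assuming a per-frame voice flag is available. It is a generic illustration rather than Yamaha's implementation; the target level, maximum gain, adaptation rate, and function name are all hypothetical.

```python
def apply_agc(frame_levels, is_voice, target=0.3, max_gain=8.0, rate=0.5):
    """Scale each frame level, adapting the gain only on voiced frames.

    On a voiced frame the gain moves part of the way towards the value
    that would bring the frame to the target level. On other frames the
    gain is held, so background noise is not boosted towards the target.
    """
    gain = 1.0
    output = []
    for level, voiced in zip(frame_levels, is_voice):
        if voiced and level > 0:
            desired = min(target / level, max_gain)
            gain += rate * (desired - gain)
        output.append(level * gain)
    return output

# Hypothetical quiet talker at 0.05, with one noise-only frame at 0.02.
levels = [0.05, 0.05, 0.02, 0.05]
voice_flags = [True, True, False, True]

print(apply_agc(levels, voice_flags))
```

The quiet voice is raised step by step towards the target, while the noise-only frame leaves the gain where it was. If that frame were mistaken for a very quiet voice, the gain would jump upwards and the level sent to the far end would become unstable.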