Wake Word Processing: In the Cloud or On-Device?


Sensory’s History with Wake Words
Wake words (also known as hotwords, keywords, voice triggers, or activation phrases) have always been Sensory's forte. We first developed what we called "voice triggers" in the early 2000s, when Hallmark used them to bring stories to life with plush pets that reacted as you spoke certain words (see Interactive Story Buddies).

Sensory drove accuracy up and power consumption down, then introduced wake words to mobile phone and tablet vendors: "Hey Siri" to Apple, "Hey Cortana" to Microsoft, and "OK Google" to Google. Sensory made the first "Hi Galaxy" for Samsung's phones, and Sensory was the "voice" in MotoVoice and the hands-free Moto X. We even helped Amazon with low-power wake words for their first hands-free Alexa tablets.

Many big companies got by with good (not great) on-device performance by sending the audio to the cloud for a secondary review. This is what happens, for example, when your Echo lights up but gives no response: the Echo thinks it heard the wake word, but after reviewing what you said in the cloud, it decides you did not intend to talk to Alexa. This process compromises security and privacy by sending your data off the device at unpredictable times. One of the worst offenders is my Android phone, which seems to start listening whenever I talk about Google because it thinks I said "Hey Google." Google does a reasonable job in the cloud of "revalidating" what was intended, but for many cloud implementations that revalidation includes listening to personally identifiable speech before and after the perceived wake word, once again at the cost of privacy!
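To make the trade-off concrete, here is a minimal Python sketch of the two-stage (cascade) pattern described above. It is a hypothetical illustration, not Sensory's, Amazon's, or Google's actual code: the names two_stage_wake, local_score, local_threshold, and cloud_verifier are assumptions, and the detector score is simply passed in as a number rather than computed from audio.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Detection:
    fired_locally: bool   # did the on-device first stage trigger?
    audio_uploaded: bool  # did any audio leave the device?
    accepted: bool        # final decision after the (optional) second stage


def two_stage_wake(
    local_score: float,
    local_threshold: float,
    audio_window: List[float],
    cloud_verifier: Optional[Callable[[List[float]], bool]] = None,
) -> Detection:
    """Hypothetical cascade: an on-device detector followed by an optional
    cloud re-check. When a cloud verifier is used, every first-stage trigger,
    including false alarms, sends audio off the device."""
    if local_score < local_threshold:
        # Stage 1 rejected the audio: nothing is sent anywhere.
        return Detection(fired_locally=False, audio_uploaded=False, accepted=False)
    if cloud_verifier is None:
        # Fully on-device: the local decision is final and no audio leaves.
        return Detection(fired_locally=True, audio_uploaded=False, accepted=True)
    # Stage 2: the audio window (which may include speech before and after
    # the apparent wake word) is handed to a server for revalidation.
    accepted = cloud_verifier(audio_window)
    return Detection(fired_locally=True, audio_uploaded=True, accepted=accepted)


# Usage: a local false alarm that the cloud rejects, like an Echo that
# lights up but never answers.
result = two_stage_wake(0.7, 0.6, [0.0] * 10, cloud_verifier=lambda audio: False)
print(result)
```

In this sketch the "lit up but no response" case corresponds to fired_locally=True and accepted=False, and the privacy cost shows up as audio_uploaded=True: the audio left the device even though the user never meant to wake it.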

The choice between on-device and cloud-based processing of hotwords (often referred to as keywords, wake words, trigger words, or voice commands) involves trade-offs in privacy, accuracy, and reliability. Let's explore each of these.

Privacy

When it comes to keeping your voice data secure, on-device processing is the Fort Knox of privacy. Your voice data stays local, locked down on your device. That's a win for compliance with data protection regulations and for privacy-conscious customers.

Cloud systems attempt to balance convenience with caution by implementing advanced encryption and anonymization techniques. However, the transmission of voice data to external servers inherently increases the attack surface.

Sensory's containerized solution, with the option to connect to a large language model (LLM) in the cloud, lets us run wake words and speech-to-text (STT) locally and send only text to the LLM. This increases privacy, because no voice data leaves the device, and reduces costs, because far less data is sent to the cloud.
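The text-only approach can be sketched in a few lines of Python. This is a hypothetical illustration, not Sensory's API: build_llm_request, local_stt, and the "prompt" field are made-up names, and the 16 kHz, 16-bit mono PCM format is an assumed (though common) speech format used only to show the size difference between raw audio and a transcript.

```python
import json
from typing import Callable

# Assumption for illustration: 16 kHz, 16-bit (2-byte) mono PCM audio.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


def raw_audio_bytes(seconds: float) -> int:
    """Size of uncompressed PCM audio for an utterance of the given length."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def build_llm_request(pcm: bytes, local_stt: Callable[[bytes], str]) -> bytes:
    """Hypothetical on-device step: transcribe locally, then build a request
    body that carries only the transcript. The PCM is never included."""
    transcript = local_stt(pcm)
    return json.dumps({"prompt": transcript}).encode("utf-8")


# Usage: three seconds of (dummy) audio and a stand-in local STT engine.
pcm = b"\x00\x01" * (raw_audio_bytes(3.0) // 2)
request = build_llm_request(pcm, lambda audio: "what's the weather tomorrow")
print(raw_audio_bytes(3.0), "bytes of audio stay on the device")
print(len(request), "bytes of text go to the LLM")
```

Under these assumptions, three seconds of raw audio is 96,000 bytes, while the request that actually leaves the device is under 50 bytes of text. Real systems may compress audio before upload, so the exact ratio will vary, but the voice recording itself never has to be transmitted.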

Accuracy

Once upon a time, cloud-based systems were the undisputed champions of accuracy. With their massive computational muscle and ever-evolving algorithms, they could handle tricky accents and noisy environments like a pro.

Now, on-device solutions like Sensory's embedded wake word technology are outperforming big names such as Amazon. And unlike platforms that may inadvertently collect voice data through false activations, Sensory's on-device technology maintains high accuracy while keeping user privacy at the forefront. Download the full performance report here.

Reliability

Cloud-based systems might be smart, but what if you find yourself without an internet connection? No hotword. This dependence on the network can be a serious problem in critical applications.

On-device wake word systems, on the other hand, work consistently regardless of internet connectivity. This makes them ideal for areas with poor or no network coverage, ensuring uninterrupted voice activation. Embedded processing also means lower latency and quicker responses, with no delay from communicating with the cloud.

In the Cloud or On-Device?

The choice between on-device and cloud-based wake word processing requires careful consideration of these key factors. Businesses exploring wake word or voice activation technologies must weigh the importance of privacy, accuracy, and reliability against their use case and user expectations.

As voice technology continues to reshape our world, one thing’s for sure: the debate between cloud and on-device processing is far from over. If you would like to learn more about Sensory’s on-device hotword technology, chat with our team of experts by visiting https://www.sensory.com/contact/.