Expert Roundup: Maximizing Hotword Value with LLMs

Wake words (also known as hotwords or trigger words) like “Hey Siri” and “Alexa” have become staples of everyday technology, making voice interactions feel natural and effortless. But what makes a great hotword? And how do you design one that works seamlessly with large language models (LLMs)?

We turned to the experts at Sensory for insights from their recent webinar, “Maximizing Hotword Value with LLMs.”

Todd Mozer, Founder & CEO of Sensory

“The real value of hotwords is convenience; button presses aren’t required!”

Todd set the stage by showing how wake words (also known as hotwords, speech triggers, wake-up phrases, or trigger words) simplify interactions. He highlighted a demo of OpenAI’s ChatGPT-4 in which the lack of a true wake word clearly led to clunky button-pushing and interruptions.

Todd went on to emphasize the pros of on-device wake words, including:

Privacy: By processing wake words locally, your data stays private.
Efficiency: Sending only text to the cloud, rather than audio, cuts bandwidth use by a factor of 640.
Power Conservation: On-device processing uses less energy than always-on cloud listening.

And finally, Todd explained, “Wake words create what I call ‘mini marketing moments,’ reinforcing your brand each time they’re used.”

Ember Van Allen, Senior Director of Speech Technology Development at Sensory

“A hotword is your starting point for a positive user experience.”

The ideal wake word experience should be completely speech-based, natural, and easy to use. Ember broke down three essential tips for effective wake word design:

Uniqueness: “Three to four syllables is the sweet spot for ease and accuracy,” she shared. Adding a salutation like “Hey” can enhance performance while creating a more natural and intentional user experience.
Acoustic “Secret Sauce”: Sounds like “Z” or “J” stand out, while softer sounds like “H” can blend into background noise.
Cultural Awareness and Localization: Ember warned about mishaps when people don’t consider global markets, like how “Chat GPT” sounds like “cat, I farted” in French.

Her advice? Always test wake words with your target demographic and in the intended use environment.

Grace Hynes, Customer Services Manager at Sensory

“Testing for optimal FA and FR is important, and directly impacts the user experience.”

Continuing on the theme of creating a positive user experience, Grace presented a deep dive into the technical side of hotword performance, emphasizing the importance of reducing error rates. She highlighted the role of on-device models, which minimize:

False Accepts (FA): Triggering on non-target phrases.
False Rejects (FR): Failing to detect the wake word.

Finding the right balance between FA and FR is key, and Sensory uses an “operating point” to fine-tune sensitivity. For effective testing and fine-tuning, Grace recommended using a large dataset that replicates the actual domain where the product will be used. Of course, there are tradeoffs depending on your model size: smaller models are faster and more power-efficient but might sacrifice accuracy in noisy environments, while larger models are more accurate but may add latency.
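To make the FA/FR tradeoff concrete, here is a minimal sketch of how an operating point could be chosen from test data. It is not Sensory's method: the function names, the scores, and the idea of a single confidence threshold are all assumptions made for illustration. Real evaluations also tend to report false accepts per hour of audio rather than per sample.

```python
from typing import List, Optional, Tuple


def fa_fr_rates(
    wake_scores: List[float], other_scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at one threshold."""
    # A false accept is non-wake-word audio scoring at or above the threshold.
    false_accepts = sum(1 for s in other_scores if s >= threshold)
    # A false reject is a real wake word scoring below the threshold.
    false_rejects = sum(1 for s in wake_scores if s < threshold)
    return false_accepts / len(other_scores), false_rejects / len(wake_scores)


def pick_operating_point(
    wake_scores: List[float],
    other_scores: List[float],
    thresholds: List[float],
    max_fa_rate: float,
) -> Optional[Tuple[float, float, float]]:
    """Pick the threshold with the lowest FR rate whose FA rate fits the budget.

    Returns (threshold, fa_rate, fr_rate), or None if no threshold qualifies.
    """
    best = None
    for t in thresholds:
        fa, fr = fa_fr_rates(wake_scores, other_scores, t)
        if fa <= max_fa_rate and (best is None or fr < best[2]):
            best = (t, fa, fr)
    return best


# Made-up model confidence scores, for illustration only.
wake_word_scores = [0.92, 0.85, 0.78, 0.66, 0.95, 0.71]    # true wake words
other_audio_scores = [0.10, 0.35, 0.62, 0.20, 0.74, 0.05]  # everything else
candidate_thresholds = [0.5, 0.6, 0.7, 0.8, 0.9]

for t in candidate_thresholds:
    fa, fr = fa_fr_rates(wake_word_scores, other_audio_scores, t)
    print(f"threshold={t:.1f}  FA rate={fa:.2f}  FR rate={fr:.2f}")

operating_point = pick_operating_point(
    wake_word_scores, other_audio_scores, candidate_thresholds, max_fa_rate=0.2
)
print("chosen operating point:", operating_point)
```

Raising the threshold trades false accepts for false rejects, which is why the test data needs to resemble the environment where the product will actually be used.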

Grace concluded with one possible solution: “Smaller on-device models are fast and power-efficient, but combining them with larger cloud-based models for revalidation gives you the best of both worlds,” she said.
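The two-stage idea Grace describes might look something like the sketch below. The scoring functions are stand-ins rather than real models, and the names and thresholds are hypothetical; the point is only the control flow, in which the cloud is consulted only after the on-device model has flagged a candidate.

```python
from typing import Callable, Dict, List, Sequence

Audio = Sequence[float]  # placeholder type for a short clip of audio


def cascade_detect(
    audio: Audio,
    on_device_score: Callable[[Audio], float],
    cloud_score: Callable[[Audio], float],
    device_threshold: float = 0.5,
    cloud_threshold: float = 0.8,
) -> bool:
    """Run the cheap local check first, then revalidate candidates in the cloud."""
    if on_device_score(audio) < device_threshold:
        return False  # rejected locally, so nothing is sent to the cloud
    return cloud_score(audio) >= cloud_threshold


cloud_calls: List[str] = []  # records each cloud revalidation, for the demo


def small_model(audio: Audio) -> float:
    """Stand-in for a small on-device model: permissive and cheap."""
    return max(audio)


def large_model(audio: Audio) -> float:
    """Stand-in for a larger cloud model: stricter and more expensive."""
    cloud_calls.append("revalidate")
    return sum(audio) / len(audio)


clips: Dict[str, List[float]] = {
    "background": [0.1, 0.2, 0.1],    # rejected on-device
    "near_miss": [0.9, 0.2, 0.1],     # passes on-device, rejected in the cloud
    "wake_word": [0.9, 0.95, 0.85],   # accepted by both stages
}

results = {
    name: cascade_detect(clip, small_model, large_model)
    for name, clip in clips.items()
}
print(results)
print("cloud revalidations:", len(cloud_calls))
```

In this arrangement the on-device threshold can be set permissively to keep false rejects low, since the second stage is there to filter out the extra false accepts.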

Jeff Rogers, VP of Sales and Marketing at Sensory

“It’s really slick and easy to use.”

Jeff showcased Sensory’s free VoiceHub tool, which allows developers to create custom wake words in just hours. He also demonstrated advanced use cases, such as:

Speaker Verification and Identification: Tailor interactions based on who is speaking.
Customizable Wake Words: Let users define their own activation phrases.
Integration with LLMs: Combine local speech-to-text with cloud-based AI for seamless performance.

One of Jeff’s favorite demos ties a wake word to ChatGPT, showing how wake words enable truly hands-free, natural interactions. “Sensory can even act as an arbitrator, routing multiple wake words to different systems, such as Alexa, Siri, or in-car solutions,” he explained.
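As a rough illustration of the arbitrator idea, the sketch below maps each detected wake word to a destination system. The routing table, the destination labels, and the “Hey Car” phrase are hypothetical and are not taken from Sensory's products.

```python
from typing import Dict, Optional

# Hypothetical routing table: detected wake word -> destination system.
ROUTES: Dict[str, str] = {
    "alexa": "alexa",
    "hey siri": "siri",
    "hey car": "in_car_assistant",  # made-up in-car wake word
}


def route_wake_word(detected_phrase: str) -> Optional[str]:
    """Return the system that should handle this wake word, or None if unknown."""
    return ROUTES.get(detected_phrase.strip().lower())


print(route_wake_word("Hey Siri"))
print(route_wake_word("Hey Car"))
print(route_wake_word("Hello there"))
```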

The Future of Hotwords

“It’s getting better all the time” – The Beatles

Wrapping up the session, Todd shared his vision of wake words evolving alongside LLMs and AI:

Dynamic Thresholding: Algorithms that adjust sensitivity based on noise levels or accents.
Cross-Device Consistency: Assistants that follow users across all of their devices, integrating personal preferences (e.g., shoe size or favorite foods).
Enhanced Context Awareness: Combining voice with other sensors like cameras to create more human-like interactions.
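Dynamic thresholding could take many forms. One simple possibility, sketched below, relaxes the detection threshold as background noise rises so that real wake words are not missed. The function name, the decibel range, and the direction of the adjustment are all assumptions for illustration; a real system might instead tighten the threshold in noise to limit false accepts.

```python
def dynamic_threshold(
    noise_db: float,
    base_threshold: float = 0.70,
    quiet_db: float = 40.0,
    loud_db: float = 80.0,
    max_relief: float = 0.15,
) -> float:
    """Lower the threshold linearly as noise goes from quiet_db to loud_db."""
    # Fraction of the way from "quiet" to "loud", clamped to the range [0, 1].
    fraction = (noise_db - quiet_db) / (loud_db - quiet_db)
    fraction = min(1.0, max(0.0, fraction))
    return base_threshold - max_relief * fraction


for level in (30, 40, 60, 80, 95):
    print(f"noise={level} dB -> threshold={dynamic_threshold(level):.3f}")
```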

“Wake words are only getting smarter,” Todd said. “They’re going to play a central role in making voice assistants more intuitive, action-oriented, and personal.”

The team at Sensory is paving the way for the next generation of voice technology. Whether you’re a developer or a brand looking to enhance your products, tools like VoiceHub make it easier than ever to create custom wake words and voice solutions.

Want to learn more? Get started with Sensory VoiceHub today, or watch the full webinar for more information on how to take your products’ voice interaction to the next level!