Do Voice Agents, Voice Assistants and LLMs Need Wakewords?

Large language models (LLMs) have become an integral part of our digital landscape, powering everything from action-oriented Voice Agents to automotive Voice Assistants. Their ability to process natural language and generate human-like responses has made them indispensable. However, as these models become more advanced and ubiquitous, the need for responsible and controlled interaction has grown. One critical feature that should be implemented in conversational AI systems is the use of a wakeword—a specific phrase or word that must be spoken or typed before the model begins processing user input.

When OpenAI rolled out GPT-4o, they made an impressive video (shortened version below) showing off its powerful new features. The presenters called it by name, saying “Hey ChatGPT,” but this was a fakeword, not a wakeword: they were using button presses to turn listening modes on and off to prevent frequent interruptions. OpenAI is a smart company, but I’m surprised they still haven’t introduced a hotword-based approach. It brings a host of advantages and features that many might not appreciate, and by its nature it can’t effectively be built into the LLM itself!

Let’s take a look at these advantages and some of the hidden features that Sensory offers:

General Wake Word Advantages

  • Enhancing Privacy and Security. One of the biggest concerns with LLMs is privacy. Without a hotword, these models might be continuously listening or processing input, raising significant security and data-privacy issues. A wakeword ensures the model activates only when explicitly prompted, reducing the risk of unintended data collection and surveillance.
  • Preventing Unintentional Activations. Without a wakeword, users may accidentally trigger the model when conversing with others. We saw this happen in the OpenAI video when they forgot to turn it off and ChatGPT interrupted them.
  • Improving User Control and Transparency. A wakeword enhances user control by providing a deliberate mechanism to start an AI interaction. This helps users understand when the model is active and when it is not, improving transparency in digital conversations.
  • Managing Computational Resources Efficiently. Running an LLM continuously without a trigger mechanism is resource-intensive, consuming unnecessary computing power and increasing costs. Sensory’s wake words can run at ultra-low power (under 1 mA!), with minimal memory (under 100KB), yet maintain extremely high accuracy.
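The compute-saving point above can be made concrete with a minimal sketch. This is a hypothetical illustration, not Sensory’s or OpenAI’s actual API: `detect_wakeword` stands in for a tiny always-on detector, and `send_to_llm` stands in for the expensive model call that only ever sees post-wakeword audio.

```python
# Hypothetical sketch: a tiny always-on detector gates audio before it
# reaches the large model. All names are illustrative placeholders.

def detect_wakeword(frame: bytes) -> bool:
    """Stand-in for a low-footprint on-device wakeword detector."""
    return frame == b"hey-assistant"  # toy condition for this sketch

def send_to_llm(frames: list) -> str:
    """Stand-in for the expensive cloud/LLM call."""
    return f"processed {len(frames)} frames"

def run_session(audio_frames):
    """Forward only the audio captured *after* the wakeword fires."""
    buffered = []
    awake = False
    for frame in audio_frames:
        if not awake:
            awake = detect_wakeword(frame)  # cheap check runs continuously
        else:
            buffered.append(frame)  # post-wakeword audio goes to the LLM
    return send_to_llm(buffered) if buffered else None
```

The expensive call happens only when the cheap detector has fired, which is the whole efficiency argument in miniature.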

Sensory’s Hidden Features

  • Sometimes on, sometimes off. Wakewords don’t have to be always on. For example, after responding to a query, a voice assistant can listen for follow-on questions without a wakeword for a preset time window. Conversely, a wearable might keep the wakeword active only for a short time AFTER an action, improving ease of use without button presses while conserving battery.
  • Variable Sensitivity. Wakewords can be made more or less responsive, trading off false activations against false rejections depending on the product, the user, or even the environment.
  • Constrained listening windows. Certain events can open a short listening window for a small wakeword set. For example, when a phone call comes in, the device can listen for a variety of terms that mean “answer the call” or “reject the call.”
  • Multi-hotword solutions. Sensory has many customers who want to combine their own branded wakeword with Alexa or Siri! Sensory can do this!
  • Fast, easy, and free development! The VoiceHub tool lets users type in wake phrases; select a language, model size, and platform; and build a hotword or set of activation words using synthetic data. A build usually takes a few hours of processing, yet yields good results quickly. Custom models with custom data collections and tuning can also be built as a Sensory service offering.
  • Accuracy, accuracy, accuracy! A poorly performing wakeword is as bad as no wakeword at all. Many of Sensory’s customers have tried open-source and low-cost solutions and found that the implementation cost and quality just weren’t acceptable.
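The first three features above can be sketched in one small piece of logic: a detection score compared against a tunable threshold (variable sensitivity), where a successful detection opens a timed follow-on window during which no wakeword is required. This is a hypothetical illustration only; `WakewordGate` and its parameters are not part of Sensory’s actual SDK.

```python
# Hypothetical sketch of the listening-window logic described above.
# All names are illustrative, not a real product API.

class WakewordGate:
    def __init__(self, threshold: float = 0.7, follow_on_secs: float = 8.0):
        self.threshold = threshold          # higher = fewer false accepts
        self.follow_on_secs = follow_on_secs  # follow-on window length
        self._window_open_until = 0.0

    def on_score(self, score: float, now: float) -> bool:
        """Return True if speech should be forwarded to the assistant."""
        if now < self._window_open_until:
            return True  # inside follow-on window: no wakeword needed
        if score >= self.threshold:
            # Detection fired: open a timed window for follow-up questions.
            self._window_open_until = now + self.follow_on_secs
            return True
        return False
```

Tuning `threshold` per product or environment gives variable sensitivity, and setting `follow_on_secs` to a very short value gives the battery-conserving wearable behavior, all without touching the LLM itself.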

Conclusion

As LLMs continue to evolve, implementing a wakeword is a simple yet powerful way to enhance privacy, security, user control, and efficiency. It establishes a clear boundary between when the AI is active and when it is not, fostering responsible AI interactions. Whether for enterprise applications or personal AI assistants, the use of a wakeword should be considered a fundamental best practice for deploying conversational AI safely and effectively.