Sensory’s recent webinar offered a technical deep dive into their TrulySecure Sound & Voice SDK, including on-device voice biometrics, sound recognition, and speech-to-text technology. Watch the highlights here!
Meet the Speakers
- Andi Hagen: Director of Machine Learning at Sensory, with a PhD in computer science and over two decades of experience in speech and sound recognition.
- Jeff Rogers: VP of Sales and Marketing, a 29-year industry veteran whose work has shaped hundreds of products for global brands including Apple, Amazon, Google, and Samsung.
- Christian Kouten: Research Scientist specializing in biometric and speaker verification, with a PhD in software engineering and a background in computer vision and natural language processing.
The Growing Demand for Voice Biometrics
Jeff Rogers opened the session by reflecting on Sensory’s legacy, noting, “Back when we first started Sensory in ’94, I think it was the very next year, ’95 or maybe it was ’96, we actually developed a voice password chip… We’ve been doing biometrics for a lot of years, obviously.” He emphasized the recent surge in demand: “More recently, we’ve seen a huge growth in interest in voice biometrics, as well as our face biometric and computer vision.”
The need for secure, user-specific access is more pressing than ever as devices proliferate and AI assistants become commonplace. Jeff highlighted, “There’s going to be a need for a local biometric that’s running on your own personal device that says, ‘Okay, yeah, this is Jeff, and this is me asking for these types of things,’ rather than having others be able to hack in and do things that you might not want them to do.”
Sensory’s Market Reach and Industry Impact
Sensory’s technology is trusted by over 200 companies and embedded in more than 3 billion products worldwide, a testament to its reliability and scalability. Its solutions are widely used in the automotive, medical, and consumer electronics sectors, with a focus on privacy and always-available functionality.
Why On-Device AI Matters
A central theme was the importance of on-device processing. Jeff explained, “It needs to be always on, always available. I can’t have to rely on a network connection, or internet, or any other thing… running completely on-device is really important.” He added, “Complete data privacy and control, that’s also important. When I enroll my voice password, it stays on-device. It’s my voice and my password.”
On-device solutions also reduce operational costs by eliminating the need for constant cloud connectivity: “As far as cost of ownership, it’s super low, because I don’t have to worry about cloud connectivity. Every time you go to the cloud, there’s a cost to doing that. And the more users you have and the more products you have, those costs can grow very quickly.”
The TSSV SDK: Technical Overview
Andi Hagen introduced Sensory’s TrulySecure Sound & Voice (TSSV) SDK, which encompasses four main domains. Three of them are summarized here:
- Sound ID: Detection of 16+ sound classes across health (cough, sneeze, snore), safety (glass break, gunshot, fire alarms, sirens), and home (baby crying, dog barking, cat meowing) domains. The system can be extended to new sound classes as needed.
- Speech-to-Text: A newly launched, highly accurate, low-latency speech recognition system supporting 36 languages, customizable vocabularies, and both batch and streaming modes. Acoustic models range from 5 MB to 140 MB, so a model can be matched to the capabilities of the target device.
- Voice Biometrics: Both text-independent and text-dependent speaker verification, with liveness detection to prevent spoofing. Security thresholds are adjustable to fit different application needs.
The SDK is cross-platform (Android, iOS, macOS, Windows, Linux) and supports C++, Python, Java, Swift, C#, and Objective-C, making it easy for developers to integrate.
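To make the model-size range above concrete, here is a minimal sketch of how an integrator might pick a speech-to-text model for a given memory budget. It is not Sensory's API: the model names, the intermediate 40 MB size, and the `pick_model` helper are all assumptions for illustration. Only the 5 MB and 140 MB endpoints come from the webinar.

```python
from typing import Optional

# Hypothetical catalogue. Only the 5 MB and 140 MB endpoints were mentioned in
# the webinar; the names and the intermediate size are made up for illustration.
MODEL_SIZES_MB = {
    "stt-small": 5,
    "stt-medium": 40,
    "stt-large": 140,
}


def pick_model(budget_mb: float) -> Optional[str]:
    """Return the largest model that fits within budget_mb, or None if none fit."""
    fitting = {name: size for name, size in MODEL_SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    # Pick the entry with the largest size among those that fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (3, 16, 64, 512):
        print(budget, "MB budget ->", pick_model(budget))
```

In practice the budget would also have to account for runtime memory and latency targets, which this sketch ignores.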
Notable Features and Statistics
- 3+ Billion Products: Sensory’s technology is embedded in over 3 billion products worldwide.
- 16+ Sound Classes: The Sound ID feature covers a broad range of health, safety, and home sounds, and can be extended to new classes with additional data and training.
- 36 Languages Supported: The speech-to-text engine is multilingual and supports custom vocabulary injection for domain-specific accuracy.
- Flexible Integration: The SDK runs on everything from deeply embedded DSPs and microcontrollers (Arm Cortex-M0, M4, M7) to CPUs, GPUs, and MPUs, ensuring broad hardware compatibility.
Security and Customization
Sensory’s voice biometrics offer both text-dependent and text-independent verification, as well as liveness detection. Andi explained how the liveness check works: “You basically get displayed a sequence of digits, and then you need to say those digits. This prevents… an attacker, for example, playing a pre-recorded sample of somebody’s voice. You need to say exactly that sequence that’s displayed, and therefore you have this liveness check.”
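The digit-prompt idea Andi describes can be sketched in a few lines. This is an illustrative outline, not Sensory's implementation: `make_challenge`, `normalize_transcript`, and `passes_liveness` are hypothetical helpers, and the transcript is assumed to come from whatever speech recognizer the application uses. The sketch covers only the prompt-matching step. Verifying that the voice belongs to the enrolled speaker is a separate check.

```python
import re
import secrets

# Spoken forms mapped to digits ("oh" is a common spoken form of zero).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def make_challenge(length: int = 6) -> str:
    """Generate a fresh random digit sequence to display to the user."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def normalize_transcript(transcript: str) -> str:
    """Reduce a transcript such as 'four two 7' to the digit string '427'.

    Words that are not digits are ignored, which is a lenient choice made
    for this sketch; a stricter policy could reject them outright.
    """
    tokens = re.findall(r"[a-z]+|[0-9]", transcript.lower())
    digits = []
    for token in tokens:
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
    return "".join(digits)


def passes_liveness(challenge: str, transcript: str) -> bool:
    """True only if the spoken digits match the displayed challenge exactly."""
    return normalize_transcript(transcript) == challenge


if __name__ == "__main__":
    challenge = make_challenge()
    print("Please say:", " ".join(challenge))
```

Because the challenge is generated anew for each attempt, a recording of an earlier session would almost certainly contain the wrong digits and fail the match.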
Security thresholds can be tuned for different use cases, from consumer convenience to high-security environments.
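As a rough illustration of what tuning a threshold means, the sketch below assumes the verifier produces a similarity score where higher means "more likely the enrolled speaker" and accepts when the score reaches the threshold. The scores are toy values invented for this example, not measurements from any Sensory system, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def accept(score: float, threshold: float) -> bool:
    """Accept the speaker when the score reaches the threshold."""
    return score >= threshold


def error_rates(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at the given threshold."""
    false_rejects = sum(1 for s in genuine if not accept(s, threshold))
    false_accepts = sum(1 for s in impostor if accept(s, threshold))
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Toy scores, invented purely for illustration.
GENUINE = [0.91, 0.84, 0.77, 0.69, 0.95]
IMPOSTOR = [0.12, 0.35, 0.58, 0.72, 0.20]

if __name__ == "__main__":
    for threshold in (0.5, 0.8):
        far, frr = error_rates(GENUINE, IMPOSTOR, threshold)
        print(f"threshold={threshold}: FAR={far:.2f}, FRR={frr:.2f}")
```

With these toy values, the lower threshold rejects no genuine attempts but lets some impostor attempts through, while the higher threshold does the opposite. That trade-off is what separates a convenience-oriented setting from a high-security one.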
Looking Ahead
The webinar concluded with a Q&A session, where Sensory’s experts addressed questions about integration, customization, and real-world applications. Their message was clear: Sensory’s on-device AI solutions are designed to deliver privacy, reliability, and flexibility at scale.
In summary: Sensory continues to set the standard for secure, on-device voice and vision AI, empowering businesses to deliver smarter, safer, and more personalized user experiences without compromising privacy or performance.
Watch the full webinar recording here.