Webinar Recap: Advanced Siren Detection – A Technical Deep Dive

Sensory recently hosted a webinar that dove deep into the technology behind advanced siren detection systems and Sensory’s Emergency Vehicle Detection technology. Led by Sensory’s Andi Hagen, Director of Machine Learning, and Jeff Rogers, VP of Sales and Marketing, the webinar offered a technical, yet accessible, look at this technology.

Missed the live webinar? No problem! We’re recapping the key takeaways below. You can also watch the full webinar or download the slides here!

Deep Dive into EVD Systems

Andi kicked things off by highlighting the two pillars of an effective EVD system: technology and data. He stressed the importance of a diverse dataset to account for variations in siren sounds across different regions, environments, and distances. Sensory’s EVD solution is built on a foundation of extensive self-collected and web-scraped data.

Andi then gave an overview of Sensory’s unique two-tiered system: an optional DSP solution and an AP-level solution. The DSP component acts as a pre-filter, significantly reducing the audio workload on the head unit by filtering out approximately 99% of audio when no siren is detected.

The AP-level solution then takes a closer look, employing a two-stage process: a statistical first stage for quickly analyzing audio, followed by a deep net revalidation model for final verification of a siren event. Andi explained that the key to evaluating EVD accuracy lies in two critical metrics:

False Reject Rate (FRR): The rate at which the system misses genuine siren events.
False Alarm Rate (FAR): The rate at which the system incorrectly identifies a siren when there isn’t one.
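
As a rough illustration of how these two metrics might be computed from an evaluation run, here is a minimal sketch. The function name, its inputs, and the example counts are hypothetical and were not shown in the webinar. The counts are made up to line up with the figures quoted in this recap and are not Sensory's test data. FAR is expressed per hour of non-siren audio, which is one common convention.

```python
def evd_metrics(missed_sirens, total_sirens, false_alarms, non_siren_hours):
    """Return (FRR as a fraction, false alarms per hour of non-siren audio)."""
    if total_sirens <= 0 or non_siren_hours <= 0:
        raise ValueError("need at least one siren event and some non-siren audio")
    frr = missed_sirens / total_sirens           # share of real sirens that were missed
    far_per_hour = false_alarms / non_siren_hours  # spurious detections per hour
    return frr, far_per_hour


# Illustrative counts only (hypothetical evaluation set).
frr, far_per_hour = evd_metrics(
    missed_sirens=16, total_sirens=1000, false_alarms=2, non_siren_hours=48.0
)
print(f"FRR = {frr:.1%}, FAR = {far_per_hour * 24:.1f} per 24 h")
```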

The goal is to find the sweet spot on the ROC curve, balancing these two metrics. Sensory is targeting an impressive one false alarm in 24 hours of driving time – which, as Andi pointed out, translates to potentially weeks of real-world driving without a single false alert.
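
To make the operating-point idea concrete, the sketch below picks the lowest detection threshold that still fits a false-alarm budget, then converts that budget into calendar time. Everything here is an assumption for illustration: the function names, the toy scores, and the one hour of driving per day are not from the webinar, and Sensory's actual tuning procedure was not described.

```python
def pick_threshold(siren_scores, noise_scores, noise_hours, max_fa_per_24h=1.0):
    """Pick the lowest threshold whose false-alarm rate fits the budget.

    A lower threshold misses fewer sirens, so the lowest threshold that
    still meets the budget gives the lowest FRR available at that budget.
    Returns (threshold, frr, false_alarms_per_24h).
    """
    candidates = sorted(set(siren_scores) | set(noise_scores))
    candidates.append(float("inf"))  # fallback: never fire, zero false alarms
    for threshold in candidates:
        false_alarms = sum(s >= threshold for s in noise_scores)
        fa_per_24h = false_alarms * 24.0 / noise_hours
        if fa_per_24h <= max_fa_per_24h:
            frr = sum(s < threshold for s in siren_scores) / len(siren_scores)
            return threshold, frr, fa_per_24h


def weeks_between_false_alarms(fa_per_24h_driving, driving_hours_per_day):
    """Average calendar weeks between false alarms for a given driving habit."""
    days = 24.0 / (fa_per_24h_driving * driving_hours_per_day)
    return days / 7.0


# Toy detector scores (hypothetical), with 48 hours of non-siren audio.
siren_scores = [0.9, 0.8, 0.75, 0.4]
noise_scores = [0.1, 0.2, 0.3, 0.6, 0.85]
threshold, frr, fa_per_24h = pick_threshold(siren_scores, noise_scores, noise_hours=48.0)

# Assumed habit: one hour of driving per day.
weeks = weeks_between_false_alarms(fa_per_24h_driving=1.0, driving_hours_per_day=1.0)
print(threshold, frr, fa_per_24h, round(weeks, 1))
```

Under the assumed one hour of driving per day, a budget of one false alarm per 24 hours of driving works out to roughly one false alert every 24 days, or a little over three weeks.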

Andi then presented performance data for Sensory’s models, comparing them to open-source alternatives like YAMNet and PaSST. He emphasized that Sensory’s advantage comes from its laser focus on siren detection and the sheer volume of its self-collected dataset. Andi also reminded attendees that real-world implementation goes beyond raw accuracy, highlighting the importance of low latency, precise timing, and seamless deployment across various automotive chipsets.

The Sensory Advantage

So, what makes Sensory’s EVD models stand out?

Speed: Reacts to sirens within a few hundred milliseconds.
Optimized for footprint & latency: Our tech maintains a sliding acoustic history of 1.5 seconds, ensuring quick processing without sacrificing accuracy.
Platform versatility: Seamlessly integrates with a wide range of DSP and AP-level platforms, supporting various programming languages for easy development.
Accuracy: Our EVD models achieve an incredibly low FRR of just 1.6% on a 1.5-second window, ensuring that real siren events are detected with high reliability.
Real-world robustness: Comprehensive noise training across a vast library of sounds – road noise, engine hum, chatting passengers, and even music – ensures consistent performance in any driving environment.
Cost-Effective Adaptability: Works seamlessly with in-cabin microphones, leveraging existing hardware to reduce cost and simplify integration.
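
For readers wondering what a 1.5-second sliding acoustic history could look like in practice, here is a minimal sketch using a fixed-size buffer that drops the oldest samples as new ones arrive. The class name, the 16 kHz sample rate, and the 10 ms frame size are assumptions for illustration; the webinar did not describe Sensory's internal buffering.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000  # assumed sample rate; not stated in the webinar
HISTORY_SECONDS = 1.5    # window length quoted in the list above


class SlidingHistory:
    """Keeps only the most recent `seconds` of audio samples."""

    def __init__(self, sample_rate_hz=SAMPLE_RATE_HZ, seconds=HISTORY_SECONDS):
        self.capacity = int(sample_rate_hz * seconds)
        # A deque with maxlen discards the oldest samples automatically.
        self._samples = deque(maxlen=self.capacity)

    def push(self, frame):
        """Append a frame of samples, evicting the oldest if the buffer is full."""
        self._samples.extend(frame)

    def is_full(self):
        return len(self._samples) == self.capacity

    def window(self):
        """Return the current history, oldest sample first."""
        return list(self._samples)


history = SlidingHistory()
frame = [0.0] * 160      # one 10 ms frame at 16 kHz (hypothetical frame size)
for _ in range(200):     # feed 2 seconds of audio
    history.push(frame)
print(history.is_full(), len(history.window()))
```

Because the buffer never grows beyond 1.5 seconds of audio, memory use stays constant no matter how long the system runs.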

Q&A

The Q&A session offered valuable insights into key considerations for EVD implementation. Here are some of the highlights:

Q: What is the biggest advantage of having multiple stages in your EVD system?

A: The primary advantage lies in efficiency. The DSP filters out most of the sound, reducing the workload for the AP-level solution. At the AP level, a statistical first stage further reduces the workload for deep net revalidation, which is more computationally costly. Without these stages, the neural network would have to be constantly engaged, leading to higher costs. The statistical first stage filters out obvious non-siren sounds at low cost, allowing us to focus on the tougher cases.
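
The efficiency argument can be sketched as a simple cascade in which the expensive model only runs on windows that the cheap stage lets through. The scoring functions below are toy stand-ins (mean amplitude and sign-change rate), not Sensory's statistical stage or deep net, and all names and thresholds are hypothetical.

```python
def run_cascade(windows, cheap_score, deep_score,
                cheap_threshold=0.5, deep_threshold=0.5):
    """Return (indices of detected windows, number of expensive-model calls)."""
    detections = []
    deep_calls = 0
    for index, window in enumerate(windows):
        if cheap_score(window) < cheap_threshold:
            continue  # obvious non-siren: rejected without running the deep model
        deep_calls += 1
        if deep_score(window) >= deep_threshold:
            detections.append(index)
    return detections, deep_calls


def toy_cheap_score(window):
    """Stand-in for a cheap first stage: mean absolute amplitude."""
    return sum(abs(x) for x in window) / len(window)


def toy_deep_score(window):
    """Stand-in for a costly model: fraction of adjacent samples that change sign."""
    changes = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    return changes / (len(window) - 1)


windows = [
    [0.0, 0.01, -0.01, 0.0],  # near silence: stopped by the cheap stage
    [0.9, -0.8, 0.9, -0.9],   # loud and oscillating: passes both stages
    [0.8, 0.9, 0.7, 0.8],     # loud but not oscillating: rejected by the second stage
]
detections, deep_calls = run_cascade(windows, toy_cheap_score, toy_deep_score)
print(detections, deep_calls)
```

In this toy run the expensive scorer is called for only two of the three windows; in a real deployment where most audio contains no siren, the savings would be correspondingly larger.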

Q: Have you tested the EVD system at negative SNR (Signal-to-Noise Ratio)?

A: Yes, we have tested at negative SNR levels. The false reject rate degrades slightly as the noise becomes stronger. However, our models are trained to handle these scenarios, including cases where the noise is louder than the siren.
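
For context, a negative SNR means the noise carries more power than the siren. A common way to build such test or training material is to scale a noise recording before adding it to a clean siren clip. The sketch below shows that scaling under stated assumptions: the function names and the synthetic signals are hypothetical, and this is not a description of Sensory's data pipeline.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(siren, noise, snr_db):
    """Scale `noise` so the siren-to-noise power ratio equals `snr_db`, then mix.

    Returns (mixture, scaled_noise).
    """
    if len(siren) != len(noise):
        raise ValueError("siren and noise must have the same length")
    p_siren, p_noise = power(siren), power(noise)
    if p_siren == 0 or p_noise == 0:
        raise ValueError("both signals must have non-zero power")
    # Solve 10 * log10(p_siren / (gain**2 * p_noise)) == snr_db for gain.
    gain = math.sqrt(p_siren / (p_noise * 10 ** (snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(siren, scaled_noise)]
    return mixture, scaled_noise


# Synthetic stand-ins: a pure tone for the siren, Gaussian noise for the cabin.
rng = random.Random(0)
siren = [math.sin(2 * math.pi * 700 * t / 16_000) for t in range(16_000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16_000)]

mixture, scaled_noise = mix_at_snr(siren, noise, snr_db=-5.0)
achieved_db = 10 * math.log10(power(siren) / power(scaled_noise))
print(round(achieved_db, 2))
```

At -5 dB, the scaled noise has roughly three times the power of the siren, which is the kind of condition the question refers to.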

Q: Do you always recommend the embedded (DSP) stage?

A: No, it’s not always necessary. The embedded stage is beneficial if you are sensitive to power consumption on the head unit. It reduces MIPS on the head unit by filtering out most non-siren sounds. However, if MIPS isn’t a concern, you can run without the DSP.

Q: What is the nature of the noise that you train the EVD system under?

A: We train under a wide variety of noises, including typical road noise, engine noise, human speech, and music. For music, we account for situations where music is playing from a smartphone or other sources.

Q: What is the goal in terms of accuracy that you are working towards?

A: Our goal is human-like performance. The system should detect a siren whenever a person with excellent hearing can hear it.

Q: What are the advantages of in-cabin microphones?

A: The primary advantage is cost. Modern cars already have internal microphones that can be used for siren detection without adding hardware.

Get Started with Sensory EVD

Sensory’s EVD technology is actively being designed into OEM vehicles today. Want to learn how Sensory’s EVD can give your vehicles a crucial safety advantage? Download our one-page overview, or get in touch with an expert today to discuss your specific needs and requirements.