Going Deep Series – Part 2 of 3

Going Deep Banner small

 

 

 

How does Big Data and Privacy fit into the whole Deep Learning Puzzle?

Privacy and Big Data have become big concerns in the world of Deep Learning. However, there is an interesting relationship between the Privacy of personal data and information, Big Data, and Deep Learning. That’s because a lot of the Big Data is personal information used as the data source for Deep Learning. That’s right, to make vision, speech and other systems better, many companies invade users’ personal information and the acquired data is used to train their neural networks. So basically, Deep Learning is neural nets learning from your personal data, stats, and usage information. This is why when you sign a EULA (end user license agreement) you typically give up the rights to your data, whether its usage data, voice data, image data, personal demographic info, or other data supplied through the “free” software or service.

Recently, it was brought to consumers’ attention that some TVs and even children’s toys were listening in on consumers, and/or sharing and storing that information to the cloud. A few editors called me to get my input and I explained that there are a few possible reasons for devices to do this kind of “spying” and none of which are the least bit nefarious: The two most common reasons are 1) The speech recognition technology being used needs the voice data to train better models, so it gets sent to the cloud to be stored and used for Deep Learning and/or 2) The speech recognition needs to process the voice data in the cloud because it is unable to do so on the device. (Sensory will change this second point with our upcoming TrulyNatural release!)

The first reason is exactly what I’ve been blogging about when we say Deep Learning. More data is better! The more data that gets collected, the better the Deep Learning can be. The benefits can be applied across all users, and as long as the data is well protected and not released, then it only has beneficial consequences.

Therein lies the challenge: “as long as the data is well protected and not released…” If banks, billion dollar companies and governments can’t protect personal data in the cloud, then who can, and why should people ever assume their data is safe, especially from systems where there is no EULA is place and data is being collected without consent (which happens all the time BTW)?

Having devices listen in on people and share their voice data with the cloud for Deep Learning or speech recognition processing is an invasion of privacy. If we could just keep all of the deep neural net and recognition processing on device, then there would be no need to risk the security of peoples’ personal data by sharing and storing it on the cloud… and its with this philosophy that Sensory pioneered an entirely different, “embedded” approach to deep neural net based speech recognition which we will soon be bringing to market. Sensory actually uses Deep Learning approaches to train our nets with data collected from EULA consenting and often paid subjects. We then take the recognizer built from that research and run it on our OEM customers’ devices and because of that, never have to collect personal data; so, the consumers who buy products from Sensory’s OEM customers can rest assured that Sensory is never putting their personal data at risk!

In my next blog, I’ll address the question about how accurate Sensory can be using deep nets on device without continuing data collection in the cloud. There are actually a lot of advantages for running on device beyond privacy, and it can include not only response time but accuracy as well!