A week after a report in The Guardian revealed that humans in Apple’s Siri “grading” program were hearing private and illegal activity, Apple has suspended the program to conduct a review. It’s also working on a software update to give users the ability to opt out (or maybe opt in).
Apple issued a simple statement: “We are committed to delivering a great Siri experience while protecting user privacy. While we conduct a thorough review, we are suspending Siri grading globally. Additionally, as part of a future software update, users will have the ability to choose to participate in grading.”
That’s the right thing to do, but it makes me wonder what the path forward is supposed to be. Because, while most people don’t realize it, machine learning (ML) and AI are built on a foundation of human “grading” and there’s no good alternative in sight. And with Siri frequently criticized for being a year or two behind its rivals, it’s not going to be easy for Apple to catch up while protecting our privacy.
Everybody does it
What’s this Siri grading program all about? Basically, every time you say “Hey Siri...” the command you utter gets processed on your device but also semi-anonymized and sent up to the cloud. Some small percentage of these are used to help train the neural network that allows Siri (and Apple’s Dictation feature) to accurately understand what you’re saying. Somebody, somewhere in the world, is listening to some of the “Hey Siri” commands and making a note of whether Siri understood the person correctly or not.
Then the machine-learning network is adjusted, and re-adjusted, and re-adjusted, through millions of permutations. The changes are automatically tested against these “graded” samples until a new ML algorithm produces more accurate results. Then that neural network becomes the new baseline, and the process repeats.
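To make that loop concrete, here is a minimal sketch in Python. It is a toy, not Apple's pipeline: the "model" is a single confidence threshold rather than a neural network, the adjustments are random nudges rather than real training, and the graded samples, scores, and function names are all invented for illustration. The shape of the process is the same, though: adjust, test against the human-graded samples, and promote a candidate to the new baseline only when it scores better.

```python
import random

# Hypothetical graded samples: (model confidence score, human grade).
# The grade is True when a human reviewer judged that the assistant
# should have responded. All values are invented for illustration.
GRADED_SAMPLES = [
    (0.91, True), (0.85, True), (0.78, True), (0.66, True),
    (0.58, False), (0.47, False), (0.32, False), (0.15, False),
]

def accuracy(threshold, samples):
    """Fraction of samples where the model's decision matches the human grade."""
    correct = sum((score >= threshold) == grade for score, grade in samples)
    return correct / len(samples)

def train(samples, rounds=1000, seed=0):
    """Nudge the model at random; keep a change only if it grades better."""
    rng = random.Random(seed)
    baseline = 0.95  # deliberately poor starting model
    best = accuracy(baseline, samples)
    for _ in range(rounds):
        # Propose a small random adjustment, clamped to the range [0, 1].
        candidate = min(1.0, max(0.0, baseline + rng.uniform(-0.2, 0.2)))
        score = accuracy(candidate, samples)
        if score > best:
            # The better candidate becomes the new baseline; repeat.
            baseline, best = candidate, score
    return baseline, best

threshold, score = train(GRADED_SAMPLES)
print(f"threshold={threshold:.2f} accuracy={score:.2f}")
```

Notice that `accuracy` is meaningless without the human grades. Take them away and the loop has nothing to measure itself against.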
There’s just no way to train ML algorithms (for speech recognition, photo recognition, or determining whether your security camera saw a person or a car) without a human training them in this way. If there were a computer algorithm that could always accurately determine whether the AI was right or wrong, it would be the AI algorithm!
Apple, Google, Amazon, Microsoft, and anyone else producing AI assistants using machine-learning algorithms to recognize speech or detect objects in photos or video or almost anything else are doing this. They’re listening in on your assistant queries, they’re looking at your photos, they’re watching your security cameras.
(In fact, Google has also just suspended reviews of its language recordings after a German investigation revealed that contractors leaked confidential info to the press. Oops.)
You can certainly train ML algorithms using a bunch of commercially purchased and licensed photos, videos, and voice samples. Many companies do, but that will only get you so far. To really make your AI reliable, it needs the same quality of photos, videos, and recordings that are taken on your company’s devices. It needs messy, accent-ridden speech from six feet away on your phone’s mic with wind noise and a lawn mower in the background.
Human training of AI isn’t some rare event; it’s common practice. Tesla’s self-driving capabilities are being built with human beings training a neural network by looking at the camera data from its customers’ cars and marking signs, lanes, other cars, bikes, pedestrians, and so on. You just can’t train a high-quality machine learning algorithm without humans reviewing the data.
Anonymous, but not entirely
Because it’s simply not possible to train a high-quality AI algorithm meant to be used by millions of people without human review, most companies at least attempt to make it semi-anonymous. Before any human hears a recording, it is stripped of any data that could be used to identify a precise user. At least, that’s what the companies tell us they do.
But a certain amount of data beyond the actual voice recording or photo/video is usually needed, so it can’t be completely anonymous.
For example, if I say, “Hey Siri, what time does the UPS Store on Greenback Lane close?” and Siri thinks I said “What time does the UPS Store on Glenn Brook Lane close?” I’m going to get a bad result. There is no Glenn Brook Lane near me, and certainly no UPS Store there. But there’s no way for an automated system to know that its transcription was wrong, because that’s certainly a thing a person could say.
So a human being has to review these things, and they need to know roughly where I was when I made the request. These human “graders” aren’t going to know that Glenn Brook Lane is wrong without enough location data to know that there’s no Glenn Brook Lane near me, right?
Similarly, a person reviewing Ring video footage to differentiate moving cars from people may need to know if they’re looking at footage from an outdoor camera (which sees lots of cars) or an indoor camera (which should only see cars through windows).
Full disclosure is key
It’s hard to know exactly how consumers would react to the way their data can be used to train AI algorithms if they knew exactly how it works and exactly what was being done to protect their privacy. I have a feeling that most would be okay with it (if people were all that concerned about personal info and privacy, Facebook wouldn’t be used by 1.2 billion people).
But they don’t know, and none of the companies involved seem interested in explaining it. Short statements to the tech press are not the same thing as informing your hundreds of millions of users. Hiding permissive statements 4,000 words deep into your dense Terms of Service agreement doesn’t count. This lack of disclosure is a key failure.
One of the biggest problems is the fact that virtual assistants often record things they’re not supposed to. Siri, Alexa, and the Google Assistant are basically always recording. They listen to a few seconds at a time in a constantly looping on-device buffer, sending no information anywhere until they hear the wake-up phrase: Hey Siri, Alexa, or OK Google / Hey Google. Only after that do they activate the network connection and send your data to the cloud.
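The looping buffer is easier to picture in code. The sketch below is a rough illustration and not any vendor's actual implementation: short text strings stand in for audio frames, the wake-word detector is a crude substring match, and the class name and `uploaded` list (which stands in for the network connection) are hypothetical.

```python
from collections import deque

WAKE_PHRASES = ("hey siri", "alexa", "ok google", "hey google")

class WakeWordListener:
    """Holds only the last few frames on-device until a wake phrase is heard."""

    def __init__(self, buffer_frames=3):
        # A deque with maxlen discards its oldest frame once it is full,
        # which gives the constantly looping buffer.
        self.buffer = deque(maxlen=buffer_frames)
        self.awake = False
        self.uploaded = []  # stand-in for data sent to the cloud

    def hear(self, frame):
        if self.awake:
            # Only after the wake phrase does anything leave the device.
            self.uploaded.append(frame)
            return
        self.buffer.append(frame)
        recent = " ".join(self.buffer).lower()
        if any(phrase in recent for phrase in WAKE_PHRASES):
            self.awake = True
            self.buffer.clear()

listener = WakeWordListener()
for frame in ["private chat", "more private talk", "hey", "siri", "what time is it"]:
    listener.hear(frame)
print(listener.uploaded)  # ['what time is it']
```

The substring match is deliberately sloppy: a frame like "alexander called" would wake this listener too, and everything said afterward would be uploaded.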
As we all know, sometimes these wake phrases don’t work, and sometimes they are triggered even when nobody said them. Those false triggers are what end up causing the human “graders” to hear snippets of private conversations, drug deals, sexual activity, and so on.
Again, there’s no simple solution. These assistants aren’t going to get perfect at hearing their wake-up phrases unless human beings actually tell them when they got it wrong.
Doing the work ourselves
That doesn’t necessarily mean that we have to pass our data along to others. We could do the training and grading ourselves. Apple could change the iPhone so that every single time Siri is summoned, we are presented with simple “correct” or “incorrect” buttons. If the user marks one incorrect, perhaps they could offer more info—the correct phrase, or the way that the answer they were given was not what was expected.
Smart speakers could be given keyphrases that allow us to do the same thing with our voice, perhaps using a linked phone to make corrections.
Then the adjusted algorithm—but none of our personal data—could be sent back to the parent company to be combined with everyone else’s and incorporated into the next software release. Some companies already use this method for certain kinds of ML algorithms, like smart predictive text in keyboards (where, by its very nature, we all correct mistakes).
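This approach is commonly called federated learning. Here is a minimal sketch of the idea, assuming a toy linear model (y = w*x + b) in place of a real speech model. The function names, learning rate, and data are all made up for illustration. Each device trains on its own corrections and reports back only the adjusted weights and an example count. The server then averages those weights.

```python
def local_update(weights, examples, lr=0.1):
    """On-device training: one pass of gradient steps over private examples."""
    w, b = weights
    for x, y in examples:
        err = (w * x + b) - y  # prediction error on this example
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def federated_average(updates):
    """Server side: average the reported weights, weighted by example count."""
    total = sum(count for _, count in updates)
    w = sum(weights[0] * count for weights, count in updates) / total
    b = sum(weights[1] * count for weights, count in updates) / total
    return (w, b)

# Hypothetical round: two devices start from the same shared model.
shared = (0.0, 0.0)
device_data = [
    [(1.0, 2.0), (2.0, 4.0)],  # stays on device A
    [(3.0, 6.0)],              # stays on device B
]
updates = [(local_update(shared, data), len(data)) for data in device_data]
shared = federated_average(updates)  # only weights and counts were sent
print(shared)
```

The server never sees the raw `(x, y)` examples in `device_data`, only each device's adjusted weights and how many examples went into them.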
The vast majority of users would never bother to grade and correct their virtual assistant, of course. The whole point of them is to avoid this tedium, and who wants to review every mis-diagnosed movement trigger on their smart security camera or mis-labeled photo in an AI-powered photo album? That’s work. That’s the opposite of what AI is for.
But with a big enough audience, and Apple can certainly lay claim to that with over a billion devices in use, even a tiny fractional percent of active users training their devices would be a huge sample to draw from. It might even be enough to make Siri an exceptional AI assistant, which it currently definitely is not.
Would a company like Apple be willing to go that extra mile? To tarnish its slick design and “it just works” appearances with an easily accessible interface that, by its very existence, implies something doesn’t work often enough? Probably not. Apple will likely quickly complete its review of its grading program and reinstate it with a toggle switch in the privacy settings to opt out. It’s the simple thing to do, but it’s a missed opportunity to turn at least a small portion of hundreds of millions of Siri users into active Siri improvers.