About a year ago, my sister and I sat on her couch, catching up. We talked about my nephew, our weekend plans, hummus recipes. She said something like, “I usually mix cilantro into my chickpeas, but some people think it tastes like soap.”
Across the living room, the top of a black cylinder on the TV stand lit up with a glowing blue band. “Kitties are pretty, and so are their purrs and meows,” the Amazon Echo announced.
We stared at the device for a moment, startled, before hunching over in laughter.
It was just a small reminder of something we knew, but had forgotten: Alexa was always listening.
Smart speakers, like Amazon’s Echo and Google Home, listen constantly to the homes they’ve been welcomed into — and that’s a lot of homes. By November 2017, Amazon had sold around 20 million Echo devices, and Google had sold about 7 million Google Home devices, according to leading investment research company Consumer Intelligence Research Partners, LLC.
The devices are master eavesdroppers, always on the lookout for the one trigger, the “wake word,” that tells them it’s time to act. Once triggered, the speakers begin recording what’s being said. They send this information in real time to a cloud server, which processes it and sends a response back.
The device’s wake word is the crux of many people’s privacy concerns: the worry that everything spoken in the privacy of their homes is being recorded. “OK Google,” “Alexa,” or any other device-specific phrase that tells the speaker to begin recording makes the difference between conversing in private and holding a conversation that is captured and sent hundreds of miles away, where its content will be stored and potentially sold to marketers. Neither Google nor Amazon agreed to comment on this concern. Still, anyone with a smart speaker has likely experienced how messy wake words can be; often, the speaker starts recording everything it hears when you haven’t spoken to it.
“It’s a big problem for present wake-up words,” said Veton Këpuska, an associate professor of computer engineering at Florida Institute of Technology. He’s been working on speech recognition for decades, and his voice rose in high-pitched excitement as he talked about these nuances. He helped me understand how wake-up words work, and why it’s so hard to make them error-free.
According to Këpuska, wake word engineers usually base their software on another machine used to process language: the human brain. They call this genre of software architecture “neural networks” because its programs resemble the mechanics of the brain’s neurons.
When my sister and I sat on the couch, swapping recipes, the Amazon Echo sat on the other side of the room running tests on every word we said. A word would come in — “chickpeas” — and hit the first layer of tests. This layer made a decision about the word, and sent that decision to the next layer. In this case, it might determine that “chickpeas” did not warrant a response. Or, it might think “chickpeas” sounded kind of like “Alexa” and send it on to the second layer. In that case, the second layer has a chance to weed out the word and keep the smart speaker quiet. Every word uttered around a smart speaker triggers this layer-by-layer analysis.
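The layer-by-layer weeding-out described above can be sketched in a few lines of Python. This is only a toy illustration: the three checks, the names `coarse_match`, `length_match`, `exact_match` and `rejected_at`, and the use of spelled-out words are all assumptions made for the example. A real wake word detector works on audio, with layers learned from data rather than written by hand.

```python
from typing import Callable, List, Optional

WAKE_WORD = "alexa"

# Each hypothetical "layer" looks at a word and returns True to pass it
# along to the next layer, or False to weed it out.
Layer = Callable[[str], bool]


def coarse_match(word: str) -> bool:
    """Layer 1 (toy): does the word contain the 'lex' sound at all?"""
    return "lex" in word.lower()


def length_match(word: str) -> bool:
    """Layer 2 (toy): is the word about as long as the wake word?"""
    return abs(len(word) - len(WAKE_WORD)) <= 1


def exact_match(word: str) -> bool:
    """Layer 3 (toy): is it actually the wake word?"""
    return word.lower() == WAKE_WORD


LAYERS: List[Layer] = [coarse_match, length_match, exact_match]


def rejected_at(word: str, layers: List[Layer] = LAYERS) -> Optional[int]:
    """Return the index of the first layer that rejects the word,
    or None if every layer passes it and the speaker should wake."""
    for index, layer in enumerate(layers):
        if not layer(word):
            return index
    return None


for word in ["chickpeas", "perplexed", "Alex", "Alexa"]:
    print(word, "->", rejected_at(word))
```

In this sketch “chickpeas” is dropped by the first layer, while “Alex” survives until the last one, which is the kind of near miss that a real device sometimes gets wrong.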
“They have to include a significant amount of layers in order to distill the complexity,” says Këpuska. The more layers, the deeper the neural network — and the deeper the network, the better. “Today it’s not unusual to have a ten-layer depth,” he says.
Despite so many layers, and so many chances to weed out words that aren’t directed at a smart speaker, the devices still tend to mess up pretty often. By Këpuska’s estimate, conventional speech recognition systems make about one error for every 100 words they process. That would allow for 10 to 30 misheard words within a ten-minute conversation, based on estimates of English speech rates made by linguists at the University of Pennsylvania. And your privacy is compromised with each mistake.
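The arithmetic behind that range is easy to check. The one-error-per-100-words figure is Këpuska’s estimate as quoted above; the speech rates of 100 and 300 words per minute are assumptions chosen here because they reproduce the 10-to-30 range, not numbers taken from the linguists’ estimates.

```python
WORDS_PER_ERROR = 100  # Këpuska's estimate: about one error per 100 words


def expected_errors(words_per_minute: int, minutes: int) -> float:
    """Expected number of misheard words in a conversation."""
    words_spoken = words_per_minute * minutes
    return words_spoken / WORDS_PER_ERROR


# Assumed slow and fast speech rates (hypothetical bounds for illustration).
slow = expected_errors(words_per_minute=100, minutes=10)
fast = expected_errors(words_per_minute=300, minutes=10)
print(f"Between {slow:.0f} and {fast:.0f} misheard words in ten minutes")
```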
Leaders at smart speaker companies have assured customers that their data is safe. Amazon’s Vice President, Rohit Prasad, told Quartz last November that Echo is designed so that nothing is stored on the device itself. It only records for long enough to know whether you’ve said a wake word, and it continues to record only if you have. Google’s device information webpage provides similar reassurances, and tells users how to manually delete information Google has gathered about them.
Given the imperfections in wake word technology, these assurances might be moot. Making wake words work better, though, is a tough problem, according to Këpuska.
The first problem is something even our brains struggle with: a lot of words sound alike. Once I thought my friend asked me “Are you angry?” but I found out later she actually wanted to know whether I was hungry. Similarly for Amazon’s Echo, when someone talks about “flexing,” being “perplexed” or their friend “Alex,” it’s a big task for the machine in the room to distinguish these words from its wake word, Alexa. Engineers choose words intended to reduce these confusions, Këpuska says. They look for something “short enough but also distinct enough to be able to trigger when spoken.”
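To get a rough feel for how close those words sit to the wake word, one can compare their spellings. This is a loose stand-in, assumed here purely for illustration: real systems compare sounds, not letters, and the helper name `spelling_similarity` is made up for this sketch. It uses `SequenceMatcher` from Python’s standard `difflib` module, which scores two strings between 0 (nothing in common) and 1 (identical).

```python
from difflib import SequenceMatcher

WAKE_WORD = "alexa"


def spelling_similarity(word: str, wake_word: str = WAKE_WORD) -> float:
    """Score how much a word's spelling overlaps with the wake word (0 to 1).

    Spelling is only a crude proxy for how similar two words sound.
    """
    return SequenceMatcher(None, word.lower(), wake_word).ratio()


for word in ["Alex", "flexing", "perplexed", "chickpeas"]:
    print(f"{word:>10}: {spelling_similarity(word):.2f}")
```

Even by this crude measure, “Alex” lands much closer to “Alexa” than “chickpeas” does, which hints at why some words are harder for the device to rule out than others.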
The second problem — the one that Këpuska is working to solve — is in distinguishing the meaning behind what someone has said. “There are distinct ways that we humans use a wake-up word,” he said. When we’re talking to the device, and we say “Alexa, play rock and roll,” we’re alerting the device; we want a reply. When we’re talking about the device, and we say “I started using Alexa a year ago,” we’re referring to the device; we don’t want a reply.
Both of these challenges contribute to errors like the one my sister and I experienced in her living room — errors that lead to people being recorded when they don’t want to be.
The implications can be more sinister than the device making an unexpected announcement about kitties. “A lot of people who say ‘I have nothing to hide’ are wrong,” Justin Cappos, a cybersecurity expert and associate professor of computer science at New York University, told me. “Everyone presumably farts.”
Beyond smart home devices documenting bodily functions, Cappos envisions more worrisome possibilities: marketing, for one. “If Amazon can come up with a way to make you order more stuff you don’t need, then that’s what they’re going to do.”
The ability of governments to peer into people’s homes also worries Cappos. “It’s in the [National Security Agency’s] mission to use this information,” he says. Last year, recordings from an Amazon Echo were used as evidence in a murder case. These possibilities extend beyond U.S. agencies, too: Cappos wonders whether this information could help Russians spread information more effectively than they did during the 2016 presidential election, when they used information about people’s interests, based on their Facebook activity, to target users with political content.
The potential uses of information gathered from the most private nooks of everyday life can be alarming. And the weaknesses of wake word technology mean that, as long as you’ve brought a smart speaker into your home, there’s really no way to know when that information is being gathered. So, how do you protect yourself? In Cappos’ opinion, there’s only one fail-proof defense mechanism: “Unplug it.”