After a month of using Siri, the new voice-controlled “personal assistant” available on the iPhone 4S, I’ve decided it may be time to add voice control to the list of paradigm-shifting ways to interact with a computer—right behind the mouse, keyboard and, more recently, touch gestures. While voice control remains far from perfect, the ease of use and instant results Siri delivers may be just enough to shift people’s habits. It’s certainly changed mine.
Controlling computers using voice commands has been a promised fantasy for years. Though various companies have tried, none has delivered something easy, convenient, or reliable enough to work well for most users. Apple’s Mac OS has had voice commands built in since the mid-1990s, and I recall Windows booths at CompUSA staffed by Dragon Dictation engineers wearing awkward headsets, as OS/2 Warp gathered dust on the shelves.
In fact, most phones have been able to do voice-controlled contact and number dialing since before the arrival of smartphones. Despite widespread availability, voice control never gained traction because the effort required to get it to work right wasn’t worth it for most people. Voice control—from the old Speakable Items in Mac OS to the method of dialing contacts on older cell phones—always required specific phrasing that sounded more like a command than natural speech.
“Dial 5-5-5-5-5-5-1-2-3-4”—enunciating each word and number—is a lot harder to do on a regular basis than to simply say “Call mom.”
How Siri is different
Siri changes things in much the same way the original Mac changed computing for many people. Before the Mac arrived in 1984, most computers required specific text commands to be entered into terminals. The combination of the mouse and the graphical user interface not only forever changed the direction of those who built and designed computers; it also opened up computing to a new batch of users. Similarly, touch-screen devices were available long before the first iPhone arrived in 2007, but it was the iPhone’s hardware and software combo that changed expectations of what a next-generation phone should be like, and opened the door to the iPad three years later. How you connect with technology matters, whether it’s by GUI, touch or voice. And new ways of interacting with technology can pique the interest of people who have avoided it in the past.
This is what makes Siri different from, and better than, earlier voice technology. With Siri, the syntax (that is, the way you phrase an inquiry) doesn’t always have to be exact. For the most part, when you make a request for information, dictate an email or issue a command, the technology behind Siri parses out what is meant and responds accordingly. As noted, most phones understand a “dial” command followed by a string of numbers, but Siri knows exactly what to do when told to “create a reminder for every Thursday morning at 7:08 to take out the trash.”
That doesn’t mean that Siri reads minds. When it listens to a sentence, its response is triggered by certain keywords, or by variations on them that it recognizes. While the artificial intelligence behind Siri is better than previous voice command technology, there are times when specific syntax is still important. “Send a message to my sister telling her to call me later” will result in a text message to my sister that reads: “Call me later.” Impressive, right? But saying “Show me upcoming birthdays” will cause Siri to respond, “Sorry, I don’t understand ‘Show me upcoming birthdays’” (with the option to search the Web). If I rephrase the query as “Show me birthday appointments,” however, the proper information from the Calendar app is retrieved and shown.
In other words, there are still times when you have to adapt to Siri rather than the other way around. (It’s also why the technology is still technically beta.)
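To make the keyword idea concrete, here is a toy sketch in Python. It is not how Siri works internally (Apple has not published that), and the intent names and keyword sets are invented purely for illustration. It only shows why a phrasing that lacks an expected trigger word can fall through to a “Sorry, I don’t understand” response while a near-identical one succeeds.

```python
import re

# Hypothetical intents: every keyword in a set must appear in the request.
INTENTS = {
    "send_message": {"send", "message"},
    "show_calendar": {"show", "appointments"},
    "create_reminder": {"reminder"},
}


def match_intent(utterance):
    """Return the first intent whose keywords all appear, else None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENTS.items():
        if keywords <= words:  # subset test: all keywords were spoken
            return intent
    return None  # the caller would fall back to "Sorry, I don't understand"
```

Under these made-up rules, “Show me birthday appointments” matches the calendar intent, while “Show me upcoming birthdays” matches nothing because the word “appointments” is missing. A real assistant would presumably use far richer language models, but the failure mode looks similar from the user’s side.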
Even though specific syntax is sometimes essential (for instance, if you want to tell Siri to search Google, Bing, Yahoo or Wikipedia specifically), the need for it has been minimized. More importantly, Siri currently recognizes enough commands to lower the barrier to entry and entice users who may have given up on good speech recognition. I know it has enticed me.
Personality goes a long way
But Siri goes beyond answering questions or pulling up results. Siri will respond with questions in some cases to help refine your query, and it’ll walk you through dictating emails and text messages. Even more endearing is that Siri has a bit of personality. For instance, “Open the pod bay doors” is a popular command that numerous iPhone 4S users have tried—and posted about online. (Anyone familiar with the movie 2001: A Space Odyssey will understand the reference.) Siri’s answers vary from “sigh” to a slow, drawn-out imitation of HAL, quickly followed by a sarcastic “Are you happy now?”
During my initial testing of Siri right after I got my iPhone 4S, I wanted to know the exact date for the upcoming Friday, so I asked Siri, and Siri told me. I followed up, “When is Halloween?” Siri responded, “Halloween is on Monday, October 31, 2011. I sure hope I get the day off.” It was enough for me to do a double-take, and I haven’t been able to duplicate that response since.
Pop culture references, smart-aleck remarks, and sometimes unintentionally funny responses create an emotional, visceral connection with the device; you never quite know what answer you’re going to get. It helps humanize the technology further and subtly encourages you to keep asking questions and interacting with Siri. As Samuel L. Jackson said in Pulp Fiction, personality goes a long way. The seeming randomness of Siri will get people to use it, but it’s not just for fun.
In the past month, I’ve used my voice to create a wide variety of reminders, notes, appointments, emails, and text messages. I’ve used my voice to look up word definitions, check traffic, and find the location of my friends. Since I spend a lot of time driving in my car, the ability to do this by simply speaking out loud is a big deal. Ideas that I couldn’t write down in the moment are easily transcribed by tapping my hands-free unit and starting with “Note to self”; things I need to do or items I need to buy are quickly added to my lists (“Siri, add this to my To Do—or To Buy or To Fix—list, thanks”). In the past month, I have used reminders, timers, calendars and messages more than I did during the entire summer I spent working with the dev builds of iOS 5, and I’m convinced that once people are more aware of Siri, they will, too.
Why? Siri is effective and simple. Tapping out letters and words—which requires unlocking the phone, navigating to an app, launching the app, oftentimes hitting a + button to add a new note or reminder, then typing—suddenly seems like a waste of time. With Siri, completing tasks has essentially been reduced to thinking out loud.
Some of the things Siri can do are as fascinating as some of the things it can’t. For instance, I can tell Siri, “Remind me to call Mom when I get home,” thereby creating, in effect, a “geo-fence” around my home location using the built-in GPS. I’ll get an alert upon arrival at the house. But I can’t tell Siri to decrease the brightness of the display or to toggle Bluetooth on or off. (There’s a tip guide built in so you don’t have to guess too often about what Siri is capable of.)
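For readers curious about what a “geo-fence” check amounts to, here is a minimal Python sketch. It is an illustration only, not how iOS implements location-based reminders: the function names, the 100-meter default radius, and the coordinates in the usage note are all assumptions. It computes the great-circle distance between the phone’s position and a saved location, then tests whether that distance falls inside a chosen radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_fence(position, center, radius_m=100.0):
    """True when position (lat, lon) lies within radius_m of center (lat, lon)."""
    return distance_m(*position, *center) <= radius_m
```

With a hypothetical home at (40.0, -75.0), a position of (40.0003, -75.0) is a few dozen meters away and counts as inside the fence, while (40.01, -75.0) is roughly a kilometer away and does not. A reminder like “call Mom when I get home” would fire on the transition from outside to inside.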
More work needed
Although Siri easily crosses the bar set by earlier voice command software, there’s obviously still work to be done; voice interaction and the technology behind it are very much a work in progress. Siri’s voice recognition is handled by Dragon—still at it, after all these years—and the noise-canceling technology built into modern gadgets means clunky headsets are increasingly optional. But any ability to control a device by voice alone is only as good as the ability to transcribe the voice accurately. Siri’s software can still be thrown off by regional accents, slang, and excessive background noise.
More frustrating, Siri requires an active network connection to work, even for tasks local to the phone. As most AT&T subscribers can attest, a connection just isn’t available at all times. Even if you have a connection, Apple’s servers, which process the commands, have to be up and running as well, and they’ve already had brief outages. There’s nothing more annoying than the sudden failure of technology you’ve grown to rely on.
Despite existing shortcomings, the crazy-good part about Siri is that this is just the beginning. How quaint the software that powered the original iPhone now looks, four years in. Imagine how quaint Siri 1.0 will seem four years from now. Like the hardware and software that host it, Siri will only become better with time.
Any technology hoping to gain mass appeal has to be good enough to change the thought process from “Why are you using that?” to “Why aren’t you using that?” In essence, it has to offer a continuing “wow” moment that starts a feedback loop of sorts. You try Siri and find that it generally works well enough to keep trying it. You find that it’s not just useful, but fun, which encourages more experimentation. And that helps the technology behind it “learn” how users are using it, thus allowing engineers to make it better and, in turn, encourage even more use.
Like any good paradigm-shifting technology, Siri removes layers that have, until now, prevented many people from interacting with the wealth of information available at their fingertips. Its arrival marks another turning point in how we integrate technology (and information) into our daily lives.
On a lark, I told Siri, “You’re pretty cool technology, Siri.”
The response: “Am I? I’d like to be.”
Michael deAgonia, a frequent contributor to Computerworld, is a writer, computer consultant and technology geek who has been working on computers since 1993. You can find him on Twitter (@mdeagonia).