Waiting for me under the tree this Christmas was a shiny new SONOS Play 1 Speaker… Great!
I unpacked it, plugged it in and connected it to my Amazon Alexa network. All good so far.
Then I tried to make it play my Spotify “Christmas playlist” (Mariah Carey, Slade, all the usuals). Here is where I hit a problem that perfectly illustrates one of the biggest challenges facing Smart Speakers in 2018. After trying various command combinations I eventually found success with the clunky request: “Alexa, play my Christmas playlist on Spotify on Sonos play in lounge”.
It doesn’t feel right, does it? It feels unnatural, awkward and difficult. Yes, whilst there are things I could have done in the system setup to make the process a little easier (setting Spotify as my default music service, for one), for me it illustrated an issue with the technology that currently presents a huge challenge for developers but also a huge opportunity.
Voice First technology is all about frictionless activation. No buttons, no screens: you simply ask and you get. For that to be truly effective, the voice communication between speaker and user needs to be frictionless too. Smart Speakers must understand human communication, process different commands and have a level of intuition that means a user, even one completely unfamiliar with the system, can get the desired action, whatever they say.
In other words; Smart Speakers need to get Smarter.
Right now, interaction with these devices feels like you are communicating with a machine, and sometimes a machine with limited capabilities. Alexa in particular requires certain commands said in certain orders to complete a task, and whilst Google (with years of search-engine experience behind it) shows a greater understanding of the way in which humans request things, it is still far from perfect.
Artificial Intelligence is already adapting fast and learning more about the way in which we communicate, but anyone who has ever tried to learn a foreign language will understand exactly the challenge that lies ahead for Smart Speaker developers. It’s not just a linear conversation between man and machine; conversation and requests can branch in any direction, and the devices need to understand and follow those diversions. Human communication is full of nuances that give important clues to an individual’s intentions and meaning (we’ve all heard the phrase “it’s not WHAT you said, it’s the WAY you said it”). AI is already developing the ability to pick up and interpret THESE elements, and once it can, and this technology is integrated into Alexa and her siblings, Voice First technology will truly flourish.
It’s not just user experience that will be affected, either. A Voice First device that can understand hidden meaning and emotion could provide incredible insight to brands, as Social Chain’s Oliver Yonchev pointed out in a recent post, “The Brave New World of Voice Recognition”:
“If AI can learn to pick up on these nuances as sophisticatedly as human beings do, the insight-driven algorithms which are currently cutting-edge in personalisation will be dwarfed in comparison to this new level of intelligence. Combine this with the promise of distribution through search and commerce hardware and social-first software, and voice recognition technology looks set to change the digital landscape as we know it.”
Research from Yale University last year found audio to be THE most powerful tool when building emotional connections, so just imagine how effective a marketing tool Smart Speakers could be when they not only fully understand human communication but can also detect mood and deliver bespoke commercial messages and product suggestions based on that understanding.
Right now, these devices offer brands a new, clever and effective way to connect and communicate with their customers. In the future, when their understanding of human communication has improved, they could be the most powerful marketing weapon in the armoury.