AI-Enabled Speech Technology Has Yet to Cross Finish Line
(Pictured above: The AI- and ML-enabled speech technology panel at The AI Summit last week in New York City.)
While AI-enabled voice technology has gained traction, it hasn’t crossed the finish line in maturity or adoption. That’s because emulating the thought underlying what we know as human speech is difficult.
Human speech relies on context and nonsequential patterns that are difficult to train machines to understand, which is the core obstacle for artificial intelligence (AI)-enabled voice.
“The way we think, talk and communicate is very nuanced,” said Mark Beccue, a principal analyst at Tractica. Beccue headed the panel “AI- and ML-enabled Speech Technology” at The AI Summit in New York City last week.
Experts emphasized that AI-enabled speech technology needs to be more conversational and less transactional.
“We’re trying to design more fluid conversations,” said Claire Mitchell, director of VaynerSmart at VaynerMedia, an advertising company. “We’re in the early days of single-turn interactions. It’s still a bit clunky.”
Panelists discussed how AI- and machine learning (ML)-enabled voice technology has gathered steam and where it falls short. As panelists noted, the technical challenge with AI-powered voice is that it requires training on large volumes of data that capture the vagaries of human speech, including hundreds of different languages, nonlinear conversation and idiomatic thought.
That’s why AI-enabled speech is challenging but also promises ROI.
“Speech is the most difficult but it also can bring the most value,” said Laura Horvath, director of product marketing at Figure Eight (recently acquired by Appen). “Just handling the dialects within the United States is tough.”
Still, AI-enabled speech technology has grown in just a few years. ComScore predicts that, by 2020, roughly half of all searches will be carried out through voice rather than text. The global speech and voice recognition market is expected to grow at a CAGR of 17.2% from 2019, to reach $26.79 billion by 2025. And eMarketer estimated that nearly 40% of all internet users were using voice-activated devices in 2019, up nearly 10% from the prior year.
There are concurrent technologies driving AI-enabled speech assistants. According to a recent survey by OC&C Strategy Consultants, the voice segment will be driven by a surge in the number of homes using smart speakers. The survey also reveals that voice shopping is poised to reach a whopping $40 billion by 2022.
Experts also talked about a shifting, broadening role for AI-enabled voice technology. Today, AI-enabled speech assistants such as Alexa and Siri are primarily about acquiring information. But in the future, AI-enabled assistants will be more seamlessly integrated into a variety of industries, interactions and business processes, from retail and travel to health care.
One area for progress is in understanding nonlinear conversation and history.
“We’re humans; we talk back and forth,” noted Beccue. “We will refer to something we said 10 minutes ago, and a human can recognize that, but machines can’t.”
He said that two years ago, he tested voice-activated devices to see whether they could track conversational context.
At that time, they lost the thread of conversation. Today, however, assistants can follow a conversation in which a user first asks who the 32nd president was (Franklin Delano Roosevelt), then how many elephants there are in Africa (more than 5,000), then the name of Roosevelt’s wife (Eleanor). Unlike two years ago, the assistants answered correctly and followed the conversational history.
Even more important, AI will ultimately be about supplementing human intelligence and working alongside human workers. AI-enabled voice tools will help even highly trained workers augment their base of knowledge.
Figure Eight’s Horvath envisioned a world where AI-enabled speech technology could eliminate some of the inefficiency in a patient’s experience at a doctor’s office.
Today, a patient has to call, communicate symptoms over the phone, then repeat them on entering the office, sometimes three times: upon admission in a form, then to a nurse, and then to a doctor. With AI-enabled speech technology, symptoms can be captured the first time a patient communicates them. Further, that information could be recorded by a voice assistant, then combined with knowledge bases and other sources to help a doctor reach a diagnosis.
“We could leverage AI to find a treatment the doctor didn’t think of,” Horvath imagined. “AI can make it personal and give better care.”
For partners, a serious challenge over the next few years will be finding and retaining talent for AI-based speech technology. Demand for these skills far outstrips supply. According to some estimates, fewer than 10,000 people have the skills necessary to work on AI problems.