We’re getting used to “conversational AI” even if we rarely name it. We encounter it when we refill a prescription through a voice recognition system at our pharmacy, or when we engage with a chat service that pops up on a website we’re using, an experience that’s becoming increasingly common as chatbot technology grows in availability and decreases in price. Many of us even have devices that rely on conversational AI in our homes, like “smart” speakers from Google and Amazon. Conversational AI encompasses “messaging apps, speech-based assistants, and chatbots” that automate communication, either to direct us to the right human to help us accomplish something, or to complete a task or solve a problem without involving a human employee at all. The technology can save companies time, effort, and money, as it reduces the need for human employees and keeps existing employees from spending time on the simplest and most repetitive tasks. The main technical difference between voice-based and text-based conversational AI is that voice-based technologies add a speech recognition step to convert our spoken words into text, whereas text-based technologies work directly on what we type. Both then rely on natural language processing to interpret what we mean.
Just as we’ve all encountered conversational AI, most of us have also been frustrated by it at some point. The more complex our request, the less likely conversational AI is to handle it properly. A chatbot might easily refer you to the proper human employee if your request is, “How can I buy your product?” But if your needs are more nuanced, for instance, “How can I view my account balance?” or “I’d like to add another item to my in-progress order,” conversational AI may be stumped.
Why? Because most models for conversational AI are not able to contextualize what we type or say. This is because they have only limited associations with each word that they have “learned,” whereas humans have almost unlimited associations, both conscious and unconscious, with every word we hear and see.
Consider a chatbot designed to take food orders at a pizza restaurant. If a human being handles your order, they are able to apply extensive context to what you say or type. If you use the phrase, “A large pizza,” they might ask, “Do you mean a 14-inch or a 16-inch pizza?”, since either could be construed as “large.” If you ask, “Can I make the Veggie Lovers’ pizza spicy?”, a human would ask whether you wanted to use a spicy tomato sauce as the base, add a side of hot sauce, add spicy sausage, or add pickled jalapenos to the toppings. All of this nuance, which seems quite straightforward to us, is beyond a basic piece of conversational AI, because unlike a human being, artificial intelligence only knows what it has been specifically taught. It cannot pick up context cues or draw on experience as people do. Unless its vocabulary has been tagged descriptively, the chatbot may only know that the “Spicy Buffalo Chicken Pizza” is spicy. It may have no idea that human beings also consider sausage or jalapenos “spicy.”
Despite its limitations, conversational AI is being used more and more widely. Not only can it save time and money, it is also particularly helpful during the current pandemic, because it reduces the number of in-person interactions we need to complete a task or solve a problem. So how can this useful technology be improved and made more effective?
The short answer is: through better, more customized taxonomy, or labeling and classification. Sticking with the example of the pizza restaurant, creating an extensive custom taxonomy, or system of labels and classifications, for all of the words that might be used to order pizzas could help solve the problem of lack of context. In an existing taxonomy, ingredients might only be tagged as “crust,” “sauce,” “topping,” and with one or two other popular classifications like “vegetarian” and “gluten free.”
But an expanded, targeted taxonomy would account for more customer inquiries and variations in requests. One topping, for instance “sausage,” might be tagged as a “topping,” but it might also be classified as “non-vegan,” “non-vegetarian,” “non-Kosher,” “pork,” “beef,” “sugar free,” “gluten free,” “spicy,” and “meat.” “Gluten-free crust” might be tagged as “crust,” but it might also be classified as “vegan,” “vegetarian,” “low-glycemic,” “sugar free,” and “diet.” With this type of extensive taxonomy in place, the pizza restaurant’s chatbot could answer more complex inquiries like, “Is the sausage pizza gluten-free?”, “Can I order a vegan mushroom pizza?”, and “Is your vegetarian pizza Kosher?”
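To make the idea concrete, here is a minimal sketch of what such a taxonomy could look like as data. Everything in it is an assumption made for illustration: the ingredient names, the tags assigned to them, and the helper functions `ingredients_with_tag` and `every_ingredient_has` are hypothetical, not taken from any real menu or chatbot product.

```python
# Hypothetical taxonomy: each ingredient maps to a set of descriptive tags.
# The tag assignments are illustrative assumptions, not real menu data.
TAXONOMY = {
    "sausage": {"topping", "meat", "pork", "spicy", "gluten free"},
    "jalapeno": {"topping", "spicy", "vegan", "vegetarian", "gluten free"},
    "mushroom": {"topping", "vegan", "vegetarian", "gluten free"},
    "tomato sauce": {"sauce", "vegan", "vegetarian", "gluten free"},
    "gluten-free crust": {"crust", "vegan", "vegetarian", "gluten free"},
    "regular crust": {"crust", "vegan", "vegetarian"},
}


def ingredients_with_tag(tag):
    """Return every ingredient carrying the given tag, in alphabetical order."""
    return sorted(name for name, tags in TAXONOMY.items() if tag in tags)


def every_ingredient_has(ingredients, tag):
    """Check a dietary tag: a pizza qualifies only if all of its ingredients do.

    Unknown ingredients are treated as having no tags, so they fail the check.
    """
    return all(tag in TAXONOMY.get(name, set()) for name in ingredients)


# "Can I make my pizza spicy?" -> list the ingredients that would do it.
print(ingredients_with_tag("spicy"))

# "Can I order a vegan mushroom pizza?" -> check every ingredient on it.
mushroom_pizza = ["gluten-free crust", "tomato sauce", "mushroom"]
print(every_ingredient_has(mushroom_pizza, "vegan"))
print(every_ingredient_has(mushroom_pizza + ["sausage"], "vegan"))
```

Dietary tags such as “vegan” only hold for a pizza when every ingredient carries them, which is why the second helper uses an all-ingredients check. A tag like “spicy” works the other way around, since a single spicy ingredient is enough.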
This type of custom taxonomy combines common knowledge, domain-specific knowledge (context pertaining to the particular business, in this case pizza), and syntactic/semantic knowledge (customer intent, in this case requests for pizza types) to provide the conversational AI with far more context than it previously had. The result is conversational AI that can extract the right information from a conversation to complete a specific customer request or answer a specific customer question.
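The sketch below illustrates, in a deliberately simplified way, how those three kinds of knowledge might work together on a single question. It assumes a tiny hypothetical synonym table (common knowledge), a hypothetical tag table for two toppings (domain knowledge), and naive substring matching in place of real intent detection. Production systems would use a proper language-understanding component; the names `SYNONYMS`, `ITEM_TAGS`, `parse_question`, and `answer` are invented for this example.

```python
# All data and names here are illustrative assumptions.
# Common knowledge: everyday words mapped to the taxonomy's canonical tags.
SYNONYMS = {"hot": "spicy", "meatless": "vegetarian", "gluten-free": "gluten free"}

# Domain-specific knowledge: tags for items on a hypothetical pizza menu.
ITEM_TAGS = {
    "sausage": {"meat", "spicy", "gluten free"},
    "mushroom": {"vegan", "vegetarian", "gluten free"},
}
KNOWN_TAGS = set().union(*ITEM_TAGS.values())


def parse_question(text):
    """Extract (item, tag) from a question using naive substring matching."""
    normalized = text.lower()
    for phrase, canonical in SYNONYMS.items():
        normalized = normalized.replace(phrase, canonical)
    item = next((i for i in ITEM_TAGS if i in normalized), None)
    # Try longer tags first so multi-word tags are not shadowed by short ones.
    ordered_tags = sorted(KNOWN_TAGS, key=lambda t: (-len(t), t))
    tag = next((t for t in ordered_tags if t in normalized), None)
    return item, tag


def answer(text):
    """Answer a yes/no question about an item, or hand off to a human."""
    item, tag = parse_question(text)
    if item is None or tag is None:
        return "Let me connect you with a team member."
    if tag in ITEM_TAGS[item]:
        return f"Yes, the {item} is {tag}."
    return f"No, the {item} is not {tag}."


print(answer("Is the sausage pizza gluten-free?"))
print(answer("Is the mushroom pizza hot?"))
print(answer("What time do you close?"))
```

When the question mentions no known item or tag, the sketch falls back to handing the customer to a person, which mirrors the routing role described earlier in the article.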
Conversational AI is poised for major growth, with nearly 50% of inquiries expected to be conducted this way by 2023, and the coronavirus pandemic may accelerate that timeline beyond current projections. If businesses employing conversational AI invested in expanding their taxonomy, they could achieve much better results from its implementation. Customers would be less likely to grow frustrated and opt out of automated problem-solving, tying up phone lines and customer service employees with simple questions and requests, and more likely to continue using AI to meet their needs.