Two Reasons People Hate Your Chatbot — and How to Make It Better
Dr. Ender Ricart
About the Author and Article: Dr. Ender Ricart is a Principal UX Researcher at LivePerson, a company at the forefront of conversational AI applications for customer service. The content and recommendations in this article are informed by a series of in-depth qualitative studies of customer experience with conversational AI and the user journey through customer service.
Below is a *real* conversation between a customer and a bank’s digital assistant. For the sake of anonymity, I have renamed it “Hana” (in several examples, identifying information about the customer, business, and bot has been anonymized).
In this scenario the customer has a new debit card and is not seeing their card information reflected on the online personal banking portal. The bot asks them to rephrase multiple times and to select from various options, but they ultimately cannot get the digital assistant to help and end up requesting a person instead.
This transaction between a customer and the bank’s digital assistant formed one of the scenarios I had study participants enact for qualitative research. The goal was to understand the user’s point of view and behavioral drivers as they engage with a conversational AI.
I witnessed person after person struggle. Presented only with Hana’s intro and an invitation to ask her anything — “How can I help?” — all study participants hesitated, uncertain how to proceed. They entered the conversation with an inherent skepticism that Hana could even handle their issue.
All 15 participants were prompted to reword, and then worked through various phrasings by trial and error in an attempt to “trigger” the bot’s comprehension. What’s more, the selection options, when participants triggered them at all, caused additional friction, as they were uncertain which option their issue might fall under.
I observed them deliberate, select one, realize it was wrong, and then, with mounting frustration, try to course correct. Ultimately, their original skepticism that the bot could handle only “basic” issues, and therefore not this particular one, was, in their eyes, validated.
The above example exemplifies a pattern in customer experience and behavior that I, as a researcher in the conversational AI space, have observed across a number of studies over the years: uncertainty about what conversational AI can do and how to interact with it. The vast majority of customer pain points fold into the following two source issues:
- Customer is lacking an applicable mental model — The user does not have an applicable mental model (nor a nearest neighbor mental model that transfers successfully) for how to interact with conversational AI, which also leaves them unclear about what the bot can do. Customers need help building or transforming applicable mental models.
- Failure to work backwards from the customer — A failure on the part of the bot (or rather, the bot’s maker) to work in and from the customer’s understanding and applied mental models.
In this article, I will unpack these two source issues and provide recommendations for how to remedy them, supporting customers in their engagements with conversational AI.
Customer is lacking an applicable mental model
The term “mental model” has been around for some time, and a variety of concepts have been ascribed to it. Here, “mental model” refers to people’s expectations and understanding of how to interact with a particular object, product, or piece of technology, which in turn lead them to behave in certain ways.
Mental models are learned and transform over time. They correspond to products’ interactive paradigms that are consciously designed by companies. In new and emerging technological spaces like conversational AI, it is imperative to help users build new mental models or transform the pre-existing mental models they are applying when interacting with your products.
An aspect of this is being clear and transparent about what a specific piece of technology’s functions are and guiding the user in how to interact with it.
Let’s think about Apple products. Apple has established an interactive paradigm across all their product offerings from the user interface, iconography, operating system functionality, information architecture, all the way down to the look and feel of the products.
When I first got an iPad, I knew immediately how to use it as I could easily map my existing mental model of how to use an iPhone to the iPad. They had the same interactive paradigm. However, I have not used a Microsoft OS in over a decade.
When I got one for work, it was a real struggle to figure out how to do basic things like change settings, use keyboard shortcuts, or customize the background image.
This is because the closest mental model (what I call the “nearest neighbor mental model”) that I had was that of the Apple OS, and it did not overlap significantly enough to make my transition to a Microsoft OS pain-free.
I had to adapt my existing mental model and transform it into a new one that enabled me to interact with the Microsoft OS with a higher degree of success. The further away the nearest neighbor mental model is, the harder it will be to adjust and adapt it.
Sometimes, we don’t even have a mental model available to apply. I think about the time I had a friend over to play video games. She had never played one and, in fact, had never seen one being played. Naturally, I selected what I believed to be, at that time, *the best* game — The Legend of Zelda: Ocarina of Time.
It was shocking to realize how much I took for granted in gameplay. She had no comprehension of how the controller mapped to actions in the game, what buttons to press, or even the Zelda franchise’s gaming tropes. She was confronted with a lit lantern, an unlit lantern, and a sealed door.
Now, anyone who has played a game in the Zelda franchise immediately recognizes what to do: you light a stick/torch/arrow in the lit lantern and use that to light the other lantern, and this will open the sealed door.
However, she did not have this pre-existing mental model to mobilize. She didn’t know you could light a stick on fire and use it to light the other lantern. Nor did she know that would open the other door. She had to learn all of this from scratch, and it was frustrating enough for her (and me to watch) that she handed the controller back to me.
Because there is little to no consistency in the interactive paradigms used by chatbots, customers struggle to build new or transform existing mental models to aid in successful interactions. With chatbots in customer service, the nearest neighbor mental models that people consistently apply are: 1) phone-based IVR, and 2) interactive expectations drawn from established conventions in messaging, texting, and email. These applied mental models result in the following behavioral manifestations:
Nearest neighbor mental model — phone-based IVR
Customers expect that chatbots can only handle basic transactional requests similar to a phone-based IVR system (e.g., check my balance, status of order, cancel appointment, etc.). Chatbots are also sometimes thought to serve as routers, again like phone-based IVR systems, directing the customer to the relevant customer service department.
The below example is an actual customer conversation. The customer applies the logic and language used in IVR systems, overriding the bot’s selection options to skip talking with the bot and connect directly to a customer service representative.
The application of this mental model essentially results in people having certain expectations of what a chatbot can do and the functions it serves.
If they attempt to type out their question and the bot fails to understand, this failure solidifies their expectations that the bot’s capabilities are limited (similar to that of an IVR). They are unlikely to try using a bot again for any queries they deem to be more than basic.
Nearest neighbor mental model — messaging, texting, and email
When it comes to a text-based interaction with a conversational AI, people emulate the communication conventions already in place for familiar mediums, namely texting, online messaging, and email correspondence (email less so for younger generations). Examples include:
A) Writing detailed messages with compounding or multifaceted intents (see the below conversation with an airline’s virtual assistant. Again, this is an actual customer conversation with some modifications to anonymize).
B) Splitting a single message across multiple entries or lines.
C) Signaling emotions through typos, emojis, rushed typing, caps lock, emphasis through exaggerated spellings, expletives, and repeated punctuation.
At present, chatbots are not meeting customers where they are or working backward from the mental models being applied. Common typos, exaggerated spellings, and phone autocorrections are not sufficiently represented in word ontologies. And even though people double-barrel intents or split a single intent across multiple entries, bots more often than not read only the first intent in the first entry.
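To illustrate, the split-message behavior described above can be met halfway by buffering rapid successive entries into a single utterance before intent detection runs. Below is a minimal sketch of that idea in Python; the class name, window value, and merge policy are hypothetical, not any particular vendor’s API:

```python
import time

class MessageBuffer:
    """Collects messages that a customer splits across multiple entries,
    merging them into one utterance before intent detection runs."""

    def __init__(self, window_seconds=3.0):
        self.window = window_seconds   # max pause still counted as "same message"
        self.pending = []              # buffered entry texts
        self.last_ts = None

    def add(self, text, now=None):
        """Add an entry. If the pause since the previous entry exceeds the
        window, return the previous entries merged into one utterance
        (ready for intent detection) and start a new buffer; else return
        None, meaning we are still collecting."""
        now = time.monotonic() if now is None else now
        if self.last_ts is not None and now - self.last_ts > self.window:
            merged = " ".join(self.pending)
            self.pending = [text]
            self.last_ts = now
            return merged
        self.pending.append(text)
        self.last_ts = now
        return None

    def flush(self):
        """Merge and clear whatever is currently buffered."""
        merged = " ".join(self.pending)
        self.pending = []
        self.last_ts = None
        return merged
```

In practice the flush would be triggered by a timer rather than called directly, but the core idea is the same: run intent detection on the merged utterance, not on each fragment.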
Work backwards from the customer
Working backward from the customer means recognizing which mental models customers are applying and then working in a stepwise fashion to help them build out new and improved ones. In doing so, customer satisfaction and frequency of use will increase in a virtuous cycle of usability, reinforced by growing knowledge and awareness of how to interact with bots.
Below are recommendations of how to work backward from where the customer is now, improve customer experience, and grow mental models for how to interact with bots.
1. Reducing use of conversation in bot interactions
Reducing and restricting the use of conversation in bot interactions will shift the mental model users are applying (open-ended conversational turn-taking akin to texting, messaging, and email) to another (controlled and discrete interaction of selection).
One way to do this is the progressive, unilateral disclosure of information with corresponding selection options similar to an information tree of branching logic.
Smart replies that disable the text input field are another means of progressively disclosing and collecting information in a controlled way. Both techniques are best applied at the beginning of the conversation with a bot to narrow the scope of the inquiry and simultaneously train the customer in what language to use to successfully communicate with the bot (and business).
This will reduce the effort and emotional tax of stabbing in the dark to guess at what language and organizational schema the business has applied.
It streamlines a customer’s exposure to this otherwise chaotic Kafkaesque organism of categories, systems, and logic by introducing just the information they need, guiding them through, in a stepwise and contained fashion, to a resolution of some kind — needed information, an answer, a transaction, a link, or a handover to the right agent that can help.
However, and this is critical, the topic areas, categorical schema, selection options, and information architecture you surface to the customer must overlap enough with the customer’s point of view! Do user research (such as card sorting, first-click, or tree testing) to identify how to label and construct information hierarchies and categories so they resonate with the customer’s understanding.
As with the opening example of the banking digital assistant Hana, we do not want customers to struggle repeatedly to match their issue to an ambiguous array of selection options. Failure to do so will nullify the positive benefits of this form of interaction with bots and result in customer frustration and abandonment.
Additionally, I highly recommend including among the selections the option to “speak with a representative/agent.”
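To make the progressive-disclosure recommendation concrete, here is a minimal sketch of a branching selection tree in Python. The labels, intent ids, and structure are purely illustrative; in practice they should come out of the user research described above:

```python
# A branching tree of selection options: the bot walks the customer down
# one level at a time instead of accepting open-ended free text.
MENU_TREE = {
    "prompt": "What can I help you with today?",
    "options": {
        "Cards": {
            "prompt": "What would you like to do with your card?",
            "options": {
                "Activate a new card": "activate_card",        # leaf = intent id
                "Report a lost or stolen card": "report_card",
            },
        },
        "Online banking": {
            "prompt": "What's happening with your online account?",
            "options": {
                "Can't log in": "login_help",
                "Information looks wrong or missing": "account_sync",
            },
        },
        # Always offer the escape hatch recommended above.
        "Speak with a representative": "human_agent",
    },
}

def step(node, choice=None):
    """Advance one level in the tree. Returns (prompt, option_labels) for a
    branch node, or the resolved intent id string once a leaf is reached."""
    if choice is not None:
        node = node["options"][choice]
    if isinstance(node, str):          # leaf: resolved intent
        return node
    return node["prompt"], list(node["options"])
```

A rendering layer would present the returned labels as buttons or smart replies with the free-text field disabled, so the customer is trained in the business’s vocabulary as they go.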
2. Transforming and confirming customer’s intent
A simple dialogue management technique that bots can utilize to help build a customer’s mental model is to take what a customer says, translate it into the language and culture of the business, and then echo it back to the customer to confirm the bot’s comprehension.
This is a tactic used in human-to-human communication as well. Imagine you are communicating with Person B, and they tell you “I am going to a concert next Tuesday.” Today being Sunday, you may be confused by what they mean by “next Tuesday.” Did they mean this Tuesday or the following Tuesday?
Why didn’t they just say this Tuesday? Some people think of the week as starting on Sunday and others on Monday. So, you might say back, “The concert is the day after next?” Essentially, this is an act of quality checking that we are both on the same page, to make sure information processing is in sync. Bots can do this too. An example for an imaginary airline is provided below.
In this example, the bot restates what the customer originally said. Doing so confirms comprehension and educates the customer in the airline’s terminology for their request: adding a lap infant.
In the future, the customer will know how to communicate better with the bot. They will immediately type “I would like to add a lap infant to my reservation.” They have refined their mental model for how to talk to the bot and learned what the bot can do — add a lap infant to their reservation. This is a satisfying experience and a constructive process for the customer.
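A minimal sketch of this echo-and-confirm tactic, with a hypothetical phrase table mapping detected intent ids to the business’s own terminology (the ids and phrases are illustrative, not a real airline’s taxonomy):

```python
# Maps each detected intent id to how the business names the request.
# Echoing these phrases back both confirms comprehension and teaches
# the customer the business's vocabulary.
CANONICAL_PHRASES = {
    "add_lap_infant": "add a lap infant to your reservation",
    "change_flight": "change your flight",
    "check_bag_policy": "review the checked baggage policy",
}

def confirm_intent(intent_id):
    """Restate the customer's request in the airline's terminology and
    ask for confirmation before proceeding down the intent path."""
    phrase = CANONICAL_PHRASES[intent_id]
    return f"Just to confirm: you'd like to {phrase}. Is that right?"
```

So a free-text message like “traveling with my 8-month-old on my lap” that resolves to `add_lap_infant` would be answered with “Just to confirm: you’d like to add a lap infant to your reservation. Is that right?”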
3. Asking customers to annotate their intent (disambiguation technique A)
There are those times when a customer comes into a conversation with a lot of content and detail, making it difficult for the bot to identify the customer’s want/need (as described above with nearest neighbor mental model of texting, messaging, and email).
In situations where the intent is not clear — perhaps the top-scoring intent falls below the accepted confidence threshold — the bot can be transparent about its confusion by surfacing some of its best guesses, allowing the customer to annotate their own query in the language of the bot and the business’s knowledge culture.
An example might look as follows; this one modifies the article’s opening scenario. Let’s say the intent recognition confidence level is below the accepted threshold for the bot to fire on it.
The bot surfaces the top three scoring intents as selection options to the customer to choose from as well as an “other” option. In this case the “other” option should take the customer immediately to a human agent to minimize the cumulative time and level of effort expended by the customer to have their issue resolved.
As elsewhere in bot UX, I highly recommend that any time categories or selection options are surfaced to customers, they be based on the user’s point of view. You cannot just surface internal naming conventions for intents to customers and expect them to understand.
Their “intents” are different from a business’s. Iterative user testing is important because it ensures that when this information is surfaced, customers know where to bucket their problem or query and do not find that their issue straddles multiple options.
Additionally, I recommend using this disambiguation method rather than asking the customer to rephrase the problem or question. Asking a customer to reword or rephrase their problem/question does not provide any constructive feedback about what to do differently.
This disambiguation technique lifts the burden off the customer; they do not need to think about how to reword what they just wrote, especially when they are already skeptical the bot can even help them in the first place.
By surfacing the top three suspected intents here — Card not working, Problem with online account, and Activate new card — it reduces a customer’s cognitive load by narrowing down selection options to related topics only. They can quickly identify from those three if one is a match and move on.
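A sketch of this self-annotation fallback: if the best intent clears a (hypothetical) confidence threshold the bot fires on it; otherwise it surfaces the top three guesses plus an “Other” option that routes straight to a human agent. Threshold, scores, and intent names are illustrative:

```python
THRESHOLD = 0.70  # illustrative confidence cutoff, tuned per deployment

def disambiguate(scored_intents, threshold=THRESHOLD, top_n=3):
    """Given {intent_name: confidence}, either fire on a confident match
    or ask the customer to annotate their own query by choosing among
    the top-N guesses, with 'Other' as an escape hatch to a human."""
    ranked = sorted(scored_intents.items(), key=lambda kv: kv[1], reverse=True)
    best_intent, best_score = ranked[0]
    if best_score >= threshold:
        return {"action": "fire", "intent": best_intent}
    options = [name for name, _ in ranked[:top_n]] + ["Other"]
    return {"action": "ask", "options": options, "other_routes_to": "human_agent"}
```

With the opening scenario’s scores, the bot would present “Card not working,” “Problem with online account,” and “Activate new card” as buttons, plus “Other,” rather than asking the customer to rephrase.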
4. Requesting additional information for clarification (disambiguation technique B)
Another method LivePerson encourages customers to use in bot building is to have the bot proactively ask for additional information. As opposed to the previous dialogue management technique, this one instead requests additional information to help make an intent determination. If we were to apply it to the same scenario given above, it might look something like the below dialogue.
In this scenario, we can imagine that none of the intents scores highly in terms of confidence. With this disambiguation technique, the bot does not surface the three top-scoring intents as it did in the self-annotation technique. Instead, it asks a binary yes/no question to eliminate one or more of the possible intents and then, upon confirmation, proceeds.
Imagine that the intent confidence scores for “Card not working” and “Problem with online account” were about equal, while the third option, “Activate new card,” scored well below the other two.
Once the bot disambiguates between “Card not working” and “Problem with online account,” it confirms the latter as the intent in question, and the bot can proceed down the correct intent-path.
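This clarifying-question fallback might be sketched as follows; the question bank, tie-breaking margin, and intent names are illustrative, keyed to the banking scenario above:

```python
def clarifying_question(scored_intents, margin=0.05):
    """When the two top intents score about equally, ask one yes/no
    question that eliminates one of them instead of listing options."""
    # Illustrative question: a 'yes' (card works in stores/ATMs) points
    # away from a card problem and toward the online account.
    question = "Are you able to use your card in stores and at ATMs?"

    ranked = sorted(scored_intents.items(), key=lambda kv: kv[1], reverse=True)
    (first, s1), (_, s2) = ranked[0], ranked[1]
    if s1 - s2 > margin:
        # A clear winner: proceed down that intent path directly.
        return {"action": "fire", "intent": first}
    return {"action": "ask_yes_no",
            "question": question,
            "yes_intent": "Problem with online account",
            "no_intent": "Card not working"}
```

A production version would draw the question from a bank of discriminating questions per intent pair, but the control flow — fire when confident, ask one eliminating question when tied — is the technique described above.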
5. Recognizing when to use a bot and when not to
In the research we performed, we learned that the customer’s engagement with customer service takes place at the very end of a larger user journey regarding a specific topic or issue.
For many, it did not make sense for them to engage with a bot that was sending them to FAQ pages or answering FAQs. The customer felt they were beyond the self-service and troubleshooting phase of their journey, faced with a “unique” issue and not a “basic” (aka FAQ) one.
We recommend positioning FAQ Bots and Search Bots earlier in the user’s journey, perhaps living on the home screen of your website or in the app. Customers that are struggling to troubleshoot or self-service would benefit from a quick way to link to the appropriate page or information.
FAQ and Search Bots are best suited for this. If you have one, situating it earlier in the user’s journey, by placing it on the landing page of the website or on FAQ pages, will match current consumer mental models of a bot’s capability.
From there, you can introduce additional functions the bot may have — “Thanks for using Cool Airlines FAQ Bot. Did you know I can also help you change your reservation?”
Conversational AI is a technological product and should be treated as such
As the applications for conversational AI continue to broaden, especially in the area of conversational commerce, the quality of a person’s experience will be a deciding factor in their brand loyalty and satisfaction.
The “quality of experience” that I speak of is not only about advancing machine learning or natural language processing, but implementing conversational design that starts from the customer’s point of view.
The conversational AI industry has done itself a disservice by touting naturalistic conversational experiences with AI. These experiences are by no means “natural” (read: intuitive) but technological, and as such the interactivity needs to be crafted — studied, developed, and designed — to create satisfying customer experiences.
Learning from past shortcomings, the first steps to building a satisfying customer experience with conversational AI are to:
- Identify which nearest neighbor mental models, if any, customers are applying when engaging with your conversational AI.
- Work backward from the customer’s point of view.
Once you know the nearest neighbor mental model your customers are applying, you can develop and design your interactions and conversations around it.
Decide on an interactive paradigm for all of your conversational AI to ensure your customers encounter repeatable and consistent quality experiences. You should also work to gently educate the customer about how best to interact with the bot and business to help grow and build new mental models and interactive frameworks.