So you have deployed a chatbot, but it fails often because users say unexpected things that it cannot handle. Ring true?

There are many factors that need to align to make a chatbot conversation successful. Any misalignment between these factors may lead to conversational breakdown. Understanding how and why conversations break down can help conversational AI designers design better conversational experiences (CONX). Designing conversational happy paths is not difficult. It is getting your head around the unhappy paths that is tricky.

So, why do conversations fail? What do users say that derails the conversation from the happy path? And why do they say that? There are many reasons why! To understand this, we need to dig into the factors that make conversations succeed.

1. Attention

For a dialogue to succeed, both parties need to pay attention to what the other says. Obvious, isn’t it? But what happens when one of them doesn’t? What happens if the user is distracted? He could respond with “Could you repeat that please”, “Sorry i didn’t get that”, “one more time!” or even simply “What!”.

If the user initiated the conversation, chances are the user is paying attention. But if the chatbot called the user, there is a chance that the user is not. So it is important to be ready for requests to repeat, replies based on a misheard chatbot utterance, long delays in response, etc.

In addition, chatbots can be proactive and check if the user is (still) listening (especially when the chatbot comes back after finishing a backend activity), or check if the user has time to spare before the conversation even starts. The aim here is to get the user’s attention before engaging in conversation and to maintain it throughout.

It is equally possible that the user may think that the chatbot is not listening. To signal that it is indeed still listening, the chatbot could use back-channel utterances like “mm”, “uh huh”, etc. This assures the user that the chatbot is actively listening, which may help keep the conversation on the happy path. In addition, chatbots need to be designed to respond to attention questions with appropriate responses, while not losing the context of the conversation.
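As a rough illustration of answering a repeat request without losing context, here is a minimal Python sketch. The phrase patterns, the `Dialogue` class and its method names are all hypothetical, made up for this example; a real bot would more likely rely on a trained intent classifier than on regular expressions.

```python
import re

# Hypothetical patterns that suggest the user missed the last prompt.
REPEAT_PATTERNS = [
    r"\brepeat\b",
    r"\b(one more time|say that again|come again)\b",
    r"\bdidn'?t (get|catch|hear) that\b",
    r"^\s*(what|sorry|pardon)\s*[!?.]*\s*$",
]


def is_repeat_request(utterance: str) -> bool:
    """Return True if the utterance looks like a request to repeat."""
    text = utterance.lower().replace("’", "'")
    return any(re.search(pattern, text) for pattern in REPEAT_PATTERNS)


class Dialogue:
    """Remembers the last prompt so it can be replayed on request."""

    def __init__(self):
        self.last_prompt = None
        self.slots = {}

    def ask(self, prompt: str) -> str:
        self.last_prompt = prompt
        return prompt

    def handle(self, utterance: str, slot: str) -> str:
        # A repeat request replays the prompt and leaves the slots untouched.
        if self.last_prompt and is_repeat_request(utterance):
            return "Sure. " + self.last_prompt
        self.slots[slot] = utterance
        return "Got it."
```

The point of the design is that a repeat request never touches `slots`, so the conversation carries on from exactly where it was.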

2. Channel noise

Channels need to be noise free and deemed appropriate for the impending conversation. Voice channels like telephony do still have the noise problem. If you can’t hear each other properly, how can you expect to have a good CONX? Chatbots need to be able to detect noise and suggest alternative options — move to another channel, suggest alternative times, etc. Otherwise, the bot is going to perform badly in understanding the user’s intent. The same is true for Internet issues on text channels — intermittent or slow connections.

Being able to detect such issues and pre-empt them would be ideal, e.g. “I think the connection is a bit slow today.. Please bear with me..”. On the other hand, if the user complains, then have your chatbot acknowledge and take appropriate action, e.g. “yeah, looks like we have a bad connection today. Should we try this later?”
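One possible way to detect a noisy voice channel is to watch the confidence scores coming back from speech recognition. The sketch below assumes your speech recogniser reports a confidence between 0.0 and 1.0 for each utterance; the `NoiseMonitor` class, the threshold of 0.5 and the window of three turns are illustrative guesses, not recommended values.

```python
from collections import deque


class NoiseMonitor:
    """Tracks recent recognition confidence scores (assumed 0.0 to 1.0)."""

    def __init__(self, threshold: float = 0.5, window: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def observe(self, confidence: float) -> None:
        self.recent.append(confidence)

    def channel_looks_noisy(self) -> bool:
        # Only flag noise once the window is full and every score is low.
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(c < self.threshold for c in self.recent)


def next_prompt(monitor: NoiseMonitor, normal_prompt: str) -> str:
    """Swap the planned prompt for a repair prompt when the channel is bad."""
    if monitor.channel_looks_noisy():
        return "Looks like we have a bad connection today. Should we try this later?"
    return normal_prompt
```

Requiring several low scores in a row avoids reacting to a single mumbled word.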

3. Context

So what is the conversation about? Do both partners know the context? Are they on the same page? The usual assumption is that a user who initiates the conversation knows its purpose and what the chatbot can and cannot do. However, if the chatbot initiated the conversation, it should provide context and motivation so that the user is not left wondering about the purpose of the conversation while it is going on. Otherwise, the user is bound to ask “What are you talking about?”, “Why are you asking me this?”, etc.

4. Preferences and constraints

The user’s preferences and constraints need to be understood by the chatbot, and the chatbot needs to adapt to them. Failure to adapt will bring these issues to the fore when the user mentions them explicitly during the conversation. Some users may have a preference on what channels to use and when. So, outbound calls may be met with “Call me later”, “sorry, can we speak later, i am busy right now”, etc. Chatbots need to be able to accommodate such requests as well.

Another instance of preference would be to avoid giving out private information (e.g. about gender, ethnic info, etc.), and the user would go “i prefer not to disclose this info”, “can i skip this one”, etc.
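To make this concrete, here is a small Python sketch of letting users skip optional questions. The phrase list, the `fill_slot` function and the bot replies are hypothetical and only meant to show the shape of the idea.

```python
# Hypothetical phrases that signal the user would rather not answer.
SKIP_PHRASES = ("prefer not to", "rather not", "skip")


def wants_to_skip(utterance: str) -> bool:
    text = utterance.lower()
    return any(phrase in text for phrase in SKIP_PHRASES)


def fill_slot(form: dict, slot: str, utterance: str, optional_slots: set) -> str:
    """Store the answer, or record an explicit skip for optional slots."""
    if wants_to_skip(utterance):
        if slot in optional_slots:
            form[slot] = None  # None marks "asked, but the user declined"
            return "No problem, we can skip that."
        return "I'm afraid I need this one to continue."
    form[slot] = utterance
    return "Thanks."
```

Recording the skip as `None` (rather than leaving the slot empty) means the bot will not keep asking the same question again.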

5. Understanding what is said

Understanding what the user said is the next barrier to cross. There are many steps here: recognising words, identifying entities, understanding intents, etc. Homonyms (different words having the same spelling or pronunciation), spelling errors, and ambiguous words and phrases can lead to clarification requests from users, such as “did you mean x or y?”. Another source of trouble is the use of jargon and abbreviations. Will users understand them? If not, anticipate clarification requests like “what’s that?”, “what do you mean by ABC?”, etc.

To pre-empt these issues, make sure you get the copy checked for spelling errors and excessive jargon or abbreviations. If it’s a voicebot, you will probably need to check the voice for confusing words or phrases. And to handle these after the fact, allow for the clarification questions mentioned above.

6. Latency

Latency can cause unnecessary issues in communication. Latency is the problem of delay in response which can be due to many factors. If the conversation is over voice, the chatbot is tasked with converting voice to text, text to intent, backend processing, dialogue management, response generation and finally converting response to voice. All these can take a while. The general expectation is to respond within 500 milliseconds. Any further delay can cause the user to speak again thinking that the bot did not get his input. This will create duplicate processing if the bot is still listening.
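One way to guard against that duplicate processing is to ignore an utterance that repeats the previous one within a short window. The sketch below is an assumption on my part rather than a standard technique from any framework; the `DuplicateFilter` name and the five-second window are made up for illustration.

```python
import time


class DuplicateFilter:
    """Drops an utterance that repeats the previous one within a time window."""

    def __init__(self, window_seconds: float = 5.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable so the filter can be tested without sleeping
        self.last_text = None
        self.last_time = None

    def should_process(self, utterance: str) -> bool:
        now = self.clock()
        # Normalise case and whitespace before comparing.
        text = " ".join(utterance.lower().split())
        is_duplicate = (
            self.last_text == text
            and self.last_time is not None
            and now - self.last_time <= self.window
        )
        self.last_text, self.last_time = text, now
        return not is_duplicate
```

Outside the window the same words are treated as a fresh request, since the user may genuinely want to book the same thing twice.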

Latency can be reduced by trading some accuracy for speed when processing natural language inputs. Another approach is to design the system to handle latency better. If the bot can detect or predict a delay, it can be made to verbalise it to the user, e.g. “Ok, two tickets to London! Gimme a min..”. Acknowledging receipt of user input can help anxious users stay calm and not repeat what they have already said.
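Here is a minimal sketch of that acknowledgement pattern using Python's `asyncio`: start the backend work, and if it has not finished within a threshold, send a holding message before the real answer. The function name `respond_with_ack` and its parameters are hypothetical, and `backend_call` and `send` stand in for whatever your platform provides.

```python
import asyncio


async def respond_with_ack(backend_call, send, ack_after=0.5,
                           ack_text="Ok, give me a moment.."):
    """Send a holding message if the backend takes longer than ack_after seconds.

    backend_call: async function with no arguments that returns the reply text.
    send: async function that delivers one message to the user.
    """
    task = asyncio.ensure_future(backend_call())
    # Wait up to ack_after seconds; the task keeps running if it is not done.
    done, _pending = await asyncio.wait({task}, timeout=ack_after)
    if not done:
        await send(ack_text)
    result = await task
    await send(result)
```

When the backend is fast, the user only sees the answer; the holding message appears only when it is actually needed.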

7. Taking turns

Channels of communication are mostly full duplex. This means that both interlocutors can speak and listen at the same time. Contrast this with half-duplex settings such as radio, where you need to push to talk. When you are finished with your turn, you are supposed to say ‘over’. When you want to end a conversation, you would say ‘over and out’. Using these keywords to signal to the dialogue partner when she can take her turn makes turn taking easy to manage.

Smart speakers enforce a strict order of turn taking. Alexa would finish speaking and wait a bit for the user to respond. All the while it flashes its blue light as a sign of yielding the floor to the user. The user’s turn ends when Alexa detects an end-of-speech silence; the user can say no more after that. The same is true when the blue light stops flashing. This actually irritates me as a user, as I am not allowed to pause and think as I naturally would in a human-human conversation.

Other chatbot communication channels are full duplex. Interlocutors can start speaking at the same time, overlap each other’s turns, interrupt each other and so on. Are chatbots ready to handle such behaviors? In text messaging channels, users could type in messages just as the chatbot is sending them a question. What happens next? Should the chatbot revise its conversational task or wait for an answer from the user?

When turn taking is not managed efficiently, users might say things out of order to interrupt the chatbot or correct what they previously said. This can cause conversational breakdowns that are not cheap to repair. To deal with this issue, make effective use of typing indicators or earcons to let the user know that the bot now has the floor and that he has to wait for his turn.

8. Knowledge

Does the user have sufficient knowledge to make requests of your chatbot and respond to it? Firstly, the user must know what your chatbot can and cannot do. This will ensure that the user’s requests are properly handled. In the absence of such knowledge, the chatbot can misunderstand the user’s request and lead her into the wrong conversation. The user will refuse to engage and say unanticipated things in response: “what are you talking about?”, “i didnt ask for this”, “you got me wrong”, etc.

Secondly, during the conversation, the chatbot could ask a question to which the user may not know the answer. The user could get the answer from elsewhere, but it could take time. So the user could say things like “er.. i don’t know”, “i probably need to get back to you on that one”, etc. In both of these cases, the chatbot could set expectations before the start, and should also allow the user to pause a conversation and get back to it later.
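Pausing and resuming could look something like the Python sketch below: the answers collected so far are saved, and the conversation later picks up at the first unanswered question. The `PausableForm` class, the pause phrases and the JSON storage format are all assumptions for the sake of the example.

```python
import json

# Hypothetical phrases that suggest the user needs time to find an answer.
PAUSE_PHRASES = ("get back to you", "don't know", "come back later")


def wants_pause(utterance: str) -> bool:
    text = utterance.lower().replace("’", "'")
    return any(phrase in text for phrase in PAUSE_PHRASES)


class PausableForm:
    """A list of questions whose progress can be saved and restored."""

    def __init__(self, questions: dict):
        self.questions = questions  # slot name -> prompt, in asking order
        self.answers = {}

    def next_slot(self):
        """Return the first slot without an answer, or None when finished."""
        for slot in self.questions:
            if slot not in self.answers:
                return slot
        return None

    def save(self) -> str:
        return json.dumps({"answers": self.answers})

    @classmethod
    def resume(cls, questions: dict, saved: str) -> "PausableForm":
        form = cls(questions)
        form.answers = json.loads(saved)["answers"]
        return form
```

When `wants_pause` fires, the bot can store the output of `save()` against the user and call `resume()` when the user comes back, instead of starting from scratch.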

9. Understanding why

Understanding what is said is different from understanding why it is said. Understanding the words and deriving the meaning of the utterance is step one. Putting that into context and deriving the pragmatic meaning is step two. If the user does not understand why he is being asked a question or requested to take some action, he is not going to engage blindly. He is bound to ask why! This is about the logical relationship between what is being asked and the goals of the user.

If the conversation deviates from the user’s goal, she might want to say things to bring it back — “sorry, can we talk about this later, i need to get my tickets booked right now”, etc. So stick to the joint goal and provide closure before you move on to the next item on the agenda.

10. Building trust

Most importantly, trust is a key issue, and the lack of it will contribute to conversational breakdowns. Have you built trust with the user so that she is willing to part with sensitive info in response to the chatbot’s questions? For instance, would users be happy to provide payment info to a chatbot? If not, you must anticipate questions like “is this secure?”, “can I do payment on a secure page?”, etc.

Building trust is hard. Sometimes trust can be borrowed from the brand that the chatbot is representing. Otherwise, you will have to build it from scratch. Do what you promise, mean what you say, always follow up, be proactive and helpful. Build an emotional relationship. And the more trusted you are, the more you will be forgiven when failing.

There is a lot of grey area between the factors listed above, and some of them are related or overlap. So you may not see a clear separation of concerns between them. However, I believe the list can help you anticipate all sorts of detours that lead to conversational breakdowns, design conversational experiences that pre-empt such detours and, where that fails, have repair strategies ready to get back on track.

The above is just the list of factors that can make or break a cooperative conversation. There may be others to consider if the conversation is not cooperative — argumentative, behaviour changing, debates, diatribes, etc. Probably a topic for another article!