AI Chatbots: Reality vs. Hype

Chatbots give a perception of being intelligent, but intelligence is a long way away.


Navveen Balani

3 years ago | 19 min read

Welcome to the world of intelligent chatbots: your companion and conversation agents who should make your life smarter. A leading research paper even said that by 2020, the average person would have more conversations with bots than with their spouse. So, be ready to embrace this new life a year from now.

Ok… Hold on. Have you ever tried telling Siri or Google to “find restaurants that don’t serve pizza?” At least they are both consistent in that they gave the same answer — suggesting restaurants that do serve pizza.

The first humanoid robot to be granted citizenship, Sophia, is making her way to every media event, conducting interviews using human-like conversations.

How does she compare to these competitors? Well, the hype is far from reality. Chatbots provide an illusion of understanding conversation, but as you start asking intelligent questions, you begin to realize that they can only answer a fixed set of queries.

By now, you should be able to separate the noise from reality.

Should you invest in chatbots with all these limitations? Every technology has its flaws, but you need to be aware of what you can build now, what to avoid, and how to work around the limitations.

I have seen many companies trying to build sophisticated chatbots using products from leading chatbot vendors and cloud offerings, spending millions of dollars, and hitting roadblocks. If you begin this venture based on what is being projected and start building it out, you will soon run into these restrictions one way or another. Most vendors claim that it’s effortless to build a chatbot, but in reality, all of these techniques fall short when it comes to building a true conversational agent.

With the current implementations of chatbots, we are probably at the first generation of AI chatbots, which are scripted to give answers to pointed questions. What I mean by scripted is that they are trained to understand general vocabulary, entities, metaphors, synonyms, etc.

The chatbot uses a fixed set of flows to understand the context. For domain-specific use cases, additional training is required, and you need to train on specific domain terminology and the relationship between the words.

For instance, if you are building a shopping advisor chatbot, the term “black and white dress” implies “black and white” as the color and “dress” as the category. You might expect that the color “black and white” is fairly generic and should be easily identified by the AI system, but that’s not really the case, as I will show during the course of the article.

The intent of the article is to help readers make informed decisions on how to design AI chatbots and how to work around the limitations of existing chatbot implementations.

What Are Chatbots?

A chatbot is a software program that carries out a conversation with a human. The conversation can be through textual methods, voice, or even by recognizing human expressions.

Chatbot interactions can range from simple answers to questions like, “What is the outside temperature?” to sophisticated use cases that require a series of dialogs to arrive at an outcome, like using a chatbot for booking holiday trips or providing financial advice.

What Are the Technologies Used to Build Chatbots?

Chatbots are not a new concept. Earlier technologies used a fixed set of inputs from users to drive conversations or scanned the input message to find keywords and look up information/responses from a database.

These were mostly rule-based and keyword-driven without the bot understanding the context and meaning of the input message. Based on the input, a predefined programmed response would be provided.

With the advent of AI, chatbots use technologies like natural language processing (NLP) to understand the language and intent of the input message and take the appropriate action.

Because the system tries to understand the language, it can identify the overall intent even when users ask the same question in multiple ways. Once the intent is identified, you can extract the topics of interest from the input.

For instance, “Find the cheapest flight from US to UK,” is similar to, “Find me lowest airfare from US to UK.”

The intent is to find the cheapest flight, the locations are the US (from) and the UK (to), and the action is to search flights.

An AI open source package or an AI NLP cloud service can be used to develop chatbots. Let’s refer to this as the chatbot implementation from here on. I will discuss chatbot implementations in detail during the course of this article.
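As a rough illustration of the idea above, here is a minimal rule-based sketch in Python that maps both phrasings to one intent and pulls out the two locations. The synonym table, the location list, and the name parse_query are hypothetical and exist only for this example; a real chatbot implementation would learn these from training data rather than hard-code them.

```python
import re

# Hypothetical synonym table: several phrasings map to one intent.
INTENT_SYNONYMS = {
    "find_cheapest_flight": ["cheapest flight", "lowest airfare", "lowest fare"],
}

# Hypothetical entity list; a real system would use a much larger one.
LOCATIONS = {"US", "UK"}


def parse_query(text):
    """Return the intent and the from/to locations found in the text."""
    lowered = text.lower()
    intent = None
    for name, phrases in INTENT_SYNONYMS.items():
        if any(phrase in lowered for phrase in phrases):
            intent = name
            break

    # Capture "from X to Y" and accept it only if both are known locations.
    origin = destination = None
    match = re.search(r"from\s+(\w+)\s+to\s+(\w+)", text, flags=re.IGNORECASE)
    if match:
        candidates = [match.group(1).upper(), match.group(2).upper()]
        if all(candidate in LOCATIONS for candidate in candidates):
            origin, destination = candidates

    return {"intent": intent, "from": origin, "to": destination}


first = parse_query("Find the cheapest flight from US to UK")
second = parse_query("Find me lowest airfare from US to UK")
print(first)
print(first == second)
```

Both queries resolve to the same intent and the same pair of locations, which is the behavior the paragraph above describes. Anything outside the hard-coded phrases returns an intent of None, which hints at why training on varied phrasings matters.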

What Should I Keep in Mind for Developing an AI Chatbot?

Chatbots work well when the domain is well-understood by the AI system.

As the AI chatbot relies on NLP to understand the semantics of the input message, unless the NLP parser is trained on the domain, the accuracy of recognizing the intent and topics of interest would be unacceptably low.

Take an example of a shopping chatbot that advises users on what to buy based on the latest fashion trends.

Consider the 3 user queries below.

Query 1 — “Show me trending black and white AND dresses for a Christmas party in medium size.”

Query 2 — “Show me white 3-inch platform heels.”

Query 3 — “Find a black and white floral dress under $2,000.”

Here, the chatbot needs to understand the following:

  • The shopping language.
  • The intent as a shopping query
  • The domain as a shopping query for apparel and shoes. (i.e. there can be multiple domains — grocery, electronics, books, etc.)
  • Clothing shopping categories and terminology:
    • Category — dresses, sandals, etc.
    • Variants — sizes (medium/large, etc.), color (various colors and combinations like black and white), heel size (3 inches), etc.
    • Prices and ranges — $2000-$3000, etc.
    • Brands like — AND, Nike, etc.

Out of the box, any chatbot implementation wouldn’t understand the domain. You need to train the chatbot on the custom domain to recognize the context and the language.

For instance, out-of-the-box NLP parsers would not recognize “AND” as a brand. Let’s inspect how well some of the leading Cloud AI NLP services recognize the sentence, “Find a black and white floral dress under $2,000.”

Here is a snapshot from the out-of-the-box Watson NLP implementation.

Figure: Keywords from Watson NLP

Figure: Concepts from Watson NLP

Figure: Part of Speech from Watson NLP

As you can see, Watson NLP recognizes “white floral dress” as a keyword and “black” as a concept. Ideally, it should have recognized “black and white” as a concept since we are looking for a combination of these colors.

“Dress” could also be a concept, as it’s quite generic. “Floral” can be a keyword that depends on “dress.” Identifying all the facts correctly is important because, based on the facts, you would convert the sentence to a search query to get the required details from the data store (or from the respective search indexes).

For instance, the above should result in:

Color = “black and white”

Category = “Dress”

Gender = “Female”

Price < 2000

Pattern = “floral” or Keyword within category = “floral”

(where color, category, gender, price, pattern are all the columns or indexes you are searching against)
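To make the conversion from facts to a search query concrete, here is a minimal Python sketch that applies the facts listed above to a small in-memory catalog. The catalog rows, the field names, and the search function are all made up for illustration; in practice the same filter would run against your data store or search index.

```python
# Facts assumed to be already extracted from
# "Find a black and white floral dress under $2,000."
facts = {
    "color": "black and white",
    "category": "dress",
    "gender": "female",
    "pattern": "floral",
    "max_price": 2000,
}

# Hypothetical catalog standing in for a data store or search index.
CATALOG = [
    {"name": "Floral Midi Dress", "color": "black and white", "category": "dress",
     "gender": "female", "pattern": "floral", "price": 1500},
    {"name": "Striped Dress", "color": "black and white", "category": "dress",
     "gender": "female", "pattern": "striped", "price": 900},
    {"name": "Floral Gown", "color": "black and white", "category": "dress",
     "gender": "female", "pattern": "floral", "price": 2500},
    {"name": "Black Floral Dress", "color": "black", "category": "dress",
     "gender": "female", "pattern": "floral", "price": 1200},
]

EXACT_FIELDS = ("color", "category", "gender", "pattern")


def search(catalog, facts):
    """Return names of items matching every exact field and priced below the limit."""
    matches = []
    for item in catalog:
        same_attributes = all(item[field] == facts[field] for field in EXACT_FIELDS)
        if same_attributes and item["price"] < facts["max_price"]:
            matches.append(item["name"])
    return matches


print(search(CATALOG, facts))
```

Note how the item colored only “black” is excluded. If the parser had extracted “black” instead of “black and white,” the query would return the wrong products, which is why getting each fact right matters.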

The Watson NLP parser tags “AND” as a conjunction (“CCONJ”) rather than as a brand, which is expected, as it’s not trained on this input.

Let’s check how Google NLP classifies this sentence. Here is a snapshot from Google NLP.

Figure: Entity classification from Google NLP

Figure: Part of Speech from Google NLP

As you can see from the above figure, Google NLP identifies the entity as “dress,” but doesn’t identify the colors “black and white.” With respect to part-of-speech tagging, it behaves like Watson NLP, recognizing “AND” as a conjunction (“CONJ”) and not as a brand.

The above is true for any NLP implementation available today: each fails to understand the correct context of the sentence. The use case was pretty simple. Even if we train the NLP implementation on these examples, it would fall short, as you need to plug in specific NLP rules for such conditions to get the desired results.

As the complexity and context that need to be inferred increase, training would also not help, as you can never come up with a generalized model for such conditions. That is the single largest limitation of chatbots if we only rely on today’s generation of NLP implementations.

Based on my experience in building a sophisticated personalized shopping advisor, none of the out-of-the-box AI NLP implementations fit the requirements. A simple scenario of this is presented in these 3 phrases: “black and white dress,” “AND black dress,” and “blue jeans and white shirt.” In all 3 examples, the word “and” has a different meaning.

In the first case, it represents a combined color “black and white,” in the second instance, “AND” represents a brand, and in the third instance, two queries are joined by a conjunction (i.e. and). Even with the required training, a generalized model was not possible with any of the available solutions.
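The kind of domain-specific rule this calls for can be sketched in a few lines of Python. This is not the implementation we built; it is a simplified illustration that assumes small hand-written lexicons, treats brand names as case-sensitive (so the brand is typed as “AND”), and prefers the longest color match. The lexicons and the tag_phrase function are hypothetical.

```python
# Hypothetical domain lexicons; a real system would load these from catalog metadata.
COLORS = {"black and white", "black", "white", "blue"}
BRANDS = {"AND", "Nike"}  # matched case-sensitively (a simplifying assumption)
CATEGORIES = {"dress", "jeans", "shirt"}


def tag_phrase(phrase):
    """Tag each token (or multi-word span) as brand, color, category, or conjunction."""
    tokens = phrase.split()
    tags = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in BRANDS:
            tags.append((token, "brand"))
            i += 1
            continue
        # Prefer the longest color match so "black and white" wins over "black".
        for length in (3, 1):
            span = " ".join(tokens[i:i + length]).lower()
            if i + length <= len(tokens) and span in COLORS:
                tags.append((span, "color"))
                i += length
                break
        else:
            lowered = token.lower()
            if lowered in CATEGORIES:
                tags.append((lowered, "category"))
            elif lowered == "and":
                tags.append((lowered, "conjunction"))
            else:
                tags.append((lowered, "other"))
            i += 1
    return tags


for phrase in ("black and white dress", "AND black dress", "blue jeans and white shirt"):
    print(phrase, "->", tag_phrase(phrase))
```

The three phrases yield a combined color, a brand, and a conjunction respectively. The sketch also shows the cost of this approach: every new ambiguity needs another lexicon entry or rule, which is why such solutions are hard to generalize.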

These are just a few of the many examples I am highlighting. Imagine the complexity when dealing with medical literature. In our case, we ended up building our domain-specific NLP implementation which worked for all such scenarios.

In general, while designing chatbot solutions, start with a closed domain and decide what kind of questions the chatbot needs to answer. Don’t start by building a general-purpose chatbot, as it would be difficult to get the required accuracy. Secondly, if you are using a cloud vendor or third-party implementation, ensure your use cases can be solved by the default implementation, or you will need to build components to work around it.

What Are Typical Use Cases for Building a Chatbot?

In today’s digital age, customers are looking for instant information and speedy resolutions to all their queries.

Chatbots provide an efficient way to stay connected with end customers directly and provide information at their fingertips — be it through a messaging chat application or through a voice-enabled service like Alexa or Google Home.

Some of the typical use cases are listed below:

  • Ability to know your customers and directly interact with them over various channels, like retail brands directly connecting to their end customers.
  • Improve customer engagement, interaction, and provide speedy resolution.
  • Scaling customer service operations by providing relevant information 24/7 at a customer’s fingertips.
  • Understand customers and their preferences better to provide hyper-personalized service, like a personal assistant.
  • Provide an ability to interact with connected devices, like Smart Homes, in a natural and intuitive way.
  • Provide expert guidance, like a financial assistant chatbot providing investment suggestions.

What Are the High-Level Steps for Building an AI Chatbot?

The following are high-level steps to build an AI chatbot:

  • Define the business use case and end goal for building the chatbot.
  • Define conversation interfaces:
    • Define what kind of questions need to be answered
    • Define conversation/dialog flow on how various interactions would happen with the user. For instance, booking a flight is one dialog flow, booking a hotel is another dialog flow, etc. Within a dialog flow, what would be the interaction flow with the user?
    • Define how to capture the feedback from the user regarding the answers provided. Feedback can be explicit, like the user rating the answer, or implicit, like how much time a user spends looking at the answer and the follow-up activity after that.
  • Question/Answer exploration
    • Identify existing sources (if any) for questions, like website FAQ, call center logs, etc.
    • Create a representation of questions that would be asked.
    • Create variations of questions to train the chatbot to understand the language and be able to generalize well.
    • Identify the source of answers — whether it would be a programmed response or coming from internal knowledge sources and documents (like available technical manuals for troubleshooting device related queries)
  • Pick up a technological approach

In this step, you will decide how to implement the chatbot. There are 2 approaches: building your own chatbot implementation using available frameworks (like TensorFlow and NLP toolkits like NLTK) and custom components, or using an existing platform service like Google NLP, Amazon Lex, or Azure Bot Service.

In both approaches, you would need to train the chatbot implementation to recognize the question intent, domain, and the language. Existing platform services have simplified this process by providing utilities that make it easier to create chatbots. For more details, kindly refer to “How do you build a chatbot using chatbot platforms?”

  • Pick up a delivery channel

In this step, you will decide how to expose the chatbot to end users through the required channel. The channel can be web, mobile, or voice-enabled devices.

Your chat implementation would typically expose an API (to ask questions and get responses), which can be called by a channel implementation. You can also release your chatbot implementation over third-party services like Facebook Messenger or voice-enabled services like Amazon Alexa. For more details, kindly refer to “How do you integrate chatbots with third-party services?”

  • Release, monitoring, and feedback

Once the chatbot is released, you would typically store all the user interactions to help you analyze user behavior and preferences better. The user and behavior data, in turn, would be used to provide a more personalized service. How you would use this new user information depends on your use case. For instance, if a travel chatbot is recommending a new holiday trip, it can suggest options based on your last trip interaction. You need to build a recommendation system that looks at the history of the user’s past interactions and suggests options. For details on how to build recommendation systems, kindly refer to the Recommendations Chapter.

Another important point is to capture feedback from the user at regular intervals to understand if a chatbot is providing the right information. The feedback captured will be used to improve the chatbot implementation, which can lead to training the chatbot implementation with new information.

For instance, your chatbot may not be trained on recognizing certain entities and concepts and, as a result, the responses would not be proper. You need to plan for building and releasing incremental models based on the feedback.

How Do You Integrate Chatbots With Third-Party Services?

As part of your chatbot technology implementation, your chat implementation would typically expose an API (to ask questions and get responses), which can be called by a channel implementation.

The channel can be web, mobile, or voice-enabled devices. If you already have an existing mobile application, you can embed this as part of the mobile application.

You can also release your chatbot implementation through third-party chat enabled services like the Facebook messaging application or through a voice-enabled service like Amazon Alexa as a skill.

All of these chat-enabled services provide a framework to plug in your own implementation. The framework provides hooks or code interceptors for intercepting the chat message.

You need to extend their framework and plug in your own implementation. For example, if a user asks a question on Facebook Messenger, the question would be handed to your chat implementation through predefined hooks. You would process the message and return the response, which the service would then deliver to the user.
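The hook pattern described above can be sketched as a channel-independent core with a thin adapter per channel. The Python below is only an illustration: the payload fields (sender, text, transcript) and the function names are hypothetical and do not reflect the actual request formats of Facebook Messenger or Alexa, which you would take from each service’s documentation.

```python
def answer(question):
    """Channel-independent chatbot core, stubbed here with a fixed lookup."""
    faq = {"what are your opening hours?": "We are open 9am to 6pm."}
    return faq.get(question.strip().lower(), "Sorry, I don't know that yet.")


def messaging_hook(payload):
    """Hypothetical hook a messaging channel would call for each incoming message."""
    reply = answer(payload["text"])
    return {"recipient": payload["sender"], "text": reply}


def voice_hook(payload):
    """Hypothetical hook a voice channel would call with the transcribed speech."""
    reply = answer(payload["transcript"])
    return {"speech": reply, "end_session": True}


print(messaging_hook({"sender": "user-1", "text": "What are your opening hours?"}))
print(voice_hook({"transcript": "Do you deliver?"}))
```

Keeping the core free of channel details means adding a new channel only requires a new adapter, while the question-answering logic stays in one place.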

Similarly, if you need to make your chatbot available over Alexa, you need to wrap it as an Alexa Skill using the Alexa Skills Kit interface. Once your skill is enabled in Alexa by the user, any voice messages will be intercepted by your skill and you can provide the required implementation and responses as per your chatbot.

For more details, kindly refer to “How do you build a chatbot using chatbot platforms?”

How Do You Build a Chatbot Using Chatbot Platforms?

A chatbot platform provides you with a set of services to design, develop, and deploy your chatbot. It gives you a framework and a guided set of utilities to build a chatbot.

Cloud providers like AWS, Azure, IBM, and Google Cloud provide you with a set of services that can help you generate conversations, understand the conversation language using NLP techniques, create hooks to take required action, and deliver the solution via APIs.

The fundamental approach adopted by each of these providers is the same. They allow developers to:

  • Design conversation flows using a visual interface or tooling provided by the cloud provider.
  • Through these conversation flows you can:
    • Provide a set of questions and multiple ways you can ask the same question.
    • Define what the intent of the question is. For example, for the question, “Find cheapest flight from US to UK,” the intent is to find the lowest airfare.
    • Find what entities of interest to extract from the intent. The chatbot provider needs to be made aware of these entities. In the above example, entities are a country list: UK, US, etc. These entities can be generic ones that the cloud provider recognizes automatically, or the cloud provider offers a mechanism through which you can provide or train these entities (including synonyms, metaphors, etc.).
    • Use the entities extracted to carry out the required action for the intent. For instance, in the above example, call a flight API service providing UK and US as “from” and “to” locations.
    • Provide the response.
  • Test and expose the chatbot through an endpoint
    • The cloud vendor typically provides an ability to expose the functionality of your chatbot through an endpoint, like a REST API.
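The flow above (sample questions, intent, entities, action, response) can be mimicked in plain Python to show how the pieces fit together. This is a toy sketch, not any vendor’s API: the intent table, the word-overlap matching, the synonym map, and the search_flights stub are all assumptions made for illustration.

```python
import re


def search_flights(origin, destination):
    """Stub standing in for a call to a real flight API."""
    return f"Searching cheapest flights from {origin} to {destination}."


# Hypothetical intent definitions: sample utterances plus the action to run.
INTENTS = {
    "find_cheapest_flight": {
        "utterances": ["find cheapest flight", "find me lowest airfare"],
        "action": search_flights,
    },
}

# Hypothetical entity definitions, including synonyms for each country.
COUNTRY_SYNONYMS = {"us": "US", "usa": "US", "america": "US", "uk": "UK", "britain": "UK"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def detect_intent(text, min_overlap=2):
    """Pick the intent whose sample utterance shares the most words with the text."""
    words = set(tokenize(text))
    best, best_score = None, 0
    for name, spec in INTENTS.items():
        for utterance in spec["utterances"]:
            score = len(words & set(tokenize(utterance)))
            if score > best_score:
                best, best_score = name, score
    return best if best_score >= min_overlap else None


def extract_countries(text):
    """Return known countries in the order they appear in the text."""
    return [COUNTRY_SYNONYMS[word] for word in tokenize(text) if word in COUNTRY_SYNONYMS]


def handle(text):
    """Full flow: detect intent, extract entities, run the action, return the response."""
    intent = detect_intent(text)
    countries = extract_countries(text)
    if intent is None or len(countries) != 2:
        return "Sorry, I did not understand that."
    return INTENTS[intent]["action"](countries[0], countries[1])


print(handle("Find the cheapest flight from US to UK"))
print(handle("Find me lowest airfare from America to Britain"))
print(handle("Tell me a joke"))
```

Platform services replace the word-overlap matching with trained language models and the dictionaries with managed entity types, but the overall shape of define, match, extract, act, and respond stays the same.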

The above technology works for simple to medium complexity flows, like FAQs, pointed questions and answers for customer queries, a fixed set of steps (booking a cab), etc. Anything that requires sophisticated handling of queries, like the shopping advisor example, needs to be custom developed using NLP and other techniques.

Info: Microsoft has a QnA service that lets you create a bot from an FAQ.

What Is Not Real About Chatbots?

The current generation of chatbots can be thought of as smart dialog systems driven through techniques like NLP and fixed conversation flows.

Out of the box, a chatbot doesn’t understand any domain. We need to train the chatbot to understand the domain. Also, based on the complexity of the domain, you would incrementally train and add subdomains. For instance, a chatbot helping you book a cab is an example of a fixed domain, while a chatbot helping to assist doctors for cancer treatment would be trained on various types of cancer incrementally.

As mentioned in the shopping advisor example, understanding the meaning of the same word in different contexts is difficult for the current generation of NLP implementation to resolve, and you need to rely on custom techniques to handle such conditions.

Now, let's look at some marketing gimmicks around AI chatbots:

  • INGEST AND KNOW IT ALL chatbots: These are chatbots marketed as systems into which you can ingest millions of documents, like medical literature, and then ask questions to get expert assistance, like diagnoses of diseases. Such systems, unless trained appropriately, will never provide the desired accuracy. By appropriately, I mean it can take years to train these systems. The fundamental problem with these systems is that they still don’t understand the complete language and complexity of the domain. You typically end up with custom domain adaptations and an endless set of language rules, which is definitely not manageable in the long run. The predictions of such systems are usually not accurate.
  • Self-learning chatbots: How often have you heard this terminology? This, again, is a misconception where chatbots are said to learn on their own. You must train a chatbot on what you want the chatbot to learn. Usually, you would capture user behavior details through their interaction with the chatbot application. This would include capturing user analytics information, like a user’s likes or dislikes, through either explicit or implicit means. Explicit information can be a user rating a product, and implicit information can be the time a user spends looking at a response.

Once you know the user well and have their data, it becomes a recommendation problem on what you want to recommend to the user. So, you end up building a recommendation algorithm to recommend something. For instance, for a FinTech application, this would mean recommending similar stocks based on the stocks a user views regularly or holds in their portfolio.

Different domain and use cases would need different recommendation algorithms and that needs to be developed as part of the chatbot. However, the learning is boxed; for instance, if you have a chatbot that can assist you in booking restaurants, it can recommend similar restaurants, but it can’t recommend places to stay, as it only knows about your restaurant tastes.

Well, someone can build a recommendation system that tracks what users eat and where they stay and then try to come up with a correlation that provides a recommendation, as the system now knows: “User eating XYZ is most likely adventurous. So, recommend a trekking place.”

Again, in this case, the recommendation is boxed on what you know and what you want to recommend. I don’t know if any such hypothesis holds; that can only be inferred through data and feedback. The point is, all of these hypotheses, data, and feedback need to be designed and developed, and saying that chatbots learn on their own is quite misleading.

  • General purpose, generative chatbots: A chatbot that is capable of learning new concepts from scratch and provides responses like a human. As it learns from an open domain, such a chatbot would start behaving like the famous Microsoft Tay chatbot, which was shut down within a day of its launch, as it started learning unwanted details from tweets and started posting inflammatory and offensive tweets. This is a classic example of what I quoted in my earlier article: “AI can learn but can’t think.” Generative chatbots formulate the response based on the probability of words and create a grammatically correct sentence, without understanding its real meaning.

As I mentioned earlier, the first focus should be on getting domain-specific chatbots right and, with the current techniques, we are far away from realizing the vision.
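To illustrate the “boxed” learning described in the self-learning discussion above, here is a minimal content-based recommendation sketch in Python. The restaurant names, tags, and the Jaccard-similarity scoring are assumptions chosen for illustration; real recommendation systems use far richer signals.

```python
# Hypothetical restaurant catalog: name -> descriptive tags.
RESTAURANTS = {
    "Spice Route": {"indian", "spicy", "casual"},
    "Curry Leaf": {"indian", "spicy", "fine-dining"},
    "Pasta Casa": {"italian", "casual"},
    "Sushi Go": {"japanese", "seafood"},
}


def jaccard(a, b):
    """Share of tags two sets have in common (0.0 means nothing in common)."""
    return len(a & b) / len(a | b)


def recommend(liked, catalog, top_n=2):
    """Rank items the user has not liked yet by similarity to what they liked."""
    profile = set().union(*(catalog[name] for name in liked))
    scored = [
        (jaccard(profile, tags), name)
        for name, tags in catalog.items()
        if name not in liked
    ]
    # Highest similarity first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


print(recommend(["Spice Route"], RESTAURANTS))
```

The function can only rank what is in its catalog and only along the dimensions it was given. Nothing in it could ever suggest a place to stay, unless someone designs the data and the hypothesis that connect restaurants to hotels.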

Will Chatbots Make Human Agents Obsolete?

To answer this question, let’s understand what functionality chatbots currently provide.

Current chatbot implementations do well at handling a fixed set of dialogs with the user, repetitive tasks, and certain initial aspects of customer service tasks. Wherever there is a fixed set of processes and flows to automate, chatbots can be used to provide 24/7 support for any queries. If human expertise is used for answering basic sets of questions where answers are readily available, it will eventually be replaced.

But in real-life scenarios, most conversations don’t follow a fixed flow. If the conversation moves from basic questions to questions that need further analysis, or the topic of conversation changes, you need a sophisticated chatbot implementation to take care of various conversation flows, identify the context switch, identify intents that your chatbot may not be aware of, and create queries to find that information from your knowledge source. You are now moving from a fixed set of flows to more dynamic flows that need to be interpreted by your chatbot. Building such complex chatbot implementations requires sophisticated domain-specific adaptation using machine learning techniques and custom solutions. Current out-of-the-box chatbot services fall short of building such chatbot implementations.

Even if you have all the data in the world and infinite processing power at your disposal, with the current generation of technology and research you can never build a system that can compete with an expert human in the field. Taking even a 5-year horizon from now, I don’t think we can develop chatbots with that level of intelligence.

For instance, can chatbots or an assistant help a doctor to recommend cancer treatments accurately and consistently? The answer is no.

The information provided by a chatbot can give doctors a clue, but it may be right or wrong, and you can never certify it. The chatbot would always act as an assistant to an expert to get some job done. Ultimately, these systems return a set of answers based on probabilities. The answers are limited to what you have fed into the system; the system can’t infer new knowledge on the fly or correlate information like a human expert to reach a conclusion.

While there is research attempting to determine the ability to use deep neural nets for conversation flows, we are still quite far away from building truly conversational interfaces that understand the nitty-gritty of language and domain. Also, the answers provided need to be explainable, and, unless you have a way to backtrack on why a particular answer was provided, such deep neural systems can’t be used for use cases that require auditability and explainability.

In short, enjoy the smart chatbots that give a perception of being intelligent, but understand that true intelligence is a long way away.

Can AI Generate Dynamic Responses to Questions?

You can use deep learning to build a chatbot. Various deep learning architectures are available to solve a specific variety of use cases. For instance, for computer vision (i.e. image recognition) you would use a convolutional neural network as the starting point. For language translation or text generation, you would go with recurrent neural networks and so on.

For understanding chat conversations, you would start with a variant of a recurrent neural network and build a sequence-to-sequence model. A sequence-to-sequence model, in simple words, consists of 2 components: the first component (encoder) tries to understand the context of the input sentence through its hidden layers, and the second component (decoder) takes in the output from the encoder and generates the response.

The above techniques require you to have a large set of training data, containing questions and responses. The technique works in a closed domain, but as the responses are dynamic in nature, exposing them directly to your end users can be a bit risky. Secondly, these techniques don’t work when you want to interpret the input sentence to extract the information and formulate a response on your own, like the shopping advisor query use case that we discussed above.
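The encoder/decoder split can be shown with a toy forward pass in plain Python. This is only a structural sketch under strong assumptions: the vocabulary is tiny, the weights are random and untrained (so the generated reply is meaningless), and a simple tanh recurrence stands in for the recurrent cells a real model would use. In practice you would build and train this with a deep learning framework.

```python
import math
import random

random.seed(0)

VOCAB = ["<start>", "<end>", "hello", "hi", "how", "are", "you", "fine"]
HIDDEN = 4  # size of the hidden state (arbitrary for this sketch)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Untrained, randomly initialized parameters.
EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one embedding vector per token
W_ENC = rand_matrix(HIDDEN, HIDDEN)       # encoder recurrent weights
W_DEC = rand_matrix(HIDDEN, HIDDEN)       # decoder recurrent weights
W_OUT = rand_matrix(len(VOCAB), HIDDEN)   # hidden state -> score per vocabulary token


def step(weights, hidden, token):
    """One recurrent step: combine the previous state with the token embedding."""
    recurrent = matvec(weights, hidden)
    embedding = EMBED[VOCAB.index(token)]
    return [math.tanh(r + e) for r, e in zip(recurrent, embedding)]


def encode(tokens):
    """Encoder: fold the input tokens into a single context vector."""
    hidden = [0.0] * HIDDEN
    for token in tokens:
        hidden = step(W_ENC, hidden, token)
    return hidden


def decode(context, max_len=5):
    """Decoder: start from the context and greedily emit one token per step."""
    hidden, token, output = context, "<start>", []
    for _ in range(max_len):
        hidden = step(W_DEC, hidden, token)
        scores = matvec(W_OUT, hidden)
        token = VOCAB[scores.index(max(scores))]
        if token == "<end>":
            break
        output.append(token)
    return output


context = encode(["how", "are", "you"])
print(context)
print(decode(context))
```

Training would adjust the matrices so that the decoder’s greedy choices reproduce the responses in the training data, which is why a large set of question and response pairs is needed.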

In the case of an open-ended domain, the chatbots would start behaving similar to the Microsoft Tay chatbot example I gave earlier.

Tip: With RNNs, the response depends on previous states. For a deep conversational use case where context needs to stay available over many turns, plain RNNs don’t work well. You need to employ a variant of the RNN called an LSTM (Long Short-Term Memory network). There is a lot of research being performed in this area. Going through the various deep learning architectures is outside the scope of this article.


The current generation of chatbots is a weak form of AI, which offers an ability to understand the intent of the input message/question. In order for a chatbot system to understand the intent, it needs to be trained on the corresponding domain. You can ask the same question in multiple ways and the chatbot implementation can still infer the intent.

For dialogs, the current technology lets you define fixed conversation flows, so the interactions are boxed and finite.

Chatbots do well for managing productivity and certain aspects of customer service tasks. However, as the complexity of the domain increases, current technology falls short, as even after enough training, you would not get the required level of accuracy.

You would need to rely on a combination of other machine learning technologies and solutions like rules, inferences, and custom domain metadata to get the solution delivered.

Such solutions become one-offs that are difficult to generalize. In some cases, even the one-off solution would be very complex, like building an advisor for recommending cancer treatments accurately and consistently.

While there is research regarding using deep neural nets, we are still quite far away from building a true conversational chatbot that understands the nitty-gritty of language and domain.

Also, the answers provided need to be explainable, and unless you have a way to backtrack on why a particular answer was provided, such deep neural systems can’t be used for use cases that require auditability and explainability.

