In this age of automation, personal assistants are ubiquitous, handling tasks that range from fetching weather details, making calls, and setting reminders to managing calendars and booking appointments. At their heart, these applications are state-of-the-art deep neural sequence models with an attention mechanism.
Training these bots requires tremendous amounts of data and computational resources, which makes it a machine learning engineering challenge. Tech giants that have access to vast document collections and can afford high-performance GPU clusters have trained such systems and offer ready-made solutions, such as website chatbots, that help engage customers and manage traffic effectively.
Sagar Patel and Vivek Bakaraju, Data Scientists at INSOFE, set out to build a user-friendly chatbot that can answer questions and provide information from any given text, and to understand the scientific and engineering challenges involved in building such systems.
Broadly speaking, there are two types of chatbot models. The first is the declarative model, in which the user initiates queries and the bot answers with automated responses and conversational menus. These bots use natural language processing but not much machine learning. They are highly specialized, with structured interactions and a pre-defined conversational flow. So, when a user submits a query, the bot responds with a pre-defined script from its library.
The second type is the data-driven ML/AI bot, trained on large amounts of data that include contexts, questions, and corresponding answers. These models are considered ‘intelligent’ bots because they respond with the most appropriate answer they can find. This is possible because, during training, they learn patterns from users’ queries and use them to pick the most reliable answer.
Chatbots offer businesses several benefits:
- Resource saving for the sales team
- User experience improvement
- Purchase process simplification
- Customer care improvement
- Personalized service
Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
Some of the key features of SQuAD are:
- It is a closed dataset, meaning that the answer to a question is always part of the context and forms a contiguous span of it.
- So, the problem of finding an answer reduces to finding the start index and the end index of the span in the context that corresponds to the answer.
- 75% of the answers are four words long or shorter.
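To make the start/end index framing concrete, here is a minimal sketch that maps a character-level answer span to token indices. The function name `char_span_to_token_span`, the whitespace tokenization, and the example sentence are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def char_span_to_token_span(context, answer_text, answer_start):
    """Map a character-level answer span to inclusive token indices.

    Assumes whitespace tokenization, which is a simplification; a real
    pipeline would reuse the tokenizer that feeds the model.
    """
    answer_end = answer_start + len(answer_text)  # exclusive char offset
    start_tok = end_tok = None
    pos = 0
    for i, token in enumerate(context.split()):
        tok_start = context.index(token, pos)
        tok_end = tok_start + len(token)
        pos = tok_end
        # A token belongs to the answer if it overlaps the character span.
        if tok_start < answer_end and tok_end > answer_start:
            if start_tok is None:
                start_tok = i
            end_tok = i
    return start_tok, end_tok


# Made-up example: the answer is a contiguous span of the context.
context = "SQuAD questions are posed by crowd workers on Wikipedia articles."
answer = "crowd workers"
tokens = context.split()
start, end = char_span_to_token_span(context, answer, context.index(answer))
print(start, end, tokens[start:end + 1])
```

The pair `(start, end)` is what a span-prediction model would be trained to output for this question.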
BiDAF (Bi-Directional Attention Flow): This model is a hierarchical multi-stage process and consists of five layers:
- Word Embedding Layer: Maps each word to a vector space using a pre-trained GloVe word embedding model.
- Contextual Embedding Layer: A Long Short-Term Memory Network (LSTM) is used on top of the embeddings provided by the previous layer to model the temporal interactions between words. These first two layers are applied to both the query and context.
- Attention Flow Layer: Couples the query and context vectors and produces a set of query-aware feature vectors for each word in the context.
This layer is responsible for linking and fusing information from the context and the query words. Unlike previously popular attention mechanisms, the attention flow layer is not used to summarize the query and context into single feature vectors. Instead, the attention vector at each time step, along with the embeddings from previous layers, is allowed to flow through to the subsequent modelling layer. This reduces the information loss caused by early summarization.
The inputs to the layer are contextual vector representations of the context and the query. The outputs of the layer are the query-aware vector representations of the context words, along with the contextual embeddings from the previous layer.
In this layer, we compute attentions in two directions: from context to query as well as from query to context. Context-to-query (C2Q) attention signifies which query words are most relevant to each context word. Query-to-context (Q2C) attention signifies which context words have the closest similarity to one of the query words and are hence critical for answering the query.
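The two attention directions can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' TensorFlow code: the similarity score is a simple dot product (the BiDAF paper uses a trainable similarity function instead), and the names `attention_flow`, `H`, and `U` are chosen here for the example.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def attention_flow(H, U):
    """H: T context vectors, U: J query vectors (lists of equal-length lists)."""
    # Similarity matrix S[t][j]; the dot product is an assumption (see lead-in).
    S = [[dot(h, u) for u in U] for h in H]
    # C2Q: for each context word, attend over the query words.
    c2q = [weighted_sum(softmax(row), U) for row in S]
    # Q2C: weight each context word by its best match with any query word.
    b = softmax([max(row) for row in S])
    q2c = weighted_sum(b, H)  # one vector, shared by every context position
    # Query-aware representation per context word: [h; u~; h*u~; h*h~].
    G = []
    for h, u_att in zip(H, c2q):
        G.append(h + u_att
                 + [x * y for x, y in zip(h, u_att)]
                 + [x * y for x, y in zip(h, q2c)])
    return G, b


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 context words, dimension 2
U = [[1.0, 0.0], [0.0, 2.0]]              # 2 query words, dimension 2
G, b = attention_flow(H, U)
print(len(G), len(G[0]))  # one query-aware vector per context word
```

Note that no summary vector replaces the context: every context position keeps its own embedding `h` inside `G`, which is the "flow" that avoids early summarization.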
- Modelling Layer: Employs a Recurrent Neural Network to scan the query-aware context.
- Output Layer: Provides an answer to the query. The QA task requires the model to find a sub-phrase of the paragraph to answer the query. The phrase is derived by predicting the start and the end indices of the phrase in the paragraph. We obtain the probability distribution of the start index over the entire paragraph, and likewise for the end index.
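Given the start and end distributions from the output layer, an answer span still has to be picked. One common way, shown here as a hedged sketch rather than the authors' exact decoding rule, is to choose the pair with the highest product of probabilities subject to the start not coming after the end. The function `best_span`, its `max_len` parameter, and the toy probabilities are hypothetical.

```python
def best_span(start_probs, end_probs, max_len=None):
    """Return (start, end, score) maximizing p_start[i] * p_end[j] with i <= j.

    max_len optionally caps the span length in tokens.
    """
    best = (0, 0, -1.0)
    n = len(start_probs)
    for i in range(n):
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = start_probs[i] * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


tokens = ["SQuAD", "was", "released", "by", "Stanford", "University"]
# Toy distributions: the most likely end (index 0) precedes the most
# likely start (index 4), so picking each argmax independently would fail.
p_start = [0.05, 0.05, 0.05, 0.05, 0.70, 0.10]
p_end = [0.45, 0.05, 0.05, 0.05, 0.05, 0.35]
i, j, score = best_span(p_start, p_end)
print(tokens[i:j + 1])
```

The double loop is quadratic in the paragraph length, which is acceptable for a context of a few hundred tokens; `max_len` reduces the work further when answers are known to be short.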
Training Details: The context sequence length was 300, the query sequence length was 30, and the embedding size was 100. We trained the network on an Nvidia GTX 1080 Ti card for 5 epochs, which took about 40 minutes, using the Adam optimizer with a learning rate of 0.001. We deployed the TensorFlow model with Flask and exposed it through a REST API for a web application. We can enter any paragraph as the context and ask questions about it. Pasting any lengthy text will replace the existing context with the text entered.
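Fixed sequence lengths imply that every input is padded or truncated before it reaches the network. The sketch below shows one way this could be done; the lengths come from the text above, while the padding id `0`, the helper name `pad_or_truncate`, and the choice to truncate from the end are assumptions.

```python
CONTEXT_LEN = 300  # context sequence length stated in the article
QUERY_LEN = 30     # query sequence length stated in the article
PAD_ID = 0         # assumption: id 0 is reserved for padding


def pad_or_truncate(token_ids, length, pad_id=PAD_ID):
    """Clip a list of token ids to `length`, then right-pad with `pad_id`."""
    clipped = token_ids[:length]
    return clipped + [pad_id] * (length - len(clipped))


short_query = list(range(1, 11))      # 10 made-up token ids
long_context = list(range(1, 501))    # 500 made-up token ids

query_ids = pad_or_truncate(short_query, QUERY_LEN)
context_ids = pad_or_truncate(long_context, CONTEXT_LEN)
print(len(query_ids), len(context_ids))
```

With this scheme, any context tokens beyond position 300 are dropped, so an answer located there cannot be found by the model.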
Here are some of the results from Sahadev, our bot:
We also observed that the response depends greatly on how the question is framed. Below we can see how the same question can get a different response depending on how the sentence is structured. Hence, it is important to keep the grammatical structure of the questions consistent.
The above approach limits the context to 300 words. In a real-world case, however, the context is typically a collection of numerous web pages. My next article will address this limitation and discuss more generalizable AI assistants. Stay tuned!