Machine Learning started as a set of standalone algorithms that learn patterns from data to solve problems like regression (predicting a continuous value) and classification (categorizing a data point). Today these approaches have evolved into the building blocks of much more complex systems in domains like computer vision and natural language processing. We are inching closer to replicating human-like intelligence while also reducing human error. This has made Machine Learning one of the most talked-about technologies of the current decade.
Why deploy machine learning models?
Machine Learning applications have evolved from standalone analytics to Machine Learning as a service. This approach lowers the entry threshold for businesses that want to adopt Machine Learning. Organizations no longer have to invest huge amounts of resources to build a Data Science team and wait for years to achieve excellence. Instead, they can start using a third-party service that specializes in solving certain problems.
Deploying Machine Learning models lets you offer them as a third-party service. Another benefit is a modular architecture, which lets you consume a single service on a variety of platforms. For example, a language translation service, once deployed, can be used through a mobile app, a web app or even Alexa!
Is it different from the test pipeline in standalone ML?
No, the intuition behind the test pipeline and model deployment remains the same. However, one clear difference to note is that the test pipeline assumes data is stored in files and hence relies on file I/O for retrieving data and storing results. Model deployment (for ML as a service), on the other hand, has to communicate with a multitude of devices across various platforms. Hence a platform-independent communication protocol is required for smooth functioning. Generally, this is achieved through APIs.
What is an API?
An API (Application Programming Interface) is a software intermediary that allows two software applications to communicate with each other. Web APIs generally use HTTP requests and follow certain standard guidelines, which makes communication universally comprehensible.
Imagine yourself as a customer in a restaurant. The kitchen is your service provider that will fulfil your order. You need a link to communicate your order to the kitchen and then deliver your food back to your table. This link (the waiter) is analogous to an API in the software development world.
Deployment case study
Here I will talk about deploying a conversational agent that I recently made with my colleague Vivek.
Problem statement – Develop a chatbot that can replace a human desk assistant and answer questions regarding a particular context.
Model – Seq2Seq LSTM with Attention (using TensorFlow)
Input – Context paragraph and questions regarding the context
Output – A paraphrased answer from the given context
1. Model Training
Model training and validation are the prerequisites for model deployment. This phase can be done in isolation, and GPU servers are usually used to speed up the training process. Once we are happy with the model performance, we can port the model to relatively inexpensive CPU servers for inference (test predictions). In a deployment scenario where test data is likely to arrive as single data points, the execution times on GPU and CPU are pretty similar.
Train on GPU to speed up model training.
Save conventional ML models in pickle format. OR
Save the neural network's weights so that they can be ported to a different machine.
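The save-and-port step can be sketched with Python's standard `pickle` module. This is a minimal illustration, not the code we used: the `trained_model` dictionary is a hypothetical stand-in for a fitted model object, and the file name is an assumption.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: parameters of a tiny linear model.
trained_model = {"weights": [0.5, -1.0], "bias": 2.0}


def predict(model, features):
    # Dot product of weights and features, plus the bias term.
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


def save_model(model, path):
    # Run on the training machine: serialize the model to a file.
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Run on the deployment machine. Only unpickle files you trust.
    with open(path, "rb") as f:
        return pickle.load(f)


# Assumed file location, used here only to demonstrate the round trip.
model_path = Path(tempfile.gettempdir()) / "linear_model.pkl"
save_model(trained_model, model_path)
restored_model = load_model(model_path)
```

The restored object behaves like the original, so the pickle file is the only artifact that has to be copied to the deployment machine. For neural networks, frameworks ship their own weight formats; in Keras, for instance, `model.save_weights(path)` and `model.load_weights(path)` play the same role.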
2. Choosing a deployment machine
The deployment machine will serve as the central point of contact through which your customers access Machine Learning as a service. Hence it should be a highly available, fault-tolerant and scalable system. Cloud systems provide all of these features and much more, which makes them our first choice for deployment. In our scenario, we used a dedicated Virtual Machine on Azure (for maximum flexibility and customizability).
Set up a machine on the cloud. OR
Get a dedicated IP for your hosting machine to make it accessible from the external network.
3. Setting up the deployment machine
Most new Machine Learning models are developed in Python. Hence using a Python-based web application server becomes an obvious choice. Flask is a popular, minimalistic Python web microframework. While it can be used to develop a full-fledged web application, its simplicity in handling user requests makes it ideal for building custom APIs.
Install Flask and other dependencies for model inference.
Add a firewall exception for the port on which the Flask application listens.
4. Keeping the model in memory
APIs have a set of predefined instructions that get executed on each user request. While this is fine for most scenarios, certain Machine Learning models can be large (500 to 900 MB). Loading them on every user request makes the response time very slow because of the file I/O. Hence it's better to keep the model in memory for a faster turnaround on user requests.
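One common way to do this in Python is to load the model lazily and cache the result, so that only the first request pays the loading cost. The sketch below assumes this pattern rather than reproducing our server code; `get_model`, `handle_request` and the echoing "model" are hypothetical, and `time.sleep` stands in for the slow file read.

```python
import time
from functools import lru_cache

# Counts how often the expensive load actually runs (for illustration only).
load_stats = {"count": 0}


@lru_cache(maxsize=1)
def get_model():
    """Load the model on first use and return the same object afterwards."""
    load_stats["count"] += 1
    time.sleep(0.05)  # stand-in for reading several hundred MB from disk
    # Hypothetical model: a function that echoes the question it received.
    return lambda question: f"answer to: {question}"


def handle_request(question):
    # Every request calls get_model(), but only the first call loads anything.
    model = get_model()
    return model(question)


first = handle_request("What are the opening hours?")
second = handle_request("Where is the help desk?")
```

After both calls `load_stats["count"]` is still 1, because the second request reuses the cached object. Loading the model once at module import time, before the server starts accepting requests, achieves the same effect.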
5. Exposing API
In order to make our API available, we need to set routes. Routes are specific URLs where a client can send an HTTP request along with a JSON payload (a dictionary of data associated with the request).
Whenever a client sends a request, the function corresponding to that route gets executed. This function needs to extract important data from the payload, pre-process it (if required) and then pass it to the model. The model will make the predictions which should be returned by the function as a JSON payload.
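A route of this kind can be sketched in Flask as follows. The `/predict` path, the `context` and `question` field names, and the `answer_question` placeholder are assumptions made for illustration; the placeholder stands in for the real model call.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(context, question):
    # Placeholder for the real model call; returns a canned string.
    words = len(context.split())
    return f"(answer to '{question}' drawn from a {words}-word context)"


def clean_text(value):
    # Accept only non-empty strings; return None for anything else.
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


@app.route("/predict", methods=["POST"])
def predict():
    # Extract the JSON payload; silent=True yields None on malformed input.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}

    # Pre-process: validate and trim the two expected fields.
    context = clean_text(payload.get("context"))
    question = clean_text(payload.get("question"))
    if context is None or question is None:
        return jsonify({"error": "'context' and 'question' are required"}), 400

    # Pass the data to the model and return its prediction as JSON.
    return jsonify({"answer": answer_question(context, question)})
```

Saved as `app.py`, this can be started with `flask run` or by calling `app.run(host="0.0.0.0", port=5000)`; the port is whichever one you opened in the firewall.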
The API created using Flask can now be used in various ways:
For each message that the user types, the front-end makes an AJAX POST request to our server URL (where our model is deployed and Flask is listening).
Once the server gets the request (a question in our use-case), it is passed through our model, which generates the answer. This answer is returned as the response to the request and is displayed on the client screen.
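Our front-end sends this request from the browser, but the same POST can be reproduced from any HTTP client, which is handy for testing the server. Below is a sketch using Python's standard library; the server URL, the `/predict` path and the payload field names are assumptions, not fixed parts of the API.

```python
import json
import urllib.request

# Hypothetical address of the deployed Flask server.
SERVER_URL = "http://localhost:5000/predict"


def build_request(url, context, question):
    # Encode the payload as JSON and mark the request as a POST.
    body = json.dumps({"context": context, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(url, context, question, timeout=10):
    # Sends the request; this only works while the server is running.
    req = build_request(url, context, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]


# Building the request does not touch the network.
example = build_request(SERVER_URL, "The desk opens at nine.", "When does it open?")
```

Calling `ask(SERVER_URL, context, question)` against a running server returns the answer string, assuming the response carries it under an `answer` key.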
The same deployed model can be used by Alexa as well. All you have to do is set up an Alexa skill developer account and create a new custom skill.
Configure this skill to use a custom HTTPS endpoint (our server).
Once configured, all requests made to the Alexa skill will be forwarded to our hosted model.
Just like we set up routes in Flask, we can set intents in Alexa, which define the nature of the response and the interaction with Alexa.
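On the server side, handling an intent comes down to reading the intent name and slot values from the JSON that Alexa posts, and returning speech text in the JSON shape Alexa expects. The sketch below follows the custom-skill request and response layout as I understand it, so check the field names against the Alexa documentation. The intent name `AskQuestionIntent` and the slot name `question` are hypothetical; they would be whatever you define in the skill.

```python
def handle_alexa_request(alexa_request, answer_fn):
    """Map an Alexa request body (a dict) to an Alexa response body (a dict)."""
    req = alexa_request.get("request", {})
    intent = req.get("intent", {})

    if req.get("type") == "IntentRequest" and intent.get("name") == "AskQuestionIntent":
        # Slot values carry the words the user actually spoke.
        question = intent.get("slots", {}).get("question", {}).get("value", "")
        text = answer_fn(question)
    else:
        # Launch requests and unknown intents get a short prompt.
        text = "Ask me a question about the document."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }


# Example request body, trimmed to the fields this handler reads.
sample_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "AskQuestionIntent",
            "slots": {"question": {"name": "question", "value": "when does the desk open"}},
        },
    }
}
sample_response = handle_alexa_request(sample_request, lambda q: f"You asked: {q}")
```

In a Flask app, this function would sit behind its own route and `answer_fn` would call the model that is already in memory.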
Problem-solver at the core, Sagar likes to call himself "jack of all trades". He has experience in various areas like hardware networking, online marketing, business operations, web development, cloud computing and DevOps. With his diverse experience and love for algorithms, he plans to leverage data and Computer Science to optimize daily processes. He also intends to push human boundaries through research and applied engineering. Don't be surprised if he helps solve cancer one day ;)