Deploy a Machine Learning ONNX Model on AWS Using FastAPI and Docker
This article walks through deploying a simple machine learning model on AWS. Cloud deployment is a skill every data scientist and machine learning developer should have, and this post takes a deep dive into the entire process, from saving a model file to deploying it in the cloud.
There are plenty of tutorials about the intricacies of various machine learning models, but very few cover actually creating a production-ready API. Data scientists play two kinds of roles:
- Generating insights and creating models from the data given to them.
- Developing software that uses ML at its core.
This article focuses on the second role: it shows you how to save a machine learning model in the ONNX (Open Neural Network Exchange) format and then build a production-ready API around it. In a company, it can be difficult for a non-machine-learning engineer to get insight from a model you have created, and it is hard for one person to handle both machine learning and app development. A production-ready API makes this easier: other developers can integrate your work into a better web app without touching the model itself.
This article is aimed at readers who don't have much experience with model deployment and, instead of going directly into a complex model, want to start from a basic one. Before moving forward, clone this repository and then follow along.
By the end, you’ll understand how to:
- Save a model in ONNX format.
- Create an API using FastAPI and uvicorn.
- Containerize the application using Docker.
- Deploy the Docker image on AWS Elastic Beanstalk.
- Test the API using Python.
Why not put the ML model in the back end?
Before moving forward, let’s talk about why we are doing this in the first place.
Suppose you are building a website or a mobile app that will use a machine learning model. Why not put the model code directly in the back end? Some reasons to keep it separate are:
- The model is easier to scale when it is a separate entity rather than code embedded in the back end.
- It is easy to integrate with multiple apps, as developers need not worry about the ML part and can use the API as a black box.
Now, let’s start the deployment.
Open Neural Network Exchange (ONNX)
ONNX is an open standard for exchanging machine learning and deep learning models. It makes models portable and prevents vendor lock-in: once a model is saved in the .onnx format, a single onnxruntime session can serve it regardless of which library trained it. This tutorial shows how to save a simple random forest classifier in ONNX format and generate inferences from it, all inside the FastAPI app.
How to save a model in ONNX format?
For this tutorial, the well-known Iris dataset will be used, as it is easy for a beginner to follow along. Before moving to the code, make sure you install skl2onnx by running:
pip install skl2onnx
If you open the model.py file you’ll see the code.
There are two functions in this file. The first, load_data(), loads the Iris data and saves the column names and the target names in pickle files.
We will call the API by passing a dictionary whose keys are the column names (in this case petal_length, petal_width, sepal_length, and sepal_width) and whose values are the feature values. The output will be the number 0, 1, or 2, which we map back to the flower names to make the result easier to understand.
The second function trains a random forest model and saves it in ONNX format. Line 42 defines the initial data type, which in our case is a FloatTensorType with shape [None, X.shape[1]] (None lets the model accept any batch size). We then use the convert_sklearn function (Line 45) from skl2onnx, passing it the rf object and the input data type. Finally, we open a file, serialize the model to a string (Lines 48-49), and save it under the name rf_m.onnx. To learn more about ONNX, follow this.
Now, if you've understood this, open the command line, go to the folder containing model.py (in this case, the Fast_api_model-master folder), and execute it with python model.py.
After executing it, you'll see the files rf_m.onnx, target.pickle, and features.pickle in the app folder.
Creating an Endpoint using FastAPI
The majority of articles on cloud deployment use Flask, and there's nothing wrong with that, but it is not as fast as FastAPI. Just like Flask, an endpoint can be created with minimal code, and FastAPI automatically generates OpenAPI (Swagger) and ReDoc documentation. I won't go into too much detail here; to learn more about FastAPI, click here.
To host this model, we will use uvicorn. It is an ASGI server, which supports asynchronous request handling, unlike a more traditional WSGI server; this speeds things up. Before proceeding further, make sure you install FastAPI and uvicorn.
pip install fastapi
pip install uvicorn
To run inference you will need onnxruntime. Install it by running:
pip install onnxruntime
After installing the necessary packages, we create the main.py file, which contains the code for the endpoint.
We start by importing the necessary libraries; just after that, an object of the FastAPI class is created and named app. This will be used to run the application. We then load the target and features files created by model.py, and create an inference session. The same inference session API is used irrespective of the ML library the model came from, which is one of the reasons the ONNX format was created. From this session we retrieve the input and output names that were set when the ONNX file was saved.
One of the reasons FastAPI is preferred is its ability to generate documentation. We enable this by creating a class Data that inherits from pydantic's BaseModel; in this class, we declare the features used to generate the prediction.
The most crucial step in creating an endpoint is handling GET or POST requests. Here we access the API using a POST request, which we achieve with the @app.post decorator.
This tells the application to call the predict function when a POST request arrives. In this function, we first convert the data to a NumPy array. Since we pass one data point at a time, we reshape it to (1, 4), where four is the number of features in the dataset. Because we defined the input data as a float tensor, we explicitly cast the input to float32. To generate a prediction, we run the session, passing the output name, the input name, and the data to predict on. The prediction is then converted into a dictionary and returned.
Running the model and generating requests
To run this model, open cmd, traverse to the Fast_api_model-master folder, and start the server with uvicorn (for example, uvicorn main:app).
The server will start, and you can access it at http://127.0.0.1:8000 (uvicorn's default port).
Use the requests.post() method to send a POST request to the API. There are multiple ways to send a POST request, but I found this method relatively easy.
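A minimal sketch of such a request, assuming the server is running locally on port 8000 and exposes a /predict route:

```python
# Call the locally running API with one Iris observation.
import requests

payload = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

try:
    response = requests.post("http://127.0.0.1:8000/predict",
                             json=payload, timeout=5)
    print(response.json())  # e.g. {"prediction": "setosa"}
except requests.exceptions.RequestException:
    print("Server is not reachable; start it with uvicorn first.")
```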
To access the Swagger page, go to the URL where the uvicorn server is running and add /docs at the end. The Swagger page shows the schema with the features used in the dataset, and you can even test the API from there.
Containerizing the application
The portability of containers enables easy and quick deployment to multiple hardware platforms and operating systems. To achieve this, we will use Docker. To set up Docker, follow the instructions here. You'll also need a Docker Hub account.
To create a Docker image, two files will be required:
The Dockerfile contains the instructions for creating the environment, installing dependencies, and running the application.
The requirements.txt file contains the necessary packages required to run the application.
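A minimal Dockerfile sketch for an app like this one. The directory layout (an app/ folder containing main.py), the base image, and the port are assumptions, not the repository's exact file; the port 5000 matches the container port used in the docker run command below.

```dockerfile
# Sketch of a Dockerfile for the FastAPI app; paths are assumptions.
FROM python:3.8-slim

# Install the dependencies listed in requirements.txt.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy the app folder (main.py, rf_m.onnx, and the pickle files).
COPY ./app /app
WORKDIR /app

# Serve the FastAPI app on container port 5000.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
```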
Inside the Fast_api_model-master directory run:
docker build -t <image_name> .
docker run --name <container_name> -p 8000:5000 <image_name>
The -p flag maps port 8000 on the host to port 5000 inside the container.
To test the application, first determine the Docker host IP (with Docker Toolbox, run docker-machine ip; on a native Docker install it is simply localhost):
In this case, it is 192.168.99.100. We can test this just like we tested the local deployment.
Push this image to your Docker Hub account by running these commands:
docker tag <image_name> <dockerhub_username>/<image_name>:<tag_name>
docker push <dockerhub_username>/<image_name>
AWS Elastic Beanstalk
Now we host the app on AWS. You'll need an AWS account (if you don't have one, create it from here). It will ask for a credit/debit card; if you follow the options without changing any configuration, you'll be eligible for a free-tier account, and the cost will be minimal.
After creating the account, search for Elastic Beanstalk and open it. The following screen will be visible.
Click Create Application, add a name and a description, and then specify the platform, which in this case will be Docker running on Linux.
For the application code, we have two options: select "Sample code" or "Upload your code." We will upload our code, and for that we create a JSON file named Dockerrun.aws.json. The JSON file looks like this:
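A sketch of a single-container Dockerrun.aws.json (version 1); substitute your own Docker Hub username and image name, and note that the container port 5000 matches the port the app listens on inside the container.

```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "<dockerhub_username>/<image_name>",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "5000"
    }
  ]
}
```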
Select Upload Your Code and upload this file and click Create Environment.
If you want to customize the environment to your requirements, click "Configure more options." Deployment will take some time, and when it completes, the following screen will appear:
We can test this by going to the link
Remember, this link will be different in your deployment, and you can test it using the requests.post() function just as before.
In this tutorial, we trained a simple random forest classifier on the Iris dataset, saved it in ONNX format, created a production-ready API using FastAPI, containerized it with Docker, and deployed it on AWS. You can now create your own model and try this approach. It makes it easier for developers to work with your machine learning models, because they only have to learn how to use the API, without knowing the nitty-gritty of the model. If you have any feedback, feel free to share it with me. This is my first blog on Tealfeed, and if it helped you, please like the article. Thanks for reading!