cft

Predicting Used Car Prices using MindsDB

This tutorial walks you through the process of creating a Predictor model using MindsDB and predicting the prices of used cars based on different Feature parameters.


user

Rutam Prita Mishra

2 years ago | 8 min read

Introduction

With MindsDB, we can start predicting in SQL straight away. We can fetch our dataset and create a model instantly that can be used to predict the Target value on the basis of the Features.

In this tutorial, we will be training a model that can predict the market price of the used cars based on several features like the model, transmission type, fuel type, and so on. We will be using a used car dataset from Kaggle that would contain the data for several car makers ranging from Audi to Volkswagen.

We will then log in to MindsDB Cloud, connect it to our database and then train a model based on the dataset we have in our database, and then try predicting the prices for the used cars.

Importing Datasets as Tables in our Database

Step 1: Once you have downloaded the Dataset from the above-mentioned link, save the CSV file in a folder. Make sure you have MySQL installed on the system and is active currently. You can use MySQL Workbench to test the Database connections and make sure everything is up and running.

Testing Database Connection on MySQL Workbench

Step 2: If the connection is successful, you can click on OK to save the connection. Then click on the connection and it should open the Localhost dashboard along with the query editor.

Localhost Dashboard

Step 3: Now click on Create a new schema in the connected server, provide a name for the schema and then hit Apply, and then click on Finish.

Creating a new Database Schema

Step 4: Once you have created the database, select Schemas from the bottom left and then right-click on the Schema name you just created and select Tabel Data Import Wizard. This should open a dialog box where you can select the CSV file you downloaded and then click on Next.

Table Data Import Wizard

On the next screen, select Create a new table, give a name for the table as you wish and click on Next.

Selecting Table Name and other options

Now, on the Configure Import Settings dialog box, leave everything at default and click on Next, followed by Next again twice and then finally click on Finish.

Table Import Setting Confuguration

This completes the table data import and now our table is ready for manipulations.

Step 5: Now let's just run a sample SELECT query for our table to check whether we have all the data in the table or not.

Select * from vehicles.allcars LIMIT 20;

This should return a list of 20 rows from the allcars table in the vehicles database. We are now ready with the database and the table to perform further operations.

AllCars Table Data in the Database

Connecting MindsDB to the Database

Step 1: Create an account on MindsDB Cloud if you don't have one yet or you can log in to the portal with your existing credentials.

MindsDB Cloud Sign in Page

Step 2: Once you have logged in, you will find the MindsDB Dashboard loaded with a Query Editor on the top and the Result Viewer at the bottom, and some Learning Hub resources for your reference on the right. Now click on Add data from the top right corner of the screen.

MindsDB Dashboard

Step 3: Now you have to select the Database that you want to connect to. For now, we will be working with MySQL, so I will go ahead and select that from the list.

Database Selection on MindsDB Cloud

Step 4: On the next screen, we have to execute the query on the screen passing all the required parameters to establish a successful connection to the Database. To do that, we will now figure out how to expose the localhost:3306 to the web so that MindsDB can interact with the Database over a TCP/IP connection.

SELECT statement to connect the Databse

Step 5: So, we have to download ngrock and expose the localhost and port 3306 to the internet by using this command.

ngrok tcp 3306

Make sure you sign up for ngrok and connect your account so that there's no time limit for hosting your localhost on the internet. You can connect your account with the authorization token that is generated after you have signed up on the website with the following command.

ngrok config add-authtoken YourGeneratedAuthTokenHere

Running ngrock

Step 6: Now map the values of the parameters needed for the connection query as mentioned below and hit the Run (Shift + Enter) button on the screen. The query should execute successfully without giving any errors on the Result Viewer.

CREATE DATABASE display_name (You can choose your desired Display Name)

WITH ENGINE = "mysql", (Name of the Databse Engine you selected)

PARAMETERS = {   

"user": "root", (Default root user or you can create another one)

"password": "password123", (Password for root User or the User you created)

"host": "0.tcp.in.ngrok.io",(Host URL from ngrok as shown in the image)

"port": "10960", (Port exposed by ngrok as shown in the image)

"database": "Vehicles" (This is optional to provide a DB name)

You can also run a query on the MindsDB Web Editor to check whether your database and the tables are now available or not.

Select * from vehicles.allcars LIMIT 5;

This query should return 5 rows from the allcars table in the Vehicles database.

Checking Table Data on MindsDB Cloud

Training a Model

We are now ready with the database and the data needed to achieve our objective of price prediction for the used cars. But first, we need to create a Predictor model that can do the prediction for us. So, let's get started with that.

Step 1: We have to now use the CREATE PREDICTOR syntax to create a Predictor model. Use the code below to get it done.

CREATE PREDICTOR mindsdb.predictor_name (Use the name you want)

FROM databse_name (Use the Database Name- Vehicles)

(SELECT * FROM table_name) (Use the training table name- AllCars)

PREDICT target_parameter; (Use the parameter to predict- Price)

This query should return successful status without any errors in the Result Viewer.

Creating a Predictor
Note: The query might error out sometimes saying you have more than 10K records and MindsDB supports only upto 10K records in it's free version. So, try using LIMIT 10000 within the SELECT query to cut down the records and create the predictor succesfully.

Step 2: Training the model might take some time. So, we can keep checking its status with the following command.

SELECT status

FROM mindsdb.predictors

WHERE name='name_of_the_predictor_model';

Initially, the status should be training and gradually return complete status pointing that the model is now ready to use.

Predictor Model Status

Describing the Model

Now, that we have created a model and already trained it to do the prediction for us, let us dive deep into it and find out the minute details about the model.

MindsDB provides the DESCRIBE statement to explain the different attributes of the available models. We can describe a predictor in the following ways.

  • By Features
  • By Model
  • By Model Ensemble

DESCRIBE mindsdb.model_name.features;

This statement is used to display the way that the model encoded the data prior to training. The output should be as follows.

Model Features Description

We can further break down columns in the result table as follows.

  • column - List of all the columns used from the Training Table
  • type - Type of data inferred
  • encoder - Encoder used
  • role - Defines the role that the column serves for the model (Here Price is the target as we want to predict it and all others are Features based on which we will try to do the prediction.)

DESCRIBE mindsdb.model_name.model;

This statement is used to return the performance of the candidate models used to train the Predictor.

Predictor Candidate Model Description

The columns in the result table can be explained as follows.

  • name - Name of the candidate model
  • performance - Defines the accuracy of each candidate model which varies between 0-1 and the value closer to 1 depicts that the model is more accurate
  • training_time - Time taken by the candidate model to complete the training
  • selected - Either 0 or 1 is set for each candidate model. 1 is set for the most accurate model (LightGBM here) and the rest all are set at 0.

DESCRIBE mindsdb.model_name.ensemble;

This statement returns a JSON-type object explaining the attributes used to select the best candidate model to do the target prediction.

Attributes used to find the Best Model Classifier

Thus, we know our trained model completely now and it's time to query the model to find the predictions.

Querying the Model

We are now ready to do our first prediction based on the features using the SELECT statement. We can predict the price of the used car based on a single feature or a set of features together. Let's try doing it both ways one after the other.

SELECT target_column_name (Name of the Target Column - Price)

FROM mindsdb.model_name (Model Name - used_allcars_price_predictor)

WHERE feature_name="value_of_the_feature"; (Feature Name - For e.g., CarName)

(Feature Value - Name of the Car)

Now, this SELECT statement should return the price of the car we mentioned in the query upon successful execution. You can expect an output as shown below.

Price Prediction of a BMW X4 based on single Feature

Now let's try to predict the price of a car based on more than one parameter using the SELECT statement. The query should look as follows.

SELECT target_column_name (Name of the Target Column - Price)

FROM mindsdb.model_name (Model Name - used_allcars_price_predictor)

WHERE feature_name="value_of_the_feature" (Feature Name - For e.g., CarName, FuelType)

AND feature_name_2="value_of_the_feature_2"; (Feature Value - Name of the Car and Fuel Type)

When this query gets executed without any errors, we can expect to get an output like the one shown below.

Price Prediction of a BMW X6 based on multiple Features

Conclusion

We have finally predicted the car prices as we wanted by using a predictor model powered by MindsDB. As all of you would have noticed, it is so simple to get this task done. All you need to do is connect your database, import a dataset, and run a few simple queries to get the prediction model up and running.

This concludes the tutorial and I hope all of you had a great time reading this. You can use this dataset to train your own predictor models and predict the price of different car makers if you want to have a hands-on after reading this. Feel free to drop in your comments or suggestions and don't forget to give it an upvote if you really found this useful.

Note: All the dashboard images used in this tutorial belongs to MindsDB, MySQL and ngrok and are used here only for illustrative purposes.

Upvote


user
Created by

Rutam Prita Mishra

I am passionate software developer mainly focussed on web development and currently learning Web3. I even write technical articles in my free time and love travelling and exploring regional delicacies.


people
Post

Upvote

Downvote

Comment

Bookmark

Share


Related Articles