5 ML models every data scientist should know

Machine learning can be confusing. Here's a quick guide to the best models to start with.


Sarah Brabec

2 years ago

Sometimes the hardest part of machine learning is figuring out where to start. ML has made waves in the past few years, and the list of available models keeps growing. Often there isn’t one “right” model for the problem you’re trying to solve, and factors such as the size, quality, and type of your data all affect which algorithm fits your needs best, so it helps to be familiar with several options. Here are five general algorithms that give you a basic understanding of ML and a good place to start.

Logistic Regression Model

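In regression-analysis terms, logistic regression estimates the parameters of a logistic model, but despite its name it is most often used for classification. The most common form handles binary outcomes (yes/no). The model estimates the probability of an event by expressing its log odds, log(p / (1 - p)), as a linear combination of one or more independent variables; fitting the model means estimating the coefficients of that combination. To turn the predicted probability into a class, you compare it with a threshold, commonly 0.5. This makes logistic models a natural fit for classification problems where you want both a predicted label and an estimate of how likely that label is.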
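To make the log-odds idea concrete, here is a minimal from-scratch sketch in plain Python (standard library only) that fits a logistic regression by batch gradient descent on the log loss. The function names, the hypothetical "hours studied vs. passed" data, the learning rate, and the epoch count are illustrative assumptions; in practice you would more likely reach for a library such as scikit-learn's `LogisticRegression`.

```python
import math

def sigmoid(z):
    """Map log odds z to a probability in (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The log odds are a linear combination of the features.
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            error = sigmoid(z) - yi  # derivative of the log loss with respect to z
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def predict(w, b, x, threshold=0.5):
    """Turn the probability into a yes (1) / no (0) label."""
    return 1 if predict_proba(w, b, x) >= threshold else 0

# Hypothetical data: hours studied -> passed the exam (1) or not (0).
hours = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
print(predict(w, b, [1.0]), predict(w, b, [4.5]))  # prints: 0 1
```

Because the features enter only through a weighted sum, a positive learned weight means that more study hours raise the log odds of passing.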

Decision Tree Model

Decision tree models are a type of supervised machine learning model that can be used for classification or regression. A classification model reads input and assigns it to a category; an example of binary classification is a model that decides whether or not an email is spam. A decision tree reaches that decision through a sequence of yes/no tests on the features (for example, “does the email contain more than two links?”). During training, the algorithm chooses each test to best separate the classes, typically by measuring impurity with a criterion such as the Gini index, and each leaf at the end of a path assigns a category. Decision trees are a good tool for data exploration and learning: they are easy to interpret, fast to train, and can be quite accurate, although a single deep tree tends to overfit its training data.
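The sketch below is a minimal CART-style classification tree in plain Python that picks each test by minimizing weighted Gini impurity, which is one common criterion (entropy-based information gain is another). The two email features (number of links, whether the word “free” appears), the labels, and the `max_depth` default are hypothetical values chosen for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two labels drawn with replacement differ."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) test with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:  # keep only splits that reduce impurity
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Split recursively until a node is pure, no split helps, or max_depth is reached."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    go_left = [row[f] <= t for row in X]
    left_X = [row for row, g in zip(X, go_left) if g]
    left_y = [label for label, g in zip(y, go_left) if g]
    right_X = [row for row, g in zip(X, go_left) if not g]
    right_y = [label for label, g in zip(y, go_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left_X, left_y, depth + 1, max_depth),
        "right": build_tree(right_X, right_y, depth + 1, max_depth),
    }

def predict(tree, x):
    """Answer each test from the root down until reaching a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical features per email: [number of links, contains "free" (1/0)].
emails = [[0, 0], [1, 0], [0, 1], [2, 0], [5, 1], [7, 1], [6, 0], [8, 1]]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]
tree = build_tree(emails, labels)
print(tree)  # {'feature': 0, 'threshold': 2, 'left': 'ham', 'right': 'spam'}
print(predict(tree, [4, 1]))  # spam
```

On this toy data the first test chosen, "links <= 2", already separates the classes perfectly, so both children become leaves; the resulting tree can be read directly as a rule, which is where the interpretability of decision trees comes from.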

Naive Bayes

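Naive Bayes algorithms (NB for short) are Bayesian classifiers: viewed as a Bayesian network, they have one node for the class and one node for each of the columns or features in your data. They use Bayes’ theorem to compute the probability of each class given the observed features and predict the most probable class. The “naive” part is the simplifying assumption that all features are independent of one another once the class is known. That assumption rarely holds exactly, but it lets the model multiply simple per-feature probabilities together, which makes NB fast to train and often effective for tasks such as text classification and spam filtering.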
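As a minimal sketch, here is a multinomial Naive Bayes spam filter in plain Python. Under the independence assumption, the score for each class is log P(class) plus the sum of log P(word | class) over the words in the message; add-one (Laplace) smoothing keeps a word that never appeared with a class from zeroing out that class's probability. The tiny training set and the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(class) and log P(word | class) with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    log_prior = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_likelihood

def classify(doc, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum of log P(word | c) (naive independence)."""
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        for w in doc.lower().split():
            if w in log_likelihood[c]:  # words never seen in training are skipped
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical training messages.
docs = [
    "win free money now",
    "free prize click now",
    "claim your free prize",
    "meeting agenda for monday",
    "lunch on monday",
]
labels = ["spam", "spam", "spam", "ham", "ham"]
prior, likelihood = train_nb(docs, labels)
print(classify("free money", prior, likelihood))      # spam
print(classify("monday meeting", prior, likelihood))  # ham
```

Summing log probabilities instead of multiplying raw probabilities gives the same ranking of classes while avoiding numeric underflow on long messages.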

K-Nearest Neighbors

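The KNN model is a supervised, non-parametric model that uses proximity to classify a new data point or predict a value for it. To make a prediction, it finds the k training points closest to the new point (for example, by Euclidean distance) and takes a majority vote of their labels for classification, or the average of their values for regression. KNN relies on the assumption that similar data points lie close to one another. Because the work happens at prediction time, KNN has essentially no training step, but predictions can be slow on large datasets and are sensitive to how the features are scaled.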
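A minimal sketch of KNN classification in plain Python: rank the training points by distance to the query and take a majority vote among the k closest. It assumes numeric features, Euclidean distance via `math.dist` (available since Python 3.8), and an odd `k` to reduce tied votes; the function name and the sample points are illustrative.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank every training point by Euclidean distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D points forming two clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
groups = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, groups, (1.5, 1.5)))  # A
print(knn_predict(points, groups, (6.5, 6.0)))  # B
```

Sorting all training points for every query keeps the sketch short; on larger datasets a spatial index such as a k-d tree is a common way to speed up the neighbor search.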

Support Vector Machines

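In machine learning, support vector machines (SVMs) are supervised learning models used for classification and regression analysis. For classification, an SVM looks for the boundary (a hyperplane) that separates the classes with the widest possible margin; the training points closest to that boundary are the support vectors that give the model its name. With kernel functions, SVMs can also learn non-linear boundaries. SVMs are often highly accurate, especially on high-dimensional data with a moderate number of samples, but training a kernel SVM becomes computationally expensive as the dataset grows, so they are not always cheaper to run than other models.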
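To show the margin idea without a library, here is a minimal linear SVM trained by full-batch subgradient descent on the regularized hinge loss, lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))). This is a simple primal, linear-only approach, not the kernel-based solvers used by dedicated SVM libraries; labels are assumed to be +1/-1, and the hyperparameters and the toy points (roughly centered at the origin) are illustrative assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent.

    Labels must be +1 or -1; the bias b is left unregularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Report which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, roughly centered 2-D points with labels +1 / -1.
X = [(2, 1), (1, 2), (2, 2), (-2, -1), (-1, -2), (-2, -2)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, p) for p in [(3, 3), (-3, -2)]])  # [1, -1]
```

Points whose margin is already at least 1 contribute nothing to the hinge gradient, so the boundary is shaped by the points nearest to it. A fixed learning rate keeps the sketch short but makes the iterates jitter around the optimum; a decaying step size, as in the Pegasos algorithm, is a common refinement.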


There are many more models that deserve recognition. However, these five are a great start if you’re trying to build your understanding of machine learning. Leave a comment sharing your favorite model, and thanks for reading!

