Machine learning — a subset of Artificial Intelligence — incorporates neural networks to create some amazing software that we use on a daily basis.

If you used Google to find this Medium article, you used Google’s neural network that ranks the most relevant pages based on the keyword(s) you gave it. If you recently went on Amazon.com, all the recommended products that the website suggested to you were curated by a neural network. Even today, if you used your phone, you probably encountered a neural network that made your life easier. Neural networks are all around us, and they all do different things and work in different ways.

So… What is a “Neural Network”?

The word ‘neural’ refers to neurons, the cells that make up the brain. “So it’s a brain network?” In essence, totally! A neural network is a simplification of our most powerful tool, the brain. It uses neurons that are all connected to each other through weights (the lines in the image below). Each neuron is given some numerical inputs, which are multiplied by the weights and added together. The weights are the heart of the neural network, and by changing them to specific numerical values, we can process an input and get a desired output. A neural network is just a way to process data. Data is the 🔑 here and by manipulating data through a variety of neural networks… we can build very powerful tools that do some insane things!

The Perceptron — The Oldest & Simplest Neural Network

The perceptron is the oldest neural network, created all the way back in 1958. It is also the simplest neural network. Developed by Frank Rosenblatt, the perceptron laid the groundwork for neural networks.

This neural network has only one neuron, making it extremely simple. It takes n inputs, multiplies each by a corresponding weight, and computes a single output. Its simplicity is also its weakness: a single perceptron can only separate data that can be divided by a straight line (linearly separable data), so it cannot learn more complex patterns.
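
To make the idea concrete, here is a minimal sketch of a single perceptron in Python. The bias term, the step activation (output 1 if the weighted sum is positive, otherwise 0) and the specific weight values are assumptions chosen for illustration; they are picked by hand here so the neuron behaves like a logical AND gate, not learned.

```python
def perceptron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add them up, then apply a step function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hypothetical hand-picked values (assumption): they make the neuron act like AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

# Try every combination of two binary inputs: (0,0), (0,1), (1,0), (1,1).
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

AND works because a straight line can separate the one "on" case from the three "off" cases. No choice of weights lets a single perceptron compute XOR, which is the classic example of its limits.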

Use cases:

Understanding the human brain
Scaling up for more advanced neural networks

Multi Layer Perceptron — What are layers?

A multi layer perceptron (MLP) is still built from perceptrons; however, complexity is added by arranging the neurons into layers. There are three types of layers in an MLP:

Input layer:

The input layer is what it sounds like: the data you are inputting into the neural network. Input data has to be numerical. This means you might have to take something that is non-numerical and find a way to make it numerical. The process of manipulating data before inputting it into the neural network is called data preprocessing, and it is often the most time-consuming part of making machine learning models.
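
One common way to turn non-numerical data into numbers is one-hot encoding, where each category gets its own slot in a list. The sketch below is only an illustration of that idea; the `one_hot` helper and the color categories are made up for this example.

```python
def one_hot(value, categories):
    """Return a list with 1.0 in the slot of the matching category and 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]

# Hypothetical categories (assumption): any fixed, ordered list of labels works.
colors = ["red", "green", "blue"]
encoded = one_hot("green", colors)
print(encoded)  # [0.0, 1.0, 0.0]
```

The network never sees the word "green", only the three numbers, which can be fed straight into three input neurons.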

Hidden layer(s):

The hidden layers contain most of the neurons in the neural network and are the heart of manipulating the data to get a desired output. Data passes through the hidden layers and is transformed by many weights and biases. They are called “hidden” layers because developers of neural networks do not directly work with their values, as opposed to the input and output layers.

Output layer:

The output layer is the final product of manipulating the data in the neural network and can represent different things. Often, the output layer consists of neurons that each represent an object, and the numerical value attached is the probability that the input is that specific object. Other times, it could be a single output neuron that gives the value of something for certain inputs. The main idea is that the output layer is the result of the data passing through the neural network, and the goal we are trying to reach.
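
A common way to turn raw output values into probabilities is the softmax function. This article does not name a specific method, so treat softmax as one typical choice; the labels and scores below are invented for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into positive values that add up to 1."""
    largest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a three-neuron output layer (assumption).
labels = ["cat", "dog", "bird"]
raw_scores = [2.0, 1.0, 0.1]

probabilities = softmax(raw_scores)
best = labels[probabilities.index(max(probabilities))]
print(best)  # cat
```

Each output neuron ends up with a value between 0 and 1, and the neuron with the highest value is the network's prediction.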

Feed forward Principle:

The idea is that we pass numerical data into the network and it continues forward through the layers, having many operations done to it. We feed data forward. Getting the right operations, so that a given input produces the desired output, requires training. Training is essentially finding the weights and biases that yield the best results and applying them to the network.
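
Feeding data forward can be sketched in a few lines of Python. The sigmoid activation, the layer sizes and every weight and bias value below are assumptions for illustration; the numbers are untrained, so the output is not meaningful beyond showing the flow of data.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One layer: each row of `weights` (plus one bias) belongs to one neuron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the data through each layer in order; one layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical, untrained parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_layer = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.5, 0.2]], [0.05])

result = feed_forward([1.0, 2.0], [hidden_layer, output_layer])
print(result)
```

Training would adjust the numbers in `hidden_layer` and `output_layer`; the `feed_forward` function itself stays the same.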

Use cases:

Computer Vision
Natural Language Processing
Basis for other neural networks

Convolutional Neural Network — Convolutional Layers?

A convolutional neural network still uses the same principles that MLPs use; however, this neural network adds convolutional layers. It is important to note that convolutional neural networks are usually used for images and video.

It is important to recognize that images are just a grid of numbers, and each number tells you how intense a certain pixel is. Knowing that it is a grid of numbers, we can manipulate these numbers to find patterns and characteristics of the image. Convolutional layers do this by using filters.

Filters

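A filter is a small N x M grid of numbers (N and M are the size of the grid) that is multiplied with the original image many times. At each position, the filter's numbers are multiplied with the pixels underneath and the products are added up to give one new value. To understand what is actually happening, refer to the animation.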

The filter is moved across the grid and produces new values. These new values can represent edges or lines in the image. For example, take the filters below:

The horizontal filter tries to eliminate the values other than the vertical center. It does this by using negative values to get rid of the edges and 0 for the center to make the pixels neutral. If the filter is successful, you will be able to see a horizontal line from the new values. The same is true for the vertical filter, just reversed.
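
Sliding a filter across an image can be sketched as follows. The tiny image and the filter values are assumptions for illustration: the filter is a standard Prewitt-style horizontal edge detector, which may differ from the exact filters pictured in this article.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 5x5 image: dark on top (0), bright below (9), so one horizontal edge.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
    [9, 9, 9, 9, 9],
    [9, 9, 9, 9, 9],
]

# Assumed Prewitt-style horizontal edge filter (not taken from the article's image).
horizontal_filter = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]

feature_map = convolve2d(image, horizontal_filter)
print(feature_map)  # [[27, 27, 27], [27, 27, 27], [0, 0, 0]]
```

The large values appear exactly where the image changes from dark to bright, and the flat bright region at the bottom produces 0. That is the sense in which a filter "finds" an edge.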

After the filters have been applied throughout the full image, we can extract the main features found by the filters using a pooling layer. The numbers in each filter are not chosen by hand; they are learned when training the model, and finding the best numbers yields the best results for the overall task.

Pooling Layer

Pooling layers do what they sound like. They “pool” together the most important characteristics found by the filters. There are multiple ways to do this. One popular method is Max Pooling, where for each small region of the filtered image, the largest number is taken and stored in a new, smaller grid. What this basically does is take the most important characteristics and compress them into one smaller image that can then be passed into an MLP. This process is also known as downsampling, and it reduces the amount of data the rest of the network has to process.
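
Max pooling is simple enough to sketch directly. The 2x2 window size and the example grid are assumptions for illustration; the windows do not overlap here, which is a common setup but not the only one.

```python
def max_pool(grid, size=2):
    """Split `grid` into non-overlapping size x size windows and keep each window's max."""
    pooled = []
    for r in range(0, len(grid) - size + 1, size):
        row = []
        for c in range(0, len(grid[0]) - size + 1, size):
            window = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 output of a filter (assumption).
filtered = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool(filtered)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 grid shrinks to 2x2, keeping only the strongest response from each region.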

Use Cases

Image classification
Computer vision
Find characteristics / patterns in images

Recurrent Neural Network — Temporal Data?

The data we can analyze with neural networks is not confined to static data. Things like images, numbers and frames are all data that can be analyzed on their own. However, data that depends on past instances of itself to predict the future is temporal data. Things like stock market data, time-series data, brain-wave data and more are analyzed by using past instances of a dependent variable. The neural networks mentioned thus far don’t take previous states of the data into account; RNNs are designed to do exactly that.

State Matrices

RNNs remember previous states of data by storing the last output in their own memory. These stored values are called state matrices. An RNN layer works like a normal layer in an MLP, but it also uses the state matrix to calculate the new output. Using previous outputs and states means that past data is taken into account in the final output. This is crucial for applications like stock-market predictions and time-series forecasting.
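
Here is a stripped-down sketch of that memory. To keep it readable, the state is a single number instead of a matrix, and the tanh activation and the weight values are assumptions chosen for illustration.

```python
import math

def rnn_step(x, state, w_input, w_state, bias):
    """Combine the new input with the remembered state to produce the next state."""
    return math.tanh(w_input * x + w_state * state + bias)

def run_rnn(sequence, w_input=0.5, w_state=0.8, bias=0.0):
    """Process a sequence one value at a time, carrying the state forward."""
    state = 0.0  # nothing remembered before the first input
    states = []
    for x in sequence:
        state = rnn_step(x, state, w_input, w_state, bias)
        states.append(state)
    return states

# A single "spike" followed by silence (hypothetical input).
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Even though the second and third inputs are 0, the outputs for those steps are not 0: the network is still "remembering" the first input through its state, with the memory fading a little at each step.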

LSTMs

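Long Short Term Memory networks (LSTMs) further expand on this idea by saving two kinds of state: a long-term state and a short-term state. Gates inside the LSTM decide at each step how much of the long-term state to keep, how much new information to add to it, and how much of it to expose as the short-term state. Information that keeps proving useful can therefore persist in the long-term state and continue to influence how new data is processed.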

LSTMs are well suited to finding patterns in sequential data and are a popular choice for stock market predictors.
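
The two kinds of state can be sketched with a single-number LSTM step. This follows the standard LSTM gate equations, but reduced to scalars; the `params` layout and every weight value are assumptions made up for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, short_term, long_term, params):
    """One scalar LSTM step. `params` maps a gate name to (input weight, state weight, bias)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * short_term + b

    forget = sigmoid(pre_activation("forget"))         # how much long-term state to keep
    write = sigmoid(pre_activation("input"))           # how much new information to store
    read = sigmoid(pre_activation("output"))           # how much state to expose
    candidate = math.tanh(pre_activation("candidate")) # the new information itself

    new_long_term = forget * long_term + write * candidate
    new_short_term = read * math.tanh(new_long_term)
    return new_short_term, new_long_term

# Hypothetical, untrained parameters (assumption).
params = {
    "forget": (0.1, 0.2, 1.0),
    "input": (0.5, 0.1, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (0.9, 0.2, 0.0),
}

short_term, long_term = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    short_term, long_term = lstm_step(x, short_term, long_term, params)
print(short_term, long_term)
```

When the forget gate is close to 1 and the input gate is close to 0, the long-term state passes through a step almost unchanged, which is how an LSTM can hold on to information over many steps.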

Use Cases

Natural Language Processing
Stock Market Predictions
Time based data predictions

Autoencoders — Representing Data in a Compressed Way

Most neural networks take in data and make some type of decision. Autoencoders have a different task: to figure out a way to compress data while keeping as much of its quality as possible.

Traditionally in machine learning, the labels attached to our data are different from the inputs, and the goal of the neural network is to produce them. In an autoencoder, the labels are the same as the inputs.

So, in this architecture, the input and output layers have the same size. The hidden layer is smaller than the input and output layers (in terms of nodes) and is called the “bottleneck”. Since the bottleneck is smaller, the network is forced to find a way to compress the original data and then rebuild it in the output layer. For the kind of data it was trained on, this compression can maintain high quality, sometimes better than conventional means.
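
The shape of an autoencoder can be sketched without any training. Everything below is an assumption for illustration: the weights are picked by hand (a real autoencoder learns them), there are no activation functions, and the 4 to 2 to 4 layout is just a small example of a bottleneck.

```python
def linear(inputs, weights):
    """One layer without bias or activation: each row of `weights` is one neuron."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]

def mean_squared_error(a, b):
    """Average squared difference between two lists of the same length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hand-picked weights (assumption): the encoder averages each pair of inputs,
# and the decoder copies each average back out twice.
encoder = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]         # 4 inputs -> 2 values
decoder = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # 2 values -> 4 outputs

data = [2.0, 2.0, 6.0, 6.0]
code = linear(data, encoder)            # the compressed "bottleneck" version
reconstruction = linear(code, decoder)  # the attempt to rebuild the input
loss = mean_squared_error(reconstruction, data)  # the label is the input itself

print(code)            # [2.0, 6.0]
print(reconstruction)  # [2.0, 2.0, 6.0, 6.0]
print(loss)            # 0.0
```

This input is rebuilt perfectly because it matches the pattern the weights expect. An input like `[1.0, 3.0, 6.0, 6.0]` would be rebuilt as `[2.0, 2.0, 6.0, 6.0]`, with a loss of 0.5, which shows that the compression only works well for data that looks like what the network was built (or trained) for.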

Use Cases:

Mostly for representing large amounts of data in a smaller, compressed way.

Key Takeaways

Multi-layer Perceptrons (MLPs)

Basic neural network
Used for simple tasks

Convolutional Neural Networks (CNNs)

Uses filters and pooling to find characteristics in data
Mostly used for image tasks

Recurrent Neural Networks (RNNs)

Uses previous outputs when figuring out the new output
Used for temporal data

Autoencoders

A way to compress data while keeping as much quality as possible