A time series is a sequence of data points that occur in successive order over some period of time. It allows one to see what factors influence certain variables from period to period. It is used to predict future values based on previously observed values.

Time series forecasting is an important technique in machine learning, applicable to several domains including medicine, weather forecasting, biology, supply chain management and stock price forecasting.

As different time series problems are studied in many different fields, many new architectures have been developed in recent years. This has also been simplified by the growing availability of open-source frameworks, which make the development of new custom network components easier and faster.

Why Deep Learning?

Deep learning neural networks can automatically learn arbitrarily complex mappings from inputs to outputs, and they support multiple inputs and outputs.

Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems inspired by the Biological Neural Networks (BNNs) that constitute animal brains.

An ANN is based on a collection of connected units or nodes called Artificial Neurons, which loosely model the neurons in a biological brain. Each connection can transmit a signal to other neurons.

The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection.

Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
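To make the idea of weighted connections and layers concrete, here is a minimal sketch of a forward pass through a tiny feed-forward network. The layer sizes, the random weights and the sigmoid activation are assumptions chosen for illustration; they are not taken from the models used later in this post.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Pass input x through a list of (weights, bias) layers."""
    signal = x
    for weights, bias in layers:
        # Each neuron sums its weighted inputs, adds a bias and applies a non-linearity.
        signal = sigmoid(weights @ signal + bias)
    return signal

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # input (3 features) -> hidden (4 neurons)
    (rng.normal(size=(1, 4)), np.zeros(1)),  # hidden (4 neurons) -> output (1 neuron)
]
x = np.array([0.5, -1.0, 2.0])
output = forward(x, layers)
print(output.shape)  # (1,)
```

Training would adjust the entries of each weights matrix; here they stay fixed, so the sketch only shows how a signal travels from the input layer to the output layer.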

Our Datasets

The dataset used in this project comes from a Kaggle competition (the M5 Competition), which ran from 2 March to 30 June 2020. The notebook is on my GitHub.

I used these datasets to predict item sales at stores in various locations for a 28-day time period.

Our datasets consist of sales data, generously made available by Walmart, starting at the item level and aggregating to departments, product categories and stores in three US states: California, Texas and Wisconsin. There are 10 stores in total: 4 in California and 3 each in Texas and Wisconsin.

Our first dataset (Train sales) consists of the number of units sold per day, from 2011-01-29 to 2016-06-19, in the 10 stores.

The goods sold are divided into 3 categories: Hobbies, Household and Foods. These are further divided into 7 departments: Hobbies_1, Hobbies_2, Household_1, Household_2, Foods_1, Foods_2 and Foods_3.

Our second dataset (Calendar) contains information about the dates the products are sold.

snap_CA, snap_TX and snap_WI are binary variables (0 or 1) indicating whether the stores in CA, TX or WI allow SNAP purchases on the examined date. A value of 1 indicates that SNAP purchases are allowed.

The United States federal government provides a nutrition assistance benefit called the Supplemental Nutrition Assistance Program (SNAP). SNAP provides low-income families and individuals with an Electronic Benefits Transfer debit card to purchase food products.

Our third dataset (sell prices) contains the price of each product per store and week. The price is provided per week (averaged across seven days). If a price is not available, the product was not sold during the examined week.
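The sketch below shows one way the three datasets could be joined with pandas. The tiny frames are made-up stand-ins, and the column names (d, wm_yr_wk, sell_price and so on) are assumed to mirror the layout of the competition files; check them against the actual CSVs before reusing this.

```python
import pandas as pd

# Toy stand-ins for the three files; values are invented, column names are assumed.
sales = pd.DataFrame({
    "item_id": ["FOODS_1_001", "HOBBIES_1_001"],
    "store_id": ["CA_1", "CA_1"],
    "d_1": [3, 0],
    "d_2": [1, 2],
})
calendar = pd.DataFrame({
    "d": ["d_1", "d_2"],
    "date": pd.to_datetime(["2011-01-29", "2011-01-30"]),
    "wm_yr_wk": [11101, 11101],
    "snap_CA": [0, 0],
})
prices = pd.DataFrame({
    "store_id": ["CA_1"],
    "item_id": ["FOODS_1_001"],
    "wm_yr_wk": [11101],
    "sell_price": [2.0],
})

# Wide (one column per day) -> long (one row per item, store and day).
long_sales = sales.melt(id_vars=["item_id", "store_id"], var_name="d", value_name="units")

# Attach the calendar by day label, then the weekly price by store, item and week.
merged = (
    long_sales
    .merge(calendar, on="d", how="left")
    .merge(prices, on=["store_id", "item_id", "wm_yr_wk"], how="left")
)
print(merged[["item_id", "date", "units", "sell_price"]])
```

The hobbies item has no row in the toy price table, so its sell_price comes out as NaN after the left join, which is how a missing weekly price would surface.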

Difficulties Encountered

Firstly, the datasets are really large, so merging them made my workstation crash a lot. I then moved to Google Colab, which worked for a while before it started crashing too.

I finally moved to Amazon SageMaker and it was a lifesaver.
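A common way to reduce memory pressure before merging large frames is to store numeric columns in the smallest dtype that fits. The sketch below illustrates that general technique on made-up data; it is not necessarily what the notebook does, and the downcast helper is a hypothetical name.

```python
import numpy as np
import pandas as pd

def downcast(df):
    """Return a copy with numeric columns stored in the smallest dtype that fits."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out

# Made-up frame: small unit counts stored as int64, prices stored as float64.
df = pd.DataFrame({
    "units": np.arange(1000, dtype="int64") % 50,
    "sell_price": np.linspace(0.5, 20.0, 1000),
})
small = downcast(df)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting floats loses some precision, so it is worth checking that prices still round to the same cents afterwards.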

Exploratory Data Analysis

Here is the time series plot of the dataset:

The obvious drop is the Christmas effect: sales drop on Christmas Day.

This shows the time series for each of the 3 states. There is a clear upward trend over the years, with California having the most sales. This is because California has 4 stores while the other 2 states have 3 each.

This shows the time series for the different categories. Food is sold the most, followed by household and then hobbies.

This is further broken down to show the different stores in the different states.

As for seasonality, sales usually drop midweek, pick up by Friday, peak on Sunday (except in Wisconsin) and then drop again. Stores there are evidently open on Sundays, unlike here in Germany, where most are shut.

For the monthly seasonality, sales drop by May and then pick up, which could be due to the summer season. They peak in August and then drop, pick up again by November because of the holiday season, and drop at Christmas as the year comes to an end.
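Weekly and monthly profiles like these can be computed by averaging sales per weekday and per calendar month. The sketch below does this on synthetic data with an artificial weekend boost, so the numbers it prints say nothing about the Walmart series.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales, for illustration only: weekends are boosted by 30%.
dates = pd.date_range("2015-01-01", "2015-12-31", freq="D")
rng = np.random.default_rng(1)
units = 100 + rng.normal(0, 5, len(dates))
weekend = np.asarray(dates.dayofweek >= 5)
units[weekend] *= 1.3
daily = pd.Series(units, index=dates, name="units")

# Average by weekday (0 = Monday ... 6 = Sunday) and by calendar month.
weekly_profile = daily.groupby(daily.index.dayofweek).mean()
monthly_profile = daily.groupby(daily.index.month).mean()
print(weekly_profile.round(1))
print(monthly_profile.round(1))
```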

Data Modelling Approach

After merging the datasets and dropping some columns to avoid repetition and save space, I grouped the resulting large dataset so that each column represents one category in one store. I can then fit a model on a single column and predict its next 28 days.
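A minimal sketch of that reshaping, assuming a long-format frame with date, store_id, cat_id and units columns (these names and the random values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up long-format data: one row per date, store and category.
dates = pd.date_range("2016-01-01", periods=60, freq="D")
rng = np.random.default_rng(2)
rows = [
    {"date": d, "store_id": s, "cat_id": c, "units": int(rng.integers(0, 100))}
    for d in dates for s in ["CA_1", "TX_1"] for c in ["FOODS", "HOBBIES"]
]
long_df = pd.DataFrame(rows)

# One column per category/store pair, one row per day.
long_df["series"] = long_df["cat_id"] + "_" + long_df["store_id"]
wide = long_df.pivot_table(index="date", columns="series", values="units", aggfunc="sum")

# Hold out the last 28 days of one column as the forecast horizon.
HORIZON = 28
series = wide["FOODS_CA_1"]
train, test = series.iloc[:-HORIZON], series.iloc[-HORIZON:]
print(wide.shape, len(train), len(test))  # (60, 4) 32 28
```

Splitting by position rather than at random keeps the time order intact, so the model never sees values from the period it is asked to predict.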

Models

LSTM: Long Short-Term Memory networks (LSTMs) are a special kind of recurrent neural network (RNN) capable of learning long-term dependencies. They are designed mainly to avoid the long-term dependency problem of plain RNNs, which struggle to remember information over long periods of time.

An LSTM unit is composed of: a cell state C(t), which carries information along the entire sequence and represents the memory of the network; a forget gate, which decides what is relevant to keep from previous time steps; an input gate, which decides what information is relevant to add from the current time step; and an output gate, which decides the value of the output at the current time step.
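The gates described above can be written out as a single time step in NumPy. This is a sketch of the standard LSTM equations, not the model trained in this project: the hidden size, the random weights and the lstm_step name are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U and b stack the parameters of the four gates."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # pre-activations, shape (4 * n,)
    f = sigmoid(z[0 * n:1 * n])       # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * n:2 * n])       # input gate: how much new information to add
    o = sigmoid(z[2 * n:3 * n])       # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:4 * n])       # candidate values for the cell state
    c_t = f * c_prev + i * g          # updated cell state C(t)
    h_t = o * np.tanh(c_t)            # output (hidden state) at this time step
    return h_t, c_t

n_in, n_hidden = 1, 8
rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for value in [0.2, 0.5, 0.1]:         # a short, made-up input sequence
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h.shape, c.shape)  # (8,) (8,)
```

The cell state c is passed from step to step with only an element-wise scaling and addition, which is what lets information survive over many time steps.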

Neural Prophet: NeuralProphet is a Python library for modelling time-series data based on neural networks. It is built on top of PyTorch and is heavily inspired by the Facebook Prophet and AR-Net libraries.

Its main features are:

PyTorch's gradient descent optimization engine, which makes the modelling process much faster than Prophet,

AR-Net for modelling time-series autocorrelation (also known as serial correlation),

Custom losses and metrics.

Facebook Prophet: Facebook Prophet follows the sklearn model API: we create an instance of the Prophet class and then call its fit and predict methods.

The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
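As an illustration of that input format, the sketch below turns a date-indexed series into a two-column ds/y dataframe. The sales values are invented, and the commented lines at the end only outline how the frame would be passed to Prophet if the prophet package is installed.

```python
import pandas as pd

# Made-up daily sales indexed by date (a stand-in for one category/store column).
sales = pd.Series(
    [120, 135, 128, 150],
    index=pd.date_range("2016-05-20", periods=4, freq="D"),
    name="units",
)

# Prophet expects exactly these two column names: ds (datestamp) and y (numeric).
prophet_df = sales.rename_axis("ds").reset_index().rename(columns={"units": "y"})
prophet_df["y"] = prophet_df["y"].astype(float)
print(prophet_df)

# With the prophet package installed, fitting would then look like:
#   from prophet import Prophet
#   model = Prophet()
#   model.fit(prophet_df)
#   future = model.make_future_dataframe(periods=28)
#   forecast = model.predict(future)
```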

I compared my results to one of a former colleague who used the ARIMA and SARIMA models for the same forecast.

SARIMA: Seasonal Autoregressive Integrated Moving Average, or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.
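In the usual SARIMA(p, d, q)(P, D, Q)m notation, these are P, D and Q, with m as the seasonal period. The sketch below illustrates only the differencing part, D = 1 with m = 7, on a synthetic series with a linear trend and a weekly pattern. It does not fit a SARIMA model, and the series is invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic series for illustration: linear trend plus a repeating weekly pattern.
m = 7                                        # seasonal period (days per week)
t = np.arange(8 * m)
weekly_pattern = np.array([0, -2, -4, -3, 1, 5, 8])
y = pd.Series(50 + 0.5 * t + weekly_pattern[t % m])

seasonal_diff = y.diff(m).dropna()           # D = 1: y[t] - y[t - m] removes the weekly pattern
both_diff = seasonal_diff.diff().dropna()    # d = 1 on top: removes the remaining constant

print(seasonal_diff.unique())  # [3.5]
print(both_diff.abs().max())   # 0.0
```

After seasonal differencing only the trend contribution over one week (0.5 × 7 = 3.5) is left, which is why the series becomes constant.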

Models Comparison

After running the models defined above and obtaining predictions, I plotted them against the actual values, which were provided after the competition for evaluation.

Points to note: the Facebook Prophet and NeuralProphet forecasts were close to the actual values. The LSTM forecast, however, is 1 step ahead of the actual values, which does not look good.
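One way to check whether a forecast is just a shifted copy of the actual series is to compare the error before and after re-aligning it by one step. The sketch below uses made-up data in which the "forecast" repeats the previous day's value; the direction of the shift and the rmse helper are assumptions for illustration, not results from this project.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2)))

# Made-up 28-day horizon: the "forecast" simply repeats the previous day's actual value.
rng = np.random.default_rng(4)
values = 100 + np.cumsum(rng.normal(0, 5, 29))
forecast = values[:-1]          # yesterday's value used as today's prediction
actual = values[1:]

print("RMSE as plotted:       ", round(rmse(actual, forecast), 2))
# Re-align the forecast by one day; an error of zero means it is a shifted copy.
print("RMSE after 1-day shift:", round(rmse(actual[:-1], forecast[1:]), 2))
```

If the re-aligned error collapses like this on real predictions, the model has mostly learned to echo its most recent input rather than to forecast.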

Next points of action

To improve the LSTM model, I could:

Increase the number of hidden units in each LSTM layer,

Add more LSTM layers,

Tune the hyperparameters.

The scripts can be found on my GitHub.

Thanks for reading.