Are stock prices correlated to trading volumes?
How are stock prices correlated to trading volumes?
Markus Rene Pae
When you download market data for a stock, you usually receive open/high/low/close prices along with trading volumes. This article looks at almost every company on the Standard & Poor’s 500 list and measures the correlation between movements in trading volume and movements in prices.
In capital markets, volume, or trading volume, is the amount of a security that was traded during a given period of time.
About the data
As mentioned, this article covers almost every company on the S&P 500 list. The list of symbols was obtained from HERE. Only BRK.B and BF.B were excluded, because I wasn’t able to get any data for them from the Yahoo Finance API.
Note: my mistake was using the symbols BRK.B and BF.B instead of BRK-B and BF-B. Yahoo Finance uses the hyphenated form.
The period runs from 2000-01-01 to 2019-12-30 (almost two full decades), and one-day intervals should be enough.
Here is something for Python users as well. The following piece of code was used to create the data files needed for the calculations:
[Code listing: script used to obtain two .csv files, one for volumes and one for prices]
The contents of the symbols.txt can be found HERE. As a result, two .csv files were obtained: one for trading volumes and one for stock prices.
Clarification: to get more or less synchronous price data, adjusted close prices were used.
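A minimal sketch of such a download script might look like the one below. It is an illustration under stated assumptions, not the original listing: it assumes the third-party `yfinance` package, a `symbols.txt` file with one symbol per line, and the hypothetical output names `volumes.csv` and `prices.csv`. The helper `to_yahoo_symbol` is also hypothetical; it applies the BRK.B to BRK-B fix mentioned above.

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Convert a class-share ticker such as BRK.B to the hyphenated BRK-B form."""
    return symbol.strip().upper().replace(".", "-")


def download_volume_and_price(symbols, start="2000-01-01", end="2019-12-30"):
    """Return (volumes, adjusted close prices), one column per symbol."""
    # Assumption: yfinance is installed. It is imported here so that the
    # symbol helper above can be used without it.
    import yfinance as yf

    tickers = [to_yahoo_symbol(s) for s in symbols]
    # auto_adjust=False keeps a separate "Adj Close" column in the result.
    data = yf.download(tickers, start=start, end=end,
                       interval="1d", auto_adjust=False)
    return data["Volume"], data["Adj Close"]


def build_csv_files(symbols_path="symbols.txt",
                    volumes_path="volumes.csv",   # hypothetical file name
                    prices_path="prices.csv"):    # hypothetical file name
    """Read the symbol list and write one .csv per data type."""
    with open(symbols_path) as handle:
        symbols = [line.strip() for line in handle if line.strip()]
    volumes, prices = download_volume_and_price(symbols)
    volumes.to_csv(volumes_path)
    prices.to_csv(prices_path)
```

Calling `build_csv_files()` in a directory that contains `symbols.txt` would then produce the two files, provided the Yahoo Finance service is reachable.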
The correlation coefficient measures how strongly two variables are linearly related. It can take values from -1 to 1. To interpret its value, let’s examine the following list:
- Exactly -1 means that they are perfectly negatively correlated. If one goes up, the other one comes down and vice versa.
- -0.7 means a strong negative linear relationship.
- -0.5 means a moderate negative linear relationship.
- -0.3 means a weak negative linear relationship.
- 0 stands for no linear relationship. The two variables may still be related in some nonlinear way.
- 0.3 means a weak positive linear relationship.
- 0.5 means a moderate positive linear relationship.
- 0.7 means a strong positive linear relationship.
- Exactly 1 means that two variables are perfectly positively correlated. If one goes up then so does the other one.
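To make the two extremes of this list concrete, here is a small sketch that computes the coefficient by hand. It assumes the Pearson correlation coefficient is meant (the usual default), and the numbers are made-up toy values, not market data.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves in the same direction as x
falling = [10, 8, 6, 4, 2]   # moves in the opposite direction

print(pearson(x, rising))    # perfectly positive: essentially 1
print(pearson(x, falling))   # perfectly negative: essentially -1
```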
In our analysis, we hope to find some assets whose correlation coefficient is as far from 0 as possible (the sign is not important).
Here are the 10 symbols with the highest correlation coefficient:
Except for AMD, the correlation coefficients are quite low! Therefore it is somewhat wrong to expect the stock prices to go up when the trading volumes increase.
Here are the 10 symbols with the most negative correlation coefficients:
Wow! That’s a different story! According to the list of correlation coefficients above, these fall somewhere between a moderate and a strong negative correlation. This suggests that, for these stocks, spikes in trading volume tend to coincide with lower stock prices and vice versa.
Note: it’s probably hard to predict upward trends in stock prices, since that would require a drop in trading volumes, and such drops often occur right after a jump. That’s my opinion, and it is based on the data I have observed. :)
To get an idea of how those correlations are distributed, see the histogram below:
[Figure: histogram of the correlation coefficients, which often tend to be negative]
From the histogram we can conclude that most of the weight sits near a correlation coefficient of -0.3. Notice also that far more symbols have a negative correlation than a positive one.
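The two observations above (where the bulk of the coefficients sits, and how many are negative) can be checked numerically. The sketch below is only an illustration: the function name `summarize_coefficients` and the 0.1 bin width are assumptions, and the sample values are invented rather than taken from the analysis.

```python
import numpy as np


def summarize_coefficients(coeffs, bin_width=0.1):
    """Bin correlation coefficients on [-1, 1] and report simple summaries."""
    coeffs = np.asarray(coeffs, dtype=float)
    edges = np.linspace(-1.0, 1.0, int(round(2 / bin_width)) + 1)
    counts, edges = np.histogram(coeffs, bins=edges)
    fullest = counts.argmax()  # index of the bin holding the most symbols
    return {
        "share_negative": float((coeffs < 0).mean()),
        "modal_bin": (float(edges[fullest]), float(edges[fullest + 1])),
        "counts": counts,
    }


# Invented example values, NOT results from the article.
sample = [-0.35, -0.32, -0.31, 0.10, -0.60]
summary = summarize_coefficients(sample)
print(summary["share_negative"], summary["modal_bin"])
```

To draw the chart itself, the same coefficients and bin edges could be passed to matplotlib's `plt.hist`.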
How those results were achieved
Here’s the code that was used for this analysis:
Here is a quick walkthrough:
- Lines 3–8 create a list (stocks) that contains all the symbols that are going to be analyzed.
- Lines 10–11 import the trading volumes and adjusted close prices from the .csv files created earlier.
- Line 12 initializes an empty dictionary to store the results.
- Lines 14–16 are a for loop in which the correlation coefficient is computed for each asset and then stored in the results dictionary.
- Lines 18–20 create a Pandas dataframe and store the results in it so that they are in a more readable form.
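The steps listed above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original script, so the line numbers in the list do not apply to it. It assumes both .csv files have dates in the first column and one column per symbol, uses the hypothetical file names `volumes.csv` and `prices.csv`, and correlates the raw daily series with Pandas' default (Pearson) method.

```python
import pandas as pd


def volume_price_correlations(volumes: pd.DataFrame,
                              prices: pd.DataFrame) -> pd.DataFrame:
    """Correlate volume with price for every symbol present in both tables."""
    results = {}  # symbol -> correlation coefficient
    for symbol in volumes.columns.intersection(prices.columns):
        # Series.corr aligns on the date index and ignores missing pairs.
        results[symbol] = volumes[symbol].corr(prices[symbol])

    table = pd.DataFrame({"correlation": pd.Series(results, dtype=float)})
    table.index.name = "symbol"
    # Drop symbols with no usable data, then sort from highest to lowest.
    return table.dropna().sort_values("correlation", ascending=False)


def load_and_rank(volumes_path="volumes.csv", prices_path="prices.csv"):
    """Load the two files (hypothetical names) and return the ranked table."""
    volumes = pd.read_csv(volumes_path, index_col=0, parse_dates=True)
    prices = pd.read_csv(prices_path, index_col=0, parse_dates=True)
    return volume_price_correlations(volumes, prices)
```

With the table sorted this way, `table.head(10)` gives the highest coefficients and `table.tail(10)` the most negative ones.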
Feel free to take this analysis even further. Some examples of what I would probably do:
- Use another data set with shorter time intervals. It would be awesome to see whether the absolute value of the correlation coefficient stays above 0.5.
- Create a machine learning-based price prediction algorithm that will, in the end, give recommendations for investments.
Keep in mind that with those results (with this data) it’s rather difficult to predict rises in stock prices. However, this might be a great tool for predicting downfalls and crashes.
Markus Rene Pae
Technician @University of Tartu | coding, investing, mathematics, data science enthusiast | Medium writer since December 2019 | lifelong learner Medium: https://medium.com/@markusrenepae