Using Python to Download Sentiment Data for Financial Trading
How to Create a Function that Fetches Market Sentiment Data
Market sentiment is an extremely important part of trading. It allows us to understand the positioning of the players who potentially could move the markets. Knowing that the majority of hedge funds are bullish on an asset gives us more confidence to invest in it.
Conversely, knowing that almost all hedge funds are bullish on an asset can signal that the market is overly bullish, and that it is wiser to wait before investing, or even to initiate a contrarian position if the fundamentals start to justify it.
In this article, we will discuss the well-known Commitments of Traders (COT) report and present an easy way to fetch its values using Python. We will then see how to combine them with their respective assets or currency pairs.
First, though, we will introduce the concept of the report before moving on to the more technical elements.
The COT Report
The U.S. Commodity Futures Trading Commission (CFTC) publishes weekly statistics on the futures market called the Commitments of Traders (COT) report. The report contains a lot of valuable information, notably the number of futures contracts held by market participants (hedge funds, banks, producers of commodities, speculators, etc.).
Two main categories have to be distinguished before going further:
- Big speculators (Funds or non-commercial players): They deal in the futures market for speculative reasons, i.e. to profit from their positions. Examples of speculators are hedge funds.
- Big hedgers (Dealers or commercial players): They deal in the futures markets for hedging purposes, i.e. to cover their operations or other trading positions. Examples of hedgers include investment banks and big industrial and agricultural giants.
We use the word big here because we are interested in participants who can make a sizeable impact on prices when they start buying or selling.
Big can also mean that they invest significantly in research and in understanding the product, and therefore tend to be on the correct side of the market (or on the opposite side, in the case of hedgers).
Every COT report covers many assets, and for each one it gives the long and short positions of both groups of participants (speculators and hedgers). We therefore have four sets of data per asset. To simplify, we can net the longs against the shorts for each group, which leaves two values: net speculative positions and net hedging positions.
In general, net speculative positions tend to be positively correlated with the underlying asset, while net hedgers’ positions tend to be negatively correlated with it:
- Example 1: Speculative net positions on the EURUSD will have a positive correlation with the currency pair.
- Example 2: Hedgers’ net positions on Gold will have a negative correlation with the asset.
Notice in the graph above how the EURUSD is positively correlated with the Speculators’ positioning and negatively correlated with the Hedgers’ positioning. We can form many statistical and technical strategies on the COT report to help us with the long-term forecasts on the assets.
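As a rough illustration of the relationship described above, the sketch below computes the Pearson correlation between a price series and each net-positioning series with pandas. The numbers are made-up placeholders, not CFTC or market data, and the function name positioning_correlations is hypothetical.

```python
import pandas as pd


def positioning_correlations(price, net_speculators, net_hedgers):
    """Return the Pearson correlation of price with each net-positioning series."""
    return {
        "speculators": price.corr(net_speculators),
        "hedgers": price.corr(net_hedgers),
    }


# Placeholder weekly values for illustration only (not real data).
price = pd.Series([1.10, 1.12, 1.15, 1.13, 1.17, 1.20])
net_speculators = pd.Series([10.0, 14.0, 21.0, 17.0, 25.0, 30.0])
net_hedgers = pd.Series([-8.0, -13.0, -20.0, -15.0, -24.0, -31.0])

correlations = positioning_correlations(price, net_speculators, net_hedgers)
print(correlations)
```

With real data, the three series would first need to be aligned on the same weekly dates before calling the function.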
This is a large part of any market sentiment and timing framework. The COT report is sometimes treated as a leading indicator, although it is mostly a coincident one; it is definitely not a lagging indicator.
Getting the COT Data Using Python
The first step is to create a function that opens the desired link and downloads the necessary file. As the documents on the CFTC’s website are zip files, the function also handles the extraction. The plan is therefore for the function to fetch the archive from a URL, save it to our computer, and unzip it.
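import pandas as pd
import numpy as np
import zipfile, urllib.request, shutil
import os

def get_COT(url, file_name):
    # Download the zip archive from the URL and save it locally
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    # Extract the archive into the current working directory
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall()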
After importing the necessary libraries and defining the function, we can call it with the link to the 2020 COT report so that it downloads the file to our computer.
Note that the COT report is published as one file per year (i.e. we need to download the 2020 report, the 2019 report, etc.).
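Because each year is a separate archive, the download can be driven by a list of yearly links. The sketch below only builds those links and downloads nothing. The URL pattern is an assumption inferred from the archive name fut_fin_xls_2020 used in this article and should be checked against the CFTC website; cot_url and cot_urls are hypothetical helper names.

```python
# Assumed URL pattern; verify it on the CFTC website before relying on it.
BASE_URL = "https://www.cftc.gov/files/dea/history/fut_fin_xls_{year}.zip"


def cot_url(year):
    """Build the (assumed) download URL for one year's COT archive."""
    return BASE_URL.format(year=year)


def cot_urls(first_year, last_year):
    """Map each year in the inclusive range to its archive URL."""
    return {year: cot_url(year) for year in range(first_year, last_year + 1)}


urls = cot_urls(2018, 2020)
for year, url in urls.items():
    print(year, url)
```

Each link can then be passed to the download function together with a matching local file name.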
# Downloading and extracting COT files
The rename call in the code above must point to the directory where the Excel file is located. Remember to change the part of the path that says Users\sofienkaabar, as it refers to my personal path.
After running the above two blocks of code, you should see both a zip file named fut_fin_xls_2020 and an Excel file extracted as FinFutYY.xls, which the last two lines of code then rename to 2020.xls.
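One way to avoid editing a hard-coded user path is to do the download, extraction, and rename relative to a working directory. The following is a minimal sketch under that assumption, not the code used in this article; download_cot_year is a hypothetical name, and the URL in the commented example is assumed from the archive name mentioned above.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_cot_year(url, zip_name, extracted_name, final_name, workdir="."):
    """Download a COT zip, extract it, and rename the extracted workbook."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    zip_path = workdir / zip_name

    # Stream the archive to disk.
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)

    # Extract next to the archive rather than into the current directory.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(workdir)

    # Rename relative to workdir, so no user-specific path is needed.
    final_path = workdir / final_name
    (workdir / extracted_name).replace(final_path)
    return final_path


# Example call (URL is an assumption; check it on the CFTC website):
# download_cot_year(
#     "https://www.cftc.gov/files/dea/history/fut_fin_xls_2020.zip",
#     "fut_fin_xls_2020.zip", "FinFutYY.xls", "2020.xls")
```

Renaming right after extraction matters when several years are downloaded, since every archive extracts a file with the same name.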
The next step is to import the Excel file into our Python interpreter (I use Spyder 4.0). The COT file is in its raw form, so we have to clean it up and keep only the parts we need, in the form of a dataframe or, if you prefer, an array.
data_2020 = pd.read_excel('2020.xls')
data_2020 = data_2020[['Market_and_Exchange_Names',
The first line in the above block of code imports the Excel file, while the second one selects the columns we need to keep:
- The market’s name (e.g. Gold or VIX futures).
- The reporting date.
- The percentage of dealers (commercials) who are long.
- The percentage of dealers (commercials) who are short.
- The percentage of funds (non-commercials) who are long.
- The percentage of funds (non-commercials) who are short.
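The selection of these six columns could be sketched as follows. Only Market_and_Exchange_Names appears in this article; the other five column names are assumptions about the layout of the CFTC file and must be checked against the downloaded workbook. The helper select_cot_columns is hypothetical, and the small frame stands in for the real data.

```python
import pandas as pd

# Only 'Market_and_Exchange_Names' appears in the article; the other column
# names are assumptions and must be checked against the downloaded file.
COLUMNS = [
    "Market_and_Exchange_Names",
    "Report_Date_as_MM_DD_YYYY",
    "Pct_of_OI_Dealer_Long_All",
    "Pct_of_OI_Dealer_Short_All",
    "Pct_of_OI_Lev_Money_Long_All",
    "Pct_of_OI_Lev_Money_Short_All",
]


def select_cot_columns(raw, columns=COLUMNS):
    """Keep only the requested columns, failing loudly if any are missing."""
    missing = [col for col in columns if col not in raw.columns]
    if missing:
        raise KeyError(f"Columns not found in the COT file: {missing}")
    return raw.loc[:, columns].copy()


# Tiny stand-in frame with placeholder values (not real COT data).
raw = pd.DataFrame({**{col: ["x"] for col in COLUMNS}, "Other_Column": [0]})
clean = select_cot_columns(raw)
print(list(clean.columns))
```

With the real file, raw would come from pd.read_excel('2020.xls'). Checking for missing columns first gives a clearer error if the file layout differs from what is assumed here.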
Remember that, intuitively (and empirically), the long dealers series should have a negative correlation with the asset, the short dealers series should have a positive correlation with the asset, the long funds series should have a positive correlation with the asset, and the short funds series should have a negative correlation with the asset.
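We can either analyze the four components separately or calculate a net value for each group by subtracting the short percentage from the long percentage: net dealers = dealers long minus dealers short, and net funds = funds long minus funds short.
It is up to the trader to choose whether to net them, leaving only two time series to deal with, or to keep the four series and make a deeper analysis.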
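The netting step amounts to a column subtraction in pandas. Here is a minimal sketch; the column names and the percentages are placeholders chosen for illustration, and add_net_positions is a hypothetical helper.

```python
import pandas as pd


def add_net_positions(df, long_col, short_col, net_col):
    """Return a copy of df with net_col = long_col - short_col."""
    out = df.copy()
    out[net_col] = out[long_col] - out[short_col]
    return out


# Placeholder percentages for illustration only (not real COT data).
cot = pd.DataFrame({
    "dealers_long": [20.0, 22.0],
    "dealers_short": [35.0, 30.0],
    "funds_long": [40.0, 45.0],
    "funds_short": [25.0, 20.0],
})
cot = add_net_positions(cot, "dealers_long", "dealers_short", "net_dealers")
cot = add_net_positions(cot, "funds_long", "funds_short", "net_funds")
print(cot[["net_dealers", "net_funds"]])
```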
The same can be done for previous years as well. I recommend downloading the COT values going back to 2006 so that you have enough history to apply statistical strategies.
import matplotlib.pyplot as plt
plt.plot(COT_USDCAD) # Plotting the COT after concatenation
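The concatenation that produces a series such as COT_USDCAD is not shown here. One possible approach, sketched under assumptions, is to stack the yearly frames with pd.concat, keep the rows for one market, and sort by report date. The date column name, the market labels, and the helper build_market_history are all assumptions, and the values are placeholders.

```python
import pandas as pd


def build_market_history(yearly_frames, market_keyword,
                         name_col="Market_and_Exchange_Names",
                         date_col="Report_Date_as_MM_DD_YYYY"):
    """Stack yearly COT frames and keep one market, sorted by report date."""
    combined = pd.concat(yearly_frames, ignore_index=True)
    # Plain substring match on the market name (no regular expression).
    mask = combined[name_col].str.contains(market_keyword, regex=False)
    history = combined.loc[mask].copy()
    history[date_col] = pd.to_datetime(history[date_col])
    return history.sort_values(date_col).reset_index(drop=True)


# Placeholder frames standing in for two cleaned yearly files (not real data).
frame_2019 = pd.DataFrame({
    "Market_and_Exchange_Names": ["CANADIAN DOLLAR", "GOLD"],
    "Report_Date_as_MM_DD_YYYY": ["2019-12-31", "2019-12-31"],
    "net_funds": [5.0, 9.0],
})
frame_2020 = pd.DataFrame({
    "Market_and_Exchange_Names": ["GOLD", "CANADIAN DOLLAR"],
    "Report_Date_as_MM_DD_YYYY": ["2020-01-07", "2020-01-07"],
    "net_funds": [8.0, 7.0],
})

history = build_market_history([frame_2020, frame_2019], "CANADIAN DOLLAR")
print(history)
```

Sorting after concatenation means the yearly files can be passed in any order.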
An Example of a Strategy
We can use Bollinger Bands to form a volatility barrier around the COT values so that we get statistical extremes and see whether they deliver good signals. For example, if we plot the NZDUSD values versus the speculators’ values with their Bollinger Bands, we get the graph below:
The signal chart below is based on a contrarian strategy that initiates:
- A long order whenever the COT values hit the lower Bollinger Band.
- A short order whenever the COT values hit the upper Bollinger Band.
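The two rules above can be sketched with pandas rolling statistics. This is an illustration only, not the exact setup behind the charts: the 10-period window, the 2-standard-deviation width, the use of pandas' default sample standard deviation, and the name bollinger_signals are all assumptions, and the input series is synthetic.

```python
import pandas as pd


def bollinger_signals(values, window=20, num_std=2.0):
    """Contrarian signals: +1 at/below the lower band, -1 at/above the upper band."""
    mean = values.rolling(window).mean()
    std = values.rolling(window).std()
    upper = mean + num_std * std
    lower = mean - num_std * std

    # Rows without a full window have NaN bands and keep a signal of 0.
    signal = pd.Series(0, index=values.index, dtype="int64")
    signal.loc[values <= lower] = 1    # long when the value hits the lower band
    signal.loc[values >= upper] = -1   # short when the value hits the upper band
    return pd.DataFrame(
        {"value": values, "lower": lower, "upper": upper, "signal": signal}
    )


# Synthetic series: small oscillations with one spike up and one spike down.
values = pd.Series([1.0, -1.0] * 6 + [10.0] + [-1.0, 1.0] * 5 + [-10.0])
result = bollinger_signals(values, window=10, num_std=2.0)
print(result[result["signal"] != 0])
```

Because the bands are computed from the same window that contains the current value, very short windows can never be breached at two standard deviations, which is one reason the window length deserves testing.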
A lot of optimization and a lot of other strategies can be applied but I think it is clear that we can do many things with the COT report.
In this article, we have seen what the COT report is and how to use a function in Python to import it directly from the CFTC website rather than manually downloading and formatting it.
Having used the manual method before implementing it in Python, I can tell you that it is very time-consuming. Automation saves a lot of time when compounded over the years, and the time saved can be used productively elsewhere.
FX Trader | Author of The Book of Back-Tests