Many of us have felt like our phones were listening to us before. You’ll be talking about something with a friend and later you look at your phone and see an advert of the very thing you were talking about.

The fact of the matter is, Facebook isn’t listening to your conversations. Not because of some moral standpoint or belief in consumer privacy (ha!) but because it simply isn’t feasible or efficient to do so.

They know enough about you from your online behaviour and it is far easier and cheaper to collate information on your online activities than to collect and store thousands of exabytes of data or transcribe billions of hours of audio.

So the question is, why does it sometimes feel like they are listening? The answer comes in four main parts.

They have a pretty good idea of what we like and dislike. This comes from our long history of online behaviour and how we interact with Facebook, combined with the same knowledge for every other user. This means they know what you, your friends and your family all enjoy and dislike, which is the first indication of what you might be talking about at any given time. This is what I’ll be addressing in this article.
They have information about our recent online activities, mostly on Facebook but also off Facebook (more on this tomorrow in the next article of the series). They have this information for all of your friends and family too. A contrived example of how this could make it seem like your phone is listening: your friend tells you about a website they think you’d like. Facebook knows it’s a website that you would probably like and that a friend of yours has recently visited it and bought something, so it shows you that website. It seems like they were listening to your conversation but in reality they didn’t need to.
Facebook knows where you are when you log in. They are not necessarily tracking your every movement (again, inefficient and unnecessary) but they know enough about your location to build a decent picture of where you are, where you are going and, most importantly, who else is in that location. If you go to meet your friend and you both log in to Facebook at some point, don’t be surprised that Facebook knows you were in the same location and so is likely to show you something that you and your friend may have talked about. I addressed this location tracking in part 2 of this series.
People often forget that Facebook doesn’t just have your information. It has the same data for each of its 2.7 billion users. Although everyone is unique, there are always hundreds if not thousands of people out there in the world who have very similar interests to you. By observing how these people react to certain adverts and visual cues, Facebook has a pretty good idea of how you will react.

These four points taken together hopefully start to paint a picture of how Facebook - and other tech giants - might seem like they are listening. The truth (which is possibly scarier) is that they know enough about you and your peers to deduce what you are likely to be talking about without having to listen.

This is the third part of my series ‘What does Facebook actually know about me?’, where I use Data Science to analyse the 7,500 files of data Facebook has about me. See Part 1 for a first look at the data, Part 2 for a visualisation of location data and Part 4 (tomorrow) to find out what Facebook knows about my activity on other websites.

A Case Study

Now that I’ve given some background, in this article I’ll visualise and analyse the data I downloaded from Facebook that it uses to determine what I am interested in. These interests are calculated from what I have clicked on, liked and otherwise interacted with in the past, and in turn they determine the ads I see.

In the data there is a folder called ‘ads_and_businesses’ and within that a file called ‘ads_interests’. This is where I’ll start.
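As a rough sketch of this starting point, the snippet below reads a list of topics out of JSON text. The JSON format, the `topics` key and the sample contents are assumptions made for illustration, and the real layout of the export may well differ.

```python
import json


def load_topics(raw_json: str, key: str = "topics") -> list[str]:
    """Return the de-duplicated, alphabetically sorted topics stored under `key`."""
    data = json.loads(raw_json)
    # `key` is an assumed field name; a missing key yields an empty list.
    return sorted(set(data.get(key, [])))


# Hypothetical stand-in for the contents of ads_and_businesses/ads_interests.
sample = '{"topics": ["Pizza", "Technology", "Action Movies", "Pizza"]}'
topics = load_topics(sample)
print(len(topics), topics)
```

With a real export, the string passed to `load_topics` would come from reading the file from disk rather than from an inline sample.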

I’ve Ad enough

The ad interests file is simply a list of words or phrases that Facebook has decided I may be interested in. There are 650 topics in my file, ranging from the fairly normal (‘Action Movies’, ‘Pizza’, ‘Technology’) to the slightly more abstract and abnormal (‘Power (social and political)’, ‘Pressure’, ‘Gratitude’). For each of these: a) I decided whether it was something I probably was interested in and b) assigned it to a category.

The categories I chose were completely subjective and many topics could have fit into multiple categories. The table below shows what the categories were with some examples and the number of topics in each.

The categories to which I assigned each topic

As I mentioned above, I also decided whether each topic was in fact a potential interest of mine. So the big question is: how many of the 650 topics did Facebook get right?

The answer: 514 (79%), which is pretty impressive considering the billions of possible topics they could have picked. We can also see how their performance varied by category. By performance I mean: of the topics that Facebook thinks I am interested in, how many am I actually interested in?
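The headline figure is simply the share of topics that were marked as a genuine interest. A minimal sketch, assuming the manual labels are stored as a list of booleans (the list here is rebuilt from the two counts quoted above, not from the real per-topic labels):

```python
def share_correct(labels: list[bool]) -> float:
    """Fraction of suggested topics that were marked as a real interest."""
    if not labels:
        raise ValueError("no labels to score")
    return sum(labels) / len(labels)


# 514 topics marked as genuine interests out of 650 in total.
labels = [True] * 514 + [False] * (650 - 514)
print(f"{share_correct(labels):.0%}")
```

In classification terms this is the precision of Facebook's suggestions. It says nothing about interests of mine that are missing from the list.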

The best-performing categories were Politics (97.6% correct), Sports (94.5% correct) and Food & Drink (93.6% correct). This is probably down to the way I selected the categories and the amount I interact with these topics online. These three subjects are all things I’m very interested in, which has two key effects. Firstly, Facebook will have more data on the way I interact with posts and videos within these topics and so will have a better idea of what specific aspects I like and dislike. Secondly, these are topics in which I tend to be more open to trying new things or seeing a different perspective (e.g. there might be a political party whose views I disagree with or a food I’ve never tried but I’m still likely to be interested in them).
The worst-performing categories were Person (73.1% correct), Music (52.2% correct) and Location (42.4% correct). These are in direct contrast to the best-performing categories: if I see the name of a person or band I don’t know or haven’t heard of before, I would almost never be interested in them. I also do not follow celebrity news or actively try to find new music, so it isn’t surprising these categories did not do so well, as Facebook won’t have much data about my interactions with them. For the Location category, I had around 25 US states and cities in my list which (with a few exceptions) I wouldn’t ever say I’m interested in hearing or reading about - this is why it appears to have performed so badly.
Business - the largest category amongst my ad interests - performed reasonably well considering the number of topics it contained (81% correct). It is the biggest category and performed well because it is by far the easiest way for Facebook to monetise its data. If they know I’m interested in Adidas, they can show me Adidas products. If they know I’m interested in ‘Gratitude’, it may be a bit harder to figure out how that translates into revenue-generating ads.
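The per-category figures above come from the same calculation applied within each category. A minimal sketch, assuming each topic is stored as a (topic, category, interested) tuple; the rows below are invented placeholders, not my real labels:

```python
from collections import defaultdict


def accuracy_by_category(rows):
    """Map each category to the share of its topics marked as a real interest."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _topic, category, interested in rows:
        totals[category] += 1
        hits[category] += int(interested)
    return {category: hits[category] / totals[category] for category in totals}


# Hypothetical rows, for illustration only.
rows = [
    ("business topic 1", "Business", True),
    ("business topic 2", "Business", False),
    ("politics topic 1", "Politics", True),
    ("location topic 1", "Location", False),
    ("location topic 2", "Location", False),
]
result = accuracy_by_category(rows)
# Print categories from best to worst.
for category, score in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {score:.1%}")
```

Categories with only a handful of topics will swing wildly on a single label, so the counts are worth reporting alongside the percentages.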

All Fun and Games

As well as wanting to know what I like for targeted advertising purposes, Facebook also wants me to interact as much as possible with the platform. They do this by suggesting pages, videos and posts that they think might interest me.

In the files I downloaded, there is a folder called ‘information_used_for_recommendations’. It contains three folders, one each for video, news story and general newsfeed recommendations.

Similar to the ad interests file, these are simply lists of topics that Facebook thinks I will be interested in watching or reading about. The video, news story and newsfeed lists had 22, 35 and 86 topics respectively, with many topics repeated in two lists or even all three.
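One way to check how much the three lists overlap is to count how many lists each topic appears in. A minimal sketch, assuming each list has been loaded into a set; the topic names below are made up for illustration:

```python
from collections import Counter


def list_membership(lists: dict[str, set[str]]) -> Counter:
    """Count how many of the recommendation lists each topic appears in."""
    counts = Counter()
    for topics in lists.values():
        counts.update(topics)  # sets guarantee one count per list
    return counts


# Hypothetical recommendation lists.
recs = {
    "video": {"Rugby", "Football"},
    "news": {"Rugby", "Politics"},
    "newsfeed": {"Rugby", "Football", "Film"},
}
counts = list_membership(recs)
in_all = sorted(t for t, n in counts.items() if n == len(recs))
in_two_plus = sorted(t for t, n in counts.items() if n >= 2)
print(in_all, in_two_plus)
```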

As the lists are much smaller than the ad interest list, I decided to shake it up a bit and take a slightly different approach. I classified each topic into a category (not necessarily the same categories as before) and a subcategory, and rated each from 1 to 5 according to how much it was a legitimate interest of mine, with 5 meaning something I am highly interested in and 1 meaning not at all interested.

The graph below shows how the types of topics Facebook recommends to me depend on the medium through which I consume them.

The proportion of topics of each category which Facebook recommends to me for Facebook Watch, news stories and my newsfeed.

There is a Sport theme across all recommendations but I’m more likely to be recommended this in the form of a sports video rather than a post or page on my newsfeed. In fact, of all the video topics which Facebook recommends to me, only 1 in 5 is not sports-related. This contrasts with my newsfeed recommendations, which are far more varied and include educational and Film & TV topics.
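The proportions in the graph can be worked out by counting categories within each medium. A minimal sketch, assuming each medium is a list of category labels; the lists below are invented, with the video list chosen only to mirror the 1-in-5 ratio mentioned above:

```python
from collections import Counter


def category_shares(categories: list[str]) -> dict[str, float]:
    """Return each category's share of the topics recommended in one medium."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical category labels per medium.
media = {
    "video": ["Sport"] * 4 + ["Entertainment"],
    "newsfeed": ["Sport", "Sport", "Education", "Film & TV", "Entertainment"],
}
shares = {medium: category_shares(cats) for medium, cats in media.items()}
for medium, by_category in shares.items():
    print(medium, {c: f"{s:.0%}" for c, s in by_category.items()})
```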

Below you can see my level of interest in the recommended topics of each category (for categories which had more than 15 recommendations).

Level of interest in recommended topics, by category

Entertainment is the most successful category, with around 65% of recommended topics being something I am obviously interested in. Sport appears to be the least successful. This is because a) I am likely to watch a sports video even if I’m not that interested in the subject and b) I ranked my level of interest in most sports as low because I was comparing them to my favourite sport (rugby), when in reality it would be relatively high compared to, for example, knitting.
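A figure like the one for Entertainment can be computed as the share of topics in a category that score highly on the 1 to 5 scale. A minimal sketch, assuming a score of 4 or 5 counts as "obviously interested" (my assumption here, not a definition from the data) and using invented ratings with a lowered minimum count so the toy data passes the filter:

```python
from collections import defaultdict


def high_interest_share(ratings, min_count=15, threshold=4):
    """Share of topics scoring >= threshold, for categories with more than min_count topics."""
    by_category = defaultdict(list)
    for category, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        by_category[category].append(score)
    return {
        category: sum(s >= threshold for s in scores) / len(scores)
        for category, scores in by_category.items()
        if len(scores) > min_count
    }


# Hypothetical (category, score) pairs.
ratings = [
    ("Entertainment", 5), ("Entertainment", 4), ("Entertainment", 2),
    ("Sport", 3), ("Sport", 2), ("Sport", 5), ("Sport", 1),
    ("Education", 4),
]
interest = high_interest_share(ratings, min_count=2)
print(interest)
```

Moving the threshold from 4 to 3 would flatter Sport considerably, which is the ranking effect described in point b) above.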

Tip of the iceberg

Overall, it seems to me like Facebook has a reasonable idea of what I like and dislike. We do have to bear in mind that when Facebook uses this information they will not simply use it on its own but combine it with a) your historical interaction with the platform, b) the recent activity and likes/dislikes of your friends and c) the activity of everyone else on Facebook.

Combining these three data sources is likely to give a far more accurate representation of your overall personality, likes and dislikes.

It is also worth mentioning that I try not to use Facebook too much these days and so if I spent more time scrolling and interacting, these topics would most likely be more accurate.

Tomorrow I’ll be posting the next article in this series and will be looking at what information Facebook has about my activity on sites other than Facebook. This is something that many people probably don’t even know happens and I personally found it the most surprising, so if that sounds interesting follow me and my publication Data Slice to stay up to date. Thanks for reading!