Or how not to. You can use data visualization to inform, or misinform. We want to do the former, not the latter.
Data are used to persuade us — to put forward an argument. A data scientist or data journalist wants to tell a story about an idea, a study, a theory, a new technique.
But sometimes there is a hidden agenda.
Maybe it’s a politician promoting a partisan agenda, or a marketing executive trying to sell a product, but sometimes data are presented in a way that the designer hopes will get you to do something they want — vote for their policies, perhaps, or maybe buy a new washing machine or fridge.
We might hope that data are presented to us fairly but that is not always the case. So what can we do to spot those who might want to mislead us and how do we ensure that we do not inadvertently mislead others?
Let’s consider the ways that charts may lie to us (and then we’ll take a look at some real examples).
5 ways charts can lie
Alberto Cairo tells us that there are five ways that you can deceive your audience using charts and graphs:
- Use Poorly Designed Charts
- Display Dubious Data
- Display Insufficient Data
- Conceal Uncertainty
- Suggest Misleading Patterns
Cairo is an authority on the use of charts and graphs in the media. He learnt his trade in the print media industry working for major newspapers in Spain and Brazil. Now he is the Knight Chair in Visual Journalism at the School of Communication of the University of Miami, Florida, and is a consultant to both private and government organisations.
Cairo is also the author of a number of books, including The Truthful Art: Data, Charts, and Maps for Communication, and The Functional Art: An introduction to information graphics and visualization.
But the list of deceitful methods above comes from How Charts Lie: Getting Smarter about Visual Information (from now on, HCL). This excellent book begins with a necessary chapter on how charts work but fairly swiftly gets into the meat of the subject: how they are lying to us, how to spot that they are lying to us and how not to do it yourself.
There are many ways in which a chart may be poorly designed: using different scales for different measurements on the same chart, for example, or distorting the proportions of graphics, choosing the wrong base line (if it isn’t zero, you need to think about why not) or using 3D charts.
Some would say that simply using pie charts is poor design as the data can normally be better communicated by other means.
They have a point.
If a pie chart has more than two or three segments it can be difficult to interpret because we aren’t that good at judging the quantity represented by a pie segment. It’s much easier to see the difference between a bunch of columns than between the same number of pie segments, which is why a bar chart is often preferable to a pie chart.
Imagine you’re thinking of installing solar panels: you might want to know when it is sunny. Take a look at this pie chart of sunshine hours over a year:
Which was the sunniest month? July, May? Which was the least sunny? January or December by the looks of it.
Now look at a bar chart that shows the same data:
No guesswork required here. It is completely clear which column is larger and you have a much better idea of the numerical difference between the months.
Not as attractive as the pie chart, maybe, but the data is much clearer.
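To see why the pie chart is so hard to read, it helps to look at the angles involved. Here is a minimal Python sketch; the monthly figures are made-up illustrative values, not the data behind the charts above, and `slice_angles` is just a helper name I have chosen.

```python
# Hypothetical monthly sunshine hours (illustrative values only,
# NOT the data used in the charts in this article).
sunshine_hours = {
    "Jan": 60, "Feb": 80, "Mar": 120, "Apr": 170, "May": 210, "Jun": 205,
    "Jul": 215, "Aug": 195, "Sep": 150, "Oct": 110, "Nov": 70, "Dec": 55,
}


def slice_angles(values):
    """Return the pie-slice angle in degrees for each category."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}


angles = slice_angles(sunshine_hours)
for month in ("May", "Jul"):
    print(f"{month}: {sunshine_hours[month]} hours -> {angles[month]:.1f} degrees")
```

With these invented numbers May and July differ by five hours of sunshine, which amounts to roughly one degree of pie slice. In a bar chart the same five hours appear as a difference in height against a common base line, which the eye compares far more easily than angles.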
But if normal pie charts can be tricky to interpret, 3D pie charts can be particularly problematic.
Take a look at the chart, below, which is inspired by an example in HCL. Imagine the pie chart represents my company’s market share in the widget manufacturing space (I’m the orangey-brown segment).
Looks like I’m doing well! I’m not dominating the market but my share looks significant.
But it’s not really true.
The 3D effect doesn’t give a true picture of reality. My pie segment, the one that is pushed out from the others, looks quite big but it is actually the smallest of the four.
This more conventional chart shows the reality of the situation.
Base line effects
The next chart has a different sort of problem. The data comes from HCL and shows the improvement in school graduation rates during the Obama presidency in the US. Both charts show the same data, but the one on the left suggests a dramatic change in achievement (the last column is much more than twice the length of the first). This is misleading because the base line used is not zero but 70%. (The original chart published by the US Administration was graphically more arresting than this one but illustrated the same problem.)
The right-hand chart is a more truthful bar chart with a zero base line. It shows that the increase is real but not as dramatic as the left one would have us believe.
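The size of the distortion can be put into numbers. The sketch below compares how tall the last bar is drawn relative to the first for a given base line. The two graduation rates are hypothetical values chosen only for illustration (they are not taken from HCL or the original chart), and `drawn_height_ratio` is my own helper name.

```python
def drawn_height_ratio(first, last, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if first <= baseline:
        raise ValueError("the first value must sit above the base line")
    return (last - baseline) / (first - baseline)


# Hypothetical graduation rates in percent (illustrative assumption).
first_year, last_year = 75.0, 82.0

truncated = drawn_height_ratio(first_year, last_year, baseline=70.0)
honest = drawn_height_ratio(first_year, last_year, baseline=0.0)

print(f"70% base line:  last bar drawn {truncated:.2f}x the height of the first")
print(f"zero base line: last bar drawn {honest:.2f}x the height of the first")
```

With these assumed values, a seven-point rise is drawn as a bar more than twice as tall when the axis starts at 70%, but only about 9% taller when it starts at zero.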
If a chart does not start at zero then you have to know why not. But a zero base line is not always the right choice either. For example, there is a notorious chart from climate change deniers that plots temperature change on a scale from 0 to 100° Celsius. On this scale the variation in global temperature is hardly noticeable. But this is obviously nonsense as the freezing and boiling points of water (0 and 100°) are of little relevance to climate change. This is a clear case where a base line of zero is totally inappropriate. Most sensible charts show the variation around an average temperature for a recent span of years where the relatively small changes in temperature can be more easily seen.
In 2018 a certain UK bank was reported to have a problem with gender pay differences. The press seized on the headline figure that women were paid around 44% less than men. Here is a fragment of the official UK government web page (concentrate on the chart on the left, for now):
But, while the papers jumped on these figures, they do not tell the whole story. While it is true that women earned less than men overall, the bank reported that men and women who were doing the same job were paid similar salaries. So what was going on?
There was no significant gender pay gap when comparing similar jobs. But there was a problem.
The reason for the headline figure was that there were more women in lower paid jobs than men, so the total pay earned by those women was, of course, lower than the men.
The bank did not have a gender pay gap problem, it had an equal opportunities problem.
The chart on the right shows that the majority of higher paid employees were men and that women tended to be employed in the lower paid jobs.
This is still a problem but it was not the one that was reported!
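A small worked example shows how this can happen. The numbers below are entirely hypothetical (they are not the bank’s figures, and the two-role structure is a deliberate simplification): men and women are paid exactly the same within each role, yet an overall gap appears because of who holds which jobs.

```python
# Hypothetical company: pay is identical for men and women within each role.
pay_by_role = {"junior": 20_000, "senior": 60_000}

# Hypothetical headcounts: women are concentrated in the lower-paid role.
headcount = {
    "women": {"junior": 80, "senior": 20},
    "men": {"junior": 40, "senior": 60},
}


def mean_pay(counts):
    """Mean salary for a group, given its headcount per role."""
    total_pay = sum(pay_by_role[role] * n for role, n in counts.items())
    return total_pay / sum(counts.values())


women_mean = mean_pay(headcount["women"])
men_mean = mean_pay(headcount["men"])
overall_gap = (men_mean - women_mean) / men_mean

print(f"mean pay: women {women_mean:,.0f}, men {men_mean:,.0f}")
print(f"overall gap: {overall_gap:.0%}")
```

In this made-up company the like-for-like gap is zero in both roles, but the headline gap is still large. The single overall figure hides the fact that the difference comes from the mix of jobs rather than from unequal pay for the same job.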
Resolving a gender pay gap issue is relatively straightforward: just give the women a pay rise. But changing the gender ratio at all levels of a company is something that will take a little longer. You can’t just sack most of the men and take on more women, as that would be punishing individual employees for the bad practices of the company. You have to wait for people to move on or retire.
Sometimes a little more digging is necessary to find the real story.
Occasionally we can be misled by a lack of data, or by a lack of clarity about exactly what is being measured.
Don’t forget inflation
When politicians want to boast about the way they have funded something or how wages have risen during their tenure they will often quote absolute numbers — funding has increased by X million dollars, average pay has gone up by Y thousand pounds.
Here is the average pay in the UK over the period from 2001 to 2020 according to the OECD.
The blue line shows the increase in actual GB pounds. In 2001 the average pay was 22,371 GBP and in 2020 it was 36,987 GBP. That’s an increase of 14,616 GBP or about 65%.
Now, if I were a Prime Minister who had been in power for much of that time, I might be tempted to point out that such a significant increase had happened under my watch. But that would not be entirely correct.
Because of inflation, the pound is worth less now than it was two decades ago, so 2001 pounds were worth more than 2020 pounds and the increase is not what it might first seem.
The red line in the chart uses the value of 2020 pounds as a measure of average salary and, as you can see, the rise in pay is much less steep. The starting point in 2001 is 31,542 GBP(2020) and in 2020 it is, of course, the same as before: 36,987 GBP(2020). Now, that is an increase of 5,445 GBP(2020) or about 17% over 19 years. Not so good!
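The arithmetic behind the two lines is easy to check. This short sketch uses only the figures quoted above; the `pct_change` helper and the variable names are my own, and the implied price factor is simply derived from those two 2001 figures rather than taken from an official inflation index.

```python
def pct_change(start, end):
    """Percentage change from `start` to `end`."""
    return (end - start) / start * 100


nominal_2001 = 22_371  # average pay in 2001, in 2001 pounds
real_2001 = 31_542     # the same 2001 pay expressed in 2020 pounds
pay_2020 = 36_987      # average pay in 2020, in 2020 pounds

nominal_rise = pct_change(nominal_2001, pay_2020)
real_rise = pct_change(real_2001, pay_2020)

# How many 2020 pounds one 2001 pound is worth, implied by the two figures.
price_factor = real_2001 / nominal_2001

print(f"nominal rise: {pay_2020 - nominal_2001:,} GBP ({nominal_rise:.0f}%)")
print(f"real rise:    {pay_2020 - real_2001:,} GBP ({real_rise:.0f}%)")
print(f"one 2001 pound is worth about {price_factor:.2f} 2020 pounds")
```

Most of the apparent 65% rise is simply the pound losing value. Once both ends of the period are expressed in the same pounds, the rise is about 17%.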
If you see a politician claiming great increases in the value of defence funding, the amount nurses are paid, money for foreign aid, the education budget, or whatever, make sure they are using properly adjusted figures. And if you are researching such evidence then make sure you are, too.
Alberto Cairo provides a similar example in his book. It questions which has been the biggest movie blockbuster over the years and finds that although the actual numbers look small by modern standards, the film Jaws is still one of the biggest financial successes.
What do we mean by average
The example above is reasonable because we are looking at the trajectory of salaries over time rather than looking at how much any one particular person earns.
But if we were interested in the actual salary of the average Brit, would this be a useful chart?
What does average actually mean? Are most people earning around this average salary?
According to the OECD, their definition of average is as follows:
“Average wages are obtained by dividing the national-accounts-based total wage bill by the average number of employees in the total economy, which is then multiplied by the ratio of the average usual weekly hours per full-time employee to the average usually weekly hours for all employees.”
In other words, it is the mean salary, adjusted for the number of hours that people actually work.
But when, as Alberto Cairo points out, a politician says that a tax cut will mean a tax cut of X dollars for the average person, what does that mean? Who is the average person?
An average can be the statistical mean, median or mode of a set of data. The everyday use of the word is normally the statistical mean, the total of a particular measure divided by the number of those measures.
But the mean can be skewed by a small number of very high salaries, so the UK’s Office for National Statistics (ONS) prefers to use the median, which is the middle value when salaries are ranked in order: 50% of workers earn more than that and 50% earn less.
The ONS’s 2021 Annual Survey of Hours and Earnings puts the mean salary for full time workers at 38,552 GBP and the median at 31,285 GBP. Quite a difference.
So, we need to be clear: are we talking about the average salary or the salary of the average person?
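Here is a quick illustration of how far apart the two can be, using Python’s standard `statistics` module. The ten salaries are invented for the purpose (they are not ONS data): nine fairly typical earners and one very highly paid one.

```python
from statistics import mean, median

# Hypothetical salaries in GBP: nine typical earners and one very high earner.
salaries = [22_000, 24_000, 26_000, 28_000, 30_000,
            31_000, 33_000, 35_000, 40_000, 250_000]

mean_salary = mean(salaries)
median_salary = median(salaries)
below_mean = sum(s < mean_salary for s in salaries)

print(f"mean:   {mean_salary:,.0f} GBP")
print(f"median: {median_salary:,.0f} GBP")
print(f"{below_mean} of {len(salaries)} people earn less than the mean")
```

In this made-up group a single large salary drags the mean well above what nine of the ten people actually earn, while the median stays close to a typical salary.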
Being clear about uncertainty
All models are approximations. They have to be as they are simplified versions of reality. The only way of truly predicting an event with 100% accuracy is through the use of a time machine. And even then, if the Back to the Future movies are to be believed, that might not work either.
But when we see that a candidate in an election has a 75% chance of winning, we naturally go away with the belief that they are actually going to win, and if they don’t we complain that the pollster who said this was wrong. A 75% chance of winning, of course, leaves a 25% chance of losing. But it is no good trying to persuade a punter who bet on the favourite that if the election had been held 100 times over, our candidate would have won 75 of them.
One way to alleviate this problem is to use error bars. Here is the same rainfall data we saw earlier but let’s pretend that it is a prediction. The error bars show that our projections are correct to within plus or minus 10%.
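We can’t rerun a real election, but we can rerun a simulated one. The sketch below is a toy model that assumes the 75% forecast is exactly right and that each rerun is independent, which is a big simplification of how real forecasts work; `simulate_elections` is a name I made up.

```python
import random


def simulate_elections(win_probability, runs, seed=0):
    """Count how often the favourite wins across `runs` independent elections."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return sum(rng.random() < win_probability for _ in range(runs))


runs = 10_000
wins = simulate_elections(0.75, runs)

print(f"favourite won  {wins} of {runs} simulated elections")
print(f"favourite lost {runs - wins} of them")
```

Roughly a quarter of the simulated elections go against the favourite. A loss for a 75% favourite is not evidence that the forecast was wrong; it is something the forecast itself says should happen about one time in four.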
Correlation does not imply causation: it’s a statistician’s mantra. Just because two variables change in a way that looks linked doesn’t mean that one is directly affecting the other. For example, there is a probable relationship between ice cream sales and instances of sunburn. So, do we conclude that ice cream consumption causes sunburn?
This unlikely correlation is caused by a confounding variable: in this case, the sun. When it is sunny more people get sunburn and also more people eat ice cream. Cases of sunburn and consumption of ice cream both go up, but each is reliant on a third variable, not on the other.
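This is easy to demonstrate with simulated data. In the sketch below everything is invented: sunshine drives both ice cream sales and sunburn cases through coefficients I picked arbitrarily, and neither has any effect on the other. The `pearson` function is a plain implementation of the Pearson correlation coefficient.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the result is repeatable

# Simulated days: sunshine hours drive both quantities, plus independent noise.
sun = [rng.uniform(0, 10) for _ in range(1000)]
ice_cream = [2.0 * s + rng.gauss(0, 1) for s in sun]
sunburn = [0.5 * s + rng.gauss(0, 1) for s in sun]

raw = pearson(ice_cream, sunburn)

# Subtract the sun's contribution (known here because we built the data).
ice_resid = [i - 2.0 * s for i, s in zip(ice_cream, sun)]
burn_resid = [b - 0.5 * s for b, s in zip(sunburn, sun)]
adjusted = pearson(ice_resid, burn_resid)

print(f"correlation, ice cream vs sunburn:     {raw:.2f}")
print(f"correlation after accounting for sun:  {adjusted:.2f}")
```

The raw correlation is strong even though ice cream and sunburn never influence each other in this model. Once the sun’s contribution is removed, what is left is close to zero. With real data you would not know the true coefficients and would have to estimate them, for example with a regression.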
However, sometimes the confounding variable is less obvious. In HCL, Cairo gives a great example of this. A scatter chart seems to show that life expectancy increases the more that people smoke, an unexpected finding to say the least. On closer inspection, however, what it actually shows is that people in rich countries live longer and smoke more than people in poor countries, who also tend to live shorter lives. In middle income countries it’s a mixed bag of varying lifespans and rates of smoking. Combining the data on one chart shows the strange correlation, but when national wealth is highlighted it is easy to see that it is this that has the major effect on life expectancy.
I hope that little excursion through the five ways that charts can lie has been useful and will help you to think more deeply about how you present your data. The real expert, of course, is Alberto Cairo who gives many more examples than I can here and whose book I highly recommend.
How Charts Lie: Getting Smarter about Visual Information by Alberto Cairo, W. W. Norton & Company (2019)
The Truthful Art: Data, Charts, and Maps for Communication by Alberto Cairo, New Riders (2016)
The Functional Art: An introduction to information graphics and visualization by Alberto Cairo, New Riders (2012)
“There are three sorts of lies: lies, damned lies and statistics” is a phrase often attributed to Mark Twain but its origins probably pre-date his usage.