Questions to Ask When Examining a Graph
Does the story told in the headline match the data used to bolster it?
It’s really common for people to read a tweet and take it as the truth, especially when it comes from official accounts like the White House. And while the Trump White House was notorious for its uptick in data viz – that was terrible – the Biden White House data viz is not much better.
Take a look at this tweet:
In recorded history? Holy cow! We are ROCKING IT.
But take a look at the evidence being used to support this claim. The chart’s x-axis begins at Jan 16 2021. That’s just the start of the Biden administration, not all of recorded history. These two things don’t match.
It’s ok to ask for more evidence.
I hate that we’re here but these days you have to also ask:
Does this data appear accurate and true?
Accuracy (in part) means you pay attention to the scale – is this scale appropriate for this data?
Let’s look at a few examples.
It’s hard to see the problem at first glance. My college-educated intellectual partner couldn’t see the problem until I pointed it out.
The y-axis starts off in increments of 1 and then, conveniently, changes to increments of 0.5 just as we clear the rest of the data and get into the value for 2021. The change in scale at that point makes the 2021 bar look taller than it actually is, as though GDP grew more in 2021 than it really did. (They fixed it here, where the 2021 bar is clearly shorter.)
The scale question shows up in the line chart we looked at earlier… and in a reply the White House posted, in which they pump up the deficit reduction in a column chart.
Looks like a pretty serious reduction, right? Like a third. Wow! Oh wait. The scale starts at 2,000 billion dollars, which is a weird way of saying 2 trillion bucks. The truncated scale here makes it look like the reduction is a LOT bigger than it actually is.
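To see how much a truncated axis can inflate a change, here's a rough sketch. The two deficit values are made-up round numbers for illustration, not the figures from the White House chart, and the function name is just something I picked. The only number taken from the chart is the 2,000 billion starting point.

```python
def apparent_change(before, after, baseline=0.0):
    """Fractional change in drawn bar height when the axis starts at `baseline`."""
    if before <= baseline:
        raise ValueError("`before` must sit above the axis baseline")
    return (after - before) / (before - baseline)


# Hypothetical values in billions of dollars (NOT the real chart data).
before, after = 3000.0, 2700.0

actual = apparent_change(before, after)                   # axis starts at 0
as_drawn = apparent_change(before, after, baseline=2000)  # axis starts at 2,000

print(f"actual change:   {actual:.0%}")
print(f"change as drawn: {as_drawn:.0%}")
```

With these made-up numbers, a 10% reduction gets drawn as a 30% drop in bar height. Same data, very different impression.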
That does NOT always mean that scale must begin at 0 or stretch to 100%.
If 100% isn’t even in the realm of possibility, it doesn’t make sense to include it. For example, let’s say we’re looking at a school’s truancy rate. Since most kids go to school every day, a bad truancy rate would be something like 10%. In fact, an increase from 8% to 10% would be big, bad, and in need of some action. But that increase wouldn’t be detectable if the scale ran to 100%. So, look at what scale would be meaningful for this data. (Sometimes it can be hard to know.)
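Here's a quick way to put numbers on that truancy example. This is only a sketch: the 300-pixel chart height and the 0-to-12% alternative scale are assumptions I made up to show the idea.

```python
def pixels_for_change(start, end, axis_min, axis_max, chart_height_px):
    """Vertical pixels a change from `start` to `end` occupies on a linear axis."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(end - start) * chart_height_px / span


HEIGHT_PX = 300  # assumed chart height, purely illustrative

# Truancy rising from 8% to 10%, drawn on two different scales.
full_scale = pixels_for_change(8, 10, 0, 100, HEIGHT_PX)  # axis runs 0% to 100%
tight_scale = pixels_for_change(8, 10, 0, 12, HEIGHT_PX)  # axis runs 0% to 12%

print(f"0-100% axis: {full_scale:.0f} px")
print(f"0-12% axis:  {tight_scale:.0f} px")
```

On the 0-to-100% axis the increase is a 6-pixel blip. On the tighter axis it takes up 50 pixels, which is a lot harder to miss.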
What data has been left out of this picture?
This tweet contains a data visualization. You can really only tell if you look at it for a while and eventually notice the little legend in the lower right that says one part of a bridge equals 100 actual bridges.
I’m pretty sure I’m gonna have to whip out a tape measure to do this math. But that’s not even my point here. Let’s focus on the claim they’re making: 1,500 bridges, fixed soon. Cool. Wait. Is that a lot of bridges? How many bridges are in the US? Or in my state?
We don’t know the denominator here, so it’s impossible to determine if 1,500 is good or if they’ve set their goals incredibly low. We don’t have the whole story.
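A tiny sketch of why the denominator matters. The two totals below are hypothetical placeholders, not real bridge counts. The point is only that the same 1,500 reads very differently depending on what it's divided by.

```python
def share_fixed(fixed, total):
    """Fraction of all bridges that the fixed ones represent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return fixed / total


fixed = 1500  # the number claimed in the tweet

# Hypothetical totals (NOT real bridge counts), chosen to show the range.
for total in (10_000, 500_000):
    print(f"{fixed:,} of {total:,} bridges = {share_fixed(fixed, total):.1%}")
```

Against the smaller made-up total, 1,500 is 15% of the bridges. Against the larger one, it's 0.3%. Without the real denominator, you can't tell which story you're being told.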
Here’s another line chart:
This time about retail sales. Are you questioning whether that scale should start at 0? Me too. Is $0 in retail sales ever EVER going to be a reality? Not in your wildest hippie dreams.
Beyond that scale issue, take a look at the x-axis. It starts in Jan 2020 – which makes it match the tweet where they reference “over the last year.” Good! However, what did retail sales look like before the pandemic? Those data have been left out of this graph, isolating the story to something brag-able. The data that’s missing would paint a different picture.
Building your BS Detector
Every data designer is also an editor, making choices about what data goes in and what stays out. That’s not to say every graph is manipulative – it’s just that there’s only so much time, space, and attention span. So we have to edit. But yeah, it can cross the line into manipulation to support a specific agenda. Your job is to watch for when the line is being crossed.
Be on the lookout for source information – and sometimes look up the source too. Check who generated the data being visualized and whether they did so in a way that was consistent and research-based.
Which makes it seem like you have to have a PhD to determine whether a graph is any good. What hope does my grandma have (bless her) when she’s scrolling Facebook? It can be so hard to judge truth and accuracy if you don’t have a background in stats and data visualization.
You’ll get wiser at spotting graph accuracy when you create more data viz in your daily life. You’ll learn what choices have to be made in the process of making this sausage and you’ll be able to see those decisions reflected in others’ graphs, too. It’ll put you in a much stronger place to intervene (with grace) when you see misinformation online.