The Y-Axis Debate is one of the most hotly discussed topics among cool data nerds, like me and my friends. Going out for drinks with me is either a blast or a bore, depending on your nerd level. So let me clarify the parameters of the debate, including where nerds mainly agree, where they don’t, and what I advise. This post is an update to one I wrote a long time ago, and my thinking has evolved since then.
Data viz nerds agree bar charts must start at zero.
The general idea is that a viewer should be able to use a ruler to measure the pieces of your visualization and find that the measurements are proportionate to the data they represent.
In the case of bar charts, this means that the y-axis must always start at zero.
The bars in a bar chart encode the data by their length, so if we truncate the length by starting the axis at something other than zero, we distort the visual in a bad way. My friend Chris Lysy calls this the “Cable News Axis” because it’s so common in TV news programming.
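To put a number on that distortion, here is a minimal sketch in Python. The function name `drawn_ratio` and the values 50, 55, and 45 are made up for illustration, and it assumes positive values with a baseline below both of them. It compares how long one bar is drawn relative to another against the real ratio in the data.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of bars b and a when the axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must sit below both values")
    return (b - baseline) / (a - baseline)


true_ratio = 55 / 50                          # in the data, b is 10% bigger than a
honest = drawn_ratio(50, 55)                  # axis starts at zero
truncated = drawn_ratio(50, 55, baseline=45)  # axis starts at 45

print(true_ratio, honest, truncated)
```

With the axis at zero, the ruler test passes: the drawn ratio matches the data. With the axis starting at 45, the second bar is drawn twice as long as the first even though the value is only 10% bigger.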
Data nerds don’t always agree that other graph types should have to start at zero.
Outside of bar charts, whether the y-axis must start at zero is still a matter of debate. There are cases where starting at zero wouldn’t make any sense.
If zero is not in the realm of possible data points, perhaps it doesn’t need to be included in the y-axis.
A visualization of stock market activity is a great example. If the y-axis started at zero, the visual would look like a flat line. We wouldn’t see any variation, and the visual would become meaningless for us. But (fingers crossed) zero isn’t a possible data point in a data set for the stock market, so there’s no real justification for starting the axis at zero.
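A rough way to see the flat-line effect, sketched in Python with index values I invented (the helper name `share_of_axis_used` is hypothetical too): measure how much of the axis height the variation in the data actually occupies.

```python
def share_of_axis_used(values, axis_min, axis_max):
    """Fraction of the axis height spanned by the spread between the lowest and highest value."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(values) - min(values)) / (axis_max - axis_min)


# Invented index values, for illustration only
index_values = [20000, 20400, 20150, 20900, 21000]

from_zero = share_of_axis_used(index_values, 0, 21000)
fitted = share_of_axis_used(index_values, 20000, 21000)

print(round(from_zero, 3), fitted)
```

Starting at zero, all the movement in these numbers is squeezed into less than 5% of the chart's height, which is why it reads as a flat line. An axis that runs from 20,000 to 21,000 gives the same movement the full height.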
Other than for bar charts, I advocate for a y-axis that is based on something reasonable for your data. Maybe the minimum of the axis is your historically lowest point. Maybe the minimum should be the point at which you’d have to alert your superiors. Maybe the minimum is the trigger point where your team has decided a different course of action is needed. Whatever you pick, just pick. Make it meaningful and intentional. Not something the software automatically decides for you (though that’s a place to start your thought process).
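One way to make that choice deliberate is to write down the candidates and check each against the data. This is only a sketch: the scores, the three candidate values, and the helper `usable_minimums` are all hypothetical. It keeps the candidates that sit at or below the lowest data point, so no point falls off the bottom of the chart.

```python
def usable_minimums(values, candidates):
    """Keep the candidate axis minimums that would not cut off any data point."""
    lowest = min(values)
    return {label: floor for label, floor in candidates.items() if floor <= lowest}


# Invented monthly scores and candidate minimums
scores = [72, 75, 71, 78, 80]
candidates = {
    "historical low": 65,
    "alert threshold": 60,
    "action trigger": 74,
}

usable = usable_minimums(scores, candidates)
print(usable)
```

Here the action trigger of 74 is ruled out because two of the scores sit below it. You still make the call between the remaining options; the code just keeps the choice explicit. If you chart with matplotlib, the value you settle on can be passed to `ax.set_ylim(bottom=...)`.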
Data nerds don’t always agree where the scale should end.
There are some who think that parts-of-a-whole data, for example, must always run on a scale that goes all the way to 100%.
I think this squishes the data and makes for an awkward graph, where we can’t fully see what’s happening. Similar to my guidance for where a graph should start, it isn’t likely that any of these bars will ever reach 100%, so, in my latest thinking about axis scales, the scale doesn’t have to run to 100%.
We can actually see the data more clearly if we choose an axis that is closer to where our real data ends.
While this does get our data into full view, it might leave out parts of the story. These survey response options (not the data, which I totally made up) come from Human Rights Campaign. Let’s say, just as a hypothetical case, they were interested in moving some people who said “No” into the group who said “No but I identify as an Ally.” Let’s say they know they won’t change the minds of everyone in the No group, but they set a goal to grow the percentage reporting as Allies to 75%. That should become the new maximum for this scale.
And preferably, let’s label the goal as such so that our reasoning is evident. The point is that you should choose a maximum for your scale that makes sense. Maybe the maximum is your goal or your most successful campaign. This way, the axis itself becomes part of the story you need to tell about your data.
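The same idea as a small Python sketch, with invented percentages and a hypothetical helper named `pick_axis_max`: use the goal as the top of the scale, fall back to the highest data point if the data has already passed the goal, and keep a label that says where the maximum came from.

```python
def pick_axis_max(values, goal):
    """Return the scale maximum and a short label saying where it came from."""
    top = max(values)
    if goal >= top:
        return goal, "goal"
    return top, "highest observed value"


# Invented percentages, for illustration only
responses = {"Yes": 12, "No": 48, "No but I identify as an Ally": 40}

axis_max, reason = pick_axis_max(responses.values(), goal=75)
print(axis_max, reason)
```

Returning the label next to the number is a reminder to annotate the chart with it, so a reader can see why the scale stops where it does.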
This advice and more like it are part of the second edition of Presenting Data Effectively, out June 2017 and available for preorder now.