The Y-Axis Debate is one of the most hotly discussed topics among cool data nerds, like you and me and our friends. Going out for drinks with people like us is either a blast or a bore, depending on your nerd level.
So let me clarify the parameters of the debate, including where nerds mainly agree, where they don’t, and what I advise.
Data viz nerds agree bar charts must start at zero.
The general idea is that a viewer should be able to use a ruler to measure the pieces of your visualization and find that the measurements are proportionate to the data they represent.
In the case of bar charts, this means that the y-axis must always start at zero.
The bars in a bar chart encode the data by their length, so if we truncate the length by starting the axis at something other than zero, we distort the visual in a way that makes comparing the lengths misleading. People are likely to draw the wrong conclusions about the data.
My friend Chris Lysy calls this the “Cable News Axis” because it’s so common in TV news programming.
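If you want to put a number on the ruler test, here is a minimal sketch. The function name and the values are made up for illustration, not taken from any real chart. It compares the ratio of two drawn bar lengths with the ratio of the underlying data.

```python
def drawn_bar_ratio(a, b, axis_min=0):
    """Ratio of the drawn lengths of bar a to bar b when the axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("axis_min must sit below both values")
    # A bar is only drawn from axis_min up to its value.
    return (a - axis_min) / (b - axis_min)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
low, high = 35, 40

data_ratio = high / low                                   # what the data says
zero_based = drawn_bar_ratio(high, low, axis_min=0)       # what a ruler measures
truncated = drawn_bar_ratio(high, low, axis_min=30)       # what a ruler measures

print(f"data: {data_ratio:.2f}x, zero-based bars: {zero_based:.2f}x, truncated bars: {truncated:.2f}x")
```

With the axis at zero, the ruler agrees with the data. Start the axis at 30 and the taller bar is drawn twice as long, even though the value is only about 14% bigger.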
Data nerds don’t always agree that other graph types should have to start at zero.
Outside of bar charts, whether the y-axis must start at zero is still a matter of debate. There are cases where it wouldn’t make any sense.
Want a good example? Look at a graph of stock market activity.
If the y-axis started at zero, the visual would look like a flat line. Without the title pointing it out to you, you’d totally gloss over that pre-election dip. The visual would become meaningless for us.
But (fingers crossed) zero isn’t a possible data point in a stock market data set, so there’s no real justification for starting the axis at zero.
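To see why a zero-based axis flattens a line like this, here is a rough sketch with invented index values (not real market data). It measures how much of the axis height the data actually occupies.

```python
def fraction_of_axis_used(values, axis_min, axis_max):
    """Share of the axis height covered by the spread between the lowest and highest value."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(values) - min(values)) / span


# Hypothetical closing values for an index, for illustration only.
closes = [4100, 4150, 4020, 4180, 4200]

zero_based = fraction_of_axis_used(closes, axis_min=0, axis_max=4200)
zoomed = fraction_of_axis_used(closes, axis_min=4000, axis_max=4200)

print(f"zero-based axis: {zero_based:.0%} of the height")
print(f"axis from 4000 to 4200: {zoomed:.0%} of the height")
```

On the zero-based axis, all the movement is squeezed into a sliver of the chart, which is why a dip becomes so easy to gloss over.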
So here’s my take: If zero isn’t in the realm of possible data points, it doesn’t need to be included in the y-axis.
You’re graphing average SAT scores for your high school seniors – zero isn’t in this dataset. Not gonna happen.
You’re graphing total sales volume for your new phone model. It’ll never be zero.
You’ve got data on customer calls to the complaint line. You KNOW that’s never gonna hit zero.
Zero shouldn’t be in your scale.
Other than for bar charts, I advocate for a y-axis that is based on something reasonable for your data.
Maybe the minimum of the axis is your historically lowest point. Maybe the minimum should be the point at which you’d have to alert your superiors. Maybe the minimum is the trigger point where your team has decided a different course of action is needed.
Whatever you pick, just pick.
Make it meaningful and intentional. Not something the software automatically decides for you (though that’s a place to start your thought process).
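One way to make that choice explicit is to write down your candidate minimums and pick among them on purpose. This is just a sketch of one possible rule, not the only reasonable one: the helper name, the labels, and the numbers are all hypothetical. It takes the highest candidate that doesn't cut off any of the data.

```python
def pick_axis_min(values, candidates):
    """Return (label, value) for the highest candidate that sits at or below every data point."""
    lowest = min(values)
    # Keep only the candidates that would not clip the data.
    usable = {label: v for label, v in candidates.items() if v <= lowest}
    if not usable:
        raise ValueError("every candidate would cut off part of the data")
    label = max(usable, key=usable.get)
    return label, usable[label]


# Hypothetical average scores and hypothetical reference points.
scores = [430, 470, 455, 490]
candidates = {"historical low": 410, "alert threshold": 450}

label, axis_min = pick_axis_min(scores, candidates)
print(f"start the y-axis at {axis_min} ({label})")
```

If you plot with matplotlib, the resulting value could be passed to `ax.set_ylim(bottom=axis_min)`. The label is worth keeping around too, since it tells your audience why the axis starts where it does.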
Data nerds don’t always agree where the y-axis should end.
There are some who think that percentage-based data must always run on a scale that goes all the way to 100%.
As with my guidance for where a y-axis should begin, think about what's actually possible: it isn't likely that any of these bars will ever reach 100%. So 100% isn't totally justified in the scale, especially if something else would tell a more accurate story.
We can actually see the data more fully if we choose an axis that’s closer to where our real data ends.
This fuller view could be accompanied by a different title.
80% makes sense – it’s within the realm of possible data points for this dataset.
Let’s say that we work on a campaign to persuade those who are 65 and older to support abortion access and our org has set a goal to have 90% in support.
We’re now justified in extending the scale to include 90% and labeling that goal.
The point is that you should choose a maximum for your scale that makes sense. Maybe the maximum is your goal or your most successful campaign.
This way, the axis itself becomes part of the story you need to tell about your data.
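The same thinking can be sketched for the top of the scale. The 80% and 90% figures come from the example above. The support percentages, the function name, and the choice to round up to the next tick are assumptions for illustration.

```python
import math


def pick_axis_max(values, goal=None, tick_step=10):
    """Round the larger of the data maximum and the goal (if any) up to the next tick."""
    top = max(values)
    if goal is not None:
        top = max(top, goal)
    return math.ceil(top / tick_step) * tick_step


# Hypothetical support percentages by age group, for illustration only.
support = [62, 71, 78]

print(pick_axis_max(support))            # scale ends near where the data ends
print(pick_axis_max(support, goal=90))   # scale stretches to include the goal
```

Without a goal, the scale stops at 80%, close to where the data ends. Once the 90% goal is part of the story, the scale grows to include it.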
No one else can tell you where your y-axis should begin or end. It’s something you have to decide on a case-by-case basis when you’re assessing the most truthful way to show your data.
That’s why I include your y-axis scale in the Data Visualization Checklist: you have to actually put some thought into what it should be.