This post is about how to avoid inducing claustrophobia in your data visualizations. Too much text on a graph clutters it up, making readers feel suffocated. So let’s address the checklist item “Labels are used sparingly.”
Sometimes, too much text isn’t the issue. Take a look at this scatterplot, produced with Excel’s default Insert Chart option. It uses data from Radical Math and plots the percent of people of color living in each NYC area against the number of military recruits per 100,000 in those same areas. This version would score zero points because there is no intentional use of labels.
Here is an improvement:
This version would score 1 point. Why? I decluttered the graph a little by removing every other number from the y-axis and moving the correlation notation from inside the graph to the subtitle. I also added axis labels to better orient the reader to the data at hand. (I altered the title and subtitle too, which I discuss in another post.)
But we could take this one step further, for a full 2 points:
If we labeled every data point in this scatterplot, it would be impossible to read. But one of the first questions readers will have about the data is which NYC areas are outliers and which are on the trendline. So we can sparingly label selected data points to provide some context. For example, there’s little surprise that Rikers Island would have no military recruits, since it consists mainly of a jail. Of course, if this scatterplot were interactive, hovering a mouse over a dot or tapping it would reveal the name instead.
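If you build a chart like this in code rather than Excel, one rough way to decide which points deserve a label is to rank them by their distance from the trendline. The sketch below is only an illustration of that idea, not how the chart above was made: the function name `points_to_label`, the cutoff `k`, and the sample values are all hypothetical, and the numbers are made-up placeholders rather than the Radical Math data.

```python
from statistics import mean

def points_to_label(names, xs, ys, k=3):
    """Return the names of the k points farthest from the least-squares trendline."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least-squares slope and intercept
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # Vertical distance of each point from the fitted line
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    # Largest distances first; keep only the top k names
    ranked = sorted(zip(residuals, names), reverse=True)
    return [name for _, name in ranked[:k]]

# Made-up placeholder values for illustration only
names = ["A", "B", "C", "D", "E", "F"]
xs = [10, 20, 30, 40, 50, 60]
ys = [12, 19, 31, 80, 52, 61]

selected = points_to_label(names, xs, ys, k=1)
```

Only the names in `selected` would then get a text label on the chart; every other dot stays bare. A ranking like this only suggests candidates. Which labels actually help tell the story is still a judgment call.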
Bottom line: Use labels sparingly to simplify what you can and then emphasize key points to tell the story.
Check out my other posts related to the Data Visualization Checklist. And go see what Ann Emery has been publishing on the checklist, too!