Show All the Data
An Academy student wrote me this email:
“We are doing a capacity report, and one of the survey questions lists 11 analytic programs with the average user proficiency. Normally, I would only pull the top 3 or 5 and make an infographic for the information. This time, the partner wants all of the information.”
Super smart to edit down the data to a Top 3 or Top 5 list, especially when you know you’ll be presenting a lot of other data too.
So what do you do if your audience is asking you to show allll the data?
First, let’s interpret that as a really good sign: They’re into the data! Awesome!
Here’s the table of data this student sent with the question:
The simplest place to start is with a bar chart.
I sorted the data in this chart from greatest to least, which makes it easier to see the pattern at a glance. I also put the bar chart on a scale of 0 to 5, assuming that was the scale the respondents used to rate their proficiency.
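If you'd like to try the sort-then-scale step yourself, here's a minimal sketch in plain Python that draws text bars. The program names and averages are made up for illustration (they are not the student's survey results), and `sorted_bars`, `SCALE_MAX`, and `BAR_WIDTH` are hypothetical names I'm using just for this example.

```python
# Hypothetical averages for illustration only; NOT the student's survey results.
averages = {
    "Program A": 2.4,
    "Program B": 1.3,
    "Program C": 3.1,
    "Program D": 1.8,
}

SCALE_MAX = 5   # assumed top of the rating scale
BAR_WIDTH = 40  # characters used to draw a full-length bar


def sorted_bars(data, scale_max=SCALE_MAX, width=BAR_WIDTH):
    """Return one text bar per program, sorted from greatest to least."""
    ranked = sorted(data.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, value in ranked:
        # Bars start at zero, so the length is proportional to the value itself.
        length = round(value / scale_max * width)
        lines.append(f"{name:<10} {'#' * length} {value:.1f}")
    return lines


bars = sorted_bars(averages)
print("\n".join(bars))
```

Sorting happens before anything is drawn, so the order of the rows carries the pattern even before you compare bar lengths.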
Sure, it shows all the data.
However, this student clarified that the scale actually ran 1-5. Well, that takes a bar chart out of the running. We read a bar by its length, so bars need to start at zero, and zero isn't a rating anyone could choose on this scale.
So we swapped in a dot plot instead.
With the adjusted scale, it’s even easier to see that proficiency is pretty low in all programs.
But the second time I reviewed the audience’s request to my student – to see all the data – I heard something different. Perhaps they don’t want to just see the full list of 11 programs and the average scores. Maybe they’re really asking to see every single data point instead of averages.
Beeswarm charts are such a good choice when you want to show the distribution of values within your whole dataset. This student’s beeswarm ended up looking like this:
I sorted this beeswarm such that the program with the highest average proficiency is on the left and the lowest average proficiency is on the right. It’s not too hard to see that there ain’t much activity at the top of this chart (aka not much high proficiency). But it’s a little harder to see how much low proficiency there is because there are so many dots clustering on the 1 line.
This beeswarm can work, but this chart type tends to work better when the possible response options are continuous rather than discrete. You can see what one of those looks like in this post.
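To see why discrete responses pile up, it can help to tally how many dots land on each rating before you chart anything. This is a rough sketch with invented ratings (not the student's data). The `tally` helper is a hypothetical name, and I'm assuming whole-number ratings from 1 to 5.

```python
from collections import Counter

# Hypothetical individual ratings on a 1-5 scale; NOT the student's data.
ratings = {
    "Program A": [1, 1, 1, 2, 1, 3, 1, 5, 1, 2],
    "Program B": [1, 1, 1, 1, 2, 1, 1, 1, 4, 1],
}


def tally(responses, scale=range(1, 6)):
    """Count how many responses fall on each point of the rating scale."""
    counts = Counter(responses)
    # Include ratings nobody chose so every program has the same five rows.
    return {value: counts.get(value, 0) for value in scale}


tallies = {name: tally(values) for name, values in ratings.items()}
for name, counts in tallies.items():
    print(name, counts)
```

With only five possible values, every dot has to stack on one of five lines, which is why the 1 line gets so crowded.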
A solid, middle-ground choice here could be a ridge plot. The ridge plot shows the distribution of the data but doesn’t specify exactly how many people said what, where. You get the big picture of the shape of the data.
The point with this ridge plot is not to focus on the exact numbers (like “17 analysts rated themselves a 1 in Tableau”) but rather to see the big picture.
Like how for most software, the peak is at 1, meaning most analysts say they are aware of the software but not proficient in it.
And other peaks or plateaus, like how at least some analysts report pretty high proficiency in SAS, SPSS, and Tableau.
That might be info you’d assume from looking at the average scores in the dot plot, but the evidence is right in front of your face, no assumptions necessary, here in the ridge plot.
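Under the hood, each ridge is typically a smoothed density curve. Here's a minimal sketch of that smoothing for one program, using a hand-rolled Gaussian kernel density estimate. The ratings are invented (not the student's data), `gaussian_kde` is a hypothetical helper, and the bandwidth of 0.4 is an arbitrary assumption; charting tools pick their own smoothing, so their curves will look different.

```python
import math

# Hypothetical 1-5 ratings for one program; NOT the student's data.
ratings = [1, 1, 1, 1, 2, 1, 3, 1, 4, 5, 1, 2]


def gaussian_kde(values, grid, bandwidth=0.4):
    """Estimate a smoothed density at each grid point using Gaussian kernels."""
    norm = 1 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


grid = [1 + i * 0.1 for i in range(41)]  # 1.0 to 5.0 in steps of 0.1
density = gaussian_kde(ratings, grid)

# The tallest point of the curve is the "peak" you would see in the ridge.
peak = grid[density.index(max(density))]
print(f"Ridge peaks near a rating of {peak:.1f}")
```

Because the curve is smoothed, it fills in shape between the whole-number ratings. That's the trade-off: you get the overall shape rather than an exact head count.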
Matter of fact, you could show *both* the dot and the ridge to take your storytelling step-by-step.
Unless they actually don’t want to see the distribution of the values. What exactly do they mean when they ask to see alllll the data? Start by clarifying that question.