We all see data visualizations every day, from horrid chart junk pieces that trigger an immediate mental gag reflex, to exquisite renderings that invite interaction and discovery. Most are just trying to convey a fact or two quickly and efficiently, and that’s fine too. That said, there are a few basic rules to keep in mind: never use pie charts and minimize the non-data pixels for starters.
Some techniques are not super intuitive, like when to use a diverging stacked bar chart, but a lot of it is just common sense, maybe with a dash of flowchart thrown in. Here we are going to take a deep-ish dive into how to improve a common way to display state-level data. Our subject is this choropleth from a recent article in JAMA Network Open:
Note that I’m not opining on the accuracy of the research for this article or anything else about the quality (although… no margin of error?) I’m just using it as a fairly random example of a common way of displaying this type of data. The attraction of mapping data is obvious: it shows real state boundaries on an actual to-scale map. The downsides are equally obvious, the biggest being the large states take on disproportionate visual importance while the small states just plain disappear. Go ahead, try to find Rhode Island on the map above.
To be fair, the authors do provide a table with the values so you can look up any individual state, but in the table you lose any ability to visually compare states, or to quickly see which are on the high end of the scale and which are on the low end. So the viewer is forced to jump back and forth between the map and the table, a frustrating experience and unnecessary with better design choices. Let’s look at two options, one quick and easy and the other one not so much.
Option 1 of 2: Just Use a Chart
Even for geographic data charts are usually the default best way to handle its display. For this data set, a simple dot chart (in this case, a bar chart with error bars added and reformatted to dots and the actual bars hidden) with a median value, does the trick nicely:
The chart allows quick visual comparison of values sorted by highest value to lowest. The dot chart as the added benefit of not using nearly as much ink as a bar chart when printed. There is a more involved way of constructing a dot plot, detailed here, that instead of error bars for dots uses a second data series to make a scatter plot. This gives you more control over dot formatting:
Option 2 of 2: Build a Better Map
OK, OK, so you really like the map. We can do that, but let’s at least modify it to get rid of the relative size problem. The standard way of doing that is using a tile grid map, which could be squares, circles, or hexagons. When using a spreadsheet squares are probably your easiest option, for which we found this detailed walkthrough very helpful. It allowed us to build our own version:
There are a number of ways to represent the United States in a square tile grid pattern. The point is this method lets the viewer quickly see where the hot spots are – in this case, Alaska and New Mexico jump out, as well as a couple of mountain states. The cools spots with lower rates also stand out, mostly – and oddly – in the south.
For this way of displaying data you would still need a table to display the actual values. We used conditional formatting (graded color scale, two colors) for quick and dirty color-coding of the tiles. You can also use conditional formatting to shade tiles according to whatever scale you prefer. For example, if you want quintiles, make a formula to divide the data into five equal sets and make a rule to color each set. Then you can show a legend at the bottom with the colors and upper/lower set values, if that’s what you want.
So there you have it, two options for better data visualization using Microsoft Excel.