The Power And Danger Of Data Visualization

February 4, 2013

0_data_visualization

Recently, over the holidays with family, I participated in a conversation where I mentioned that the data supported my position. I made the mistake of saying “Statistically…,” which, as you might imagine, in a holiday family setting is bound to generate at least one comment about how you can prove anything with statistics, and a dismissal of a scientific study, without reading it, because it contains *GASP* data.

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.'” – Mark Twain

Interestingly, there is no record of Benjamin Disraeli saying that, so the attribution of that quote by Twain to Disraeli itself was a lie.

Still, when you work with data, you have to be careful. How do we analyze this data? How do we look at it to get an accurate picture of what’s going on? Once you start adding in different dimensions, it’s incredibly easy to draw completely the wrong insight from a specific batch of data. That’s the reason why Mark Twain, and my in-laws, scoff at data: it’s way too easy to abuse, consciously and unconsciously. With the growing trend of visualizing data to make a point more dramatically than a table filled with numbers can, this becomes even more dangerous. A table filled with numbers might beguile one such as Samuel Clemens, but a picture is worth a thousand words. It’s far easier to convince with a good data visualization, because our simple primate minds love the shiny things.

My favorite examples of data visualizations leading people to the wrong conclusions are maps. Maps are some of the earliest data visualizations. Points laid out on maps. Pretty simple, right? Hard to lead people astray. Well, take the famous London Underground map. It’s so classic in its design that it qualifies as art.

2_traditional_map

It represents specific data points, Underground stations, as they exist in and around London. But it’s misleading, because it’s stylized. (There are numerous versions of it, so I just grabbed the first one I saw.) Let’s say you were staying in a hotel near the Lancaster Gate station, and you needed to get to Paddington Station to catch a train. Based on this map, what would you do? Most would say they would walk to Lancaster Gate, hop on the Central line to Notting Hill Gate, and then change to a District or Circle line train up to Paddington. Easy, right? Great map.

3_nontraditional_map

How about now? This version redraws the map so that the stations sit closer to where they actually are in reality. Still want to hop on the Circle line, or maybe you want to think about whether it’s a walkable distance?

4_google_map

Here’s a Google Map of the area. If you were staying at the Rhodes Hotel, for instance, sure, the Lancaster Gate station is right around the corner, but Paddington Bear is waiting for you two blocks in the other direction. Still want to catch the tube, or do you want to walk? Is the traditional map inaccurate? Yup. It’s stylized to be more attractive, compact, clean, and helpful, but that stylization introduces inaccuracies, and those can lead people to make the wrong decision.

Data points, displayed in different ways, making you draw different conclusions, and gain different insights.

OK, so how would it affect your data? Sometimes the effect is dramatic and sometimes it isn’t, but you might see it in Google Analytics. Maybe you’re looking at a graph showing your ecommerce conversion rate dropping.

20_ecomrate

Not good, right? Look, see, it’s a line and it’s going down. That’s bad. Why is that line going down? Someone is gonna get fired here, I know it. But wait, this chart hides everything outside a narrow range: its axis starts at 94 and goes up to 101 (that is, conversion rates of 0.94% and 1.01%, not a 101% conversion rate). But still, it’s a drop! We should be concerned! Look how much it drops!
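To put a number on how much that cropped axis exaggerates things, here is a rough Python sketch. It assumes the line runs from the top of the visible range (1.01%) to the bottom (0.94%), which is my reading of the chart rather than something taken from the underlying data, and the function name is just something I made up for illustration.

```python
def visual_drop_fraction(start, end, axis_min, axis_max):
    """Share of the chart's vertical space covered by the move from start to end."""
    return abs(start - end) / (axis_max - axis_min)


# Conversion rates in percent, read off the axis labels (assumed endpoints).
high, low = 1.01, 0.94

# Axis cropped to the data range, as in the chart above.
truncated = visual_drop_fraction(high, low, axis_min=low, axis_max=high)

# Same data, but with the axis starting at zero.
zero_based = visual_drop_fraction(high, low, axis_min=0.0, axis_max=high)

print(f"Cropped axis: the drop fills {truncated:.0%} of the chart height")
print(f"Zero-based axis: the drop fills {zero_based:.1%} of the chart height")
```

Same two numbers, but on the cropped axis the line falls the full height of the chart, while on a zero-based axis it moves only about 7% of the way down.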

21_revenueecom

Yeah, not all that much really. You’re looking at one metric, and one that can be affected by lots of other factors. A big referral link to your site drives a bunch of traffic to a blog post, but none of them turn around and convert, and your conversion rates drop for the site as a whole. But maybe your good traffic also increased, your organic search traffic increased, and your revenue from those people increased. Suddenly you’re looking at revenue picking up, with a slight dip in the conversion rate.
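Here is a small Python sketch of that scenario. Every number in it is hypothetical: I picked session, order, and revenue figures that happen to reproduce a 1.01% to 0.94% dip, and they are not taken from the account behind the charts.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    sessions: int
    orders: int
    revenue: float


def blended_rate(segments):
    """Site-wide conversion rate: all orders divided by all sessions."""
    total_sessions = sum(s.sessions for s in segments)
    total_orders = sum(s.orders for s in segments)
    return total_orders / total_sessions


def total_revenue(segments):
    return sum(s.revenue for s in segments)


# Hypothetical "before" period: organic search traffic only.
before = [Segment("organic", 10_000, 101, 5_050.0)]

# Hypothetical "after" period: organic grew, plus a referral spike
# of blog readers who barely convert.
after = [
    Segment("organic", 11_000, 115, 5_750.0),
    Segment("referral spike", 2_500, 12, 600.0),
]

rate_before = blended_rate(before)
rate_after = blended_rate(after)

print(f"Conversion rate: {rate_before:.2%} -> {rate_after:.2%}")
print(f"Revenue: {total_revenue(before):,.0f} -> {total_revenue(after):,.0f}")
```

In this made-up example the site-wide rate dips because the referral visitors dilute it, even though the organic segment converts slightly better than before and total revenue goes up.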

11_all_traffic

It’s even less dramatic in Google Analytics.

Don’t shy away from data visualizations, even ones like those at the top of the page. A great data visualization can tell a story in a way that might even convince Mark Twain that you’re right. But don’t forget the adage he claimed Disraeli said. No matter how pretty your visualization, make sure that you’re looking at the whole story of the data, and that you have good data, or your fancy visualization will just be more lies, damned lies, and statistics. But prettier.