Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations, by Ben Jones (Founder and CEO of Data Literacy) is one of those rare books that you want to read, reread, put on your shelf, and then reread again once each year.
Jones, a self-deprecating Canadian, says early on, “I promise that you will fall into one or more of these pitfalls in the very near future. So will your colleagues. So will I. I probably fell into more than one of them in this book itself.” This explains the need for constant reminders.
Written in a clear and conversational style, the book outlines seven different types of data pitfalls:
Pitfall 1: Epistemic Errors: How We Think About Data
Pitfall 2: Technical Traps: How We Process Data
Pitfall 3: Mathematical Miscues: How We Calculate Data
Pitfall 4: Statistical Slipups: How We Compare Data
Pitfall 5: Analytical Aberrations: How We Analyze Data
Pitfall 6: Graphical Gaffes: How We Visualize Data
Pitfall 7: Design Dangers: How We Dress Up Data
Jones describes each pitfall in detail, breaking it into multiple sub-pitfalls. Pitfall 1A: The Data-Reality Gap tackles the difference between the real world and data. He observes that we never know the true extent of crime, only reported crime; never the true diameter of a part, only the measured diameter; never real public opinion, only the opinions of those who respond to a poll.
Each pitfall is associated with engaging examples. For Pitfall 1A, Jones gives the example of the Meteoritical Society’s map of meteorites between 2,500 BCE and 2012. He shows the map and then observes “Doesn’t it seem uncanny that meteorites are so much more likely to hit the surface of the earth where there’s land, as opposed to where there’s ocean?”
He quickly answers his own question. The answer is found in the title of the map, “Every Recorded Meteorite Impact.” The truth is we don’t know the location of every meteorite, only the locations of meteorites that have been both discovered and recorded. The analogy to the current coronavirus epidemic is clear: our statistics show the reported counts, while the actual counts are not known.
Jones uses his model to guide us through each of the pitfalls. For Pitfall 1A, there are also examples using earthquake events, bicycle counts, and a case where cumulative counts go down over time. Each example is carefully documented with photos, charts, maps, or other data.
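As a rough illustration of the cumulative-count case, a short Python check can flag the points where a running total falls, which a true cumulative count should never do. This is only a sketch: the function name `find_cumulative_drops` and the numbers below are made up for illustration and are not taken from the book.

```python
from typing import List, Sequence, Tuple


def find_cumulative_drops(counts: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous, current) for each point where a
    supposedly cumulative count is lower than the value before it."""
    drops = []
    for i in range(1, len(counts)):
        if counts[i] < counts[i - 1]:
            drops.append((i, counts[i - 1], counts[i]))
    return drops


# Hypothetical reported totals, invented for this example.
reported_totals = [10, 25, 40, 38, 52, 52, 60]
print(find_cumulative_drops(reported_totals))  # [(3, 40, 38)]
```

A flagged drop does not say which number is wrong; it only signals that the reported figures were revised or recorded inconsistently and deserve a closer look.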
It would be unfair, and much too long, to reveal more. That said, when reading the book, you will look forward to each pitfall and the examples, which are relevant and easily understood. There is even a surprise eighth pitfall at the end.
Resources
Avoiding Data Pitfalls Video (YouTube)
[Video] Avoiding Data Pitfalls Covid-19 Edition (YouTube)
[Podcast] Avoiding Data Pitfalls Podcast (Storytelling with Data – Episode 26)
[Podcast] Data Literacy with Ben Jones Podcast (Data Crunch)