Statistics are a valuable tool, yet they are frequently misused and distorted. I have often wished that statistics were a mandatory high school class, since the subject is so critical to understanding the world around us. I thought I would try to share a few tips for better understanding statistics.
First, while statistics are often misused, they truly represent our most valuable tool for objectively assessing and quantifying issues and the nature of the world beyond our reach. The alternative is to rely on anecdotal evidence, stories that we or others have experienced. While stories can be a great way of helping us understand how situations work and of connecting with others, they tend to lack quantifiable objectivity. Stories can very easily be unrepresentative of the most common realities. We, or those we listen to, can easily have experienced unique situations that do not actually correspond with normalcy. In fact, the stories we tend to enjoy most are those that are most unusual. So the stories that are shared, whether orally or through social or news media, are often the ones that are furthest from an accurate representation of typical reality. For example, the deaths we hear the most about on the news are the ones that are most unusual and least likely to happen, precisely because those are the ones that are most interesting. The causes that in reality take the most lives are banal.
Even though statistics are our most objective tool for describing the world in quantifiable ways, there are plenty of ways we can be deceived by them. Let’s look at a couple of basic steps for filtering statistics.
The first type of statistic we often see is the kind that simply describes some issue. These statistics deal with “what”, but not “how” or “why”. They are the easiest statistics to understand, since they simply describe a measurable quantity. The first thing to consider in evaluating these statistics is context. Many statistics can be completely overwhelming and difficult to comprehend. When a statistic is difficult to comprehend, that is actually a good sign that we need more context.
For example, the US national debt is currently about 17 trillion dollars. The huge size of the national debt is often talked about, but in reality this number alone lacks context. Trillions of anything are simply incomprehensible for just about any human. So how can we bring context? We could consider this at a personal level: the national debt per person is about $55,000. This is a little easier to understand (less than many mortgages, but much higher than a responsible level of consumer debt). We could also consider the statistic from a historical perspective (the debt as a percentage of GDP is higher than at most times in US history, but lower than at certain times, like during WWII). We could also compare the US to other countries (the US is higher than most, but lower than several; for example, Japan’s is nearly twice as high). This isn’t to make any particular claims about the US national debt, but it demonstrates how we can try to find some context for statistics. The US debt is high, but without this context, numbers like the national debt are conceptually meaningless, and it is important that we find things to compare against to provide meaning.
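To make the "per person" step concrete, here is a small sketch of the arithmetic. The debt figure is the one quoted above; the population figure is my own rough assumption (about 316 million), not something stated in this post, so the result only lands in the same ballpark as the $55,000 mentioned.

```python
def per_person(total, population):
    """Scale a national total down to a per-person figure."""
    return total / population

national_debt = 17e12   # dollars, the figure quoted in the text
us_population = 316e6   # ASSUMPTION: rough round number, not from the text

debt_per_person = per_person(national_debt, us_population)
print(f"Debt per person: about ${debt_per_person:,.0f}")
```

The same helper works for any overwhelming total: divide by the number of people (or households, or years) it is spread across, and the number becomes something we can compare with our own experience.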
The next thing to beware of in approaching these statistics is that often something is being implied. While using statistics to understand the “what”s of the world, we often want to take the next step to understand “why” things are the way they are, and “how” we can affect them. We want to understand cause and effect.
Probably the most common saying among those who deal with research and statistics is “correlation does not imply causation”. This may sound confusing, but it is a fairly simple warning. While statistics may show that two things are related, we need to be very cautious about assuming that one thing caused another thing.
Let’s consider an example. One could say that you are far more likely to die when you are in the hospital than when you are at the grocery store. Of course this is true, but does this therefore imply that hospitals are more dangerous than grocery stores? Of course not. Hospitals aren’t the cause of the deaths. It is the illnesses, which lead both to deaths and to hospital visits, that are the cause. We can’t assume that just because hospitals and deaths are related, one is the cause of the other.
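The hospital example can be sketched as a tiny simulation. Everything here is made up for illustration: the probabilities are hypothetical, and I deliberately make the chance of dying depend only on whether a person is sick, never on where they are. Even so, the hospital ends up looking far more "dangerous".

```python
import random

def simulate(n=100_000, seed=0):
    """Count people and deaths by (is_sick, in_hospital). All rates are hypothetical."""
    rng = random.Random(seed)
    counts = {}  # (is_sick, in_hospital) -> [people, deaths]
    for _ in range(n):
        sick = rng.random() < 0.05
        # Sick people are much more likely to be in the hospital.
        in_hospital = rng.random() < (0.80 if sick else 0.01)
        # Death depends on illness only, not on location.
        died = rng.random() < (0.10 if sick else 0.001)
        cell = counts.setdefault((sick, in_hospital), [0, 0])
        cell[0] += 1
        cell[1] += int(died)
    return counts

def death_rate(counts, keys):
    """Death rate pooled over the given (is_sick, in_hospital) groups."""
    people = sum(counts[k][0] for k in keys if k in counts)
    deaths = sum(counts[k][1] for k in keys if k in counts)
    return deaths / people

counts = simulate()
hospital_rate = death_rate(counts, [(True, True), (False, True)])
outside_rate = death_rate(counts, [(True, False), (False, False)])
sick_hospital = death_rate(counts, [(True, True)])
sick_outside = death_rate(counts, [(True, False)])

print(f"Death rate in hospital:      {hospital_rate:.3f}")
print(f"Death rate elsewhere:        {outside_rate:.3f}")
print(f"Sick people, in hospital:    {sick_hospital:.3f}")
print(f"Sick people, not in hospital:{sick_outside:.3f}")
```

Comparing everyone shows a large gap between the hospital and everywhere else, but comparing sick people with sick people makes the gap mostly disappear, because illness was the real cause all along.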
But determining cause and effect, so that we can learn what factors lead to what results, is still extremely important, and statistics can be used to assess cause and effect. So how can we accurately determine when a statistic indicates a true cause? A statistical relationship can demonstrate cause and effect if we can rule out every other factor as the real cause. Sometimes we can establish this logically. Other times we can set up experiments. In medicine (and increasingly in other fields, like microeconomics), trials are set up where different people are given or not given some intervention (like a drug), and the effects are measured by comparing the two groups. Usually the recipients are randomly chosen. The advantage of this approach is that by randomly choosing who receives the intervention, we can be confident that no other factor systematically differs between the two groups. For this reason, randomized controlled trials are used extensively to accurately assess drugs, and can be very valuable for making accurate assessments of other types of interventions.
Outside of controlled experiments, it can be very difficult to pin down causes, because it is very difficult to be sure that no other factors are influencing results. For example, determining the effect of different educational institutions is remarkably hard, because factors like affluence affect both the educational outcome and the type of institution that a parent would choose. Determining whether the institution was the deciding factor, rather than any of the vast number of other ways that wealth can be used to benefit education, is again very difficult.
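Here is a rough sketch of why random assignment helps, using simulated people rather than real data. The "true effect" of 2.0, the hidden "health" factor, and the rule that healthier people opt in are all hypothetical choices of mine. The same comparison of group averages is run twice: once with a coin flip deciding who gets the drug, and once with people choosing for themselves.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical benefit of the intervention

def run_trial(randomized, n=20_000, seed=1):
    """Return the difference in average outcome, treated minus control."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden factor that also affects the outcome
        if randomized:
            gets_drug = rng.random() < 0.5   # coin flip, unrelated to health
        else:
            gets_drug = health > 0           # healthier people opt in
        outcome = health + (TRUE_EFFECT if gets_drug else 0.0) + rng.gauss(0, 1)
        (treated if gets_drug else control).append(outcome)
    return mean(treated) - mean(control)

randomized_estimate = run_trial(randomized=True)
self_selected_estimate = run_trial(randomized=False)

print(f"True effect:            {TRUE_EFFECT:.2f}")
print(f"Randomized estimate:    {randomized_estimate:.2f}")
print(f"Self-selected estimate: {self_selected_estimate:.2f}")
```

With the coin flip, the hidden health factor is spread evenly across both groups, so the estimate sits close to the true effect. When people select themselves, the treated group starts out healthier, and the comparison overstates what the drug did.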
When we can’t control the factors, the primary remaining technique is regression analysis. However, this is where this casual discussion of statistics quickly turns much more advanced. Suffice it to say that truly demonstrating cause and effect with real world, uncontrolled statistics typically requires complicated analysis.
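For the curious, here is a very simplified sketch of the idea behind regression adjustment, using the school example with simulated data. All numbers are hypothetical, and I have set the true effect of school type to zero so that any apparent effect is purely due to affluence. The adjustment below is one simple way to do a two-variable linear regression (remove what affluence explains from both the school choice and the score, then compare what is left); real analyses are considerably more involved.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing the part a straight line in x explains."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(2)
affluence, private, score = [], [], []
for _ in range(20_000):
    a = rng.gauss(0, 1)
    p = 1.0 if a + rng.gauss(0, 1) > 0 else 0.0   # wealthier families pick private more often
    s = 5.0 * a + 0.0 * p + rng.gauss(0, 1)       # hypothetical: school type has NO effect
    affluence.append(a)
    private.append(p)
    score.append(s)

# Naive comparison: score against school type, ignoring affluence.
naive = slope(private, score)
# Adjusted comparison: strip out affluence from both sides first.
adjusted = slope(residuals(affluence, private), residuals(affluence, score))

print(f"Naive school 'effect':    {naive:.2f}")
print(f"Adjusted for affluence:   {adjusted:.2f}")
```

The naive comparison credits the school with a large benefit, while the adjusted one comes out near zero, which is the truth in this made-up world. The catch is that this only works for factors we thought to measure and include.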
To summarize the warning here: if someone is trying to convince you through statistics that one factor is the cause of another issue, then unless the cause is logically provable, comes from a controlled experiment, or involves regression analysis, the claim is probably suspect.
Statistics are an invaluable tool for understanding our world and making informed decisions about how we prioritize our time and money. But in order to understand them, remember to put statistics in context, and be cautious about drawing cause and effect relationships. Hopefully these tips help make sense of statistics.
April 27, 2014 at 11:36 pm
So should we bunt or not??? Good job…. have a good week… pappa
April 29, 2014 at 1:29 am
Seems like statistics are used frequently for decision making by head coaches and their assistants in determining the amount of risk they are willing to take in competition … though Kris never used that venue in any of his examples … but, knowing Kris’ father like I do, I suspect he is the one who asked about bunting.
April 29, 2014 at 1:49 am
Fascinating essay Kris. I live in a third world country which has a reputation for pervasive corruption (a reputation that is not substantiated by statistics). I also write stories about life here in Africa in a nation that has a population of over 37 million people. I have difficulty knowing when to trust statistics that come from this nation. For example, I read that the unemployment rate is around 90% (about the exact opposite of that in the United States), but I don’t think the statistics take into consideration the millions of people who work daily in their “gardens” to sustain their lives and the lives of their families, since there is usually no monetary measurement given for that work. But, when a statistic is posted that over 50% of the population is under the age of 14 it helps confirm that the nation is fatherless. Malaria is reported statistically as the number one killer in this nation. But, how would a nation measure corruption? The citizens here will tell you that it is as common as sunshine and rain, but that can hardly qualify as a statistic. Bribes are considered a normal part of existence, but that consideration again falls beyond the borders of quantification. Yesterday I was in the office of the Anti-Corruption Department in the Jinja Central Police Station (which seems to confirm that corruption is a common problem, but again does not tell me how pervasive it really is). One of the detectives in that office argued with me for almost 20 minutes about the virtues of corruption and that the nation wouldn’t work without it! But, that is just a story and is of no use for the establishment of a statistically based confirmation of the degree of severity of corruption here.
So … how can I know when to go for it on fourth down?
My perceptions are based largely on reputation (Nigeria seems to be the nation that leads the other African nations in their ability to be clever in deceiving others for personal gain illegally … but again there is no statistic to confirm this general opinion), experience, and the concerted effort by some in public policy and bureaucracy, and general consensus … all unable to be statistically proven one way or another.
These are my random thoughts based on how you provoked me to think about statistics in the context in which I live. I wish I had access to reliable hard and fast statistics for representing the need for change in the venue in which I live and work. But, mostly I find I need to rely on my experience and tell personal stories and trust that they are representative, in some credible way, to describe reality here.
Thanks for writing. I always (not proven statistically) enjoy reading what you write.
April 29, 2014 at 4:17 am
Paul, thanks for your comments. I agree that statistics cannot provide all the answers. Many times we just don’t have access to any better data (although we wish we did). And many important issues simply can’t be quantified very well, so we do have to rely more heavily on our stories, experiences, and intuition. And regardless of the quantification, our stories are critical in helping us to connect. Hopefully, my post was not dismissing the importance of stories and their value, and I actually really believe in the importance of story (my intent was more to provide cautions in how we interpret statistics). Your stories are so valuable in helping so many of us here in the States understand and connect to the critical work in Uganda.
In regards to the question of corruption, this is a really fascinating (and puzzling) issue to me. At least in my research, on one hand, if corruption is defined as bribery, I have heard of studies that suggest bribes tend to have very little influence on economic development. On the other hand, if the definition of corruption is broadened just a bit to be the general lack of institutional accountability (which would include bribery), some of the best economists suggest that virtually all of development (and even liberty/oppression, violence/peace) hinges on corruption (and removing it). It certainly does seem like it is very important. But it is tough to understand and, like you said, to quantify. Statistics are sparse, difficult to understand, and easily distorted. Stories can be difficult to understand too. Your insights are greatly appreciated, thanks for writing.