Statistics are a valuable tool, yet they are frequently misused and distorted. I have often wished that statistics were a mandatory high school class, since using them well is so critical to understanding the world around us. I thought I would share a few tips for understanding statistics better.
First, while statistics are often misused, they remain our most valuable tool for objectively assessing and quantifying issues and the nature of the world beyond our immediate reach. The alternative is to rely on anecdotal evidence: stories that we or others have experienced. While stories can be a great way of helping us understand how situations work and of connecting with others, they lack quantifiable objectivity. Stories can very easily be unrepresentative of the most common realities. We, or those we listen to, can easily have experienced unique situations that do not correspond to what is normal. In fact, the stories we tend to enjoy most are those that are most unusual. So the stories that get shared, whether orally or through social or news media, are often the ones that are furthest from an accurate representation of typical reality. For example, the deaths we hear the most about on the news are the most unusual and least likely to happen, precisely because those are the most interesting. The causes that in reality take the most lives are banal.
Even though statistics are our most objective tool for describing the world in quantifiable ways, there are plenty of ways we can be deceived by them. Let’s look at a couple of basic steps for filtering statistics.
The first type of statistic we often see is the kind that simply describes some issue. These statistics deal with “what”, but not “how” or “why”. They are the easiest statistics to understand, since they simply describe a measurable quantity. The first thing to consider in evaluating them is context. Many statistics can be completely overwhelming and difficult to comprehend. When a statistic is difficult to comprehend, that is actually a good sign that we need more context.
For example, the US national debt is currently about 17 trillion dollars. The huge size of the national debt is often talked about, but in reality this number alone lacks context. Trillions of anything are simply incomprehensible for just about any human. So how can we bring context? We could consider it at a personal level: the national debt per person is about $55,000. This is a little easier to understand (less than many mortgages, but much higher than a responsible level of consumer debt). We could also consider the statistic from a historical perspective (the debt as a percentage of GDP is higher than at most times in US history, but lower than at certain times, such as during WWII). We could also compare the US to other countries on that same percentage-of-GDP basis (the US is higher than most, but lower than several; for example, Japan’s is nearly twice as high). This isn’t to make any particular claim about the US national debt, but it demonstrates how we can try to find some context for statistics. The US debt is high, but without this context, numbers like the national debt are conceptually meaningless, and it is important that we find things to compare against to provide meaning.
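The per-person figure is just a division, and it can be handy to see it spelled out. Here is a minimal sketch; the debt total is the one quoted above, while the population of roughly 315 million is my own assumed round number and the helper name per_person is made up for illustration.

```python
# Figures are approximate. The debt total comes from the text above;
# the population is an assumed round number, not a figure from the text.
national_debt = 17e12   # about 17 trillion dollars
population = 315e6      # assumption: roughly 315 million residents


def per_person(total, people):
    """Scale a national total down to a per-person amount."""
    return total / people


debt_per_person = per_person(national_debt, population)
print(f"Debt per person: about ${debt_per_person:,.0f}")
```

With these inputs the result lands in the mid-$50,000s, in line with the rough figure above. The same kind of rescaling works for other comparisons, such as dividing by GDP instead of by population.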
The next thing to beware of in approaching these statistics is that something is often being implied. When using statistics to understand the “what”s of the world, we often want to take the next step and understand “why” things are the way they are, and “how” we can affect them. We want to understand cause and effect.
Probably the most common saying among those who deal with research and statistics is “correlation does not imply causation”. This may sound confusing, but it is a fairly simple warning. While statistics may show that two things are related, we need to be very cautious about assuming that one thing caused another thing.
Let’s consider an example. You are far more likely to die while in the hospital than while at the grocery store. This is true, but does it imply that hospitals are more dangerous than grocery stores? Of course not. Hospitals aren’t the cause of the deaths. Illness is the cause: it is what brings people to the hospital, and it is what leads to the deaths. We can’t assume that just because hospitals and deaths are related, one is the cause of the other.
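The hospital example can be played out with made-up numbers. In this rough sketch every probability is invented purely for illustration, and death is set up to depend only on being sick, never on being in the hospital. Even so, the raw numbers make the hospital look deadly.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# All probabilities below are invented for illustration.
P_SICK = 0.05
P_HOSPITAL = {True: 0.80, False: 0.01}  # chance of being in hospital, by sickness
P_DEATH = {True: 0.20, False: 0.001}    # chance of dying, by sickness ONLY

people = []
for _ in range(100_000):
    sick = rng.random() < P_SICK
    in_hospital = rng.random() < P_HOSPITAL[sick]
    died = rng.random() < P_DEATH[sick]  # the hospital plays no role here
    people.append((sick, in_hospital, died))


def death_rate(group):
    """Fraction of a group of (sick, in_hospital, died) records that died."""
    return sum(died for _, _, died in group) / len(group)


# Raw comparison: everyone in hospital vs. everyone outside it.
rate_in = death_rate([p for p in people if p[1]])
rate_out = death_rate([p for p in people if not p[1]])

# Fair comparison: sick people in hospital vs. sick people outside it.
rate_sick_in = death_rate([p for p in people if p[0] and p[1]])
rate_sick_out = death_rate([p for p in people if p[0] and not p[1]])

print(f"All people:  in hospital {rate_in:.3f}, outside {rate_out:.3f}")
print(f"Sick people: in hospital {rate_sick_in:.3f}, outside {rate_sick_out:.3f}")
```

The first line of output shows a large gap between the two death rates, while the second shows the gap mostly vanishing once we compare sick people with sick people. The correlation was real, but illness was doing all the work.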
But determining cause and effect, so that we can learn which factors lead to which results, is still extremely important, and statistics can be used to assess it. So how can we tell when a statistic indicates a true cause? A statistical relationship can demonstrate cause and effect if we can rule out every other factor as the real cause. Sometimes we can establish this logically. Other times we can set up experiments. In medicine (and increasingly in other fields, like microeconomics), trials are set up in which some people are given an intervention (like a drug) and others are not, and the effects are measured by comparing the two groups. Usually the recipients are chosen at random. The advantage of this approach is that, because chance alone decides who receives the intervention, no other factor can systematically differ between the two groups, so a difference in outcomes can be attributed to the intervention itself. For this reason, randomized controlled trials are used extensively to assess drugs, and they can be very valuable for accurately assessing other types of interventions.
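To see why the coin flip matters, here is a small simulated trial. It is only a sketch with invented numbers: each person has a hidden "health" score that affects their outcome, the drug adds a true effect of 2.0 (a value I picked arbitrarily), and a random draw decides who gets the drug.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # invented size of the drug's real benefit

treated, control = [], []
for _ in range(20_000):
    health = rng.gauss(0, 1)        # hidden factor we never get to measure
    gets_drug = rng.random() < 0.5  # coin flip, unrelated to health
    outcome = health + (TRUE_EFFECT if gets_drug else 0.0) + rng.gauss(0, 1)
    (treated if gets_drug else control).append((health, outcome))

# Because assignment was random, the hidden factor is balanced on average.
health_gap = mean(h for h, _ in treated) - mean(h for h, _ in control)

# So the difference in average outcomes estimates the drug's effect.
estimated_effect = mean(o for _, o in treated) - mean(o for _, o in control)

print(f"Gap in hidden health between groups: {health_gap:+.3f}")
print(f"Estimated effect of the drug:        {estimated_effect:+.3f}")
```

The two groups end up with nearly identical average health even though nobody measured it, so the difference in outcomes comes out close to the true effect. If sicker people had chosen the drug for themselves, that balance would be lost.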
Outside of controlled experiments, it can be very difficult to pin down causes, because it is very difficult to be sure that no other factors are influencing the results. For example, determining the effect of different educational institutions is remarkably hard, because factors like affluence affect both the educational outcome and the type of institution a parent would choose. Determining whether the institution was the deciding factor, rather than any of the vast number of other ways that wealth can be used to benefit a child’s education, is again very difficult.
When we can’t control the factors, the primary remaining technique is regression analysis. However, this is where this casual discussion of statistics quickly turns much more advanced. Suffice it to say that truly demonstrating cause and effect with real-world, uncontrolled statistics typically requires complicated analysis.
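For the curious, here is a toy version of what "adjusting for" a factor looks like, using the school example. Everything in it is invented: wealth raises test scores and also makes a private school more likely, and the school itself is given a true effect of only 2 points. The sketch strips the part explained by wealth out of both school type and score, then compares what is left, which gives the same school coefficient as a regression on both variables. Real analyses involve many more factors and far more care than this.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # invented: points a private school really adds

wealth, private, score = [], [], []
for _ in range(20_000):
    w = rng.gauss(0, 1)                               # family wealth (standardized)
    p = 1.0 if w + rng.gauss(0, 1) > 0.5 else 0.0     # wealthier -> more often private
    s = 50 + 10 * w + TRUE_EFFECT * p + rng.gauss(0, 5)
    wealth.append(w)
    private.append(p)
    score.append(s)


def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its straight-line fit on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Naive comparison: ignores wealth entirely.
naive = slope(private, score)

# Adjusted comparison: remove wealth from both sides, then compare.
adjusted = slope(residuals(wealth, private), residuals(wealth, score))

print(f"Naive school effect:    {naive:.2f} points")
print(f"Adjusted school effect: {adjusted:.2f} points")
```

The naive number is several times too large because it credits the school with what wealth did, while the adjusted number comes out close to the true 2 points. The catch is that this only works for factors we thought to measure, which is a big part of why real-world causal claims are hard.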
To summarize the warning here: if someone is using statistics to convince you that one factor is the cause of another, be suspicious unless the causal link is logically provable, comes from a controlled experiment, or is supported by regression analysis.
Statistics are an invaluable tool for understanding our world and for making informed decisions about how we prioritize our time and money. But to understand them well, remember to put them in context, and be cautious about drawing cause-and-effect conclusions. Hopefully these tips help you make sense of statistics.