There are three kinds of lies: lies, damned lies, and statistics

The title of this blog is a commonly used expression, popularized by Mark Twain. It implies that statistics can be used to prove anything. The counterargument is that statistics is not the problem but the solution.

There are two situations in which mistakes can happen in statistics. The first is when a dishonest or unscrupulous person manipulates the results to arrive at a desired conclusion. The second arises when honest analysts unknowingly make mistakes because statistics can be tricky. Despite these mistakes, the entire field of statistics cannot be blamed. Consider an analogy: if a particular surgeon does not follow best practices, the entire field of medicine is not at fault. Similarly, in statistics, it is the statistician who is expected to fix the problems once they have been identified.

When statistical principles are applied correctly, the resulting analysis is more likely to be accurate. To produce trustworthy results, statisticians must ensure that every stage of the study is sound.

Statisticians should know how to:

  • Design the study so that it answers the question at hand
  • Collect reliable data
  • Analyse the data accurately and appropriately
  • Verify assumptions
  • Draw reliable conclusions

There are several ways in which a study can produce misleading conclusions:

  1. Biased Sample: A biased sample can skew the results from the very first stage.
  2. Overgeneralisation: Results from one population do not necessarily apply to another. This limitation needs to be kept in mind: inferences are limited to the population from which the sample was drawn.
  3. Causality: Correlation alone does not establish causation. Statisticians conclude that a causal connection exists only when strict criteria are met, and that standard is what is required.
  4. Incorrect Choice of Analysis: The right choice of analysis helps in arriving at a valid answer. Which analysis is right depends on the complexity of the model.
  5. Violating the Assumptions: Almost all statistical analyses have assumptions, largely about the type of sample, the type of data, the distribution of the data, and so on. It is difficult to trust the results if the analysis is done without checking the assumptions.
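
To make the first point concrete, here is a small sketch in Python of how a biased sample can mislead. Everything in it is an assumption made for illustration: the population is simulated from a normal distribution, the function name `compare_samples` is hypothetical, and the "bias" is the deliberately crude choice of sampling only from the upper half of the population. It is not a recipe for real studies, only a way to see the effect.

```python
import random
import statistics


def compare_samples(seed=0, population_size=10_000, sample_size=200):
    """Return (population mean, random-sample mean, biased-sample mean)."""
    rng = random.Random(seed)

    # Simulated population (illustrative assumption): mean 50, std dev 10.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.mean(population)

    # Simple random sample: every member is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)

    # Biased sample: only members from the upper half can be chosen.
    upper_half = sorted(population)[population_size // 2:]
    biased_sample = rng.sample(upper_half, sample_size)

    return (
        true_mean,
        statistics.mean(random_sample),
        statistics.mean(biased_sample),
    )


if __name__ == "__main__":
    true_mean, random_mean, biased_mean = compare_samples()
    print(f"Population mean:    {true_mean:.2f}")
    print(f"Random sample mean: {random_mean:.2f}")
    print(f"Biased sample mean: {biased_mean:.2f}")
```

The random sample should land close to the population mean, while the biased sample overshoots it by a wide margin, and no amount of careful analysis afterwards would repair that. The problem was introduced at the data collection stage.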