Monday, 12 March 2012

Being honest with statistics

Daniel Bor's latest blog entry discusses problems with weak statistics in neuroimaging papers, but the issues raised are relevant to any area of research that relies on inferential statistics. For example, as behavioural scientists move towards collecting larger data sets, the risk of finding false positives increases accordingly.

Statistics also play an important role in any psychology degree. Gaps in statistical knowledge quickly become apparent when students are asked to critically review others' work or carry out their own research. One wonders if these early misconceptions could contribute to poor research practices further down the line.

In my somewhat limited experience, psychology students generally know what numbers they need to report, but often fail to understand what goes into making those numbers a reality. This common misunderstanding can be split into three distinct areas where undergraduates may benefit from additional support:

(a) Any introductory statistics course needs to emphasise the importance of data visualisation (like the graph below, for example).

(b) Teaching needs to reinforce some of the basic mathematical principles behind inferential tests.

(c) Students need to appreciate the assumptions that data must meet before they accept the output of any parametric test.

I often ask students, 'Can you tell me what numbers go into calculating the value of t as part of an independent t-test?' Worryingly, many students are unable to answer this question. They are, however, able to run this test in SPSS and correctly report the results.
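
One way to answer that question is to calculate t by hand. The sketch below is only an illustration in Python, not what SPSS runs internally: it assumes the classic Student's version of the test with equal variances, and the function name `independent_t` and the two sets of scores are made up for the example. The only ingredients are the two group means, the two group variances and the two sample sizes.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's independent-samples t, assuming equal variances.

    Returns the t value and its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)

    # Numerator: how far apart the two group means are.
    mean_difference = mean(group_a) - mean(group_b)

    # Degrees of freedom: total observations minus the two estimated means.
    df = n_a + n_b - 2

    # Pooled variance: the two sample variances, weighted by group size.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df

    # Denominator: the standard error of the difference between means.
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))

    return mean_difference / standard_error, df


# Made-up scores, purely for illustration.
control = [4, 5, 6, 5, 5]
treatment = [6, 7, 8, 7, 7]

t, df = independent_t(treatment, control)
print(f"t({df}) = {t:.2f}")  # t(8) = 4.47
```

Seen this way, t is simply the difference between the group means divided by an estimate of how much that difference would vary from sample to sample.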



The result of a parametric test can only be trusted if its assumptions, including normality, are met, but this is rarely mentioned in published research. If data are not normally distributed, they must either be transformed or analysed with a non-parametric test, which provides a valuable alternative. My point is that if students are going to be taught the importance of data adhering to normality when using parametric statistics, then this should also be given careful consideration in published research. But I'll save that for another post.

Introductory courses in statistics need to change.

There is a tendency to introduce concepts like p-values and degrees of freedom early on without actually explaining where they come from. Phrases like 'p is a measure of how likely the result was obtained by chance' are unhelpful. As an undergraduate, I had no idea how this was possible. A chance of what? The chance that it was raining outside? The chance that I made a mistake in the calculation? How could this p-value tell me how likely the result was down to chance when all that went into the test was my own data?

This causes many students (including myself at one point) much distress. These confusions stem from the fact that bare-bones explanations offer an incomplete picture. This builds up a sort of bleak mystique around the numbers as if they have somehow been plucked out of thin air. 
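
A small simulation can make the 'chance of what?' question concrete. The sketch below uses a permutation test rather than the t-test's own formula, so it illustrates the idea and is not what a statistics package reports. It assumes that, if the group labels made no difference, any reshuffling of the scores between the two groups would be equally likely. The p-value is then the proportion of reshuffles that produce a difference in means at least as large as the one actually observed. The function name `permutation_p_value`, the number of shuffles and the scores are all made up for the example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value by shuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n_a = len(group_a)

    # The difference in means that was actually observed.
    observed = abs(mean(group_a) - mean(group_b))

    # Under the null hypothesis the labels are arbitrary, so pool the scores.
    pooled = list(group_a) + list(group_b)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Deal the shuffled scores back into two groups of the original sizes.
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_shuffles


# Made-up scores, purely for illustration.
control = [4, 5, 6, 5, 5]
treatment = [6, 7, 8, 7, 7]

p = permutation_p_value(treatment, control)
print(f"p is approximately {p:.3f}")
```

For these ten scores only 4 of the 252 possible ways of splitting them into two groups of five give a difference as large as the observed one, so the estimate should land close to 4/252, or about 0.016. Nothing beyond the data and the assumption of 'no real difference' goes into that number.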

Students need to be clear about what inferential tests are actually doing from the outset, and this understanding will be further strengthened by the critical appraisal of others' work. If these misconceptions can be ironed out quickly, then other statistical tests will become easier to digest.

There is no point sugarcoating concepts just because they cause students to ask a lot of questions. The complete truth will make life easier in the long run and produce both better research and better psychology graduates.