From XKCD
I'm sure many of you feel that it is disappointingly easy to become embarrassed for humanity whenever reading a discussion of correlations. In academia's greatest charade, every Stats 101 class or Epidemiology 101 class or heck even a Psych 101 class will emphatically declare that correlation does not imply causation. Then most people graduate and spend their entire lives reading causation into correlations. Especially if they become epidemiologists.
Observational studies are entirely legitimate forms of evidence, and correlations are entirely useful statistics. No one can question this. However, these correlations simply show a relationship and tell us nothing about the explanation of that relationship.
This doesn't change just because an explanation is biologically plausible. Nothing ever changes it. A correlation raises the possibility of a cause-and-effect relationship, but no more or less than it raises the possibility of a non-causal relationship.
Nevertheless, many people who understand this may still believe that a lack of correlation can rule out a cause-and-effect relationship.
But it can't. It can't even come close.
In fact, a lack of correlation tells us much less than a correlation does. This is because a correlation at least tells us there is a relationship, even if it tells us nothing about why the relationship exists. A lack of correlation, by contrast, does not even tell us that there isn't a relationship.
While “no relationship” is one possible explanation for a lack of correlation, there are several others:
Lack of statistical power. The study may have needed a larger sample to detect the correlation. A small sample would not systematically shrink the correlation coefficient, but it would make the estimate noisier and would affect whether it is statistically significant.
Lack of sufficient range of variation. Even if the underlying relationship is linear and very strong, limiting the range of variation you study will decrease the correlation coefficient, because the scatter around the line stays the same while the spread in X shrinks. If you limit your range narrowly enough, the correlation coefficient will essentially disappear. For example, if study time increases test scores linearly over the range of zero to ten hours per week, the relationship is still just as linear between eight and ten hours per week, but because you have limited the range of variation, you will decrease the correlation coefficient. As you decrease the range further, the correlation coefficient approaches zero. This would make it look like the study was not actually underpowered, because there would not even be a non-significant correlation. But it's just an illusion.
Lack of linearity. Conventional correlation coefficients look for linear relationships. As X increases, Y increases. Or as X increases, Y decreases. As Ned Kock frequently points out, many relationships found in nature are U-shaped or otherwise non-linear. For example, Paul and Shou-Ching Jaminet suggest in Perfect Health Diet that there is an optimal range of carbohydrate consumption. The risk of disease, according to their hypothesis, will be lowest in this range, and will increase as one departs from it either by increasing or decreasing carbohydrate intake. If you're looking for a straight line when nature provides a U, you aren't going to find your line.
Incomplete or inappropriate adjustment for confounding factors. There may be other factors that affect the relationship that are not being taken into account. On the other hand, perhaps statistical adjustments were made that shouldn't have been made. Assuming that the stats “have been adjusted for all the confounding factors” assumes that our knowledge of what may affect the relationship is complete or nearly complete. In fact, our knowledge of what affects the relationship could be closer to a grain of sand on an entire seashore. Moreover, understanding what is a true confounding factor requires understanding the cause-and-effect relationship, and usually this is uncertain and controversial. Failing to make the right adjustments can leave a true relationship masked, while making the wrong adjustments can hide one that would otherwise have shown up.
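If you'd rather see the first three of these than take my word for it, here is a minimal simulation sketch in Python with NumPy. Everything in it is made up for illustration: the study-time numbers (4 points per hour, noise with a standard deviation of 8), the U-shaped "intake" and "risk" variables, and the helper names `pearson_r` and `simulate_study` are all assumptions of mine, not data from any study. The 0.444 cutoff is the approximate two-tailed 5% critical value of r for a sample of 20.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def simulate_study(n, low, high):
    """Hypothetical data: score rises 4 points per weekly study hour, plus noise."""
    hours = rng.uniform(low, high, n)
    score = 50 + 4 * hours + rng.normal(0, 8, n)
    return hours, score


# Restricted range: identical slope and noise, but a narrower window of hours.
r_full = pearson_r(*simulate_study(5000, 0, 10))
r_narrow = pearson_r(*simulate_study(5000, 8, 10))

# Low power: 200 small studies (n = 20) drawn from the narrow window.
# A study "misses" the effect when |r| falls below the assumed 0.444 cutoff.
small_rs = np.array([pearson_r(*simulate_study(20, 8, 10)) for _ in range(200)])
missed = float(np.mean(np.abs(small_rs) < 0.444))

# Non-linearity: a hypothetical U-shaped relationship centered on intake = 0.
intake = rng.uniform(-1, 1, 5000)
risk = intake**2 + rng.normal(0, 0.05, 5000)
r_linear = pearson_r(intake, risk)            # straight-line view of a U
r_squared_term = pearson_r(intake**2, risk)   # same data, curve-aware view

print(f"r over 0-10 hours:           {r_full:.2f}")
print(f"r over 8-10 hours:           {r_narrow:.2f}")
print(f"small studies missing it:    {missed:.0%}")
print(f"r for U-shape (linear):      {r_linear:.2f}")
print(f"r for U-shape (vs intake^2): {r_squared_term:.2f}")
```

Under these assumptions the narrow-window correlation should come out far weaker than the full-range one even though the slope never changed, most of the small studies should fail to reach significance, and the U-shaped data should show a linear correlation near zero despite being almost entirely determined by intake.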
Thus, lack of correlation certainly does not imply lack of causation.
Back to our regularly scheduled genetics series — with a likely wheat interlude coming soon.
Enjoy the night!