Gary Taubes on Cherry-Picking and Paradigm Shifts (A Brief Thought on Science)

Sep 06, 2011

Warning: A Serious Blog Post Occurs Somewhere Below

Some controversy erupted in the Twitter-sphere when a number of us, including Dave Dixon and Dallas Hartwig, were recently discussing Denise Minger's angular hypothesis of atherosclerosis, in which she proposed that serum bananas and other plasma constituents with pointy ends or sharp edges, when present at increased concentrations, penetrate the blood vessel wall and initiate plaque development. Andrew Badenoch's research showing that increased banana intake does not increase serum banana levels has made it difficult to base a dietary theory on this hypothesis, but we have tentatively concluded that picking cherries, because of their sphericity and resultant tendency to bounce cleanly off the blood vessel lining without incurring any injury, is likely to lengthen lifespan.

After several of us observed that not chewing such fruits is likely to preserve their roundness, reduce their insulinogenic properties, and lower their effect on reward centers in the brain, I used the definition of cherry-picking recently put forward by Gary Taubes to suggest that dismissing studies demonstrating the benefits of not chewing your food might significantly increase lifespan. In other words, if Mr. Taubes is seeking key experiments that are capable of distinguishing between competing hypotheses, and if he considers this “cherry-picking,” then his approach to studies that support all three hypotheses, like the one I just linked to, is likely to lead to increased cherry-picking and thus increased immunity to heart disease. In suggesting that Gary was likely to outlive most of us, I was simply wishing well to a man who has introduced innumerable people to the work of Weston Price and to the paleo movement, an achievement Melissa McEwen recently emphasized, and infusing this wish with a little of the humor that has thus far characterized the bulk of this discussion.

Nevertheless, some found this comment to be “snarky.” Dallas and I therefore decided that humor just doesn't come across that well in 140 characters, and that this issue deserves a serious blog post rather than a bunch of tweets. I had written such a serious blog post at lunch yesterday (Monday), but duty called, more important things arose, and I never managed to finish it. Given the issue's import, I have decided to finish and publish that post. So here it is — a very serious post about the art of cherry-picking.

The Unfinished Blog Post Begins

There's really nothing like spending Labor Day making up for lost Hurricane Time developing new ways to reduce artifactual formation of malondialdehyde (MDA) during the homogenization of adipose tissue. Yet even those of us who missed the memo that the ~~Bible~~ labor movement invented a thing called “the weekend” have some time to blog during lunch — as long as it's just a brief thought about life, or the scientific method.

The Art of Picking Cherries

Gary Taubes recently wrote about the essentiality of “the supposedly heinous crime of cherry picking” to the progress of science:

This map-making exercise can be perceived as a justification for cherry-picking of the data, which, in a way, it is. But I’m arguing that such selective interpretation of the data is a fundamental requirement to make progress in any field of science, and particularly one as off the rails as that of obesity and nutrition. It is inherent to the process that Kuhn described as “map-making,” to taking a non-playable game – a dysfunctional paradigm – and making it playable.

Gary goes on to explain that launching a paradigm shift requires sifting through the vast mass of scientific data to isolate key experiments that can differentiate between competing hypotheses and to discard the rest.

I think Gary is making a critical point. There is no sense in trying to wrestle with all the data. A great deal of data (perhaps most) that has ever passed through a scientist's notebook or collection of Excel files probably remains unpublished, and most things that happen — that is, potential data — pass through time unrecorded. Trying to amass “all” of the data is thus a naive exercise in futility.

Moreover — and I think this is Gary's main point — most tests of a hypothesis could be interpreted as supporting multiple hypotheses, and when it comes time to tease these hypotheses apart, we get nowhere by looking at every experiment that supported one or another of them. Instead, we need key experiments capable of distinguishing them.

Nevertheless, I don't think dismissing irrelevant data can actually be called “cherry-picking.” Cherry-picking, the way I see it, is the dismissal of relevant data. This dismissal falls roughly into two categories:

If one type of observation is repeated multiple times with conflicting results, it would be cherry-picking to look only at the results that support our hypothesis.

If one hypothesis could be distinguished from another with several different types of observations and these different methods of hypothesis-splicing yield conflicting results, it would be cherry-picking to use only the types of tests that support our hypothesis.

Gary clearly isn't arguing for this type of cherry-picking. He elaborates:

What we ultimately want, as Feynman suggested, is an experiment or an observation that can unambiguously — i.e., rubbing back and forth gets us as close to nowhere as we can get — differentiate between hypotheses or paradigms.

Once again, I think he is making a critically important point and I am mostly in agreement.

At the same time, I think it would be somewhat naive to believe there could ever be a single experiment that could definitively distinguish between two competing hypotheses. It would be all the more naive to think such an experiment could definitively support a single hypothesis, because most hypotheses that any given experiment supports are probably undreamed of. If we can only think of two, three, or five hypotheses that we could use to interpret the results of a single experiment, it is probably our imagination and not our experimental precision that is the limiting factor.

I imagine Gary would agree, and I am therefore not suggesting that he is naive to these points, but simply offering an elaboration and clarification of the strong arguments he already made.

What we need to do is design our experiments to be as discriminating as possible, realizing that we will never achieve infinite precision. We must then slice the data from many different angles like we would slice a pie — or perhaps a steak, if Gary would prefer, or even a low-reward plain potato — and attempt to paint all of these different forms of imperfectly discriminating evidence into a coherent picture.

LDL Oxidation as an Example

Take, for example, my contention that the oxidation of polyunsaturated fatty acids and proteins in the LDL membrane is a central event in the initiation of the atherosclerotic lesion and a less central but still important event in the inflammatory cascade that eventually enables that plaque to cause a heart attack. I and many others have come to this conclusion by attempting to reconcile the evidence garnered from multiple approaches including test tube science, animal experiments, and the genetic and clinical evidence in humans.

No single experiment, however well designed, could ever in and of itself demonstrate this hypothesis to be true.

I explained in “Genes, LDL-Cholesterol Levels, and the Central Role of LDL Receptor Activity in Heart Disease” that statins, cholestyramine, and thyroid hormone all increase the activity of the LDL receptor, but none of them do so specifically. We do not have a drug or dietary agent that only changes LDL receptor activity and does nothing else. There are antibodies to PCSK9 currently being investigated for clinical use, which should inhibit the degradation of the LDL receptor. If their specificity of action pans out, these might be able to show that LDL receptor activity governs heart disease risk even in people without genetic defects, and dose-response studies could define the range in which LDL receptor activity is important and whether its relationship to heart disease risk is linear.

It would be wrong, however, to consider this in and of itself definitive. PCSK9 may do things we don't yet understand. Future tests may show that the antibodies bind to other things besides PCSK9 that were not included in the initial specificity tests, or that the antibodies elicit some unforeseen reaction of the immune system.

Such tests would also leave us in the dark about why increasing LDL receptor activity protects against heart disease. Is it, as I contend, that robust clearance of lipoproteins from the blood prevents their oxidation?

Current ways of testing the oxidation hypothesis in live humans are quite certainly insufficient. There are no antioxidants we could supplement that would act specifically on the LDL particle, and adding single antioxidants is always risky business because doing so can actually disturb the antioxidant network and disrupt important communication signals.

Perhaps we could design an experiment where we randomize people to receive the anti-PCSK9 drug or a placebo. Then we could inject half of each group with chemically purified oxidized LDL and half of each group with an inert solution that had gone through the same purification process (so it picks up all the same trace contaminants) but that lacks any oxidized LDL.

That would never pass an Institutional Review Board for obvious ethical reasons. Even if it did, however, and even if we showed that injection of oxidized LDL abolished the protective effects of increasing LDL receptor activity, there are still a whole host of objections to a definitive interpretation:

The fact that something can produce a disease experimentally does not mean it did produce the disease in everyone who has it. What if this is one of many causes? How important is it relative to other causes?

What if this experiment cannot be repeated in people of a different gender or ethnicity? We would have to go back to the drawing board to attempt to explain why. This would certainly make our results seem less definitive, but we would only know about this problem once we attempted to replicate the experiment in these other populations.

In humans who are not acting as laboratory guinea pigs, LDL oxidation is a continuous process. Does injecting people on, say, a weekly basis with a larger amount of oxidized LDL than they would ever experience at one time create a fundamentally different scenario? Perhaps in most people LDL never oxidizes fast enough to accumulate in the blood at a high enough concentration to cause harm.

We could go on and on. The totality of all the possible objections to a definitive interpretation can never be satisfied with a single study. Developing broad support for a hypothesis requires studying it in many different ways, looking at it from many different angles, using the most discriminating evidence possible but recognizing that its precision is imperfect, and attempting to fit all of the pieces of the puzzle together — without picking any cherries along the way.

