Big Ocean of Data

Confounded by Big Data and the Obesity Paradox

We live in an age of big data. That big data brings the possibility of big new insights in nutrition, obesity, and health. It also brings the possibility of big mistakes as people try to translate the associations they find into cause-and-effect relationships.

With big data sets especially, the possibility of confounding errors looms large. Confounding occurs when some other factor, often invisible to the researchers, influences both of the variables being studied, so that a robust association between them leads people to think there’s a causal relationship at work when there may be none. In a special issue of the Proceedings of the National Academy of Sciences, Richard Shiffrin explains:

There are enormous difficulties facing researchers trying to draw causal inference from or about some pattern found in Big Data: there are almost always a large number of additional and mostly uncontrolled confounders and covariates with correlations among them, and between them and the identified variables. This is particularly the case given that most Big Data are formed as a nonrandom sample taken from the infinitely complex real world: pretty much everything in the real world interacts with everything else, to at least some degree.

And so it is that we have a seemingly endless debate about the so-called obesity paradox. In large, broad samples of the population, obesity is pretty clearly understood to carry a risk of excess mortality. A meta-analysis published in The Lancet this summer provides good support for this understanding.

But in more narrowly defined populations, people with a BMI in the range of obesity have been observed to survive longer than people with a lower BMI. The classic example is people who already have cardiovascular disease. Especially in an older population with heart disease, thinner people tend to die sooner than people with a BMI in the range of mild obesity.

Figuring out what this association means is hardly simple. Is the extra fat tissue that comes with a high BMI helping people in this special population live longer? Or is the low BMI of people who don’t survive a signal of some other health problem that confounds the analysis?
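
To make the second possibility concrete, here is a minimal Python sketch of a toy simulation. Every number in it (the share of patients with a hidden illness, the BMI distributions, the death risks, the BMI cutoff of 27) is an invented assumption for illustration, not an estimate from any study. In this toy world, BMI has no effect on survival at all. A hidden illness both lowers BMI and raises the risk of death, and that alone is enough to make thinner patients appear to die sooner.

```python
import random

def simulate(n=50000, seed=0):
    """Toy cohort: a hidden illness lowers BMI and raises death risk.

    BMI itself has NO effect on death in this model. All numbers are
    made-up assumptions chosen only to illustrate confounding.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ill = rng.random() < 0.3                 # hidden illness (the confounder)
        bmi = rng.gauss(24 if ill else 29, 3)    # illness causes weight loss
        p_death = 0.40 if ill else 0.10          # risk depends only on illness
        died = rng.random() < p_death
        rows.append((ill, bmi, died))
    return rows

def death_rate(rows):
    """Fraction of patients in `rows` who died."""
    return sum(died for _, _, died in rows) / len(rows)

def compare(rows, cutoff=27.0):
    """Return (death rate below cutoff, death rate at or above cutoff)."""
    low = [r for r in rows if r[1] < cutoff]
    high = [r for r in rows if r[1] >= cutoff]
    return death_rate(low), death_rate(high)

rows = simulate()

# Crude comparison, ignoring the hidden illness.
crude_low, crude_high = compare(rows)
print(f"crude: lower BMI {crude_low:.3f}, higher BMI {crude_high:.3f}")

# The same comparison within each illness group.
within = {}
for ill in (False, True):
    stratum = [r for r in rows if r[0] == ill]
    within[ill] = compare(stratum)
    low, high = within[ill]
    print(f"ill={ill}: lower BMI {low:.3f}, higher BMI {high:.3f}")
```

In the crude comparison, the lower-BMI group has a clearly higher death rate. Within each illness group, the gap largely disappears, because illness was driving both the weight loss and the deaths. Real data are harder: the confounder may be unmeasured, so the second comparison may not be available to the researchers at all.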

Two cautions are worth remembering. First is the problem of confounding. Writing on the subject in a recent text, John Danziger and Andrew Zimolzak warn that:

Any observational study may have unidentified confounding variables that influence the effects of the primary exposure, therefore we must rely on research transparency along with thoughtful and careful examination of the limitations to have confidence in any hypotheses.

Second is the problem of using BMI to define obesity. In Obesity Reviews, Alexios Antonopoulos and colleagues explain that:

Observations supporting the existence of an obesity paradox could be driven by both the limitations of BMI as an obesity index and clinical studies per se and may represent an epiphenomenon rather than a true causal relationship.

Not every problem will yield to the brute force of big data sets. The obesity paradox is a perfect example of a puzzle that will likely vex scientists for some time to come.

Click here for the text by Danziger and Zimolzak. Click here for the paper by Antonopoulos et al. Click here for yet another thoughtful review of the obesity paradox.

Big Ocean of Data, photograph © NASA / flickr



September 19, 2016

2 Responses to “Confounded by Big Data and the Obesity Paradox”

  1. September 20, 2016 at 12:53 pm, Angela Meadows said:

    “In large, broad samples of the population, obesity is pretty clearly understood to carry a risk of excess mortality.”

    It could be argued, and has been by many, that this is a perfect example of correlation being mistaken for causation. One obvious candidate for the confounding variable would be living in a society that constantly stigmatises your body and your being, leading to massive self-loathing and repeated attempts at intentional weight loss through more or less healthy behaviours (which lead in turn to weight cycling, another potential candidate).

    This is not a question that real-world research can really answer, because you would be hard pressed to find a population that has not been exposed to this messaging. Some evidence from previous years has shown that where high weight is considered the ideal, the negative relationships between weight and health are attenuated or even disappear altogether (e.g. Muennig 2008), but it will be harder and harder to find such populations these days. Thus, the relationship between high BMI and poor health must remain one of correlation being mistaken for causation.