Ten Tools for Exaggeration in Pediatric Obesity Studies
Tall tales are not just for the literature of Mark Twain. In fact, you can find a few in childhood obesity research. A new paper in Obesity Reviews offers an inventory of ten methods for exaggerating effectiveness in childhood obesity studies. Andrew Brown and colleagues (including ConscienHealth’s Ted Kyle) provide examples of each.
Checklists can be useful. Perhaps this one will help everyone from researchers to readers spot errors and be more careful to avoid them.
1. Teaching to the Test
Self-reported outcomes are a problem in obesity research. That’s because people fudge a bit when asked about what they’ve eaten, how much they weigh, or any number of other things. But the problems multiply when you’re testing a program that urges people to do something and then judging its success by whether or not they say they’ve done it. In this circumstance, a control group makes the problem worse, because nobody has taught the control group the right answers to the final exam. The best solution is to use objective measures, not self-reports.
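To see how that plays out, here is a minimal simulation sketch (not from the paper; every number below is an invented assumption). The true change is zero in both groups, but only the coached group shades its self-reports toward the answers it was taught:

```python
# Minimal sketch (not from the paper): how differential self-report bias can
# manufacture an apparent effect. All numbers here are invented assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200  # children per arm

# True change is zero, on average, in both arms.
true_change_treat = rng.normal(0.0, 0.5, n)
true_change_ctrl = rng.normal(0.0, 0.5, n)

# The treated children were coached on the "right" answers, so their
# self-reports shift toward improvement; the controls report honestly.
reported_treat = true_change_treat - 0.2 + rng.normal(0, 0.1, n)
reported_ctrl = true_change_ctrl + rng.normal(0, 0.1, n)

print("objective measure p:", stats.ttest_ind(true_change_treat, true_change_ctrl).pvalue)
print("self-reported     p:", stats.ttest_ind(reported_treat, reported_ctrl).pvalue)
```

Judged by the objective measure, the program does nothing; judged by self-reports, it typically looks like a clear win.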
2. Regression to the Mean
In obesity, when you don’t have a control group, regression to the mean becomes a problem. Simply stated, a group chosen because its BMI is higher than average will drift toward the average over time, even without any intervention. Thus, claims based on a comparison to baseline alone are bogus. Yet it happens all the time. The answer is simple: don’t rely on comparisons to baseline.
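A quick simulation sketch (invented numbers, not the paper’s data) shows the drift. Measure children twice with ordinary day-to-day noise, enroll only those who look heaviest at baseline, do nothing at all, and the group mean still falls:

```python
# Minimal sketch (not from the paper): regression to the mean with no
# intervention at all. The BMI values below are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
true_bmi = rng.normal(22, 3, 10_000)             # each child's underlying BMI
baseline = true_bmi + rng.normal(0, 2, 10_000)   # measurement/day-to-day noise
followup = true_bmi + rng.normal(0, 2, 10_000)   # fresh noise, nothing changed

selected = baseline > 28                         # enroll the "high BMI" group
print("baseline mean: ", baseline[selected].mean())
print("follow-up mean:", followup[selected].mean())  # noticeably lower, untreated
```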
3. Moving the Goalposts
It’s annoying when results don’t come in as hoped. Researchers might spend years developing an intervention, only to find that the results of a controlled trial don’t conform to a prespecified definition of success. So if you can find a significant result by moving the goalposts, the temptation is real. But for a careful reader, it destroys the credibility of the result. What’s more, learning what doesn’t work is important, too.
Papering over a null result gets in the way of progress toward approaches that will make a bigger difference. Null results deserve respect in the literature.
4. Overlooking Clusters
In many controlled studies, we can’t randomize individuals. Instead, we might randomize all the children in a classroom or a school to the same program. Other classrooms and schools serve as control groups. But cluster-randomized trials require special analysis to avoid big statistical errors. All too often, this doesn’t happen.
The answer is statistical sophistication up front. A cluster-randomized trial requires a solid plan for analysis. Blowing money on a bogus analysis is a terrible shame.
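As a rough sketch of what that plan can look like (simulated data and hypothetical variable names, not the paper’s analysis), a mixed model with a random intercept for each classroom respects the clustering, while a naive t-test pretends every child is an independent observation:

```python
# Minimal sketch, not the paper's analysis: a naive t-test that ignores
# classrooms versus a mixed model that treats classroom as a cluster.
# The simulated data and variable names are assumptions for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(2)
rows = []
for classroom in range(20):
    treated = classroom % 2              # half the classrooms get the program
    class_effect = rng.normal(0, 0.8)    # classrooms differ from one another
    for _ in range(25):
        rows.append({"classroom": classroom, "treated": treated,
                     "bmi_change": class_effect + rng.normal(0, 1.0)})  # no true effect
df = pd.DataFrame(rows)

# Naive analysis: treats 500 children as independent, so its standard error is too small.
naive = stats.ttest_ind(df.loc[df.treated == 1, "bmi_change"],
                        df.loc[df.treated == 0, "bmi_change"])
print("naive p-value:      ", naive.pvalue)

# Cluster-aware analysis: random intercept for each classroom.
mixed = smf.mixedlm("bmi_change ~ treated", df, groups=df["classroom"]).fit()
print("mixed-model p-value:", mixed.pvalues["treated"])
```

The point is not this particular model; it’s that the unit of randomization, the classroom, has to show up somewhere in the analysis.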
5. Hacking the P and Fishing for Significance
P-hacking gets a lot of attention these days because it’s common and it’s corrosive to confidence in science. It happens when a primary analysis yields a poor result, or when there’s no prespecified analysis at all. That’s an invitation to explore multiple analyses until one gives the desired result. Methods for detecting p-hacking are hardly simple, but they’re available. Scientists should protect themselves from inadvertent p-hacking by following good statistical practices. And finally, there’s nothing wrong with exploratory analyses, so long as researchers are open about them and their limitations. They can be the starting point for new research.
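Here’s a minimal simulation sketch (pure assumption, no real data) of why fishing works: with twenty independent outcomes and no true effect anywhere, most studies will still turn up at least one p-value below 0.05 by chance alone:

```python
# Minimal sketch (assumptions only): fishing across many null outcomes almost
# guarantees at least one nominally "significant" result.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_studies, n_outcomes, n = 1000, 20, 50
hits = 0
for _ in range(n_studies):
    pvals = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
             for _ in range(n_outcomes)]
    hits += min(pvals) < 0.05
print("studies with at least one p < 0.05:", hits / n_studies)  # typically ~0.6
```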
6. Bogus Baseline Comparisons
On occasion in an RCT, the treatment group might improve while the control group does not. In this circumstance, a common error is to conclude that the treatment works better than the control. But without a formal statistical analysis comparing the two groups directly, this conclusion is simply wrong. Statisticians call this a difference in nominal significance (DINS) error. It’s easy to avoid by doing the right analysis. And it’s easy to spot if you know what to look for.
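A small worked example with invented summary statistics makes the DINS error concrete. Each group gets its own change-from-baseline test, but only the direct between-group comparison answers the question the trial asked:

```python
# Minimal sketch of a DINS error using invented summary statistics
# (change in BMI z-score, SD = 1.0, n = 50 per arm).
import numpy as np
from scipy import stats

def one_sample_p(mean, sd, n):
    """Two-sided p-value for H0: mean change = 0, from summary statistics."""
    t = mean / (sd / np.sqrt(n))
    return 2 * stats.t.sf(abs(t), df=n - 1)

print("treatment vs baseline p:", one_sample_p(-0.35, 1.0, 50))  # ~0.02
print("control vs baseline p:  ", one_sample_p(-0.15, 1.0, 50))  # ~0.29
# The comparison that actually answers the question:
between = stats.ttest_ind_from_stats(-0.35, 1.0, 50, -0.15, 1.0, 50)
print("treatment vs control p: ", between.pvalue)                # ~0.32
```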
7. False Equivalence
When a comparison of two active treatments doesn’t find a difference between them, you might think that they’re equally effective. But you could easily be wrong. It’s one thing not to find a difference. It’s quite another to demonstrate equivalence. A reader can spot this error by reading the results of a study carefully. A researcher can avoid this mistake by deciding up front what kind of study they’re doing: superiority, equivalence, or non-inferiority.
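One way to get it right is a formal equivalence test, such as two one-sided tests (TOST) against a prespecified margin. The sketch below uses invented numbers and an assumed margin of 0.2 to show that “no significant difference” and “demonstrated equivalence” are not the same thing:

```python
# Minimal sketch with invented numbers and an assumed equivalence margin:
# failing to find a difference is not the same as demonstrating equivalence.
import numpy as np
from scipy import stats

def tost_from_stats(m1, s1, n1, m2, s2, n2, margin):
    """Two one-sided tests (TOST) for equivalence, from summary statistics."""
    se = np.sqrt(s1**2 / n1 + s2**2 / n2)
    df = n1 + n2 - 2
    diff = m1 - m2
    p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    return max(p_lower, p_upper)                     # both must be rejected

# Invented results: a small trial comparing two active programs.
m1, m2, sd, n = -0.30, -0.10, 1.0, 20
p_diff = stats.ttest_ind_from_stats(m1, sd, n, m2, sd, n).pvalue
print("difference test p:   ", p_diff)                                      # ~0.53
print("equivalence (TOST) p:", tost_from_stats(m1, sd, n, m2, sd, n, 0.2))  # ~0.50
```

Here the difference test finds nothing, yet the equivalence test finds nothing either: the data are simply too thin to support either claim.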
8. Discounting Inconvenient Controlled Results
Sometimes, observational data is all we have. But when we have data from a randomized, controlled comparison, we need to take it seriously. Nonetheless, researchers sometimes discount inconvenient results from the controlled comparison and, instead, point to encouraging observations from the active group alone. This paper provides one such example.
9. One-Sided Testing
A one-sided t-test can beef up the statistical significance of a study’s results compared to a two-sided test. But it’s almost always the wrong test to use, except in the case of a non-inferiority study. So just don’t.
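The arithmetic behind the temptation is simple, as this sketch with an assumed test statistic shows: the one-sided p-value is half the two-sided one, which can nudge a borderline result across the 0.05 line:

```python
# Minimal sketch with made-up numbers: same data, same test statistic,
# but a one-sided test reports half the two-sided p-value.
from scipy import stats

t_stat, df = 1.80, 60                  # assumed test statistic and degrees of freedom
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
p_one_sided = stats.t.sf(t_stat, df)
print("two-sided p:", p_two_sided)     # ~0.077, not significant at 0.05
print("one-sided p:", p_one_sided)     # ~0.038, suddenly "significant"
```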
10. Insignificant Clinical Significance
When big differences between test and control groups are not statistically significant, labeling those differences as “clinically significant” is wrong. Yet it happens. The bottom line here is simple. If an outcome isn’t statistically significant, you can’t claim the intervention is effective. Period. Anecdotes don’t add up to data.
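A short sketch with invented numbers shows why the size of a difference settles nothing by itself: a seemingly large difference with a confidence interval spanning zero is entirely consistent with no effect at all:

```python
# Minimal sketch with invented numbers: a "big" difference whose confidence
# interval crosses zero is not evidence of effectiveness, clinical or otherwise.
import numpy as np
from scipy import stats

diff, sd, n = -0.8, 2.5, 15                 # assumed mean difference, SD, children per arm
se = sd * np.sqrt(2 / n)
t_crit = stats.t.ppf(0.975, df=2 * n - 2)
ci = (diff - t_crit * se, diff + t_crit * se)
print("difference:", diff, "95% CI:", ci)   # roughly (-2.7, 1.1); spans zero
```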
Click here for the paper in Obesity Reviews.
A Carload of Tomatoes, illustration by Edward H. Mitchell via Emma Paperclip / flickr
August 21, 2019 at 12:27 pm, Allen Browne said:
Should be required reading for all interested in research or doing research or reading research papers. Thanks.
Allen