A recent study looks at the use of acupuncture in the treatment of infantile colic. If it seems counterintuitive to stick needles into a crying baby, in this case your intuition is a reasonable guide. Time reports about the study in an article titled: The Soothing Benefit of Acupuncture for Babies, which begins:
“Acupuncture may help babies who cry too much, according to a new study published in the journal Acupuncture in Medicine.”
The BBC did a better job with the headline: Can acupuncture ease baby colic? which begins:
The crying of babies with colic may be reduced if they are treated with acupuncture, according to controversial research from Sweden.
But UK experts say no conclusions can be drawn from the small study of 147 babies aged two to eight weeks.
The difference between the two reports seems to be that the BBC consulted Dr. Edzard Ernst to help put the study into context. Dr. Ernst is an expert in complementary and alternative medicine (CAM) but is known for taking a science-based approach to the field. Meanwhile Time quotes a doctor of Integrative Medicine who gave only fawning praise for the study.
Unfortunately, this is a common pattern. There have been several thousand studies of acupuncture, and despite this massive effort to demonstrate that sticking thin needles to a certain depth into specific points of the body has a therapeutic effect, the research has failed to do so. Systematic reviews of acupuncture show several clear patterns that are most consistent with the conclusion that acupuncture does not work. First, when properly controlled for, it does not seem to matter where you stick the needles. It also does not seem to matter if you actually stick needles through the skin. Gently poking the skin with toothpicks seems to be enough.
What this means is that properly blinded comparisons of acupuncture tend to show no difference. The only comparisons that show a difference are unblinded – typically between some type of acupuncture (including fake acupuncture) and no treatment at all. There does not appear to be any dose-response, the details of the treatment do not matter, and the training of the acupuncturist does not matter. The only reasonable scientific interpretation of this fact is that there is no specific effect from acupuncture, that it is merely a theatrical placebo and any observed effects are non-specific placebo effects.
This would be similar to a drug that only had an effect in unblinded comparisons to no treatment, but never showed a clear difference from blinded placebo treatments, that showed no dose response, and where the dosing interval or other details did not seem to matter. Suppose, further, that it was claimed on this level of evidence that the drug worked for hundreds of completely unrelated conditions, without the slightest plausible mechanism. If a drug company promoted such a drug there would be cries of the evils of “Big Pharma.”
It could be reasonably argued (and we have) that after several thousand studies there is simply no reason for further acupuncture research. This is a failed hypothesis that was never plausible in the first place. If, however, you are going to conduct acupuncture research, at the very least you should do a rigorous study. We are way past the preliminary research stage with acupuncture, and it has already been clearly established that preliminary acupuncture studies are of no use at all. Such studies tend to produce false positives that are not replicated by more rigorous studies. The only reason to do a preliminary acupuncture study at this stage is to pad your CV and to promote acupuncture through a credulous media.
Acupuncture for Colic
Colic simply refers to an excessively fussy baby that cries more than three hours a day, three or more days a week, for three weeks or longer. This is a benign condition, but can be extremely stressful for the caregivers. The main treatment is to simply provide some TLC for the baby by holding and cuddling them. It may also be helpful to remove milk from their diet, and to make sure they are properly burped after feeding.
There is no reason to think that sticking needles into a colicky baby will be of any benefit, but acupuncture is a treatment looking for an application, and any application will do. There have been a few studies of acupuncture for colic. The only review I could find concluded that the evidence for acupuncture in colic was limited and inconclusive. Landgren, lead author on the current study, responded to this review by publishing the following comment:
Inconclusive results in the few published articles on the subject can be due to different acupuncture points, different insertion time, different needling methods, differences in the outcome variables, in how the crying was measured and insufficient sample sizes. Further research is needed on understanding the utility, safety, and effectiveness of acupuncture in infants with colic.
There is no justification for this conclusion in the acupuncture literature, which clearly shows that method, insertion time, and acupuncture points do not matter. All these details, however, provide endless opportunities to dismiss negative results. I would also point out that while small sample size does limit the power of a study to detect a small effect, small studies are more susceptible to p-hacking and are more likely to be false positive.
It is interesting to note that Landgren went on to do a follow-up study, which was larger than previous studies but still lacked sufficient rigor to definitively answer the question of whether or not acupuncture works for colic. The result is just another preliminary study generating another round of misleading promotion of acupuncture.
The study looked at three treatment groups, including two types of acupuncture (varied in needle location) and usual care, meaning no intervention. There was no sham or placebo acupuncture, which in my opinion is a fatal flaw. Again – at this point in acupuncture research, a study without at least sham acupuncture is worthless. The study found no difference between the two acupuncture groups, so the authors pooled both of those groups, compared them to the no-treatment group, and found some advantages for acupuncture.
Here is the critical fatal flaw in the study, however. David Colquhoun spells it out nicely, so I quote:
Table 1 of the paper lists 24 different tests of statistical significance and focuses attention on three that happen to give a P value (just) less than 0.05, and so were declared to be “statistically significant”. If you do enough tests, some are bound to come out “statistically significant” by chance. They are false positives, and the conclusions are as meaningless as “green jelly beans cause acne” in the cartoon. This is called P-hacking and it’s a well known cause of problems. It was evidently beyond the wit of the referees to notice this naive mistake. It’s very doubtful whether there is anything happening but random variability.
Yes, this paper screams p-hacking. In fact, Table 3 showed 32 different statistical analyses, with four being barely significant, without any clear pattern. This is consistent with random noise. It should also be noted that the primary outcome did not achieve statistical significance, which means that technically the study was negative. The secondary measures that barely made it over the 0.05 line were cherry picked.
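To put rough numbers on the multiple comparisons problem, here is a minimal sketch of how often at least one test comes out "significant" by chance alone. It assumes every null hypothesis is true and that the tests are independent, which is a simplification, since outcomes measured on the same babies are correlated. The function names are my own, for illustration only.

```python
# Sketch: chance of at least one spurious "significant" result when many
# tests are run at the same threshold. Assumes all null hypotheses are true
# and the tests are independent (a simplification for illustration).

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of one or more false positives among n_tests tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of tests that cross the threshold by chance."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # 24 and 32 are the test counts quoted in the text above.
    for n in (24, 32):
        print(
            f"{n} tests: "
            f"P(at least one false positive) = {familywise_error_rate(n):.2f}, "
            f"expected false positives = {expected_false_positives(n):.1f}, "
            f"Bonferroni threshold = {bonferroni_threshold(n):.4f}"
        )
```

Under these assumptions, 24 tests give roughly a 71% chance of at least one false positive, and 32 tests give roughly 81%. A handful of results sitting just under 0.05 is about what noise alone would produce.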
I would also point out that the study showed no difference between the two types of acupuncture. Again – it does not matter where you stick the needles, because acupuncture points are pure pseudoscience. They do not exist. Also, the researchers decided, because they were treating infants, not to stick the needles to the “proper” depth to elicit the de qi. Rather they used superficial needling. So this study also demonstrates that it does not matter if or how you insert needles.
The frequentist fallacy is to conclude that if you can make the numbers show statistical significance, then there is a real effect. This ignores the very real problem of p-hacking, which involves massaging the data to squeeze out statistical significance from noise. Even without p-hacking, p-values are a poor predictor of whether or not an effect is actually real. David also points out that a p-value close to 0.05 indicates a false positive rate of 30%. This increases if you do multiple comparisons, which this study did, and also increases if the prior plausibility is low, which it is.
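The dependence on prior plausibility can be illustrated with a simple calculation. This sketch is not Colquhoun's method: it uses the cruder textbook formula that treats any result below the threshold as "positive", which understates the risk for results sitting just under 0.05. The 80% power figure and the prior probabilities are illustrative assumptions, not values taken from the study.

```python
# Sketch: share of "significant" findings that are false positives, as a
# function of how plausible the hypothesis was beforehand.
# Assumptions (illustrative only): threshold alpha = 0.05, power = 0.8.

def false_positive_risk(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Fraction of significant results that come from a true null.

    prior: probability, before the study, that a real effect exists.
    """
    false_positives = alpha * (1.0 - prior)  # true nulls that cross the threshold
    true_positives = power * prior           # real effects that are detected
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    for prior in (0.5, 0.1, 0.01):
        risk = false_positive_risk(prior)
        print(f"prior = {prior:<5} false positive risk = {risk:.1%}")
```

Even with this lenient formula, the risk climbs steeply as the prior drops: a hypothesis given only a 1 in 10 chance of being true yields a false positive for about one in three significant results, and a 1 in 100 chance makes the large majority of significant results false.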
The net effect of all this is that the alleged positive effects in this study (which is really negative) are almost certainly false positive. This study failed to achieve the rigor necessary to provide reliable results. The results, as they are, are technically negative and do not justify the conclusions of the authors, nor the press release claiming that acupuncture is a viable option for colic.
It is really unacceptable at this point in the arc of acupuncture research to publish small studies with poor rigor and questionable results. This only serves to muddy the waters further and cause confusion both professionally and in the public. Only the most rigorous studies, with proper control groups and statistical rigor, address the criticisms of acupuncture and acupuncture research. This study does not do that.
There is still no reason to recommend acupuncture for colic in infants. There is no plausible mechanism, the bulk of the vast acupuncture research shows that acupuncture does not work, and the research for colic specifically is limited and unconvincing.
Acupuncture for Infantile Colic Steven Novella