Say that one day you see me walk up to a roulette table and win big on my first spin. You might mumble, “Lucky guy.”

Later on you happen to see me on the basketball court, where I hit 15 of 20 shots from the three-point line. To that you might say, “Wow, Peter has game.”

The difference in how these two positive outcomes arose, and my relative influence on each of them, illustrates why the Food and Drug Administration should reject Biogen’s flawed statistical argument that aducanumab, its controversial Alzheimer’s drug, is ready for approval.

Aducanumab is believed to block the buildup of the protein beta-amyloid in the brain, which some say is the cause of Alzheimer’s disease. Retarding this buildup, so the thinking goes, will attenuate the progression of Alzheimer’s disease.

Multiple comparative clinical trials have cast doubt on the amyloid theory, and Biogen even pulled the plug on two aducanumab trials, called EMERGE and ENGAGE, in 2019 based on prespecified futility analyses. But a last-minute reanalysis showed that patients in the EMERGE trial who took higher doses of aducanumab for longer times had statistically significantly better cognitive outcomes than those who took a placebo.

The question for Biogen — and for the FDA — is whether EMERGE’s second-look finding is a true measure of the drug’s benefit. It could also be a false positive, where the result is not actually due to the treatment. My performances at the roulette wheel and on the basketball court highlight the distinction between these two scenarios, and how the FDA should discern which one is more likely.

A single-spin roulette win is unlikely. The odds of it happening twice in a row are less than 1 in 1,000. Now to the basketball court. On my next outing, I am a dismal 1 for 20 from the three-point line. Given my first stellar performance, that’s unlikely, with about a 1 in 1,000 chance of having two consecutive performances this different. But even though it’s improbable, there is no doubt that I am the same player who is solely responsible for the outcome.

If my roulette and basketball outings were clinical trials, the P value would be less than 0.001 in each case. Some people think that P values like these tell us how likely it was that the observed outcome was due to chance. That’s wrong. A P value tells us only how likely the outcome would be *if* it were due to chance. That means the same P values can have different implications.

No matter how many roulette spins I win in a row, and no matter how improbable that is, no one would conclude that I am a skilled roulette player: It is purely a game of chance. The opposite would be said about tossing basketballs, where I alone am responsible for whether the ball goes through the hoop or misses.

The FDA has to figure out if the low P value in the EMERGE trial is more a reflection of aducanumab actually affecting patient outcomes, like the basketball example, or being a bystander in a trial where the outcomes differed due to chance alone, like the roulette example.

That’s where our general knowledge, and the FDA’s knowledge of prior evaluations of the amyloid theory, need to come into play. That knowledge should always underlie the interpretation of an observed outcome, whether it’s in gambling, basketball, or drug trials.

The common term for this knowledge is the prior. In my examples of roulette and basketball, the priors — meaning what our knowledge tells us of the likelihood that I will have an influence on the outcome — are 0% and 100%, respectively.

Biogen wants the FDA to believe that the prior for aducanumab causing the outcome difference in EMERGE is high, like my influence on three-point shooting. But all those previous negative trials of the amyloid hypothesis suggest that EMERGE is a lot more like my roulette spins, with aducanumab as a mere bystander to an outcome difference that was due to chance.

We don’t need to rely only on the extremes in this case. Bayes’ theorem is often used to understand trial results, particularly when the treatment being studied is unlikely to work in the first place. Using this approach, if the prior odds that aducanumab works are around 1 in 10 (meaning among 10 scenarios where drugs like aducanumab are tried, only one will actually benefit patients), the chance that the EMERGE result is a false positive is around 40%.

(The two other components of this calculation are: 1) the P value less than 0.05 that the study used as a standard for statistical significance and 2) the statistical power — the probability of detecting an effect if one actually exists — of 80%. How these numbers can be put together is shown in this graph from a paper on redefining statistical significance.)

An amyloid enthusiast could say the prior odds were as high as 1 in 5, but that would still mean the likelihood that EMERGE is a false positive is around 25%.
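For readers who want to check the arithmetic, here is a minimal sketch of how the three ingredients (prior odds, significance threshold, and power) could be combined. It is an illustration under stated assumptions, not necessarily the exact calculation behind the figures above: the function name `false_positive_risk` is hypothetical, “1 in 10” is read as odds of 1 to 10, and the 0.05 threshold is used rather than the trial’s observed P value.

```python
def false_positive_risk(prior_odds, alpha=0.05, power=0.80):
    """Chance that a statistically significant trial result is a false positive.

    prior_odds: odds that the drug truly works (assumed reading: "1 in 10" -> 1/10).
    alpha:      significance threshold (chance of a "significant" result when the drug does nothing).
    power:      chance of a significant result when the drug really works.
    """
    # Relative weight of "drug works AND the trial detects it".
    true_positive_weight = power * prior_odds
    # Relative weight of "drug does not work AND the trial is significant by chance".
    false_positive_weight = alpha * 1.0
    return false_positive_weight / (false_positive_weight + true_positive_weight)


if __name__ == "__main__":
    for label, odds in [("1 in 10", 1 / 10), ("1 in 5", 1 / 5)]:
        risk = false_positive_risk(odds)
        print(f"prior odds {label}: false positive risk is about {risk:.0%}")
```

Under these assumptions the sketch gives roughly 38% and 24%, in line with the rounded figures of 40% and 25%. Reading “1 in 10” as a probability of 10% rather than as odds shifts the numbers slightly lower but does not change the conclusion.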

Whether the chance that the EMERGE result was a false positive is 40% or 25%, either is much too high to justify the FDA concluding that aducanumab is effective at deterring the progression of Alzheimer’s disease, especially since the public and physicians generally consider FDA approval to mean that a treatment does what it claims to do.

A better step would be for Biogen to run another trial. It might cost around $250 million, but given Biogen’s public assertion that aducanumab works, the economics would still be a no-brainer. Ultimate approval would mean billions of dollars, or perhaps tens of billions, for the company.

Then again, Biogen may not be as confident as it claims to be, which means it would never run that next trial of aducanumab.

In other words, what Biogen does next — if the FDA does not approve the drug — will actually tell us about its “prior.”

*Peter B. Bach, M.D., is the director of the **Drug Pricing Lab** at Memorial Sloan Kettering Cancer Center in New York City.*

The FDA should approve, and Biogen should track and report the ongoing findings, since the downside risks of taking the drug are small, similar to what the FDA did with SRPT.

It’s just a question of whether it works within a relatively large population! If you analyzed your basketball trials, you would pick the shots that go through the net. If you built a robot, you would write the exact commands in the software to give a 100% chance, and not the lousy ones that were relatively unpredictable or false in their effectiveness. In other words, when the target group (patients) is homogeneous, like the fixed measures of the basketball equipment, you know the outcome! Because people do not all react the same way, a number of solutions will be required and not a single product for all market segments (dementia variations and patient variations). I presume even illnesses like dementia could be segmented so that the right drug works for those it is made for and not for the others.

Great article. Just wondering how “the odds of it [a single-spin roulette win] happening twice in a row is less than 1 in 1,000.” Or was it used figuratively?

There are 38 spaces on an American roulette wheel.

So the chances of two wins in a row are (1/38)*(1/38) or 1 in 1,444.

I have been diagnosed with Alzheimer’s disease in the last few months in the UK, where I live. There seems as yet to be no effective treatment, which for me, and thousands of others, is distressing. I should be pleased to receive any up-to-date news as regards medication or impending treatment.

Anne Goodwin