In the Interim...

In this episode of "In the Interim…", Dr. Scott Berry and Dr. Nick Berry investigate how futility in clinical trials and stopping rules in sports illuminate very similar decision problems, albeit with very different consequences. Drawing from baseball’s 10-run rule and tournament cuts in golf, the discussion confronts traditional and Bayesian strategies for interim decisions. The episode explains why simulation, not historical trial review, provides the empirical backbone for futility boundaries in clinical trials, and details the mechanics and consequences of aggressive stopping criteria. Using the Biogen aducanumab Alzheimer’s trials, the conversation exposes how a futility rule based on 20% predictive probability halted trials even when meaningful probability of success remained. Scott and Nick address the influence of ethical considerations, cost, regulatory priorities, and statistical rigor, and contrast the strengths of Bayesian predictive probability with conditional power.

Key Highlights
  • Dissects sports futility rules (10-run rule, golf cuts, Bill James heuristic) and their application to clinical trial design
  • Argues for prospective simulation to define adaptive futility thresholds
  • Explains how Bayesian predictive probability provides a more robust framework than conditional power for interim adaptive decisions
  • Details how aggressive futility criteria may prematurely stop trials and risk missing beneficial treatments, as in the aducanumab case
  • Explores the intersection of ethics, patient safety, operational efficiency, regulatory standards, and trial cost

Creators and Guests

Host
Scott Berry
President and a Senior Statistical Scientist at Berry Consultants, LLC

What is In the Interim...?

A podcast on statistical science and clinical trials.

Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott: Alright, welcome everybody.

Back to In the Interim. I'm your
host, Scott Berry, and I'm joined

by Nick Berry today, Dr.

Nick Berry.

And we are going to do
our second episode:

Lessons for drug developers
from the world of sports.

You can go back to the first time
we did this, episode 14 of In the

Interim, and we had a really nice
episode on regression to the mean.

Today we're gonna talk about the 10-run
rule and futility, so maybe bear

with us a little bit as we
get to that, but I do wanna revisit

the regression to the mean and
a prediction that Nick made.

We did this episode in May of last
year, when Aaron Judge was hitting .400.

And for those of you who are not
baseball fans: uh, last week I

had somebody from Iceland send
me, uh, an email about the

episode on, uh, the Panther trial.

So we have

people globally listening to this.

A .400 success rate in baseball
is a, is a very, very high bar.

Nobody's done it since 1941.

And it's, it's one of these
mythical, uh, targets, that

somebody could be a .400 hitter.

Well, Aaron Judge was
batting .400.

It's just his hits divided by at bats, uh,
in May, and that's early in the season.

His, his rate was .400.

We were talking about
regression to the mean.

So Nick predicted that by end of
year, he would be hitting .330.

Interestingly, Jim Albert, who we
did another episode with, I did an episode

with Jim Albert about Bayesian
statistics and sports statistics,

his career in those, he
guessed .320, and Aaron Judge's

final batting average was .331.

So Nick, you were off, um, in your guess.

Nick Berry: 0.1% off.

Scott: Yeah.

Yeah.

Nick Berry: But yeah, I think that
was basically a weighted average of

what he's done with some random,
informed guess of his true batting

average to get me to .330.

Yep.
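(For listeners who want the arithmetic, Nick's weighted-average guess can be sketched in a few lines. The prior average and pseudo-at-bat weight below are illustrative assumptions for the sketch, not the numbers Nick actually used.)

```python
def shrink_toward_mean(observed_avg, prior_avg, n_at_bats, prior_weight=400):
    """Weighted average of what a hitter has done and a prior guess at his
    true talent; prior_weight acts like pseudo-at-bats of prior evidence."""
    w = n_at_bats / (n_at_bats + prior_weight)
    return w * observed_avg + (1 - w) * prior_avg

# Illustrative: a .400 average over 150 early-season at bats,
# shrunk toward a .300 prior, lands near Nick's .330 guess.
print(round(shrink_toward_mean(0.400, 0.300, n_at_bats=150), 3))
```

The more at bats you observe, the more weight the observed average gets; with zero at bats you just get the prior back.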

Scott: Yeah.

And by the way, it was
an incredible season.

.331 with the home runs he
had was a phenomenal season.

Uh.

Nick Berry: thing now.

I think he's hitting like .225 this
year, uh, almost a month into the season.

So the exact opposite.

We can regress him to the mean
in the other way this year.

Scott: Ah, that, that actually
would've, uh, made for a good thing.

Maybe we should figure out what he is.

And, and you have to post another
prediction, but let me talk about

a different prediction you made.

And this was a, a, a prediction
we are coming off of.

We are, we are recording this
episode coming off of the

Master's Golf Tournament.

So that was last weekend.

And, uh, we're a big golf family.

Uh, we enjoy golf.

Even, uh, Nick's sister Lindsay,
who doesn't play golf at

all, she loves to watch golf.

Uh, so we were watching the Masters, and
halfway through the Masters, 36 holes, two

rounds, Rory McIlroy had a six shot lead.

So in second place, there were
multiple golfers that were six shots

back, but a six shot lead halfway through
the tournament is quite a large lead.

I, I believe it was the largest
lead ever in the Masters at that point.

And of course, our family text,
uh, goes to predictions, goes to

statistics, and the question was
from, from Nick's sister, Lindsay:

what's the chance that Rory McIlroy wins
the Masters? And multiple of us guessed.

But Nick, you, you made a prediction
of the chance that he wins the

Masters, and what was your prediction?

Nick Berry: I think I said 70%.

Basically my, my math was, I think
he probably has the best expected

value of the whole field just based
on the fact that he won last year.

He's playing really well this year,
so I said he's probably the best

player in the field this week.

He had a six shot lead and, you
know, his distribution was good

enough that even though there was a
lot of people with the opportunity to

catch him, I said a 70% chance of him winning.

Scott: And my, my prediction was 60%.

And interestingly, uh, Nick's
mom's prediction was 20%.

It was quite different.

Uh, and

Nick Berry: golf and she knows
how variable it is, and that was

the reason for her prediction.

It was like weird stuff happens.

I

Scott: yep.

Nick Berry: you know, it's one guy who
has to hold onto the lead, and so

she said 20% because of all the
variability and what could happen.

Scott: So at that point, uh,
to, to sort of come to this:

interestingly, Scotty Scheffler,
after 36 holes, was 12 shots back.

And he's the number one
ranked golfer in the world.

Rory, I think was the number two
ranked golfer in the world, uh, in it.

And Scotty Scheffler was
12 shots back, essentially.

Very little chance he could
win, less than a 1% chance

he would, he would win
the golf tournament.

Uh, how much less, hard to
say, but less than 1%.

Interestingly, if you were 17 shots back.

Meaning after 36 holes,
you were five over par.

Rory was 12 under par.

So if you were five over par
or more, you had a 0% chance of

winning because you were quote
unquote cut from the tournament.

Those players stopped playing, and
they reduce the field. The

rule of the cut is the top 50 players,
and I think it started with 91 players.

The top 50 players and ties continue on
into the weekend, the last two rounds.

But they, they cut, and,
and sure, I'll say the word futility,

that these players no longer have an
opportunity to win the golf tournament.

If you were 17 shots back, you happened to be
in that tournament where Scotty Scheffler

was 12 back, but he got to keep playing.

Now, what happened in the
golf tournament, Nick?

Nick Berry: Uh, so round three, Rory
struggled, I think shot one over par.

Um, so he went to 11 under, and
the rest of the field played great.

You know, it seemed like
almost everyone that was

six or seven shots behind him
shot five under, six under, and he

went into Sunday tied for the lead.
he blew, he blew his lead, right?

It, it

Scott: Right

Nick Berry: after day three.

Um, Scotty Scheffler shot seven
under par and moved into contention.

Um, I think that's, you know, the number one
player that was 12 shots back. On Sunday, the

whole field kind of struggled.
Nobody shot seven under,
nobody had a, a huge round.

Rory eked out a win, um, by one stroke.

Uh, so he, he did.

He did technically retain his
lead, even though he blew it and

Scott: Yes.

Nick Berry: at the

Scott: Yeah.

Nick Berry: Um, but the, the most
important part is that I was the most

right in our family group chat because I
said 70%, which was the highest number.

And it doesn't matter how he got there,
there's no pictures on the scorecard.

Uh, I picked the highest
probability Rory went on to win.

So I, uh, I claimed victory
in the family group chat.

Scott: An interesting part of it was
Scotty Scheffler ended up finishing

second place, one shot behind.

And yes, he missed a birdie
putt on 17 that eventually could

have led him to a tie.

So he finished one shot back.

He had a very, very small chance of
winning the golf tournament in a,

not in a, in a different setting.

We could have

said, you're done playing.

You have no chance to win.

You're cut.

You can't play, uh, in this setting.

But he came back and, and,
and had a legitimate chance

of, of winning the tournament.

And, and so I won't, I won't address
your 70% being the right answer when, uh,

we know more about the golf tournament.

But, but you're technically right.

If you evaluate the likelihood
function of what happened, you had the

highest likelihood function in that.

But coming to this question: in
a sports competition, do we stop the

sports competition, and what does it mean?

And you won't be
surprised to, to see that

we'll turn this into thinking
about stopping clinical trials and

what the similarities are.

So Nick's brother Cooper, and if
you're on video here, you can see

Nick is wearing the team shirt
for the Pomona-Pitzer Sagehens.

Nick's brother Cooper plays on
Pomona-Pitzer's baseball team.

And in a recent game we were watching,
he was playing up in

Portland, Oregon, and they were playing
Lewis and Clark, and in game three of

their series, they were tied at one each.
The Sagehens scored six runs in the
sixth inning, and they went up 11 runs.

They were up 11 to zero.

And in the bottom of the inning,
Lewis and Clark didn't score.

So after six innings, and it's a
nine inning game, so you're two

thirds of the way through the
game, the Sagehens were up 11 runs

now.

The game kept going, so the
game did not stop at that point.

In the seventh inning, they scored
one more run to go up, 12 to zero.

Lewis and Clark, uh, the Sea Otters,
I think are their, their mascot, uh,

scored zero and the game was 12 to zero.

And by rule, the game ends at that point.

It's futile.

And there's a, you know, there's
a classic 10 run rule and there's

a 10 run rule after seven innings
that if one team is ahead by 10

runs or more, the game is over.

So the game ended in seven innings
at that point, and we stopped

now.

We have futility rules.

In the masters, we stop players
who are not in the top 50.

In youth sports, we do this quite a bit.

We have a 10 run rule in baseball.

It's a little bit different in
sports with a clock, baseball has

no clock, it plays nine innings,
and that's kind of the clock.

So games can go long in that setting.

They did, they, you know, they do have
travel and they're dealing with that.

Uh.

In games like American football and
in hockey and basketball, they do

something called a running clock.

So if a team goes up by five
goals in a hockey game, they don't

stop the clock between whistles.

They let it run, they,
they shorten the game.

So thi this happens, um, Nick used
to play select baseball and they did

something kind of interesting where
after three innings, if you were up by

15, they might stop it; after four
innings, it was 10 or more;

and after five innings, it was eight.

So they had a graduated rule within
those games, and the game is over.

So why do we do this in sports, Nick?

Why do we have these rules
that stop the competition?

Nick Berry: Oh, I think the
reasons behind them are confusing.

There's a lot of, uh, we'll call
'em stakeholders to this, but I

think the implication of all the
rules is that there's, I'll say no

point in continuing to play, because

the, you know, result is, is
determined at that point.

And, uh, I think in reality it's
probably, everyone understands

it's just a sufficiently small
probability of something other

than the obvious thing happening.

Um, we're, we're willing to say,

for the benefit of all involved,
let's truncate the game at this place.

We know where this is going.

I think in like youth sports.

People got places to be, you
know, parents gotta get home.

There's another game right after this.

Uh, the tournament's running late.

We only have the
fields until nine o'clock.

Um, a lot of different reasons to go to
it. For professional sports, or, you know,

I'll call, uh, what Cooper does, something
where, you know, the school's paying

for this, and, and there's more money.

I mean, um, they have a stake in this.

They're, they're wasting resources.

Uh, if you just,

you're pitching guys late in the
game that you don't want to throw,

or silly things happen like, oh, we
lost, we have to play two more innings,

I'll put an outfielder in

Scott: an outfielder, yeah.

Nick Berry: 'cause I don't
wanna waste a pitcher.

And, and you know, things start
to get wacky in that regard.

And so maybe we should just
call the game at this point.

But I think it all boils down to
everyone knows what's gonna happen.

What's the point in continuing to play
sort of is, is the idea behind all of 'em.

Scott: There's also a concern in youth
sports that if one team has such a lead,

it becomes almost unsportsmanlike for
that team to be trying hard to win.

Are they gonna win by
absurd amounts, for example?

Uh, potential injuries.

There's that aspect.

We don't really have futility rules in pro sports.

Now, we, we do have some, and
I'll come to those, but largely in an

American baseball game, if one team is
ahead by 25 runs after eight innings

or seven innings, they keep going.

Uh, uh, in, in those sports,
they don't stop them.

Now people have paid
to go watch that game.

Nick Berry: Yeah.

Scott: they're selling beer, uh,
though they don't sell beer after

the seventh inning, but, but
they're selling beer at an NFL game.

At a, at an NBA game, a team
could be ahead by 40 points in

an NBA game, and they keep going.

In those games.

So pro sports are a little bit different.

They do worry about injuries, but
there are no futility rules in pro

sports, which is somewhat interesting.

Nick Berry: Well, you mentioned the fans.

The fans do have futility rules.

If you're at a, a baseball game
and a team's up 20 runs, the

stands empty.

To them, I think it's more important to, uh, be in
the car before traffic hits than it is to

watch your team get smothered by 25 runs.

I'm, I'm definitely a
stick it out to the end.

I paid to be there.

I went through the, the
effort of getting there.

I'm gonna watch the ninth inning of this
game, but, uh, but the stadium does empty

in those cases, so fans have their own

futility rule, or sorry,
utility rule, that they've created

in their mind for when it's over.

Scott: So can we make a
mistake with a futility rule?

Could we have stopped that
game that Cooper's team played

up in Oregon? They were ahead
12 to zero after seven innings.

They could have lost that
game if they kept playing.

Uh, the rules are a nine inning game, and
the sea otters could have scored 13 runs

in the ninth inning and won that game.

It's possible in those circumstances, just
like it's possible that we could have cut

Scotty Scheffler from that golf tournament
and said, you don't have a chance to win.

You no longer get to play.

And he could have won that tournament.

By the end, he didn't.

But in another setting, we have stopped
competitions that would've flipped and

the other team would've won.

Nick Berry: Yeah, for sure.

I think there's, there's a, a heuristic.

This is from Bill James, the famous analytics
guy, uh, the father of analytics.

I think people give him credit for that.

He has this, um, I, I
called it a heuristic.

I think it's a good,
good word for it, for college

Scott: College.

Nick Berry: but for when he thinks the
game is over, um, it's, it's a simple

calculation: like, you take
how much you're winning by, minus three,

add a little bit if you have
the ball, and then I don't even

know what you do from there.

Something like,

Scott: Something like that.

Nick Berry: a weird little algebra scheme
you do, and it says this game is over,

or this game is not over.

um, I bring this up partially
'cause I wanted to talk about

this example, but when I was in
college, Texas A&M was playing

in March in the NCAA tournament.

We just finished
March Madness as well.

Uh, and.

Texas A&M was playing Northern
Iowa in, I think, the round of 32.

They were losing; they were futile,
according to Bill James' algorithm.

Um, we were down 12 points with 42
seconds to go or something like that,

and doing the calculation, that's,
that's impossible to come back from.

And I have two good friends
that were at the game.

They left.

They left, you know,
it's obvious it's over.

Gotta beat traffic.

They're in the concourse walking away.

And, um, my alma mater came back
and won. Alex Caruso, of now two-time

NBA championship fame, was on that team.

And it was, you know, the comeback
of a lifetime for me in school.

And so that

happened; you know, this was futile.

This would've stopped for

Scott: Yep.

Nick Berry: according to a, a
popular rule, and it switched.

Weird things happened.

The game reversed and Texas A&M won.
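For reference, the safe-lead calculation Nick is gesturing at is usually stated roughly as follows. This is the commonly quoted version of James' heuristic, reconstructed from memory rather than from James directly, so treat the details as approximate.

```python
def safe_lead(lead, seconds_left, leader_has_ball):
    """Bill James' 'safe lead' heuristic for college basketball, as
    commonly stated: take the lead, subtract three, add half a point if
    the leading team has the ball (subtract half if not), floor at zero,
    then square; the lead is 'safe' if that exceeds the seconds remaining."""
    points = max(lead - 3 + (0.5 if leader_has_ball else -0.5), 0.0)
    return points * points > seconds_left

# Down 12 with roughly 42 seconds left, as in the Texas A&M game:
# (12 - 3 - 0.5)^2 = 72.25 > 42, so the rule calls the game over.
print(safe_lead(12, 42, leader_has_ball=False))
```

By this rule the game was "over", and yet it flipped, which is exactly Nick's point about hard futility heuristics.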

Scott: So there, there are some cases
where you can't come back, but in

all those situations it's possible.

Uh, the probability may be very, very
small and we'll get to that question

of the probability of winning to that.

There are other cases that we do have
futility rules in, in pro sports, when

you're playing a seven game series.

And one of the teams goes up four to one.

They've won four of the first five.

They stop playing.

They don't play the
sixth and seventh games.

And in that situation, it's
impossible for the other team to win.
So they stop playing.

So they don't say, well,
people have bought tickets.

We're gonna sell food, we're
gonna play game six and seven.

They, they do stop playing and they go on.

So there are.

Some binding, uh, binding's the
wrong word, but there are cases

where it's impossible to come
back and they stop playing.

Another example of that is golf,
where, when you do match play,
like in the Ryder Cup, you play who

wins the most holes over 18 holes.

If you're up four with three holes
to play, the golfers stop playing.

They don't play the last three holes.

The match is over.

Nick Berry: You're

Scott: you're
mathematically eliminated.

Yep,

Nick Berry: it's an obvious
way to do futility, right?

Like you don't need

Scott: yep,

Nick Berry: a, a model or any complex
reasoning to understand why that happens

or, or to determine if it's a good rule.

Right.

Scott: yep.

So let's come to questions
of how we might do this. Suppose

a professional league came to
you and said, we want to institute

stopping rules, that when the game,
you know, reaches a particular point,

like Bill James had a rule,

we, we should stop it.

For safety, for sportsmanship.

How would you do it?

Nick Berry: Uh, um,

carefully I would start
with a bunch of data.

I think, I think step one is get
every single game that's ever been

played at that professional level
and specifically with data about.

Probably just the score over time.

I think one thing you can't do in
this is put, and we'll, we'll come

to this later, but you can't put any
preconceived notion about the quality

of the teams into this calculation.

Right?

I think it has to be a hard rule based on
some, just like, you know, uh, observable

score difference or something like that.

So yeah, you start by

Scott: Hey, start by.

Nick Berry: all of these score differences
over time, overlay 'em on a plot.

And then I would probably choose a
reasonable time in the game, maybe

three quarters of the way through.

Uh, I don't, you don't wanna stop
anything, you know, while there's still an

immense amount of variability in the game.

So I may start two thirds or three
quarters of the way through, or some

number and, and I draw a cut there on
that graph that I was making and say,

okay, what point on the Y axis, what
score differential can I choose that, um,

first, no one has ever come back
from? I'd probably start there

and say, okay, if you're up 37
points with 10 minutes to play in a

basketball game, the game is over.

I think is a useful starting place.

It's probably not the rule I would end
up with, but I would start there and

then maybe I say, okay, what about 1%?

Um, I say, okay, what, what
point on this line, do you have

a 1% chance of coming back from?

And that's probably too big of a number.

Um, in baseball, every team plays
162 games, so multiply that by,

you know, what, 15 to get the total
number of games played, uh, a year.

That might not be the right math, but, you
know, 1% of those games is still a lot.

You don't want to, you don't
want to have that many flip flops

of, of things that you stop.

So I probably choose a
number, like, uh, 0.1%

or something like that, or 0.05%

of games, and draw a line there.

You could do this with a curve as well.

Maybe I do two thirds, three quarters
of the way through, five sixths of the

way through, and seven eighths of
the way through, or something like

that, and draw that same point at
those three lines, and you could stop

for futility at any of those points.
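The procedure Nick sketches, pool historical games, fix a checkpoint, and find the smallest lead whose comeback rate falls below a tolerance, might look like this. `checkpoint_games` is a hypothetical stand-in for real league data: one (margin, leader_won) pair per game, recorded at the chosen point in the game.

```python
def futility_margin(checkpoint_games, max_comeback_rate=0.001):
    """Smallest margin M such that, among historical games where the lead
    at the checkpoint was at least M, the trailing team came back to win
    no more often than max_comeback_rate (e.g. Nick's 0.1%)."""
    margins = sorted({margin for margin, _ in checkpoint_games})
    for m in margins:
        # Restrict to games at least as lopsided as margin m.
        subset = [leader_won for margin, leader_won in checkpoint_games
                  if margin >= m]
        comebacks = sum(1 for leader_won in subset if not leader_won)
        if comebacks / len(subset) <= max_comeback_rate:
            return m
    return None  # no margin is safe enough at this checkpoint
```

Repeating this at several checkpoints, two thirds, three quarters, seven eighths of the way through, gives the graduated boundary Nick describes.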

Scott: So you're getting at something,
you're, you're interested in this, at

looking at different points in the, in
the competition where, what is the chance

the team that's trailing can win the game?

And you mentioned when
that's below 1% or 0.1%,

that the team that's trailing
could come back and win, this

is a candidate game for stopping,
that we've reached that point.

Getting to this point of
predictive probabilities.

So you're building this model and
you talked about getting all of

this historical data we have on, on
competitions, whatever sport it is

that you would build: what is the
probability this team can win, uh,

from this state in the game?

We have something like that in most of
our sports. For example, gambling

is a huge part of sports, uh, within this.

So you can get this our, our favorite
baseball team, Nick and I are

fans of the Minnesota Twins.

They played a game yesterday.

Uh, it was Wednesday, April 15th,
and Minnesota was trailing the

Red Sox throughout the game.

They got behind.

They actually were ahead one to nothing,
but they got behind four to, uh,

sorry, nine to one by the sixth inning.

ESPN provides win probabilities, and
you can graph this across the game.

So they're doing this, so they're
providing win probabilities, and it was

not long by about the seventh inning in a
nine to one game that the probability that

the twins were gonna win dropped below 1%.

Now interestingly, they scored four
runs in the bottom of the ninth and

they lost nine to five, but that
probability never wavered from 99%.

That game could have been a candidate.

Maybe it doesn't reach the threshold.

These are really natural things
within games to, to look at.

And then you balance why are we stopping?

What, what are the purposes?

Uh, to, to understand the risk because
you could stop a game that the team

would've come back and won and.

That's, that's a mistake to some extent.

You've made a mistake by stopping it.

You could also not stop a game
that the trailing team lost.

Now, that's a different kind of mistake,
where maybe the resources used in the

remainder of that game were for naught.

Because the game didn't change in that
setting, we might consider that, if we're

trying to look at a rule for its ability
to stop games that are gonna be lost anyway,

we, we, we would evaluate that those,
those criteria, those errors that we might

make with a, with a rule to stop sporting
competitions, which of course have very

different goals than clinical trials,
which of course we're gonna get to.

I also have a friend.

Yeah, go ahead.

Nick Berry: that, that quantity that you
keep talking about, the, well, you've,

you've referenced both probability of
winning, but also predictive probability.

I think it's interesting.

The first podcast we did about
regression to the mean, I think we

kept telling people, at least non
statisticians, that that's a hard thing

to, to get your mind around, right?

You see data and you're like, okay, that's
what I'm gonna believe going forward.

So we were trying to tell people.

You need to regress this towards
the population average, even

though it feels weird to do.

I think the predictive
probability is the exact opposite.

Everyone that is making decisions
about these games intuitively

understands what you're talking about.

They understand like the variability left
in the game, and they know why a four run

lead in the fourth inning is different
than a four run lead in the eighth inning.

And they don't need to describe it as
information fractions and variability

and things like that to make that point.

But I like this topic because it's

so intuitive and it's just like the
root of decision making in general

is this like predictive framework
that, that we're working under.

Um.

Scott: So, so you brought up this
idea, and this is gonna be in sports.

It seems like such a natural thing.

And by the way, that's the goal of these
episodes that you, you take something

in sports that seems so natural and
understand and then you flip it to

clinical trials, which, you know,
that's sort of how I think, and it's not

uncommon in a clinical trial scenario
that I explain something using a sporting

competition. But trailing by five

early in the game means something
very different than trailing

by five late in the game.

Because you have less time to have a
large differential to reverse that deficit

that you have, uh, in, in this scenario.

It's such a natural thing in sports.

Where in clinical trials do we build
futility rules off of the observed effect?

Do you know?

How do we do that?

The win probability
in sports incorporates that.

So in that game where the Red Sox were
playing the twins, the Red Sox were

up by five runs after five innings.

The win probability for them
under that scenario, uh, after

the fifth inning is about 90%.

When the game goes to the sixth inning and
the score doesn't change, it's still five.

The, the probability the Red
Sox win goes up because there's

less time left to flip that.

So the same observed effect
at different times has quite

different predictive probabilities.

It's a very natural thing in
sports to understand that part

in the win probability.
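The same-lead, different-time point can be illustrated with a toy simulation. The Poisson scoring model and run rate here are simplifying assumptions for illustration, not how ESPN's win probability model actually works.

```python
import math
import random

def poisson(rng, lam):
    """Sample a Poisson(lam) count via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def comeback_prob(deficit, innings_left, runs_per_inning=0.5,
                  sims=20000, seed=7):
    """Monte Carlo chance the trailing team ends up ahead, assuming both
    teams score Poisson runs each remaining inning; ties count as no
    comeback. runs_per_inning is an assumed rate, not a fitted one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(sims):
        margin = -deficit
        for _ in range(innings_left):
            margin += poisson(rng, runs_per_inning) - poisson(rng, runs_per_inning)
        if margin > 0:
            wins += 1
    return wins / sims

# Same five-run deficit, but with four innings left versus one:
# less time remaining means a smaller chance of flipping the game.
print(comeback_prob(5, 4), comeback_prob(5, 1))
```

The deficit is identical in both calls; only the time left differs, and the comeback probability shrinks accordingly.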

Okay.

So let's flip this to clinical trials, and
we've already set this up a little bit.

So what does this have to
do with clinical trials?

Many clinical trials.

Are set up much like a
sporting competition,

especially phase three trials.

Earlier-stage trials can have, uh, goals
that are largely learning and estimation.

Those may be a little bit harder.

You can, you should have
futility rules in those settings.

But let's think about phase three trials
because they're the largest, the most

expensive, but they have very well-defined
rules of quote unquote winning that trial.

They're targeted on demonstrating
one treatment is better

than another treatment.

In a sporting, uh, contest,
it's almost like a player has to

play, you know, the best player
in the world to show they're better.

They have to, to to show that
they, they can win that game.

We have very well-defined rules
in sports about who wins the game.

Now, in clinical trials, we also
have really well-defined rules that,

one, we're trying to see that this
treatment is statistically significantly

greater than another treatment.

So again, thinking about this question,
like before we talk about how to do

this, I asked you the question about why
we might stop a sporting competition.

Why might we stop a clinical
trial before the end for futility?

Nick Berry: Again, I

Scott: Yes.

Nick Berry: a lot of reasons.

Um,

There are patient,

ethical, um, considerations in the trial, and if it's
looking like the treatment is, has

no chance of being, being successful
in a clinical trial, you know, is

it ethical to be randomizing these
patients, to be experimenting on them?

Um, there's, for a sponsor, there's
certainly monetary constraints.

Some of these drugs are extremely
expensive to administer.

It's extremely expensive
to follow subjects up.

Um, often patients can roll over into some
open label extension where a drug that

might be, you know, better for these
subjects can be administered to them

after a trial stops for futility.

Um, uh, I think there's, there's
a ton of different reasons.

Uh, safety and cost are probably
two of the most prevalent.

Scott: Yep.

So there, there are strong
reasons that if you're running a

trial and it's very, very unlikely
the trial's gonna be successful.

There are benefits of stopping that trial.

Uh, and the ethical parts
of it are, are huge.

You're asking a patient to contribute
to science, and if you look at the data,

they're no longer contributing to science.

The answer's largely been,

uh, the, the question's been
answered at that point.

Now, you can put a futility rule in that
trial. And, by the way, there are some

strange trials where mathematically
the trial cannot be successful.

But tho those are very, very rare where
it's integer valued outcomes and no

number of events can flip the outcomes.

Those, those are different trials.

In most of these trials, you're
now addressing similar types of

questions, and so we are,

as clinical trial designers,
asked to create futility rules.

So the NHL is not asking
you this, or Major League Baseball,

but how do we create futility
rules for a clinical trial?

Nick Berry: Yeah.

Um, let's see.

So I think it's largely the same
as what I described for like the

NHL rule with probably one major
difference is that I wouldn't

start by collecting a ton of data.

I don't think I'm gonna start with
all the clinical trials that have ever

been run and looking at their curves.

I know the data generating process, or at

least have assumptions about the data generating
process and what I think is likely, so

I would probably start by, you could think
of it as, simulating 1 million different

paths through this, and maybe I think
of this at the beginning as I simulate

every patient, or, or I look really
often, I take benchmarks throughout

this trial, maybe a hundred benchmarks,
and at each of those points I can.

I fit a t-test, maybe, you know, I
fit some simple test for now, uh, to say,

how much better is the active arm doing?

How much worse is the active arm doing?

And then I go through the same exact process.

Now I have those spaghetti plots
exactly like I had for sports.

You know, I had the score
differential over time.

I have treatment effect over time.

Um, I would probably, again, maybe I don't
wanna start until halfway through the trial.

I would say, okay, at what
point is it impossible?

Where did none of my 1 million simulations
come back, you know, where

was the treatment so bad that it
couldn't get back to being successful?

And again, like I said with the
other example, that's not necessarily
a good futility rule, right?

Something that never allows a reversal

Scott: Yeah.

Nick Berry: is a little too
conservative for my liking.

But that probability of
reversing is really important.

So I again, wanna limit it to maybe
something like 1%, or in sports I use 0.1%

'cause we're doing this over and
over and over and over and over

and I didn't want a bunch of these.

But, um, in clinical trials, I think a
predictive probability of 1% is still

fairly conservative for a futility rule.

But something in the one to 5% range
is probably where I would start,

trying to cap that probability
of reversing after stopping.

And again, I'm, I'm looking at trials
that I simulated from start to finish.

So in reality, I know, when I stop
one of those simulated trials for

futility, what would've happened.

And I think that is a really important
aspect of the simulations that I get

this counterfactual look at all of
the simulations and, and can do that.

And so again, I'm
drawing a spaghetti plot.

I'm placing benchmarks at maybe, uh,

three quarters of the way through the trial,
and setting myself futility rules at those points.

So it's the same exact process, with the
data generating mechanism being different.
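Nick's procedure can be sketched in code. This is a deliberately minimal illustration, with a single halfway benchmark instead of a hundred, a normal outcome, and made-up values for the effect size and sample size; it estimates the quantity he keeps coming back to, the probability that a trial trailing at the interim look would have reversed and won.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 200_000          # simulated trial paths
n = 200                   # patients per arm (assumed)
effect, sd = 0.2, 1.0     # true effect in SD units (assumed)

se_half = sd * np.sqrt(2 / (n / 2))   # SE of effect estimate at halfway
se_full = sd * np.sqrt(2 / n)         # SE of effect estimate at the end

# Effect estimates from the first half and the independent second half
diff_half = rng.normal(effect, se_half, n_sims)
diff_second = rng.normal(effect, se_half, n_sims)
diff_full = (diff_half + diff_second) / 2

win = diff_full / se_full > 1.96      # final success, one-sided 2.5% test
trailing = diff_half < 0              # treatment behind at the benchmark

# The counterfactual look: of trials trailing halfway, how many reverse?
p_reversal = (win & trailing).sum() / trailing.sum()
print(f"P(final win | trailing at halfway) ≈ {p_reversal:.3f}")
```

Repeating this at many benchmarks, and for different cutoffs on the interim estimate, traces out exactly the spaghetti-plot exercise described above.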

Scott: So there's this same
quantity, this win probability,

that you can go to espn.com

and look at.

We build that in clinical trials, and
you can build that through statistical

modeling, whatever the outcome is.

Is it an event outcome?

Is it a change from baseline outcome?

Is it a responder outcome?

You can take that outcome and build

a mechanism for calculating
what's the probability that the

treatment is gonna demonstrate
benefit by the end of the trial.

What is its win probability?

That's something in clinical trials
called a predictive probability.

You, you may have heard of
a conditional probability.

Conditional probability is similar.

It's a little bit different: in
a conditional probability,

you assume you know the truth about
the team, or what the treatment effect

is, and then you calculate that.

A predictive probability
estimates it from the data at

that point, with its uncertainty,

in calculating the probability of
winning the trial going forward.

So there are different mechanisms.

Bayesian tends to be predictive
probability, whereas conditional power

is pretty simple, uh, taking the
observed rate that you see right now

as the truth, which usually underrepresents variability.

So I don't love the number that comes out
for conditional probability, which is why

we use Bayesian predictive probabilities.

And then we look at that for
potentially stopping the trial.

When that probability gets low, you
can look at stopping the clinical trial
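For a concrete feel of the difference, here is a small sketch for a hypothetical single-arm trial with a binary endpoint (all numbers invented): conditional power plugs in the observed response rate as if it were the truth, while the Bayesian predictive probability averages over the posterior uncertainty about that rate, giving a beta-binomial prediction for the remaining patients.

```python
from scipy import stats

# Hypothetical interim state: 12 responders in 30 patients so far;
# success = at least 45 responders out of 100 at the end of the trial.
n_seen, x_seen, n_total, x_needed = 30, 12, 100, 45
n_left = n_total - n_seen
x_left = x_needed - x_seen

# Conditional power: treat the observed rate as the known truth.
p_hat = x_seen / n_seen
cond_power = stats.binom.sf(x_left - 1, n_left, p_hat)

# Bayesian predictive probability: Beta(1, 1) prior on the rate,
# so the remaining responders follow a beta-binomial distribution.
a, b = 1 + x_seen, 1 + (n_seen - x_seen)
pred_prob = stats.betabinom.sf(x_left - 1, n_left, a, b)

print(f"conditional power      = {cond_power:.3f}")
print(f"predictive probability = {pred_prob:.3f}")
```

Because the predictive probability carries the uncertainty in the rate forward, its tails are wider, which is exactly the variability the plug-in conditional power underrepresents.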

Nick Berry: To keep our
link to sports here, you're

talking about conditional power,
or conditional probability of

success, versus predictive power,
predictive probability of success.

That calculation in sports is different
than the calculation done in a clinical

trial, because in sports, you
know, we know a lot about

the two teams going in, and we use that
information in a way that's not

common in a phase three clinical trial.

Like you were saying, so if I
was doing a predictive probability in

sports and a team's winning, uh, nine
to one, like the Twins game you were

describing, um, I am going to shrink my
posterior probability of, you know,
the quality of those teams massively

Scott: Massive

Nick Berry: toward being more of
a 50-50 game going forward,

something that conditional power doesn't do naturally.

Scott: Naturally.

Nick Berry: I'm gonna say the
probability of the Twins

winning, given that the teams are
equal, is something you could do

with a conditional power and get
that number out. In a predictive

probability sense, you would update the
probability that the teams are much closer,

because they're
professional sports teams;

they are pretty close together.

And even though you observe nine to
one, it doesn't really change what you

think about the team as a whole.

It's just that the game is random, so
your predictive probability in that

sense, it's actually probably pretty
close to a conditional probability

of success at that same number.

And so the, the huge amount of
prior information, I think kind

of changes the decision making
of that predictive probability.

Scott: But that can be incorporated
in clinical trials, where you might

even want to build a stopping rule that
says, we wanna stop when an optimist about

Nick Berry: Yeah.

Scott: the drug thinks the probability
of winning is below 1% or 5%, kind of

thing. And you use a Bayesian prior
that gives a reasonable probability, an

optimist's, uh, estimate of the
drug's effect, and even when the

optimist thinks the trial is
unlikely to be successful, you stop.

By the way, the Bayesian draft
guidance alludes to this, that you
can use this design prior, uh, sort of thing.

You can do this for the futility.

So it's something that can be
incorporated in clinical trials for sure.

Nick Berry: Yep.

That makes total sense.

Scott: Now, some of the.

Nick Berry: idea.

Scott: Uh, yeah, Jay Kadane's idea
actually is a really nice idea, that

you take multiple, uh, clinicians
with varying views across the spectrum.

And when all of those clinicians
are convinced of it, then you stop.

So you might need to go longer
to demonstrate that you think the

drug is effective to a pessimist

Nick Berry: Yeah.

Scott: And likewise to stop when
you think it's not effective as well.

It, it's a really neat idea actually.

Um.

Now, the characteristics of this:

we spend a great deal of time simulating
different rules, and we calculate those

error rates that we talked about before.

So you can make a mistake
with a futility rule.

You could have a futility rule that stops
a trial that would've gone on and flipped

and reversed. You say there's very
little chance for the treatment to win,

and we calculate that's 1%.

For example, we've done trials
where the predictive probability of

success of that trial is 2% and it stops
for futility under a 5% futility rule.

The trial stops. By interpretation,

there's a 2% chance that trial
could have come back and been

successful under that scenario.

It's just deemed that,

taking into account the pluses
and minuses of this, it's

a good thing to stop that trial.

Now that decision is made ahead of
time, uh, in those circumstances,

and we simulate multiple rules.

We look at error rates.

We also look at, you know, what
happens to power when you have a

futility rule. For a treatment, you

may decrease its chance of success
because some of those trials, those

spaghetti plots that Nick talked about,
hit the futility rule when they

would've reversed and gone on to win.

You reduce power by that if you have
a very, very aggressive futility rule.

An example of this would be
halfway through the trial.

If the predictive probability of
the treatment winning is less than

50%, we're gonna stop the trial.

That would be a strikingly
aggressive futility rule.

Essentially, it means that the
treatment is observing something very

close to what it needs to win by the
end, which gives it kind of 50 50.

If it stays there, it wins.

If not, it loses and you stop the
trial there, you're gonna reduce

power by very large amounts.

Uh, Nick and I are looking
at an example where that

rule can drop power by
25% at your effect size,

because you're being so aggressive
at stopping it that you don't give

the treatment the chance
to go on and be successful.
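As an illustration of that power loss, here is a sketch under assumed values (a true effect of 0.25 standard deviations, 200 patients per arm, a single halfway look, and a flat prior behind the predictive probability); none of the numbers reflect any actual trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n, effect = 200_000, 200, 0.25   # assumed, illustrative values
se_half = np.sqrt(2 / (n / 2))           # interim SE (outcome SD = 1)
se_full = np.sqrt(2 / n)                 # final SE

d1 = rng.normal(effect, se_half, n_sims)   # halfway effect estimate
d2 = rng.normal(effect, se_half, n_sims)   # independent second half
win = (d1 + d2) / 2 > 1.96 * se_full       # final success

# Predictive probability of success at the halfway look, flat prior:
# final estimate | d1 ~ Normal(d1, se_half^2 / 2)
pred = stats.norm.sf((1.96 * se_full - d1) / (se_half / np.sqrt(2)))

# Power when the trial stops if the predictive probability dips
# below each cutoff at the single interim look
powers = {c: (win & (pred >= c)).mean() for c in (0.05, 0.20, 0.50)}
for cutoff, power in powers.items():
    print(f"stop if predictive probability < {cutoff:.0%}: power {power:.3f}")
print(f"no futility rule: power {win.mean():.3f}")
```

The more aggressive the cutoff, the more of the would-have-reversed paths get cut off, and the power falls accordingly.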


A 20% rule, by the way, and I'm gonna say
20% because there's an interesting

futility story to this, in the
example we're looking at can have

somewhere around a five
to 10% reduction in power.

Now, 20%: if you think about a sports
competition where the team only

has a 20% chance of winning,

that happens a lot.

Uh, those are exciting games by
the way, where a team scores two

touchdowns very close to the end to win.

Uh, a golfer comes back from five
shots back on the last nine holes,

that's not even 20%. Coming back
by three shots is sort of 20%.

That's a common occurrence.

Well, that rule is used
for futility in trials.

And there's a famous example when that
rule was used, and it's a famous example

where people think futility might not
be a good thing in clinical trials.

And we'll kind of get to that.

It's the Biogen example of aducanumab.
It was an Alzheimer's treatment, and

this was, you know, 5, 6, 7 years ago,
uh, an Alzheimer's treatment,

a disease-modifying therapy.

They were running two phase 3 trials,
and they did a predefined futility

analysis where they looked at both trials,

and they stopped both trials for futility.

Interestingly, they stopped
those trials, but they got

follow-up data, and one of the trials
turned out to be statistically

significant in Alzheimer's.

That's a huge deal.

The other trial was not, and there was
a lot of discussion about whether this

was a bad futility rule, these types of things.

Their rule for futility was that
if either of the two trials had

less than a 20% chance of success,

they would stop both.

And if you think about that in the
sports, uh, context, that's a

really, really aggressive futility rule.

Now, it was a bit more complicated
because for one of the trials, the

conditional probability of the one
that won was 60% when they stopped.

So it was a controversial application
of futility rules in this scenario.

I mean, to some extent it was right.

The other trial was not successful.

But in hindsight, I think Biogen
would've done it differently

and they were not necessarily
pleased with the futility rule.

In, in that scenario.

Now, I did a debate with Paul Aisen from,
uh, USC on whether we should be doing

futility rules in Alzheimer's trials.

Now you can go get that and you can find
that debate and I did win the debate.

Uh, 80% to 20% people agreed
that we should be doing futility

rules in Alzheimer's trials.

Uh, I feel like,

if I would've lost that, I
feel like it's such a yes, we

should absolutely be doing that.

We should be doing futility rules in
every phase three trial. Uh, if

I'd have lost that, that would've been
like Rory losing after leading by six

after 36 holes, uh, sort of scenario.

Nick Berry: The
aducanumab example is

interesting, because earlier while we
were talking, I was thinking about

how the value you choose for your
futility rule almost creates this sort

of implied utility on what you're
valuing in your clinical trial.

Um, you know, if you have an aggressive
futility rule, you're probably saying

it's really expensive to continue.

I only want to spend this extra
money if it's likely that we can win.

So the implied utility of that
futility rule is something like,

we need two successful phase
three trials or bust; that's sort

of the implication of that.

And that was probably what was
going through their mind while

they're constructing the rule.

Like, we need, in order to get
this approved, we need two phase

three trials to meet this threshold
and the FDA will approve us.

And you know, clearly that's
not what happened at the end

of the day, but uh, you can see
how they got there a little bit.

And I think the utility that
comes out of your

futility rule is interesting.

Um, and you know, you say a lot by not
actually saying anything on purpose.

Scott: Yeah, no.

We spend a great deal of time building
these predictive probabilities in that

scenario where you have two phase three
trials running at exactly the same time.

The idea that you would do conditional
power only in that trial, and you

ignore the data from the other trial?

Nick Berry: Yeah.

Scott: The fact that the other trial
had a 60% chance of winning meant

that it had a good effect size.

If you would've done regression to
the mean and shrunk over the two

trials, you probably would've got
a much higher probability of the

second trial being successful, and
then maybe it would've eventually

been successful had they run it out to the end.
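That shrinkage across two concurrent trials can be sketched with a simple normal-normal empirical-Bayes calculation; the estimates and standard errors below are invented for illustration, not the aducanumab data.

```python
import numpy as np

est = np.array([0.22, 0.05])   # hypothetical effect estimates, trials 1 and 2
se = np.array([0.10, 0.10])    # hypothetical standard errors

# Precision-weighted common mean across the two trials
grand = np.average(est, weights=1 / se**2)

# Method-of-moments between-trial variance (floored at zero)
tau2 = max(0.0, np.var(est, ddof=1) - np.mean(se**2))

# Shrink each trial's estimate toward the common mean
weight = tau2 / (tau2 + se**2)           # weight on the trial's own data
shrunk = weight * est + (1 - weight) * grand

print("raw estimates:     ", est)
print("shrunken estimates:", shrunk)
```

The weaker trial gets pulled up toward the common mean by the stronger one, which is the direction Scott describes: a predictive probability built on the shrunken estimate would have been higher for the second trial.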

So a huge part of what we do as
statistical consultants is build good

predictive probabilities so that we're
making good decisions in almost every

clinical trial, like Alzheimer's,
like stroke, uh, oncology,

you have information on patients
that's short of the primary endpoint.

They're not all as simple as success
and failure and you know everything

about everything in the trial.

You have incomplete information on this.

So we're building predictive
probabilities that are using

maybe dose response modeling.

They're using longitudinal
modeling of early clinical

outcomes to later clinical outcomes.

Based on their predictability, we
spent a ton of time getting those

predictive probabilities to be
really good so that they make really

good decisions in clinical trials.

And I suspect that was not optimally
done in the Biogen example.

Okay.

Um, I do wanna make reference to
a nice paper by our colleagues, Roger

Lewis and Barbara Berger, in JAMA,
on futility in clinical trials.

There's a JAMA series for, uh,
methodology in clinical trials,

and they write about futility.

You can read more about those.

But unfortunately, Nick, I think we've
hit futility for this episode.

Nick Berry: Yep.

Scott: Uh, and that's more time
and resources based here, uh,

rather than the fact that we're losing.

Nick Berry: Yeah, right.

Scott: Yes.

We, we are not losing.

Yes, yes, yes.

Uh, so we appreciate you all joining
Nick and me in this deeper dive into

the intersection of sports statistics
and the science of clinical trials.

And until the next time,
we'll be here in the interim.