A podcast on statistical science and clinical trials.
Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.
Judith: Welcome to Berry's In the
Interim podcast, where we explore the
cutting edge of innovative clinical
trial design for the pharmaceutical and
medical industries, and so much more.
Let's dive in.
Scott: Alright, welcome everybody
back to In the Interim. I'm your
host, Scott Berry, and I'm joined
by Nick Berry today, Dr.
Nick Berry.
And we are going to do
our second episode:
lessons for drug developers
from the world of sports.
You can go back to the first time
we did this, episode 14 of In the
Interim, where we had a really nice
episode on regression to the mean.
Today we're gonna talk about the 10
run rule and futility, so maybe bear
with us a little bit as we get
to that. But I do wanna revisit
regression to the mean and
a prediction that Nick made.
We did this episode in May of last
year, and Aaron Judge was hitting .400.
And for those of you who are not
baseball fans — last week somebody
from Iceland sent me an email about
the episode on the Panther trial,
so we have people globally
listening to this — a .400 success
rate in baseball is a very, very high bar.
Nobody's done it since 1941.
And it's one of these
mythical, uh, targets, that
somebody could be a .400 hitter.
Well, Aaron Judge was
batting .400.
It's just his hits divided by at bats,
and in May, that's early in the season.
His rate was .400.
We were talking about
regression to the mean.
So Nick predicted that by end of
year, he would be hitting .330.
Interestingly, Jim Albert — I did an
episode with Jim Albert about Bayesian
statistics and sports statistics,
his career in those — he
guessed .320, and Aaron Judge's
final batting average was .331.
So Nick, you were off, um, in your guess.
Nick Berry: 0.1%
off.
Scott: Yeah.
Nick Berry: But yeah, I think that
was basically a weighted average of
what he'd done with some
informed guess of his true batting
average to get me to .330.
Yep.
Scott: Yeah.
And by the way, it was
an incredible season.
.331 with the home runs he
had was a phenomenal season.
Nick Berry: thing now.
I think he's hitting like 2 25 this
year, uh, almost a month into the season.
So the exact opposite.
We can, we can him to the mean
in the other way this year.
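Nick's "weighted average" guess can be sketched as a simple shrinkage estimate, blending the observed rate with a prior guess at the true average. This is an illustrative sketch, not the calculation Nick actually ran; the prior mean, prior weight, and the hit/at-bat counts below are all assumed numbers.

```python
# Shrinkage ("regression to the mean") estimate of a true batting average:
# blend the observed rate with a prior rate, weighted by effective sample
# sizes. The prior mean (0.270) and prior weight (300 at bats) are
# illustrative assumptions, not Nick's actual numbers.

def shrunk_average(hits, at_bats, prior_mean=0.270, prior_at_bats=300):
    """Posterior-mean-style blend of the observed rate and a prior rate."""
    return (hits + prior_mean * prior_at_bats) / (at_bats + prior_at_bats)

# A .400 start (say, 60 hits in 150 at bats) shrinks well back toward the
# prior, landing in the low .300s rather than .400.
print(round(shrunk_average(60, 150), 3))  # -> 0.313
```

The more at bats observed, the less the estimate shrinks, which is why an end-of-season .400 would be far more believable than a May .400.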
Scott: Ah, that actually
would've made for a good thing.
Maybe we should figure out what he is,
and you have to post another
prediction. But let me talk about
a different prediction you made.
We are recording this
episode coming off of the
Masters golf tournament.
So that was last weekend.
And, uh, we're a big golf family.
We enjoy golf.
Even Nick's sister Lindsay,
who doesn't play golf at
all, she loves to watch golf.
So we were watching the Masters, and
halfway through the Masters — 36 holes, two
rounds — Rory McIlroy had a six shot lead.
There were multiple golfers in second
place, six shots back, but a six shot
lead halfway through the tournament
is quite a large lead.
I believe it was the largest lead
ever in the Masters at that point.
And of course, our family text
goes to predictions, goes to
statistics, and the question,
from Nick's sister Lindsay, was:
what's the chance that Rory McIlroy wins
the Masters? And multiple of us guessed.
But Nick, you made a prediction
of the chance that he wins the
Masters. What was your prediction?
Nick Berry: I think I said 70%.
Basically my math was: I think
he probably has the best expected
value of the whole field, just based
on the fact that he won last year and
he's playing really well this year.
So I said he's probably the best
player in the field this week.
He had a six shot lead, and his
distribution was good enough that,
even though there were a lot of
people with the opportunity to catch
him, I gave a 70% chance of him winning.
Scott: And my prediction was 60%.
And interestingly, Nick's
mom's prediction was 20%.
It was quite different.
Nick Berry: She knows golf and she knows
how variable it is, and that was
the reason for her prediction.
It was like, weird stuff happens.
You know, it's one guy who
has to hold onto the lead, and so
she said 20% because of all the
variability in what could happen.
Scott: So at that point — to
sort of come to this —
interestingly, Scottie Scheffler
after 36 holes was 12 shots back.
And he's the number one
ranked golfer in the world.
Rory, I think, was the number two
ranked golfer in the world.
And Scottie Scheffler was
12 shots back: essentially
very little chance he could
win, less than a 1% chance
he would win the golf tournament.
How much less, I don't
know, but less than 1%.
Interestingly, if you were 17 shots back —
meaning after 36 holes
you were five over par;
Rory was 12 under par —
so if you were five over par
or more, you had a 0% chance of
winning, because you were quote
unquote cut from the tournament.
Those players stopped playing, and
they reduce the field. The
rule of the cut is that the top 50 players
— and I think it started with 91 players —
the top 50 players and ties continue on
into the weekend, the last two rounds.
But they cut, and sure, I'll say
the word futility: those players
no longer have an opportunity
to win the golf tournament.
If you were 17 shots back — that happened
to be the tournament where Scottie Scheffler
was 12 back, but he got to keep playing.
Now, what happened in the
golf tournament, Nick?
Nick Berry: Uh, so round three, Rory
struggled — I think he shot one over par.
Um, so he went to 11 under, and
the rest of the field played great.
You know, it seemed like
almost everyone that was
six or seven shots behind him
shot five under, six under, and he
went into Sunday tied for the lead.
He blew his lead, right?
Scott: Right.
Nick Berry: After day three.
Um, Scottie Scheffler shot seven
under par and moved into contention —
the number one player,
that was 12 shots back. On Sunday the
whole field kind of struggled.
Nobody shot seven under,
nobody had a huge round.
Rory eked out a win, um, by one stroke.
So he did technically retain his
lead, even though he blew it.
Scott: Yes.
Nick Berry: Um, but the most
important part is that I was the most
right in our family group chat, because I
said 70%, which was the highest number.
And it doesn't matter how he got there —
there's no pictures on the scorecard.
Uh, I picked the highest
probability, Rory went on to win,
so I claimed victory
in the family group chat.
Scott: An interesting part of it was
Scottie Scheffler ended up finishing
second place, one shot behind.
And yes, he missed a birdie
putt on 17 that eventually could
have led him to a tie.
So he finished one shot back.
He had a very, very small chance of
winning the golf tournament. In a
different setting, we could have
said: you're done playing,
you have no chance to win,
you're cut, you can't play.
But in this setting
he came back and had a
legitimate chance
of winning the tournament.
And so I won't address
your 70% being the right answer when
we know more about the golf tournament.
But you're technically right:
if you evaluate the likelihood
of what happened, you had the
highest likelihood in that.
But coming to this question about, in
a sports competition, do we stop the
competition, and what does it mean?
And you won't be
surprised to see that
we'll turn this into thinking
about stopping clinical trials and
what the similarities are.
So Nick's brother Cooper — and if
you're on video here, you can see
Nick is wearing the team shirt
for the Pomona-Pitzer Sagehens —
Nick's brother Cooper plays on
Pomona-Pitzer's baseball team.
And in a recent game we were watching,
he was playing up in
Portland, Oregon, and they were playing
Lewis & Clark. In game three of
their series — they were tied at one
game each — the Sagehens scored six runs
in the sixth inning, and they went up
11 runs. They were up 11 to zero.
And in the bottom of the inning,
Lewis & Clark didn't score.
So after six innings — and it's a
nine inning game, so you're two
thirds of the way through the
game — the Sagehens were up 11 runs.
Now, the game kept going; it
did not stop at that point.
In the seventh inning, they scored
one more run to go up 12 to zero.
Lewis & Clark — the Sea Otters,
I think, are their mascot —
scored zero, and the game was 12 to zero.
And by rule, the game ends at that point.
It's futile.
There's a classic 10 run rule:
after seven innings,
if one team is ahead by 10
runs or more, the game is over.
So the game ended in seven innings
at that point, and we stopped.
Now, we have futility rules.
In the Masters, we stop players
who are not in the top 50.
In youth sports, we do this quite a bit:
we have a 10 run rule in baseball.
It's a little bit different in
sports with a clock; baseball has
no clock, it plays nine innings,
and that's kind of the clock.
So games can go long in that setting,
and, you know, teams do have
travel and they're dealing with that.
In games like American football and
in hockey and basketball, they do
something called a running clock.
So if a team goes up by five
goals in a hockey game, they don't
stop the clock between whistles.
They let it run; they
shorten the game.
So this happens. Um, Nick used
to play select baseball, and they did
something kind of interesting:
after three innings, if you were up by
15 runs or more, they might stop it;
after four innings it was 10;
and after five innings it was eight.
So they had a graduated rule within
those games, and the game is over.
So why do we do this in sports, Nick?
Why do we have these rules
that stop the competition?
Nick Berry: Oh, I think the
reasons behind them are confusing.
There's a lot of, uh, we'll call
'em stakeholders in this, but I
think the implication of all the
rules is that there's, I'll say, no
point in continuing to play, because
the result is
determined at that point.
And, uh, I think in reality,
everyone understands
it's just a sufficiently small
probability of something other
than the obvious thing happening.
Um, we're willing to say,
for the benefit of all involved,
let's truncate the game at this place.
We know where this is going.
I think in, like, youth sports,
people got places to be, you
know, parents gotta get home.
There's another game right after this.
Uh, the tournament's running late.
We only have the
fields until nine o'clock.
Um, a lot of different reasons. Then
you go to professional sports or, you
know — I'll call what Cooper does
something where the school's paying
for this and there's more money —
I mean, they have a stake in this.
They're wasting resources.
You're pitching guys late in the
game that you don't want to throw,
or silly things happen, like: oh, we
lost, we have to play two more innings,
put an outfielder on the mound
'cause I don't wanna waste a pitcher.
And, you know, things start
to get wacky in that regard.
And so maybe we should just
call the game at this point.
But I think it all boils down to:
everyone knows what's gonna happen,
so what's the point in continuing to play?
That sort of is the idea behind all of 'em.
Scott: There's also a concern in youth
sports that if one team has such a lead,
it becomes almost unsportsmanlike for
that team to be trying hard to win.
Are they gonna win by
absurd amounts, for example?
There are potential injuries.
There's that aspect.
Now, we largely don't have futility
rules in pro sports. We do have some,
and I'll come to those, but largely in an
American baseball game, if one team is
ahead by 25 runs after eight innings
or seven innings, they keep going.
Uh, in those sports,
they don't stop them.
Now, people have paid
to go watch that game.
Nick Berry: Yeah.
Scott: They're selling beer — uh,
though they don't sell beer after
the seventh inning — but
they're selling beer at an NFL game.
At an NBA game, a team
could be ahead by 40 points,
and they keep going
in those games.
So pro sports are a little bit different.
They do worry about injuries, but
there are no futility rules in pro
sports, which is somewhat interesting.
Nick Berry: Well, you mentioned the fans.
The fans do have futility
rules, and this is it:
if you're at a baseball game
and a team's up 20 runs, the
stands empty. People leave.
I think it's more important to them to
be in the car before traffic hits than it
is to watch your team get smothered by 25 runs.
I'm definitely a
stick-it-out-to-the-end guy.
I paid to be there.
I went through the
effort of getting there.
I'm gonna watch the ninth inning of this
game. But the stadium does empty
in those cases, so fans have their own
futility rule that they've created
in their mind for when it's over.
Scott: So can we make a
mistake with a futility rule?
Could we have stopped that
game that Cooper's team played
up in Oregon? They were ahead
12 to zero after seven innings.
They could have lost that
game if they kept playing.
Uh, the rules are a nine inning game,
and the Sea Otters could have scored 13
runs over the last two innings and won that game.
It's possible. In those circumstances, just
like it's possible that we could have cut
Scottie Scheffler from that golf tournament
and said, you don't have a chance to win,
you no longer get to play —
and he could have won that tournament.
In the end, he didn't.
But in another setting, we would have stopped
competitions that would've flipped, and
the other team would've won.
Nick Berry: Yeah, for sure.
I think there's a heuristic.
This is Bill James, the famous analytics
guy — uh, the father of analytics,
I think people give him credit for that.
He has this — um, I
called it a heuristic;
I think it's a good word for it —
for college basketball,
for when he thinks the
game is over. It's a simple
calculation: you take how much
you're winning by, minus three,
add a little bit if you have
the ball, and then I don't even
know what you do from there.
Something like that —
a weird little algebra scheme
you do, and it says: this game is over,
this game is not over.
Um, I bring this up partially
because I wanted to talk about
this example. When I was in
college, Texas A&M was playing
in March in the NCAA tournament —
we just finished March
Madness as well.
Uh, and Texas A&M was playing Northern
Iowa in, I think, the round of 32.
They were losing — they were futile,
according to Bill James' algorithm.
Um, we were down 12 points with 42
seconds to go, or something like that,
and doing the calculation, that's
impossible to come back from.
And I have two good friends
that were at the game.
They left. You know,
it's obvious it's over.
Gotta beat traffic.
They're in the concourse walking away.
And, um, my alma mater came back
and won. Alex Caruso, of now two-time
NBA championship fame, was on that team.
And it was, you know, the comeback
of a lifetime for me in school.
And so it happened: this was futile,
this would've been stopped,
Scott: Yep.
Nick Berry: according to a
popular rule, and it switched.
Weird things happened.
The game reversed; Texas A&M won.
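For reference, Bill James' "safe lead" heuristic for college basketball can be sketched in a few lines. This is one common statement of the rule, reconstructed from memory rather than from the episode, so treat the half-point possession adjustment and the exact form as approximate.

```python
def lead_is_safe(lead_points, seconds_left, leader_has_ball):
    """Bill James' 'safe lead' heuristic for basketball, as commonly stated:
    subtract three from the lead, adjust half a point for possession, floor
    the result at zero, square it; the lead is 'safe' when that number
    exceeds the seconds remaining. The details here are approximate."""
    adjusted = lead_points - 3 + (0.5 if leader_has_ball else -0.5)
    adjusted = max(adjusted, 0.0)
    return adjusted ** 2 > seconds_left

# The Texas A&M game above: down 12 with about 42 seconds to go.
# (12 - 3 - 0.5)^2 = 72.25 > 42, so the heuristic calls the game over.
print(lead_is_safe(12, 42, leader_has_ball=False))  # -> True
```

As the story shows, "safe" here means a very small comeback probability, not an impossible one.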
Scott: So there are some cases
where you can't come back, but in
all those situations it's possible.
Uh, the probability may be very, very
small, and we'll get to that question
of the probability of winning.
There are other cases where we do have
futility rules in pro sports: when
you're playing a seven game series
and one of the teams goes up four to one —
they've won four of the first five —
they stop playing.
They don't play the
sixth and seventh games.
And in that situation, it's
impossible for the other team to win.
So they stop playing.
They don't say, well,
people have bought tickets,
we're gonna sell food, we're
gonna play games six and seven.
They do stop playing and they move on.
So there are — binding's the
wrong word — but there are cases
where it's impossible to come
back, and they stop playing.
Another example of that is golf,
when you play match play,
like in the Ryder Cup: you play who
wins the most holes over 18 holes.
If you're up four with three holes
to play, the golfers stop playing.
They don't play the last three holes.
The match is over.
Scott: You're
mathematically eliminated.
Yep.
Nick Berry: And it's an obvious
way to do futility, right?
Like you don't need
a model or any complex
reasoning to understand why that happens
or to determine if it's a good rule.
Right.
Scott: Yep.
So let's come to questions
of how we might do this. Suppose
a professional league came to
you and said: we want to institute
stopping rules, so that when the game
reaches a particular point —
Bill James had a rule —
we should stop it,
for safety, for sportsmanship.
How would you do it?
Nick Berry: Uh, um,
carefully. I would start
with a bunch of data.
I think step one is: get
every single game that's ever been
played at that professional level,
and specifically with data about
probably just the score over time.
I think one thing you can't do in
this — and we'll come
to this later — is put any
preconceived notion about the quality
of the teams into this calculation.
Right?
I think it has to be a hard rule based on
some, just, you know, observable
score difference or something like that.
So yeah, you start by taking
Nick Berry: all of these score differences
over time, overlay 'em on a plot.
And then I would probably choose a
reasonable time in the game, maybe
three quarters of the way through.
Uh, you don't wanna stop
anything, you know, while there's still an
immense amount of variability in the game.
So I may start two thirds or three
quarters of the way through, or some
number, and I draw a cut there on
that graph that I was making and say,
okay, what point on the y-axis, what
score differential, can I choose that, um,
first, no one has ever come back
from? I'd probably start there
and say, okay, if you're up 37
points with 10 minutes to play in a
basketball game, the game is over.
I think that's a useful starting place.
It's probably not the rule I would end
up with, but I would start there, and
then maybe I say, okay, what about 1%?
Um, I say, okay, what
point on this line do you have
a 1% chance of coming back from?
And that's probably too big of a number.
Um, in baseball, every team plays
162 games, so multiply that by,
you know, what, 15, to get the total
number of games played, uh, a year.
That might not be the right math, but, you
know, 1% of those games is still a lot.
You don't want to have that many flip-flops
of things that you stop.
So I'd probably choose a
number like, uh, 0.1%
or something like that, or 0.05%
of games, and draw a line there.
I'd do this with a curve as well.
Maybe I do two thirds, three quarters
of the way through, five sixths of the
way through, and seven eighths of
the way through, or something like
that, and draw that same point at
those lines, and you could stop
for futility at any of those points.
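The rule-building Nick sketches — collect historical score differentials at a checkpoint, then find the deficit that almost nobody comes back from — might look like this in code. The data and the tolerance are illustrative; nothing here comes from a real league.

```python
# Sketch of an empirical futility rule at one fixed checkpoint: estimate
# how often a team trailing by d came back to win, then find the smallest
# deficit whose comeback rate falls below a tolerance. Toy data only.

from collections import defaultdict

def comeback_rates(games):
    """games: (deficit_at_checkpoint, trailer_won) pairs from history.
    Returns the observed comeback rate for each deficit size."""
    counts = defaultdict(lambda: [0, 0])  # deficit -> [comebacks, total]
    for deficit, trailer_won in games:
        counts[deficit][0] += int(trailer_won)
        counts[deficit][1] += 1
    return {d: wins / total for d, (wins, total) in counts.items()}

def futility_cutoff(games, tol=0.001):
    """Smallest deficit whose comeback rate, and that of every larger
    observed deficit, is below tol — the 'draw a line there' step."""
    rates = comeback_rates(games)
    for d in sorted(rates):
        if all(rates[e] < tol for e in rates if e >= d):
            return d
    return None

# Toy data: 2 comebacks in 100 games down 5; none in 500 games down 20.
games = [(5, True)] * 2 + [(5, False)] * 98 + [(20, False)] * 500
print(futility_cutoff(games, tol=0.01))  # -> 20
```

Repeating this at several checkpoints (two thirds, three quarters, and so on) gives the curve of cutoffs Nick describes.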
Scott: So you're getting at something —
you're interested in looking
at different points in the
competition and asking, what is the chance
the team that's trailing can win the game?
And you mentioned, when
that's below 1% or 0.1% —
the chance that the team that's trailing
could come back and win — that this
is a candidate game for stopping,
that we've reached it.
We're getting to this point of
predictive probabilities.
So you're building this model — and
you talked about getting all of
this historical data we have on
competitions, whatever sport it is —
that you would build: what is the
probability this team can win
from this state in the game?
We have something like that in most of our
sports. For example — and gambling
is a huge part of sports — you can get this.
Our favorite baseball team: Nick and I
are fans of the Minnesota Twins.
They played a game yesterday.
Uh, it was Wednesday, April 15th,
and Minnesota was trailing the
Red Sox throughout the game.
They got behind — they actually
were ahead one to nothing, but they
got behind, uh, nine to one
by the sixth inning.
ESPN provides win probabilities, and
you can graph this across the game.
So they're doing this, they're
providing win probabilities, and it was
not long — by about the seventh inning in a
nine to one game — that the probability that
the Twins were gonna win dropped below 1%.
Now interestingly, they scored four
runs in the bottom of the ninth, and
they lost nine to five, but that
probability never wavered from 99%.
That game could have been a candidate —
maybe it doesn't reach the threshold.
These are really natural things
within games to look at.
And then you balance: why are we stopping?
What is the purpose?
You want to understand the risk, because
you could stop a game that the team
would've come back and won, and
that's a mistake to some extent —
you've made a mistake by stopping it.
You could also not stop a game
that the trailing team lost.
Now, that's a different kind of mistake,
where maybe the resources used in the
remainder of that game were for naught,
because the game didn't change. In that
setting, we might consider that if we're
trying to look at a rule for its ability
to stop trials that are gonna lose anyway.
We would evaluate those
criteria — those errors that we might
make with a rule to stop sporting
competitions — which of course have very
different goals than clinical trials,
which of course we're gonna get to.
I also have a friend.
Yeah, go ahead.
Nick Berry: That quantity that you
keep talking about — well, you've
referenced both probability of
winning and also predictive probability —
I think it's interesting.
In the first podcast we did about
regression to the mean, I think we
kept telling people, at least
non-statisticians, that that's a hard
thing to get your mind around, right?
You see data and you're like, okay, that's
what I'm gonna believe going forward.
So we were trying to tell people:
you need to regress this toward
the population average, even
though it feels weird to do.
I think the predictive
probability is the exact opposite.
Everyone that is making decisions
about these games intuitively
understands what you're talking about.
They understand the variability left
in the game, and they know why a four run
lead in the fourth inning is different
than a four run lead in the eighth inning.
And they don't need to describe it as
information fractions and variability
and things like that to make that point.
But I like this topic because it's
so intuitive, and it's just like the
root of decision making in general
is this predictive framework
that we're working under.
Scott: So you brought up this
idea, and in sports
it seems like such a natural thing.
And by the way, that's the goal of these
episodes: you take something
in sports that seems so natural to
understand, and then you flip it to
clinical trials. You know,
that's sort of how I think, and it's not
uncommon in a clinical trial scenario
that I explain something with a sporting
competition. Trailing by five
early in the game means something
very different than trailing
by five late in the game,
because you have less time to have a
large differential to reverse that deficit
that you have, uh, in this scenario.
It's such a natural thing in sports.
Where in clinical trials do we build
futility rules off of the observed effect?
How do we do that?
The win probability
in sports incorporates that.
So in that game where the Red Sox were
playing the Twins, the Red Sox were
up by five runs after five innings.
The win probability for them
under that scenario, uh, after
the fifth inning, is about 90%.
When the game goes to the sixth inning and
the score doesn't change — it's still five —
the probability the Red
Sox win goes up, because there's
less time left to flip that.
So the same observed effect
at different times has quite
different predictive probabilities.
It's a very natural thing in
sports to understand that part
of the win probability.
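That point — the same lead is worth more later in the game — can be checked with a toy simulation. Runs per team per inning are modeled here as Poisson with an assumed per-inning mean; the rate is an illustrative stand-in for real run distributions, not a fitted model.

```python
import math
import random

def _poisson(rng, lam):
    # Knuth's multiplication method for drawing a Poisson variate.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def win_prob(lead, innings_left, mean_runs=0.55, sims=100_000, seed=1):
    """Probability the leading team still leads at the end, with each team
    scoring Poisson(mean_runs) per remaining inning. Ties count as
    non-wins for simplicity (no extra innings modeled)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(sims):
        diff = lead
        for _ in range(innings_left):
            diff += _poisson(rng, mean_runs) - _poisson(rng, mean_runs)
        wins += diff > 0
    return wins / sims

p_after_5 = win_prob(5, innings_left=4)  # five-run lead, four innings left
p_after_6 = win_prob(5, innings_left=3)  # same lead, one inning later
print(p_after_5 < p_after_6)  # fewer innings left -> higher win probability
```

Both probabilities come out well above 90% under these assumptions, but the later one is higher, which is exactly the behavior Scott describes in the ESPN graph.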
Okay.
So let's flip this to clinical trials —
we've already set this up a little bit.
So what does this have to
do with clinical trials?
Many clinical trials
are set up much like a
sporting competition,
especially phase three trials.
Earlier-stage trials can have, uh, goals
that are largely learning and estimation.
Those may be a little bit harder, but
you can, and you should, have
futility rules in those settings.
But let's think about phase three trials,
because they're the largest, the most
expensive, and they have very well-defined
rules of quote unquote winning that trial.
They're targeted on demonstrating
one treatment is better
than another treatment.
In a sporting contest,
it's almost like a player has to
play, you know, the best player
in the world to show they're better.
They have to show that
they can win that game.
We have very well-defined rules
in sports about who wins the game.
Now, in clinical trials, we also
have really well-defined rules:
we're trying to see that this
treatment is statistically significantly
better than another treatment.
So again, thinking about this question —
like before, when I asked you
why we might stop a sporting competition —
why might we stop a clinical
trial before the end for futility?
Nick Berry: Again, I think there are
a lot of reasons.
Um, there are patient,
ethical aspects of the trial: if it's
looking like the treatment has
no chance of being successful
in a clinical trial, you know, is
it ethical to be randomizing these
patients, to be experimenting on them?
Um, for a sponsor, there are
certainly monetary constraints.
Some of these drugs are extremely
expensive to administer.
It's extremely expensive
to follow subjects up.
Um, patients often can roll over into some
open-label extension, where a drug that
might be, you know, better for these
subjects can be administered to them
after a trial stops for futility.
Uh, I think there's a ton
of different reasons.
Safety and cost are probably
two of the most prevalent.
Scott: Yep.
So there are strong
reasons that if you're running a
trial and it's very, very unlikely
the trial's gonna be successful,
there are benefits to stopping that trial.
Uh, and the ethical parts
of it are huge.
You're asking a patient to contribute
to science, and if you look at the data,
they're no longer contributing to science.
The answer's largely been —
the question's been
answered at that point.
Now, by the way, there are some
strange trials where mathematically
the trial cannot be successful.
But those are very, very rare: where
it's integer-valued outcomes and no
number of events can flip the outcome.
Those are different trials.
In most of these trials, you're
now addressing similar types of
questions. And so we, as
clinical trial designers, are
asked to create futility rules.
So the NHL is not asking
you this, or Major League Baseball,
but: how do we create futility
rules for a clinical trial?
Nick Berry: Yeah.
Um, let's see.
So I think it's largely the same
as what I described for, like, the
NHL rule, with probably one major
difference: I wouldn't
start by collecting a ton of data.
I don't think I'm gonna start with
all the clinical trials that have ever
been run and look at their curves.
I know the data generating process —
or at least I have assumptions
about the data generating process
and what I think is likely.
So I would probably start by — you could
think of it as simulating one million
different paths through this trial.
Maybe I think of this at the beginning
as: I simulate every patient, or I look
really often — I take benchmarks throughout
this trial, maybe a hundred benchmarks —
and at each of those points
I fit a t-test, or, you know,
some simple test for now, uh, to say:
how much better is the active arm doing?
How much worse is the active arm doing?
And then I go through the same exact process.
Now I have those spaghetti plots
exactly like I had for sports.
There I had the score
differential over time;
here I have treatment effect over time.
Um, again, maybe I don't wanna start
until halfway through the trial.
I would say, okay, at what
point is it impossible?
Did none of my one million simulations
come back — you know, where
the treatment was doing so badly that it
couldn't get back to being successful?
And again, like I said with the
other example, that's not necessarily
a good futility rule, right?
Something that never allows a reversal
Scott: Yeah.
Nick Berry: is a little too
conservative for my liking.
But that probability of
reversing is really important.
So I, again, wanna limit it to maybe
something like 1% — or in sports I used 0.1%,
'cause we're doing this over and
over and over and over and over,
and I didn't want a bunch of these.
But, um, in clinical trials, I think a
predictive probability of 1% is still
fairly conservative for a futility rule.
Something in the one to 5% range
is probably where I would start,
trying to cap that probability
of reversing after stopping.
And again, I'm looking at trials
that I simulated from start to finish.
So in reality, I know, when I stop
one of those simulated trials for
futility, what would've happened.
And I think that is a really important
aspect of the simulations: I get
this counterfactual look at all of
the simulations and can do that.
And so again, I'm
drawing a spaghetti plot,
I'm placing benchmarks — maybe, uh,
three quarters of the way through the trial —
and setting myself futility rules at those points.
So it's the same exact process, with the data
generating mechanism being different.
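A stripped-down version of that exercise, for a two-arm binary endpoint: simulate whole trials, apply a candidate interim rule, and use the counterfactual — what the stopped trials would have gone on to do — to measure the reversal probability. Every design number here (arm size, response rates, interim fraction, and the rule itself) is an illustrative assumption, not a recommended design.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # one-sided 2.5% final test

def simulate_trial(rng, n=200, p_ctrl=0.30, p_trt=0.40, frac=0.5):
    """Simulate one trial to completion. Returns the interim difference
    in response rates and whether the final pooled z-test succeeded."""
    ctrl = [rng.random() < p_ctrl for _ in range(n)]
    trt = [rng.random() < p_trt for _ in range(n)]
    k = int(n * frac)
    interim_diff = sum(trt[:k]) / k - sum(ctrl[:k]) / k
    p1, p0 = sum(trt) / n, sum(ctrl) / n
    pbar = (p1 + p0) / 2
    se = (2 * pbar * (1 - pbar) / n) ** 0.5
    return interim_diff, se > 0 and (p1 - p0) / se > Z_CRIT

rng = random.Random(7)
trials = [simulate_trial(rng) for _ in range(10_000)]

# Candidate futility rule: stop if the treatment is not ahead at the interim.
stopped = [won for diff, won in trials if diff <= 0]
# The counterfactual: the fraction of stopped trials that would have gone
# on to win anyway -- the reversal probability Nick wants to cap.
reversal = sum(stopped) / len(stopped)
print(f"stopped {len(stopped)} of {len(trials)}, reversal rate {reversal:.3f}")
```

Tightening or loosening the interim cutoff trades how often you stop against how often a stopped trial would have won, which is exactly the tuning Nick describes doing against the spaghetti plots.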
Scott: So there's, there's this same
quantity that was this win probability
that we, where you can go to espn.com
and look at.
We build that in clinical trials, and
you can build that through statistical
modeling on what is the outcome?
Is it a, is an event outcome?
Is it a change from baseline outcome?
Is it a responder outcome?
You, you can take that outcome and build.
A mechanism for calculating
what's the probability that the
treatment is gonna demonstrate
benefit by the end of the trial?
What is its win probability?
That's something in clinical trials
called a predictive probability.
You, you may have heard of
a conditional probability.
Conditional probability is similar. It's a little bit different in that, with a conditional probability, you assume you know the truth about the team or, or what the treatment is, and then you calculate that.
A predictive probability estimates it from the data at that point, with its uncertainty, in calculating the probability of winning the trial going forward.
So there are different mechanisms.
Bayesian tends to be predictive probability, whereas conditional power is pretty simple, uh, taking the observed rate that you see right now, which usually underrepresents variability.
So I don't love the number that comes out
for conditional probability, which is why
we use Bayesian predictive probabilities.
And then we look at that for its potential for stopping the trial. When that probability gets low, you can look at stopping the clinical trial.
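As a concrete sketch of the two quantities, here is a toy single-arm calculation; every design number is an assumption (100 patients, an interim at 50, a win meaning at least 40 responders, a flat Beta(1,1) prior). Conditional power plugs in a single assumed response rate as if it were the truth, while the Bayesian predictive probability averages over the posterior uncertainty via an exact beta-binomial tail sum.

```python
import math

# Illustrative single-arm design; every number here is an assumption for
# the sketch: 100 patients total, an interim look after 50, and the trial
# "wins" if at least 40 of 100 respond (roughly a one-sided 2.5% test of
# a 30% null response rate).
N_TOTAL, N_INTERIM, WIN_CUTOFF = 100, 50, 40

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def conditional_power(x, p_assumed):
    """P(win | x interim responders), plugging in one assumed 'true' rate."""
    n_rest, need = N_TOTAL - N_INTERIM, WIN_CUTOFF - x
    return sum(
        math.exp(log_choose(n_rest, k)
                 + k * math.log(p_assumed)
                 + (n_rest - k) * math.log(1 - p_assumed))
        for k in range(max(need, 0), n_rest + 1))

def predictive_probability(x, a=1.0, b=1.0):
    """Bayesian predictive P(win | x interim responders), Beta(a, b) prior.

    Future responders follow a beta-binomial with the posterior
    parameters (a + x, b + interim non-responders); we sum its tail past
    the responders still needed to reach the win cutoff.
    """
    a_post, b_post = a + x, b + N_INTERIM - x
    n_rest, need = N_TOTAL - N_INTERIM, WIN_CUTOFF - x
    log_beta_post = (math.lgamma(a_post) + math.lgamma(b_post)
                     - math.lgamma(a_post + b_post))
    total = 0.0
    for k in range(max(need, 0), n_rest + 1):
        total += math.exp(
            log_choose(n_rest, k)
            + math.lgamma(k + a_post) + math.lgamma(n_rest - k + b_post)
            - math.lgamma(n_rest + a_post + b_post) - log_beta_post)
    return total

x = 15  # 15 of 50 responders observed at the interim look
cp = conditional_power(x, p_assumed=x / N_INTERIM)  # plug-in, ignores uncertainty
pp = predictive_probability(x)                      # averages over the posterior
print(f"conditional power: {cp:.4f}, predictive probability: {pp:.4f}")
```

With 15 of 50 responders, the plug-in conditional power comes out a fraction of a percent, while the predictive probability, by admitting that the true rate is uncertain, comes out roughly an order of magnitude larger, though both are small.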
Nick Berry: To, to keep our
link to sports here, the, you're
talking about conditional power
or conditional probability of
success versus predictive power.
Predictive probability of success.
The calculation in sports is different than the calculation done in a clinical trial, because in sports, you know, we know a lot about the two teams going in, and we use that information in a way that's not common in a phase three clinical trial.
As you were saying, if I was doing a predictive probability in sports and a team's winning, uh, nine to one, like the Twins game you were describing, um, I am going to shrink my posterior probability of, you know, the quality of those teams massively
Scott: Massively.
Nick Berry: toward being more of a 50-50 game going forward. Something that conditional power doesn't do naturally. The probability of the Twins winning, given that the teams are equal, is something you could do with conditional power and get that number out. In a predictive probability sense, you would use the probability that the teams are much closer, because they're professional sports teams, they are pretty close together.
And even though you observed nine to one, it doesn't really change what you think about the teams as a whole.
It's just that the game is random, so
your predictive probability in that
sense, it's actually probably pretty
close to a conditional probability
of success at that same number.
And so the, the huge amount of
prior information, I think kind
of changes the decision making
of that predictive probability.
Scott: That, but that can be incorporated in clinical trials, where you might even want to build a stopping rule that says, we wanna stop when an optimist about
Nick Berry: Yeah.
Scott: the drug thinks the probability of winning is below 1% or 5%, kind of thing. And you use a Bayesian prior that gives a reasonable, an optimist's, uh, estimate of the drug's effect, and even when the optimist thinks the trial is unlikely to be successful, you stop.
It, by the way, it alludes to this in the, the Bayesian draft guidance, that you can use this design prior, uh, sort of thing.
You can do this for the futility.
So it's something that can be
incorporated in clinical trials for sure.
Nick Berry: Yep.
That makes total sense.
Scott: Now, some of the...
Nick Berry: idea.
Scott: Uh, yeah, Jay Kadane's idea actually is a really nice idea, that you take multiple, uh, clinicians with varying views across the spectrum.
And when all of those clinicians
are convinced of it, then you stop.
So you might need to go longer
to demonstrate that you think the
drug is effective to a pessimist
Nick Berry: Yeah.
Scott: likewise to stop when, when
you think it's not effective as well.
It, it's a really neat idea actually.
Um.
Now, this, this characteristic. We spend a great deal of time simulating different rules, and we calculate those error rates that we talked about before.
So you can make a mistake
with a futility rule.
You could have a futility rule that stops
a trial that would've gone on and flipped
and reversed, and you say There's very
little chance for the treatment to win.
And we calculate that's 1%.
For example, we, we've done trials where the predictive probability of success of that trial is 2% and it stops for futility, because it has a 5% futility rule. The trial stops and, by, by interpretation, there's a 2% chance that trial could have come back and been successful under that scenario. It's just deemed, taking into account the pluses and minuses of this, that it's a good thing to stop that trial.
Now that decision is made ahead of time, uh, in those circumstances, and we simulate multiple rules. We look at error rates. We also look at, you know, what happens to power when you have a futility rule. For a treatment, you may decrease its chance of success
because some of those trials, those
spaghetti plots that Nick talked about,
they hit the futility rule and they
would've reversed and gone on to win.
You reduce power by that if you have
a very, very aggressive futility rule.
An example of this would be
halfway through the trial.
If the predictive probability of
the treatment winning is less than
50%, we're gonna stop the trial.
That would be a strikingly
aggressive futility rule.
Essentially, it means that the
treatment is observing something very
close to what it needs to win by the
end, which gives it kind of 50 50.
If it stays there, it wins.
If not, it loses. And if you stop the trial there, you're gonna reduce power by very large amounts.
Uh, Nick and I are looking at an example where that, that probability can drop power by 25% at your effect size, because you're being so aggressive at stopping it and you don't give the treatment that, that chance to, to go on and be successful.
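The power cost of an aggressive futility look can be seen in a small simulation. The design below is an assumed toy, not the example Scott and Nick are studying, so the exact numbers differ, but the pattern holds: a 50% halfway rule gives up far more power than a 5% rule.

```python
import math

import numpy as np

# Assumed toy design: single-arm, 100 patients, interim at 50, win if
# >= 40 respond, Beta(1,1) prior, and a truly effective treatment with a
# 0.45 response rate against a 0.30-style null. We compare no futility
# look, a modest rule (stop if the predictive probability of winning
# falls below 5%), and an aggressive one (stop below 50%).
N, N1, WIN, P_TRUE = 100, 50, 40, 0.45
rng = np.random.default_rng(seed=11)

def predictive_prob(x):
    """Beta-binomial predictive P(>= WIN total | x responders of N1)."""
    a, b, n2 = x + 1, N1 - x + 1, N - N1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    tail = 0.0
    for k in range(max(WIN - x, 0), n2 + 1):
        tail += math.exp(
            math.lgamma(n2 + 1) - math.lgamma(k + 1) - math.lgamma(n2 - k + 1)
            + math.lgamma(k + a) + math.lgamma(n2 - k + b)
            - math.lgamma(n2 + a + b) - log_beta)
    return tail

pred = np.array([predictive_prob(x) for x in range(N1 + 1)])

n_sims = 20_000
interim = rng.binomial(N1, P_TRUE, n_sims)   # first-half responders
rest = rng.binomial(N - N1, P_TRUE, n_sims)  # second-half responders
wins = interim + rest >= WIN                 # final success with no stopping

power_no_stop = wins.mean()
power_modest = (wins & (pred[interim] >= 0.05)).mean()      # 5% futility rule
power_aggressive = (wins & (pred[interim] >= 0.50)).mean()  # 50% futility rule
print(f"power with no futility {power_no_stop:.3f}, "
      f"5% rule {power_modest:.3f}, 50% rule {power_aggressive:.3f}")
```

In this assumed setup the modest rule costs almost nothing, while the aggressive halfway rule throws away a meaningful slice of power by killing trials that were trailing but would have come back to win.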
A 20% rule, by the way, and I'm gonna say 20% because there's an interesting futility story to this, in the example we're looking at, can have somewhere around a five to 10% reduction in power. 20%, if you think about a sports competition where the team only has a 20% chance of winning, that happens a lot.
Uh, those are exciting games by
the way, where a team scores two
touchdowns very close to the end to win.
Uh, a golfer comes back from five shots back on the last nine holes. That's not even 20%; coming back by three shots is sort of 20%. That's a common occurrence.
Well, that rule is used for futility in trials.
And there's a famous example when that
rule was used, and it's a famous example
where people think futility might not
be a good thing in clinical trials.
And we'll kind of get to that.
It's the Biogen example of aducanumab. It was an Alzheimer's treatment, and this, this was, you know, 5, 6, 7 years ago, uh, a disease modifying therapy.
They were running two phase 3 trials, and they did a predefined futility analysis where they looked at both trials, and they stopped both trials for futility.
Interestingly, they stopped those trials, but they got follow-up data, and one of the trials turned out to be statistically significant. In Alzheimer's, that's a huge deal.
The other trial was not, and there was a lot of discussion about whether this was a bad futility rule, these types of things.
Their rule for futility was that if either of the two trials had less than a 20% chance of success, they would stop both.
And if you think about that in the sports, uh, context, that's a really, really aggressive futility rule.
Now, it was a bit more complicated
because one of the trials, the
conditional probability of the one
that won was 60% when they stopped.
So it was a controversial application
of futility rules in this scenario.
I mean, to some extent it was right.
The other trial was not successful.
But in hindsight, I think Biogen
would've done it differently
and they were not necessarily pleased with the futility rule in that scenario.
Now, I did a debate with Paul Aisen from, uh, USC on whether we should be doing futility rules in Alzheimer's trials. You can go find that debate, and I did win the debate, uh, 80% to 20%. People agreed that we should be doing futility rules in Alzheimer's trials.
Uh, I feel like, if I would've lost that, I, I feel like it's such a yes, we should absolutely be doing that. We should be doing futility rules in every phase three trial. Uh, if I'd have lost that, that would've been like Rory losing while leading by six after 36 holes, uh, sort of scenario.
Nick Berry: The, the aducanumab example is interesting, because earlier, while we were talking, I was thinking about how the value you choose for your futility rule almost creates this sort of implied utility on what you're valuing in your clinical trial.
Um, you know, if you have an aggressive futility rule, you're probably saying it's really expensive to continue.
I only want to spend this extra
money if it's likely that we can win.
So the, the implied utility of that futility rule is something like: we need two successful phase three trials or bust. That's sort of the implication of that.
And that was probably what was
going through their mind while
they're constructing the rule.
Like, we need, in order to get
this approved, we need two phase
three trials to meet this threshold
and the FDA will approve us.
And you know, clearly that's
not what happened at the end
of the day, but uh, you can see
how they got there a little bit.
And I think the utility that comes out of your futility rule is interesting. Um, and, you know, you say a lot by not actually saying anything, on purpose.
Scott: Yeah, no.
We spend a great deal of time building
these predictive probabilities in that
scenario where you have two phase three
trials running at exactly the same time.
The idea that you do conditional
power only in that trial and you
ignore the data from the other trial.
Nick Berry: Yeah.
Scott: The fact that the other trial
had a 60% chance of winning meant
that it had a good effect size.
If you would've done regression to
the mean and shrunk over the two
trials, you probably would've got
a much higher probability for the
second trial of being successful, and
then maybe that would've eventually
been successful had they run it out to the end.
So a huge part of what we do as
statistical consultants is build good
predictive probabilities so that we're
making good decisions. In almost every clinical trial, like Alzheimer's, like stroke, uh, oncology, you have information on patients short of the primary endpoint. They're not all as simple as success and failure, where you know everything about everything in the trial.
You have incomplete information on this.
So we're building predictive
probabilities that are using
maybe dose response modeling.
They're using longitudinal modeling of early clinical outcomes to predict later clinical outcomes.
Based on their predictability, we
spent a ton of time getting those
predictive probabilities to be
really good so that they make really
good decisions in clinical trials.
And I suspect that was not optimally
done in the Biogen example.
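One way to sketch the longitudinal idea is multiple imputation: patients with only an early reading get final outcomes drawn from an early-to-final model learned on completers, with the parameter uncertainty carried along into the predictive probability. All of the counts and the simple transition model below are hypothetical, invented for the sketch.

```python
import numpy as np

# Hypothetical interim snapshot (all counts invented for the sketch):
# 100-patient trial, win if >= 40 final responders. 40 patients have the
# final endpoint (18 responders), 30 have only an early reading, and 30
# are still to enroll. The early-to-final link is learned from completers
# who had both readings, with flat Beta(1,1) priors everywhere.
rng = np.random.default_rng(seed=3)
N_TRIAL, WIN = 100, 40

def predictive_prob(n_final_resp, n_final_nonresp, early_resp, early_nonresp,
                    trans_counts, n_draws=50_000):
    """Predictive P(win), imputing unfinished patients from posterior draws.

    trans_counts = (final responders among early responders, early
    responders with finals, final responders among early non-responders,
    early non-responders with finals), tallied from completers.
    """
    r_rr, n_r, r_nr, n_n = trans_counts
    n_done = n_final_resp + n_final_nonresp
    n_future = N_TRIAL - n_done - early_resp - early_nonresp

    # Posterior draws: early-to-final transition rates and the marginal rate.
    p_given_early_resp = rng.beta(1 + r_rr, 1 + n_r - r_rr, n_draws)
    p_given_early_non = rng.beta(1 + r_nr, 1 + n_n - r_nr, n_draws)
    p_marginal = rng.beta(1 + n_final_resp, 1 + n_final_nonresp, n_draws)

    # Impute finals for pending patients and simulate future enrollment.
    imputed = (rng.binomial(early_resp, p_given_early_resp)
               + rng.binomial(early_nonresp, p_given_early_non)
               + rng.binomial(n_future, p_marginal))
    return np.mean(n_final_resp + imputed >= WIN)

trans = (16, 20, 4, 20)  # early response is strongly predictive of final response
pp_strong = predictive_prob(18, 22, early_resp=24, early_nonresp=6, trans_counts=trans)
pp_weak = predictive_prob(18, 22, early_resp=6, early_nonresp=24, trans_counts=trans)
print(f"predictive probability with strong early data: {pp_strong:.2f}, "
      f"weak: {pp_weak:.2f}")
```

The same interim count of finished patients yields very different predictive probabilities depending on the early readings of the patients still in follow-up, which is exactly the information a completers-only conditional power calculation throws away.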
Okay.
Um, I, I do wanna make reference to a nice paper by our colleagues, Roger Lewis and Barbara Berger, in JAMA, on futility in clinical trials. There's a JAMA series for, uh, methodology in clinical trials, and they write about futility. You can read more about it there.
But unfortunately Nick, I think we've
hit the futility for, for this episode.
Nick Berry: Yep.
Scott: Uh, and, and that's more time and resources based here, uh, than the fact that we're losing.
Nick Berry: Yeah, right.
Scott: Yes.
We, we are not losing.
Yes, yes, yes.
Uh, so we appreciate you all joining
Nick and me in this deeper dive into
the intersection of sports statistics
and the science of clinical trials.
And until the next time,
we'll be here in the interim.