In the Interim...

In this episode of "In the Interim…", Dr. Scott Berry and Dr. Kert Viele examine response-adaptive randomization (RAR) in clinical trials, dissecting its statistical rationale, common criticisms, and implementation challenges. Drawing on extensive experience with trials such as BAN2401 (lecanemab), ICECAP, dulaglutide seamless Phase 2/3, I-SPY2, REMAP-CAP, PROSPECT, and the historical ECMO trial, they discuss the scientific advantages and disadvantages and ethical impact. RAR reallocates patient assignments during interim analyses to direct more patients to better-performing arms, but this can reduce power in two-arm trials, introduce complexity from temporal trends, and create operational complexity. The ECMO trial and "play-the-winner" approaches are discussed as cautionary examples emphasizing the need for thorough simulation before deployment. The hosts highlight RAR’s strengths for dose-finding, multi-arm, and some platform designs, but underscore its limitations in confirmatory two-arm settings. Operational demands, data reliability, simulation across scenarios, and resistance to overgeneralization are recurrent themes. The episode concludes by situating RAR within the broader context of adaptive platform trials and learning healthcare systems.

Key Highlights

Definition and mechanics of RAR, with interim analysis guiding allocation updates
Multi-arm adaptive and platform trial experiences (BAN2401, ICECAP, dulaglutide, I-SPY2, REMAP-CAP, PROSPECT)
Critique of RAR in two-arm trials (power loss), temporal trends, unblinding, and overgeneralized literature
ECMO/play-the-winner: risks of poorly simulated RAR
Necessity for rigorous pre-trial simulation and robust data flows
Contextualization of RAR’s role in both traditional and learning healthcare environments

For more, visit us at https://www.berryconsultants.com/

Creators and Guests

Host

Scott Berry

President and a Senior Statistical Scientist at Berry Consultants, LLC

What is In the Interim...?

A podcast on statistical science and clinical trials.

Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott: All right.

Welcome everybody back to In The Interim.

I'm your host, Scott Berry.

Uh, I'm your co-host, Scott Berry.

Uh, my, my co-host, uh, my common
co-host, Kert Viele, is back.

Though Kert, it's been a bit of a,

Kert Viele: Been a gap.

Yep, happy to be here.

Happy to be here

Scott: We, we may find out, uh, if
Kert has any pet peeves for today.

But get to the topic of the day,
and you know, we are somewhere in

the 60-plus episodes, Kert, and
we have not done an episode on

response adaptive randomization.

Something that is…

makes people passionate.

Uh, it's controversial.

Um, it's, it's a great topic
for innovative trial designs.

I think it's especially a great
topic as the world coming, the world

of clinical trials is evolving.

You know, what is, what is the role of,
of response adaptive randomization in it?

What are our thoughts on it?

Um, lots of critics of response
adaptive randomization.

So let's, let's get to the, to the
bottom, to the top, to the sides

of response adaptive randomization

Kert Viele: Sounds great

Scott: Okay.

So it's a controversial topic.

Let's, let's, let's, uh, start with
what is response adaptive randomization?

Remember, my, my, my, my wife
loves to tune in on these, and

she's gonna wanna make sure that
everybody understands the topic.

So explain to Tammy what response
adaptive randomization is.

Kert Viele: All right.

First off, hi, Tammy.

Good to see you.

Um, the, uh, so the idea behind RAR,
response adaptive randomization - we'll

abbreviate it from here on out - um,
is essentially, uh, all adaptive

trials, they do interim analyses, and
they make some kind of adjustment to

the trial at those interim analyses.

In RAR, the adjustment that you're making
is to change the allocation probabilities.

And usually, what we would do is
we would increase allocation to

arms that are performing better.

We decrease allocation to arms
that are performing worse.

The basic idea is to do a couple things.

One is to try to treat patients better
in the trial, try to get them the

better-performing arms, and we hope we get
some better inferences at the same time.

But I know we're gonna touch on
that when we get into some of the

controversies because that sometimes
happens and sometimes doesn't

Scott: Yeah.

So e-examples of, of how this may
be, and we, we've been involved

in multiple trials where response
adaptive randomization has been used.

And so a couple of those, the, the BAN2401
trial was a trial for, of lecanemab, which

is now approved for Alzheimer's disease.

Their phase two trial was a placebo
and five active arms of lecanemab.

It was actually two
frequencies and three doses.

And those five arms, they, they did a
hundred and ninety-six patients enrolled

where it was initially fixed randomization
to those, those six arms in the trial.

An interim analysis took place,
and a new randomization probability

was set for each of the arms.

In-- essentially, the
control was a fixed rate.

It was slightly different than that, but, but
for all intents and purposes, the control

was fixed, and the other five could vary.

With the remaining probability, a
new randomization vector was created.

For the next period of time, patients were
randomized u-using those probabilities,

and then fifty patients later enrolled,
a new interim was done, and this was

repeated during the course of the trial.

And it ended up that two of
those doses, the two higher

doses with the two frequencies,
were the most patients explored.

One of them was moved to phase
three, and it, it was, uh, uh,

successful in a phase three trial.

The goal of that phase two trial
was to find the best dose.

What is the best arm,
and how good is that arm?

And we'll come back to whether RAR
is good or not, but that, that was

an example of, of response-adaptive
randomization being used.

I'll point out one more.

We'll, we'll probably talk about
other trials and how they do it and

the, the good and the bad of it.

Uh, I'll touch on another one just
because I've done two recent podcasts

on it, and we are doing a podcast
on the results of the ICECAP trial.

It-- By the way, it's been recorded, and
we're just waiting for JAMA to publish

the article, and then we're gonna release
the discussion of the ICECAP trial.

The ICECAP trial had 10 arms.

The arms were the duration of hypothermic
cooling for the treatment of cardiac

arrest, post-cardiac arrest resuscitation.

You go to the, uh, the emergency room,
and they do different durations of

hypothermic cooling, and the goal was
to find the best duration and largely

was there an increasing dose response?

Largely, what is the best duration?

Was the goal of that, and there
were 10 arms in that trial
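
As a rough illustration of the interim update described for the lecanemab trial, the sketch below holds control at a fixed probability and splits the remainder across the active arms in proportion to each arm's posterior probability of being the best. This is not the trial's actual algorithm: the binary outcome, the Beta(1, 1) priors, the 20% control rate, the interim counts, and the name `update_allocation` are all assumptions made for illustration.

```python
import random


def update_allocation(successes, failures, control_prob=0.2, n_draws=5000, seed=0):
    """Return [control, arm_1, ..., arm_k] randomization probabilities.

    Hypothetical sketch: binary outcomes, Beta(1, 1) priors on each active arm.
    """
    rng = random.Random(seed)
    k = len(successes)
    times_best = [0] * k
    for _ in range(n_draws):
        # One posterior draw of the response rate for every active arm.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        times_best[draws.index(max(draws))] += 1
    prob_best = [count / n_draws for count in times_best]
    # Control keeps its fixed share; active arms split what is left.
    return [control_prob] + [(1 - control_prob) * p for p in prob_best]


# Made-up interim data for five active arms (responders, non-responders).
probs = update_allocation(successes=[5, 8, 15, 16, 9], failures=[25, 22, 15, 14, 21])
print([round(p, 3) for p in probs])
```

Repeating this call at each interim, with the accumulated data, gives a new randomization vector each time, which is the loop described above.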

Kert Viele: And I-- you should also
add that the, the lecanemab story,

you published that article as well, so
people can go back and read that and

see what happened interim to interim.

I thought that was…

A lot of times we don't publish
that kind of information.

I thought it was good that it's out there

Scott: Yeah, it's the only trial I know of
that published the actual interim reports

that were done at each of, I think there
were 18 or 19 interim analyses, and it,

it shows the quantities that were used to
calculate the randomization probabilities.

It shows what each one
were at all the interims.

You can see those reports.

Fantastic.

I'm trying to get the ICECAP people
to publish that, and we will.

We'll, we'll show each of the interims for
that as well so people can follow along.

Watch the movie at home,
um, uh, uh, of the trial.

So that gives a little bit of
a flavor in just explaining

response-adaptive randomization.

Now, there, there…

Uh, it-- to some extent, it
sounds fantastic, um, uh, as this.

If I'm a patient, I feel like
I would like to be in a trial

that uses response-adaptive
randomization just at face value.

Um, they-- At, at, at the same time
within this, I think there are a

lot of people who are unfamiliar
with clinical trials, that when I

explain to them adaptive trials,
they look at me like I have three heads.

Like, don't all trials do this?

Shouldn't the r- the allocation of
treatments the latter part of the

trial be informed by the earlier part
of the trial and what's given to them?

And when you describe that, no, most of
them do not, um, they're surprised by that.

So there's, there's that part.

But there are, there are critics of it.

There are…

A-and, a-and I, I don't know
where we fall in this, Kert.

Um, I don't necessarily…

I'm not pro-RAR, uh, as I, I
don't think there's value in that.

It's a tool in the trial to g- in some
cases, get better answers, some cases

treat patients better, but it's a tool
available to us as trial designers,

and it has good and bad in it.

I think there-- it's fair to say there
are, there are authors out there who

are against RAR and write articles.

"Don't, don't fall for the RAR trick.

Don't do it.

It's bad."

Uh, but it is quite
controversial in the literature

Kert Viele: At least we're very
persuasive because we, we're lay-

laying out traps and we're getting
people to do our, our stuff.

But ironically, we don't-- how
many of these do we actually do?

We certainly do them.

I mean, we do, you know, hundreds
of trials over the years, but it's

certainly not 60% of these involve RAR.

It's particular cases where we've done
it and particular cases where we haven't

Scott: Very much so.

Um, let, so let's talk about
what, what those cases.

So i- if, if, um, um, what, what are
the, what are the critics of RAR?

What, where does this perhaps not go well?

Kert Viele: So, and well, this gets
at whether, you know, we're critics

of RAR for certain cases as well.

So, I mean, the, the central thing, I
think what you said from the patient

standpoint, you know, obviously
if you're enrolling late in the

trial, you wanna be, "Hey, a lot of
people have generated information.

I wanna get the benefit of that."

I would fully understand that perspective.

There's a scientific aspect that the
purpose of the trial is to generate

information beyond the trial.

If you're doing, in lecanemab,
there are millions of people who

are suffering from Alzheimer's that
want a good and effective drug.

So we wanna keep them in mind while we're
also trying to help patients in the trial.

One thing that happens is to get a lo--
I'm not gonna do a lot of math here at

all, of course, but if you're looking
at estimating a treatment effect, so the

sample sizes in each of the arms matter.

You're comparing control to treatment.

If you can make both of those
arms have bigger sample sizes,

you're gonna get better estimates.

You're gonna get more information.

So now the question is, well,
how does RAR actually do that?

Well, RAR does that by
increasing allocation to some

arms, decreasing it to others.

If you're in a two-arm trial, for
example, this is the classic and a

lot of the papers that are against
RAR, they focus on two arms with

reason to say it's problematic here.

You increase the treatment allocation
potentially, but you decrease the control.

And this gets into kind of a
robbing Peter to pay Paul situation.

You can actually decrease…

You, you estimate treatment effects
worse in two-arm RAR s- than you

would in doing a fixed trial.

So a lot of people have noted this.

There's a problem.

Two-arm trials, we
don't do RAR very often.

In this case, I think it's
a really specialized issue.

All that's gonna be turned on its head
in a multiple-arm setting, uh, and I

think we'll get to that in a minute.

But two arms is a very-- If you read the
literature, two arms is a very, very,

very controversial case, and I think we
agree with the, the limitations there
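
The two-arm argument can be checked with a back-of-the-envelope calculation. The sketch below assumes a one-sided z-test at 0.025 with a known, common standard deviation, and it compares fixed allocation ratios rather than a full RAR algorithm. The total sample size of 200, the effect size of 0.4, and the function name are made up for illustration.

```python
from statistics import NormalDist


def power_two_arm(n_total, frac_treatment, effect, sd=1.0, alpha=0.025):
    """Power of a one-sided two-sample z-test for a given allocation split."""
    n_treat = n_total * frac_treatment
    n_ctrl = n_total - n_treat
    # Standard error of the difference in means: grows as either arm shrinks.
    se = sd * (1 / n_treat + 1 / n_ctrl) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect / se)


for label, frac in [("1:1", 0.5), ("3:1", 0.75), ("9:1", 0.9)]:
    print(label, round(power_two_arm(200, frac, effect=0.4), 3))
```

With the same 200 patients, power falls as the split moves away from one-to-one, because the term 1/n_treat + 1/n_ctrl is smallest when the two arms are equal. The equal-variance assumption matters: with different variability in the two arms, the best split is no longer exactly one-to-one.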

Scott: So, so a lot of critics
of RAR will point out that in a

two-arm trial, uh, moving away from
one-to-one, reduces your power.

It reduces your, uh, uh, i-
inferences, your inferences about

the, the difference between them.

Now, there are some odd cases where,
uh, it might be good to reduce one,

the variability's different in the
two arms and, and, and all of that.

But largely, you reduce power in that.

And then they make a sweeping argument
that so you should never do RAR

Kert Viele: Yeah, so that's my pet peeve.

You wanted a pet peeve.

This is-- My, my pet peeve is how
overgeneralized the RAR literature is.

Uh, uh, RAR, it's a lot
of different methods.

You can do it a lot of different ways.

You can do it in a lot of different cases.

A lot of papers are written, "I'm
gonna explore this version of RAR

in this setting, but I'm gonna
generalize it to everything," and

that just doesn't serve us well

Scott: And they say, "Look,
this method doesn't work here.

RAR doesn't work," and they

generalize it to every possible

RAR cases of that.

Kert Viele: So

Scott: Okay, so in-- So you talked
about the two-arm case, the two, the

two-arm setting, and so that would
be a specialized scenario where the

goal isn't entirely treatment effect.

There, there may be other goals, and
we, and we'll come to that a little bit.

So let's just, uh, take, uh, another
criticism of it is temporal trends.

So what is the, the, the, the
criticism about temporal trends?

Kert Viele: So there's something
really magical about doing fixed

randomization, not doing RAR.

So if, if I run a trial and I enroll
patients one-to-one, and I have patients

that enroll early in the trial, I have
patients that enroll late in the trial.

If I look at the treatment and control
groups, however those split between

how many patients came in early and
how many came in late, they're the same

between treatment and control because
I've done one-to-one the whole way.

I've equalized allocation the
whole way through the trial.

If I do RAR, that doesn't
happen necessarily.

If I start the trial one-to-one and then
shift to three-to-one, for example, or

some people would say nine-to-one, uh,
which I don't think we get to very often.

Uh, but in any case, if you do that, so
what's gonna happen is you go three-to-one

in favor of treatment later in the trial.

You look at the treatment subjects,
they're more late patients.

You look at the control subjects,
there are fewer late patients.

If there's any kind of anything going on,
and the classic example is COVID, where

there's a new variant coming in and it--
people act-- react to it differently,

that is gonna generate a bias, and
that is potentially really, really bad.

So I mean, it depends on how big
the bias is, but it can completely

invalidate your conclusions.

Typically, we would
add terms to our model.

We'd model time to address this, uh, but
that's generally what the issue is, is you

do need to take active steps to correct
for that and make sure you're robust.
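
The bias Kert describes can be seen with expected counts alone, with no random simulation needed. The numbers below are invented: there is no true treatment effect, the response rate drifts from 30% to 50% between an early 1:1 period and a late 3:1 period, and the function name is hypothetical. The stratified estimate is one simple way of adjusting for time, assuming the treatment effect is additive across periods.

```python
def naive_and_stratified_effect(periods):
    """Compare a pooled treatment-minus-control difference with a
    period-stratified one. Each period is a dict with expected rates."""
    total_t = sum(p["n_treat"] for p in periods)
    total_c = sum(p["n_ctrl"] for p in periods)
    # Naive analysis: pool everyone and ignore when they enrolled.
    pooled_t = sum(p["n_treat"] * p["rate_treat"] for p in periods) / total_t
    pooled_c = sum(p["n_ctrl"] * p["rate_ctrl"] for p in periods) / total_c
    naive = pooled_t - pooled_c
    # Time-adjusted analysis: compare within each period, then average
    # the within-period differences weighted by period size.
    total = total_t + total_c
    stratified = sum(
        (p["n_treat"] + p["n_ctrl"]) / total * (p["rate_treat"] - p["rate_ctrl"])
        for p in periods
    )
    return naive, stratified


periods = [
    {"n_treat": 100, "n_ctrl": 100, "rate_treat": 0.30, "rate_ctrl": 0.30},  # early, 1:1
    {"n_treat": 150, "n_ctrl": 50, "rate_treat": 0.50, "rate_ctrl": 0.50},   # late, 3:1
]
naive, stratified = naive_and_stratified_effect(periods)
print(round(naive, 4), round(stratified, 4))
```

The pooled comparison shows an apparent benefit of about five percentage points even though the treatment does nothing, purely because treated patients are over-represented in the later, better-performing period. The within-period comparison returns zero.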

Scott: Oh, that, that creates biases.

And that term, it creates biases, always
depends on how you're analyzing the data.

Kert Viele: Right

Scott: If you ignore the fact that it's
different over time and it's not part

of the model, then it could cause bias.

If you're adjusting by time, then,
you know, assuming you have additive

effects across time, then it's
not biased within that setting.

So there are, there are ways to
adjust for time if it's a concern.

There, there are some cases where we have
much less concern about time effects.

In some cases, we have larger effects
about, uh, larger concerns about time.

So there are ways to adjust for that.

Okay, um, there's one other potential
criticism, which is the rare circumstance

where people may be un- uh, unblinded
to what the allocation is, is

that you may learn one treatment is
doing better than another one just

by the fact it's being given more.

In a blinded trial where patients
are blinded, investigators are

blinded, nobody would know that a
particular treatment is given more.

In trials where they're unblinded,
this becomes also a potential

criticism, um, uh, uh, of, of RAR

Kert Viele: Yeah, and among many other
criticisms of an unblinded trial.

Lots of things can happen in those
settings, and RAR may make them worse, so

Scott: Okay, so we've set
up potential negatives.

This unblinding aspect of
it is a potential negative.

Temporal trends is a potential negative to it,
and in a two-arm trial, you reduce power.

So, uh, o- okay,

Kert Viele: sounds, sounds awful, Scott

Scott: sounds, sounds bad.

So what are the potential

Kert Viele: So, so

Scott: of, of, of RAR?

We, we started, it seemed
like a positive thing.

Kert Viele: So,

Scott: positives of RAR?

Kert Viele: well, so there is a
case where RAR is a good match.

So in that case, the one we often
use it for is situations where

we are looking at multiple arms.

We're not in a two, two-arm setting,
and we're looking for the best

arm among those alternatives.

So the classic examples
here would be dose-finding.

I'm not particularly worried about,
you know, which arm is the fourth-best.

I wanna know what is the best.

Um, in the epidemic settings, the
pandemic settings, we wanna know what's

the best treatment for, for patients.

We don't necessarily
need to know the others.

In those cases, what RAR does for you
is because you are increasing allocation

to the best arm, it's the one doing the
best, it's the one you're gonna raise,

you get more information on that best arm.

You wanna be careful here.

This is the thing we've always been
telling people is you wanna make sure

you maintain the control allocation
because that's part of your comparison.

If you're gonna do an estimate at the
end to compare to control, you need

controls, so don't, don't skimp on those.

But in those cases, RAR will give
you more power 'cause it gets

more patients on the, on the arms
that matter in your comparison.

It gets you lower bias, lower,
um, uh, lower variance, so

basically you get better estimates.

All of these inferential things
start working in your favor

Scott: So i-in a, in a setting, and
we looked at this in a, in a treat-- potential

trial for Ebola, for example, you might
have four active treatments that could

be brought in as a treatment for it.

If you're increasing the probability
of one or two of those treatments and

keeping control the same, you have a
better idea of figuring out the best.

But now you've reduced the, the
randomization to arm four, in a setting,

the question is, do you care about that?

So power is such an odd…

Uh, it's not an odd thing, but
it's a really important thing to

unders- make sure we understand what
are we trying to do in the trial.

What are the goals of the trial?

If it's trying to identify every
possible treatment that's good or

bad, you might not wanna do RAR
'cause it's not finding the best.

If it's about finding the best, then
identifying the effect, it can increase

the power, and dose finding is a great
example of that because you don't care,

uh, about a-arms that aren't as effective.

You're trying to identify the best.

And so it's a perfect example, and it's
commonly where we do it the most at Berry

is a dose finding trial, uh, where, where
it can create a better trial design.

Okay.

Now, at the same time, so if somebody
enters into a trial and, um, there's

this framework that's set up in these
trials of a VIP, where suppose you

were running a trial and a VIP came
in and you wanted to treat the patient

as well, and you could look at the
data and assign one of the arms.

As the trial goes on, you're more and
more likely to put that patient on a

better arm because you're learning,
and you would treat that VIP better.

Now, we don't do that in trials,
and VIPs don't get to come in

and look at the data of it.

Um, but that, i-if we're, if we're
increasing the probability to arms

that are doing better and we're
learning more efficiently, we're

treating patients better in the trial.

And so a patient that comes in later
in the trial is more likely to be

given a more effective arm in the trial

Kert Viele: So we've talked about this.

Um, it probably doesn't happen in
trials very often, but we've talked

about this example in the context of
a learning healthcare system where

the, the whole healthcare system
is trying to improve itself, but

somebody walks in the door going,
"I, I don't want you to randomize me.

You just look at the data and tell me
which one's the best, and I want that."

And you can imagine situations where
that's-- I mean, we talk about the

ethics of it for a long period of time.

Uh, but it's at least conceivable.

The nice thing about RAR is in that
ethical case, if you were doing

fixed randomization, that VIP coming
in wanting special treatment, they

actually get treated much better
than the average patient because the

average patient's getting randomized,
not making use of information.

If you're doing RAR, what's happening
is the difference between that

VI-VIP and the average patient in the
trial is actually pretty darn small.

It's a couple-- You know, if you're
doing mortality, it's a couple percent

rather than twenty, twenty-five percent.

So it really helps on getting
treat-- patients treated well and

can-- the, the idea that we're still
learning as well as, um, as well

as making use of that information

Scott: Yep.

Yep.

So from a perspective, and I started
this off with if I'm, if I'm a

patient going into a trial and there
are two trials, and one of them is

doing RAR and one of them is not,
I'd rather go in the trial with RAR.

I find that a, a, a better place to go.

Interestingly, we do a number of trials,
especially rare disease, we're treating

a pediatric disease, um, a Duchenne
muscular dystrophy, where you might do

two to one or three to one randomization.

There's a s- very strong preference
of a patient to enter into a trial

where they're more likely to be put
on the active arm, and it's, it's

done as a tool to increase enrollment
wi- within the trial to, to, to

increase the likelihood patients
would agree to enter into that trial

because it's favoring the active arm.

Now you've got a procedure that learns
from the data and, and goes to different

arms within it, which is, which is a
perception generally, I think, that

patients would rather be in that trial

Kert Viele: And I, I think we ought to
add kind of to the, the objections to RAR

here that, you know, we simulate these
trials and how are we doing this RAR?

How are we increasing the allocation?

You can, of course, do this badly.

Uh, when we present this to people
who have never, you know, done

adaptive trials before, they-- the
natural question is, well, what if

you overreact to early data and the
early data is just noisy and wrong?

And certainly, the answer is if you do
RAR badly, you can overreact to that.

There are papers that have gone
through, you know, doing interims

at a sample size of three where,
hey, this isn't working out well.

So you wanna simulate this.

You wanna make sure that you do this
in a way where you don't overreact, but

you're making use of the information.

And that requires, uh, basically
a seasoned practitioner in

order to find the right balance

Scott: Yeah.

So, uh, we would never enter into
a trial, put an algorithm together

that does RAR, just hope it works.

Uh, we simulate thousands of trials.

We…

Millions of trials.

We make sure under a huge range of
scenarios, it's getting better answers,

or it's accomplishing the goals
that it's intended to improve upon.

Uh, if it's that trial itself, we
wanna improve the outcome of patients,

if it's to learn the right dose.

Uh, so we simulate them
e-extensively ahead of time.

There's a famous example of an RAR
algorithm where, in hindsight, I'm not

sure that they like the outcome of it,
um, within it, and it's the ECMO trial.

Uh, uh, Bartlett, years ago, ran a
trial, and it was, um, newborns that

had particular respiratory issues.

They were being randomized between
conventional care and ECMO,

an ECMO machine, um, for them.

Now, the, the-- What was the design?

It was called a play the winner
design and, or Polya urn design,

where initially there's a bowl
and there's two balls in the bowl.

One is red for conventional,
one is blue for ECMO.

And the first patient comes in, you grab a
ball, and whatever ball is selected, that,

that's assigned to the patient. If that
patient is a success and they survive,

survival was the endpoint, you put back
the ball on the therapy they were given.

So if they were given the blue
therapy and they lived, you put

another blue ball in the, in the bowl.

And then the next patient that
comes in has a two-thirds chance

of being given the, the blue therapy.

If they're given blue and they
don't survive, you put a red ball in

the, in the urn, and now it's more
likely the patient would get red.

And every patient that, that continues
and you put more and more balls in the

urn i- in there, and therapies doing
better have more balls in there, therapies

doing worse, the other one has more.

And it was a, it was a procedure for doing
response adaptive randomization that.

Now, what happened in the trial
is the first patient that came

in was randomized to conventional
care, and the patient died.

Um, and so a new ball was put in for ECMO.

The next patient that came
in got ECMO and survived.

Now a new ball is put
in, so now it's 75% ECMO.

It went on a run of like 27
consecutive ECMO patients,

and they were all surviving.

I think there may have
been a death in there.

But the trial ended up, I believe,
uh, like 26 out of 27 on ECMO

and oh for one or oh for two.

I think at the end they even added a few
more conventional therapies on there.

But it ended up incredibly
disparate, uh, uh, in the trial.

Now, it turns out ECMO
is highly effective.

It's better than conventional.

We know that now.

But the trial was greatly criticized,
and I had the, I had the benefit once

of asking Bartlett, "Had you seen a
simulation that showed what happened

in the trial, would you have said
that was good, I'm glad it did it?"

He said, "No.

No.

We would have done something different."

And that's the value of simulation, to be
able to evaluate that particular algorithm
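
The urn Scott describes is easy to simulate, which is the kind of check that can flag how lopsided the allocation might become. The sketch below follows the rule as stated in the conversation: start with one ball per arm, then add a ball for the assigned arm after a survival and for the other arm after a death. The survival probabilities, the trial size of 12, and the function names are assumptions for illustration, not values from the ECMO trial.

```python
import random

ARMS = ("conventional", "ecmo")


def update_urn(urn, arm, survived):
    """Add one ball: same arm after a survival, the other arm after a death."""
    other = ARMS[1] if arm == ARMS[0] else ARMS[0]
    new_urn = dict(urn)
    new_urn[arm if survived else other] += 1
    return new_urn


def simulate_trial(n_patients, p_survive, rng):
    """Run one urn-randomized trial; return patients assigned to each arm."""
    urn = {arm: 1 for arm in ARMS}
    assigned = {arm: 0 for arm in ARMS}
    for _ in range(n_patients):
        p_ecmo = urn["ecmo"] / sum(urn.values())
        arm = "ecmo" if rng.random() < p_ecmo else "conventional"
        assigned[arm] += 1
        urn = update_urn(urn, arm, survived=rng.random() < p_survive[arm])
    return assigned


def share_lopsided(n_trials=2000, n_patients=12, max_conventional=1, seed=0):
    """Fraction of simulated trials with at most `max_conventional` controls."""
    rng = random.Random(seed)
    p_survive = {"conventional": 0.2, "ecmo": 0.8}  # assumed, not trial data
    hits = sum(
        simulate_trial(n_patients, p_survive, rng)["conventional"] <= max_conventional
        for _ in range(n_trials)
    )
    return hits / n_trials


print(share_lopsided())
```

Starting from one ball each, a conventional-care death followed by an ECMO survival moves the urn to three ECMO balls out of four, matching the 75% in the story. Running many simulated trials shows how often a design like this ends with almost no patients on conventional care.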

Kert Viele: So I, uh, one aspect of
that, so all of these play the winner,

the Polya urn schemes, they
come out of a particular literature,

which obviously, you know, your
father Don's heavily contributed to.

But w- there is one aspect of that.

A lot of that literature is based
on trying to treat a sequence

of patients well without having
any regulatory aspect to it.

So at no point do you have to
generate, you know, P less than .025,

so to speak.

Um, one thing that people should keep
in mind is that when you put that kind

of regulatory requirement on it, you
need a different kind of evidence.

And so that's one of the things
more modern RAR methods do is they

take into account the regulatory
environment as well, which is a

change over the past 40 years.

Scott: Wh- whi- which is doable.

Kert Viele: Yeah

Scott: There are guidance documents
that say response adaptive randomization

is, can be done in confirmatory trials.

Um, it's probably not very
common because phase three a

lot of times are two-arm trials

Kert Viele: Yeah

Scott: and, and there may not be
benefit in a t- two-arm trial,

especially in that regulatory setting.

It's unlikely that you would do
RAR in a two-arm phase three trial.

Kert Viele: Well, you, you have examples
of seamless two, three trials that have

done RAR in the first part and then
lowered, lowered the number of doses

Scott: Right.

So Eli Lilly's treatment Trulicity,
dulaglutide, uh, it's one of the

earliest GLP-1 receptor agonists, was a
seven-arm phase two trial that at

some point could trigger phase three.

It selected two doses to move forward into
a fixed randomization phase three portion.

By the way, it included data from the
first part, uh, in the final analysis

of that, which was done through
response adaptive randomization.

And it honed in on the 1.5

milligram dose.

Higher doses were having safety issues,
high heart rate, high blood pressure.

It moved away from them, and 1.5

was the dose that moved forward,
eventually has done incredibly

well and, you know, billions
of dollars a year in treatment.

So that, that is certainly done in
seamless two, three trials, where the

first part is to find the right dose.

An interesting side effect of that is
that the original design in that trial

was three arms in the phase two part.

And when they looked at expanding
that to a range of seven doses,

that increases the sample size by
that factor, you know, seven thirds

and it's, oh, that, that's too big.

But by doing response adaptive
randomization and honing in on a

particular part of the dose response
curve, y- the sample size isn't

seven thirds, and the sample size
wasn't even any bigger to do seven

doses than three doses because
of the modeling in the trial.

So there's a huge benefit to that.

The ICECAP is very similar to that.

ICECAP was three durations, but they
wanted a wider range of durations.

They went to 10 with largely the
same sample size because it hones

in very quickly in the region of
interest, uh, in the scenario.

So there are…

The, the higher power
isn't just higher power.

It may enable more doses, uh, to
be used in a better trial design

Kert Viele: And one thing that you
talked about, well, you'd obviously

talked about the Trulicity trial
for a long time, but the, uh, the

notion that it incorporated safety.

You, you were doing RAR not just on
an efficacy measure, but a combination

which allowed them to pick a dose
that balanced several features.

I forget what the features
are, what were in that trial?

Scott: A clinical utility
index that was HbA1c change,

Kert Viele: Yeah

Scott: Uh, uh, amazingly in the
world we're in, weight loss, which

has become, the GLP-1s now approved
solely for weight loss, um, and

blood pressure and heart rate.

So there are four endpoints that
went into selecting the optimal

therapeutic, uh, uh, dose, uh, in it.

Okay.

Um, uh, within this, by the way,
we should mention, uh, it's sort

of a really interesting story
when it w- in and of itself.

You mentioned, uh, Don's work in this and
bandit problems, uh, and its relationship

to response adaptive randomization.

But even going farther, back farther
than this, largely, I think the first

randomized trial, randomized clinical
trial of humans, um, was in the 1940s.

Uh, streptomycin,

Kert Viele: 1948, streptomycin

Scott: uh, treatment of that,
fixed randomization, uh, two-arm

trial, fixed randomization.

But you can go back to 1933, and there's
a paper by Thompson where he introduces

response adaptive randomization, uh, to
that, and it was a long-forgotten paper,

um, within it that has now gr- been,
been, uh, uh, cited many, many times,

and it's even a little bit of the, uh,
search, uh, uh, uh, uh, Google search

things and, and bandit problems that
has brought that paper back to life.

Kert Viele: And not just
brought it back to life.

I mean, it's one of the more cited papers.

I mean, talk about
having to wait for fame.

You publish it in 1933, you get
very little attention for 70 years,

and now suddenly you have 5,000
citations or whatever it's at now

Scott: Yep.

Bandit problems are largely this: you've
got multiple arms to pull, and you decide

on which arm to allocate to a patient.

Usually, they're done
in a deterministic way.

Oh, give them arm three,
give them arm one.

And usually, you've set up a goal
of that particular trial, which can

include a very large horizon that
at some point you have to pick a

treatment and go with one treatment.

Um, but you set up a goal to, to
save as many particular patients

or have a particular outcome.

And, and this was my father's dissertation
work, was bandit problems, uh, within it.

So closely related to response
adaptive randomization is this whole

literature of bandit problems, which
have also been cited many, many times.

Um, and much of this is now
advertising, uh, online advertising.

If Google has multiple ways in
which it can present an ad to you,

and I'm using Google just as a, a

Kert Viele: Yeah

Scott: of things, it can, uh,
throw out three and, and find

out how many clicks does it get?

Well, maybe we'll try
two, maybe we'll try one.

Maybe we'll personalize it to
individuals within that so they

can be using this technology to
increase the number of clicks
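
As a rough illustration of the bandit idea described here, the sketch below uses Thompson sampling with Beta posteriors on yes/no outcomes. Each new patient (or ad impression) goes to the arm whose random posterior draw is largest, which is equivalent to allocating in proportion to the probability that each arm is best. The response rates, the uniform Beta(1, 1) prior, and the function names are assumptions made up for illustration. They are not taken from any trial or ad system discussed in this episode.

```python
import random

def thompson_allocate(successes, failures, rng):
    """Pick an arm by drawing once from each arm's Beta posterior."""
    # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run_bandit(true_rates, n_patients, seed=0):
    """Simulate sequential allocation; true_rates are hypothetical."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        arm = thompson_allocate(successes, failures, rng)
        # Observe a simulated binary outcome for the chosen arm.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Three hypothetical arms; the third has the highest true response rate.
successes, failures = run_bandit([0.2, 0.3, 0.7], n_patients=600)
counts = [s + f for s, f in zip(successes, failures)]
print("patients per arm:", counts)
```

In a run like this, most of the allocations drift toward the arm with the highest true rate, while the weaker arms still receive some patients early on.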

Okay.

Um, uh, now within the, the properties
of this, I wanted-- platform

trials have all, uh, brought out
a really interesting part to this.

So platform trials are those where we
have multiple agents in the trial and

a, a, a relatively new, uh, uh, thing.

And we might have four
experimental treatments and a

control in the trial at one time.

So this opens up the interesting
question of do we wanna do response

adaptive randomization in platform trials,
which largely are multi-arm trials?

Kert Viele: So I, I think platform trials
to me are we, we still haven't explored

this in as much detail as we would like.

We've done simulations for our own
trials and so on, but RAR, it's a

really different beast in platform
trials, or at least it can be.

If, if I'm doing an umbrella trial where
I've got four arms, I've got to enroll

two hundred patients, when I do RAR,
if I'm increasing allocation to certain

arms, I'm decreasing allocation to others.

In a platform trial that's perpetual,
so I'm gonna do four arms, I'm

gonna replace an arm when I drop it,
increasing the allocation to one arm doesn't

necessarily lower the sample size for the others.

It slows them down.

And so you could say, "I'm gonna speed up
the arms that look the most promising,"

but I'm not gonna abandon everything else.

It's still there for me to
get to it later, potentially.

And I've been really interested in
how all that plays out in practice,

and we've done this both ways.

But I, I think this is one of the
open research areas is how this works.

Scott: Yeah, one of the controversial
things, or the things that may

be problematic is if there's four
sponsors that own-- four separate

Kert Viele: Yeah

Scott: that own the drugs, and we
accelerate sponsor A, it slows down

sponsor B by, by, by definition,
and that may be an undesirable

thing to recruiting arms in a trial.

So many of them in that scenario,
phase two setting, don't use response

adaptive randomization because

Kert Viele: It would be even worse if
we said we weren't going to explore an

arm at all, rather than just slow it down.

That often generates more controversy
with sponsors, and with reason: "Hey,

we want you to give us an answer."

Scott: Right.

One trial where this was done with
multi-sponsors, and it was generally

perceived by all to be a good thing, is
I-SPY2. So the I-SPY2 trial is a Phase

2 trial in neoadjuvant breast cancer,
and there would be 20% fixed on the

control, and then the remaining 80% was
set up across the experimental arms that

were in the trial, and it would
increase the probability for arms doing better.

The, the really interesting thing there
is breast cancer, we, we understand

heterogeneity of disease in breast
cancer better than most diseases.

HER2 breast cancer is, HER2
positive is a different breast

cancer than HER2 negative.

So the trial was enrolling, uh,
hormone receptor by HER2 status,

also had MammaPrint status.

So there's eight types of cancer that
women are being enrolled within that.

So if you came in with HER2 negative,
hormone receptor negative, triple

negative disease, you have
different randomization probabilities

over the four arms that are in
the trial than somebody who's HER2

hormone receptor negative within that.

The…

We might accelerate one drug for one
type of cancer, but a different drug is

being accelerated in another type, and
they had a fixed 120-patient sample size.

So they got more patients within
the subtypes where they were

doing better, and they might have
gotten less in other settings.

So you were treating patients better, but
the, the sponsors perceived this to be a

positive because of the multiple subgroups
where RAR was done differently by subgroup.

So it wasn't just slowing
down one for the other.

One slows down in one place and
speeds up in the other place.

That was thought to be
a very positive thing
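
A minimal sketch of the allocation structure described here: a fixed 20% on control, with the remaining 80% spread across experimental arms, separately within each subtype. The specific rule used below (sharing the 80% in proportion to each arm's posterior probability of being best in that subtype) is an assumption for illustration, not the actual I-SPY2 algorithm. The arm names, subtype labels, and probabilities are also hypothetical.

```python
def allocation_probs(prob_best, control_share=0.20):
    """Turn per-arm 'probability best' scores into randomization probabilities.

    prob_best: dict mapping experimental arm -> score for one subtype
               (hypothetical posterior probabilities; need not sum to 1).
    control_share: fixed fraction reserved for control (20% per the discussion).
    """
    total = sum(prob_best.values())
    probs = {"control": control_share}
    for arm, p in prob_best.items():
        # Experimental arms share the remaining 80% in proportion to their score.
        probs[arm] = (1 - control_share) * p / total
    return probs

# Hypothetical posterior summaries for two subtypes.
posterior_by_subtype = {
    "HER2+/HR-": {"A": 0.6, "B": 0.3, "C": 0.1},
    "HER2-/HR-": {"A": 0.1, "B": 0.2, "C": 0.7},
}

tables = {s: allocation_probs(p) for s, p in posterior_by_subtype.items()}
for subtype, probs in tables.items():
    print(subtype, {arm: round(p, 2) for arm, p in probs.items()})
```

With inputs like these, arm A is favored in one subtype and arm C in the other, which is the sense in which one arm slows down in one place and speeds up in another.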

Kert Viele: Well, and it, it speeds up
their phase three because if you know

where your treatment has the biggest
benefit, you can design a smaller

trial to detect that in phase three.

You enroll the people who benefit, and you
don't dilute the effect through people who

don't, and obviously the people who don't
benefit can go to other trials where they

may achieve benefit and benefit them there

Scott: So, well, one of the places where
I've run into the most controversy is,

uh, so I've been involved in, uh,
multiple dozens of trials that use RAR.

Very happy with all of them.

And really very happy
with every use of RAR.

We did have a couple issues
in the REMAP-CAP trial, uh, so

fair to sort of point that out.

Uh, these have been made public.

Where REMAP-CAP is multifactorial.

So a patient that comes in could
be randomized to drug A, yes or no.

And then a different domain of
treatments where you could get drug

one, two, or three, uh, within that.

And, you know, I think the
most a patient's been randomized to

is seven different domains,
simultaneously randomized.

And it uses response adaptive randomization.

Some of those domains
are two-arm domains.

There's a control, and then
there's, yes, you get therapy.

Should you get high-dose
vitamin C, yes or no?

That was one of the domains in the trial.

Now, so it was deemed this is a trial
that was treating a pandemic, COVID-19,

and I'll, I'll be specific to that.

We're also enrolling
non-pandemic community-acquired

pneumonia, uh, within that.

But within the pandemic, it was thought
that this was highly beneficial to

do response adaptive randomization.

It was more likely that groups, uh,
that there would be a positive to

entering into this trial if it was
using the most up-to-date information.

It's treating the pandemic while
it's learning at the same time.

Now, we did end up in a two-arm domain
case of simvastatin, where it ended

up about ninety percent probability
for simvastatin versus no simvastatin.

And it hovered around determining
efficacy, and it probably took longer

to declare efficacy, uh, than it
would have if it stayed one-to-one.

But meanwhile, it was increasing
the probability of patients getting

simvastatin during the pandemic.

So this was a case that was debated a
lot as to whether RAR was a good thing

or not in that particular scenario
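
A small calculation can make the trade-off in this two-arm debate concrete. The sketch below compares the standard error of a difference in event rates under one-to-one allocation and under a 90/10 split, for the same total sample size. The event rates (30% and 25%), the total of 1,000 patients, and the reading of "ninety percent" as an allocation share are all assumptions for illustration. None of these numbers come from REMAP-CAP.

```python
import math

def se_diff(p_control, p_treat, n_total, frac_treat):
    """Standard error of the difference in two proportions.

    frac_treat is the share of n_total allocated to the treatment arm.
    """
    n_treat = n_total * frac_treat
    n_control = n_total * (1 - frac_treat)
    return math.sqrt(
        p_control * (1 - p_control) / n_control
        + p_treat * (1 - p_treat) / n_treat
    )

# Hypothetical event rates and total sample size.
se_equal = se_diff(0.30, 0.25, n_total=1000, frac_treat=0.5)
se_skewed = se_diff(0.30, 0.25, n_total=1000, frac_treat=0.9)
ratio = se_skewed / se_equal

print(f"SE at 1:1   = {se_equal:.4f}")
print(f"SE at 90/10 = {se_skewed:.4f}")
print(f"ratio       = {ratio:.2f}")
```

Under these made-up inputs the standard error at 90/10 is roughly 1.7 times the one-to-one value, so the comparison takes longer to reach a conclusion even though more patients receive the arm that looks better.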

Kert Viele: So there's another
example in addition to the pandemic.

We have a trial called PROSPECT, for
which there's a protocol paper out.

The results paper is not out yet.

We're hoping that will
come relatively soon.

Um, but anyway, it is a
two-by-two factorial experiment.

So technically there are four arms,
but you could think of it as doing

two arms twice on the rows and
columns of that two-by-two table.

Scott: Yep

Kert Viele: There were similar discussions
there. PROSPECT is on,

uh, respiratory distress in children.

And so the notion was, hey, we
want to treat these kids well

while we're trying to learn.

And that was exactly
part of the debate there.

Scott: Uh, one other case just to be
upfront is we did have a domain where

initially data were reported and flipped.

Um, and so what happens is the RAR,
intending to improve the better, um,

went the wrong way, uh, within that.

A number of lessons learned.

Now, the final data that went into
it was all fixed and all of that.

So by the way, there's a, a, a
strong operational burden upon,

uh, I, I don't wanna say strong.

There's an operational burden upon
doing RAR, making sure the data's

right that goes into the RAR,
the various pieces of this.

It's, uh, an important part of the story
is the logistical part of doing RAR.

Some settings, it's
reasonably straightforward.

I-SPY2, it ran weekly, automated,
um, but you gotta get the data right

Kert Viele: I remember doing
the RACE trial and I would get

data every 12 patients, and
this was back in the old days.

And so I would be sitting there,
I basically put together the

randomization table and it got
replaced in the randomization system.

But I was more or less manually
doing that, which was not what

you'd want if you can avoid it.

I think everything worked out fine in
this, but it's been good seeing more

and more groups learn how to do this
and be able to do this automatically.

But as you said, there are still
some snafus that occur and you

want an experienced group doing
this, not just on the design side,

but on the implementation side

Scott: Yep.

Uh, I do wanna come back to something
you said that I think in sort of looking

forward to the future here, uh, of…

I, I think platform trials,
multi-arm experiments, uh, are

growing in popularity, even
in comparative effectiveness.

I do think somewhat of the future
is a learning healthcare system,

where if you imagine where we are
today, we do these experiments to

learn the right therapy, but 99.9%

of people are treated outside of those,
and we don't learn from them at all.

As we start to merge learning about
different treatments with treating

patients at the same time, we, we
do experiments where the Belmont

Report lays out that it's okay that
these people are being experimented

on, that we do one-to-one.

But in a learning healthcare system
where we want to treat patients

better and learn about therapies,
response-adaptive randomization

is an incredibly powerful tool.

If I owned a healthcare system and
there was a particular treatment

and there's five available drugs
to treat that, uh, IBS, psychiatry,

a number of scenarios, randomizing
patients to the different treatments,

response-adaptive randomization,
learning the right therapies, we don't

learn from these patients at all i-in
a randomized causal way for sure,

uh, is an incredibly powerful thing.

So I think response-adaptive
randomization, while it's a powerful

tool in the right setting, it's
a bad tool in other settings, and

that's, that's the part of it.

Um, it has a really strong future
as I think healthcare becomes more

integrated, learning healthcare.

Uh, I think it, it, it's,
it's a powerful future

Kert Viele: And you're gonna be
exploring combinations of therapies.

All of this is gonna go together
in ways that typically aren't done

in, say, a hundred-patient trial.

So just 'cause I can't

Scott: And like the I-SPY 2 trial, we're
gonna have a much better idea of how

diseases are similar but different,
heterogeneity of treatment effect.

Uh, all of this comes together where I
think response adaptive randomization

is, is going to be even more valuable.

Thompson is going to, post
100 years, get more and more

citations as, as time goes on.

Uh, which I think is Thompson
sampling, it's even called.

Um,

Kert Viele: Certain types?

Yep

Scott: All right.

Well, it took us, it took us 60-plus
episodes, uh, before we got to

response adaptive randomization.

I, I suspect it's not our last
discussion of this, but, uh, one of

my favorite topics and, and one of
them that keeps people listening.

So thanks for joining, Kert.

Appreciate everybody out
there tuning in today.

Until next time, we'll
be here in the interim

Kert Viele: Thanks, Scott
