In the Interim...

In this episode of "In the Interim...", Dr. Scott Berry hosts Dr. Stephen Senn, award-winning statistician and author, for a discussion on advanced challenges in adaptive and platform trial methodology. Senn draws on experience in academic, pharmaceutical, and regulatory settings to address the recent draft guidance on Bayesian statistics from the FDA and multiple controversies in clinical trial design.

Key Highlights
  • Emphasizes understanding data origin and regression to the mean as essential for trial interpretation, above adherence to Bayesian or frequentist frameworks.
  • Details methodological considerations for time adjustments and model complexity, highlighting that model specification and parameter handling are critical regardless of statistical school.
  • Identifies the limitations of non-concurrent controls in platform trials, focusing on evolving background therapy, site participation, and protocol changes that reduce validity of historical or pooled control data.
  • Analyzes blinding difficulties in trials with multiple treatments and administration modes, using “veiled” blinding as a case and noting the implications for placebo response comparability.
  • Clarifies that operational efficiencies are the principal advantage of adaptive and platform trials, while purported statistical efficiencies can be exaggerated.
  • Stresses the importance of presenting interim analyses transparently to DSMBs when using complex models for time or covariate adjustment, to ensure oversight and interpretation remain rigorous.
For more, visit us at https://www.berryconsultants.com/

Creators and Guests

Host
Scott Berry
President and a Senior Statistical Scientist at Berry Consultants, LLC

What is In the Interim...?

A podcast on statistical science and clinical trials.

Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: Welcome everybody.

Back to In the Interim.

I'm your host, Scott Berry.

I'm, uh, really honored
to have my guest today.

Uh, a very well-known, uh,
award-winning statistician is with

me today. We're gonna talk about
a number of things, uh, with Dr.

Stephen Senn.

He's worked as a
statistician and an academic

in various positions in Switzerland,
Scotland, England, and Luxembourg. He has a

really interesting history of, uh, of
work, uh, including, which is a, a

really cool title, Head of the Competence
Center for Methodology and Statistics.

Uh, I love that title.

Uh, he's been Professor of Statistics at
the University of Glasgow, uh, Professor

of Pharmaceutical and Health Statistics
at University College London.

He spent, uh, eight years
at Ciba-Geigy, uh, as well.

Um, he's a co-author, sorry,
he's the author of Cross-over Trials

in Clinical Research, Statistical Issues
in Drug Development, and Dicing with Death.

He was awarded the 2009 Bradford Hill
Medal of the Royal Statistical Society

and, in 2017, the Fisher Memorial Lecture.
So I'm honored to have you on.

Welcome to In the Interim, Stephen.

Stephen Senn: thanks.

Thanks for the invitation.

The pleasure is mine.

Scott Berry: So, uh, my father
Don Berry has, uh, attributed

a quote to you, and I'm wondering
if you want to, uh, attach yourself to this.

He said that you said, I don't
care whether someone is Bayesian or

Frequentist, as long as they understand
the regression to the mean effect.

Stephen Senn: Yes, I tend to agree.
I, uh, I often think that

it's, it's far more important to,
uh, to know how the data arrived.

I, I think that the basic thing a
statistician should always ask themselves is,

how do I get to see what I see?

And, uh, the central fact
about regression to the

mean is that the data you are using as
a comparator were selected to be what

they are, and the data you are using
to make the comparison are an outcome.

And they're fundamentally
different things.

The, the patients only got into
the particular trial because

they had the values you defined.

They wouldn't have got in
if the blood pressure hadn't

been above a certain level.

After that, what you did
was something different.

You didn't set a criterion; you actually
observed what happened. And that's

basically it: the asymmetry between
those two things, and understanding

why this causes a particular problem.

That sort of thing is the thing that
statisticians ought to understand.
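
To make that asymmetry concrete, here is a minimal simulation of the blood-pressure example, with made-up numbers: patients are selected on a noisy screening measurement and then simply measured again, with no treatment given at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Stable underlying blood pressure plus independent measurement noise
true_bp = rng.normal(150, 10, n)
baseline = true_bp + rng.normal(0, 10, n)   # screening value: sets entry
follow_up = true_bp + rng.normal(0, 10, n)  # later value: just observed

entered = baseline > 160  # trial entry criterion applied to the baseline

print(f"mean baseline  (entered): {baseline[entered].mean():.1f}")
print(f"mean follow-up (entered): {follow_up[entered].mean():.1f}")
# The follow-up mean is several mmHg lower purely because of selection:
# regression to the mean, with no treatment effect anywhere in the code.
```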

Scott Berry: Hmm.

Hmm.

Stephen Senn: You can see straight away why it's
nonsense to claim that because, um,

actors who win an Oscar live longer.

That, uh, the esteem of winning an
Oscar is causing them to live longer.

If you can see what the fundamental
flaw is there, then you're

beginning to be a statistician.

It doesn't

Scott Berry: Yeah.

Stephen Senn: matter whether you're a Bayesian or a frequentist.

Yeah.

So,

Scott Berry: Yeah.

Yeah.

Well, and it may even come up
as we talk about what is a placebo effect,

uh, which I think you laid out very well,

uh, and which I think is
very much misunderstood by

many in, in clinical trials.

Uh, and so I, I wanna talk.

We want to talk, uh, and, and you,
you have been quite, uh, active

on social media, by the way.

I very much enjoyed the, uh, on Twitter,
your pictures of the various hikes you

take, uh, beautiful hikes, which always
seemed to end with a picture of a beer.

Uh, but, but, but those are fantastic.

I hope you're able to still do those.

Stephen Senn: Yes.

Scott Berry: Yep, yep.

I, I've, I've.

Stephen Senn: be doing a hike tomorrow.

The weather looks good.

So I think, uh, my wife and
I will do a hike tomorrow.

Scott Berry: Ah, fantastic.

Fantastic.

Uh.

Stephen Senn: I'm usually on my
own uh, my wife has a walking club.

She's in.

Which I'm not allowed to join.

So she, she goes off on a Thursday,
I go do something else on my, on the

Thursday, and then other days we might
be doing, going off for a hike together.

So.

Scott Berry: Ah, that's wonderful.

That's wonderful.

As part of, uh, Stephen's activity on
social media, there are a number of really wonderful

blogs, uh, I, I suggest you read them.

Then one of them came up, which is
titled Beware of the Morlocks, uh,

and, loving Stephen's, uh, blogs,

I go read it, and it turns out it's
maybe a criticism of something I was

involved in, so we're gonna get to that.

But first, there's lots going
on in the world, uh, here.

Uh, the FDA draft guidance
on Bayesian statistics.

Uh, part of your Beware of the Morlocks
blog talks about Bayes and, and not-Bayes.

I think it might be nice to throw
it over to you about the Bayes, not-

Bayes, your position on that, and
then maybe even leading into the FDA

guidance and what you think of that.

Stephen Senn: Okay, so I can
start with flexible designs.

If you change the allocation
ratio and you change

nothing else about it,

the net effect will be, there will be an
induced correlation between time and

the treatment, because at a particular era
more of A was given than B, and in another era

more of B was given than A, or whatever.

And so basically, unless you do
something about it, time is confounded

with, um, the treatments given.

And so a natural thing is
to try and de-confound it.

So the classical frequentist way is to
say, every time you change the allocation

ratio, you declare a stratum, and you then
essentially fit the stratum as a

fixed effect factor, and that will force
the construction of all estimates on

the basis of within-stratum differences.

The only way you can eliminate the
stratum effect is by constructing a

difference, first of all, and then
everything basically boils down

to weighted combinations of these
within-strata differences.

So that, that's what happens.

But of course in doing that, you're using
up a large number of degrees of freedom.

Um, and you may ask yourself, can
time really be this complicated?
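
As an illustrative sketch of that stratified analysis (synthetic data, an assumed additive effect, and a simple continuous outcome; none of this is from a real trial), the example below contrasts the naive pooled comparison with an inverse-variance-weighted combination of within-stratum differences:

```python
import numpy as np

rng = np.random.default_rng(2)

# One stratum per allocation-ratio era: (share given A, n, control mean).
strata = [(0.8, 300, 0.0), (0.5, 300, 0.5), (0.2, 300, 1.0)]
true_effect = 0.3  # additive benefit of A over B in every stratum

arm, y, stratum = [], [], []
for s, (p_a, n, drift) in enumerate(strata):
    a = rng.random(n) < p_a
    y += list(drift + true_effect * a + rng.normal(0, 1, n))
    arm += list(a)
    stratum += [s] * n
arm, y, stratum = map(np.asarray, (arm, y, stratum))

# Naive pooled difference: treatment is confounded with time.
naive = y[arm].mean() - y[~arm].mean()

# Stratified estimate: weighted within-stratum differences only.
diffs, weights = [], []
for s in range(len(strata)):
    m = stratum == s
    na, nb = (m & arm).sum(), (m & ~arm).sum()
    diffs.append(y[m & arm].mean() - y[m & ~arm].mean())
    weights.append(1 / (1 / na + 1 / nb))  # inverse-variance weight
print(f"naive: {naive:.2f}  stratified: "
      f"{np.average(diffs, weights=weights):.2f}  truth: {true_effect}")
```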

Scott Berry: Hmm.

Stephen Senn: So the natural thing
to do is to say, well, maybe I

could use a rather simpler function,
one with rather fewer parameters.

Uh, if you have fewer parameters,
the penalty you will pay for loss of

orthogonality will be rather less.

There's always a penalty for
any loss of orthogonality

in any regression model, but the penalty
will be less. And a natural way to do

this, a natural way to be flexible,
is Bayesian, though I don't actually see the

Bayesian part of it as being essential.

Essentially what you're doing is
you're replacing a multi-parameter

adjustment with something which
uses rather fewer parameters.

That's one,

one way of putting it.

So that's the first thing.

The second thing is, well, one
has to be a little bit careful.

There's an, there is a correlation
between time and allocation, but

time covers a whole host of things
and this is not always appreciated.

the danger is you concentrate on time
as if it was something continuous.

For some aspects of it, it is reasonable
to believe in some smoothness over time,

but some things are not so continuous.

And I gave some examples in the blog.

So that's

Scott Berry: Hmm.

Mm-hmm.

Stephen Senn: some of

what's, what's in the blog.

Scott Berry: Oh, okay.

So let, let's set up the
blog and, and do that.

But I wonder if, before we do
that, just the, the Bayesian and

frequentist thing: given the quote,
um, uh, about regression

to the mean, you are somewhat agnostic
to this, and this is not anti-Bayesian.

This is not anti-frequentist.

This, this is about
functionality of the model.

Stephen Senn: Yeah, sure.

If, if what you did was you had an
uninformative prior on each of these

stratum effects, you would effectively be
fitting a fixed effect frequentist model.

Scott Berry: Hmm.

Stephen Senn: It's only when you do
something like, um, you either have a

polynomial function in which
you do something like you

penalize higher order terms,

you make them less likely
to be, to be as large as lower order

terms, or you, uh, you imagine that you
have somehow some pseudo-data that you

can add in some particular way to help
with the, with a particular adjustment.

It's only if you start doing that,
um, which is a perfectly natural

thing to do for anybody modeling,

that, that there then
arises what I think is dangerous.
If people say, oh, because I'm
using bays, the problem is solved.

There are still very, very hard
decisions to be made in principle

about how you are going to use Bayes.

That's the big issue,

Scott Berry: Hmm.

Stephen Senn: whether.

Scott Berry: Yeah.

Okay, so let, let's set up the problem
and, and maybe take one step back.

If you read Stephen's blog,
Beware of the Morlocks, you can

get there on his LinkedIn page.

But, but to set up the problem, uh,
you discuss the, the Saville

paper, which I'm a co-author on, that,
uh, talks about the Bayesian time machine.

And so the scenario I think that we
can use is that you have a platform trial.

And the platform trial starts with
a common control and you have one

experimental arm, and you make reference
to one of the figures in there.

You have one experimental arm that starts,
so it's arm one against control at

first, and there's 10 periods of time.

Within this graph, uh, by the way, this
is the challenge of a, uh, of a podcast,

trying to describe to people without
actually showing them a graph, uh,

arm one and control are enrolling
equally one-to-one, and then a new arm

is added: in time three, arm two is added.

Arm one is still being used, and
control is still being used.

And now it's one to
one to one for control,

arm one, and arm two.

That continues in period
four. In period five,

arm one goes away.

It's done enrolling patients.

Arm two is there.

Now we add arm three, we add
arm four, arm five, eventually

adding arms through the 10 periods.

So you can imagine staggered
arms in the trial enrolling,

a common control throughout
being randomized during the

trial, uh, within this setting.

And then the question comes down
about how to make inferences about

one of the arms relative to control.

The, the simplest thing
would probably be to just compare

that arm to only the concurrent
controls. And let's avoid for a second,

you talk about having multiple placebos,

maybe avoid that for a second,
but I think that's something you

importantly want to talk about.

So when we're making inferences about a
particular arm, do we compare that arm to

the common controls that were randomized
and eligible for that control at the time?

Do we use other controls
that have been enrolled, say,

before that arm was enrolled?

So you have a control patient that was
randomized in the same trial; at the

time, if we're talking about
arm two, it was randomized when arm

one was there and it was a control
patient, but before arm two was there.

So that's considered a
non-concurrent control.

Stephen Senn: Right.

Scott Berry: And so are we
going to use that in some way

to make inference about arm two?

Is that a reasonable setup of the problem?

Stephen Senn: Yeah, I think
that's, uh, that's reasonable.

Um, and the, there is a relationship
to incomplete block designs where

typically in an incomplete block design,
not every block gets every treatment.

I mean, that's basically
why it's incomplete.

Scott Berry: hmm.

Stephen Senn: And the sort of
tradition there in, um, agricultural

statistics, where they were often used
was that they had to be connected.

You had to somehow be able to
construct the, um, the treatment

effect on the basis of a number
of within-block differences.

However, um, later I think it
was Yates realized there was

some further information that
was recoverable in another way if

you'd randomized between blocks.

Scott Berry: Mm-hmm.

Stephen Senn: But basically they had
to be connected in order to, in order

to actually make the comparison using
all the information that there was.

Scott Berry: Oh,

Stephen Senn: Otherwise you were just limited to
those particular blocks in which the

two, the pair of treatments you were
interested in happened to be represented.

Scott Berry: right.

Uh, and, and in a lot of these platform
trials, you get this multiple overlapping

or bridging, where when arm two was there.

Uh, it was there with arm one and control
and so earlier, arm one and control

provide some potential information.

So

We, we could do only comparing
to the concurrent controls,

or we could do, as you described,
where we put in a, uh, covariate

for each piece of time,

uh, as a, as a frequentist fixed effect,

and we add in, we add in nine
degrees of freedom or something

like that for, for time.

Um, within that, what the Bayesian time
machine, or the paper, the Saville paper,

talks about, and we'll come back to the
guidance, but the FDA guidance references

this paper and, and a couple trials.

It's used in GBM AGILE, for example.

Um.

It takes those units of
time, because they are ordered

chronologically,

and it does a smoothing estimate of the
effect of time over these blocks.

Um, coming back a little bit
to the regression effect, but,

that it's smoothing over time.

Yes.

It uses largely a, a smoothing spline
over time, and that's the reference

to the time machine. And it's done in
a Bayesian way, but, as you say, this

could be done in a frequentist way.
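
The flavor of that smoothing can be sketched with a roughness penalty on second differences of the period effects, a rough frequentist analogue of the random-walk smoothing in the time machine; the data and the penalty weight below are made up for illustration, and this is not the trial's actual model:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_per = 10, 40
true_drift = np.linspace(0.0, 1.0, T)  # smooth drift in the control outcome

# Observed control-arm mean in each of the T time buckets
ybar = true_drift + rng.normal(0, 1 / np.sqrt(n_per), T)

# Penalized least squares: minimize ||ybar - alpha||^2 + lam*||D2 alpha||^2,
# where D2 takes second differences, shrinking toward a smooth trend.
D2 = np.diff(np.eye(T), n=2, axis=0)
lam = 10.0
alpha_hat = np.linalg.solve(np.eye(T) + lam * D2.T @ D2, ybar)

print(np.round(ybar, 2))       # ten free parameters: noisy bucket means
print(np.round(alpha_hat, 2))  # smoothed: far fewer effective parameters
```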

Stephen Senn: Yeah.

Yeah.

Uh, the

Scott Berry: So y

Stephen Senn: There would be a slight
problem in the frequentist one, which

some Bayesian methods could avoid, which is
that the polynomial you could fit would change

as the number of periods was added.

Obviously very early on you
can't, you can't fit a polynomial

with four or five parameters.
Four or

Scott Berry: hmm.

Stephen Senn: five is still less than
the nine or whatever that you would

use for a full frequentist fit.

But earlier on you couldn't do that.

So potentially you'd actually
be smoothing because as you went

later on, you could actually revise
some of your smoothing things.

You'd actually be smoothing in a slightly
different way as you went further on.

Scott Berry: Hmm.

Stephen Senn: now, it's inherent
to, to Bayesian approaches that you

learn as you go along anyway, so,
you know, that's not necessarily

seen as being a big deal there.

Scott Berry: Okay. So, uh, in
some of these platform trials, when you

use this model and we're making inferences
about arm two, it does use controls

from earlier, in their comparison to arm
one, which is also there with arm two.

So it, it does use these,
essentially adjusting for time.

And we have a quote in that paper,
which you put in your blog where we

say, uh, and there's two parts to this.

I wanna talk about that largely.

Uh, we're, we're in a world right
now where you hear a ton about

real world data, historical data.

You could go out and get historical
controls for a glioblastoma

trial and use those to help
make inferences in the trial

you're running now.

That data's from a different
protocol, with different data elements.

It's got a lot of problems that
a non-concurrent control in a

platform trial doesn't suffer from.

In the platform, it's the same protocol,
the same data elements,

and all of these controls were randomized.

Uh, now, it was at a
different time, so we say

time is the thing that's different.

You brought up that there's more than
time and I, I want you to maybe touch

on this again, that it's not just time.

Stephen Senn: No.

So let me, let me give you an example.

If you were to have a look at, um, trials of
nevirapine in, uh, HIV infection, a

lot of them were placebo controlled.

But what did that mean?

It actually meant that because AZT
was already approved as a treatment

for HIV infection, it meant that
the patients were all getting AZT.

But some of them, in addition,
got placebo to nevirapine, and some

of them got nevirapine in addition.

And some people would describe that
as being a placebo-controlled trial.

I would; others would say, no, no, no,

it's a trial of the combination
therapy of nevirapine and AZT versus

AZT alone. It doesn't matter; all the
time in clinical trials, all the

time, there is background therapy.

All the time, standard
of care is evolving.

So when you said just now that
the protocol is the same, yes.

But the world doesn't stand still.
Actually, in any particular serious

disease, what you will find is that all the
time your trial is running,

the world is evolving.

And to return to the HIV
trials, we know that people who were

recruited later into the same trial,
and given the same treatment,

had better survival than those
who were recruited earlier.

Why was that?

It was because you were learning more
about the treatment of AIDS as time

went on, and all the patients in those
trials continued to benefit from the

improvement in care that was going on.

So yes, I agree with you.

It's time in a sense, but I don't
agree that the fact that the same

protocols are used deals with all
the problems, not by a long way.

Scott Berry: Okay, so part
of it is all the problems,

uh, but the other part is some of
the problems, uh, as, as a comparison

to external data, to some extent.
External data is a much higher level of that, but, yeah.

Okay.

Okay.

Stephen Senn: but it doesn't,
it doesn't deal with all that.

And then I

Scott Berry: Right.

Stephen Senn: There's one issue which
an earlier paper of yours, um,

rightly discussed, but this later Saville
paper didn't, um: you discussed

the fact that it will be impossible
to blind all of the treatments to each

other, uh, in these particular trials,

Scott Berry: so

let's, let's not move on to that yet.

But the,

okay, so,

Stephen Senn: get onto that later then.

Yeah.

Scott Berry: okay,

so in, in the setting here, let's
think of this as a common control.

Uh, and not that there's
different modes of administration

of a, of a, of a control.

Stephen Senn: Let me ask you one, one other question.

Scott Berry: Um,

Stephen Senn: It's not the case, I
don't think, but correct me if I'm

wrong, that you will necessarily
stick with all the centers through

all the life of the platform trial

Scott Berry: yeah, one, but.

Stephen Senn: It's not the case in a
standard parallel group trial either.

I've sat on many data
monitoring boards, and you find that

recruitment is poor and the sponsor says,
oh, well, we'll enroll some new centers.

The enrolling of new centers, the dropping
of old centers, occurs all the time.

So your differences in time

are also differences between centers

Scott Berry: Hmm.

Yep.

Stephen Senn: You don't have to treat a parallel group
trial as you would treat a

cluster randomized trial.

So therefore, this center effect
is something you have to deal with.

It's dealt

Scott Berry: Yeah.

Stephen Senn: with in a standard,
fixed, uh, allocation trial.

It's not dealt with in a platform trial.

Scott Berry: Yeah.

One of the nice things about long-standing
trials: I-SPY 2, uh, the

neoadjuvant breast cancer trial,
ran for 10 years essentially.

It started with 20 sites, and these
sites lasted for 10 years

with very little change to them.

So

one of the benefits, right, one of the
benefits of the platform becomes almost

this learning healthcare thing.

Now, you're right that there
is some variation to that, and in

a number of trials we do try to
adjust for, for center, for site.

One of the things about this,
though: you mentioned time, and,

and maybe this is semantics about whether time
includes the evolution of background

care and other aspects to it.

But we also empirically have
these observations on the control

arm over these 10 periods.

So we empirically see that, uh,
outcomes are getting better over time.

Now, there's absolutely the assumption
of additivity of that across arms,

so within the model there's an additive
effect, whether this is a hazard

ratio, whether this is a, a responder
analysis. Empirically, we can see

whether or not this is happening
during the course of the trial.

Yeah.

Stephen Senn: I, I, uh, I
have no problem with that.

And in any case, I

Scott Berry: Yeah.

Yeah.

Stephen Senn: don't really regard
the additivity assumption as

being particularly important.

I'm less concerned about that.

I regard that as being a
treatment by time interaction.

What I'm really interested in is the
main effect of time, and I think the

Scott Berry: Mm-hmm.

Stephen Senn: main effect of time is something
that can be underestimated

if we're not careful.

Scott Berry: Mm-hmm.

Stephen Senn: Um, but essentially
you're, you're replacing a, you're

replacing a model with many parameters,
with one with fewer parameters.

That's not necessarily an unreasonable
thing to do, but I'm not sure that

everybody who's involved in a DSMB
understands exactly what's going on.

And in particular, um, the question is, how
should the data that they use for

monitoring be presented to them?

Scott Berry: Yeah.

So, uh, a huge, a huge issue then is,
during the course of the trial, the DSMB

is reviewing, and there, there's this
model that's making time adjustments.

And do they just accept that, hey, the
model's got this, don't worry about it?

Or are they able to view this?

Are they able to see it?

Um, hugely important.

Let me come back to what you
said about another type of trial,

and I'll make reference to the
HEALEY ALS trial, which has the

components that you just brought up.

In many trials, what happens is there
isn't a common control. In oncology,

for example, there's a standard of
care, and patients generally aren't

even blinded in oncology trials,

because of the intensity
of the treatments.

But in the ALS platform trial, a
patient is randomized to, say, drug A, B, or C,

and then they're further randomized three
to one to its active or its placebo.

So at any time, if there's A, B,
C enrolling in the trial, patients

are being randomized to A's placebo,
B's placebo, or C's placebo.

They are not blinded to A, B, or C.

So if one of them is three pills a day
and one of them is a,

uh, uh, subcutaneous shot,

they're not given the
blinded subcutaneous shot.

They're only given the
mode of administration of

the drug that they're on.

So at any one time, we have modes
of administration of placebo.

So not only do we have
controls in ALS

that were enrolled slightly before
the arm came on, but we also have

placebos at the same time that are given
different modes of administration of

a placebo across the different arms.

And I think you wanted
to, to talk about that.

Stephen Senn: Yes.

I mean, I think that I referred
to such trials a long time ago,

I'm trying to remember when it was, um, as
veiled. Uh, if you, if you don't know what

treatment you are getting, but you know
some of the treatments you're not getting,

in that case, um, it's not fully blind.

It's sort of

Scott Berry: Mm-hmm.

Stephen Senn: veiled in that particular way.

Oh, I use the term veiled as being,

Scott Berry: Mm-hmm.

Stephen Senn: obstructed.

Um.

I just said to have a look.

Uh,

Scott Berry: term.

Yep.

Stephen Senn: Yeah, I
think it might have been,

yeah, 2004, I think it was,

Scott Berry: Ah, okay.

Stephen Senn: and I was thinking
of a particular trial that we had

run at Ciba-Geigy, where we had two
patches of hormone replacement

therapy, a high dose and a low dose.

And that meant the patches
were of different sizes.

Scott Berry: Hmm.

Stephen Senn: The only way you could
have blinded the patients would've

been by giving them two patches,
a large patch and a small patch.

one of them would've been, let's say,
active and one of them would've been a

placebo, but they wouldn't know which.

And then you could maybe have
had a placebo group with two

patches, which were both placebos.

Um, then basically, uh, a patient who's
being given the highest dose knows that

they're not being given the lowest dose.

So if expectation leads them to
report side effects, 'cause they

say, wow, I'm getting a high dose of
hormones, this could be a problem.

I don't feel so well, and they report
it, you only control for that by

comparing them to their own placebo.

If you compare them to the whole, the
pool placebo group, then in that case

you don't actually control for this.

Scott Berry: Hmm.

Hmm.

Stephen Senn: Another study I was involved
in was the TARGET study, where, um,

lumiracoxib was a, uh, treatment for rheumatism,

osteoarthritis, and was compared
to, uh, naproxen twice daily

or ibuprofen three times daily.

Again, it was awkward to blind things.

So basically you had a substudy, which
was lumiracoxib versus naproxen, and

another substudy, which was, um,
uh, ibuprofen versus lumiracoxib.

And on the data safety monitoring
board, we had to take great care

to make sure we only looked at placebo,
sorry, the control patients from the same

substudy, because the results in the two
substudies were just not comparable.

So

Scott Berry: so Oh, interesting.

Stephen Senn: So you would've got,
uh, a bias, you would've been biased

in actually doing the monitoring if
you hadn't split them into substudies.

We had to, uh, essentially
treat them as two separate trials.

Scott Berry: So do you, in that
case, the outcome sounds like it was

osteoarthritis, pain, for example.

Stephen Senn: Well, yes, although,
to be honest, the, the trial

was also looking at, um, because,
um, the second generation of, uh,

COX inhibitors, the COX-2
inhibitors, were supposed to be better

in terms of, uh, gastric side effects.

Scott Berry: Hmm.

Stephen Senn: one of the things
one was looking at was gastric

side effects, but also there was a
question mark over cardiotoxicity.

So one of the other things one was looking
at was, uh, cardiovascular side effects.

Scott Berry: Hmm.

Stephen Senn: Um, but the problem was that
the two substudies were not comparable.

Scott Berry: Hmm.

Stephen Senn: It was the same protocol.

The only thing that was different
in the protocol was essentially

the treatments that were given.

Scott Berry: Okay.

Uh, so do, do you think that
that's context specific?

So for example, in the ALS
trial, we have at the same time

patients that are given different,

uh, placebos by different randomizations.
Endpoints are functional rating scales

and mortality, uh, combined together.

Uh, we, by the way, we have
randomized comparisons of those

different controls in the trial.

So if you're making inference about
drug A and it has its placebos, I would

shudder to ignore placebo B and placebo C.

First of all, we have randomized
comparisons, and we can actively ask,

are they responding differently?

But in some diseases, and we talked
early on about regression to the mean

and what is a placebo effect, much
of the placebo effect is protocol driven,

and I'm not convinced that the thought
that I'm taking two pills a day instead of

three pills a day is gonna affect the
time of mortality in an ALS trial.

Stephen Senn: Yes, that's,
that's, that's true.

And, uh, I don't necessarily argue
against using a, um, let's say

a model with fewer parameters.

I'm

Scott Berry: Hmm

Stephen Senn: not denying that that's the case.

I'm just saying that these particular
issues are not necessarily discussed.

For example.

Scott Berry: mm-hmm.

Stephen Senn: To return
to the time machine.

Uh, in your particular Saville
paper, you're looking at comparing a

model, which essentially has got nine
parameters for time with your spline

model, which has got rather fewer.

Scott Berry: Right,

right.

Stephen Senn: uh, if everything's
okay, then you're gonna do better

with the model, which has got
fewer parameters, no question.

Um, I mean, if, if you could ignore
time altogether, you'd be even better in

Scott Berry: Yes.

Stephen Senn: model, you know, but

Scott Berry: Right, right.

Stephen Senn: nobody's gonna go that far.

But actually,

I could argue there are not 10 groups.

There are 24 groups.

If you look at the combination of time
period and control, taking into account,

uh, the blinding thing,
you don't end up with 10 groups.

You end up with 24.

And in that case, the degree of adjustment
is going to penalize you a lot more.

Scott Berry: Hmm.

Stephen Senn: So, so it's not even
true that the, uh, I forget what

you call it, the, the, the time

Scott Berry: Time categorical where?

Yeah.

Right.

Fixed effect.

Stephen Senn: time, time.

Categorical fixed effect.

It's not even clear that the
time categorical effect gets rid

of all the, of all the biases.

Actually,

Scott Berry: Hmm.

Stephen Senn: if you really
believe in concurrent control, you

have to have 24 groups, not 10.

Scott Berry: Right.

Stephen Senn: you can
still make the connection.

There's still a sort of connection you
can probably make, but nevertheless,

it's going to be a lot more difficult.

Scott Berry: Right, right.

Uh, and, and I get that: the fully
parameterized scenario in that, and

is there a different effect of the
placebos, and does that vary over time?

It starts to get hard to
imagine all of that.

But that has ramifications
in trial design.

So when we're designing one of
these trials and we decide the amount

of randomization to a control in
that, uh, you know, forcing the

fully parameterized model means
we have fewer investigational arms.

Having to enroll more patients
to a placebo, patients who have ALS, for 12

months, for example, has massive
implications for the design.
implications to the design.

So hence the statistician
plays this interesting role,

where there's the concern about
24 parameters, as you say.

How much modeling do we do?

How much are we willing to do, and the
ramifications it has on the disease

and the global state of treatment.

Stephen Senn: Yeah.

Yeah, so I, as I say, I'm not,
I'm not arguing one way always, you

Scott Berry: Yep.

Stephen Senn: know, statistics is
a bias-variance trade-off.

It's one of the

Scott Berry: Yep.

Stephen Senn: things,
first things that you learn

Scott Berry: Yeah.

Stephen Senn: and you can't always come
down and insist, well, I want the, uh,

the unbiased solution because it really
depends on how complex a model you

Scott Berry: Mm

Stephen Senn: would fit and what that would mean.

Scott Berry: hmm.

Stephen Senn: a certain degree of,
um, of bias is accepted by everybody.

I'm not, uh,

Scott Berry: Hmm,

Stephen Senn: not arguing against that.

But I think there are, nevertheless,
there are some things which are

happening with adaptive designs.

First of all, I think that the claim
for efficiency that was made

has been somewhat misleading.

I think there is a big
benefit in efficiency.

I think it's mainly organizational.

I think it's not

Scott Berry: hmm.

Stephen Senn: so much being able to use,
um, the same controls over and over

again because as you've already argued,
there's actually less information

in that than one might think.

Because of the possibility of
adjusting for time effects:

As soon as you start doing that, then
you find the standard errors will go up.

Um,

Scott Berry: But, but it's sort of,
compared to what? Compared to only

looking at the concurrent controls,
there can be huge advantages, um,

of, of building that model.

Stephen Senn: not, they're not
as great as sometimes claimed.

Scott Berry: Okay.

Okay.

Alright.

Stephen Senn: see, you can see that
from some of the, uh, the Bayesian

work on using historical, uh,
controls, which I like very much.

I'm thinking of the sort
of work that Heinz Schmidli

Scott Berry: Mm-hmm.

Stephen Senn: and people
like that, based at Novartis, have

used for using historical data.

And we, we've done a similar thing

Scott Berry: Yeah.

Stephen Senn: in frequentist mode,
and what you find is you, you can

identify, in one of our cases we
identified 1,200 historical patients.

When you looked at the
between-study variation,

it was equivalent to
having, optimistically, 50.

So 1,200, you thought, wow, I'm rich.

But actually, when you had a look
at, uh, the between-study variation,

because you're using historical
data, then in that case the, the

information was not nearly as great.
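
A hedged sketch of why 1,200 can deflate to something like 50: under a simple random-effects view, the historical mean carries between-study variance on top of its sampling variance, which caps its worth no matter how many patients there are. The numbers here are illustrative, not those from the case Senn mentions.

```python
def effective_n(n_hist: int, sigma: float, tau: float) -> float:
    """Concurrent-patient equivalent of n_hist historical patients.

    The historical mean has variance sigma^2/n_hist + tau^2 (sampling
    plus between-study drift); equate that to sigma^2/n_eff and solve.
    """
    return n_hist / (1 + n_hist * tau**2 / sigma**2)

sigma = 1.0  # within-study standard deviation (illustrative)
for tau in (0.0, 0.05, 0.15, 0.5):  # between-study standard deviation
    print(f"tau={tau:4.2f}: 1200 historical ~ "
          f"{effective_n(1200, sigma, tau):6.1f} concurrent patients")
# Heterogeneity caps the effective sample size near (sigma/tau)^2,
# so 1,200 historical patients can be worth only a few dozen.
```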

Now,

Scott Berry: Hmm.

Stephen Senn: I'm not saying something as drastic
as that happens with platform trials,

but some of the, some of the discourse

has gone in the simplistic way of saying,
oh, we can use all this control data.

Well, it's not quite as simple as that,

Scott Berry: Hmm.

Stephen Senn: but I think

Scott Berry: this,

Stephen Senn: the organizational
side is, is, uh, a great saving.

Scott Berry: Yeah.

Yeah.

Right,

Stephen Senn: During the COVID epidemic, being able to commit

Scott Berry: right.

Stephen Senn: oneself to drop and
add arms and so forth quickly with a

minimal amount of, uh, administrative
fuss was, was very important.

Scott Berry: So you touched on a
larger topic somewhat, uh, scientific

hype and actual reality, a little bit.

You, you talked about adaptive
designs, but largely platform trials.

There's been, the new FDA guidance is out,
draft guidance on Bayesian statistics.

ICH E20 draft was out, which talks
about adaptive designs largely.

Um, uh, what, what are your

thoughts on all of that, and the movement
towards Bayesian and adaptive designs?

Stephen Senn: Well, I mean, decision
analysis, um, teaches you, and I'm, I'm

not denying it, that the, the option
to change things is always valuable.

So from

that point of view, you can't, you
can't argue against flexibility.

Um, the option is not always as
great as, uh, as some people think.

I'm slightly annoyed about all of this
because there are other simpler things

that the FDA could have been doing a
long, long time ago, which would've

made a much bigger difference.

One of them would be banning dichotomies.
An extraordinary, extraordinary number

of clinical trials still use information-
destroying dichotomies, and we know

that as soon as you do that, in the best
of cases, the necessary sample size

increases by about 50%, but it can easily
double or treble if you get the cut

points wrong, if you get bad ones.

So that's one particular point.
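
The size of that penalty is easy to compute for a normal outcome: the asymptotic efficiency of comparing the proportions above a cut point c, relative to comparing means, is phi(c)^2 / (Phi(c) * (1 - Phi(c))), a standard result; the sketch below simply evaluates its reciprocal, the sample-size inflation.

```python
from math import erf, exp, pi, sqrt

def phi(x: float) -> float:  # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x: float) -> float:  # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

def inflation(c: float) -> float:
    """Sample-size inflation from dichotomizing a normal outcome at cut c
    (in standard deviations from the mean) instead of comparing means."""
    return Phi(c) * (1 - Phi(c)) / phi(c) ** 2

for c in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"cut at {c:.1f} SD: required n inflated x{inflation(c):.2f}")
# At the median the inflation is about 1.57 (~50-60% more patients);
# badly placed cut points easily double or treble the required n.
```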

The second thing was using covariates,
which one could have been using

in a linear model for years.

Now the FDA has gone off
on some, uh, ridiculous,

uh, covariate hunt in terms
of estimands and so forth.

All of this is really of minor importance.

The important thing was to use covariates,
to model, and we could have been doing

that a long time ago, and we weren't.

Scott Berry: Hmm.

Stephen Senn: I've even turned up at
particular, um, meetings in which the

head of a particular section, the statistics
section of the FDA, said proudly,

we don't do modeling. And I think, gosh,

How can a statistician say, say
such a thing and be proud of it?

It's unbelievable.

So

Scott Berry: I'm fully on board with
you on both of those, those points.

The, the, you know, the dichotomizing is
just, it's mind-blowing that we do that.

Yeah.

Stephen Senn: Absolutely insane.

You, you replace the
whole Kaplan-Meier curve, where

you could have all that information;

you replace it by just two
points on the curve, you know,

Scott Berry: Yeah.

Stephen Senn: Uh, response rate
or the death rate or whatever

the rate is at two years rather

Scott Berry: Right,

Stephen Senn: than having the whole, the whole
thing that is there, you know?

Scott Berry: right, right, right.

Uh, yep.

Stephen Senn: as Keynes said, in the
long run we're all dead, so, you know.

Scott Berry: Yeah.

Okay.

So, uh, you had touched on something
else that I, that I think is

important, and I wanna make sure
you, you are able to talk about

it: a DSMB in a more complicated
trial, a platform trial with time

adjustments going on, and the role
of, uh, or the challenges of that.

Stephen Senn: Yeah.

Yeah.

So, um, I mean, I think that's,
that's challenging because, uh, what

you're typically looking at is you're
looking at all sorts of side effects.

Potential side effects.

I mean, let's call them,
uh, adverse events,

Scott Berry: Mm-hmm.

Stephen Senn: without really knowing
whether they're causal or not.

Um, and, uh, it's very unlikely that you
will have the machinery for doing the

time adjustments for all these things.

So you're actually having to make
some sort of a judgment just by

looking at raw data, where ideally
you would want controlled, controlled

data; you'd like to be comparing
like with like, in order to do that.

Scott Berry: Hmm.

Stephen Senn: So, so this is
one of the problems certainly.

Scott Berry: Yeah.

Yeah.

So, and, and I do think, in
these more complicated trials where you

have human oversight diving into what the
model does know and doesn't know: the model

knows certain things, and maybe there's
a trust there, but there's lots of other

things it doesn't, so it's critical to be able
to do this well with the DSMB, for sure.

Stephen Senn: Yeah.

Scott Berry: Um.

You, you said something as we
were coming on, and I, I

wonder, I think it'd be valuable
for everybody to talk about.

You described this evolution.

And, and largely my question
to you is, looking forward,

what things do you think are
important in clinical trial science?

Um, you've seen

a good, uh, a good deal.

And we talked about, you've been a
statistician, you've been a professor,

you've been at a pharmaceutical company.

Lots of roles in this.

So thinking about things going forward
where we are, one of them you said,

which I thought was really interesting,
was the difference between eras in which

we had private data and public analysis
to public data and private analysis.

I, I'd love for you to, to, to
tell our listeners about that.

Stephen Senn: Well, I suppose, um,
even though I'm a frequentist, I mean,

I usually do frequentist analysis, and
even though I sort of believe in the

value of pre-specified analysis, I sort
of wonder, well, you know, especially when

one looks at things like multiplicity,
is it reasonable that just because a

group of us got together on a particular
day and we decided on this particularly

complicated scheme for adjusting
endpoints, the whole of scientific

posterity is now condemned to use
this particular scheme that we chose?

and, uh, of course the Bayesian answer
would be, well, they're not, because

people are not required to think alike.

They start out with different,
uh, prior distributions.

They have different values.

Um, and the, the sort of depressing
end result of that is

that we end up sharing data.

Uh, there's no, uh, no one analysis.

There's still some value in, um,
there's still some value in trust.

I often say that you should think
of the purposes of a protocol

in terms of the five vowels.

A for anticipation, it's
your thought experiment.

E for ethics.

It's the way in which you
explore the ethical problems

that could arise with the trial.

I for inference, which is what you and I
are interested in, O for organization, and

U for utmost good faith.
Utmost good faith is important:

if we're gonna share data,

we have to know, how did the data arise?

What was done before we saw them?

Scott Berry: Mm-hmm.

Yeah.

Yeah.

Stephen Senn: So I, I think that we're
moving towards a, an era in which the

data will be available on the web.

Um, we're gonna have all sorts
of problems with anonymizing.

But then we will have, uh, to, uh, sort
of adapt a quotation of, uh, Chairman

Mao's: let a thousand analyses flourish.

You know, so we we're gonna see lots
and lots of different analyses, uh,

and the, uh, the problem of, uh,
multiplicity will enter a new world.

Scott Berry: Hmm.

Stephen Senn: And maybe it's
not so much hidden data;

it's, uh, analyses missing not at random

we should worry about. You know, only
the interesting ones will be reported.

The, the dozens and dozens of
boring ones will not make it.

Scott Berry: Hmm.

Stephen Senn: I, I think
that's a, that's an issue

Scott Berry: Yeah.

Yeah, I, I

agree.

Stephen Senn: What I think is also an issue

is that, that the idea
of the evidence from a study, which

was always rather suspect from a
Bayesian point of view, because it

would depend on your prior distribution
as to how evidential the study was,

um, is also coming under scrutiny.

Scott Berry: Hmm.

Stephen Senn: you can see that
with your adaptive design.

If you have a look, you'll see
that the information continues to

accrue for treatment number one,
even though it's been abandoned

Scott Berry: Hmm.

Stephen Senn: in the

Scott Berry: Yeah.

Stephen Senn: design you're looking at.

Because what's happening is
we continue to have control.

So although we learn nothing more directly
about treatment number one, indirectly, we

learn something about it because of the

Scott Berry: Yeah.

And then.

Stephen Senn: common control. So there's no fixed

evidence from a particular study.

It's all relative.

Scott Berry: Yeah.

And, and that actually absolutely happened
in the I-SPY 2 trial over 10 years.

We had 27 arms, and the inferences
about arm one that was in

there continued to change.

Now it was very, very small, but it
did continue to change as the, as the

data accrued during the course of that.

Yeah.

Right.

Right, right, right.

Um, do you find value,

I, um, a question outta nowhere:

uh, since you, you rail against
the dichotomizing of endpoints,

do you have a similar frustration
with the dichotomizing of a

trial as success or failure?

And does a Bayesian play a role in
quantifying evidence, perhaps above

and beyond a frequentist, in those trials?

Stephen Senn: Well, I think that, um,

there's a sense in which, at the
point at which you have to make

a decision, things are binary.

So you do have to make a decision
for a given patient as to whether

to use one treatment or another.

Um, in theory, what you could say
is, well, we're going to delegate the

decision making away from the FDA.

Um, what the FDA will do instead
is the FDA will say that, uh,

these data have a seal of approval.

They are data that you can
use to make your own decision.

Now it's over to you, the doctor and
the patient to make the decision.

And a long time ago, Jørgen Hilden,
uh, a very good Danish

statistician who was interested in
utility theory, he proposed this.

He actually said that what you
should do is you should produce,

um, an analysis of all the various
outcomes, all the things that might

matter to a patient, and then every
patient could look at them together and

they could do the trade-off themselves.

They could decide what they
would do. So eventually there would

have to be a binary decision.

The patient's gonna have to decide to
take pill A or pill B or something else.

Scott Berry: Hmm.

Stephen Senn: But it doesn't
mean that you have to think of

a trial in that particular way.

Trials provide information.

I can see all sorts of difficulties
in making this, uh, a way in which

society will behave, but I wouldn't
necessarily argue against it.

I think, you know,

Scott Berry: Hmm.

Stephen Senn: I think this also,
by the way, relates to

a misunderstanding about, um,
clinically relevant differences.

The delta. The delta, um,

Scott Berry: Okay.

Yeah.

Stephen Senn: The delta for me is not
what you expect the drug to do.

It's essentially some way in which you
scale the information, because what

you want in a trial is you want the
trial to provide a valuable amount of

information. And that means that the,
um, data precision, essentially the

standard error, the standard deviation divided
by some function of n, the square root

of n or whatever, that should be
some multiple of what you consider an

important quantum of information to be.

Scott Berry: Hmm,

Stephen Senn: It's that particular
ratio that you're targeting, and

the clinically relevant difference is
a sort of way of scaling that.
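
Read that way, the clinically relevant difference is the yardstick in the usual calculation: n is chosen so that delta is a fixed multiple of the standard error. A minimal sketch of that arithmetic, with illustrative numbers:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta: float, sd: float,
              alpha: float = 0.05, power: float = 0.9) -> int:
    """Per-arm n making delta a fixed multiple of the standard error."""
    z = NormalDist().inv_cdf
    k = z(1 - alpha / 2) + z(power)      # required delta / SE ratio
    # SE of a difference in means is sd * sqrt(2/n); solve delta/SE = k.
    return ceil(2 * (k * sd / delta) ** 2)

print(n_per_arm(delta=0.5, sd=1.0))  # about 85 patients per arm
```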

Scott Berry: Hmm.

Stephen Senn: So yeah, I think trials
shouldn't, they're not failures

or successes; trials provide
a certain amount of information.

And then, you know,

Scott Berry: Awesome.

Awesome.

So, uh, uh, any, any
other closing comments?

Stephen,

Stephen Senn: uh, no.

Apart from giving my best regards to Don.

Scott Berry: I will.

Stephen Senn: uh, no, I
don't, I don't think so.

I mean, um,

apart from, I would say
that, um, people should

think about concurrent control.

It's not the be-all and end-all,
but it does all sorts of things for you.

Scott Berry: Hmm.

Stephen Senn: Also, um, blinding: if you
can run a double-blind randomized trial,

that also cures all sorts of things.

If you're not careful, you're
liable to overlook things which

would be impossible if the trial
is randomized and double-blind.

for instance, in a trial of a vaccine,
you might say, well, you know, the

people we're going to vaccinate, they
can come to the center to be vaccinated.

We can't run the trial
double-blind so there's no point

calling the control people in.

We'll get nurses to go and
take blood samples from them to

see if they're seropositive or negative.

And already what you find is
that now the blood samples

are being handled differently

Scott Berry: Hmm.

Stephen Senn: And maybe they're being sent off at
a different time, in a different lab.

If it's the same lab, then we know
that assays vary over time, and

so without really realizing it,
the measurement process itself has

introduced a bias, simply because the trial is
not randomized, double-blind. If it's

randomized, double-blind, it's impossible.

There is no way that you can correlate
the taking of any sort of measurement with

the allocation of either treatment, because
it's random and nobody knows what the

allocation is.

Scott Berry: Hmm.

Hmm.

Agree.

Agree.

Well, thank you so much.

Thank you for your blogs.

Thank you for your.

Stephen Senn: Okay.

Scott Berry: Yep.

Thank you for the pictures of your
hikes and enjoy your hike tomorrow.

Uh, I know you've been in a lot of
interims, but thanks for joining

us, uh, here. And for everybody,

uh, till next time, we'll
be here in the interim.

Stephen Senn: Okay.

Yeah.

Thanks.

Bye.

Scott Berry: Thank you.