A podcast on statistical science and clinical trials.
Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.
Judith: Welcome to Berry's In the
Interim podcast, where we explore the
cutting edge of innovative clinical
trial design for the pharmaceutical and
medical industries, and so much more.
Let's dive in.
Scott Berry: Welcome everybody, back to In the Interim.
I'm your host, Scott Berry.
I'm, uh, really honored to have my guest today, uh, a very well-known, uh, award-winning statistician with me today. We're gonna talk about a number of things, uh, with Dr. Stephen Senn.
He's a, he's worked as a statistician, an academic, in various positions in Switzerland, Scotland, England, Luxembourg. He has a really interesting history of, uh, of work, from, uh, being, and this is a, a really cool title, the Head of Competence at the Center for Methodology and Statistics. Uh, I love that title. Uh, he's been a professor of statistics at the University of Glasgow, uh, professor of Pharmaceutical and Health Statistics at University College London. He spent, uh, eight years at Ciba-Geigy, uh, as well.
Um, he's a co-author, sorry, he's the author of Cross-over Trials in Clinical Research, Statistical Issues in Drug Development, and Dicing with Death. He was awarded the 2009 Bradford Hill Medal of the Royal Statistical Society and, in 2017, gave the Fisher Memorial Lecture. And so I'm honored to have you on.
Welcome to In the Interim, Stephen.
Stephen Senn: Thanks.
Thanks for the invitation.
The pleasure is mine.
Scott Berry: So, uh, my father Don Berry has, has, uh, attributed a quote to you, and I'm wondering if you want to, uh, attest to this.
He said that you said, I don't
care whether someone is Bayesian or
Frequentist, as long as they understand
the regression to the mean effect.
Stephen Senn: Yes, I, I tend to agree
that I, I, uh, I often think that
it's, it's far more important to,
uh, to know how the data arrived.
I, I think that the basic thing a statistician should always ask themselves is, how do I get to see what I see? And, uh, the, the central fact about regression to the mean is that the data you are using as a comparator were selected to be what they are, and the data you are using to make the comparison are an outcome. And they're fundamentally different things.
The, the patients only got into
the particular trial because
they had the values you defined.
They wouldn't have got in
if the blood pressure hadn't
been above a certain level.
After that, what you did
was something different.
You didn't set a criterion, you actually
observed what happened, and that's
basically, it's the asymmetry between
those two things and understanding
why this causes a particular problem.
That sort of thing is the thing that
statisticians ought to understand.
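A minimal sketch of that asymmetry in code, with invented numbers: patients qualify only because a noisy screening reading exceeded a cutoff, and the follow-up mean is lower with no treatment at all.

```python
# Regression to the mean from selection alone: hypothetical screening rule
# (entry requires SBP > 160), all numbers invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_sbp = rng.normal(150, 10, n)            # each patient's long-run mean
baseline = true_sbp + rng.normal(0, 10, n)   # noisy screening measurement
follow_up = true_sbp + rng.normal(0, 10, n)  # independent re-measurement

eligible = baseline > 160                    # selected to be what they are
print(f"mean baseline (eligible):  {baseline[eligible].mean():.1f}")
print(f"mean follow-up (eligible): {follow_up[eligible].mean():.1f}")
# The follow-up mean is several mmHg lower with no treatment: the comparator
# was selected, the outcome was merely observed.
```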
Scott Berry: Hmm.
Hmm.
Stephen Senn: You see straight away why it's nonsense to claim that, because, um, actors who win an Oscar live longer, that, uh, the esteem of winning an Oscar is causing them to live longer.
If you can see what the fundamental
flaw is there, then you're
beginning to be a statistician.
It doesn't
Scott Berry: Yeah.
Stephen Senn: matter whether they're a Bayesian or a frequentist.
Yeah.
So,
Scott Berry: Yeah.
Yeah.
Well, and, and it may even come up as we talk about what is a placebo effect, uh, which I think you laid out very well, and which I think is, is very much misunderstood by many in, in clinical trials.
Uh, and so I, I wanna talk.
We want to talk, uh, and, and you,
you have been quite, uh, active
on social media, by the way.
I very much enjoyed the, uh, on Twitter,
your pictures of the various hikes you
take, uh, beautiful hikes, which always
seemed to end with a picture of a beer.
Uh, but, but, but those are fantastic.
I hope you're able to still do those.
Stephen Senn: Yes.
Scott Berry: Yep, yep.
I, I've, I've.
Stephen Senn: I'll be doing a hike tomorrow.
The weather looks good.
So I think, uh, my wife and
I will do a hike tomorrow.
Scott Berry: Ah, fantastic.
Fantastic.
Uh.
Stephen Senn: I'm usually on my own. Uh, my wife has a walking club she's in, which I'm not allowed to join.
So she, she goes off on a Thursday,
I go do something else on my, on the
Thursday, and then other days we might
be doing, going off for a hike together.
So.
Scott Berry: Ah, that's wonderful.
That's wonderful.
As, as part of, uh, Stephen's activity on social media, a number of really wonderful blogs, uh, I, I suggest you read them. Then one of them came up, which is titled Beware of the Morlocks. Uh, and, loving Stephen's, uh, blogs, I go read it, and it turns out it's may, maybe a criticism of something I was involved in, so we're gonna get to that.
But first, there's lots going
on in the world, uh, here.
Uh, the FDA guidance, the draft guidance on Bayesian statistics. Uh, part of your Beware of the Morlocks blog talks about Bayes and, and not Bayes. I think it might be nice to throw it over to you about the Bayes, not Bayes, your position on that, and then maybe even leading into the FDA guidance and what you think of that.
Stephen Senn: Okay, so I can
start with flexible designs.
If you change the allocation ratio and you change nothing else about it,
the net effect will be, there will be a, an induced correlation between time and the treatment, because in a particular era more of A was given than B, and in another era more of B was given than A, or whatever.
And so basically, unless you do
something about it, time is confounded
with, um, the treatments given. And so a natural thing is to try and deconfound it.
So the classical frequentist way is to
say, every time you change the allocation
ratio, you declare a stratum and you then
essentially fit the stratum as an, as a
fixed effect factor, and that will force
the construction of all estimates on
the basis of within stratum differences.
The only way you can eliminate the
stratum effect is by constructing a
difference, first of all, and then
everything basically boils down
to weighted combinations of these
within, within strata differences.
So that, that's, that's what happens.
But of course in doing that, you're using
up a large number of degrees of freedom.
Um, and you may ask yourself, can
time really be this complicated?
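A sketch of the classical fix just described, under an invented two-era design: the allocation ratio changes, so treatment is confounded with time, and fitting each era as a fixed-effect stratum recovers the effect from within-stratum differences.

```python
# Stratum-as-fixed-effect adjustment after an allocation-ratio change.
# Invented design and numbers; the point is the naive vs adjusted contrast.
import numpy as np

rng = np.random.default_rng(1)
n_per_era = 300
era = np.repeat([0, 1], n_per_era)
p_treat = np.where(era == 0, 2 / 3, 1 / 3)     # 2:1 in era 1, 1:2 in era 2
treat = rng.random(era.size) < p_treat
y = 0.5 * treat + 1.0 * era + rng.normal(0, 1, era.size)  # true effect 0.5

naive = y[treat].mean() - y[~treat].mean()     # confounded with the time drift

X = np.column_stack([np.ones(era.size), treat, era])      # + era fixed effect
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"naive:    {naive:.2f}")                # biased well below 0.5
print(f"adjusted: {beta[1]:.2f}")              # ~0.5, from within-era contrasts
```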
Scott Berry: Hmm.
Stephen Senn: So the natural thing
to do is to say, well, maybe I
could use a rather simpler function,
one with rather fewer parameters.
uh, if you have fewer parameters,
the penalty you will pay for loss of
orthogonality will be rather less.
There's always a penalty for
any loss of orthogonality.
In any regression model the penalty will be less, and a natural way to do this, a natural way to be flexible, is Bayesian. I don't actually see the Bayesian part of it as being essential.
Essentially what you're doing is
you're replacing a multi-parameter
adjustment with something which
uses rather fewer parameters.
That's one.
One way of putting it.
So that's the first thing.
The second thing is, well, one has to be a little bit careful.
There's an, there is a correlation
between time and allocation, but
time covers a whole host of things
and this is not always appreciated.
The danger is you concentrate on time as if it was something continuous. Some aspects of it are; it's reasonable to believe in some smoothness of time, but some things are not so continuous. And I gave some examples in the blog.
So that's
Scott Berry: Hmm.
Mm-hmm.
Stephen Senn: some of what's, what's in the blog.
Scott Berry: Oh, okay.
So let, let's set up the blog and, and do that. But before we do that, I wonder, just the, the Bayesian and frequentist thing, given the quote, um, about regression to the mean: you are somewhat agnostic to this, and this is not anti-Bayesian. This is not anti-frequentist.
This, this is about
functionality of the model.
Stephen Senn: Yeah, sure.
If, if what you did was you had an
uninformative prior on each of these
stratum effects, you would effectively be
fitting a fixed effect frequentist model.
Scott Berry: Hmm.
Stephen Senn: It's only because you do something like, um, you either have a, a polynomial function in which you do something like you penalize higher order terms, you, you make them less likely to be, to be as large as lower order terms, or you, uh, you imagine that you have somehow some pseudo data that you can add in some particular way to help with the, with a particular adjustment.
It's only if you start doing that, um, which is a perfectly natural thing to do for anybody modeling, that, that there arises what I think is dangerous.
If people say, oh, because I'm using Bayes, the problem is solved. There are still very, very hard decisions to be made in principle about how you were going to use Bayes. That's the big issue,
Scott Berry: Hmm.
Stephen Senn: whether.
Scott Berry: Yeah.
Okay, so let, let's set up the problem and, and maybe take one step back. If, if you read Stephen's blog, Beware of the Morlocks, you can get there on his LinkedIn page. But, but to set up the problem, uh, and, and you discuss the, the Saville paper, which I'm a co-author on, that, uh, talks about the Bayesian time machine. And so a scenario I think that we, we
And the platform trial starts with
a common control and you have one
experimental arm and you make reference
to one of the figures on there.
You have one experimental arm that starts,
so it's arm one against control for
the first, there's 10 periods of time.
Within this graph, uh, by the way, this
is the challenge of a, uh, of a podcast
is try to describe to people without
actually showing them a graph, uh,
where arm one and control are enrolling
equally one-to-one, and then a new arm
is added in time three, arm two is added.
Arm one is still being used, and
our and control is being used.
And now it's one to
one to one for control.
Arm one and arm two.
That continues in period four. In period five, arm one goes away. It's done enrolling patients. Arm two is there. Now we add arm three, we add arm four, arm five, eventually, through the 10 periods.
So you can imagine staggered arms in the trial enrolling, a common control throughout being randomized during the trial, uh, within this setting.
And then the question comes down
about how to make inferences about
one of the arms relative to control.
One of the, the simplest things would probably be to just compare that arm to only the concurrent controls. And let's avoid for a second, you talk about having multiple placebos, maybe avoid that for a second, but I think that's something you importantly want to talk about.
So when we're making inferences about a particular arm, do we compare that arm to the common controls that were randomized and eligible for that arm at the time? Do we use other controls that have been enrolled, say, before that arm was enrolled? So you have a control patient that was randomized in the same trial; if we're talking about arm two, it was randomized when arm one was there and it was a control patient, but before arm two was there. So that's considered a non-concurrent control.
Stephen Senn: Right.
Scott Berry: And so are we
going to use that in some way
to make inference about arm two?
Is that a reasonable setup of the problem?
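For listeners without the figure, a plain-code reconstruction of the layout being described; the exact period-by-arm schedule below is a stylized guess consistent with the narration, not the published figure.

```python
# Hypothetical platform-trial schedule: which experimental arms (besides the
# common control) enroll in each of the 10 periods.
enrolling = {
    1: [1], 2: [1], 3: [1, 2], 4: [1, 2], 5: [2, 3],
    6: [2, 3, 4], 7: [3, 4, 5], 8: [3, 4, 5], 9: [4, 5], 10: [5],
}
arm = 2
concurrent = [p for p, arms in enrolling.items() if arm in arms]
non_concurrent = [p for p in enrolling if p not in concurrent]
print(f"arm {arm} and its concurrent controls: periods {concurrent}")
print(f"non-concurrent control periods: {non_concurrent}")
# The question on the table: when estimating arm 2 vs control, do the controls
# from the non-concurrent periods get used at all, and if so, how?
```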
Stephen Senn: Yeah, I think
that's, uh, that's reasonable.
Um, and the, there is a relationship
to incomplete block designs where
typically in an incomplete block design,
not every block gets every treatment.
I mean, that's basically
why it's incomplete.
Scott Berry: hmm.
Stephen Senn: And the sort of
tradition there in, um, agricultural
statistics, where they were often used
was that they had to be connected.
You had to somehow be able to construct the, um, the treatment effect on the basis of a number of within-block differences.
However, um, later, I think it was Yates who realized there was some further information that was recoverable in another way if you'd randomized between blocks.
Scott Berry: Mm-hmm.
Stephen Senn: But basically they had
to be connected in order to, in order
to actually make the comparison using
all the information that there was.
Scott Berry: Oh,
Stephen Senn: Otherwise you were just limited to
those particular blocks in which the
two, the pair of treatments you were
interested in happened to be represented.
Scott Berry: right.
Uh, and, and in a lot of these platform trials, you get this multiple overlapping or bridging where, when arm two was there, uh, it was there with arm one and control, and so earlier arm one and control provide some potential information. So
we, we could do only comparing to the concurrent controls, or we could do, as you described, where we put in a, uh, covariate for each piece of time, uh, in a, in a frequentist fixed effect, and we add in, we add in nine degrees of freedom or something like that for, for time.
Um, within that, what the Bayesian time machine, or the paper, the Saville paper, talks about, which we'll come back to with the guidance, but the FDA guidance references this paper and, and a couple trials. It's used in GBM AGILE, for example. Um,
it takes those units of time, because they are ordered chronologically, and does a smoothing estimate of the effect of time over these blocks. Um, coming back a little bit to the regression effect, but it's this smoothing over time.
Yes.
It uses largely a, a, a smoothing spline
over time, and that's the reference
to the time machine and it's done in
a Bayesian way, but as you say, this
could be done in a frequentist way.
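A hedged sketch of that "fewer parameters for time" idea: instead of ten free period effects, penalize second differences between neighboring period effects. This is a frequentist stand-in for the smoothing being described, not the actual model in the Saville paper; the design and numbers are invented.

```python
# Smooth time adjustment via a second-difference penalty (ridge-style),
# contrasted with the naive, time-ignoring estimate.
import numpy as np

rng = np.random.default_rng(2)
periods, n_per = 10, 60
period = np.repeat(np.arange(periods), n_per)
p_treat = 0.3 + 0.04 * period                    # allocation drifts over time
treat = rng.random(period.size) < p_treat
y = 0.5 * treat + 0.15 * period + rng.normal(0, 1, period.size)  # drift + 0.5

naive = y[treat].mean() - y[~treat].mean()       # confounded with the drift

P = np.eye(periods)[period]                      # period indicators
X = np.column_stack([treat.astype(float), P])
D = np.diff(np.eye(periods), n=2, axis=0)        # second-difference operator
pen = np.hstack([np.zeros((D.shape[0], 1)), D])  # treatment left unpenalized
lam = 5.0                                        # smoothing weight (a choice)
Xa = np.vstack([X, np.sqrt(lam) * pen])          # augmented least squares
ya = np.concatenate([y, np.zeros(D.shape[0])])
beta = np.linalg.lstsq(Xa, ya, rcond=None)[0]
print(f"naive:         {naive:.2f}")             # biased upward by the drift
print(f"smoothed time: {beta[0]:.2f}")           # near 0.5, with fewer
                                                 # effective df spent on time
```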
Stephen Senn: Yeah.
Yeah.
Uh, the
Scott Berry: So y
Stephen Senn: There would be a slight problem in the frequentist one, which some Bayesian methods could avoid, which is that the polynomial you could fit changes as the number of periods are added.
Obviously very early on you
can't, you can't fit a polynomial
with four or five parameters.
Four or
Scott Berry: hmm.
Stephen Senn: five is still less than the nine or whatever you would use for a full frequentist fit.
But earlier on you couldn't do that.
So potentially you'd actually
be smoothing because as you went
later on, you could actually revise
some of your smoothing things.
You'd actually be smoothing in a slightly
different way as you went further on.
Scott Berry: Hmm.
Stephen Senn: now, it's inherent
to, to Bayesian approaches that you
learn as you go along anyway, so,
you know, that's not necessarily
seen as being a big deal there.
Scott Berry: Okay, so this, uh, in, in
some of these platform trials, when you
use this model and we're making inferences
about arm two, it does use controls from earlier, through their comparison to arm one, which is also there with arm two. So it, it does use these, essentially adjusting for time.
And we have a quote in that paper,
which you put in your blog where we
say, uh, and there's two parts to this.
I wanna talk about that largely.
Uh, we're, we're in a world right
now where you hear a ton about
real world data, historical data.
You could go out and get historical
controls for a glioblastoma
trial and use those to help
make inferences in your trial.
You're running now.
That data's from a different
protocol, different data.
It's got a lot of things that
a non concurrent control in a
platform trial doesn't suffer from.
Uh, we do.
So it's the same protocol,
the same data elements.
All of these controls were randomized.
Uh, now, it was at a different time, so we say time is the thing that's different.
You brought up that there's more than
time and I, I want you to maybe touch
on this again, that it's not just time.
Stephen Senn: No.
So let me, let me give you an example. If you were to have a look at, um, trials of nevirapine in, uh, HIV infection, a lot of them were placebo controlled. But what did that mean? It actually meant that, because AZT was already approved as a treatment for HIV infection, it meant that the patients were all getting AZT. But some of them, in addition, got placebo to nevirapine, and some of them got nevirapine in addition.
And some people would describe that as being a placebo-controlled trial. I would, others would say, no, no, no, it's a trial of the combination therapy of nevirapine and AZT versus AZT alone. It doesn't matter; all the time in clinical trials, all the time, there is background therapy. All the time, standard of care is evolving.
So when you said just now that the protocol is the same, yes. But the world doesn't stand still. Actually, in any particular serious disease, what you will find is, all the time that your trial is running, the world is evolving. And to return to the HIV trials, we know that people who were recruited later into the same trial and given the same treatment had better survival than those who were recruited earlier.
Why was that?
It was because you were learning more
about the treatment of AIDS as time
went on and all the patients in those
trials continue to benefit from the
improvement in care that was going on.
Yes, I agree with you.
It's time in a sense, but I don't
agree that the fact that the same
protocols are used deals with all
the problems, not by a long way.
Scott Berry: Okay, so part of it is all the problems, uh, but the other part is some of the problems. As, as a comparison, external data, to some extent, has a much higher level of that, but, yeah.
Okay.
Okay.
Stephen Senn: but it doesn't,
it doesn't deal with all that.
And then I
Scott Berry: Right.
Stephen Senn: would mention one which an earlier paper of yours, um, rightly discussed, but this later Saville paper didn't, um, where you discussed the fact that it will be impossible to blind all of the treatments to each other, uh, in these particular trials,
Scott Berry: So let, let's not move on to that yet. But the, okay, so,
Stephen Senn: We'll get onto that later then.
Yeah.
Scott Berry: Okay, so in, in the setting here, let's
think of this as a common control.
Uh, and not that there's
different modes of administration
of a, of a, of a control.
Stephen Senn: Let me ask you one, one other question.
Scott Berry: Um,
Stephen Senn: It's not the case, I
don't think, but correct me if I'm
wrong, that you will necessarily
stick with all the centers through
all the life of the platform trial
Scott Berry: Yeah, one, but.
Stephen Senn: It's not the case in a standard parallel group trial, either.
I've, I've sat on many data monitoring boards, and you find that recruitment is poor and the sponsor says, oh, well, we'll enroll some new centers. The enrolling of new centers, the dropping of centers, occurs all the time.
So your differences in time are also differences between centers.
Scott Berry: Hmm.
Yep.
Stephen Senn: You'd have to treat a parallel group trial as you would treat a cluster randomized trial.
So therefore, this center effect
is something you have to deal with.
It's dealt with
Scott Berry: Yeah.
Stephen Senn: in a standard,
fixed, uh, allocation trial.
It's not dealt with in a platform trial.
Scott Berry: Yeah.
One of the nice things about many standing trials, the I-SPY 2, uh, neoadjuvant breast cancer trial, that trial ran for 10 years essentially. It started with 20 sites, and these sites lasted for 10 years with very little change to them.
So one of the benefits, right, one of the benefits of the platform becomes almost this learning healthcare thing. Now you, you're right that there is some variation to that, and in a number of trials we do try to adjust for, for center, for site.
One of the things about this, though, as you mentioned, is time, and, and maybe this is semantics about whether time includes the evolution of background care and other aspects to it.
But we also empirically have
these observations on the control
arm over these 10 periods.
So we empirically see that, uh,
outcomes are getting better over time.
Now there's absolutely the assumption of additivity of that across arms. So within the model that's an additive effect, whether this is a hazard ratio, whether this is a, a responder analysis. Empirically, we can see whether or not this is happening during the course of the trial.
Yeah.
Stephen Senn: I, I, uh, I
have no problem with that.
And in any case, I
Scott Berry: Yeah.
Yeah.
Stephen Senn: don't really regard the additivity assumption as being particularly important.
I'm less concerned about that.
I regard that as being a
treatment by time interaction.
What I'm really interested in is the main effect of time, and I think the
Scott Berry: Mm-hmm.
Stephen Senn: effect of time is something
that can be underestimated
if we're not careful.
Scott Berry: Mm-hmm.
Stephen Senn: Um, but essentially
you're, you're replacing a, you're
replacing a model with many parameters,
with one with fewer parameters.
That's not necessarily an unreasonable
thing to do, but I'm not sure that
everybody who's involved in a DSMB
understands exactly what's going on.
And in particular, um, the question is, how should the data that they use for monitoring be presented to them?
Scott Berry: Yeah.
So, uh, a huge, a huge issue then is, during the course of the trial, the DSMB is reviewing, and there, there's this model that's making time adjustments. Do they just accept that, hey, the model's got this, don't worry about it? Or are they, are they able to view this? Are they able to see it? Um, uh, hugely important.
Let me come back to what you said about another type of trial, and I'll make reference to the HEALEY ALS trial, which has the components that you just brought up.
isn't a common control oncology,
for example, there's a standard of
care and patients generally aren't
even blinded in oncology trials.
Because of the intensity
of the treatments.
But in an, in the ALS platform trial, a patient is randomized to, say, drug A, B, or C, and then they're further randomized, three to one, between its active and its placebo. So at any time, if there's A, B, C enrolling in the trial, patients are being randomized to A's placebo, B's placebo, or C's placebo.
They are not blinded to A, B, or C.
So if one of them is three pills a day and one of them is a, uh, subcutaneous shot, they're not given the blinding subcutaneous shot. They're only given the mode of administration of the drug that they're on.
So at any one time, we have different modes of administration of placebo. So not only do we have controls in HEALEY ALS that were enrolled slightly before the arm came on, but we also have placebos at the same time that are given different modes of administration of a placebo across the different arms.
And I think you wanted to, to talk about that.
Stephen Senn: Yes.
I mean, I think that I referred to such trials a long time ago, I'm trying to remember when it was, um, as veiled. Uh, if you, if you don't know what treatment you are getting, but you know some of the treatments you're not getting, in that case, um, it's not fully blind. It's sort
Scott Berry: Mm-hmm.
Stephen Senn: of veiled, in that particular way.
Oh, I use the term veiled as being,
Scott Berry: Mm-hmm.
Stephen Senn: obstructed.
Um, I just went to have a look. Uh,
Scott Berry: term.
Yep.
Stephen Senn: Yeah, I think it might have been, yeah, 2004, I think it was,
Scott Berry: Ah, okay.
Stephen Senn: and I was thinking of a particular trial that we had run at Ciba-Geigy, where we had two
patches of hormone replacement
therapy, a high dose and a low dose.
And that meant the patches
were of different sizes.
Scott Berry: Hmm.
Stephen Senn: The only way you could
have blinded the patients would've
been by giving them two patches,
a large patch and a small patch.
One of them would've been, let's say,
active and one of them would've been a
placebo, but they wouldn't know which.
And then you could maybe have
had a placebo group with two
patches, which were both placebos.
Um, then basically, uh, a patient who's
being given the highest dose knows that
they're not being given the lowest dose.
So if expectation leads them to
report side effects, 'cause they
say, wow, I'm getting a high dose of
hormones, this could be a problem.
I don't feel so well, and they report
it, you only control for that by
comparing them to their own placebo.
If you compare them to the whole, the
pool placebo group, then in that case
you don't actually control for this.
Scott Berry: Hmm.
Hmm.
Stephen Senn: Another study I was involved in was the TARGET study, where, um, lumiracoxib, a treatment for rheumatism, for osteoarthritis, was compared to, uh, naproxen twice daily or ibuprofen three times daily.
Again, it was awkward to blind things. So basically you had a substudy, which was lumiracoxib versus naproxen, and another substudy, which was, um, uh, ibuprofen versus lumiracoxib.
And on the data safety monitoring board, we had to take great care to make sure we only looked at placebo, sorry, the control patients from the same substudy, because the results in the two substudies were just not comparable.
So
Scott Berry: Oh, interesting.
Stephen Senn: So you would've got, uh, a bias, you would've been biased in actually doing the monitoring if you hadn't split them into substudies. We had to, uh, essentially treat them as two separate trials.
Scott Berry: So do you, in that case, the outcome sounds like it was osteoarthritis pain, for example.
Stephen Senn: Well, yes, although, to be honest, the, the trial was also looking at, um, because, um, the second generation of, uh, COX inhibitors, the COX-2 inhibitors, were supposed to be better in terms of, uh, gastric side effects.
Scott Berry: Hmm.
Stephen Senn: One of the things
one was looking at was gastric
side effects, but also there was a
question mark over cardiotoxicity.
So one of the other things one was looking
at was, uh, cardiovascular side effects.
Scott Berry: Hmm.
Stephen Senn: Um, but the problem was that
the two substudies were not comparable.
Scott Berry: Hmm.
Stephen Senn: It was the same protocol.
The only thing that was different
in the protocol was essentially
the treatments that were given.
Scott Berry: Okay. Uh, so do, do you think that that's context specific? So for example, in the ALS trial, we have at the same time patients that are given different, uh, placebos by different randomization. Endpoints are functional rating scales and mortality, uh, combined together.
Uh, we, by the way, we have
randomized comparisons of those
different controls in the trial.
So if you're making inference about
drug A and it has its placebos, I would
shudder to ignore placebo B and placebo C.
First of all, we have randomized comparisons, and we can actively see whether they are responding differently.
But in some diseases, and we talked early on about the regression to the mean and what is a placebo effect, much of the placebo effect is protocol driven, and the thought that I'm taking two pills a day instead of three pills a day is not gonna affect the time of mortality in an ALS trial.
Stephen Senn: Yes, that's,
that's, that's true.
And, uh, I don't necessarily argue
against using a, um, let's say
a model with fewer parameters.
I'm
Scott Berry: Hmm
Stephen Senn: not disputing that that's the case.
I'm just saying that these particular
issues are not necessarily discussed.
For example.
Scott Berry: mm-hmm.
Stephen Senn: To return to the time machine. Uh, in your particular Saville et al. paper, you're looking at comparing a model which essentially has got nine parameters for time with your spline model, which has got rather fewer.
Scott Berry: Right,
right.
Stephen Senn: uh, if everything's
okay, then you're gonna do better
with the model, which has got
fewer parameters, no question.
Um, I mean, if, if you could ignore
time altogether, you'd be even better in
Scott Berry: Yes.
Stephen Senn: model, you know, but
Scott Berry: Right, right.
Stephen Senn: nobody's gonna go that far.
But actually.
I could argue there are not 10 groups. There are 24 groups. If you look at the combination of time period and control, taking into account, uh, the blinding thing, you don't end up with 10 groups. You end up with 24.
And in that case, the degree of adjustment
is going to penalize you a lot more.
Scott Berry: Hmm.
Stephen Senn: So, so it's not even
true that the, uh, I forget what
you call it, the, the, the time
Scott Berry: Time categorical, yeah. Right. Fixed effect.
Stephen Senn: Time, time categorical fixed effect.
It's not even clear that the
time categorical effect gets rid
of all the, of all the biases.
Actually,
Scott Berry: Hmm.
Stephen Senn: if you really
believe in concurrent control, you
have to have 24 groups, not 10.
Scott Berry: Right.
Stephen Senn: You can still make the connection.
There's still a sort of connection you
can probably make, but nevertheless,
it's going to be a lot more difficult.
Scott Berry: Right, right.
Uh, and, and you get that, the fully parameterized scenario, and is there a different effect of the placebos, and does that vary over time? It starts to get hard to imagine all of that. But that has ramifications in trial design.
So when we're designing one of these trials and we decide the amount of randomization to a control, uh, you know, forcing the fully parameterized model means we have fewer investigational arms. We have to enroll more patients to a placebo who have ALS for 12 months, for example. It has massive implications for the design.
So hence the statistician plays this interesting role, where there's the concern about 24 parameters, as you say, how much modeling do we do, how much are we willing to do, and the ramifications it has on the disease and the global state of treatment.
Stephen Senn: Yeah.
Yeah, so I, I, as I say, I'm not, I'm not arguing against it always, you
Scott Berry: Yep.
Stephen Senn: know, statistics is a bias variance trade off.
It's one of the
Scott Berry: Yep.
Stephen Senn: things,
first things that you learn
Scott Berry: Yeah.
Stephen Senn: and you can't always come
down and insist, well, I want the, uh,
the unbiased solution because it really
depends on how complex a model you
Scott Berry: Mm
Stephen Senn: fit, what that would mean.
Scott Berry: hmm.
Stephen Senn: A certain degree of,
um, of bias is accepted by everybody.
I'm not, uh,
Scott Berry: Hmm,
Stephen Senn: not arguing against that.
But I think there are, nevertheless,
there are some things which are
happening with adaptive designs.
First of all, I think that the claim
for efficiency that was made
has been somewhat misleading.
I think there is a big
benefit in efficiency.
I think it's mainly organizational.
I think it's not
Scott Berry: hmm.
Stephen Senn: so much being able to use,
um, the same controls over and over
again because as you've already argued,
there's actually less information
in that than one might think.
Because of the possibility of adjusting for time effects.
As soon as you start doing that, then
you find the standard errors will go up.
Um,
Scott Berry: but, but it's sort of
compared to what, compared to only
looking at the concurrent controls,
there can be huge advantages, um,
of, of, of building that model.
Stephen Senn: They're not, they're not as great as sometimes claimed.
Scott Berry: Okay.
Okay.
Alright.
Stephen Senn: You see, you can see that from some of the, uh, the Bayesian work on using historical, uh, controls, which I like very much. I'm thinking of the sort of work that Heinz Schmidli
Scott Berry: Mm-hmm.
Stephen Senn: and people like that, based at Novartis, have done for using historical data.
And we, we've done a similar thing
Scott Berry: Yeah.
Stephen Senn: in a frequentist mode, and what you find is you, you can
identify, in one of our cases we
identified 1,200 historical patients.
When you looked at the between-study variation, it was equivalent to having, optimistically, 50.
So 1,200, you thought, wow, I'm rich.
But actually, when you had a look at, uh, between-study variation, because you're using historical data, then in that case the, the information was not nearly as great.
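A back-of-envelope version of that 1,200-to-50 arithmetic, under one simple heterogeneity model in which the historical mean and the new study each deviate from a common mean; the variance values are invented to reproduce the flavor of the example.

```python
# Effective sample size of historical controls once between-study variation
# (tau^2) enters the prediction for a new study's control mean.
sigma2 = 1.0     # within-study variance of one observation
N_hist = 1200    # historical control patients
tau2 = 0.01      # between-study variance (heterogeneity), assumed

var_pred = sigma2 / N_hist + 2 * tau2   # historical mean + new-study deviation
ess = sigma2 / var_pred                 # concurrent n giving equal precision
print(f"effective sample size: {ess:.0f}")  # ~48, despite 1,200 patients
```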
Now,
Scott Berry: Hmm.
Stephen Senn: I'm not saying something as drastic as that happens with platform trials, but some of the, some of the discourse has gone in the simplistic way of saying, oh, we can use all this control data. Well, it's not quite as simple as that,
Scott Berry: Hmm.
Stephen Senn: but I
Scott Berry: this,
Stephen Senn: think the organizational side is, is, uh, a great saving.
Scott Berry: Yeah.
Yeah.
Right,
Stephen Senn: During the COVID epidemic, being able
Scott Berry: right.
Stephen Senn: to organize oneself to, uh, drop and add arms and so forth quickly with a minimal amount of, uh, administrative fuss was, was very important.
Scott Berry: So, you touched on a larger topic there, uh, scientific hype and actual reality, a little bit.
You, you talked about adaptive
designs, but largely platform trials.
There's been, the new FDA guidance is out,
draft guidance on Bayesian statistics.
ICH E20 draft was out, which talks
about adaptive designs largely.
Um, uh, do you have, uh, what, what are your thoughts on all of that and the movement towards some Bayesian and adaptive designs?
Stephen Senn: Well, I mean, decision
analysis, um, teaches you, and I'm, I'm
not denying it, that the, the option
to change things is always valuable.
So from
that point of view, you can't, you
can't argue against flexibility.
Um, the option is not always as
great as, uh, as some people think.
I'm slightly annoyed about all of this
because there are other simpler things
that the FDA could have been doing a
long, long time ago, which would've made a much bigger difference.
One of them would be banning dichotomies. An extraordinary, extraordinary number of clinical trials still use information-destroying dichotomies, and we know that as soon as you do that, in the best of cases the necessary sample size increases by about 50%, but it can easily double or treble if you get the cut points wrong, if you get bad ones.
So that's one particular point.
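A quick check of that claim: for a normal outcome and a small treatment shift, the asymptotic efficiency of comparing proportions above a cut point c, relative to comparing means, is phi(c)^2 / (p(1-p)). The cut points below are just examples.

```python
# Sample-size inflation from dichotomizing a continuous normal outcome.
from math import erf, exp, pi, sqrt

def phi(x):   # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):   # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

for c in (0.0, 1.0, 1.5):
    p = 1 - Phi(c)                       # proportion above the cut
    are = phi(c) ** 2 / (p * (1 - p))    # asymptotic relative efficiency
    print(f"cut at {c:+.1f} SD: sample size inflated x{1 / are:.2f}")
# Median split: ~1.57x (the 'about 50%' figure); badly placed cuts: 2-4x.
```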
The second thing was using covariates, which one could have been using in a linear model years ago. Now the FDA has gone off on some, uh, ridiculous, uh, covariate hunt in terms of estimands and so forth. All of this is really of minor importance. The important thing was to use covariates to model, and we could have been doing that a long time ago and we weren't.
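A small simulation of the covariate point: adjusting for a baseline covariate with correlation rho to the outcome shrinks the standard error of the treatment effect by roughly sqrt(1 - rho^2). All parameters invented.

```python
# Unadjusted vs ANCOVA standard errors for the treatment effect.
import numpy as np

rng = np.random.default_rng(3)
n, rho = 400, 0.7
x = rng.normal(0, 1, n)                        # baseline covariate
treat = rng.random(n) < 0.5
y = 0.3 * treat + rho * x + rng.normal(0, np.sqrt(1 - rho**2), n)

def fit(design, y):
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ beta
    s2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(design.T @ design)[1, 1])
    return beta[1], se                         # treatment effect and its SE

for name, X in [("unadjusted", np.column_stack([np.ones(n), treat])),
                ("ANCOVA    ", np.column_stack([np.ones(n), treat, x]))]:
    b, se = fit(X, y)
    print(f"{name}: effect {b:.2f}, SE {se:.3f}")
# The ANCOVA SE is smaller by ~sqrt(1 - rho^2) ~ 0.71 here: same trial,
# more information, just by modeling.
```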
Scott Berry: Hmm.
Stephen Senn: I've even turned up at particular, um, meetings in which the head of a particular section, a statistics section of the FDA, said proudly,
we don't do modeling. And I think, gosh.
How can a statistician say, say
such a thing and be proud of it?
It's unbelievable.
So
Scott Berry: I'm fully on board with
you on both of those, those points.
The, the, you know, the dichotomy thing is just, it's mind-blowing that we do that.
Yeah.
Stephen Senn: Absolutely insane.
You, you replace a, you replace the whole Kaplan-Meier curve, where you could have all that information; you replace it by just two points on the curve, you know,
Scott Berry: Yeah.
Stephen Senn: Uh, response rate
or the death rate or whatever
the rate is at two years rather
Scott Berry: Right,
Stephen Senn: than having the whole, the whole thing that there is there, you know?
Scott Berry: right, right, right.
Uh, yep.
Stephen Senn: As Keynes said, in the long run, we're all dead, so, you know.
Scott Berry: Yeah.
Okay.
So, uh, you, you had touched on something else that I, that I think is important, and I wanna make sure you, you were able to talk about that: a DSMB in a more complicated trial, within a platform trial, with time adjustments going on, and the role of, uh, or the challenges of that.
Stephen Senn: Yeah.
Yeah.
So, um, I mean, I think that's,
that's challenging because, uh, what
you're typically looking at is you're
looking at all sorts of side effects.
Potential side effects.
I mean, let's call them,
uh, adverse events,
Scott Berry: Mm-hmm.
Stephen Senn: without really knowing
whether they're causal or not.
Um, and, uh, it's very unlikely that you
will have the machinery for doing the
time adjustments for all these things.
So you're actually having to make
some sort of a judgment just by
looking at raw data, where ideally
you would want controlled, controlled
data; you'd like to be comparing like with like, in order to do that.
Scott Berry: Hmm.
Stephen Senn: So, so this is
one of the problems certainly.
Scott Berry: Yeah.
Yeah.
So, and, and I do think, in these more complicated trials, you have human oversight diving into what the model does know and doesn't know. The model knows certain things, and maybe there's a trust there, but there's lots of other things, so it's critical to be able to do this well with the DSMB, for sure.
Stephen Senn: Yeah.
Scott Berry: Um.
You, you said something as we were, we were coming on, and I, I wonder, I think it'd be valuable for everybody to talk about. You described this evolution. And, and largely my question to you is, looking forward, what things do you think are important in clinical trial science? Um, you've seen a good, uh, a good deal.
And we talked about, you've been a
statistician, you've been a professor,
you've been at a pharmaceutical company.
Lots of roles in this.
So thinking about things going forward
where we are, one of them you said,
which I thought was really interesting,
was the difference between eras in which
we had private data and public analysis
to public data and private analysis.
I, I'd love for you to, to, to
tell our listeners about that.
Stephen Senn: Well, I suppose, um, even though I'm a frequentist, I mean, I usually do frequentist analysis, and even though I sort of believe in the value of pre-specified analysis, I sort of wonder, well, you know, especially when one looks at things like multiplicity, is it reasonable that just because a group of us got together on a particular day and we decided on this particularly complicated scheme for adjusting endpoints, that the whole of scientific posterity is now condemned to use this particular scheme that we chose?
and, uh, of course the Bayesian answer
would be, well, they're not, because
people are not required to think alike.
They start out with different,
uh, prior distributions.
They have different values.
Um, and the, the sort of depressing result, the end result of that, is that we end up sharing data, uh, not just, uh, analyses. There's still some value in, um, there's still some value in trust.
I often say that you should think
of the purposes of a protocol
in terms of the five vowels.
A for anticipation, it's
your thought experiment.
E for ethics.
It's the way in which you
explore the ethical problems
that could arise with the trial.
I, for inference, which is what you and I
are interested in, O for organization and
you for utmost good faith is utmost good.
Faith was important.
If we're gonna share data.
we have to know how did the data arise?
What was done before we saw them, which
Scott Berry: Mm-hmm.
Yeah.
Yeah.
Stephen Senn: So I, I think that we're
moving towards a, an era in which the
data will be available on the web.
Um, we're gonna have all sorts
of problems with anonymizing.
But then we will have, uh, to, uh, sort of adapt a quotation of, uh, Chairman Mao's: let a thousand analyses flourish.
You know, so we we're gonna see lots
and lots of different analyses, uh,
and the, uh, the problem of, uh,
multiplicity will enter a new world.
Scott Berry: Hmm.
Stephen Senn: And maybe it's not so much hidden data we should worry about; it's, uh, analyses missing not at random, you know. Only the interesting ones will be reported. The, the dozens and dozens of boring ones will not make it.
Scott Berry: Hmm.
Stephen Senn: I, I think
that's a, that's an issue
Scott Berry: Yeah.
Yeah, I, I
agree.
Stephen Senn: Beyond that, I think there is also that, that the idea of the evidence from a study, which was always rather suspect from a Bayesian point of view, because it would depend on your prior distribution as to how evidential the study was, um, is also coming under scrutiny.
Scott Berry: Hmm.
Stephen Senn: You can see that with your adaptive design. If you have a look, you'll see that the information continues to accrue for treatment number one, even though it's been abandoned
Scott Berry: Hmm.
Stephen Senn: in,
Scott Berry: Yeah.
Stephen Senn: the design you're looking at.
Because what's happening is we continue to have controls. So although we learn nothing more directly about treatment number one, indirectly we learn something about it because of the
Scott Berry: Yeah.
And then.
Stephen Senn: So there's no fixed evidence from a particular study. It's all relative.
Scott Berry: Yeah.
And, and that actually was absolutely the case in the I-SPY 2 trial over 10 years. We had 27 arms, and the inferences about arm one that was in there continued to change. Now, it was very, very small, but it did continue to change as the, as the data accrued during the course of that.
Yeah.
Right.
Right, right, right.
Um, a question outta nowhere. Uh, do you find value, since you, you rail against the dichotomizing of endpoints, do you have a similar frustration with the dichotomizing of a trial as success or failure? And does a Bayesian play a role in quantifying evidence, perhaps above and beyond a frequentist, in those trials?
Stephen Senn: Well, I think that, um, there's a sense in which, at the point at which you have to make a decision, things are binary.
So you do have to make a decision
for a given patient as to whether
to use one treatment or another.
Um, in theory, what you could say
is, well, we're going to delegate the
decision making away from the FDA.
Um, what the FDA will do instead
is the FDA will say that, uh,
these data have a seal of approval.
They are data that you can
use to make your own decision.
Now it's over to you, the doctor and
the patient to make the decision.
And a long time ago, Jørgen Hilden, uh, a sort of, uh, very good Danish statistician who was interested in utility theory, he proposed this.
He actually said that what you should do is you should produce, um, an analysis of all the various outcomes, all the things that might matter to a patient, and then every patient could look at them together and they could do the trade-off themselves. They could decide whether they would, so eventually there would have to be a binary decision. The patient's gonna have to decide to take, try pill A or pill B or something else.
Scott Berry: Hmm.
Stephen Senn: But it doesn't mean that you have to think of a trial in that particular way. Trials provide information.
I can see all sorts of difficulties
in making this, uh, a way in which
society will behave, but I wouldn't
necessarily argue against it.
I think, you know,
Scott Berry: Hmm.
Stephen Senn: I think it also, by the way, I think, relates to a misunderstanding about, um, clinically relevant differences, the delta, um,
Scott Berry: Okay.
Yeah.
Stephen Senn: The delta for me is not what you expect the drug to do. It's essentially some way in which you scale the information, because what you want in a trial is you want the trial to provide a valuable amount of information, and that means that the, um, data precision, essentially the standard error, uh, the standard deviation divided by some function, a square root function of n or whatever, that should be some multiple of what you consider an important quantum of information to be.
Scott Berry: Hmm,
Stephen Senn: It's that particular ratio that you're targeting, and the clinically relevant difference is a sort of way of scaling that.
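That framing of delta as a scaling of information, not a prediction of the drug's effect, is visible in the standard two-arm sample size formula; illustrative numbers only.

```python
# n per arm = 2 * (sigma/delta)^2 * (z_alpha + z_beta)^2: the trial is sized
# so the standard error is a fixed multiple of the quantum of information.
from math import erf, sqrt

def z(p):  # standard normal quantile by bisection (stdlib only)
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + erf(mid / sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

sigma, delta = 1.0, 0.4   # SD and clinically relevant difference (assumed)
n = 2 * (sigma / delta) ** 2 * (z(0.975) + z(0.80)) ** 2
print(f"n per arm ~ {n:.0f}")   # ~98; halve delta and n quadruples
```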
Scott Berry: Hmm.
Stephen Senn: So yeah, I think trials shouldn't, they're not failures or successes; trials provide a certain amount of information. And then, you know,
Scott Berry: Awesome.
Awesome.
So, uh, any, any other closing comments, Stephen?
Stephen Senn: Uh, no. Apart from giving my best regards to Don.
Scott Berry: I will.
Stephen Senn: Uh, no, I don't, I don't think so. I mean, um, apart from, I would say that, um, people should think about concurrent control. It's not the be-all and the end-all, but it does all sorts of things for you.
Scott Berry: Hmm.
Stephen Senn: And also, um, blinding. If you can run a double-blind randomized trial, that also cures all sorts of things. If you're not careful, you're liable to overlook things which would be impossible if the trial is randomized and double-blind.
For instance, in a trial of a vaccine, you might say, well, you know, the people we're going to vaccinate, they can come to the center to be vaccinated. We can't run the trial double-blind, so there's no point
calling the control people in.
We'll get nurses to go and
take blood samples from them to
see if they're seropositive or negative.
And already what you find is that now the blood samples are being handled differently, and maybe they're being sent off at a different time, in a different lab.
If it's the same lab, then we know that assays vary over time, and so without really realizing it, the measurement process itself has introduced a bias, simply because it's not randomized, double-blind. If it's randomized, double-blind, it's impossible. There is no way that you can correlate the taking of any sort of measurement with the allocation of either treatment, because it's random and nobody knows what the allocation is.
Scott Berry: Hmm.
Hmm.
Agree.
Agree.
Well, thank you so much.
Thank you for your blogs.
Thank you for your.
Stephen Senn: Okay.
Scott Berry: Yep.
Thank you for the pictures of your
hikes and enjoy your hike tomorrow.
Uh, I know you've been in a lot of interims, but thanks for joining us, uh, here. And for everybody, uh, till next time, we'll be here in the interim.
Stephen Senn: Okay.
Yeah.
Thanks.
Bye.
Scott Berry: Thank you.