A podcast on statistical science and clinical trials.
Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.
Judith: Welcome to Berry's In the
Interim podcast, where we explore the
cutting edge of innovative clinical
trial design for the pharmaceutical and
medical industries, and so much more.
Let's dive in.
Scott Berry: All right.
Welcome everybody.
Back to In the Interim. I'm
your host Scott Berry, and
I have three guests today.
Um, a hierarchical set of guests today.
Uh, and that, that gets to our topics.
So let me introduce our guests
and then I will get to our
topic, which is hierarchical
endpoints, win ratios.
Uh, what is this?
But let's, let's introduce the guest
today and let me start with Dr.
Amy Crawford and I'm going to learn from,
uh, my Sean Cassidy podcast that I did
where he said he was so impressed when
the doctor said, tell me your story.
So Amy, tell me your story.
Amy Crawford: Oh boy.
Um, and I'm at the top of the hierarchy.
Um, so, um, I am, I'm a statistician.
I work with all of these
beautiful people, um, and I, uh.
I did my undergraduate
degree at North Dakota State.
I'm a mid-westerner, um, my
graduate degree at Iowa State.
Um, moved to Texas, uh,
enjoying the heat down here.
Um, and at Berry Consultants I have been
working on a lot of clinical trials, uh,
particularly in the cardiovascular space.
Um, some stroke that deal with, um,
wanting to look at composite endpoints,
hierarchical composite endpoints.
And so, um.
Had a few, had a few places where I've dug
deep into properties of these things and,
and all the nuances, and so I'm happy to
join the, the Hierarchical composite Crew
with, uh, Jessica and Cora here today.
Scott Berry: Wonderful, uh, Cora, Dr.
Cora Allen-Savietta, what is your story?
Cora Allen-Savietta: Hi.
Thanks for having me.
Um, so I'm also a Midwestern girl.
I come from an academic
family of mostly biologists.
But, um, my grandfather and my
father are, uh, both statisticians.
So I grew up asking a lot of questions.
My parents saying, well,
what's your hypothesis?
It's not gonna be biased.
And those parts of the discussions
were always the most exciting to me.
I came to statistics from, I think,
a storytelling kind of perspective.
I started as a psychology
major, took my first statistics
class as part of a psychology
requirement and just fell in love.
Um, I accidentally did a whole, uh,
pulled an all-nighter, um, just
the first time I got access
to a statistical software.
So I just fell in love immediately
and then went and got my PhD at the
University of Wisconsin-Madison.
Um, where I worked on
questions around, um, uh, genetics,
and can we create phylogenetic
trees that
tell the story of how a particular people
or animal has, uh, evolved over time.
And now I'm doing
something very different.
Um, here at Berry Consultants.
Uh, I design clinical trials, um, a
broad range of different kinds of trials,
platform trials, uh, heart failure
trials, as we'll be talking about today.
Also, some neurological,
uh, disease trials.
So excited to be here and, uh,
thanks for, thanks for joining.
Scott Berry: Does that, and now
you have to, uh, help me with the
terminology, but does that mean you were
doom scrolling statistical software?
Cora Allen-Savietta: I don't think that,
uh, you could call it doom scrolling.
It didn't have endless scroll.
I will say it wasn't the most exciting,
uh, statistical analysis tool.
It was just SPSS.
I don't know if any of you have
had the pleasure of working with
SPSS, but as someone who had never
had access to even R before, I was
just completely, uh, amazed by
the way that it could take a set
of numbers and turn it into a
story that you could tell people.
Um, and so that I think, ties in nicely
today to how do we take a set of numbers
and then parse them into something
that actually tells a clinical story.
Scott Berry: I am gonna
go with doom scrolling.
Okay.
Cora Allen-Savietta: Okay.
Scott Berry: Dr.
Jessica Overbey, tell us your story.
Jessica Overbey: You know, I'm last, so
I, I should have come up with something,
uh, something witty.
I had the most time.
So let me do, let me try.
Um, so I did my undergrad
at UNC. I was a biology major.
Um, and I realized quickly that I
could not go to medical school because
the sight of blood makes me ill.
Um, it wasn't gonna be the right fit.
So I saw a sign on the bathroom
wall that's like, do you wanna
be a biostatistics major?
And I was like, what's that?
Um, so I applied and I got in
and I was like, let's do this.
Um, and quickly, similar to Cora, kind
of fell in love, found my niche, um.
I worked in a computational bio lab for
a while, and that's where I got into
coding and, and, uh, it, it just, uh, felt
like that was the place I needed to be.
Also because in the biology lab
I'm, I'm also very prone to
breaking glassware, so being at a
computer is the right fit for me.
Scott Berry: Hmm.
Jessica Overbey: Um, and so from
there I did, uh, a master's in
biostatistics at Columbia in New
York, and I moved to Mount Sinai right
after as a master level statistician.
Um, and I was at Mount
Sinai for over a decade.
I, I did my PhD, uh, a few years
after I started working full-time
and I, I went back to Columbia,
but kept my job at Sinai.
Um, and then I joined, once I
finished my doctorate, I, I became
faculty at Sinai, which was great.
And, um, while I was at Sinai, I
worked in a clinical trials unit.
Um, I spent a lot of time working in the
cardiothoracic surgical trials network,
so that's really where I got my feet wet.
And, um.
Became kinda a cardiovascular
trial focused person.
Um, so when I moved to Berry, I brought
a lot of that cardiovascular, uh,
trial, uh, experience with me.
And so I, I try to spend a
lot of time in that space.
Um, I'm very interested in heart
failure, so, um, that's where I am now
and very glad to be at Berry doing
some more innovative things in that
space and happy to be here today.
Scott Berry: Very nice and, and,
uh, I, I don't know if we go back
to the beginning of the win ratio,
but it feels cardiovascular was, was
somewhere in the beginning of this.
And, and so I think it's a,
it's a part of the history.
And Amy, I know you've been
thinking of analogies or, or, or
what is the win ratio and, um.
So first of all, we should let our
audience know, uh uh, what is a win ratio
or a hierarchical composite analysis.
Amy Crawford: Sure.
Uh, yeah, I'll start and others
should feel free to jump in.
Um, so higher.
So when we're thinking about how to
measure, you know, whether, how a
patient feels, functions, or survives,
which is important in the clinical
trial space, that's the point, right?
Um.
There are, there are different
endpoints that we look at.
So if we stay in the cardiovascular
world, you know, you could
ask, um, do patients, uh,
on treatment tend to live longer?
Do they tend to, um, you know,
have better, uh, quality of
life after their treatment?
Things like this.
And the, and the idea is that there
are a lot of different things that
go into measuring how a patient
feels, functions, and survives.
And what we're talking about today with
composite endpoints is combining, um, a
number of those, of those things together,
components of, of how a patient, um,
does during or after treatment,
and comparing that to control.
When we talk about hierarchical
composite endpoints, we're talking
about ranking things in order
of, uh, clinical importance to a
patient and, and the community.
So you could imagine a heart
failure patient, um, has a
procedure in a clinical trial.
Um, and then we follow them and we,
we, we watch to see are, are they
alive at the end of a follow-up period?
Um.
How many times have they returned
to the hospital, uh, for heart
failure events and, um, you know,
what's their quality of life like?
And we can rank those in order and
say, you know, if they're alive or not.
That's the most important thing.
And only if, um, they're alive
do we really wanna look at,
um, the next layer in the
composite, which is, say, heart
failure hospitalizations.
How many times have they
gone back to the hospital?
And only if, you know, they
tie there do we wanna move past that
and look at quality of life as, like,
the least important, um, outcome.
And so, um.
The hierarchical composite takes these
components and it, and it puts them
in this order that's generally agreed
upon by, by a clinical audience.
There's, um, a process for
comparing people on this endpoint,
which gets a little bit awkward,
but I'll, I'll pause there.
Um, was that kind of what you
were looking for, Scott, or does
anybody else have anything to add?
Scott Berry: No, I, I, I
think that makes sense.
And so, uh, we're, we're
interested in quality of life.
In that, in your scenario, if you're alive
and if you've avoided hospitalization,
uh, uh, you know, hospitalizations for
heart failure are, are particularly
bad, if we've avoided that, we're
interested in quality of life.
Uh, but if, if death happens,
that's sort of the first thing.
So we've got these three things.
If we went in and said the primary
endpoint is only mortality, we
probably need huge sample sizes,
and it doesn't completely capture
how a patient feels, functions,
and survives, the functions part.
Um, if we only did heart failure
hospitalizations, we, we have
some of the same issues.
We also have to incorporate mortality
into that, and it doesn't incorporate,
um, uh, any functional part beyond,
um, uh, hospitalizations. If we
did only quality of life, now
we're ignoring mortality.
We're ignoring hospitalization.
So in, in three endpoints like that,
we may come back to this, and I know
there are many examples of this, and
we may talk about different examples
of this, but that's the general idea.
We'd like to include all of these
endpoints as a primary, because
any one of them alone seems insufficient.
So, Cora, how do we analyze such a thing?
If, uh, you know, I, I
want mortality to count.
I want hospitalizations to count.
I want quality of life to count only
if they, they survive kind of thing.
So how do you analyze such a thing?
Is it an ordinal endpoint
or what do we do with it?
Cora Allen-Savietta: You're
setting us up so well.
Um, so to analyze this, um, Amy mentioned
that it gets a little bit technical and
complex, uh, once we're trying to analyze
people, but I think you set us up nicely.
So first we're going to take all the
patients in the treatment group, all
the patients in the control group.
And I'm gonna talk about the win
ratio here just for simplicity.
And we're gonna compare each patient
in the treatment group to every
patient in the control group.
And we're gonna count how many
times did that treated patient win.
Um, and then we'll do that for
all of the treated patients.
And so we have all of
the pairwise comparisons.
It's as if we
took a big group of people and we
said, you're all gonna play singles
tennis matches, and then we're gonna
count how many times each of you won
and how many times each of you lost.
And now we take the number of wins and
the number of losses for each patient.
We sum up that for the treatment group.
We sum that up for the control group.
And the win ratio is the ratio
of wins in the treatment group
to wins in the control
group, or you can use the win odds,
um, which is typically, uh, preferred
now, um, where you take half of the
ties and you add that to the top, and half
the ties are added to the bottom.
That's a side note.
Basically, basically you get a ratio
of the, uh, relative effectiveness
of the treatment versus control.
Does that answer your question?
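A minimal sketch of the all-pairs tally described above, assuming a hypothetical `compare` function that returns +1, -1, or 0 for one treated-versus-control matchup. The name `win_statistics` and the toy scores are illustrative only and are not taken from any trial or library.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")

def win_statistics(treated: Sequence[P], control: Sequence[P],
                   compare: Callable[[P, P], int]) -> dict:
    """Compare every treated patient with every control patient.

    compare(t, c) must return +1 if the treated patient wins,
    -1 if the treated patient loses, and 0 for a tie.
    """
    wins = losses = ties = 0
    for t in treated:
        for c in control:
            outcome = compare(t, c)
            if outcome > 0:
                wins += 1
            elif outcome < 0:
                losses += 1
            else:
                ties += 1
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        # Win ratio: treated wins over control wins; ties are ignored.
        "win_ratio": wins / losses if losses else float("inf"),
        # Win odds: half of the ties go on top, half on the bottom.
        "win_odds": ((wins + 0.5 * ties) / (losses + 0.5 * ties)
                     if (losses + ties) else float("inf")),
    }

# Toy data: one score per patient, higher is better (an assumption for the demo).
def higher_is_better(t: float, c: float) -> int:
    return (t > c) - (t < c)

result = win_statistics([3, 5, 5], [1, 5], higher_is_better)
print(result)
```

With three treated and two control patients there are six matchups. The same function works for a hierarchical endpoint once `compare` walks the hierarchy.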
Scott Berry: Uh, it, it does, but let,
let's, um, let's make sure this is clear.
The, the tennis match.
So, uh, and by the way, we're, we're,
we're a two by two table here on
the screen, uh, of the four of us.
So Jessica and I are on the right.
So we're in one group, uh, Cora
and Amy in another group. When
Cora is in one treatment group and
I'm in another, and you say there's
a tennis match, how do you compare
who wins between the two of us,
and how does that incorporate
the hierarchical component?
What?
Cora Allen-Savietta: Oh yeah.
That's great.
Yeah.
Let's get into that.
So if I was being compared against you,
Scott, you were in the control group?
I was in the treated group.
I missed that.
Let's, let's say that.
So I, okay.
Um, let's say I, neither of us, um, let's,
first we're gonna compare on mortality.
So did either of us die during
the treatment period, um, that
we have shared between us?
Let's say we both have the
same follow-up time, so we both
were followed for two years.
Did either of us die in those two years?
Let's say
neither of us did.
If that's the case, then we're gonna
ask, um, we're gonna go down to
heart failure hospitalizations.
Were either of us hospitalized for a
heart failure event, um, or maybe a
heart failure medication, uh, diuretics
were needed, something like that.
Um, if the answer is no to that,
then we'll be tied on both mortality
and heart failure events, and we're
gonna then go to the next level.
So it really only, you only come
down to those lower levels if you
have ties on the higher levels.
So now we're at a quality of life
measure, and here's where we're
gonna break a lot of those ties.
So let's say that I'm able to,
um, have a longer walk distance,
maybe that's the quality of
life measure, than you are, and it's
a meaningful enough
difference that we're gonna
call it significantly different.
Maybe I can walk 60 meters in
the, uh, amount of time, uh,
and you can walk 10 meters.
Um, so we'll say that I then won this
comparison of me versus you, and I would
get one added to my count of wins.
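The head-to-head walk-through above can be sketched as a single comparison function. This is a sketch under stated assumptions: both patients share the same follow-up window, the three tiers are death, heart failure hospitalizations, and walk distance, and the 30-meter threshold for a meaningful walk difference is a made-up value. The `Patient` fields and `compare_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Patient:
    death_time: Optional[float]   # years to death; None if alive at end of follow-up
    hf_hospitalizations: int      # count during the shared follow-up window
    walk_distance_m: float        # walk distance in meters (quality-of-life tier here)

def compare_pair(treated: Patient, control: Patient,
                 min_walk_diff_m: float = 30.0) -> Tuple[int, str]:
    """Return (+1, -1, or 0 from the treated patient's view, deciding tier)."""
    # Tier 1: mortality. A survivor beats a death; a later death beats an earlier one.
    t_death, c_death = treated.death_time, control.death_time
    if t_death is None and c_death is not None:
        return 1, "mortality"
    if t_death is not None and c_death is None:
        return -1, "mortality"
    if t_death is not None and c_death is not None and t_death != c_death:
        return (1 if t_death > c_death else -1), "mortality"

    # Tier 2: reached only on a mortality tie. Fewer hospitalizations wins.
    if treated.hf_hospitalizations != control.hf_hospitalizations:
        fewer = treated.hf_hospitalizations < control.hf_hospitalizations
        return (1 if fewer else -1), "hf_hospitalization"

    # Tier 3: reached only on ties above. The walk difference must clear the
    # (assumed) threshold, otherwise the pair stays tied.
    diff = treated.walk_distance_m - control.walk_distance_m
    if abs(diff) >= min_walk_diff_m:
        return (1 if diff > 0 else -1), "walk_distance"
    return 0, "tie"

# The example from the conversation: both alive, no hospitalizations, 60 m vs 10 m.
cora = Patient(death_time=None, hf_hospitalizations=0, walk_distance_m=60.0)
scott = Patient(death_time=None, hf_hospitalizations=0, walk_distance_m=10.0)
print(compare_pair(cora, scott))
```

Returning the deciding tier alongside the result makes it easy to report later where each win or loss was decided.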
Scott Berry: Okay, so the hierarchical
components are how we play the game
against each other, each treatment to
each control, and at the end of the day,
we're looking at how, how many times
the treatment beats a control patient.
Largely, and we get the statistics
right for understanding the
variability of it, and we're looking
for perhaps is there a statistically
significantly higher number of wins
from a treatment to a, to the control.
Now a win could be on mortality, that
one survived longer than the other.
Or neither of them could
have died, or they both could
have, and they could be a tie.
And then you move down the
level and you keep looking to
see if there's a, a win or not.
And it is possible, people tie
throughout and they end up as a tie.
Um, this is like the NHL.
There's, there's lots of ties and
forms of ties, but there, there, there
could be, uh, ties as part of that.
Now.
When we do a primary
analysis, that's a win ratio.
And I guess I, I, I will throw
it to Jessica, but I want
everybody to, to, to jump in here.
Now, I, I, I guess I should come clean
here, um, in this is, I'm not a fan
of the win ratio and, um, I have a,
a love hate relationship with the
win ratio, so I, I should be clear
to our audience out there.
Now, I've used the win ratio and I've
used hierarchical composites, and that's
Cora Allen-Savietta:
I'm still seeing Amy and
Scott Berry: benefits.
They have some things that I, I find
peculiar or less beneficial to that.
So let's sort of dive into all of
Amy Crawford: in, he can.
Scott Berry: When we do
a primary analysis on
Cora Allen-Savietta:
I'm not sure I heard it.
I think.
Talking maybe about the final analysis.
Scott Berry: Yeah, the final
analysis, and, and I, I, I, I
understand statistically I can say
one group's better than the other one.
How do we estimate or provide
an estimate based on this test?
And, and, and I'll throw it.
This is gonna be hard because I know you
all want to answer all of these questions.
Um, and so, Jessica, how do I,
what, what does the, what does the
final sort of estimand look like?
What does the final analysis look like?
Jessica Overbey: Yeah, so, um.
For the final analysis, we use the
Finkelstein-Schoenfeld test, and
that's to get us that p-value.
Um, but that doesn't have a natural
treatment effect next to it.
So that's how the win ratio
came about, to complement the
Finkelstein-Schoenfeld test.
So the p value tells us on average
treated patients are having
better outcomes than control.
Um, and we go to the win ratio
to get our treatment effect.
So that tells us, um, it's
the ratio of wins to losses.
If that's one, that means
there's no difference.
The groups are doing the same.
If it's greater than one,
we know that on average, treated
patients are doing better than controls,
and I think the official estimand is,
say you pull out a random
treated patient and a random control,
this is the, this is the
ratio of the chance that the treated
patient does better than the control to
the chance that the control does better.
That's the official estimand language.
Um, but you know, in isolation,
I think what you're gonna get
at is, is that interpretable?
What does it mean to win,
especially in the
context of the hierarchy?
If you just look at the win
ratio, you don't exactly know.
Where in the hierarchy these
wins and losses got decided.
So there's an
interpretability issue there.
So usually right next to the win
ratio, there'll be this tree or, um,
that shows you the breakdown of, of
at what level each win and loss
occurred.
So at the top, if we're doing mortality,
we'll know this percent of patients,
or this percent of pairs, won for
treated, this percent won for control.
And then you'll have the number of ties.
And then of those ties, you'll go
down to the next level and say, this
is how many won on heart failure.
This is how many lost.
Um, and so the tree lets you see.
Exactly what proportion of pairs are
decided by each endpoint and then within
each endpoint, what's the difference?
So, and you really need that
tree, um, to complement the treatment
estimate, which I think is unique to
the win ratio to have, to have all these
supplementary analyses to understand it.
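The tree described above can be tabulated with a few lines of code. This is a rough sketch: it assumes each pairwise comparison has already been recorded as a `(result, tier)` tuple, and the function name `win_tree`, the tier labels, and the counts in the toy example are all invented for illustration.

```python
from collections import Counter

def win_tree(decisions, tiers):
    """Break pairwise decisions down by the tier that decided them.

    decisions: list of (result, tier) with result +1 (treated wins),
               -1 (control wins) or 0 (tie on every tier).
    tiers:     tier names in hierarchy order, most important first.
    """
    total = len(decisions)
    wins = Counter(tier for result, tier in decisions if result > 0)
    losses = Counter(tier for result, tier in decisions if result < 0)
    remaining = total  # pairs still undecided when a tier is reached
    rows = []
    for tier in tiers:
        decided = wins[tier] + losses[tier]
        rows.append({
            "tier": tier,
            "pairs_reaching_tier": remaining,
            "treated_wins_pct": 100 * wins[tier] / total,
            "control_wins_pct": 100 * losses[tier] / total,
            "ties_passed_down_pct": 100 * (remaining - decided) / total,
        })
        remaining -= decided
    return rows

# Toy counts for 100 pairs (invented numbers).
decisions = (
    [(1, "mortality")] * 10 + [(-1, "mortality")] * 6
    + [(1, "hf_hospitalization")] * 20 + [(-1, "hf_hospitalization")] * 14
    + [(1, "quality_of_life")] * 25 + [(-1, "quality_of_life")] * 15
    + [(0, "tie")] * 10
)
rows = win_tree(decisions, ["mortality", "hf_hospitalization", "quality_of_life"])
for row in rows:
    print(row)
```

The `pairs_reaching_tier` column shows how many comparisons were still open when each level was consulted, which is the context needed to read the win and loss percentages at the lower levels.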
Scott Berry: So they, the, the
actual number summary might be this
proportion of, uh, or, or is it the
odds of a win for the treatment group?
We talk about 1.2.
Uh, or something like that as
the, the 20% more wins for the
treatment group relative to control.
And I understand it's hard
to explain what that means.
'cause it could have been on mortality,
it could have been on the, the second,
it could have been on the third.
What, what is the number
summary that we get out of that?
Or is that just not important?
Jessica Overbey: I think
it's the win ratio, right?
That's your, that's your
core treatment effect.
And, um, the most popular
is the win ratio.
Cora alluded to the win odds.
There's been a call to
use the win difference.
So I think you can slice up these, these
proportions a lot of different ways.
Cora Allen-Savietta: Yeah.
And I will just note there that the win
odds is typically recommended. In a lot of
these trials that we're seeing coming out
now, we'll see, even at the final
analysis, up to 20%, maybe more, ties.
Um, and in cases where there are
a lot of ties, the win ratio can
overstate the treatment effect, whereas
the win odds is gonna give, um, I
think a more balanced, uh, sense.
Um, really incorporating those ties.
Half the ties go to the top,
half the ties go to the bottom.
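A short worked example of the point about ties, using invented counts per 100 pairs. Here W is the number of treated wins, L the number of control wins, and T the number of ties; the symbols are introduced for this illustration only.

```latex
\[
\mathrm{WR} = \frac{W}{L}, \qquad
\mathrm{WO} = \frac{W + T/2}{L + T/2}
\]
\[
\text{Few ties } (W = 45,\ L = 35,\ T = 20): \quad
\mathrm{WR} = \tfrac{45}{35} \approx 1.29, \quad
\mathrm{WO} = \tfrac{55}{45} \approx 1.22
\]
\[
\text{Many ties } (W = 9,\ L = 7,\ T = 84): \quad
\mathrm{WR} = \tfrac{9}{7} \approx 1.29, \quad
\mathrm{WO} = \tfrac{51}{49} \approx 1.04
\]
```

The win ratio is the same in both scenarios because it discards the ties, while the win odds shrinks toward one when most pairs are tied.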
Scott Berry: Okay.
Now, part of
what's hard about this question is
what are we comparing to?
But, um, I, I get the attraction that,
uh, in a scenario we're worried about
running a trial in mortality or mortality
plus heart failure events that we may
need very, very large sample sizes, and
quality of life, functioning of those,
uh, patients matters, uh, in this.
And so what, how does power come
out in this if you're working with
a client to build a trial, a, a win
ratio hierarchical composite trial.
How does power tend to come out in this?
Is this a, a net positive?
And I know it depends on compared
to what, but generally how are
those comparisons falling out?
Jessica Overbey: If you're talking about
doing a win ratio that incorporates
these lower level quality of life
endpoints versus just
doing your more standard mortality
and clinical events only,
the win ratio is usually
gonna be more powerful.
If you're talking about taking the
components of the win ratio and
forcing them into another composite,
it's not always the case that
the win ratio is more powerful.
Um, so for example, if you
were to do an aggregate
global z-statistic of each component,
that might end up being more powerful,
but without assigning weights, you,
you run into the issue there of the
clinical severity isn't being taken into
account, which I think will open you
up to criticism, um, and is where the
win ratio really shines because you are
able to take that severity into account.
Scott Berry: Okay.
The win ratio takes severity into account
by ordering them, but the relative weight
that the third component plays relative
to the first or second is somewhat opaque.
Unspecified,
Cora Allen-Savietta: Unpredictable.
Scott Berry: uh, yeah.
Yeah.
Cora Allen-Savietta: Yeah.
Scott Berry: Yep.
And, and maybe, and I know Amy,
you've been working on a great deal
of trying to measure the contribution
of each individual endpoint.
Um, and, and presumably if nobody
dies, that endpoint has no impact.
If 80% of the patients die, all of a
sudden, that becomes super impactful.
So how do we understand the contribution
of each of the pieces in the hierarchy?
Amy Crawford: Yeah, this is,
this is interesting, right?
Um, so like Jessica mentioned
earlier, you can look at the tree,
um, where the percent of pairwise
decisions made at each level.
And that's a great first step of
kind of understanding, um, the,
the number of decisions that
are made in this round robin.
Uh, by each endpoint, but really,
uh, just because you've ordered
them in in a particular way doesn't
mean they're contributing to your
analysis, um, in that order, right?
It depends on the prevalence.
Like you said, Scott, if nobody dies,
you're not going to break any ties.
You're not going to make
any decisions on that first.
Uh, level of the hierarchy.
And so the weight that that
endpoint brings to the analysis is
going to be zero or, or very low.
And as therapies get better and better,
I think we're going to see the weight
of, um, like mortality become less
and less in these analyses, um, right
as patients are, are, are surviving.
And so part of what the work
I'm doing gets at is, um,
interpreting the weights.
The tree is a great place to start.
But there's some nuances that happen
where like, let's say Cora and Scott
are, are being compared head to head.
And, um, the comparison
is broken on mortality.
Let's say Scott, Scott dies during
the follow-up period, Cora lives,
that's a, that's a pair that's, um,
decided by the mortality endpoint,
the highest level in the hierarchy.
Now you have taken away the opportunity
for a decision on heart failure events,
hospitalizations, and quality of life.
And so, um, there's, there are some
nuances where, when, when these
endpoints in these comparisons actually
go into the statistical test, um.
Decisions on higher endpoints in the
hierarchy actually take away opportunities
for decisions on lower endpoints.
And so the number of decisions, right?
It's not, it's not really an equal
comparison because the lower endpoints,
they, they may have had that opportunity
taken away by the higher endpoints.
And so,
as the statistician, I'm
thinking, you know, contribution
to explaining variance, right?
The, if you're breaking ties at the higher
endpoints, you're explaining a lot more
variance in your test where, where the
variance is coming from in your test
and you're taking away the opportunity,
um, from the lower level, say, quality
of life endpoints in the hierarchy.
So, um, it's a, the tree is a
great, great place to start and
we're thinking about ways to
measure, um, measure this
in, in that context as well.
One other thing that I'll say is, um,
you know, when we talk about interpreting
the weights of these things, we're,
we are not able to, the, the test
doesn't capture how much you won by.
So, um, it's not that you can't say, oh,
okay, well, mortality broke a lot of ties.
Um, we made a lot of
decisions on mortality.
Um, it's, we're not able to then
go through and say, um, Cora lived
three years longer than Scott.
So there's not a weight of sort of
treatment effect amount even within that.
So there's kind of all these layers to it.
Um, yeah.
Scott Berry: So, so in a way it's a, I
mean, it's a very non-parametric test
in that sense that there is no relative
hazard ratio that comes out of this.
There's no, uh, event ratio for
heart failure hospitalizations,
where 10 is much worse than one.
It's a loss is a loss.
Um, uh, and of course you, you know
that I do a lot of things in sports.
It's sort of saying who won and lost
the game, and you ignore the score.
Uh, as part of that, the other thing
that seems to be part of this and,
and one of the, the issues is putting
things together that have very
different, uh, clinical meaning.
So it's not unusual that in
cardiovascular trials we do death,
we do, um, a, a non-fatal MI
or, or, or something like that.
And then the third level is proBNP,
which those are very different things.
Now, I think sponsors want to
put that in there because it's,
let's call it a biomarker.
And I know it's, it's a.
It, it, it, there's a lot of research
going into what that marker means and,
and what other markers mean there, but
it's something that, that many things
may go to that as the tiebreaker.
And so when my overall test wins and
most of my things are broken by that,
that's really good for the sponsor.
That affects that biomarker, but it
still leaves open the interpretation of
are we having positive clinical effects?
And, and, and so this is, I, I, and I
know I, I, it, it, it's not for us to
say, but when, when we're helping to
build these things, that's going to be
something that presumably the consumers of
this, regulators, may not accept
if the thing at the end really
isn't very clinically important, and
90% of ties are broken at that point,
and so the contribution
of those is so high.
Jessica Overbey: I think you're
right and that's actually.
As we've done more win ratio trials,
some of the more recent trials
published are only having wins on that
lower order, lower order endpoint.
So I can think about the Luminate
trial that I think was a three
order composite of death, some sort
of clinical event and quality of
life, and it was very clear that.
It really came down to
quality of life only.
So the buy-in of the clinical
community might not be as high there.
So I think when you're going
to design a trial, I agree.
I think we always try to show simulated
trials where that ends up being the
case and saying, what if this happens?
Are you comfortable?
Is this, is this really
the route you wanna go?
Is really important.
Scott Berry: Okay.
Um, um, in this now we at Berry,
we do a lot of adaptive designs.
And adaptive designs can
be an important part of this.
It seems like adaptive designs
with this endpoint can have
multiple tricky things to it.
Um, and, um, well.
I, let me turn it over to you all
'cause I know there's a number of tricky
things when doing an adaptive design.
Uh, so who wants to, to do this first?
And this may be, everybody weighs
in on this part, and I know there's,
there's parts to quantities of interest.
There's parts to the
relative weights of things.
So what are the challenges of
doing adaptive designs with a
hierarchical composite win ratio?
Cora Allen-Savietta: Jessica,
do you wanna start us off?
Jessica Overbey: Uh, I didn't,
I didn't wanna go right after I
already spoke, but, um, I've been
thinking about this a lot lately.
So, yeah, a great question.
Important question.
Um.
It is hard to do interim monitoring
on these hierarchical endpoints
because when you do an interim,
you've got partial data on patients.
And if you think about that pairwise
comparison at the patient level,
that's gonna change over time,
and that's gonna be very sensitive
to how much follow up they have.
So if you have an earlier biomarker
endpoint at say, three months, and
then maybe you look at the data.
And at that interim, a lot
of patients maybe have less
than six months of follow up.
You are gonna find that the win ratio
is being really driven by the KCCQ,
whereas if the final follow up isn't
for two years, you're gonna have,
you know, you expect, uh, the higher
order endpoints, the clinical
events will eventually start, um,
taking up more of the hierarchy.
So when you do an an
Scott Berry: So, so let me make
sure, sorry to interrupt, but
lemme make, so KCCQ is measured.
KCCQ is measured at three months.
Say, uh, and, and this
is a quality of life.
And so when you do an interim, relatively
early in the length of exposure, and you
look at the win ratio as it's designed.
Many people haven't died yet.
They haven't had bad clinical
events, and so almost everybody's
being determined on this KCCQ.
But when the trial reaches a longer
level of exposure and now patients
are getting two years of exposure,
all of a sudden the other things
are now changing the win ratio.
And so that, that's the issue that you're
concerned with in doing adaptive designs.
Jessica Overbey: Yeah, I call
it the immature win ratio.
At the interim, it's
not really reflecting what
it's gonna be at the final.
And that does create issues on
making decisions that day based
on that raw, uh, win ratio.
Okay.
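The "immature win ratio" issue can be illustrated with a small simulation. This is a toy sketch under loud assumptions: two tiers only (time to death, then a KCCQ score available for everyone), exponential death times, every patient observed for the same length of follow-up at each look, and invented rates and means. The names `tier_shares` and `simulate_arm` are hypothetical.

```python
import random

def tier_shares(treated, control, follow_up):
    """Share of treated-vs-control pairs decided on death and on KCCQ
    when every patient has been observed for `follow_up` years.

    Each patient is a (death_time, kccq) tuple.
    """
    by_death = by_kccq = 0
    for death_t, kccq_t in treated:
        for death_c, kccq_c in control:
            # Deaths after the look have not been seen yet, so cap at follow_up.
            seen_t, seen_c = min(death_t, follow_up), min(death_c, follow_up)
            if seen_t != seen_c:
                by_death += 1      # someone died first within follow-up
            elif kccq_t != kccq_c:
                by_kccq += 1       # tied on death, decided on KCCQ
    n_pairs = len(treated) * len(control)
    return by_death / n_pairs, by_kccq / n_pairs

rng = random.Random(1)

def simulate_arm(n, death_rate, kccq_mean):
    # Exponential death times (rate per year) and a normal KCCQ score: assumptions.
    return [(rng.expovariate(death_rate), rng.gauss(kccq_mean, 20.0))
            for _ in range(n)]

treated = simulate_arm(100, death_rate=0.10, kccq_mean=8.0)
control = simulate_arm(100, death_rate=0.15, kccq_mean=3.0)

interim = tier_shares(treated, control, follow_up=0.25)  # early look
final = tier_shares(treated, control, follow_up=2.0)     # full follow-up
print("interim (death, KCCQ):", interim)
print("final   (death, KCCQ):", final)
```

At the early look most pairs fall through to KCCQ; with two years of follow-up more pairs are settled on death, so the same patients can yield a different win ratio.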
Scott Berry: Okay.
Um, does that mean we don't do interims?
Can we, can we model these?
Can we do things like predictive
probabilities of success?
Um, or is this something
we shouldn't mess with?
Jessica Overbey: Um, well,
obviously we can do interims.
I'll let, I'll let another
person take this. Cora,
do you wanna take this one?
Cora Allen-Savietta: Yeah,
I'm happy to jump in.
I mean, I think we touched on this
earlier, that the win ratio, part of
the reason we choose it is for the, um,
higher power with smaller, uh, smaller
sample size, shorter follow up.
So, uh, along with that, we
often have reason to, um,
do interim analysis.
If we're seeing really strong treatment
effects early on, we wanna get this,
uh, this treatment to regulators and
then to patients as quickly as possible.
Or, you know, we also want to be
able to call futility on trials
that that might not be as as useful.
So there's strong reason
to do interims here.
Um, even if there are some challenges.
And one of the solutions that, that
we've used here at Berry pretty
frequently with some success is
the predictive probability.
So here this is, um, maybe
I'll describe what a predictive
probability is, uh, right off the bat.
So that is, um, taking the data that
we have at a particular point in time,
um,
if we predict forward, how likely
are we to see a win on this trial?
That's our predictive
probability of success.
Um, we could ask with the current
sample size, what's our probability
of success given the treatment
effects that we're seeing right now?
Um, we could ask at the
maximum sample size, what's our
prob probability of success?
So typically we might think about
doing that for a single endpoint.
So we could maybe
start simple and say, what if
we look at the deaths that have
occurred already, um, on the
treated group and the control group.
Maybe it's just one or two on each side.
Um, what's our probability of
winning on the trial, um, based
on just that, um, hazard ratio.
So.
That would be a single outcome,
predictive probability.
But now we want to be able to ask, see
what's the distribution of treatment
effects that we're seeing on mortality?
Then separately, what's the treatment
effects that we're seeing on heart
failure hospitalizations, and what are
the treatment effects that we're seeing
on KCCQ and for patients that don't have
that mature data, with the longer follow
up, we're gonna simulate their data
all the way out to the end, and we're
going to do that thousands of times.
Over those thousands of different
trials that are comprised of partially
simulated data and partially observed
data, we're gonna create a thousand
different potential outcomes.
And of those thousand data sets, what
proportion end up with a win and what
proportion end up as a loss using the
win ratio, or actually the Finkelstein-Schoenfeld
test and its corresponding p-value?
Um, how, how likely are we to win?
So that incorporates the idea of, um,
this kind of immature win ratio.
We're not relying on that immature win
ratio and its p-value that's more heavily
weighted to those lower tiers, like quality of life.
We're not relying on that.
At the interim, we're actually using
a more sophisticated model that allows
us to think about how many deaths, how
many heart failure hospitalizations
are we gonna have over time as they ac.
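The simulation loop Cora describes can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions, not the model used on any actual trial: the hierarchy has only two tiers (death by end of follow-up, then a quality-of-life score with a 5-point tie margin), patients are either fully observed or not yet enrolled, death rates get a Beta posterior with a uniform prior, and quality of life is drawn from a normal distribution fitted to the observed survivors (at least two per arm are needed). The names `fs_z` and `predictive_probability` and all parameters are hypothetical.

```python
import numpy as np

def fs_z(died, qol, treated, margin=5.0):
    """Finkelstein-Schoenfeld style z-statistic for a two-tier hierarchy.

    Tier 1: survival (alive beats dead). Tier 2: quality of life for pairs
    where both are alive, with a tie if the difference is below `margin`.
    """
    alive = ~died
    # +1 where the row patient is alive and the column patient is dead, -1 for the reverse
    tier1 = alive[:, None].astype(int) - alive[None, :].astype(int)
    diff = qol[:, None] - qol[None, :]
    both_alive = alive[:, None] & alive[None, :]
    tier2 = np.where(both_alive & (np.abs(diff) >= margin), np.sign(diff), 0.0)
    score = np.where(tier1 != 0, tier1, tier2)
    u = score.sum(axis=1)  # each patient's wins minus losses against everyone else
    n, n_t = len(u), int(treated.sum())
    var = n_t * (n - n_t) / (n * (n - 1)) * np.sum(u ** 2)  # permutation variance
    return 0.0 if var == 0 else float(u[treated].sum() / np.sqrt(var))

def predictive_probability(died, qol, treated, n_pending,
                           n_sims=1000, z_crit=1.96, seed=0):
    """Share of simulated completed trials whose final test is a win."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_sims):
        d_parts, q_parts, t_parts = [died], [qol], [treated]
        for arm in (True, False):
            in_arm = treated == arm
            deaths = int(died[in_arm].sum())
            survivors = int(in_arm.sum()) - deaths
            # Draw a plausible death rate from the Beta posterior
            p_death = rng.beta(1 + deaths, 1 + survivors)
            new_died = rng.random(n_pending) < p_death
            # Draw a plausible mean score, then individual scores for survivors
            q_obs = qol[in_arm & ~died]
            sd = q_obs.std(ddof=1)
            mu = rng.normal(q_obs.mean(), sd / np.sqrt(len(q_obs)))
            new_qol = np.where(new_died, 0.0, rng.normal(mu, sd, n_pending))
            d_parts.append(new_died)
            q_parts.append(new_qol)
            t_parts.append(np.full(n_pending, arm))
        # Final analysis on observed plus simulated patients
        z = fs_z(np.concatenate(d_parts), np.concatenate(q_parts),
                 np.concatenate(t_parts))
        successes += z > z_crit
    return successes / n_sims
```

Each pass through the loop is one "partially observed, partially simulated" trial, and the returned fraction is the predictive probability of success at the maximum sample size. A fuller version would also carry forward patients with partial follow-up and model hospitalizations over time.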
Scott Berry: Okay, so let me see if I,
I, so early on, if you just calculated
the, uh, Jessica's immature win ratio,
that's not necessarily a good estimate
of what it's gonna be in two years,
because we know we're gonna get more
mortality and heart failure events,
and it's not gonna be only this.
So you're modeling the effect of the
treatment on the individual components
to forecast two years from now.
What is this gonna look like?
So for example, if we said we're gonna
run this for 20 years, we might have
almost all of them be mortality
events and it's broken there, so
you're modeling the effect there.
So that can be done to make any
decisions, uh, during an adaptive
design, futility, uh, uh, stopping
enrollment, whatever it is.
That, that's a, a really
good way to understand what a
less immature win ratio could be.
Is that a fair summary?
Cora Allen-Savietta: I think so.
Scott Berry: Okay.
Okay.
Alright.
Um.
What about, uh, I, I, okay, I won't ask
the regulator question, but that's coming.
So think about that.
The, the regulator question's coming.
Um, um, it, it, what is frustrating,
and maybe I'm going sideways,
but I do go sideways a lot.
Um, what, what could, is this,
Amy, is this an ordinal endpoint?
Could I just ignore this whole thing
and create an ordinal endpoint instead?
Amy Crawford: I mean, um,
yeah, you could, there are,
Scott Berry: Okay.
Amy Crawford: you could do
whatever you want, Scott.
No.
Um, so yeah, there are a lot of cases
where, you know, um, this does just
collapse to an ordinal endpoint.
Um, if every, you can imagine, say
we're running a two year a, a study
with a treatment period of two years.
Everybody's followed for two years.
At the end of two years, I
don't really need to do any
pairwise comparison of patients.
To tell you, um, the order in which
patients, um, fell out in, in terms of
how well they did in the trial, right?
If everybody's got two years of follow
up, um, and, and I died first in the
trial, I had the worst outcome, right?
And you can kind of create a ruler that's
independent of any pairwise comparison
and you can just order patients and,
and then you have an ordinal outcome.
And there are
Scott Berry: so largely, so.
Largely you compared to
everybody, you lose to everybody.
Amy Crawford: I lose to
Scott Berry: sort of like, so you're the
worst and somebody beats everybody and
you could almost rank them, uh, in that.
Um, okay.
Amy Crawford: Yeah.
And so you can imagine, you know, if,
if, if we, if not all of us have two
years of follow up, if, if, if, say I
have one year of follow up and Jessica
has two, what happens in the pairwise?
This is kind of one of the nuances.
What happens in this win ratio?
Uh, the head to head is we actually take
Jessica's two years of follow up and we
scale it back and we say what happened
to Jessica in one year in the trial
and compare Jessica to Amy at one year.
Right.
And, and so we have all of this
differential follow-up, and, and
that's, this is kind of why the FS
test was, was designed is, is so that
you can, you can scale back to the
lowest common denominator of follow-up
time and make comparisons in that way.
And so you can imagine that
we lose some transitivity.
Um, when we do the pair, the round
robin, because I'm being compared
to Jessica at one year, but Jessica's
being compared to Cora at two years.
Um, right.
And so that kinda breaks the
independent ordering, um, which you
may want in an ordinal endpoint.
Um, for an ordinal model, but, um, but
for the most part, yeah, these are,
these are essentially ordinal endpoints.
Um, and, and depending on some assumptions
you make going in, you can create, you can
use them to create, um, an, an ordering
of the patients in the trial and, and,
um, and, and use that as an ordinal
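The "scale back to the shared follow-up" rule, and the loss of transitivity it can cause, can be shown with a small Python sketch. This is an illustration under assumptions, not the exact rule set of any particular trial: the hierarchy is death first, then number of heart failure hospitalizations, `follow_up` means potential follow-up at the data cut, and the `Patient` class, `compare` function, and the three patients are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Patient:
    follow_up: float                      # years of potential follow-up at the data cut
    death_time: Optional[float] = None    # years from enrollment, None if alive
    hosp_times: Tuple[float, ...] = ()    # times of heart failure hospitalizations

def compare(a: Patient, b: Patient) -> int:
    """Return +1 if a beats b, -1 if b beats a, 0 for a tie.

    Both patients are only judged on what happened within the
    shorter of their two follow-up times.
    """
    window = min(a.follow_up, b.follow_up)
    a_dead = a.death_time is not None and a.death_time <= window
    b_dead = b.death_time is not None and b.death_time <= window
    # Tier 1: death within the shared window
    if a_dead != b_dead:
        return -1 if a_dead else 1
    if a_dead and b_dead and a.death_time != b.death_time:
        return 1 if a.death_time > b.death_time else -1
    # Tier 2: fewer hospitalizations within the shared window wins
    a_hosp = sum(t <= window for t in a.hosp_times)
    b_hosp = sum(t <= window for t in b.hosp_times)
    if a_hosp != b_hosp:
        return 1 if a_hosp < b_hosp else -1
    return 0

# Three hypothetical patients with different amounts of follow-up
p1 = Patient(follow_up=1.0, hosp_times=(0.5,))
p2 = Patient(follow_up=2.0, death_time=1.5)
p3 = Patient(follow_up=2.0, hosp_times=(0.2, 0.4))

print(compare(p2, p1))  # p2's death at 1.5 years falls outside the 1-year window
print(compare(p3, p2))  # over 2 years, p2's death counts
print(compare(p1, p3))  # over 1 year, p1 has fewer hospitalizations
```

In this example p2 beats p1, p3 beats p2, and p1 beats p3, so the three comparisons form a cycle and no single ordering of the patients is consistent with all of them. With equal follow-up for everyone the window is the same in every pair and the cycle cannot arise from this rule.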
Scott Berry: Okay, so, so sometimes you
do have a lack of transitivity, which is,
which is sort of weird that you beat me.
I beat Jessica, but Jessica beats
you, uh, would be, would, would be.
And that can happen in these tests,
which are sort of awkward in the
whole interpretation of this, that.
You know that, that it doesn't
hold that in that scenario.
Um, and so that, that can be
awkward in this, this part of it.
Okay.
Now I know all of you have gone to
regulators with win ratio tests, and
the worst question I can ask you is
what do regulators think of win ratios?
Because regulators are diverse.
Not every win ratio is the same.
There are some where the
components may all be
clinically similar, there may be
some widely different kind of thing.
Um, what are some of the concerns that
regulators have had in, in designs where
you've had win ratios and I, I, I guess
we can throw this to, to all of you.
Uh, Amy, what, what concerns have regulat.
Amy Crawford: Yeah, we've talked a
lot about, um, well there's, there's
a few competing perspectives, right?
There's clinical, um, which
is a lot of discussion about, um,
endpoints, ordering of endpoints, what
constitutes a heart failure event.
Um, things that are, things that are,
you know, defining the hierarchy.
Um, from a statistical perspective,
there are questions of, you know,
how, how are you analyzing it?
Um, we talk a lot about ties,
um, and I think this was
alluded to earlier, but, um.
What do you do with ties?
Um, what, what makes a tie?
So on the quality of life endpoint
at the end of the hierarchy.
Um, I think Jessica mentioned
earlier, a lot of times you
only want to declare a winner.
If there's a, you know, clinically
meaningful difference between patients
and quality of life, and what is,
what is clinically meaningful?
It's a kind of a clinical question, but
it's also a statistical question because
the more ties you have, the fewer, you
know, um, wins and losses you have and,
and that kind of can take your power down.
So, um, lots of competing
interests, but I think generally,
in my experience, regulators have,
um, sort of accepted this after
discussion and, and, you know, with
a reasonable approach and, um, yeah.
But I'm interested to hear Cora
and Jessica's experience,
um, with the regulatory.
Scott Berry: Cora.
Cora Allen-Savietta: Yeah, so I'll echo
what Amy said.
I think, um, after discussion on
important things like what's the
tiebreaker, um, thresholds, how much
better do I have to be on KCCQ compared
to Scott to be considered a win?
If we're just one point different
on KCCQ, then
we're not going to, uh, call that a win.
For example, uh, maybe we need a
five point buffer, 10 point buffer,
um, uh, or caliper.
Uh, those pieces are important, I think.
Um, something else that we often talk
about with regulators is what's the package
of supportive evidence that we're
gonna provide along with the win ratio?
So, um, we're gonna give the
Finkelstein-Schoenfeld p-value. We're also gonna
give the win ratio, the win odds, the
win difference, maybe even a net benefit
measure.
All of these are really
nice, complementary, what I
like to call win statistics.
Um, and each of them has
their kind of pros and cons.
Um, and together they tell
a more complete story.
But I think the most important pieces are
really that tree that Jessica described
where we're showing where the wins
are happening at each level, how many
ties we end up with at the end, and
then the individual component analysis.
So we wanna see, um,
uh, for everyone,
what was the mortality? Um, for
people who didn't have a mortality
event, um, what does the rate ratio
look like for hospitalizations?
For people who had neither
mortality nor heart failure events?
Um, what does, uh, the quality of
life measure look like in terms
of just differences in means?
And these don't have to be complex,
but I think that it's really very
challenging to interpret, um,
something like a win ratio without
all of that complementary material.
I think that doesn't have to be a negative.
I think as statisticians we should think
of ourselves as storytellers and we should
always be ready to present a kind of
complete report, um, about the primary
analysis that's beyond just a p-value. Um,
and then I am hoping Jessica maybe can
chime in on, um, some of the conversations
we've had with European regulators.
I've been mostly speaking about,
um, American regulators, but we've
also recently had some interesting
interactions with Europeans.
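The tie threshold and the family of win statistics Cora lists can be made concrete with a short Python sketch. It assumes every patient has the same follow-up, a three-tier hierarchy (death, number of heart failure hospitalizations, change in KCCQ), and a 5-point KCCQ margin; the data and the function names `compare` and `win_statistics` are hypothetical. The formulas are the usual ones: win ratio is wins over losses, win odds gives half of each tie to both sides, and net benefit is the win proportion minus the loss proportion.

```python
def compare(t, c, margin=5.0):
    """Compare a treated patient t with a control patient c.

    Each patient is (died, n_hosp, kccq_change). Returns (result, tier),
    where result is 'win', 'loss', or 'tie' from the treated side.
    """
    t_died, t_hosp, t_kccq = t
    c_died, c_hosp, c_kccq = c
    if t_died != c_died:
        return ("loss" if t_died else "win"), "death"
    if t_hosp != c_hosp:
        return ("win" if t_hosp < c_hosp else "loss"), "hospitalization"
    # Only declare a winner on KCCQ if the gap reaches the margin
    if abs(t_kccq - c_kccq) >= margin:
        return ("win" if t_kccq > c_kccq else "loss"), "kccq"
    return "tie", None

def win_statistics(treated, control, margin=5.0):
    """Tally wins and losses by tier over all treated-control pairs."""
    tiers = ("death", "hospitalization", "kccq")
    by_tier = {tier: {"win": 0, "loss": 0} for tier in tiers}
    ties = 0
    for t in treated:
        for c in control:
            result, tier = compare(t, c, margin)
            if result == "tie":
                ties += 1
            else:
                by_tier[tier][result] += 1
    wins = sum(v["win"] for v in by_tier.values())
    losses = sum(v["loss"] for v in by_tier.values())
    total = wins + losses + ties
    return {
        "by_tier": by_tier,
        "ties": ties,
        "win_ratio": wins / losses if losses else float("inf"),
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties),
        "net_benefit": (wins - losses) / total,
    }

# Made-up data: (died, number of hospitalizations, KCCQ change)
treated = [(False, 0, 12.0), (False, 1, 3.0), (True, 0, 0.0)]
control = [(False, 0, 4.0), (True, 2, 0.0), (False, 1, 1.0)]
stats = win_statistics(treated, control)
print(stats)
```

The `by_tier` table is the "tree" view: it shows at which level of the hierarchy the wins and losses were decided and how many pairs were still tied at the end. Raising `margin` moves KCCQ wins and losses into ties, which leaves the win ratio driven by the clinical tiers but pulls the win odds and net benefit toward no effect.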
Jessica Overbey: Yeah.
Um, so, agree with everything
Cora and Amy said, um, kind of leaning
into our recent experience, uh, with
European regulators, but I think
FDA also has these concerns is back
to that interim monitoring piece.
Um, I think there's some discomfort
with doing group sequential
analyses on the win ratio.
And that would be the case where
you take the immature win ratio
and make a decision. That's gonna get
pushback, uh, on both sides of the
ocean, um, for a number of reasons.
And, um,
once you've enrolled every
patient, if your final analysis
is at two years, you can sometimes
get a group sequential analysis, maybe
at one year, by demonstrating that the
distribution of wins and losses isn't
that different from one to two years.
That could happen, but you have to, you
have to be able to demonstrate that.
And then there's also, um, the piece in
terms of what's the information fraction.
There's a lot of conversation there.
Um, and then another piece is if you do
wanna go the predictive probability route
for maybe stopping enrollment early.
Um, which I think we're all a big fan of.
There is a concern that we've heard
of stopping enrollment because of a
very strong biomarker signal only.
Um, and that's not, I mean, that's
a great regulator concern, but it's also
gonna be a concern of the company.
So what we have done is, with the
predictive probability,
we'll build in that there needs to be
statistical significance and also a
direction of events for the clinical
events that's in the right direction.
Um, and that can help with that too.
Scott Berry: Hmm.
So the, the, an interesting part of
the win ratio, so in your example
where you do a three months quality of
life, but you're following patients to
two years, if everybody was at three
months, this is largely quality of life.
If everybody gets to two years, there's
a much bigger impact of mortality.
So, um.
The interpretation of it and the
value of this relatively changes
depending on the amount of exposure.
And so regulators may want to make
sure you have the amount of exposure
that makes this something that
they clinically find attractive,
it sounds like. Now some of
the things you talked about,
like predictive probabilities,
if we're doing a Goldilocks design where
at some point you say, we think we have
enough patients, we're gonna follow
them all to two years, that's
somewhat sponsor risk.
I imagine that the modeling is
going to, going to bear out
that when you get the amount of
exposure that's appropriate, this
all, this all ends up successful.
Jessica Overbey: Yeah,
I think that's fair.
Scott Berry: Okay.
Um, um,
I was gonna ask about, this is
naturally a relatively frequentist thing,
um, and it's a non-parametric thing.
Um, can we, I, and, and this
is not a religious thing that
we need to make this Bayesian.
Um, can we do things like
incorporate covariates?
Can we do, uh, that, could
we do modeling of this?
Uh, other than just purely
who had the most wins.
Amy Crawford: Can I go?
Um, I'm gonna go.
Um, yeah, so I've had some
experience with this recently.
So, um, yeah, so we talked a little bit
about is this an ordinal endpoint earlier.
Um, and there there are cases where,
you know, it's natural to use the
number of wins and the number of losses.
Um, and, and in this
round robin comparison,
let that order the patients.
And then, um, we can, we can take
that as an ordinal variable and
fit an ordinal regression model
to that.
And now, um, the ordinal
regression model, proportional
odds model, is semiparametric.
And we're in a space where we can,
we're in regression, you know, where
we, we can adjust for covariates.
Um, we can be Bayesian, we can put
priors on, on, um, on parameters.
We could do a meta-analysis.
We could have trial A's round robin
and an ordering and treatment effect,
and we could have the treatment effect
follow a common prior distribution
with trial B's, you know, round,
round robin and, and ordinal, um, you
know, odds ratio treatment effect.
And that follows a common distribution.
And, you know, so you could think of all
the, all the possibilities once you're
in a regression space to be Bayesian,
to do modeling, to get intervals
that are interpretable and nice.
Um.
And so that's, that's something
that I've been, I've been
working on a little bit lately.
Um, one other thing that I'll mention,
uh, I had it and now I lost it.
Um,
Nope.
It's still gone.
Scott Berry: Oh, okay.
Okay.
Okay.
Alright.
So
Amy Crawford: I'll chime
back in if I remember.
Scott Berry: Okay.
So let, let, let me, uh, throw this out.
Um, and maybe
closing, parting shots on this or overall
viewpoint of these things going forward?
Uh, uh, closing.
I, I don't wanna do this 'cause I don't
know whether you all want to make the
case that these are, you know, this
solves all our problems, but, uh, closing
comments on, on hierarchical endpoints,
win ratios, where we're going, Jessica.
Jessica Overbey: Sure.
Um, I think that the win ratio
can be cool and very
useful in certain settings.
It has a lot of problems.
We've talked about them, but you
know, there's no perfect endpoint
oftentimes in, in a heart failure trial.
Um, so until we come up with
that perfect endpoint, I, I say
there's a lot of different options.
They all have pros and
cons. Go in eyes wide open.
Sometimes the win ratio is a great fit.
So I tend to be pro in, you know, when
it's the best fit, it's the best fit.
Scott Berry: Mm-hmm.
Cora.
Cora Allen-Savietta: Um, I
largely agree with Jessica.
I think thinking about where
we've been and, um, where we
are now, of trials that were
just ranking patients, just putting them
in two categories, whether they're successes
or failures after two years, the win
ratio is a huge improvement from there.
Um, and even thinking about time to first
event as a comparison with win ratio,
I think the win ratio is capturing much
more, um, significance to patients.
And I think that's really,
um, a benefit in terms of, uh,
delivering treatments that are
gonna have clinical meaning
to patients and clinicians.
I would love to see some more exploration
of other composites that I think put a
little bit more of the responsibility
on the trialists, um, on the, um, on
the key stakeholders to put weights
on these components. It's hard to
put weights on these components,
but we have really nice tests.
Uh, the global test is one that
allows you to say you do univariate
analyses of each of these
components and pre-specify weights.
If mortality is the most important to
you, put the highest weight on that.
Then a medium weight on heart
failure, hospitalizations, and
then a lower weight on biomarkers.
I feel a little bit uncomfortable
with the weights being, um,
sort of left up to chance by how much,
how many heart failures and mortalities
we happen to see within a trial.
Um, I like trialists to maybe take
some responsibility for setting those
weights instead of kind of having
them be kind of hidden and implicit
because hierarchy is not weight.
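The idea of pre-specified weights on univariate analyses can be sketched as a weighted combination of component z-statistics. This is one common form of such a test, shown as an assumption-laden illustration and not necessarily the specific global test Cora has in mind: each component is analyzed on its own to give a z-statistic (positive favoring treatment), the weights are fixed in advance, and the correlation between the component statistics is either supplied or assumed to be zero. The function names and the numbers are hypothetical.

```python
import math

def weighted_global_z(z, weights, corr=None):
    """Combine component z-statistics using pre-specified weights.

    z       : per-component z-statistics, positive favoring treatment
    weights : pre-specified non-negative weights, same order as z
    corr    : correlation matrix of the z-statistics; if omitted the
              components are treated as independent (an assumption)
    """
    k = len(z)
    if corr is None:
        corr = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    # Variance of the weighted sum under the null: w' R w
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

def one_sided_p(z_value):
    """Upper-tail p-value for a standard normal statistic."""
    return 0.5 * math.erfc(z_value / math.sqrt(2))

# Made-up component results: mortality, HF hospitalization, biomarker
z_components = [2.0, 1.0, 0.5]
weights = [0.5, 0.3, 0.2]   # highest weight on mortality, lowest on the biomarker
z_global = weighted_global_z(z_components, weights)
p_global = one_sided_p(z_global)
print(round(z_global, 3), round(p_global, 4))
```

Here the relative influence of each component is set by `weights` before the trial, whereas in a win ratio it depends on how many pairs happen to be decided at each tier. Ignoring positive correlation between components would understate the variance, so a real analysis would need an estimate of `corr`.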
Scott Berry: Now that's exciting.
Yes.
Uh, Amy.
Amy Crawford: Yeah, I,
I, I feel the same way.
I think waiting to see what the weight,
the, the, the weights fall out to be
at the end of the trial is, um, I,
I, I agree with Cora and, um, it's
a, it's an interesting space, but I,
I do think, you know, I understand
how, how the, the, especially the
cardiovascular field got to this place.
You know, lots of things
matter to these patients.
There is a clinical
ordering to those things.
Um, and sometimes, like Jessica
said, it is the right thing to do, but
I, I, I think that, um, I think we're
headed in, in a direction, hopefully,
where we can do smarter things.
Um, and, and so it's useful, it, it,
it's not my favorite, but it is useful.
The interpretation piece is really
sitting down as the statistician
and, and having to say, you know,
bear with me or something like that.
I need to show you four things,
four slides in order for
this to come through cleanly.
You know, that's a challenge.
So, um,
Scott Berry: Yeah.
Uh, and I, and yep.
Sorry, and I, and I know you're all
working very hard on really nice graphics
to demonstrate all of that, so that,
that's pretty exciting work as well.
So appreciate it.
Appreciate you all coming in
here and, uh, for everybody
else, we'll see you next time.
We will be here in the interim.