In the Interim...

In this episode of "In the Interim…", Dr. Scott Berry is joined by statisticians Dr. Amy Crawford, Dr. Cora Allen-Savietta, and Dr. Jessica Overbey for a technical deep dive into hierarchical composite endpoints and the win ratio in clinical trial design. The group addresses clinical and statistical justifications for layered endpoint structures, demonstrates the mechanics of pairwise win ratio analysis, and explores operational and interpretive consequences in both conventional and adaptive trials. The panel scrutinizes analytic limitations, regulatory concerns, and emerging modeling strategies—all grounded in real-world trial examples.

Key Highlights
  • Precise definition and use case for hierarchical composite endpoints in cardiovascular and related trials.
  • Stepwise breakdown of win ratio mechanics, tie-handling, and the distinction between effect estimation (win ratio) and hypothesis testing (Finkelstein-Schoenfeld test).
  • Discussion of endpoint prevalence and dominance, risk of clinical interpretation being tied to lower-order outcomes, the role of patient exposure, and methods to parse component contributions.
  • Overview of statistical power, role of simulation, and comparative advantages over other composite approaches.
  • Identification of core limitations: interpretive complexity, opaque weighting, and mutable meaning of wins with maturing data.
  • Review of predictive probability for adaptive interim analysis and modeling using ordinal regression.
  • Overview of US and European regulatory perspectives, including support, reservations, and expectations for transparency through graphics and complementary analyses.
For more, visit us at https://www.berryconsultants.com/

Creators and Guests

Host
Scott Berry
President and a Senior Statistical Scientist at Berry Consultants, LLC

What is In the Interim...?

A podcast on statistical science and clinical trials.

Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: All right.

Welcome everybody.

Back to In the Interim. I'm
your host, Scott Berry, and

I have three guests today.

Um, a hierarchical set of guests today.

Uh, and that, that gets to our topics.

So let me introduce our guests
and then I will get to our

topic, which is hierarchical
endpoints, win ratios.

Uh, what is this?

But let's, let's introduce the guests
today, and let me start with Dr.

Amy Crawford and I'm going to learn from,
uh, my Sean Cassidy podcast that I did

where he said he was so impressed when
the doctor said, tell me your story.

So Amy, tell me your story.

Amy Crawford: Oh boy.

Um, and I'm at the top of the hierarchy.

Um, so, um, I am, I'm a statistician.

I work with all of these
beautiful people, um, and I, uh.

I did my undergraduate
degree at North Dakota State.

I'm a Midwesterner, um, my
graduate degree at Iowa State.

Um, moved to Texas, uh,
enjoying the heat down here.

Um, and at Berry Consultants I have been
working on a lot of clinical trials, uh,

particularly in the cardiovascular space.

Um, some stroke that deal with, um,
wanting to look at composite endpoints,

hierarchical composite endpoints.

And so, um.

Had a few, had a few places where I've dug
deep into properties of these things and,

and all the nuances, and so I'm happy to
join the, the hierarchical composite crew

with, uh, Jessica and Cora here today.

Scott Berry: Wonderful, uh, Cora, Dr.

Cora Allen-Savietta, what is your story?

Cora Allen-Savietta: Hi.

Thanks for having me.

Um, so I'm also a Midwestern girl.

I come from an academic
family of mostly biologists.

But, um, my grandfather and my
father are, uh, both statisticians.

So I grew up asking a lot of questions.

My parents saying, well,
what's your hypothesis?

It's not gonna be biased.

And those parts of the discussions
were always the most exciting to me.

I came to statistics from, I think,
a storytelling kind of perspective.

I started as a psychology
major, took my first statistics

class as part of a psychology
requirement and just fell in love.

Um, I accidentally, uh, pulled
an all-nighter, um, just
the first time I got access

to a statistical software.

So I just fell in love immediately
and then went and got my PhD at the

University of Wisconsin-Madison.

Um, and then, uh, there I worked on
questions around, um, genetics: can

we create phylogenetic trees that

tell the story of how a particular people
or animal has, uh, evolved over time?

And now I'm doing
something very different.

Um, here at Berry Consultants.

Uh, I design clinical trials, um, a
broad range of different kinds of trials,

platform trials, uh, heart failure
trials, as we'll be talking about today.

Also, some neurological,
uh, disease trials.

So excited to be here and, uh,
thanks for, thanks for joining.

Scott Berry: Does that, and now
you have to, uh, help me with the

terminology, but does that mean you were
doom scrolling statistical software?

Cora Allen-Savietta: I don't think that,
uh, you could call it doom scrolling.

It didn't have endless scroll.

I will say it wasn't the most exciting,
uh, statistical analysis tool.

It was just SPSS.

I don't know if any of you have
had the pleasure of working with

SPSS, but as someone who had never
had access to even R before I was.

Just completely, uh, amazed by
the way that it could take a set

of numbers and turn it into a
story that you could tell people.

Um, and so that I think, ties in nicely
today to how do we take a set of numbers

and then parse them into something
that actually tells a clinical story.

Scott Berry: I am gonna
go with doom scrolling.

Okay.

Cora Allen-Savietta: Okay.

Scott Berry: Dr.

Jessica Overbey, tell us your story.

Jessica Overbey: You know, I'm last, so
I, I should have come up with something.

Uh, something witty.

I had the most time.

So let me do, let me try.

Um, so I did my undergrad at
UNC. I was a biology major.

Um, and I realized quickly that I
could not go to medical school because

the sight of blood makes me ill.

Um, it wasn't gonna be the right fit.

So I saw a sign on the bathroom
wall that's like, do you wanna

be a biostatistics major?

And I was like, what's that?

Um, so I applied and I got in
and I was like, let's do this.

Um, and quickly, similar to Cora, kind
of fell in love, found my niche, um.

I worked in a computational bio lab for
a while, and that's where I got into

coding and, and, uh, it, it just, uh, felt
like that was the place I needed to be.

Also because in the biology lab
I'm, I'm also very prone to

breaking glassware, so being at a
computer is the right fit for me.

Scott Berry: Hmm.

Jessica Overbey: Um, and so from
there I did, uh, a master's in

biostatistics at Columbia in New
York, and I moved to Mount Sinai right

after as a master level statistician.

Um, and I was at Mount
Sinai for over a decade.

I, I did my PhD, uh, a few years
after I started working full-time

and I, I went back to Columbia,
but kept my job at Sinai.

Um, and then I joined, once I
finished my doctorate, I, I became

faculty at Sinai, which was great.

And, um, while I was at Sinai, I
worked in a clinical trials unit.

Um, I spent a lot of time working in the
cardiothoracic surgical trials network,

so that's really where I got my feet wet.

And, um, became kinda a
cardiovascular-trial-focused person.

Um, so when I moved to Berry, I brought
a lot of that cardiovascular, uh,

trial, uh, experience with me.

And so I, I try to spend a
lot of time in that space.

Um, I'm very interested in heart
failure, so, um, that's where I am now,

and very glad to be at Berry doing
some more innovative things in that

space, and happy to be here today.

Scott Berry: Very nice and, and,
uh, I, I don't know if we go back

to the beginning of the win ratio,
but it feels like cardiovascular was, was

somewhere in the beginning of this.

And, and so I think it's a,
it's a part of the history.

And Amy, I know you've been
thinking of analogies or, or, or

what is the win ratio and, um.

So first of all, we should let our
audience know, uh uh, what is a win ratio

or a hierarchical composite analysis.

Amy Crawford: Sure.

Uh, yeah, I'll start and others
should feel free to jump in.

Um, so, hierarchical composites.

So when we're thinking about how to
measure, you know, how a

patient feels, functions, or survives,
which is important in the clinical

trial space, that's the point, right?

Um.

There are, there are different
endpoints that we look at.

So if we stay in the cardiovascular
world, you know, you could

ask, um, do patients, uh,
on treatment tend to live longer?

Do they tend to, um, you know,
have better, uh, quality of

life after their treatment?

Things like this.

And the, and the idea is that there
are a lot of different things that

go into measuring how a patient
feels, functions, and survives.

And what we're talking about today with
composite endpoints is combining, um, a

number of those, of those things together.

Components of, of how a patient, um,
does during or after treatment,
and comparing that to control.

When we talk about hierarchical
composite endpoints, we're talking

about ranking things in order
of, uh, clinical importance to a

patient and, and the community.

So you could imagine a heart
failure patient, um, has a

procedure in a clinical trial.

Um, and then we follow them and we,
we, we watch to see are, are they

alive at the end of a follow-up period?

Um.

How many times have they returned
to the hospital, uh, for heart

failure events and, um, you know,
what's their quality of life like?

And we can rank those in order and
say, you know, if they're alive or not.

That's the most important thing.

And only if, um, they're alive

do we really wanna look at, um,
the next layer in the, in the

composite, which is, say, heart
failure hospitalizations.

How many times have they
gone back to the hospital?

And only then, maybe, you know, um,
do we wanna move past that

and look at quality of life as, like,
the least important, um, outcome.

And so, um.

The hierarchical composite takes these
components and it, and it puts them

in this order that's generally agreed
upon by, by a clinical audience.

There's, um, a process for
comparing people on this endpoint,

which gets a little bit awkward,
but I'll, I'll pause there.

Um, was that kind of what you
were looking for, Scott, or does

anybody else have anything to add?

Scott Berry: No, I, I, I
think that makes sense.

And so, uh, we're, we're
interested in quality of life

in that, in your scenario, if you're alive
and if you've avoided hospitalization.

Uh, you know, hospitalizations for
heart failure are, are particularly

bad. If we've avoided that, we're
interested in quality of life.

Uh, but if, if death happens,
that's sort of the first thing.
So we've got these three things.

If we went in and said the primary
endpoint is only mortality, we

probably need huge sample sizes,
and it doesn't completely capture

how a patient feels, functions, and
survives, the functions part.

Um, if we only did heart failure
hospitalizations, we, we have

some of the same issues.

We also have to incorporate mortality
into that, and it doesn't incorporate,

um, any functional part beyond,
um, hospitalizations. If we did

only quality of life, now we're
ignoring mortality.

We're ignoring hospitalization.

So in, in three endpoints like that,
we may come back to this, and I know

there are many examples of this, and
we may talk about different examples

of this, but that's the general idea:

we'd like to include all of these
endpoints in a primary, because any

one of them alone seems insufficient.

So, Cora, how do we analyze such a thing?

If, uh, you know, I, I
want mortality to count.

I want hospitalizations to count.

I want quality of life to count only
if they, they survive kind of thing.

So how do you analyze such a thing?

Is it an ordinal endpoint,
or what do we do with it?

Cora Allen-Savietta: You're
setting us up so well.

Um, so to analyze this, um, Amy mentioned
that it gets a little bit technical and

complex, uh, once we're trying to analyze
people, but I think you set us up nicely.

So first we're going to take all the
patients in the treatment group, all

the patients in the control group.

And I'm gonna talk about the win
ratio here just for simplicity.

And we're gonna compare each patient
in the treatment group to every

patient in the control group.

And we're gonna count how many
times did that treated patient win.

Um, and then we'll do that for
all of the treated patients.

And so we have all of
the pairwise comparisons.

It's as if we took a big group of
people and we said, you're all gonna

play singles tennis matches, and then
we're gonna count how many times each

of you won and how many
times each of you lost.

And now we take the number of wins and
the number of losses for each patient.

We sum up that for the treatment group.

We sum that up for the control group.

And the win ratio is the ratio
of wins in the treatment group to

the wins in the control group. Or
you can use the win odds,

um, which is typically, uh, preferred
now, um, where you take half of the

ties and you add them to the top and
half of the ties are added to the bottom.

That's a side note.

Basically, basically you get a ratio
of the, uh, relative effectiveness

of the treatment versus control.

Does that answer your question?

Scott Berry: Uh, it, it does, but let,
let's, um, let's make sure this is clear.

The, the tennis match.

So, uh, and by the way, we're, we're,
we're a two by two table here on

the screen, uh, of the four of us.

So Jessica and I are on the right,
so we're in one group, uh, Cora

and Amy in another group. When
Cora is in one treatment group and

I'm in another, and you say there's
a tennis match, but how do you compare?

Who wins between the two of us
and how does that incorporate

the hierarchical component?

What?

Cora Allen-Savietta: Oh yeah.

That's great.

Yeah.

Let's get into that.

So if I was being compared against you,
Scott: you were in the control group,

I was in the treated group.
I missed that.

Let's, let's say that.

So I, okay.

Um, so first we're gonna
compare on mortality.

So did either of us die during
the treatment period, um, that

we have shared between us?

Let's say we both have the
same follow-up time, so we both

were followed for two years.

Did either of us die in those two years?

Let's say neither of us did.

If that's the case, then we're gonna
ask, um, we're gonna go down to

heart failure hospitalizations.

Were either of us hospitalized for a
heart failure event, um, or maybe a

heart failure medication, uh, diuretics
were needed, something like that.

Um, if the answer is no to that.

Then we'll have a tie on both mortality
and heart failure events, and we're

gonna then go to the next level.

So it really only, you only come
down to those lower levels if you

have ties on the higher levels.

So now we're at a quality of life
measure, and here's where we're

gonna break a lot of those ties.

So let's say that I'm able to, um,
walk a longer distance than you,

maybe that's the quality of
life measure, and it's

a meaningful enough difference
that we're gonna

call it significantly different.

Maybe I can walk 60 meters in the,
uh, amount of time, uh,

and you can walk 10 meters.

Um, so we'll say that I then won this
comparison of me versus you, and I would

get one added to my count of wins.
Scott Berry: Okay, so the hierarchical
components are how we play the game

against each other, each treatment to
each control, and at the end of the day,

we're looking at how, how many times
the treatment beats a control patient.

Largely, and we get the statistics
right for understanding the

variability of it, and we're looking
for perhaps is there a statistically

significantly higher number of wins
from a treatment to a, to the control.

Now, a win could be on mortality, that
one survived longer than the other.

Neither of them could have
died, or they both could

have, and that could be a tie.

And then you move down the
level and you keep looking to

see if there's a, a win or not.

And it is possible, people tie
throughout and they end up as a tie.

Um, this is like the NHL.

There's, there's lots of ties and
forms of ties, but there, there, there

could be, uh, ties as part of that.

Now.

When we do a primary
analysis, that's a win ratio.

And I guess I, I, I will throw
it to Jessica, but I want

everybody to, to, to jump in here.

Now, I, I, I guess I should come clean
here, um, and this is, I'm not a fan

of the win ratio. And, um, I have a,
a love-hate relationship with the

win ratio, so I, I should be clear
to our audience out there.

To our audience out there.

Now, I've used the win ratio and I've
used hierarchical composites, and they

have benefits.

They have some things that I, I find
peculiar or less beneficial to that.

So let's sort of dive into all of it.

When we do a primary analysis on...

Cora Allen-Savietta:
I'm not sure I heard it.

I think you're talking
maybe about the final analysis.

Scott Berry: Yeah, the final
analysis, and, and I, I, I, I

understand statistically I can say
one group's better than the other one.

How do we estimate or provide
an estimate based on this test?

And, and, and I'll throw it.

This is gonna be hard because I know you
all want to answer all of these questions.

Um, and so, Jessica, how do I,
what, what does the, what does the

final sort of estimand look like?

What does the final analysis look like?

Jessica Overbey: Yeah, so, um.

For the final analysis, we use the
Finkelstein-Schoenfeld test, and

that gets us that p-value.

Um, but that doesn't have a natural
treatment effect next to it.

So that's how the win ratio
came about, to complement the

Finkelstein-Schoenfeld test.

So the p value tells us on average
treated patients are having

better outcomes than control.

Um, and we go to the win ratio
to get our treatment effect.

So that tells us, um, it's
the ratio of wins to losses.

If that's one, that means
there's no difference;

the groups are doing the same.
If it's greater than one,

we know that on average, treated
patients are doing better than controls.

And I think the official estimate is:

say you pull out a random treated
patient and a random control.

This is the, this is the ratio of how
often the treated patient will do

better than the control. That's
the official estimate language.

Um, but you know, in isolation,
I think what you're gonna get

at is, is that interpretable.

What does it mean to win?

Especially in the
context of the hierarchy.

If you just look at the win
ratio, you don't exactly know.

Where in the hierarchy these
wins and losses got decided.

So there's an
interpretability issue there.

So usually right next to the win
ratio, there'll be this tree, um,

that shows you the breakdown of, of
at what level each win and loss

occurred.

So at the top, if we're doing mortality,
we'll know this percent of pairs won for

treated, this percent won for control.

And then you'll have the number of ties.

And then of those ties, you'll go
down to the next level and say, this

is how many won on heart failure.

This is how many lost.

Um, and so the tree lets you see
exactly what proportion of pairs are

decided by each endpoint and then, within
each endpoint, what's the difference?

And you really need that tree,
um, to complement the treatment

estimate, which I think is unique to
the win ratio, to have, to have all these

supplementary analyses to understand it.
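The "tree" described here is just bookkeeping over the same pairwise comparisons: for each pair, record which level of the hierarchy decided it, or that it stayed tied. A minimal sketch, with invented field names and a toy encoding where lower values are better at every level (0 = alive, event counts, a reversed quality-of-life score):

```python
from collections import Counter

LEVELS = ["mortality", "hf_hospitalization", "quality_of_life"]

def decide(treated, control):
    """Return (level, winner) for one pair, or ("tie", None) if nothing splits it."""
    for level in LEVELS:                    # walk down the hierarchy in order
        t, c = treated[level], control[level]
        if t != c:                          # first level that differs decides the pair
            return level, ("treated" if t < c else "control")
    return "tie", None

def win_tree(treatment, control):
    """Tally how many pairs each endpoint decides, and for whom."""
    tally = Counter()
    for t in treatment:
        for c in control:
            tally[decide(t, c)] += 1
    return tally
```

The resulting tally is exactly the per-level breakdown the tree displays: the share of pairs decided at each endpoint, split by who won, plus the residual ties.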

Scott Berry: So they, the, the
actual number summary might be this

proportion of, uh, or, or is it the
odds of a win for the treatment group?

We talk about 1.2, uh, or something
like that, as the, the 20% more wins

for the treatment group
relative to control.

And I understand it's hard
to explain what that means.

'cause it could have been on mortality,
it could have been on the, the second,

it could have been on the third.

What, what is the number
summary that we get out of that?

Or is that just not important?

Jessica Overbey: I think
it's the win ratio, right?

That's your, that's your
core treatment effect.

And, um, the most popular
is the win ratio.

Cora alluded to the win odds.

There's been a call to
use the win difference.

So I think you can slice up these, these
proportions a lot of different ways.

Cora Allen-Savietta: Yeah.

And I will just note there that the win
odds is typically recommended in a lot of

these trials that we're seeing coming out.

Now, we'll see, even at the final
analysis, up to 20%, maybe more, ties.

Um, and in cases where there are
a lot of ties, the win ratio can

overstate the treatment effect, whereas
the win odds is gonna give, um, I

think a more balanced, uh, sense,
um, really incorporating those ties.

Half the ties go to the top,
half the ties go to the bottom.
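The tie rule is easy to state numerically. A toy sketch (the counts below are invented): the win ratio drops ties entirely, while the win odds credits half of each tie to both sides, pulling the estimate toward 1 when ties are common.

```python
def win_ratio(wins, losses):
    # Ties are simply ignored by the win ratio.
    return wins / losses

def win_odds(wins, losses, ties):
    # Half the ties go to the top, half the ties go to the bottom.
    return (wins + 0.5 * ties) / (losses + 0.5 * ties)

# With no ties the two summaries agree; with many ties they diverge:
# win_ratio(300, 200)      -> 1.5
# win_odds(300, 200, 125)  -> ~1.38 (125 ties out of 625 pairs, i.e. 20%)
```

With 20% of pairs tied, the win odds sits noticeably closer to 1 than the win ratio, which is the "more balanced sense" described above.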

Scott Berry: Okay.

Now, if, if we're gonna build the, part
of what's hard about this question is,

what are we comparing to?

But, um, I, I get the attraction that,
uh, in a scenario we're worried about

running a trial in mortality or mortality
plus heart failure events that we may

need very, very large sample sizes and.

Quality of life, functioning of those,
uh, patients matters, uh, in this.

And so what, how does power come
out in this if you're working with

a client to build a trial, a, a win
ratio hierarchical composite trial.

How does power tend to come out in this?

Is this a, a net positive?

And I know it depends on compared
to what, but generally how are

those comparisons falling out?

Jessica Overbey: If you're talking about
doing a win ratio that incorporates

these lower level quality of life
endpoints versus of course just

doing your more standard mortality.

And clinical events only.

The win ratio is usually
gonna be more powerful.

If you're talking about taking the
components of the win ratio and

forcing them into another composite,
it's not always the case that

the win ratio is more powerful.

Um, so for example, if you were,
like, to do an aggregate

global z-statistic of each component,
that might end up being more powerful,
but without assigning weights, you,
you run into the issue there of the

clinical severity isn't being taken into
account, which I think will open you

up to criticism, um, and is where the
win ratio really shines, because you are

able to take that severity into account.

Scott Berry: Okay.

The win ratio takes severity into account
by ordering them, but the relative weight

that the third component plays relative
to the first or second is somewhat opaque.

Unspecified,

Cora Allen-Savietta: Unpredictable.

Scott Berry: uh, yeah.

Yeah.

Cora Allen-Savietta: Yeah.

Scott Berry: Yep.

And, and maybe, and I know Amy,
you've been working on a great deal

of trying to measure the contribution
of each individual endpoint.

Um, and, and presumably if nobody
dies, that endpoint has no impact.

If 80% of the patients die, all of a
sudden, that becomes a super impactful.

So how do we understand the contribution
of each of the pieces in the hierarchy?

Amy Crawford: Yeah, this is,
this is interesting, right?

Um, so like Jessica mentioned
earlier, you can look at the tree,

um, where the percent of pairwise
decisions made at each level.

And that's a great first step of
kind of understanding, um, the,

the number of decisions that
are made in this round robin.

Uh, by each endpoint, but really,
uh, just because you've ordered

them in a particular way doesn't
mean they're contributing to your

analysis, um, in that order, right?

It depends on the prevalence.

Like you said, Scott, if nobody dies,
you're not going to break any ties.

You're not going to make
any decisions on that first,

uh, level of the hierarchy.

And so the weight that that
endpoint brings to the analysis is

going to be zero or, or very low.

And as therapies get better and better,
I think we're going to see the weight

of, um, like mortality become less
and less in these analyses, um, right

as patients are, are, are surviving.

And so part of what the work I'm
doing gets at is, um, interpreting

the weights. The tree is
a great place to start.

But there's some nuances that happen
where like, let's say Cora and Scott

are, are being compared head to head.

And, um, the comparison
is broken on mortality.

Let's say Scott, Scott dies during
the follow-up period, Cora lives,

that's a, that's a pair that's, um,
decided by the mortality endpoint,

the highest level in the hierarchy.

Now you have taken away the opportunity
for a decision on heart failure events,

hospitalizations, and quality of life.

And so, um, there's, there are some
nuances where, when, when these

endpoints in these comparisons actually
go into the statistical test, um,

decisions on higher endpoints in the
hierarchy actually take away opportunities

for decisions on lower endpoints.

And so the number of decisions, right?

It's not, it's not really an equal
comparison because the lower endpoints,

they, they may have had that opportunity
taken away by the higher endpoints.

And so, as the statistician, I'm
thinking, you know, contribution

to explaining variance, right?

The, if you're breaking ties at the higher
endpoints, you're explaining a lot more

variance in your test where, where the
variance is coming from in your test

and you're taking away the opportunity,
um, from the lower level, say, quality

of life endpoints in the hierarchy.

So, um, it's a, the tree is a
great, great place to start, and

we're thinking about ways to
measure, um, measure this

in, in that context as well.

One other thing that I'll say is, um,
you know, when we talk about interpreting

the weights of these things, we're,
we are not able to, the, the test

doesn't capture how much you won by.

So, um, it's not that you can't say, oh,
okay, well, mortality broke a lot of ties.

Um, we made a lot of
decisions on mortality.

Um, it's that we're not able to then
go through and say, um, Cora lived

three years longer than Scott.

So there's not a weight of sort of
treatment effect amount even within that.

So there's kind of all these layers to it.

Um, yeah.

Scott Berry: So, so in a way it's a, I
mean, it's a very non-parametric test

in that sense that there is no relative
hazard ratio that comes out of this.

There's no, uh, event ratio for
heart failure hospitalizations,

where 10 is much worse than one.

It's a loss is a loss.

Um, uh, and of course you, you know
that I do a lot of things in sports.

It's sort of saying who won and lost
the game, and you ignore the score.

Uh, as part of that, the other thing
that seems to be part of this, and,

and one of the, the issues, is putting
things together that have very

different, uh, clinical meaning.

So it's not unusual that in
cardiovascular trials we do death,

we do, um, a, a non-fatal MI
or, or, or something like that.

And then the third level is pro-BNP,
which, those are very different things.

Now, I think sponsors want to
put that in there because it's,

let's call it a biomarker.

And I know there's a lot of research

going into what that marker means and,
and what other markers mean there, but

it's something that, that many
comparisons may go to as the tiebreaker.

And so when my overall test wins and
most of my things are broken by that,

that's really good for the sponsor
that affects that biomarker, but it

still leaves open the interpretation of,
are we having positive clinical effects?

And, and so this is, and I know
it's not for us to say, but when,

when we're helping to build these
things, that's going to be something

where presumably the consumers of
this, regulators, may not accept it

if the thing at the end really
isn't very clinically important and

90% of ties are broken at that point,

so that the contribution
of those is so high.

Jessica Overbey: I think you're
right, and actually, as we've done

more win ratio trials, some of the
more recent trials published are

only having wins on that
lower-order endpoint.

So I can think about the Luminate
trial, which I think was a

three-level composite of death, some sort
of clinical event, and quality of life,

and it was very clear that it really
came down to quality of life only.

So the buy-in of the clinical
community might not be as high there.

So I think when you're going
to design a trial, I agree,

I think we always try to show simulated
trials where that ends up being the

case and saying, what if this happens?

Are you comfortable?

Is this, is this really the
route you wanna go?

That's really important.

Scott Berry: Okay.

Um, um, in this now, we at Berry,
we do a lot of adaptive designs.

And adaptive designs can be
an important part of this.

It seems like adaptive designs
with this endpoint can have

multiple tricky things to it.

Um, and, um, well, let me
turn it over to you all,

'cause I know there's a number of tricky

things when doing an adaptive design.

Uh, so who wants to, to do this first?

And this may be, everybody weighs
in on this part, and I know there's,

there's parts to quantities of interest.

There's parts to the
relative weights of things.

So what are the challenges of
doing adaptive designs with a

hierarchical composite win ratio?

Cora Allen-Savietta: Jessica,
do you wanna start us off?

Jessica Overbey: Uh, I didn't,
I didn't wanna go right after I

already spoke, but, um, I've been
thinking about this a lot lately.

So, yeah, a great question.

Important question.

Um.

It is hard to do interim monitoring
on these hierarchical endpoints

because when you do an interim,
you've got partial data on patients.

And if you think about that pairwise
comparison at the patient level,

that's gonna change over time,
and that's gonna be very sensitive

to how much follow up they have.

So if you have an earlier biomarker
endpoint at say, three months, and

then maybe you look at the data.

And at that interim, a lot
of patients maybe have less

than six months of follow up.

You are gonna find that the win ratio
is being really driven by the KCCQ,

whereas if the final follow up isn't
for two years, you're gonna have,

you know, you expect, uh, the higher
order endpoints, the clinical

events will eventually start, um,
taking up more of the hierarchy.

So when you do an, an...

Scott Berry: So, so let me make
sure, sorry to interrupt, but

lemme make sure. So KCCQ
is measured at three months.

Say, uh, and, and this
is a quality of life.

And so when you do an interim, relatively
early in the length of exposure, and you

look at the win ratio as it's designed.

Many people haven't died yet.

They haven't had bad clinical
events, and so almost everybody's

being determined on this KCCQ.

But when the trial reaches a longer
level of exposure and now patients

are getting two years of exposure,
all of a sudden the other things

are now changing the win ratio.

And so that, that's the issue that you're
concerned with in doing adaptive designs.

Jessica Overbey: Yeah, I call
it the immature win ratio.

At the interim, it's not really
reflecting what it's gonna be at the final.

And that does create issues on
making decisions that day based

on that raw, uh, win ratio.

Okay.

Scott Berry: Okay.

Um, does that mean we don't do interims?

Can we, can we model these?

Can we do things like predictive
probabilities of success?

Um, or is this something
we shouldn't mess with?

Jessica Overbey: Um, well,
obviously we can do interims.

I'll let, I'll let another person take this.
Cora, do you wanna take this one?

Cora Allen-Savietta: Yeah,
I'm happy to jump in.

I mean, I think we touched on this
earlier, that the win ratio, part of

the reason we choose it is for the, um,
higher power with a smaller sample
size, shorter follow-up.

So, uh, along with that, we often
have reason to, um, do interim analyses.

If we're seeing really strong treatment
effects early on, we wanna get this,

uh, this treatment to regulators and
then to patients as quickly as possible.

Or, you know, we also want to be
able to call futility on trials

that that might not be as as useful.

So there's strong reason
to do interims here.

Um, even if there are some challenges.

And one of the solutions that we've used
here at Berry pretty frequently, with some
success, is the predictive probability.

So here this is, um, maybe
I'll describe what a predictive

probability is, uh, right off the bat.

So that is, um, taking the data that we
have at a particular point in time and
predicting forward: how likely are we
to see a win on this trial?

That's our predictive
probability of success.

Um, we could ask with the current
sample size, what's our probability

of success given the treatment
effects that we're seeing right now?

Um, we could ask at the maximum sample
size, what's our probability of success?

So typically we might think about
doing that for a single endpoint.

So we could maybe start simple and say, what if
we look at the deaths that have

occurred already, um, on the
treated group and the control group.

Maybe it's just one or two on each side.

Um, what's our probability of
winning on the trial, um, based

on just that, um, hazard ratio.

So that would be a single-outcome
predictive probability.

But now we want to be able to ask: what's
the distribution of treatment
effects that we're seeing on mortality?

Then separately, what's the treatment
effects that we're seeing on heart

failure hospitalizations, and what are
the treatment effects that we're seeing

on KCCQ and for patients that don't have
that mature data, with the longer follow

up, we're gonna simulate their data
all the way out to the end, and we're

going to do that thousands of times.

Over those thousands of different
trials that are composed of partially
simulated data and partially observed
data, we're gonna create thousands of
different potential outcomes.

And of those thousands of data sets, what
proportion end up with a win and what
proportion end up as a loss, using the
win ratio, or actually the Finkelstein-
Schoenfeld and its corresponding p-value?

Um, how, how likely are we to win?
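The simulation step Cora describes can be sketched in miniature. The following is a toy illustration, not the actual model used in any trial: it keeps the decided pairwise comparisons fixed, resolves the still-immature pairs at random under assumed win/loss probabilities, and uses a simple normal approximation on decided pairs as a crude stand-in for the full Finkelstein-Schoenfeld test. All function and parameter names are hypothetical.

```python
import math
import random

def predictive_probability(obs_wins, obs_losses, pending_pairs,
                           p_win_new, p_loss_new, n_sims=5000, seed=1):
    """Toy predictive probability of success.

    obs_wins / obs_losses: pairwise comparisons already decided at the interim.
    pending_pairs: comparisons still undecided because follow-up is immature.
    p_win_new / p_loss_new: assumed chances a pending pair resolves as a
    treatment win or loss once follow-up matures (remainder stay ties).
    """
    random.seed(seed)
    successes = 0
    for _ in range(n_sims):
        w, l = obs_wins, obs_losses
        for _ in range(pending_pairs):
            u = random.random()
            if u < p_win_new:
                w += 1
            elif u < p_win_new + p_loss_new:
                l += 1  # otherwise the pair stays a tie
        # crude normal approximation on decided pairs, in place of the FS test
        z = (w - l) / math.sqrt(w + l)
        if z > 1.96:
            successes += 1
    return successes / n_sims
```

In practice the "resolve the pending pairs" step would come from modeling each component (mortality, hospitalizations, KCCQ) separately, as Cora describes, rather than from fixed probabilities.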

So that incorporates the idea of, um,
this kind of immature win ratio.

We're not relying on that immature win
ratio and its p-value that's more heavily
weighted toward those lower-order
quality-of-life comparisons.

We're not relying on that.

At the interim, we're actually using
a more sophisticated model that allows

us to think about how many deaths, how
many heart failure hospitalizations

are we gonna have over time as they ac.

Scott Berry: Okay, so let me see if I,
I, so early on, if you just calculated,
uh, Jessica's immature win ratio.

That's not necessarily a good estimate
of what it's gonna be in two years,

because we know we're gonna get more
mortality and heart failure events,

and it's not gonna be only this.

So you're modeling the effect of the
treatment on the individual components

to forecast two years from now.

What is this gonna look like?

So for example, if we said we're gonna
run this for 20 years, we might have
almost all of them be mortality events,
and it's broken there, so
you're modeling the effect there.

So that can be done to make any
decisions, uh, during an adaptive

design, futility, uh, uh, stopping
enrollment, whatever it is.

That, that's a really good way
to understand what a less immature
win ratio could be.

Is that a fair summary?

Cora Allen-Savietta: I think so.

Scott Berry: Okay.

Okay.

Alright.

Um.

What about, uh, I, I, okay, I won't ask
the regulator question, but that's coming.

So think about that.

The, the regulator question's coming.

Um, um, it, it, what is frustrating,
and maybe I'm going sideways,

but I do go sideways a lot.

Um, what, what could, is this,
Amy, is this an ordinal endpoint?

Could I just ignore this whole thing
and create an ordinal endpoint instead?

Amy Crawford: I mean, um,
yeah, you could, there are,

Scott Berry: Okay.

Amy Crawford: you could do
whatever you want, Scott.

No.

Um, so yeah, there are a lot of cases
where, you know, um, this does just

collapse to an ordinal endpoint.

Um, you can imagine, say we're running
a study with a treatment period of two years.

Everybody's followed for two years.

At the end of two years, I don't really
need to do any pairwise comparison of
patients to tell you, um, the order in
which patients fell out, in terms of
how well they did in the trial, right?

If everybody's got two years of follow
up, um, and, and I died first in the

trial, I had the worst outcome, right?

And you can kind of create a ruler that's
independent of any pairwise comparison

and you can just order patients and,
and then you have an ordinal outcome.

And there are

Scott Berry: So largely, you compared to
everybody, you lose to everybody.

Amy Crawford: I lose to

Scott Berry: sort of like, so you're the
worst and somebody beats everybody and

you could almost rank them, uh, in that.

Um, okay.

Amy Crawford: Yeah.

And so you can imagine, you know, if
not all of us have two years of follow
up, if, say, I have one year of follow
up and Jessica has two, what happens
in the pairwise?

This is kind of one of the nuances.

What happens in this win ratio?

Uh, the head to head is we actually take
Jessica's two years of follow up and we

scale it back and we say what happened
to Jessica in one year in the trial

and compare Jessica to Amy at one year.

Right.

And, and so we have all of this
differential follow-up, and, and

that's, this is kind of why the FS
test was, was designed is, is so that

you can, you can scale back to the
lowest common denominator of follow-up

time and make comparisons in that way.
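The head-to-head rule Amy walks through can be sketched as a small function. This is an illustrative stand-in, not any trial's actual algorithm: field names, the two-tier hierarchy, and the 5-point KCCQ threshold are all assumptions for the example.

```python
def compare(a, b, kccq_threshold=5):
    """Return 'win' if patient a beats patient b, 'loss', or 'tie'."""
    horizon = min(a["followup"], b["followup"])  # shared (minimum) follow-up

    def event_time(p, key):
        # an event only counts if it happened within the shared horizon
        t = p[key]
        return t if t is not None and t <= horizon else None

    # walk down the hierarchy: death first, then HF hospitalization
    for key in ("death_time", "hf_time"):
        ta, tb = event_time(a, key), event_time(b, key)
        if ta != tb:
            if ta is None:
                return "win"   # only b had the event
            if tb is None:
                return "loss"  # only a had the event
            return "win" if ta > tb else "loss"  # later event is better

    # final tier: KCCQ change, a win only beyond a clinical threshold
    diff = a["kccq_change"] - b["kccq_change"]
    if diff >= kccq_threshold:
        return "win"
    if diff <= -kccq_threshold:
        return "loss"
    return "tie"
```

Note how the `horizon` line is exactly the "scale back to the lowest common denominator of follow-up" step, and why chaining comparisons at different horizons can break transitivity.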

And so you can imagine that
we lose some transitivity.

Um, when we do the pairwise, the round
robin, because I'm being compared
to Jessica at one year, but Jessica's
being compared to Cora at two years.

Um, right.

And so that kinda breaks the
independent ordering, um, which you

may want in an ordinal endpoint.

Um, for an ordinal model, but, um, but
for the most part, yeah, these are,

these are essentially ordinal endpoints.

Um, and, and depending on some assumptions
you make going in, you can create, you can

use them to create, um, an, an ordering
of the patients in the trial and, and,

um, and, and use that as an ordinal

Scott Berry: Okay, so, so sometimes you
do have a lack of transitivity, which is
sort of weird: you beat me, I beat
Jessica, but Jessica beats you.

And that can happen in these tests,
which is sort of awkward in the whole
interpretation, you know, that it
doesn't hold in that scenario.

Um, and so that can be awkward
in this part of it.

Okay.

Now I know all of you have gone to
regulators with win ratio tests, and

the worst question I can ask you is
what do regulators think of win ratios?

Because regulators are diverse.

Not every win ratio is the same.

There are some where the components may
all be clinically similar; there may be
some that are widely different.

Um, what are some of the concerns that
regulators have had in, in designs where

you've had win ratios and I, I, I guess
we can throw this to, to all of you.

Uh, Amy, what, what concerns have regulators had?

Amy Crawford: Yeah, we've talked a
lot about, um, well there's, there's

a few competing perspectives, right?

There's the clinical, um, which
is a lot of discussion about, um,
endpoints, ordering of endpoints, what
constitutes a heart failure event.

Um, things that are, things that are,
you know, defining the hierarchy.

Um, from a statistical perspective,
there are questions of, you know,

how, how are you analyzing it?

Um, we talk a lot about ties,
um, and I think this was

alluded to earlier, but, um.

What do you do with ties?

Um, what, what makes a tie?

So on the quality of life endpoint
at the end of the hierarchy.

Um, I think Jessica mentioned
earlier, a lot of times you
only want to declare a winner
if there's a, you know, clinically
meaningful difference between patients
in quality of life, and what is,
what is clinically meaningful?

It's a kind of a clinical question, but
it's also a statistical question because

the more ties you have, the fewer, you
know, um, wins and losses you have and,

and that kind of can take your power down.

So, um, lots of competing
interests, but I think generally, in
my experience, regulators have, um,
sort of accepted this after discussion,
you know, with a reasonable approach, um, yeah.

But I'm interested to hear about Cora
and Jessica's experience, um, with
the regulatory side.

Scott Berry: Cora.

Cora Allen-Savietta: Yeah, so I'll
echo what Amy said.

I think, um, after discussion on
important things like what's the

tiebreaker, um, thresholds, how much
better do I have to be on KCCQ compared

to Scott to be considered a win?

If we're just one point different on
KCCQ, then we're not going to, uh, call
that a win. For example, uh, maybe we need
a five-point buffer, a 10-point buffer.

Um, or of that caliber.

Uh, those pieces are important, I think.

Um, something else that we often talk
about with regulators is what's the package

of supportive evidence that we're
gonna provide along with the win ratio?

So, um, we're gonna give the Finkelstein-
Schoenfeld p-value. We're also gonna
give the win ratio, the win odds, the
win difference, maybe even a net
benefit measure.

All of these are really
nice, complementary, what I
like to call win statistics.

Um, and each of them has
their kind of pros and cons.

Um, and together they tell
a more complete story.
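The win statistics Cora lists are all summaries of the same pairwise tallies. A minimal sketch, using one common set of conventions (definitions of these statistics vary slightly across the literature, so treat the exact formulas as illustrative):

```python
def win_statistics(wins, losses, ties):
    """Companion summaries computed from total pairwise wins, losses, ties."""
    n_pairs = wins + losses + ties
    return {
        "win_ratio": wins / losses,                               # ignores ties
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties),  # splits ties
        "win_difference": (wins - losses) / n_pairs,              # net-benefit scale
    }
```

Reporting them together shows, for instance, how heavily the headline win ratio leans on the tie-handling convention.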

But I think the most important pieces are
really that tree that Jessica described

where we're showing where the wins
are happening at each level, how many

ties we end up with at the end, and
then the individual component analysis.

So we wanna see, um, for everyone,
what was the mortality; for
people who didn't have a mortality

event, um, what does the rate ratio
look like for hospitalizations; and
for people who had neither
mortality nor heart failure events,
um, what does, uh, the quality of
life measure look like in terms
of just differences in means?

And these don't have to be complex,
but I think that it's really very

challenging to interpret, um,
something like a win ratio without

all of that complementary material.

I think that doesn't have to be a negative.

I think as statisticians we should think
of ourselves as storytellers and we should

always be ready to present a kind of
complete report, um, about the primary

analysis that's beyond just a p-value.
Um, and then I am hoping Jessica maybe can

chime in on, um, some of the conversations
we've had with European regulators.

I've been mostly speaking about,
um, American regulators, but we've

also recently had some interesting
interactions with Europeans.

Jessica Overbey: Yeah.

Um, so, I agree with everything
Cora and Amy said, um, kind of leaning
into our recent experience, uh, with

European regulators, but I think
FDA also has these concerns is back

to that interim monitoring piece.

Um, I think there's some discomfort
with doing group sequential

analyses on the win ratio.

And that would be the case where
you take the immature win ratio

and make a decision that's gonna get
pushback, uh, on both sides of the

ocean, um, for a number of reasons.

And, um.

Once you've enrolled every patient, if
your final analysis is at two years,
you can sometimes get a group
sequential analysis, maybe
at one year, demonstrating that the

distribution of wins and losses isn't
that different from one to two years.

That could happen, but you have to, you
have to be able to demonstrate that.

And then there's also, um, the piece in
terms of what's the information fraction.

There's a lot of conversation there.

Um, and then another piece is if you do
wanna go the predictive probability route
for maybe stopping enrollment early.

Um, which I think we're all a big fan of.

There is a concern that we've heard
of stopping enrollment because of a

very strong biomarker signal only.

Um, and that's not, I mean, that's
a great regulatory concern, but it's also
gonna be a concern of the company.

So what we have done with the
predictive probability is we won't
build in just statistical significance.

We'll build in that there needs to be
statistical significance and also that
the direction of the clinical events
is in the right direction.

Um, and that can help with that too.

Scott Berry: Hmm.

So an interesting part of
the win ratio, so in your example
where you do a three-month quality of
life, but you're following patients to

two years, if everybody was at three
months, this is largely quality of life.

If everybody gets to two years, there's
a much bigger impact of mortality.

So, um.

The interpretation of it and the
value of this relatively changes

depending on the amount of exposure.

And so regulators may want to make
sure you have the amount of exposure

that makes this something that
they clinically find attractive.

It sounds like now some of
the things you talked about,

like predictive probabilities.

If we're doing a Goldilocks design where
at some point you say, we think we have

enough patients, we're gonna follow
them all to two years, that's
somewhat sponsor risk.

I imagine that the modeling is
going to, going to bear out.

That when you get the amount of
exposure that's appropriate, this

all, this all ends up successful.

Jessica Overbey: Yeah,
I think that's fair.

Scott Berry: Okay.

Um, I was gonna ask about this.

This is naturally a relatively
frequentist thing, um, and it's a
non-parametric thing.

Um, and this is not a religious thing,
that we need to make this Bayesian.

Um, can we do things like
incorporate covariates?

Can we do, uh, that, could
we do modeling of this?

Uh, other than just purely
who had the most wins.

Amy Crawford: Can I go?

Um, I'm gonna go.

Um, yeah, so I've had some
experience with this recently.

So, um, yeah, so we talked a little bit
about is this an ordinal endpoint earlier.

Um, and there there are cases where,
you know, it's natural to use the

number of wins and the number of losses.

Um, and we let this round robin
comparison order the patients.

And then, um, we can take
that as an ordinal variable and
fit an ordinal regression model to that.

And now, um, the ordinal
regression model, the proportional
odds model, is semiparametric.

And we're in a space where we can,
we're in regression, you know, where

we, we can adjust for covariates.

Um, we can be Bayesian, we can put
priors on, um, on parameters.

We could do a meta-analysis.

We could have trial A's round robin
and an ordering and treatment effect,
and we could have the treatment effect
follow a common prior distribution
with trial B's, you know, round
robin and ordinal, um, you
know, odds ratio treatment effect.

And that follows a common distribution.

And, you know, so you could think of all
the, all the possibilities once you're

in a regression space to be Bayesian,
to do modeling, to get intervals

that are interpretable and nice.
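The first step Amy describes, turning the round robin into an ordinal variable, can be sketched directly. This is a toy illustration (the `compare` rule here is a stand-in for the full hierarchy): each patient's net wins across all pairwise comparisons induce dense ordinal categories one could then feed into a proportional-odds regression with covariates or Bayesian priors.

```python
def ordinal_ranks(patients, compare):
    """compare(a, b) -> +1 if a beats b, -1 if a loses, 0 for a tie."""
    net_wins = []
    for i, a in enumerate(patients):
        net_wins.append(sum(compare(a, b)
                            for j, b in enumerate(patients) if j != i))
    # dense ranks: higher net wins -> higher ordinal category
    levels = sorted(set(net_wins))
    return [levels.index(s) for s in net_wins]

# toy example: patients summarized by a single score, higher is better
scores = [3, 1, 2, 2]
sign = lambda a, b: (a > b) - (a < b)
ranks = ordinal_ranks(scores, sign)
```

Fitting the proportional-odds model itself would then use standard tooling (e.g. an ordered-logit routine), outside the scope of this sketch.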

Um.

And so that's, that's something
that I've been, I've been

working on a little bit lately.

Um, one other thing that I'll mention,
uh, I had it and now I lost it.

Um,

Nope.

It's still gone.

Scott Berry: Oh, okay.

Okay.

Okay.

Alright.

So

Amy Crawford: I'll chime
back in if I remember.

Scott Berry: Okay.

So let, let, let me, uh, throw this out.

Um, and maybe.

Closing, parting shots in this or overall
viewpoint of this things going forward?

Uh, uh, closing.

I, I don't wanna do this 'cause I don't
know whether you all want to make the

case that these are, you know, this
solves all our problems, but, uh, closing

comments on, on hierarchical endpoints,
win ratios, where we're going, Jessica.

Jessica Overbey: Sure.

Um, I think that the win ratio can be
cool and very useful in certain settings.

It has a lot of problems.

We've talked about them, but you
know, there's no perfect endpoint

oftentimes in, in a heart failure trial.

Um, so until we come up with
that perfect endpoint, I, I say

there's a lot of different options.

They all have pros and
cons go in eyes wide open.

Sometimes the win ratio is a great fit.

So I tend to be pro in, you know, when
it's the best fit, it's the best fit.

Scott Berry: Mm-hmm.

Cora.

Cora Allen-Savietta: Um, I
largely agree with Jessica.

I think, thinking about where
we've been and, um, where we are now,
with trials where we're just ranking
patients, just putting them in two
categories, where they're successes
or failures after two years, the win
ratio is a huge improvement from there.

Um, and even thinking about time to first
event as a comparison with win ratio,

I think the win ratio is capturing much
more, um, significance to patients.

And I think that's really, um, a benefit
in terms of, uh, delivering treatments that are
gonna have clinical meaning

to patients and clinicians.

I would love to see some more exploration
of other composites that I think put a

little bit more of the responsibility
on the trialists, um, on the key
stakeholders, to put weights on these
components. It's hard to
put weights on these components,

but we have really nice tests.

Uh, the global test is one that
allows you to say you do univariate

analyses of each of these
components and pre-specify weights.

If mortality is the most important to
you, put the highest weight on that.

Then a medium weight on heart
failure, hospitalizations, and

then a lower weight on biomarkers.
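The pre-specified-weight idea Cora describes can be sketched as an O'Brien-style weighted combination of univariate z-statistics. A minimal sketch, assuming independent components for simplicity (correlated components would need their covariance matrix); the weights and z-values are purely illustrative.

```python
import math

def weighted_global_z(z_scores, weights):
    """Weighted sum of component z-statistics, rescaled so that (under
    independence and the null) the result is again standard normal."""
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den

# mortality weighted highest, then HF hospitalization, then biomarker
z = weighted_global_z([2.1, 1.4, 0.6], weights=[3, 2, 1])
```

Unlike the hierarchy, the weights here are explicit and pre-specified, which is exactly the transparency Cora is arguing for.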

I feel a little bit uncomfortable with
the weights being, um, sort of left up
to chance by how many heart failures and
mortalities we happen to see within a trial.

Um, I like trialists to maybe take
some responsibility for setting those

weights instead of kind of having
them be kind of hidden and implicit

because hierarchy is not weight.

Scott Berry: Now that's exciting.

Yes.

Uh, Amy.

Amy Crawford: Yeah, I,
I, I feel the same way.

I think waiting to see what the weights
fall out to be at the end of the trial
is, um, I agree with Cora and, um, it's

a, it's an interesting space, but I,
I do think, you know, I understand

how, how the, the, especially the
cardiovascular field got to this place.

You know, lots of things
matter to these patients.

There is a clinical
ordering to those things.

Um, and sometimes I, like Jessica
said, it is the right thing to do, but

I, I, I think that, um, I think we're
headed in, in a direction, hopefully,

where we can do smarter things.

Um, and, and so it's useful, it, it,
it's not my favorite, but it is useful.

The interpretation piece is really
sitting down as the statistician

and, and having to say, you know,
bear with me or something like that.

I need to show you four things,
four slides in order for

this to come through cleanly.

You know, that's a challenge.

So, um,

Scott Berry: Yeah.

Uh, yep.

Sorry, and I know you're all working
very hard on really nice graphics

to demonstrate all of that, so that,
that's pretty exciting work as well.

So appreciate it.

Appreciate you all coming in
here and, uh, for everybody

else, we'll see you next time.

We will be here in the interim.