In the Interim...

In this episode of "In the Interim…", Dr. Scott Berry and Dr. Lindsay Berry investigate the statistical foundations and clinical implications of analyzing ordinal endpoints, drawing on experience from major stroke and COVID-19 trials. Discussion centers on the Modified Rankin Scale, DAWN, MR CLEAN, and REMAP-CAP, demonstrating that methods such as proportional odds, dichotomization, and utility weighting all impose explicit or implicit clinical weights on the outcome categories. The episode presents direct mathematical derivations, exposes the equivalence between proportional odds models and value-weighted analysis, and uses real trial data to explore how statistical and clinical perspectives on endpoint weighting may diverge. Emphasis remains on transparency and the need for clinically relevant weight assignment in trial endpoints.

Key Highlights
  • Structural overview and clinical significance of the Modified Rankin Scale scores.
  • Illustration that proportional odds models and dichotomized analyses apply hidden, prevalence-driven or threshold-based weights.
  • Utility weighting in DAWN, formulated from EQ-5D patient utilities and economic studies, with observed alignment between the two sources.
  • MR CLEAN investigators' critique of utility weighting; empirical data demonstrated relative consistency and challenged the claim that statistical approaches resolve variation across patients.
  • REMAP-CAP platform trial: Organ Support Free Days endpoint analyzed with proportional odds, which imposes weights on the scale from death to fully free of organ support.
  • Extension of these arguments to win ratio/rank-based approaches, with caution that all methods encode clinical assumptions.
For more, visit us at https://www.berryconsultants.com/

Creators and Guests

Host
Scott Berry
President and a Senior Statistical Scientist at Berry Consultants, LLC

What is In the Interim...?

A podcast on statistical science and clinical trials.

Explore the intricacies of Bayesian statistics and adaptive clinical trials. Uncover methods that push beyond conventional paradigms, ushering in data-driven insights that enhance trial outcomes while ensuring safety and efficacy. Join us as we dive into complex medical challenges and regulatory landscapes, offering innovative solutions tailored for pharma pioneers. Featuring expertise from industry leaders, each episode is crafted to provide clarity, foster debate, and challenge mainstream perspectives, ensuring you remain at the forefront of clinical trial excellence.

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: Welcome everybody.

Back to In the Interim, I'm your
host, Scott Berry, and today I have

a guest with me and my guest is Dr.

Lindsay Berry.

Lindsay, welcome to In the Interim.

Lindsay Berry: Hi.

Thank you.

I'm excited to be here.

I, I did not know I was gonna be on a
podcast when I woke up this morning, but

Scott Berry: Yes.

Lindsay Berry: it's the, the
crazy life of a statistician.

Scott Berry: Uh, yes, it's at
any time you could be called

into, uh, an interim analysis.

Uh, and I guess, and it's
particularly problematic when the

host of, in the interim is your
father and he can get a hold of

you, really at any particular time.

So, thanks for joining on
the spur of the moment.

And I, and we'll get to why, uh,
within this, uh, first you've been

on, I think this is your third

interim analysis, uh,
third In the Interim.

And, um, but let's, let's
introduce to everybody who I know.

Everybody out there watches
every episode at least once

and probably knows who you are.

But, um, you tell us a little bit about,
uh, how you got to in the interim.

Um, you, you went to undergraduate
at the University of Texas,

and how did you end up here?

Lindsay Berry: Yeah, I, well.

Let's see.

So I, I did my undergraduate
in math at, at UT Austin.

Um, I liked math.

I liked statistics.

I did a couple of summers
interning at Berry Consultants.

Um, just learning, you know,
what, what do they do here?

Um, um.

All, all kinds of things.

I, you know, got advice from some
Berry consultants on graduate school.

That was kind of what I
was thinking of doing next.

Um, and, uh, I talked to a
couple statisticians here.

Uh, Melanie Quintana was one.

She had gone to Duke, um,
and really enjoyed her time

at the Duke PhD department.

And so that's where I ended up,
uh, studying and getting my PhD.

in, uh, statistical science.

Um, so while I was there, I
learned a lot of, uh, you know,

focus in Bayesian statistics, um,
not much clinical trial design.

Um, I did my dissertation in Bayesian
dynamic modeling and prediction

for time series, and specifically
time series for count data.

So this was kind of, the, the application
was building models to predict the daily

sales of supermarket, um, items, which was
kind of our core application.

Um, and, and so you can imagine
there's kind of elements of different

scales of, of modeling where maybe
the daily sales of a product are, are

small counts where you might sell a
couple of boxes of spaghetti each day.

Um, but then you have higher level
scales where you might aggregate across stores,

um, and have regional sales.

Um, you might aggregate over time
and have models on, on those scales.

So this was kind of working on.

Can we leverage some of that
hierarchical information?

So can, using information about, um,
pasta as a class of items and how, what

we know about predictions of pasta at
that level, can that better inform our

predictions of, um, you know, Barilla
spaghetti, um, on, on Tuesday next week.

So,

um, that was, that was what my
focus was on in, in graduate school.

And, um, when I, when it come,
came time to leave, I was not

sure exactly what I wanted to do.

Um, and I talked to Scott and he has a,
a very convincing way of, of selling.

Um, what we do here at Berry Consultants
and so I was kind of sold on the idea

that, um, Bayesian statistics can make
a big difference in clinical trials.

And it's, um, you know, this is a way
where what we do has impact and, um, I

think it really matters to the

patients at the end of the day.

And it's also just, you know, I get to
do all the things I like to do, which

is, you know, a little bit of coding.

I get to learn about new disease areas.

Um.

I get to, to write, I get to read papers,
um, I get to collaborate with people,

clinicians, um, lots of different people.

So it, it all, uh, worked out really well.

And, um, I'm now working here at Berry
Consultants and I've been here, uh,

coming up on seven years and loving it.

Scott Berry: And, and to back up
a little bit, there's kind of an

interesting side story to this.

So I, uh, your grandfather Don Berry.

He had, uh, on his dissertation
committee, sort of a co-advisor,

Jay Kadane, at Yale with Jimmy Savage.

And by the way, if you haven't seen
the podcast where he talks about Jimmy

Savage, you need to go listen to that.

It's fantastic.

And then I went to Carnegie Mellon where
Jay Kadane was, and he was my advisor.

And there, I've not heard of other
stories where, uh, a, a parent and

child both had the same advisor,
and your last choice of graduate

schools was Carnegie Mellon and Duke.

And

you could have gone to Carnegie
Mellon and been a third generation

Jay Kadane student. He said he
would, he was emeritus,

but he would've happily been there.

But you, you, you went to Duke and, and
Mike West was your, was your advisor.

And, uh, so an interesting story.

Okay, now what are we talking about today?

Um, I'm gonna introduce how we got here.

And I, I, I posed a problem to a
number of statisticians here, uh,

at, at Berry and Lindsay was one
of them and she helped solve it.

And so we'll talk about what,
what the problem is, uh, and,

and what the solution is.

And so I, I want to try
to set it up a little bit.

So years ago, this is, this is,
uh, analyzing ordinal endpoints.

And years ago we were involved
in the design of a stroke trial

and the, the trial was the DAWN
trial, DAWN, and it was an adaptive

design with potential enrichment.

And we were going to enrich off certain
patients, depending on the size of their

stroke and the clinical mismatch, if the
device, which was endovascular therapy.

Wasn't beneficial to them.

And so the adaptive design was gonna
analyze them and it might narrow

in on a, a position where, uh, the
endovascular therapy was beneficial.

This is at a time where
endovascular therapy, which by

the way now is an incredible
therapy, saves many, many lives.

Saves much morbidity.

At the time, it was
controversial, it was unclear.

There was actually
non-proportional effects.

It was increasing mortality, but
increasing the number of good outcomes as

well was quite controversial, and so we
were really worried that we would enroll

patients where it wasn't beneficial.

But understanding it's not
beneficial across an ordinal

outcome is a little bit challenging.

Are you only looking for
elevated bad outcomes?

Less good was particularly challenging.

So, uh, I'll set this up and, and
many of you may know the endpoint,

the standard endpoint in stroke
trials is the Modified Rankin score.

And it, I think it's universally
used in acute stroke trials.

It's used in a number
of other trials as well.

It's a, it's an outcome that has seven
possible states, everybody recognizes

the ordinality of this and the seven
states are very much considered

clinically different states.

The best outcome is a zero.

So the modified Rankin score of zero,
essentially you have no symptoms.

Um, and so Lindsay is a zero.

Uh, I'm a zero as well, but, uh,
within this, so, uh, we're, we

are zeros, we have no neurological
symptoms, relatively healthy.

And then it goes from 1, 2,
3, all the way down to six.

So one, for example, is no
significant disability, you're still

independent, but you have some.

You have some symptoms,
neurological symptoms.

You may have weakness in the arms.

You may have a little bit of
difficulty walking, but you're

largely no significant disability.

Two, you have slight disability.

You still care for yourself.

You're able to brush your teeth,
you're able to ambulate you, you,

you, uh, are still independent.

Then three is moderate disability.

You have some need for walk assistance.

You need some assistance.

Uh, from, from care, you, you, you
can't necessarily care for yourself.

Four is a, is moderately severe.

It's referred to, uh, unable to walk
alone, unable to really care for yourself,

but you are, you're out of a hospital.

Uh, you do have some, some
cognitive function within that.

And then five is, is
nearly a vegetative state.

Uh, you're, you're, you're
hospitalized, you're in a bed, we'll

call it a vegetative state, and then
six is dead, uh, uh, within that.

So those are the seven outcomes,
and it's a validated outcome.

We can measure that, and this
is the outcome we look at in

a stroke trial at 90 days.

So the question is how to analyze that
and how to understand, uh, the device

versus a control and whether we should
enrich proportional odds doesn't really

work because it, you know, it, it, it's
funky in, in how it does that we might

be increasing deaths, uh, within that.

So we had to create a weighting
of the ordinal scales in

order to make that decision.

So we worked, and I'll come back to this.

So we, we created what we called a
utility weighted MRS, where we created a

numerical ranking of those seven states.

And I'll describe more about it.

I think it's important to describe
that, but very simply, for

example, if you're a zero, that's
the highest score you can have:

your utility is a one.

That's the highest neurological status.
If, if you're dead, or actually vegetative,

they're both zeros.

And then the values between those
are numerical, so a one is 0.91,

uh, it's not as good as a
zero, and then it's 0.76

for two, and then it's 0.65

for three, and then it drops to 0.33

at four, and then it drops to zero.

So that's the numerical weighting,
and we'll talk about how we got there.
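[Editor's note: the utility-weighted analysis described here can be sketched in a few lines. The utilities are the ones quoted in the episode; the per-arm patient counts below are purely hypothetical, for illustration.]

```python
# Utility-weighted mRS, a minimal sketch: map each of the seven mRS states
# (0-6) to the utility quoted in the episode, then compare arms by mean
# utility. The patient counts below are hypothetical.
UW_MRS = {0: 1.0, 1: 0.91, 2: 0.76, 3: 0.65, 4: 0.33, 5: 0.0, 6: 0.0}

def mean_utility(mrs_scores):
    """Average utility across a list of 90-day mRS outcomes."""
    return sum(UW_MRS[s] for s in mrs_scores) / len(mrs_scores)

# Hypothetical 90-day outcomes for two arms of 40 patients each.
treatment = [0]*10 + [1]*8 + [2]*6 + [3]*5 + [4]*4 + [5]*2 + [6]*5
control   = [0]*4  + [1]*5 + [2]*6 + [3]*7 + [4]*8 + [5]*4 + [6]*6

print(round(mean_utility(treatment) - mean_utility(control), 3))  # mean utility gain
```

Note that the scale treats mRS 5 and 6 identically, both utility zero, exactly as discussed.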

Okay.

The DAWN trial ran and it actually
turned out to be an incredible success.

The trial never enriched.

It was beneficial for all
the patients enrolled.

Uh, it it, it launched endovascular
therapy through 24 hours.

These were patients
that had later strokes.

And, uh, it was significant on
any way you analyze the endpoint.

It was significant.

You probably could have dichotomized
it at any point in the trial,

would've been significant.

Uh, but it was certainly
significant on the utility weighted.

Okay.

Now we got a, we got
criticism for the scale.

And, uh, it's really interesting, now it's
one of these things, I think it's

much more interesting to me whenever
you have this and all of a sudden

you're criticized in, in the academic
literature for, for the approach.

Uh, it's kind of exciting.

Within this, so the investigators,
European investigators, which

led a trial called the MR
CLEAN trial, which also was

highly successful, demonstrating
benefit of endovascular therapy.

And, uh, this is, uh, uh, this came
out in Stroke and it was, uh, pushback

on the utility weighted scale.

The lead author is, is deland.

And, uh, the MR CLEAN

investigators, now I'll get
to the criticisms they had of the

utility weighted, but did, did
I set the problem up, Lindsay?

Lindsay Berry: Yeah, I think so.

I mean, for people who are.

Thinking about the utilities
for the first time.

It's sort of the zero and one
are kind of arbitrary, right?

It's just sort of the best is
a one, the worst is a zero.

And then it's the intervening values
and the the distances between them that

are kind of the important element there.

And then maybe, I don't know if
you wanna mention sort of the

alternative analysis or the

way most people might analyze the mRS.

Um,

Scott Berry: Well, I, I will, and
that was part of the criticism.

They say, you

shouldn't do this utility weighted, you
should do it a different way within that.

But I, I, you're right, and we'll get
back to this, that the difference between

an mRS zero and an mRS one is 0.09.

That's the important part.

It goes from one to 0.91.

Then from one to two, it drops 0.15;

clinically, according to that
scale, that's a bigger drop than zero

to one. And then two
to three drops again, another 0.11,

and then it drops 0.32.

So at that point,
from a three to a four is 0.32,

and a four to a five is 0.33.

Those are the big drops
in the clinical states.

And so the relative weighting
matters in how you analyze it.

Okay?

So the criticism of them and
their alternative way is they

say, you shouldn't do this.

You should do a proportional odds model.

So a proportional odds model assumes
that for each dichotomous shift, like you

want to get zero, that's a good outcome.

There's some odds of zero for the
treatment relative to control, and then

there's some odds of being zero or one.

Then there's some odds of being
zero, one or two, and so on.

All of these shifts, and it
models that with a constant odds.

Uh, proportional odds.

This is also in the literature called
a shift analysis in the, in the

stroke literature, and they said,
you should just do proportional odds.

Now there, it's, it's sort
of irrelevant for this.

They argue that it's higher
powered, and we actually pointed

out that they were wrong on that.

Uh, but that's less important to, to this.

When we pointed out
that it wasn't more highly powered,

they came back and gave several criticisms
of the utility weighted approach we had.

And, um, the biggest ones were
that they, they complain about the

statistical assumptions of it, which
I think is just completely wrong.

When you analyze using utility
weight, it's essentially a T test.

On the utility values that come out.

Each of the values has a weight,
and this is a, a, a, um, uh,

finite set of values, zero to one.

The central limit theorem kicks in at
five patients, probably 10 patients.

So there are no assumptions to this
analysis other than the weights.
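[Editor's note: the "essentially a t test on the utility values" point can be sketched directly. The Welch two-sample t statistic is a standard formula; the per-arm outcomes are again hypothetical.]

```python
import math
from statistics import mean, variance

# t test on per-patient utility values, a minimal sketch of the analysis
# described: assign each patient their state's utility, then compare means.
UW = {0: 1.0, 1: 0.91, 2: 0.76, 3: 0.65, 4: 0.33, 5: 0.0, 6: 0.0}

def welch_t(x, y):
    """Welch two-sample t statistic: mean difference over its standard error."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical per-arm mRS outcomes mapped to utilities.
treat_u = [UW[s] for s in [0]*10 + [1]*8 + [2]*6 + [3]*5 + [4]*4 + [5]*2 + [6]*5]
ctrl_u  = [UW[s] for s in [0]*4 + [1]*5 + [2]*6 + [3]*7 + [4]*8 + [5]*4 + [6]*6]

print(round(welch_t(treat_u, ctrl_u), 2))  # positive favors treatment
```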

So there's no proportionality
assumption to the utility weighted.

Their biggest criticism
was that not everybody
has that utility function.

So you're going into a trial and
you're assuming that utility

function that I just described, the 0.91,

the 0.76,

and not everybody has that.

Um, and that was a huge
criticism that they have,

Lindsay Berry: And do you think that
comment, is it that not everyone,

patients will have different utilities,
or is it sort of clinicians when they're

interpreting the result, or, or both?

Scott Berry: I, I think it's both, but
their comments were more about patients.

Not every patient has those,
and interestingly, in MR CLEAN.

They did a, um, uh, a question
where they asked them their

relative utilities of them.

Uh, it's not uncommon that they do
this, uh, EQ-5D quality of life,

and they asked them the scales.

Turns out the values they got were
very, very similar to what we used.

The interesting thing
though, as they pointed out.

There's a standard deviation.

Not everybody says a one is 0.91.

Theirs was 0.95,

by the way, relative to, to
uh, a one, and then it was 0.87,

0.65.

Really very similar to, to what we got.

But the point was
they're not all the same.

So you should do a
proportional odds model.

They even say, when they pointed
out the, the assumptions, that even

if proportional odds is violated,
the result is still a good test.

And so they're, they push back
heavily against, um, the, our

use of these utility weights.

By the way, this is a common criticism
that a lot of people have of this approach

is that not everybody shares that utility.

Okay?

So I, this bothered me.

Um, it bothered me that a statistical
algorithm, the, uh,

proportional odds assumption,
was the analysis, and it was

avoiding the question of what is
the relative weight of the scale?

So I posed it to Berry: I think the
proportional odds is weighting those

values, but I don't know what they are.

So I posed that problem and what happened,

Lindsay?

Lindsay Berry: Yeah, I, I mean, and going
back to your previous point, talking

about sort of, even if the proportional
odds assumption is violated, if you

fit the proportional odds model and
you get this estimated proportional

odds ratio effect, it can still

be interpretable in that scenario.

And I think a lot, most people
interpret it as some kind of average

or weighted average of the odds ratios
that you would get if you looked at

each dichotomizing of an ordinal scale.

So the, the zero versus the one
to six, the zero to one versus

two to six, and and so on.

So it's some kind of
weighted average of those separate odds

ratios, and the weight is really based on
the prevalences, or the number of patients

that fall within each of the categories:
0, 1, 2, 3, 4, 5, and 6.

Um, and so Scott posed this question and
I think, you know, he kind of intuited

the answer that, you know, it has to be
based on the prevalences in some way.

And so that was a big hint of sort
of where, where a place to start.

Um.

But you know, we, we went back
and we read some, some papers by,

um, John Whitehead back in, uh,
back in the, um, the 19 hundreds,

Scott Berry: Careful now.

Careful.

Lindsay Berry: I think from my birth year.

Um, uh, and so in this paper, um,
Whitehead presents some formulas

for sample size calculations.

Using the proportional odds model.

Um, and so there's really nice formulas
there for, you know, calculating sample

size for the, um, the distribution of the
score test statistic from a proportional

odds model where you have the treatment
arm as, as in a covariate in the model.

So, um, really nice formulas there.

And that was where we started.

Um, we took those formulas and,
you know, I don't wanna go

through the math, but really
it was just a matter of, um,

algebra: rewriting those, um,
the score test statistic and the

variance of that test statistic,
um, rearranging the terms and the

weights kind of just fall out of it.

Um, you can show that the proportional
odds score test statistic,

um, that test, can be written so it's
equivalent to a test of a mean difference

with a specific set of weights that
are given to the categories of the

ordinal outcome, and those weights are,
uh, defined based on the number

of patients that fall into
each of the ordinal categories.

So essentially the weights are
defined based on the prevalences.

Um, so I'll, I'll stop there and, and
see if I answered your question.

Scott Berry: Yeah.

Yeah.

So to restate, the punchline
of this is you were able to show

the equivalence: a proportional
odds model, you can write it

as an analysis of weights of each
of those, as though I were to write

down those weights and say, rather than
our utilities that we started with,

we could write down the weights, and
it's mathematically the same as that.

So a proportional odds model
imposes weights on each of them

that you can write down, and they're
a function of the prevalence.

So the difference between a
zero and one depends on how

many zeros and ones there are.

You think more about that depending
on where, how many there are compared

to the difference between a three
and a four or a four and a five.

So let's, let's give
people a couple of these.

Suppose you were to see

a very equal prevalence of

the seven states.

What, what weight do you
give to each of those?

Lindsay Berry: Yeah.

Then the weights, it's sort of a,
a, you can imagine the weights as

a line, so the distance in between
each category is about the same.

So the, the, um, distance between
a zero and a one, or a five

and a six, um, is about the same.

Scott Berry: Yep.

So we could,

Lindsay Berry: of the scale.

Scott Berry: we could go in and
say, we're gonna weight these equal.

So a difference between a three and a four
mRS, or a zero and one, is equivalent.

They're all one.

That would be the weights imposed.

If you have equal prevalence, if you have
more negative outcomes, you weigh more

the difference on that side of the scale.

So a difference from a six to a, or,
well, five and six are combined in this.

Um, a difference between
five and four is bigger

than zero and one because
of the prevalence.
Uh, within that, that's the utility
you impose by doing proportional odds.

So the title of this is the
Fallacy of Ordinal Outcomes.

When they say the beauty of the
proportional odds is that all you

have to assume is they're ordinal.

So a 0 is better than a 1 And a 1
is better than a 2 and a 2 is better

than a 3 That's all you're assuming.

They say that's not true.

It's a fallacy and you are imposing
weights and you could absolutely

calculate exactly the weights you've
imposed and we'll come back to that

on on some other endpoints and you
could do the analysis that way.

And by the way, you get rid of any
assumption about proportional Odds.

So in some ways we're sympathetic
to their point that you don't

really care about proportional odds.

But what they're doing is
they're creating the weights.

We brought utilities to the table.

They're creating them through prevalence.

Now, their biggest criticism of
us was that not everybody in the

population agrees with your weights.

You're doing bad stuff, team.

Um, because that, there's
variability in that thought.

The proportional odds does exactly
the same thing, just they don't even

tell you what they are, uh, what the
weights are, but they're imposing

that everybody has the same utility.

But the difference is theirs
is created by prevalence.

There may be an objective
comfort with that.

It really bothers me that a statistical
assumption describes the clinical

utility of different outcomes as
opposed to clinical utilities.

Okay.

Now, uh, you, you there?

I'm not hiding where, where I stand
on this point, uh, sort of thing.

Uh,

Lindsay Berry: I mean, I think, I
feel like we're being a little harsh

on the proportional odds model.

Which I think you would say you,
you do use in some cases, and we

think it's a good model in a lot of
cases, but the objection is really

this, people putting it forward as
though it doesn't make any assumption

about the weights of these states.

And it's sort of agnostic and
all it cares about is the order.

And there's no weighting, which isn't true.

You know, to, to create a single measure
for this scale, you need some way of

weighting the different, different levels.

There has to be a weighting really.

Scott Berry: and and that's
the point to this, that you

cannot avoid weighting them.

So the question is, how do you
do it now, the most common way in

which stroke trials are analyzed.

Is not the proportional odds model.

It's not utility weight, it's
dichotomizing the outcome, and the most

common is they dichotomize it at 2 0 2 is
is a success and 3 and above is a failure.

Now, that also imposes weights.

Again, you cannot avoid imposing weights
that imposes weights on those outcomes.

And I claim that, you
know, the, the, these MR

CLEAN investigators argued that
not everybody has your utilities.

I claim it imposes
utilities that nobody has.

So the difference with dichotomizing:
what if we do zero through two as a

success and three and above as a failure?

What are the weights we're
imposing upon those states?

Lindsay Berry: Oh, the weights
would be, uh, 1, 1, 1, 0, 0, 0.

Did I get that number right?

Scott Berry: Yep.

I think there's one more
zero, but you're saying

being in a state of three, where
you have some mobility, you have

some handicap, is the same as dead.

Lindsay Berry: mm-hmm.

Scott Berry: You're a hundred percent saying
the weight of those are all the same,

and having perfect neurological status is
the same as having, uh, some disability.

Um, uh, within that some two levels of
disability, you're giving those equal

weights, which I claim nobody has that.

So you are imposing that.

Everybody in the population
has those weights, the same

criticism of the utility weighted.
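[Editor's note: to see the two implicit weightings side by side, the dichotomized analysis is just a step-function utility. A sketch comparing the treatment effect under the episode's utilities versus the 1/0 dichotomization weights, on the same hypothetical counts.]

```python
# Every analysis of the mRS assigns a weight to each state; dichotomizing at
# mRS 0-2 is the step function below. Counts are hypothetical.
UW_MRS = [1.0, 0.91, 0.76, 0.65, 0.33, 0.0, 0.0]  # utilities from the episode
DICHOT = [1, 1, 1, 0, 0, 0, 0]                     # "mRS 0-2 = success" weights

def weighted_mean(counts, weights):
    """Mean weight per patient given per-category counts."""
    return sum(c * w for c, w in zip(counts, weights)) / sum(counts)

treat = [10, 8, 6, 5, 4, 2, 5]   # hypothetical mRS 0..6 counts
ctrl  = [4, 5, 6, 7, 8, 4, 6]

for name, w in [("utility-weighted", UW_MRS), ("dichotomized", DICHOT)]:
    diff = weighted_mean(treat, w) - weighted_mean(ctrl, w)
    print(name, round(diff, 3))
```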

So now.

Coming back to this, and I want to, I
want to jump to a different trial and a

different endpoint where this comes in.

This has wider ramifications to
the win ratio, to all kinds of

endpoints, but, uh, I, I, I want to,
we, we explore, now let's jump to a

different trial, the REMAP CAP trial.

And, uh, Lindsay said, by the way,
she went from analyzing spaghetti

in a grocery store, and she joined Berry
and it was almost right upon joining
that we had this pandemic explode and
Lindsay got thrown into the REMAP CAP

trial, which was a global platform trial
in treating hospitalized COVID patients.

Um, within that, now describe the
endpoint we use for COVID in that trial.

Lindsay Berry: Yeah, so when the COVID
Pandemic started, they introduced a new

endpoint to the platform specifically
for, uh, the COVID group of patients.

It was an ordinal endpoint and
it's kind of a composite endpoint.

So the endpoint is called
organ support free days, and.

It's a composite of in-hospital mortality.

So the worst possible outcome
is the patient dies in hospital.

Um, and that that would be
the worst possible outcome.

And we label that with a minus one.

Um, it's just a label.

Um, and then.

If a patient survives to hospital
discharge, their ordinal outcome is based

on the duration of organ support that
they received during their hospital stay.

So there were certain, um,
types of, uh, organ support that

qualify for this definition.

So being on a ventilator, um.

Taking vasopressors, inotropes, um, a
couple different types of cardiovascular

and respiratory organ failure support.

Um, what we look at is,
through day 21, how many days was

the patient free of organ support?

So.

The best possible outcome would be
you're free of organ support for 21 days.

So if you survived hospital discharge
and you're free of organ support

for 21 days, your outcome is 21.

For survivors, the worst outcome is that
you were on organ support the entire

duration of 21 days.

So your duration free is zero days, and so
your outcome would be labeled as a zero.

So there are 23 possible levels.

Um, for this outcome.

The worst is death labeled a minus one.

Then we have zero up through
21, which is the best outcome.
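[Editor's note: the mapping Lindsay describes can be sketched as a tiny function; the argument names are hypothetical.]

```python
# Organ Support Free Days (OSFD) through day 21, as described:
#   in-hospital death -> -1 (the worst of the 23 levels)
#   survivor          -> 21 minus days on organ support, so 0..21 (best is 21)
def osfd(died_in_hospital, organ_support_days):
    """Map a patient's hospital course to the ordinal outcome in -1..21."""
    if died_in_hospital:
        return -1
    return 21 - min(organ_support_days, 21)  # cap at the 21-day window

print(osfd(True, 5))    # death dominates: -1
print(osfd(False, 0))   # never on organ support: 21
print(osfd(False, 21))  # supported the whole window: 0
```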

Scott Berry: And we model that
with a proportional odds model.

In REMAP-CAP, as you said, you, you
sort of want to make, to make clear to

people that I do use the proportional
odds model, uh, within that, and I, I

don't hate the proportional odds model.

The interesting thing is, and, and by
the way, the results that came out of

REMAP-CAP, that use that endpoint and
the proportional odds, now it combines

death and length of organ support.

It is weighting death
relative to other outcomes.

We just, we just learned that.

We never said what it was, but it's
imposed by the, the prevalence in there.

The, the results that
came outta REMAP-CAP,

I think overwhelmingly,
clinicians really liked the results.

When it had a, a
bad proportional odds or a good one,

they looked at the overall results:
yes, that's clinically as good.

Within that.

Now you went back and calculated
exactly what our weights were on those,

and it actually looks reasonable.

But prevalence dictated that.

So for example.

From surviving and leaving the
hospital, but you're on organ support

the entire time, compared to death,
was a difference in utility of 0.3.

Again, the max difference is one;
on this we can normalize it to one.

It was 0.3

from zero days free of organ support,
so being on organ support for 21 days in

the hospital, but living. A change of 0.3

above that goes to about 15 days free.

So two weeks shorter of organ
support is equivalent on the right

to somebody dying on the left.

We didn't do that.

We didn't create those clinical weights.

Nobody did.

The prevalence of how often they
happen created the relative value

of those.

Now when clinicians looked at these
weights and you showed them and said, by

the way, this proportional odds that we're
doing this is what it does, they looked at

it and said, you know, it's not too bad.

Uh, it's pretty reasonable.

Um, within that, which is why
they liked the results that came

outta the proportional odds model.

Now we tried to create a new endpoint,
so we were trying in remap cap to

include things below organ support.

Level of oxygen, length of time in the
ward, length of time in the hospital,

and we used a daily ordinal model that
was, again, a proportional odds model.

We ran through the same data and
showed the results, out of

results we already had, using this daily
ordinal with different ordinal values.

And largely the clinicians, I, I'd
say, hated, they very much disagreed

with the proportional odds model,
because all of the switches were

happening in states that were
much less clinically

relevant than the big shifts,
which only had a smaller number of

shifts. And it was attracted to shifts;
it wanted to see shifts in the

models, trying to figure that out.

And so they looked at the proportional
odds weight and said, well,

no, we don't like that at all.

And we had to actually scrap it because
the statistical assumption was creating

weights that we didn't agree with at all.

Now we've been working on ways, can we
actually clinically weight these states,

and it's a, it's an active area of work.

So I wanna go back and I never told people
how we created those original weights.

And when, when we were stuck
with this problem in dawn, we

didn't just arbitrarily do it.

We found two studies that had done research exactly on this, looking at patient-reported outcomes on quality of life as a function of their state, and their preferences across the states, and then an economic valuation of the economic burden of those states.

The amazing thing was they were almost exactly the same: the economic burden and the patient ratings of those particular states. And we used those utilities to drive the decision in the trial.

And so we found, I think,
clinically relevant ways to weight

those seven potential outcomes.

Interestingly, state five, the vegetative state, actually came out negative.

It was worse than death.

We didn't allow that to happen; we made it zero within the scale. But otherwise, that is how we created our utilities.
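The utility-weighting idea can be sketched numerically. The utility values below are hypothetical placeholders, not the actual DAWN weights; the only structural feature taken from the discussion is that a worse-than-death state gets a negative study-derived value that is then floored at zero.

```python
# Sketch of a utility-weighted ordinal endpoint. Hypothetical utilities for
# mRS 0..6 (6 = death); the study-derived value for state 5 is negative
# (worse than death) and, as in the discussion, is floored at zero.
raw_utilities = {0: 1.0, 1: 0.91, 2: 0.76, 3: 0.65, 4: 0.33, 5: -0.05, 6: 0.0}
utilities = {k: max(v, 0.0) for k, v in raw_utilities.items()}

def mean_utility(mrs_scores):
    """Average utility across one arm's mRS outcomes."""
    return sum(utilities[s] for s in mrs_scores) / len(mrs_scores)

# Hypothetical arms of seven patients each.
treatment = [0, 1, 1, 2, 3, 4, 6]
control   = [1, 2, 3, 4, 4, 5, 6]
print(round(mean_utility(treatment) - mean_utility(control), 3))
```

The treatment effect is then a difference in mean utility, with the weight of each state written down explicitly rather than induced by a model.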

Now, maybe it was a mistake, maybe it wasn't: we used the word utilities, and that gets everybody flustered.

When others do proportional odds and it imposes a weight, people are less flustered. But because we called it utilities, the reaction is, well, I have my own utilities; you can't impose utilities on me.

But the point of this, mathematically, is that by using a statistical assumption, you're imposing weights anyway.

Now, which one is more comfortable to you?

And I'll just throw out a couple of cases where this comes up.

Everybody's familiar with the win ratio; the win ratio is an ordinal outcome.

The ALSFRS, the ALS Functional Rating Scale, doesn't include death.

Everybody's trying to figure out how to combine death in ALS with patients' functional status.

And we do non-parametric things. We do CAFS, or we do joint models.

All of those things impose a weight of death relative to functional status.

When we've gone in with an explicit weight, we're told no; but a joint rank, sure, you can do that, sort of thing.

So it's been really interesting, this handling of ordinal outcomes: trying to create utilities while the statistical assumptions impose their own.

Lindsay Berry: Yeah, and I don't think you get away from this issue if you just do non-parametric tests.

So, you know, even the rank-based tests also have their own implied utilities.

Sometimes it's just really hard to figure out what they are. They're kind of hidden behind the math.

Scott Berry: Yeah.

And if we do a win ratio where you count wins, and the first level is death, the second level is heart failure hospitalization, then it's pro-BNP or a biomarker, you are weighting those clinically.

What is a change in the pro-BNP worth relative to a hospitalization?

Now it's done behind the non-parametric test, but you have imposed a weight.
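A win-ratio hierarchy like the one described can be sketched directly. The hierarchy order is the implicit clinical weight: any difference at a higher level settles the comparison before a lower level is ever consulted. The patients, field names, and values here are all hypothetical.

```python
# Sketch of a hierarchical win-ratio comparison:
# death first, then heart-failure hospitalizations, then pro-BNP reduction.

def compare(a, b):
    """+1 if a wins, -1 if a loses, 0 if tied, walking the hierarchy."""
    # 1. Death: surviving beats dying. Settles the pair regardless of the rest.
    if a["died"] != b["died"]:
        return 1 if not a["died"] else -1
    # 2. Hospitalization: fewer HF hospitalizations wins.
    if a["hosp"] != b["hosp"]:
        return 1 if a["hosp"] < b["hosp"] else -1
    # 3. Biomarker: larger pro-BNP reduction wins; otherwise a tie.
    if a["bnp_drop"] != b["bnp_drop"]:
        return 1 if a["bnp_drop"] > b["bnp_drop"] else -1
    return 0

def win_ratio(treated, control):
    """Total wins over total losses across all between-arm pairs."""
    wins = losses = 0
    for a in treated:
        for b in control:
            r = compare(a, b)
            wins += r == 1
            losses += r == -1
    return wins / losses

treated = [{"died": False, "hosp": 0, "bnp_drop": 300},
           {"died": False, "hosp": 1, "bnp_drop": 20}]
control = [{"died": True,  "hosp": 2, "bnp_drop": 0},
           {"died": False, "hosp": 1, "bnp_drop": 50}]
print(win_ratio(treated, control))
```

Notice that any hospitalization difference outranks any biomarker difference, however large; that ordering is a clinical weighting, just one that never gets written down as a number.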

So a lot of these things, the win ratios, the proportional odds models, DOOR, Scott Evans's desirability of outcome ranking, my issue with them is maybe we should just be assigning clinical value to the relative states and get in front of the problem.

And not hide behind a statistical assumption assigning those clinical weights.

What do you think of that?

Now, we've tried to do that in REMAP-CAP.

Lindsay Berry: Yeah, it's hard because, I mean, at its core you're writing down what's the difference between death and not death, and that's just really uncomfortable, and it feels hard to justify.

And I think it would be unfortunate if someone listened to this conversation and their reaction was just, oh my God, that's so complicated.

You know? I don't know what I'm implying; I'm just going to dichotomize, or I'm just going to use death as my endpoint and ignore the other information.

I don't think that's what we would recommend.

There is additional information in these ordinal endpoints. We think we should use all of it, and we think those categories should be weighted appropriately.

Figuring out how to do that is the hard part and the work that needs to be done.

Scott Berry: Yeah.

And I do phase three trials in stroke where we do proportional odds. You're going to have to beat me over the head and drag me to do a dichotomous endpoint.

But we do proportional odds. We do proportional odds in REMAP-CAP, and we do win ratio tests. We do this.

And so we're continually trying to figure out the right way to measure how a patient feels, functions, and survives.

These are the challenges that go into it.

All right, so the battle continues, and I love being able to pose mathematical problems that Lindsay solves; it brings about really interesting challenges in clinical trials.

Lindsay Berry: Yeah, and maybe we should mention that we are working on a paper. We have a first draft ready and a goal of submitting it this year, hopefully much sooner than the end of the year, so

Scott Berry: Yeah,

Lindsay Berry: hopefully
more to come on that.

Scott Berry: So everybody out there, that was Lindsay's incredibly polite way of saying the paper is in Scott's corner, and please hurry up and get it off your corner.

So yes, we are moving on that.

All right.

Thank you everybody.

Until next time, we will be here.

In the interim,