Communicable

In this episode of Communicable, Emily McDonald and Josh Davis are joined by Roger Lewis (USA) and Ian Marschner (Australia) to compare and contrast Bayesian and frequentist statistical approaches. The panel discusses the fundamental principles of both methods, common misconceptions, and the extent to which they are often more similar than many realise. Together, they explore their use in clinical trial design, analysis, and reporting, including adaptive trials and sequential learning. Additional topics include sample size misconceptions, regulatory versus clinical thresholds, and the challenges of interpreting post hoc reanalyses of negative trials.

This episode was edited by Kathryn Hostettler and the executive producer of Communicable is Angela Huttner.

What is Communicable?

Communicable takes on hot topics in infectious diseases and clinical microbiology. Hosted by the editors of CMI Communications, the open-access journal of ESCMID, the European Society of Clinical Microbiology & Infectious Diseases.

Emily: [00:00:00] Hello and welcome back to Communicable, the podcast brought to you by CMI Communications, ESCMID's open-access journal covering infectious diseases and clinical microbiology. I'm Emily McDonald. I'm a general internist and infectious disease trialist working out of McGill University in Montreal, Canada, and an associate editor at CMI Communications.

We are here today to demystify Bayesian statistics for clinical trials and to compare and contrast Bayesian approaches with frequentist approaches. I'm delighted to be co-hosting this episode with Josh Davis, fellow editor at CMI Comms.

Josh: Hi everyone. Great to be here again today. We're lucky to be joined by our two guests, Roger Lewis and Ian Marschner.

Emily: Roger Lewis is a professor of emergency medicine at the David Geffen School of Medicine at UCLA and a senior medical scientist at Berry Consultants, LLC, a [00:01:00] group that specializes in innovative clinical trial design. Dr. Lewis is the senior statistical editor for JAMA and editor of the JAMA series entitled JAMA Guide to Statistics and Methods.

His expertise centers on adaptive and Bayesian clinical trials, including platform trials, data and safety monitoring boards, and the oversight of clinical trials.

Roger: Pleasure to be here. Thanks for having me.

Josh: And our second guest is a fellow Australian of mine, Ian Marschner. He's a professor of biostatistics and director of the Biostatistics Group at the Clinical Trial Center at the University of Sydney in Australia.

He has 30 years' experience as a biostatistician working on clinical trials and epidemiological research in cardiovascular disease, oncology and HIV/AIDS. His research also focuses on the development of new statistical methodologies, particularly for designing and analyzing clinical trials. Ian was formerly the director of the Asia Biometrics Center with the pharmaceutical company Pfizer, and [00:02:00] also associate professor of biostatistics at Harvard University.

Ian: Hi everybody. Thanks for having me. I'm looking forward to the discussion.

Emily: Great. Okay. As our regular listeners know, we always start the podcast with an icebreaker. Today's icebreaker question is whether you're a betting person, so please tell us why or why not, and bonus points if you can tell us about a memorable time that you won or lost.

Josh: Over to you, Roger.

Roger: Well, I guess my default answer would be no, I'm not a betting person, but the one time that I can recall going into a casino was at the casino in Monte Carlo. I was there for a meeting, and it occurred to me that, since Markov chain Monte Carlo methods are fundamental to Bayesian inference, I as a Bayesian statistician needed to go into that casino.

I did not have a jacket, so I had to rent one of the jackets they keep hidden there for people with inadequate style choices. I got my 20 euros' worth of chips, [00:03:00] went to the roulette table because I was pretty sure I understood that game, lost it immediately, and left.

Emily: A sad tale. Okay. How about you, Ian?

Ian: Well, I wouldn't describe myself as a betting person, but I do actually have a bet once a year. Josh will understand this, because we have a famous horse race in Australia called the Melbourne Cup that everybody sort of stops to watch, and usually people have a bet once a year on that.

And I do that. It's become a bit of a ritual. I've never won anything of note betting, which is probably why I'm not more of a regular bettor. But on a related issue, I have become quite fascinated with the mathematical modeling of betting markets. In particular, in high-scoring sports, for example, the score difference actually evolves very similarly to the way the treatment difference evolves in a clinical trial.

And so the sorts of mathematical models that underlie some of the clinical [00:04:00] trial statistics are very relevant in that context. So perhaps in my retirement I will take a look at that and try to win a bit of money.

Yeah, there's a lot to be won there, I think.

Emily: Okay. Let's get into the math and the approach that we take with science.

So I'm just gonna start by getting us into the mood with a little joke.

Josh: So how do you tell the difference between a frequentist and a Bayesian? You should say "How?", Emily.

Emily: How Josh?

Josh: Well, if you're telling a joke, a frequentist will laugh if the punchline is funny, but a Bayesian will only laugh if the punchline is funnier than expected.

Okay, that's good. That doesn't tell me if you're a frequentist or a Bayesian, but anyway. Let's start then with a bit of history. Ian, can you tell us what got you interested in statistics, and a bit about how you've seen the landscape of statistics change over time as it relates to clinical trials?

Ian: Yeah, sure. Well, [00:05:00] it wasn't initially a plan of mine to study statistics. I went to university expecting to major in mathematics, but fortunately they made me take stats, which is something I'm forever grateful for. That led to me majoring in stats and ultimately to a PhD that involved a lot of infectious disease modeling, which was just at a time when there was a lot of funding around for HIV and AIDS research, back in the early nineties.

That ultimately led to me getting a job in AIDS clinical trials in the US, and really led to me developing the expertise in clinical trial statistics that I've taken with me over the course of my career. Over that career, I think I have seen the landscape evolve quite a lot, from what we might think of as very rigorous, single-question, inflexible sorts of studies that we used to do, [00:06:00] to more flexible multi-question studies that offer, as I said, greater flexibility but come with a lot of challenges in terms of maintaining scientific rigor. That's become a real challenge in the modern environment.

And in fact, that tension, I think, is relevant to the Bayesian versus frequentist discussion that I'm sure we'll get into during this discussion.

Josh: Yeah, it's a great point, Ian, and we'll probably expand on that. A lot of people think frequentist trials means simple fixed trial design, but there's such a thing as adaptive frequentist approaches, which we'll probably discuss later. Roger, next: can you tell us what got you specifically interested in Bayesian statistics, and why you were drawn to this approach and the field?

Roger: So originally my PhD was actually in biophysics. I did laser spectroscopy with a heavy emphasis on data analysis. I was in both an MD and a PhD program at the time, [00:07:00] and I became clinically interested in emergency medicine. And let's just say there wasn't a lot of overlap between laser spectroscopy of macromolecules and the everyday practice of emergency medicine.

And I was actually wandering through a medical bookstore, and there on the bargain table were the books that they'd been unable to sell. There was a book on clinical trial statistics by John Whitehead, who was a real leader in the frequentist approach to sequential clinical trials from a number of perspectives, and just a wonderful contributor.

And as I've told John, subsequently, I managed to pick up his book for a dollar, and I read it and thought that this was the intersection I was looking for, that combined the mathematical and data analytic training that I had in my PhD program with my interest in clinical medicine and clinical research.

And what really drew me to the approach in that book was not Bayesian [00:08:00] versus frequentist. It was the sequential approach to learning: this idea that as you run a clinical trial, there's an ongoing stream of information, and how do you make the most efficient use of that stream of information to make decisions as soon as possible, but no sooner than you should make them, while preserving the validity and integrity of the trial design? I also didn't have formal frequentist-based statistical training, but rather a math background from a physics perspective, and that made me somewhat agnostic to a statistical approach.

And in that setting, I found that the Bayesian approach seemed more naturally suited to sequential learning. And so I was drawn to that over time, but it wasn't a foregone conclusion. It was what worked better with the goal of trying to make trials as efficient as possible.

Can you talk to us a little bit more about sequential learning and what you mean by that so that the audience understands as well?

Roger: Well, sure. And I [00:09:00] wouldn't want to suggest that sequential learning is something that's limited to the Bayesian approach. In both statistical paradigms, as a stream of data comes in, you can analyze it at multiple times, and over time you can learn more and more as the data become more robust or greater in quantity.

From a Bayesian perspective, based on the use of Bayes' theorem, the fundamental approach to learning is that you have some pre-existing representation of your current state of knowledge prior to seeing the current data. That is updated through Bayes' theorem, which is just a mathematical equation. There's no question about the validity of the mathematics, and it updates that representation of your belief system in light of the current data.

The advantage of it for sequential learning is that today's prior distribution plus data becomes the posterior, which is in fact tomorrow's prior. [00:10:00] So as data come in sequentially, you can always be updating these distributions. What is your current state of knowledge today can be updated to be your knowledge tomorrow or the day after.

So in the setting, for example, of a group sequential trial, it's a nice, coherent framework for learning from a sequential set of data. In a frequentist paradigm, you can also update your test statistics. In fact, John Whitehead and others have developed very elegant mathematical representations for understanding how test statistics like a z statistic will develop over time with an incoming set of data, and how to draw conclusions in that setting.

So both approaches give you a way to do sequential learning. From my point of view, the Bayesian is a little bit more natural.

Okay, so this is sort of another term, maybe: when we talk about prior probabilities and posterior probabilities, we could call that sequential learning.

Roger: I think the sequential part is that the posterior that you have from the data you acquired last [00:11:00] week becomes the prior for your interpretation of the data you're gonna get next week.
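The "today's posterior is tomorrow's prior" idea can be illustrated with a small Python sketch. It assumes a binary outcome with a conjugate Beta prior, which is only one convenient choice and is not something the speakers specify; the weekly counts and the flat Beta(1, 1) starting prior are invented for illustration.

```python
from fractions import Fraction

def update(prior, successes, failures):
    """Conjugate Beta-binomial update: Beta(a, b) prior -> Beta(a + s, b + f) posterior."""
    a, b = prior
    return (a + successes, b + failures)

def posterior_mean(dist):
    """Mean of a Beta(a, b) distribution, kept as an exact fraction."""
    a, b = dist
    return Fraction(a, a + b)

# Hypothetical weekly batches of (responders, non-responders); not real trial data.
batches = [(3, 7), (6, 4), (5, 5)]

# Sequential learning: each week's posterior becomes the next week's prior.
belief = (1, 1)  # Beta(1, 1), a flat prior (an assumption for this sketch)
for s, f in batches:
    belief = update(belief, s, f)

# One-shot analysis of all the data pooled together, starting from the same prior.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
pooled = update((1, 1), total_s, total_f)

print(belief, pooled, float(posterior_mean(belief)))
```

Under these assumptions, updating week by week and analysing the pooled data once lead to the same posterior, which is one way to see why the framework is described as coherent for sequential data.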

Emily: Okay. I understand. Thank you. Roger, Ian, I think many listeners will have encountered or read studies reporting Bayesian statistics. Some listeners are maybe studying this approach. Others are maybe considering it for their own trial. Some are clinicians who are just looking to learn how to interpret trials that are reported in this way.

Can you help us and give us a bit of a primer on some of the key differences, very basic, starting from scratch, between analyzing, reporting, and understanding a trial that uses Bayesian versus frequentist statistics, maybe emphasizing why someone might choose one approach or the other? But first, just give us a primer on the two approaches and how they differ.

Maybe Roger, do you wanna go first?

Roger: Sure, I'm happy to. I think in both settings there's a set of standard things we would hope to have reported so we can understand the [00:12:00] approach that was taken, and understand that the design of the trial was fully pre-specified, including the criteria for defining the trial as successful, or positive, if you will.

The two approaches are obviously different. I'll talk about the Bayesian and then turn it over to Ian to talk about the frequentist. From a Bayesian trial, we expect to see a clear definition of the prior probability distribution, which may be very uninformative and actually relatively unimportant, but it still needs to be clearly defined.

And if it is an informative prior distribution, where we're bringing substantial external evidence into the interpretation of the current evidence, we really need to understand where that evidence came from and be assured that it wasn't selectively chosen, that it's a fair representation of the current state of knowledge going into the trial.

Emily: Would it be fair to say that if somebody's conducted three randomized controlled trials in the same area and they all showed the same outcome, you could use those trials and that would be an [00:13:00] informative prior, as opposed to when there's never been a trial done in this area?

Roger: Sure, that certainly could be one. And even in a setting in which there's no pre-existing randomized data, you might use expert opinion or you might use data from observational studies. But I think one of the things that our listeners should think about, if the prior information is informative, is: do they think that prior information is a fair representation of all the relevant information out there?

Or was it carefully chosen as a way of tipping the hand, if you will, in the current analysis to get an answer that the investigators may have found more favorable? So after the prior information is presented, including its source, there's a basic approach to how you model the data.

Do you consider the data normally distributed or ordinal? What kind of outcome do you have? And you make sure that the Bayesian approach to combining the current data with the prior information is mathematically sound. We obviously want a [00:14:00] clear presentation of the posterior probability distribution, the result, if you will. And many of the clinical trials that we're seeing now have pre-planned interim analyses, with the goal of either being able to declare early success and demonstrate efficacy at an interim, or perhaps declare futility or evidence of harm at an interim and terminate a trial early when the test agent is no longer promising.

We wanna know that those decision rules and those interim analyses were pre-specified and not changed in response to the data that actually accrued. You don't want there to be any possibility that people moved the goal line based on what the data were showing at that moment.

And then, as a last point, I would say that we do wanna understand the frequentist operating characteristics of Bayesian trials. So the standard operating characteristics like type I error control, power, and bias in estimation, which are not [00:15:00] specific to a frequentist approach: we wanna make sure that we understand those characteristics of a Bayesian trial design, because they still have relevance for helping people understand the robustness of the evidence generated by those trials.
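One way to examine the frequentist operating characteristics of a Bayesian design is by simulation. The following Python sketch is a toy example, not any real trial's design: it assumes a single-arm trial with a binary outcome, a flat Beta(1, 1) prior, three pre-specified looks, and a success rule of "posterior probability that the response rate exceeds a reference rate is above 0.99". All of those numbers and the function names are assumptions chosen for illustration.

```python
import random
from math import comb

def prob_rate_exceeds(p0, successes, failures):
    """P(rate > p0 | data) under a Beta(1, 1) prior.

    The posterior is Beta(a, b) with a = 1 + successes, b = 1 + failures.
    For integer a and b, P(rate <= p0) equals P(Binomial(a + b - 1, p0) >= a).
    """
    a, b = 1 + successes, 1 + failures
    n = a + b - 1
    cdf = sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(a, n + 1))
    return 1 - cdf

def run_trial(true_rate, p0, looks, threshold, rng):
    """Return True if the trial declares success at any pre-specified look."""
    successes = 0
    enrolled = 0
    for n_at_look in looks:
        while enrolled < n_at_look:
            successes += rng.random() < true_rate
            enrolled += 1
        if prob_rate_exceeds(p0, successes, enrolled - successes) > threshold:
            return True
    return False

def success_rate(true_rate, p0, looks, threshold, n_sims=1000, seed=1):
    """Proportion of simulated trials that declare success."""
    rng = random.Random(seed)
    wins = sum(run_trial(true_rate, p0, looks, threshold, rng) for _ in range(n_sims))
    return wins / n_sims

# Hypothetical design: reference rate 0.30, looks at 40, 80 and 120 participants.
type_one_error = success_rate(0.30, 0.30, looks=(40, 80, 120), threshold=0.99)
power = success_rate(0.45, 0.30, looks=(40, 80, 120), threshold=0.99)
print(type_one_error, power)
```

Running the design under "no effect" estimates how often it would wrongly declare success (type I error), and running it under an assumed true benefit estimates power. Real platform trials use far richer models, so this is only a sketch of the general idea.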

Emily: I think that's a good starter. Can we get you, Ian, to kind of compare and contrast that with a frequentist approach to a trial?

Ian: Yeah, sure. So Roger obviously emphasized the importance of the prior distribution in Bayesian analysis. A frequentist analysis is distinguished from a Bayesian analysis in the sense that there is no prior information infused into the frequentist analysis.

Some people interpret that to mean that frequentist analyses ignore prior information. That's not quite right. It's just that they take a more modular approach, in the sense that a frequentist analysis will summarize the information that the data provide for this specific study in [00:16:00] isolation, without the influence of prior information.

Any synthesis with prior information will then occur subsequently, in things like meta-analyses and sequential meta-analyses. So that's a point of distinction, although in the end prior information can influence the interpretation of frequentist data.

In terms of the differences in the way the analyses are presented and interpreted, there's actually, I would argue, often surprisingly little difference between what a Bayesian analysis says and what a frequentist analysis says, at least in the setting where the Bayesian analysis has used a non-informative prior distribution.

Where the differences seem to occur is that a frequentist analysis might be presented in quite a different way to the Bayesian analysis. A common way of presenting a frequentist analysis is using a hypothesis test, in which we calculate a p value and compare that p value to [00:17:00] 0.05. It's a very dichotomous yes/no, significant or not significant type of process.

That's a common way to present a frequentist analysis, but it doesn't actually define frequentist analyses. There are other ways to present analyses that are more aligned with the Bayesian approach. All of us are quite familiar, I guess, with confidence intervals and the fact that we would expect to see a confidence interval of a specific confidence level, usually the 95% confidence level.

So that's an important feature of frequentist analysis. Increasingly these days, we're actually going a step further in frequentist analysis and presenting what are called confidence distributions, which are basically a generalization of confidence intervals that essentially encode confidence intervals of all possible confidence levels, and are probability distributions that are very [00:18:00] analogous to Bayesian posterior distributions.

And so if you're comparing a Bayesian analysis with a non-informative prior to a frequentist analysis presented in terms of a confidence distribution, there's often surprisingly little difference between those results, although underlying it is a fundamental philosophical difference in the interpretation, which Roger's already touched on.

Bayesian analyses are about updating our beliefs, essentially, in the face of new data, whereas frequentist analyses are more about producing inferential summaries that have a known probability of being correct compared to what is true in nature. So that's a fundamental philosophical difference, I guess.

But often the reporting can be, I would argue, quite aligned.

Josh: That's funny. You know, we're starting by thinking about the results of a frequentist trial being dichotomous, yes or no. And that's [00:19:00] also what I had thought about the difference between frequentist and Bayesian approaches: they're different things, they're dichotomous. But from what you are saying, Ian, in fact they're a lot more similar than meets the eye. They're not dichotomous.

Ian: Yeah. Look, I would argue that the fundamental concept in frequentist inference is not hypothesis testing. It's the concept of what I would call coverage.

When you think about a confidence interval, the process of calculating it has a known probability of covering the true treatment effect. Usually it's a 95% confidence interval. That concept of coverage, the probability of us getting a correct conclusion in repeated instances of the particular trial or experiment that we've run, is really the fundamental notion of frequentist inference, not hypothesis testing, which I think people often think of as frequentist inference. And when you extend that notion of the confidence interval to [00:20:00] the confidence distribution, which is a more modern concept that's being used in trials, then that notion of a frequentist posterior distribution is very similar to what we see in the Bayesian analysis.

Roger: So I guess one way I would look at it is that as the frequentists try to become more and more Bayesian, it's really just a compliment to those of us who have been Bayesian a long time. I think that Ian and I are probably gonna have to agree to disagree on one minor point, which is whether the nominal coverage probability of a confidence interval is actually the probability that that interval contains the true parameter.

I think most Bayesians would argue that a coverage probability is actually subtly different than a posterior probability. But I also think Ian is absolutely correct that, from a functional point of view, this is a distinction that statisticians can argue about but that probably has little impact on the way we interpret clinical trial data and make decisions about [00:21:00] patients.

So a distinction without a difference, if you will.

Ian: It's a technical issue that probably doesn't need to concern us too much here, but yeah, I would agree. And on the compliment that the frequentists are paying the Bayesians, I guess I would return that by saying that ensuring you conform with the frequentist operating characteristics is something that the frequentists would take as a compliment, Roger.

Emily: Is there a concrete example that we could give to discuss the 95% confidence interval that you were describing, Ian? 'Cause I don't know that I totally follow what you said about it.

Roger: Let me give it a try and see if Ian agrees with this. Okay. So what Ian is saying is that if you did a very large number of experiments, and you calculated a frequentist confidence interval after every experiment, on average 95% of those confidence intervals, or whatever the coverage probability is, would include the true value of the parameter.

So it's an asymptotic [00:22:00] characteristic of doing the procedure over and over again. That is, unfortunately, very subtly different from saying that if I do the experiment once and I have one confidence interval, there's a 95% probability that the true value is within that one interval. That is not true.

That's a Bayesian interpretation. And it just turns out you cannot make that statement without using prior information.

And you don't get that from one study and one confidence interval. What we're talking about here is that if you were to run, what, many studies and calculate a confidence interval for each one, then you have that?

Roger: You actually have to repeat the entire process, the experiment and the fitting of the data.

Ian: It's actually the reason why it's called frequentist statistics. That's where the word comes from.

Probability is a long-run frequency in repeated experiments within the frequentist framework, whereas probability in a Bayesian framework is a strength or degree of belief in something that's unknown. [00:23:00] So the word frequentist comes from the fact that probability is interpreted as a frequency in the long run.

If you were to run the same trial many times, the confidence intervals would fluctuate around, but the frequency with which they cover the true treatment effect would be 95%. That's the interpretation, and where the word frequentist actually comes from.
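The repeated-experiment meaning of "95%" can be checked with a short simulation. This Python sketch assumes the simplest textbook case, a normally distributed outcome with known standard deviation; the true effect, sample size, and number of repetitions are arbitrary values chosen for illustration.

```python
import random
from statistics import NormalDist, mean

def confidence_interval(sample, sigma, level=0.95):
    """Two-sided interval for a normal mean with known standard deviation sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(true_effect, sigma, n, n_trials, seed=0):
    """Fraction of repeated trials whose interval contains the true effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, sigma)
        hits += low <= true_effect <= high
    return hits / n_trials

# Hypothetical settings: the same "trial" repeated 5000 times.
estimated_coverage = coverage(true_effect=0.2, sigma=1.0, n=50, n_trials=5000)
print(estimated_coverage)
```

Each simulated trial produces a different interval, and the long-run fraction of intervals that contain the true effect should sit close to 0.95. The 95% describes the procedure across repetitions, not any single interval.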

Emily: Okay. I think that's helpful. And this is very simple, but Ian, what is your take on the p value threshold being set at 0.05, or 5%? Do you feel that you've been hard done by? Has anyone discussed being more flexible with a p value, or are we stuck with that?

Ian: I'm a big advocate for de-emphasizing binary cutoffs, like p less than 0.05 or greater than 0.05, and more of an advocate for a more holistic presentation through confidence distributions, or Bayesian posterior distributions if you're doing it in a Bayesian [00:24:00] setting, where you present a holistic summary of the evidence as opposed to a single dichotomous conclusion.

So I don't have a strong opinion about the actual cutoff of 0.05. What I have a stronger opinion about is that we should de-emphasize that binary decision-making approach.

Josh: Can I just pick up on that for a sec? The p value is a really fundamental aspect of the way that frequentist results are reported, but I think a lot of people don't actually understand what a p value is.

So can you explain to us, Ian, what does the p value actually mean?

Ian: Well, the p value is essentially calculated under the assumption that there's no treatment effect. So under the assumption that there's nothing of interest going on, what's the probability that we would've observed a treatment effect as large as, or larger than, what we actually saw?

If there's no treatment effect at all, how likely or unlikely are we to get the sort of magnitude of treatment effect that we got in this study? [00:25:00] And if it's very unlikely that we would've got as large a treatment effect as what we saw under the assumption that there's no difference, then that's evidence to suggest that the assumption was wrong.

That gives us the impetus to reject the notion that the treatment effect is zero. So that's what the p value is. I should say there are other interpretations of the p value. Some people object to the idea that you have to assume the null hypothesis is true.

For example, one minus the p value is the confidence level of a confidence interval that exactly coincides with the treatment effect being beneficial. And so it can be viewed as a sort of confidence of treatment benefit as well.

That's an alternative interpretation. But assuming the null hypothesis is true and calculating the probability that we would get a result as extreme as we did is the standard way of defining it.
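Both readings of the p value can be worked through numerically. The Python sketch below assumes a one-sided test with a normal approximation, which is one common setting and not necessarily the one Ian has in mind; the effect estimate and standard error are made-up numbers.

```python
from statistics import NormalDist

std_normal = NormalDist()

def one_sided_p_value(estimate, std_error):
    """P(observing an effect at least this large | true effect is zero), normal approximation."""
    z = estimate / std_error
    return 1 - std_normal.cdf(z)

def lower_confidence_bound(estimate, std_error, level):
    """Lower end of the one-sided interval [bound, infinity) at the given confidence level."""
    return estimate - std_normal.inv_cdf(level) * std_error

# Hypothetical result: estimated benefit of 0.06 with a standard error of 0.03.
estimate, std_error = 0.06, 0.03
p = one_sided_p_value(estimate, std_error)

# The one-sided interval with confidence level 1 - p has its lower end at zero,
# so it coincides with "the treatment effect is beneficial".
bound = lower_confidence_bound(estimate, std_error, level=1 - p)
print(p, bound)
```

Under these assumptions the standard definition and the "confidence of treatment benefit" reading are two descriptions of the same calculation.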

Roger: So it might be useful to contrast that with what a posterior probability means. The frequentist p value, as Ian [00:26:00] said, you could think of as a measure of the inconsistency between what you saw and the null hypothesis, and it indirectly may support an alternative hypothesis. The Bayesian posterior probability is interpreted as the probability of a hypothesis being true.

It could be the probability of the alternative, of the null, or of a certain treatment effect or greater being present. But that rests fundamentally on the assumption that your prior information was a fair representation of the state of knowledge before you gathered your new data. So, in a certain sense, the Bayesian interpretation gives you an answer much closer to what you want: the probability that a conclusion is true.

But it rests on the prior information. The frequentist interpretation gives you a prior-information-free measure of the consistency of the data with [00:27:00] a particular hypothesis, but then you have to make the leap of faith that that degree of evidence against one hypothesis helps you understand which hypothesis you should support.
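How much a posterior probability leans on the prior can be seen in a small worked example. This Python sketch assumes a normal prior and a normally distributed effect estimate (a conjugate normal-normal model, chosen here purely for convenience); the estimate, standard error, and both priors are invented.

```python
from statistics import NormalDist

def posterior(prior_mean, prior_sd, estimate, std_error):
    """Normal prior combined with a normal likelihood gives a normal posterior (precision weighting)."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / std_error**2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return NormalDist(post_mean, post_var**0.5)

def prob_benefit(prior_mean, prior_sd, estimate, std_error):
    """Posterior probability that the true effect is greater than zero."""
    return 1 - posterior(prior_mean, prior_sd, estimate, std_error).cdf(0)

# Hypothetical result: estimated benefit of 0.06 with a standard error of 0.03.
estimate, std_error = 0.06, 0.03
one_sided_p = 1 - NormalDist().cdf(estimate / std_error)

vague = prob_benefit(0.0, 100.0, estimate, std_error)     # nearly flat prior
sceptical = prob_benefit(0.0, 0.03, estimate, std_error)  # prior concentrated near no effect
print(one_sided_p, vague, sceptical)
```

With the nearly flat prior, the posterior probability of benefit is almost exactly one minus the one-sided p value, which echoes Ian's earlier point about how similar the two analyses can look. With the sceptical prior the same data give a lower probability of benefit, which is why the source and fairness of an informative prior matter.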

Do you feel that the math within a frequentist analysis versus a Bayesian analysis is more accessible? Some people have said that the Bayesian analysis is a black box, and that they don't really understand the math of it.

Whereas there's a larger body of people who understand the math of a frequentist analysis. Is there any truth to that?

Ian: I think there are elements of a frequentist analysis that you can do on the back of an envelope and understand, but there's certainly an underlying, fairly impenetrable theory of frequentist inference. For example, the sequential theory that Roger was referring to earlier involves very complex mathematics involving random processes and this sort of thing. So I think non-statisticians would struggle to really [00:28:00] understand the mathematical basis for a sequential analysis at its deepest level. I think perhaps the issue that you're getting at is that in a Bayesian analysis, you have a prior distribution model.

It's not possible to do the analysis without a prior distribution model, even if it's a non-informative prior. And with some endpoints, that can be quite complex. With some of the ordinal endpoints that we're using these days, your prior distribution model might have to be a 10- or 20-dimensional probability distribution that has to be infused into the analysis and treated with computational techniques such as Markov chain Monte Carlo.

That is a little hard to understand and could be seen as overly complex when all we're really trying to do is express the fact that we don't have any information about the treatment effect. So in that setting, a frequentist analysis can be somewhat easier to implement than a [00:29:00] Bayesian analysis, which might seem a little bit more like a black box. But underlying both frameworks is a fairly theoretical basis to what's going on.

Emily: Let's talk about a concrete example, my favorite thing to do to understand things. Josh, you're co-running a massive trial in staph aureus bacteremia. In addition to being a platform trial, it's also making use of Bayesian statistics. Can you tell us a little bit about your experience with the SNAP trial and the statistical approach that you decided on?

Can you tell us some lessons you've learned along the way? What made you choose that approach?

Yeah, sure. So I guess just to briefly, cover what the trial is. It's called the SNAP trial. It's looking at patients with staphylococcus aureus bloodstream infection or golden staph bloodstream infection who are admitted to hospital.

Josh: and it's testing a number of different, treatment domains in patients with staph aureus bacteremia, . What's the best. Backbone antibiotic to use. It's [00:30:00] combination antibiotic therapy better than monotherapy and multiple other questions as well. you said it was large. There's around 6,000 patients have been randomized so far across, 14 countries.

and so it's a platform trial, so that we're able to test multiple questions in parallel rather than in series, By, in series, I mean do a trial for five years, spend five years getting funding, do another trial for five years, et cetera, et cetera. so far we've done the equivalent of running three.

clinical trials and there are another three or four running within it currently. So it's certainly a lot more efficient, than taking a fixed trial design approach. why did we go with Bayesian statistical analysis? really, because we're working with, groups of statisticians, including Roger, who, have a lot of expertise in, setting up and running platform trials using Bayesian analytical approaches.

we kind of got the idea initially from the REMAP CAP [00:31:00] trial, which, is a trial, of patients in ICU with pneumonia, that was using the same approaches. and that's kind of how we ended up here. I've gotta say from a personal point of view, when I first heard about the REMAP CAP trial, which was at the time we were sort of thinking about snap, my initial thought was.

Josh: it's too complicated for clinicians and people that consume the results to understand. That was my initial, hesitation. When I heard, remap cap presented, I thought, well, it's all well and good to have a fancy trial design, but if people that are reading the journal don't understand what the results mean, you're not gonna change practice.

Now, I don't think that any longer. I've come round, as I understand it better, to thinking in fact the opposite: that the results that are reported in a Bayesian way just mean what they say. So if the results are that there's a 99% probability that treatment A is superior to treatment B in terms of mortality, it means exactly what that sentence just said.

Whereas if, [00:32:00] in a frequentist way, it's "the p value is 0.03", you have to get into all this tangled negative language logic to try and understand what that sentence means. So I think for me, because like most people I guess these days I kind of grew up on frequentist statistics, it's taken me a long time to try and relax my mind a little bit to allow the Bayesian approach in.

But having done so, from the point of view of the SNAP trial, it's been really fun learning all that stuff, but it's also made the trial a lot more practical and doable and efficient.

Can you tell us again what the primary outcome was for cefazolin versus cloxacillin? What the primary finding was and, since it was a Bayesian analysis, how it was reported?

Sure. So, this is one part of the trial that's concluded, not yet published, but we've reported the results at a conference.

Josh: So the primary outcome of the trial is 90-day all-cause mortality. And in this part of the trial we were comparing two different antibiotics: cefazolin, or 'cefazolin' [00:33:00] as Americans say, with cloxacillin or flucloxacillin. And the hypothesis we were testing is that cefazolin is non-inferior to flucloxacillin, given that flucloxacillin was seen as the standard therapy.

The trial at one of the interim analyses met one of the pre-specified stopping rules, which was that there was a more than 99% chance that cefazolin was non-inferior to flucloxacillin in terms of that primary outcome of mortality. That's why it hit the stopping rule.

Josh: But in addition, we also found that in some of the secondary safety outcomes, cefazolin was superior to flucloxacillin.

Emily: And I guess from that trial, you can also get a probability of superiority that might not reach 99%, but that might look quite good, let's say. But I found that the tendency has been for people to, again, try and dichotomize that result.

So, taking that trial or another trial, something shows a 97% [00:34:00] probability of decreased mortality, but the pre-specified stopping rule was that it had to be more than 99% probability. Rather than saying that something is 97% probable, they're wanting to turn it into a negative trial. Just wondering what you think about that.

And anybody can comment.

I might just quickly say that this is, I think, one potential issue with the Bayesian approaches. 97% sounds like a really high probability, but in fact, if you set a threshold such as 95 or 96% in a trial, you'll often end up running into trouble with the operating characteristics of the trial and have a pretty significant chance of reporting false positives, for example.

Josh: But I'll let Roger and/or Ian explain that better.
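
To make the point about thresholds and operating characteristics concrete, here is a minimal simulation sketch in Python. It is not the SNAP design. It assumes a normally distributed outcome with known variance, a flat prior (so the posterior probability of benefit is simply Phi(z)), equally sized blocks of patients between looks, and a true treatment effect of zero. The function name `false_positive_rate` and all parameter values are made up for illustration.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks, threshold, n_trials=20000, seed=1):
    """Fraction of simulated null trials that ever cross the threshold.

    Assumptions (illustrative only): known variance, flat prior, equal
    block sizes between looks, and a true treatment effect of exactly zero.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    crossings = 0
    for _ in range(n_trials):
        total = 0.0  # running sum of standardized block estimates
        for look in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)  # one new block of data under the null
            z = total / look ** 0.5       # cumulative z statistic at this look
            # Flat prior + normal likelihood: P(effect > 0 | data) = Phi(z)
            if std_normal.cdf(z) > threshold:
                crossings += 1            # trial stops and declares benefit
                break
    return crossings / n_trials


if __name__ == "__main__":
    for looks, thr in [(1, 0.95), (5, 0.95), (5, 0.99)]:
        rate = false_positive_rate(looks, thr)
        print(f"{looks} look(s), threshold {thr:.2f}: false positive rate {rate:.3f}")
```

Under these assumptions a single look at a 95% threshold wrongly declares benefit in roughly 5% of null trials, and the chance of crossing at least once grows as looks are added. That is one reason designs with interim analyses tend to pre-specify stricter thresholds such as 99%.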

Yeah, look, on the issue of the appropriate thresholds, 97%, 99%, this sort of thing: as soon as you start imposing very strict thresholds on the Bayesian posterior probability of benefit, you're getting back into this [00:35:00] frequentist hypothesis testing framework where you've imposed a very strict cutoff on the P value.

And if you treat it as sort of a positive or negative trial based on whether it's above 97% or below 97%, you're kind of losing some of the big advantage in a Bayesian analysis, I would argue, which is to present the holistic posterior probability distribution of the treatment effect and to digest that in its entirety rather than just little pieces of it.

Ian: So I'm a big advocate for kind of deemphasizing those dichotomous cutoffs. We need them in monitoring trials, 'cause you have to make dichotomous decisions: do we keep going or do we stop? And so you do need a framework like that for monitoring trials. But when you report the final results, I think it should be done in a more holistic setting.

And in fact, bringing it back to the frequentist version of what you're talking about, Josh, I agree with you. The posterior probability of treatment effect [00:36:00] calculation is a very intuitively appealing aspect of a Bayesian analysis: that you can say something like, you know, there's 99% probability that it's non-inferior.

The way to get that in frequentist analysis is via this concept that I mentioned of the confidence distribution. So, in parallel, you could be making a statement such as: the confidence of a beneficial treatment effect is 97%, or whatever. And in fact, in one of my recent papers in Statistics in Medicine, I took a look at the REMAP-CAP trial, which in one of the domains, I think it was the hydrocortisone domain, had a posterior probability of 93% treatment benefit, with a very complicated 22-dimensional prior distribution model with MCMC sampling, et cetera.

Ian: And with a relatively simple frequentist calculation, you came up with a 94% confidence of treatment effect using the confidence distribution approach, which [00:37:00] was very consistent with the Bayesian analysis and, I would argue, as intuitive, but somewhat simpler actually to implement.
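
As a rough illustration of why the two numbers can land so close together, the sketch below compares a one-sided "confidence of benefit" from a normal-approximation confidence distribution with a posterior probability of benefit under a vague normal prior. This is not the calculation from the paper or from REMAP-CAP. The estimate (a log odds ratio of -0.3 with standard error 0.2), the prior, and both function names are hypothetical.

```python
from statistics import NormalDist


def confidence_of_benefit(estimate, se):
    """One-sided confidence that the true effect is below 0 (benefit).

    Uses the normal-approximation confidence distribution N(estimate, se^2),
    which equals 1 minus the one-sided p-value for 'no benefit'.
    """
    return NormalDist(mu=estimate, sigma=se).cdf(0.0)


def posterior_prob_benefit(estimate, se, prior_mean=0.0, prior_sd=10.0):
    """P(effect < 0 | data) for a normal prior combined with a normal likelihood."""
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return NormalDist(mu=post_mean, sigma=post_var ** 0.5).cdf(0.0)


if __name__ == "__main__":
    est, se = -0.3, 0.2  # hypothetical log odds ratio and standard error
    print(f"confidence of benefit:     {confidence_of_benefit(est, se):.3f}")
    print(f"posterior prob. of benefit: {posterior_prob_benefit(est, se):.3f}")
```

With a vague prior the two quantities nearly coincide. They drift apart only as the prior becomes informative, for example by passing a smaller `prior_sd`.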

Yeah. So I think once again, Ian and I are in violent agreement. I don't think either of us like the summaries of treatment effects being a dichotomized positive or negative. We want estimation of the magnitude of benefit or harm with a good measure of the remaining uncertainty around those estimates.

Roger: And then, depending on the clinical context or the regulatory context, whether or not that summary justifies different decisions is almost a separable issue. You want to present the information and then you wanna make a decision from it. The place where, as Ian already mentioned, you really do have to make a decision is that you have to have decision rules for stopping trials early for overwhelming evidence of efficacy or evidence of harm or futility.

And you can't only partially stop a trial. So it's [00:38:00] fundamentally a dichotomous decision. I also think there's a very important role for pre-specification, because without pre-specification it's impossible to really understand the operating characteristics of the design. And so there's sometimes places where you need to put in rules so you can understand how a specific trial design performs from either Bayesian or frequentist perspectives, or both.

I think the question of changing thresholds is really a clinical and context question, not a statistical question. And I think one should make a distinction between two different settings. Suppose you're doing, and maybe this would apply to SNAP or, for example, to REMAP-CAP, a comparison of different uses of already approved corticosteroids in patients with either community-acquired pneumonia or COVID-19.

If the drugs are already approved, widely available, have known side effect profiles and low costs, one might choose to use one [00:39:00] medication over the other with relatively inconclusive evidence, for example a posterior probability of 90%, or 85% even. And I've certainly been working in some of these comparative effectiveness settings in which clinicians routinely use either of the choices, or there's a lot of unexplained practice variation, where providing evidence that there's a 90% chance that one treatment is better than the other could really change practice and do so appropriately. I think that's very, very different from a regulatory setting, when we're thinking about health policy and the information that we use to decide what can get to market.

So if you think from a health policy or regulatory setting point of view, you have to think about what fraction of the treatments that are available on the market you would tolerate being truly ineffective. And I think most of us would be [00:40:00] uncomfortable with, for example, 10% of all FDA-approved medications actually lacking efficacy against the endpoint for which they were originally tested.

And so a regulator may want to maintain very strict type one error or false discovery control, whatever you'd like to call it. But it's because of the context in which they're working. They're looking at a regulatory perspective or a policy perspective, which is completely different from clinicians making a choice between two treatment options, both of which are accepted standard of care, and seen mostly as an issue of style or choice.

I completely agree. Yeah. This is how I've thought about it too, where you would need to have different thresholds for head-to-head comparative trials of two drugs that are currently being used on the market with, as you said, known safety profiles, versus bringing a new drug to market. I thought next that we could think about, you know, if we were designing a [00:41:00] trial, some of the considerations we need to work through.

Emily: Ian, I wanted to know from you: if you're choosing a frequentist analysis versus a Bayesian analysis, how might that affect your sample size calculation? And is there anything special to take into consideration? I think we alluded to it before, but some people maybe think that using an informative prior might help you lower your sample size, for example.

So I wanted to talk through some of those considerations. Yeah, look, I think sample size in Bayesian and frequentist studies is a little bit of a fraught area, and an area where there might be a little bit of misunderstanding. I think there is a perception developing that you need a smaller sample size with a Bayesian trial than you would need in a frequentist trial.

And that an underpowered frequentist analysis might potentially be saved by doing a Bayesian analysis, for example. The reality is that there's no free [00:42:00] lunch here. You can't use Bayesian methods to sort of fix a problem with frequentist methods and then magically get a smaller sample size.

Ian: You can use Bayesian methods to address a different objective that might require a smaller sample size, but that's not an apples with apples sort of comparison. And in general, I would argue that the principle is that a Bayesian and a frequentist analysis will fundamentally require essentially similar sample sizes.

One of the reasons why I think this perception may have developed, and I think there's a few reasons actually: there is a sort of pure Bayesian outlook on these issues that sort of says we don't need to do things like adjust for the fact that we've looked at the data during interim analysis and control our type one error or false positive rates, this sort of thing.

Ian: That's a very kind of pure Bayesian [00:43:00] attitude that would result in a smaller sample size, but is not really well accepted within the research community. In fact, we've had this recent FDA guidance on Bayesian trials, and they're very firm, as Roger alluded to earlier, on the need for Bayesian trials to satisfy frequentist operating characteristics, in which case they will require similar sample sizes.

The other reason why there might be a perception of differences in the sample size requirements is this idea of adaptive sample size designs, where you don't actually specify a fixed sample size. You just say that we're going to continually monitor the study until, for example, the Bayesian posterior probability exceeds 99%, or some futility rule is triggered, and we stop.

Those will lead to smaller sample sizes on average than a fixed design where you go through to the end of a planned sample size. But you [00:44:00] can do that in a frequentist framework as well, so it's actually not a uniquely Bayesian advantage there. The advantage is more adaptive versus fixed rather than Bayesian versus frequentist in that particular case.

Ian: So, no free lunch: you'll require similar sample sizes. There are of course differences in just the mechanics of how you assess the required sample size. Bayesian design requires a lot more sort of reliance on simulation, for example, to ensure that the operating characteristics, like false positive rates, are being achieved.

There's perhaps a little bit more of a theoretical foundation for kind of calculating that in a frequentist setting, but simulation can also be useful in a frequentist setting as well. But yeah, similar in terms of resources, I would argue.

Can I clarify one point though, Ian? If you used an informative prior... Yeah.

Emily: Could you not arrive at a smaller sample size? Okay. Like, this would be assuming it's a non-informative prior, I guess, then you need the [00:45:00] same sample size.

Ian: Yeah. Yeah. Good point. And we can add that as a third reason, I guess. Yes, if you infuse into the analysis an informative prior, then you're quite right.

You would need a smaller sample size. And that's because it's possible to think of an informative prior as a certain number of observations that you're kind of injecting into the analysis. So you're adding new observations, new patients, into the analysis, and you can actually do calculations to work out: well, what is the equivalent number of patients that this informative prior corresponds to?

So, that is true that you can potentially get a smaller sample size, but that's where I say it's not an apples with apples comparison. You're comparing a situation in which you are making an assumption about what your beliefs are based on prior information and infusing that into the analysis. The [00:46:00] frequentist analysis, of course,

Ian: if there is good evidence existing in prior studies, can ultimately be synthesized with that in a meta-analysis. But it just wouldn't occur as part of the sort of primary analysis of an individual trial.
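
The idea of a prior being worth a certain number of patients can be sketched with the simplest conjugate case, a Beta prior on a mortality proportion. This is an illustrative assumption, not how any trial mentioned here was analysed: a Beta(a, b) prior behaves like a + b previously observed patients, a of whom had the event. The function names and the numbers are hypothetical.

```python
def prior_effective_sample_size(a, b):
    """A Beta(a, b) prior carries roughly the weight of a + b prior patients."""
    return a + b


def posterior_mean(a, b, events, n):
    """Posterior mean of the event probability after observing events out of n."""
    return (a + events) / (a + b + n)


if __name__ == "__main__":
    a, b = 20, 80          # hypothetical prior: like 20 deaths in 100 earlier patients
    events, n = 30, 100    # hypothetical new trial arm: 30 deaths in 100 patients

    ess = prior_effective_sample_size(a, b)
    weight_on_prior = ess / (ess + n)
    print(f"prior is worth about {ess} patients")
    print(f"weight on the prior: {weight_on_prior:.2f}")
    print(f"posterior mean mortality: {posterior_mean(a, b, events, n):.3f}")
    # The same answer comes from simply pooling the prior 'patients' with the data.
    print(f"pooled rate:              {(a + events) / (ess + n):.3f}")
```

Seen this way, the smaller sample size comes from the extra patients the prior contributes, not from the Bayesian machinery itself, which is the "not apples with apples" point.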

Emily: Or we see sometimes in the power calculation for a frequentist analysis, there's a lot of maybe generous assumptions that go into what the treatment effect might be when we're doing a power calculation for a frequentist analysis.

Yeah. And there's a point of subtlety here. People often think that with a frequentist power calculation, you need to know what the treatment effect is to plan the study. That's not actually true.

Ian: What you need to know, and be honest with yourself about, is what magnitude of treatment effect would change practice. And we wanna power our study to ensure that we'll be able to pick up that treatment effect with high probability, with high power. That doesn't have to be the truth.

It [00:47:00] just needs to be what would lead to a change in practice. And then our study is sort of tuned to ensure that, if that magnitude of practice-changing treatment benefit exists, then we'll have good power to pick it up in the study. And so that's why it's important not to delude yourself that we need a 30% reduction when, in fact, if you saw a 15% reduction and it was not statistically significant, you'd be tearing your hair out because it would've led to a practice change if we'd have had a larger study.
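
To show how much the choice of target effect matters, here is a small sketch using the standard normal-approximation sample size formula for comparing two proportions. The control-arm mortality of 20%, the two-sided alpha of 0.05 and the 80% power are assumptions chosen for illustration, and `n_per_arm` is a hypothetical helper, not a function from any trial's analysis plan.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a given relative risk reduction.

    Normal-approximation formula for two independent proportions with
    equal allocation and a two-sided test at level alpha.
    """
    p_treat = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


if __name__ == "__main__":
    for rrr in (0.30, 0.15):
        print(f"relative reduction {rrr:.0%}: about {n_per_arm(0.20, rrr)} per arm")
```

Halving the target relative reduction more than quadruples the required sample size under these assumptions, which is why an optimistic 30% target leaves a trial badly underpowered for a 15% effect that would still change practice.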

I guess that comes to really one of the big advantages of adaptive over fixed designs. And I know people equate that as being Bayesian over frequentist, but as you said, Ian, it's not necessarily. The advantage of adaptive over fixed is that with fixed clinical trials, people do a sample size analysis, they run the trial for 500 patients or whatever it is, they get to the end, and more often than not they then find: oh, our assumptions were not correct.

Josh: This trial [00:48:00] was underpowered, or even this trial was overpowered, like we didn't need this many people. Whereas in an adaptive design with frequent interim analyses and pre-specified stopping rules, it's in theory not possible to finish the trial and realize you haven't answered the question.

You've answered the question, but you don't stop until you've answered the question.

Yeah. Look, I would just point out though that there is often this sort of straw man trial put up, which is the trial where you have a fixed sample size and you sort of bash through till you get to the end of it without looking at the data, and find that you didn't need as big a sample size.

Ian: I mean, we've been doing interim analysis since the 1980s. It's not a new concept to do interim analysis and to stop trials earlier than what we might have planned. What's a bit different is that we're adapting other features of the design, which are a bit more sort of innovative these days.

But the concept that the traditional frequentist trial bashes through to the final [00:49:00] sample size regardless is a little bit of a straw man, because we've been doing that kind of sequential monitoring that Roger was referring to, using Whitehead's book and others, for quite a while now.

That book that Roger bought for a dollar, I think, was written in the 1990s. So yeah, it's a little bit of a straw man argument, I would suggest.

Roger: And I think that makes the important point that there's a real distinction between a good structure for a clinical trial and the analytic strategy.

I don't think any of us would wanna be enrolled in a trial as the hundredth patient if the first 99 were enrolled and there were no interim analyses. And so the basic concepts of looking at the data sufficiently often to avoid enrolling more patients than necessary, or enrolling patients when it's unsafe, I think those are just fundamental issues of ethical clinical trial design that are separate from the question of frequentist versus Bayesian.

I think with the term adaptive, and as you know, I do virtually all of [00:50:00] my work in adaptive trials, sometimes people use it and forget that the O'Brien-Fleming group sequential design is an adaptive design. It doesn't have a particular single sample size.

It's not fixed. It adapts the sample size according to the magnitude of the accumulating difference, just as you were alluding to, Josh. So the strength of the Bayesian approach is that it facilitates more sophisticated adaptations, which in selected situations mitigate other risks to the patient or to the trial enterprise.

Just like an O'Brien Fleming stopping rule mitigates a risk of running a trial too long or too short in the setting of true uncertainty regarding what the real treatment effect will be. So I think there are some things you can easily address from either perspective. There's some things that are easier and more naturally addressed from a Bayesian perspective, but when you take that approach, [00:51:00] you need to have the necessary expertise to do it well.

'Cause all of the issues that define high quality study design still apply. And in fact, the FDA guidance, which was mentioned, I think is an important effort of regulators to communicate what they see as a high quality approach to a Bayesian clinical trial, just as they for years have been communicating what they see as a high quality approach to a frequentist trial.

Emily: Let's go to this clinical question, and anybody could weigh in on this, but we discussed a little bit about how reporting a probability of benefit or harm for an intervention might be more aligned with our clinical reasoning.

what do we all think of that?

Josh: Yeah, I mean, absolutely. As I was alluding to earlier, it's intuitive, whether you're a clinician or not, what it means to say that there's a 98% probability treatment A is better than treatment B. But also I think, as you alluded to, Emily, and Roger as well, in our day-to-day clinical [00:52:00] work, we're actually applying Bayesian reasoning all the time.

So for example, in diagnostic tests, and the example that is often given is of pulmonary embolism. You have a patient you think may have a pulmonary embolism because of a number of symptoms and signs. Let's say you think the probability is high, and then you do a test.

We used to do these nuclear medicine VQ scans that gave very unclear answers, but probabilistic ones. So the VQ scan would be reported as low probability of PE. That's an example of where you have to combine the prior and the posterior to come up with what you then, putting it all together as a clinician, think the result is.

So I mean, people who don't work in medicine might think that all the tests we do give a black and white answer, but they don't. They just add another piece of information that we can combine with what else we know about the patient to come up with a posterior probability.

And Josh, just to add to that, I think I'd like to point out that even tests that we think have [00:53:00] incredibly good discriminatory characteristics can be subject to the same thing.

Roger: So I'm thinking of serologic testing for HIV infection: in fact, even that test has a non-trivial false positive rate if you apply it to a very, very low prevalence population. And sometimes these tests that in normal clinical use we think of as being highly accurate break down if we use them in the wrong setting.
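
The low-prevalence effect follows directly from Bayes' theorem and can be sketched in a few lines. The sensitivity, specificity and prevalence values below are hypothetical round numbers chosen for illustration, not the characteristics of any real HIV assay.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    sens, spec = 0.999, 0.995  # hypothetical test characteristics
    for prev in (0.10, 0.0001):
        ppv = positive_predictive_value(sens, spec, prev)
        print(f"prevalence {prev:.4%}: P(infected | positive) = {ppv:.3f}")
```

With the same test, a positive result is very likely a true positive when one in ten people tested is infected, but most positives are false when only one in ten thousand is, because the few true cases are outnumbered by false positives from the very large uninfected group.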

I just wanted to point out as well, Josh, on that example of the diagnostic testing requiring Bayesian thinking: some people would argue actually that that process is frequentist thinking. The reason being that although Bayes' theorem is actually fundamentally underlying what you're doing there, Bayes' theorem is actually as much a theorem of frequentist probability as it is of Bayesian probability.

Ian: It's just that in frequentist settings, it only applies [00:54:00] to the outcomes of random phenomena, of which the outcome of a diagnostic test is a random phenomenon. And so that is actually a frequentist probability calculation using Bayes' theorem. The great power of Bayes' theorem, though, comes when you apply it to things that aren't random, such as your belief about a treatment effect.

And that's where it becomes a really powerful thing in Bayesian statistics, and not so much in frequentist statistics. But it is important to remember that that Bayesian thinking of Bayes' theorem is actually applicable in both contexts, just a more limited context in frequentist statistics.

Emily: I see Bayesian trials being reanalyzed with a frequentist analysis and frequentist trials being reanalyzed with a Bayesian analysis. Now, all of this seems to be happening quite a bit after trials have concluded. So is there any talk of just reporting both?

So I think that many journals are asking investigators to be consistent, or internally [00:55:00] consistent, in their statistical philosophy. So that if the primary analysis is Bayesian, the secondary analysis, the decision rules, the stopping rules, those should all be Bayesian as well. And similarly, if you take a frequentist approach, these should be consistent.

Roger: And I agree with that, because I think when they are mixed, it becomes a little bit more confusing and you wonder sort of how that happened, why it happened, when those decisions were made. I think that's a teeny bit distinct from the comment you made about trials being reported that were commonly negative per their pre-specified design that are then being reanalyzed with a Bayesian perspective to show that, no, no, they were really positive.

And this has certainly occurred in clinical care. It's occurred in neonatology, occurred in a variety of fields. I almost think of that re-analysis strategy as a bit of a fad right now. In my view, what is happening, and this is consistent with the points Ian made earlier, [00:56:00] is they're really using this as a way of lowering the threshold for defining the trial as positive.

But by framing it as a movement from a frequentist to a Bayesian perspective, it gives the illusion that the Bayesian analysis creates new information. And as Ian pointed out, there is no free lunch. The analysis does not create information, but it provides a way that you can lower your threshold because you didn't meet the threshold initially.

So I tend to view those reanalyses with quite a bit of skepticism. If instead they're conducted with a purely descriptive approach, so you simply report the posterior probability with the associated uncertainty, that can be informative in some settings and may be used in clinical decision making.

But I think as a strategy to reframe an initially negative trial as positive, it's ill-advised.

Yeah. I'm thinking of something I did recently, which was, you know, we looked at, um, [00:57:00] a paper. It was a study of corticosteroids for PCP in non-HIV patients. It was a frequentist analysis that just missed statistical significance on the primary outcome, which was kind of short-term follow-up.

Emily: And then on the long-term follow-up, it appeared to reduce mortality. And we thought, huh, that probably means that steroids reduce mortality. As you said, Ian, when we looked at the totality of the evidence and we interpreted it even with the frequentist analysis, if you kind of ignored that P value, it looked very much like with the information that we had, knowing that the intervention works in patients with HIV. So we knew clinically that this intervention probably worked in a certain population applied to another. It looked like in the secondary outcome it reduced mortality. We said we're gonna analyze this in a Bayesian way and show what the probability of mortality benefit is.

And so that was a re-analysis of a frequentist trial using a Bayesian approach in order to, I guess, make use of that option of [00:58:00] using, you know, prior information, which was that the intervention had worked in another population, in order to convince people not to just simply interpret this trial dichotomously as positive or negative.

Roger: But I think there's at least three different concepts that might be important to separate out. So one is that the initial analysis that just barely missed the p value criterion was not evidence against the treatment. It was simply a failure to demonstrate that the treatment was beneficial, and a confidence interval based analysis of those data would've shown that those data were consistent with a substantial benefit.

So the evidence was already there in the data. The Bayesian approach doesn't give you new information unless there's important prior information that you can bring in structurally into the analysis. So if you did a Bayesian analysis and you brought [00:59:00] in the information that steroids improve mortality in the setting of HIV infection in that form of pneumonia, then you would've actually been bringing more information to bear on the question, and you might get a different result. My concern is that that is only being done because the first result didn't quite meet its criteria. So the information you're bringing in, you're motivated by trying to get a particular result, as opposed to, in an agnostic way, trying to make sure your prior represents the totality of the information and doing that as true prior information, meaning information that you obligate yourself to use regardless of what the subsequent data are that you're gonna combine with that prior information.

You don't get to look at your data and then decide whether you need a prior.

Emily: Yeah, I think one challenge is there's so much data for clinicians to digest that what ends up happening is the clinician reads the bottom line of the study, which was what was the P value? And [01:00:00] that's often the take home.

And so yeah, you don't always get the option to have those nuanced discussions. Like, sometimes you have a journal club, which is great, and then you can talk about those subtleties. I almost see it as a second chance. As you said, it shouldn't be done that way. In reality, you should be able to look at the original trial and, as you said, Ian, earlier in the episode, use all of your clinical information, look at the confidence interval, look at the secondary outcome, and arrive at a conclusion that our confidence that this treatment works is pretty high.

but I think what's happening nowadays is it just gets boiled down to that last line of the abstract.

Ian, Roger, is there anything you wanna emphasize from today's discussion?

Ian: I guess the key message that I want to emphasize as a take-home message to the audience is really: don't view Bayesian statistics as a way of addressing problems with frequentist statistics.

Some people sort of advocate for the Bayesian approach along these lines, and [01:01:00] I would suggest that we should take those arguments with a grain of salt. Particularly, don't view frequentist statistics as inflexible and fixed versus Bayesian statistics as flexible and adaptive. That's not the right distinction between the two frameworks.

I think instead, Bayesian statistics should be viewed as a different way of looking at evidence accumulation. It's a framework in which repeatedly updating our beliefs about something that is uncertain, like a treatment effect, is the primary focus, versus frequentist statistics, where the primary focus is about things like false positive control and ensuring that our conclusions are correct relative to what the true state of nature is.

And I'll end with sort of a personal view, and I would hasten to say that this is not what everybody's view would be, but just the way I process things is that I actually have a lot of sympathy for using Bayesian methods early on [01:02:00] in the development process, in the early phases of clinical trials, where we are really trying to convince ourselves that it's worth keeping on going, that we're on a winner.

That's fundamentally a personal belief situation. Whereas I see frequentist methods as being, uh, perhaps more appropriate at the later stages, the confirmatory stages, where the data really need to stand on their own two feet independently of anything that we might infuse into the analysis in terms of our personal views about prior data.

That's my personal view about where the two sort of sit. Other people have other views.

What I would say is there's a few sort of take-home points to emphasize. So one is that the quality of the study design, the randomization, measurement of confounding variables, follow-up, those sorts of things, those are paramount regardless of the approach you use, and they're often more important than the selection of the statistical approach. The second point that I would make is [01:03:00] that the Bayesian approach can facilitate certain adaptive features.

Roger: For example, the adaptive features that are built into the SNAP trial and other adaptive platform trials, in a way that facilitates the conduct of trials that are more efficient and can answer more questions more quickly with greater cost effectiveness. And whereas those same things probably could be done from a frequentist perspective, it's much harder and it generally is not done to the same degree.

If one is going to take a Bayesian approach, you need to make sure your team has the appropriate expertise on the design side, the analysis side, and on the monitoring side, such as membership on your data safety monitoring board. Then lastly, with respect to the place of the Bayesian approach in the continuum of drug or treatment strategy development, I think the FDA guidance and recent experience across a wide number of trials shows that Bayesian approaches really are [01:04:00] accepted, even in late stage development for regulatory approval.

But there has to be a reason for doing them. You don't do it just because you can. There has to be some problem or challenge associated with what you're trying to achieve that is best addressed from a Bayesian perspective, so the rationale for taking that approach is clear to everyone.

Emily: Thank you both. Thank you so much, Ian and Roger, and to my co-host Josh, for this incredibly interesting discussion. We learned that Bayesians and frequentists can co-exist in harmony, which is lovely. Thank you for listening to Communicable, the CMI Comms podcast. This episode was hosted by Josh Davis and me, Emily McDonald, editors at CMI Comms, ESCMID's open access journal.

It was edited by Katie Hostettler.

Theme music was composed and conducted by Joseph McDade. The executive producer of Communicable is Angela Huttner. Any published literature that we discussed today can be found in the show notes. You can subscribe to Communicable wherever you get your podcasts, or [01:05:00] you can find it on ESCMID's website for the CMI Comms journal.

Thanks for listening and helping CMI Comms and ESCMID move the conversation in ID and clinical microbiology further along.
