Each week, Health Affairs' Rob Lott brings you in-depth conversations with leading researchers and influencers shaping the big ideas in health policy and the health care industry.
A Health Podyssey goes beyond the pages of the health policy journal Health Affairs to tell stories behind the research and share policy implications. Learn how academics and economists frame their research questions and journey to the intersection of health, health care, and policy. Health policy nerds rejoice! This podcast is for you.
Rob Lott:Medicare Advantage is a recurring topic on this podcast and in the pages of Health Affairs. After all, it's a growing segment of our healthcare system and today covers a majority of Medicare beneficiaries. One of the fundamental principles underpinning the program is the theory that letting consumers choose among a number of private plan options can drive efficiency and innovation. Of course, in order for that kind of competition to work well, consumers need reliable, consistent, and accessible information about those plans.
Rob Lott:Enter the Medicare Advantage Star Rating Program, which measures plan quality, incentivizes improvement through public reporting, and sweetens the pot with quality bonus payments. The program has been around since 2007, and a lot of people rely on it when choosing their plans. And yet it's not clear how accurate the star rating system is. It's not clear how closely a high rating is tied to better health outcomes.
Rob Lott:The picture we get instead is much murkier than the one the system's designers promised, and that's a potential problem for consumers. It's a financial risk for plans that want to earn millions in those quality bonus payments, and it's the subject of our podcast today. I'm here with Doctor Andrew Anderson, assistant professor in the Department of Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. Together with coauthors, he has a new article in the April issue of Health Affairs, and its title is also one of its main findings.
Rob Lott:Quote, Medicare Advantage star rating quality gains were concentrated in a narrow set of clinical and medication measures from 2015 to 2025. This is really eye-opening stuff, and I cannot wait to dig into their findings. Doctor Andrew Anderson, welcome to the podcast.
Andrew Anderson:Thanks for inviting me. Happy to be here.
Rob Lott:Well, let's start with some background, if you will. Maybe give us just a brief tutorial on the full scope of the measures that can potentially contribute to a Medicare Advantage contract's star rating?
Andrew Anderson:Sure. So the Medicare Advantage star ratings program, as you started to say, is, I think, one of the most important quality incentive programs in the US. Every Medicare Advantage contract gets a star rating between one and five based on performance across a wide set of quality measures, covering everything from clinical outcomes, medication adherence, and patient experience to different plan operations measures. One of the things that can be confusing about how these plans are constructed is that there are large insurers like United Healthcare, Aetna, and Cigna, and people often call these health plans, which they are. But within those companies or carriers, there are also multiple contracts that they have with CMS.
Andrew Anderson:And within each contract, there can be multiple benefits or plans across multiple geographic areas. And these star ratings are assigned at the contract level. And most beneficiaries see the star rating of the contract when they go to select a plan, say on medicare.gov. So the ratings have, as you mentioned, real financial consequences. If a contract reaches four stars or higher, they qualify for bonus payments and CMS increases the payments that they made to the plans.
Andrew Anderson:The contracts that receive five stars get even higher bonuses and they also get special advantages like the ability to enroll beneficiaries year round. Those benefits affect plan revenue and the benefits the plans can offer often increase ultimately, which makes them more competitive in the market. So the star ratings program isn't just a report card. It's really about value based payment system intended to influence how plans organize and manage care for Medicare beneficiaries. The program also evaluates the performance based on a select group of measures often in collaboration with health plans, patient advocates and health systems.
Andrew Anderson:The program is managed by CMS. So the idea is that these measures capture aspects of care or patient experience that plans have some ability to influence, but that assumption varies across measures and some measures are more directly within a plan's locus of control. For example, medication adherence measures plans can actually use things like pharmacy outreach or refill reminders or care management programs. Other measures, they depend more heavily on what providers within the network can do in clinical settings like readmissions, for instance. So in practice, the measures that CMS chooses shape how organizations respond.
Andrew Anderson:Plans have to decide where they can realistically improve performance, where the incentives justify the effort, how to allocate limited resources across many different measures. In other words, measurement, it creates incentives and incentive shaped behavior. And our study, our question naturally flowed from that. When we measure and reward these specific outcomes or processes of care or patient experience, which measures actually improve over time. We looked over the last decade.
Rob Lott:Got it. And that's a great segue, talking about incentives driving the actions of various plans. And of course, that's built on this general theory that when you measure and reward something specific, it drives people to improve their performance on that specific measure. Yet you explain in your paper that, quote, star ratings are calculated from prior year data, creating a lag between action and financial reward. And this is a fact that potentially undermines the improvement incentive around any given measure, I think.
Rob Lott:Tell me if I'm wrong there, but prior to your study, can you say a little more about what we knew about the impact of that lag or any other factors that may have sort of diminished the program's effectiveness?
Andrew Anderson:So that's right. Star ratings are based on performance from the previous year and for some measures it's even multiple years prior to the reporting year. So there's always some delay built into the system. For instance, if a plan improves a medication adherence measure this year, that improvement won't show up in their star rating until the following year. And the bonus payments tied to that rating typically affect plan payments the year after that.
Andrew Anderson:So there can be a two year gap actually between when a plan invests in improving performance on a measure and when it actually sees the financial reward. In theory, that delay could weaken the incentive to invest in improvement because organizations usually respond more strongly to rewards that are immediate or and and predictable. The lag, though, is only one piece. The effectiveness of the program also depends on whether the measures themselves are capable of responding to incentives that CMS creates. So in the measurement literature, there's a concept called measure responsiveness.
Andrew Anderson:For a quality program like this to work, the measures have to be able to change when organizations take actions to improve performance. Another factor is how costly or complex is it for plans to improve performance on a given measure. Even if a plan can influence the measure, it still has to decide whether the potential improvement in star ratings is actually worth the investment required. And it's important to remember that the intervention here is not just managing and organizing per se. It's the incentive.
Andrew Anderson:The intervention is coming from the star ratings program itself. It's the combination of bonus payments, public reporting and comparison or peer comparison across plans. We know that these ratings do sometimes influence the extent to which enrollees enroll in a particular plan if it's two stars versus five stars. So that's how we got to this question about when those incentives are in place, which measures are actually improving over time.
Rob Lott:Got it. Okay. Well, let's dig into your analysis, which examines the specific measures that drove performance changes, which measures showed little or no progress, and whether initial performance shaped subsequent gains in those star ratings. What were some of your top line findings?
Andrew Anderson:So the headline findings were pretty surprising. I think most of the improvement in the star ratings over the past decade was concentrated in a relatively small number of measures. In particular, we saw improvements in medication adherence related measures, things like adherence to medications for diabetes, hypertension, and cholesterol, as well as a few clinical process measures. At the same time, and this is another important finding, many other measures in the program didn't show much improvement over the same period. In some instances, they regressed.
Andrew Anderson:So even though the overall star ratings have increased over time, improvement wasn't evenly distributed across the measurement system. One interpretation is that, you know, plans focus their improvement on measures that are more responsive to intervention. Medication adherence is a good example, I think, even though it's somewhat controversial and some people have called for removing it from the program. But plans can do things like implement pharmacy outreach, like I said, refill reminders, medication synchronization and other programs that directly affect adherence rates. Other measures depend more on provider behavior in clinical settings, as I said, or on factors outside the plan's direct control, which makes them harder to improve.
Andrew Anderson:So the broader takeaway I would say is that not all measures respond equally to the same incentive system. One thought is that every measure, not every measure has the same room for improvement. And this is something that CMS discusses in places like, you know, the National Committee for Quality Assurance and, other measure developers and stewards, that some measures already start from very high levels of performance, so they may not move very much over time when plans respond to incentives, but it doesn't necessarily mean that those measures aren't valuable. In some cases, the incentives might actually be helping to maintain high performance even if they're not producing large improvements. And the challenge I would think for CMS is that there we really rarely know a measure's true ceiling.
Andrew Anderson:So what looks like a plateau may reflect clinical limits, measurement limitations, or simply the fact that we haven't really identified interventions to move that measure further. It's a complicated problem.
Rob Lott:Great. Well, complicated problems are what we're all about at Health Affairs, And I want to ask you a little more about putting, some of those complications in context. But first, let's take a break. And we're back. I'm here with Doctor.
Rob Lott:Andrew Anderson talking about Medicare Advantage star rating system and, the quality gains, they found to be concentrated in a narrow set of, clinical and medication measures under that program. Now, this program is not the only quality initiative out there, obviously. Payers and systems writ large have to navigate a whole slew of various quality improvement requirements and measures. And I'm wondering if you can say a little bit about that overlap. Has it been a factor in determining where plans invest their resources and attention?
Rob Lott:How big is the star ratings program as sort of a piece of that larger puzzle?
Andrew Anderson:Yeah, I think it's really important to think about all of these findings in the context of the broader system. This is one of many quality initiatives happening at the federal level and at CMS. So it's important to know that many of the same outcomes and processes that are part of these federal quality initiatives, like the hospital readmissions reduction program, value based purchasing programs, and others, and even accountable care organizations, have measures very similar to what's in the star ratings program. So providers and health systems may already be investing in improving these measures because they affect performance in multiple CMS programs. So when the same measures appear across multiple programs, the incentives can reinforce each other.
Andrew Anderson:Improvement in one measure can improve performance across several programs simultaneously. So that kind of alignment can accelerate improvement in certain areas, but it all can also mean that improvement becomes more concentrated in measures that are most visible across the broader measurement environment. Another implication is that it can be difficult to attribute improvement to any single program, and this is a finding that came out of a report that Penn and Penn LDI put together an expert panel a couple of years ago, which I was somewhat a part of, which came came up with a conclusion that it was very difficult to figure out which programs are contributing to overall improvement, cost reduction, quality improvement, just because there's so many overlapping programs. And when you consider the scale of the incentives, for instance, in the Medicare Advantage star ratings program specifically, that involves over $12,000,000,000 in bonus payments each year now or as of the latest year in 2025, it becomes difficult to entangle which incentives are actually driving improvement and if the money is going in the right direction. Are we overspending or underspending on these incentives?
Andrew Anderson:So ultimately, in practice, I'd say that organizations are responding to the entire measurement and incentive environment rather than any single program in isolation. Even though this program in particular is for the MA plans, the providers that are under those plans again are there's so many there's so much happening.
Rob Lott:Got it. I'm curious if you've heard or what sort of we've heard from plan administrators about this dynamic. If you held up your paper for a plan administrator and they sort of read the abstract, would they say, you know, that rings true? That's sort of consistent with their approach to evaluating where to focus their attention? Or do you think that the sort of plan by plan basis might be a little more idiosyncratic?
Andrew Anderson:Yeah, I think it's a complicated story, as it always is, so it depends. But overall, I would assume that plans are responding strategically based on the resources that they have. So there are over 44 measures in the program, and each measure covers a particular important aspect of care. And depending on the population that they serve and their actual performance on a measure, they may have different capacity to make improvements in some areas. So when you have limited resources as a health plan, you have to decide how best to spend those resources. And if you try to do everything, you may end up not doing much of anything.
Andrew Anderson:So think it comes down to that plan specific strategy and population and what they think is the best use of their resources in order to take advantage of the incentives in this program.
Rob Lott:There have been a lot of recent proposals for reform to the star rating program. And I'm curious, what would you want, the people advancing these reforms to take away from your findings in this paper?
Andrew Anderson:There are lots of conversations going on right now about Medicare Advantage modernization and star ratings is included in that, especially since it is one of the primary tools that CMS has to try to influence the maintenance and improvement of quality relative to their other goals, which is around care management and reducing costs of care. And so I think though one of the pieces that come out of this paper is thinking more deeply about the criteria that CMS uses for selecting measures and deciding which measures to phase in and out. There are obvious measures like what criteria, what's important to patients and families, what's a valid measure, reliable measure, one that can be influenced by health plans if their actions can actually take they can do something about that aspect of care or managed care. But in CMS, I would say they select measures based on what they believe to be actionable and attributable to plans. But in practice, some of those outcomes are much more directly within the plan's control than others, as I mentioned.
Andrew Anderson:I think one of the main issues is that if we find that a measure, for instance, has not been improving for a long time, how do we decide whether or not it is a ceiling effect or if it is a lack of the incentive isn't enough compared to what they could do in other measures. So I think also thinking through the level of difficulty that and the number of the amount of resources that are required for a health plan, given its locus of control, that should be a consideration in the selection of measures. I think the design of the program in choosing I guess right now they're thinking about the phase out of certain measures and whether or not the measure is actually acceptable to the health plans as well as the providers within those health plans networks. That's another important factor. And this does go through public comment and people are able to provide input, but also making sure that the decisions and the rationale is very clear, for instance, in the federal register for why those measures are included and why those are excluded.
Andrew Anderson:Sometimes it can be a little bit obscure. So I would say those are some of the ways that CMS could, and obviously the most important for the takeaway from this paper is the responsiveness issue, in that there should be some way of assessing responsiveness and acting on responsiveness in a systematized way.
Rob Lott:Great. Well, that's perhaps a great set of marching orders for researchers and policymakers going forward. Doctor Andrew Anderson, thank you so much for your work on this really interesting paper, and thanks for talking with us here today. I really enjoyed it.
Andrew Anderson:Thank you. Thanks for having me on.
Rob Lott:And to our listeners, you just heard us talking about the locus of control. Well, one of your opportunities for control here is to weigh in: share your feedback on this podcast, leave a review, share it with a friend, and subscribe. And, of course, tune in next week. Thanks, everyone.