How do product teams decide what to build and what not to? The Experimentation Edge is the podcast where product, growth, and engineering leaders share how A/B testing, feature flags, and experimentation drive real business outcomes — backed by named companies and real numbers. From DoorDash's 12,000 A/B tests a year to Atlassian's experimentation-led product win to UPS's $500M experimentation team, each episode goes deep with operators running experimentation programs at scale.
Hosted by Ashley Stirrup, CMO at GrowthBook and a 25-year executive in data and experimentation. For product managers, engineers, data scientists, and growth leaders at B2B tech companies who care about experimentation culture, statistical rigor, and shipping with confidence. No marketing speak. Just operators explaining what they shipped, what moved the needle, and how experimentation reshaped their teams.
Topics: A/B testing, experimentation, growth experimentation, product experimentation, tech experimentation, feature flags, experimentation culture, statistical significance, marketplace experimentation, conversion rate optimization, experimentation at scale.
The Experimentation Edge - Kevin Yang
===
Kevin Yang: [00:00:00] I think experimentation, and the industry as a whole, is gonna enter almost a golden era. With AI coming through, everybody is going to ship faster. But do you have the right infrastructure in place to measure the things that you're shipping out?
Ashley Stirrup: Welcome to the Experimentation Edge, where product managers, data scientists, and engineers talk about how they make smarter decisions. I'm Ashley Stirrup, the chief marketing officer for GrowthBook, and in each episode, I'll sit down with an executive to unpack how they use experimentation and A/B testing to make better decisions.
This show is sponsored by GrowthBook, the open source experimentation platform leader. Now let's jump in and get started with our next guest.
Hello, and welcome to today's episode. I'm happy to have Kevin Yang, executive director and head of experimentation at [00:01:00] JPMorgan Chase, with us.
Kevin Yang: Hi, Ashley. Thank you for inviting me. Excited to be here.
Ashley Stirrup: Yeah. Thank you. Me too. You've had an extensive career at JPMorgan Chase. Maybe you could tell us a little bit about your role there and what parts of the JPMorgan Chase business you work on.
Kevin Yang: Yeah, sure. So yes, I've been with Chase for a long time. It's, I think, 12 years, maybe going on 13, and for the last six years I have been leading experimentation across our digital platforms. The team's mission is basically helping the firm make better decisions, and we believe experimentation is one of the ways to really do that. My team is structured in two pillars. One pillar is the platform: the technology that empowers the product teams to self-serve on running experiments, to really scale that, and to do it without needing a data scientist on the team. The other side is the practice. This is where we embed our team members in strategic [00:02:00] initiatives, to make sure the teams running those initiatives have the right practice framework and the right culture to use experimentation well. With both of these things, that's how we make sure experimentation really scales and gets value for the firm.
Ashley Stirrup: Terrific. And what parts of JPMorgan Chase do you work on?
Kevin Yang: So I sit within what we call CCB, which is the consumer bank rather than the investment bank, and specifically in CCB Digital, so basically the entire digital ecosystem. We support all the different product teams in that space. We have about 100 or so product teams; some of them are customer-facing, some of them are not.
But yeah, many of them are running experiments with us.
Ashley Stirrup: So I would guess that's like banking and mortgages and loans and credit cards and all that kind of stuff, yeah?
Kevin Yang: Yeah, that's right. Credit cards, checking accounts, savings accounts: those are the things people usually think of. But we have a lot of different financial products for folks.
Ashley Stirrup: [00:03:00] Yeah. That's gotta be just an absolutely massive business with so many users.
Kevin Yang: Yeah. I think every single line of business could be its own company. So, very large.
Ashley Stirrup: If you just think about credit cards and the number of transactions your business processes, the numbers must be just staggering, like bigger than Walmart type of thing.
Kevin Yang: Yeah. I'm not sure about Walmart's numbers, but it is large, yeah.
Ashley Stirrup: Yeah. That's pretty amazing. And so both web and mobile too, I would assume.
Kevin Yang: Yes. Both web and mobile. And we also support some backend applications, even call centers and things like that.
Ashley Stirrup: Got it. Wow, that's really impressive. And so roughly how many experiments are you doing a year?
Kevin Yang: So right now we're running about 300 experiments a year, and we're continuing to grow. When we first started, I think the first year after we got the infrastructure set up was like eight experiments, and over time we started to run more and more [00:04:00] experiments.
We think there's still a lot of opportunity for this number to grow.
Ashley Stirrup: Yeah, I can imagine. And so what led Chase to start doing more in experimentation?
Kevin Yang: So experimentation, I think, is one way for leadership to understand the impact of their changes. And I think what was able to really get leadership to buy into the idea, which didn't really take long, was the idea that if you're not testing, you really risk not knowing what to double down on.
At the same time, you also don't know whether the things you're releasing are really driving the impact that you want. That kind of risk mitigation, as well as the optimization piece of it, which was very well understood, is what got leadership to buy into it.
Ashley Stirrup: Got it. And I think you started out more on the marketing side, and then it made its way into the product as well. Is that right?
Kevin Yang: Yeah, that's right. On the marketing side, experimentation was alive and well. A/B testing, that's always happening. But on the product side, [00:05:00] that's where things have been evolving more and more. When you build products, particularly digital products or digital user experiences, online experimentation is in a lot of ways easier, and you get a lot of data. But at the same time, it can also be hard to interpret. There are a lot of different signals you have to sift through.
Ashley Stirrup: Yeah. I love Chase's products. I'm a banking and credit card customer, and I just really love the app. So you've done great work.
Kevin Yang: One shameless plug: we recently won number one in J.D. Power for US banking apps, so that was a big celebration. I like to think experimentation has something to do with that.
Ashley Stirrup: Yeah, I'm not surprised. I know the app. I just find it very fluid. It does what I need it to do when I need to do it.
Kevin Yang: Oh, I'm glad to hear that.
Ashley Stirrup: Yeah. And you've been able to deliver a tremendous amount of value through experimentation too, is that right?
Kevin Yang: Yeah. One of the things [00:06:00] that we're really proud of is that over the years, our estimated value from experimentation, and innovation in general, is over a billion dollars. It's a number that we're proud of, and we think it will continue to grow.
And a lot of that is from the winners of the experiments, not even from the losers.
Ashley Stirrup: So that's not even counting the losers, so that might be another billion you avoided in losses right there.
Kevin Yang: Yeah. And if you think about it, the losers are probably where a lot of the value is really coming from.
Ashley Stirrup: Yeah. Yeah, no question. The whole point of experimentation is to be able to sift through all the features you ship to figure out which ones are winners and which ones aren't, and all the learning happens on the losing side. If you ship something and it wins, that's great. You ship it and you move on.
But if it loses, then you say, "Okay, why did it lose?" And that's where the real learning comes.
Kevin Yang: Yeah. And with something unexpected, make sure you have the right metrics in place so you're not stacking the deck in your own favor. I think that's also [00:07:00] something to be on the lookout for.
Ashley Stirrup: Yeah, you certainly hear stories about that, with selective choices of who sees variant A versus variant B, things like that.
Kevin Yang: Yeah.
Ashley Stirrup: So how do you make decisions at Chase? How do you use data to do that?
Kevin Yang: So at Chase, we try to look at experiences in a very comprehensive and balanced way. I should first say that, as far as metrics, there are different dimensions we try to cover to make sure the user experience is balanced. In addition to making sure the business makes money, we also wanna look at engagement, but not vanity engagement.
For example, a while back we were looking at a tile on the mobile app home screen that leads you into your credit score, which basically every banking app has. There was a debate over what we should do here: should we bring the score forward or [00:08:00] not? What you're worried about is that if you bring the score forward, maybe people don't click into that feature anymore, so maybe you should keep it hidden. But what we ended up doing was bringing it forward, and what we saw was repeat engagement. That's the good kind of engagement we wanna see. Besides that, there's also satisfaction we wanna look at over time, and retention is something else we focus on. And with multiple lines of business and shared services, we always wanna make sure it's not just "Did it work?" but "Did it work for everyone?" And if it didn't work for everyone, what is the right amount of trade-off?
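The "did it work for everyone?" question can be checked by computing lift per line of business as well as overall. The sketch below is a minimal illustration with hypothetical segment names and counts, not Chase's tooling; it compares point estimates only, whereas a real readout would also need a confidence interval for each segment.

```python
from dataclasses import dataclass


@dataclass
class SegmentResult:
    """Conversions and users per arm for one segment (hypothetical schema)."""
    control_conversions: int
    control_users: int
    treatment_conversions: int
    treatment_users: int

    def relative_lift(self) -> float:
        # Assumes a nonzero control conversion rate.
        control_rate = self.control_conversions / self.control_users
        treatment_rate = self.treatment_conversions / self.treatment_users
        return (treatment_rate - control_rate) / control_rate


def pooled(results: dict[str, SegmentResult]) -> SegmentResult:
    """Sum every segment into one overall result."""
    rs = list(results.values())
    return SegmentResult(
        sum(r.control_conversions for r in rs),
        sum(r.control_users for r in rs),
        sum(r.treatment_conversions for r in rs),
        sum(r.treatment_users for r in rs),
    )


def segments_that_lost(results: dict[str, SegmentResult]) -> list[str]:
    """Names of segments where the treatment's point estimate is below control."""
    return sorted(name for name, r in results.items() if r.relative_lift() < 0)
```

For example, `{"cards": SegmentResult(100, 1000, 110, 1000), "checking": SegmentResult(200, 1000, 195, 1000)}` shows a positive pooled lift while `segments_that_lost` still flags `"checking"`, which is exactly the trade-off conversation Kevin describes.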
Ashley Stirrup: Yeah. Boy, I can imagine it's gotta take some rigor to do that, 'cause you've got different apps with different experiences. You've got different users who are looking for different experiences themselves, and engagement might look good in one place in the app and actually be a real problem somewhere [00:09:00] else.
Kevin Yang: Yeah. Setting the decision framework and looking at different metrics is tough, especially when it comes to engagement.
Ashley Stirrup: Yeah. Yeah, that makes a lot of sense. So can you tell us a little bit more about how you run experimentation at Chase?
Kevin Yang: Yeah. We have a platform where teams first come and register their experiment. What we focus on there is making sure we capture the hypothesis: what they're trying to do and what they're trying to measure. That goes into basically our knowledge repository.
Then within the platform, we help them come up with the right design, and they go through it. We also try to guide them on the success metrics and goal metrics, and that's how decisions get made.
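As a rough illustration of what registering an experiment might capture, here is a sketch of a registration record with a simple launch-readiness check. The class name, fields, and checks are assumptions made for illustration; they are not the schema of Chase's platform.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRegistration:
    """Hypothetical record captured when a team registers an experiment."""
    name: str
    hypothesis: str
    success_metrics: list[str]
    guardrail_metrics: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Reasons this registration is not ready to launch (empty if ready)."""
        issues = []
        if not self.hypothesis.strip():
            issues.append("missing hypothesis")
        if not self.success_metrics:
            issues.append("no success metric defined")
        # A metric can't be both the thing you optimize and the thing you protect.
        overlap = set(self.success_metrics) & set(self.guardrail_metrics)
        if overlap:
            issues.append(f"metrics listed as both success and guardrail: {sorted(overlap)}")
        return issues
```

A hypothetical registration such as `ExperimentRegistration("credit-score-tile", "Showing the score up front increases repeat visits", ["repeat_engagement"], ["app_rating"])` passes with no problems, while one with a blank hypothesis is held back before any data is collected.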
Ashley Stirrup: Got it. And are you using machine learning?
Kevin Yang: When you say machine learning, we're doing a lot of model testing right now. With model testing, [00:10:00] that's an area where we may see gains in one outcome but not another, and that's when we need to iterate.
And with models, and maybe page redesigns, those are big changes that sometimes need to be iterated on a lot more.
Ashley Stirrup: Got it. And are those models more for figuring out who to offer what to, or are you talking about AI models and in-app experiences?
Kevin Yang: It's personalization models. So it's trying to figure out how to surface the right content for the right user at the right time. For AI, we're looking into things right now.
Ashley Stirrup: Yeah. And I would imagine that some of your teams are more forward-leaning and really get experimentation, and other people see a loser and really want it to be a winner. Maybe they question the platform?
Kevin Yang: That can happen. I think it's really important to make sure that you [00:11:00] align on the metrics and on the decision criteria beforehand. Not every team will do that, and you just try to instill the right culture. If you don't have the right decision frameworks in place and later on you see the results, you may start to have confirmation bias: you start looking for evidence to support what you wanted to do, or call into question the way you look at a metric.
It's important to have that kind of organizational alignment on how to measure.
Ashley Stirrup: Yeah. I would imagine getting alignment on the hypothesis definition, the key metrics, and your decision framework is really important. And then also showing people that just 'cause a loser's a loser doesn't mean there isn't room to iterate and find a way to make that feature a winner.
Kevin Yang: Yes, that's really important. One of the things we preach when we work with teams is to plan for failure. I think a lot of [00:12:00] times that type of bias comes in because you never thought you would lose, so when it happens, you don't really know what to do. But if you have a playbook and you've already planned for it, losing is just the trigger for the next thing you had already planned, and it doesn't feel like a loss.
It's just a little bit of a mental trick: figuring out what we are trying to accomplish, how we are going to measure that we've accomplished it, and, if it doesn't happen, what that means about our assumptions. Is our assumption wrong, or was it just a difference in execution?
Did the execution need to be refined, or should we try a different approach?
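The "plan for failure" idea can be made concrete by writing down the decision rule, and the follow-up for each outcome, before launch. This is a minimal sketch assuming the result arrives as a confidence interval on relative lift; the `Outcome` labels, the `classify` thresholds, and the `PLAYBOOK` actions are hypothetical illustrations, not a Chase framework.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    FLAT = "flat"
    LOSS = "loss"


def classify(lift_low: float, lift_high: float, min_effect: float) -> Outcome:
    """Classify a result from the confidence interval on relative lift.

    WIN if the whole interval clears min_effect, LOSS if the whole interval is
    below zero, otherwise FLAT (inconclusive, or positive but too small to matter).
    """
    if lift_low >= min_effect:
        return Outcome.WIN
    if lift_high < 0:
        return Outcome.LOSS
    return Outcome.FLAT


# Next steps agreed before launch, so a loss triggers a plan rather than a debate.
PLAYBOOK = {
    Outcome.WIN: "ship to everyone",
    Outcome.FLAT: "refine the execution and rerun",
    Outcome.LOSS: "revisit the assumption behind the hypothesis",
}
```

Because `min_effect` and the playbook are fixed at registration time, nobody can move the goalposts after seeing the numbers, which is the confirmation-bias trap described above.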
Ashley Stirrup: Yeah. Makes total sense. There are lots and lots of examples where the V1 of something that got shipped wasn't a winner. Even the iPod: I was listening to a podcast the other day about the iPod, and it wasn't till version three [00:13:00] that it was truly a winner. If people think any new feature they build is immediately gonna be a winner, that's probably not realistic.
So helping set that expectation would go a long way.
Kevin Yang: Yeah. And experimentation is one way for you to really see the impact. People believe that they can monitor whether things are going wrong or right without having a control group. I think that's one of the biggest myths people have, and it's the first myth I try to bust when I do an overview of experimentation and try to get people to understand the value of having a control group.
Ashley Stirrup: Yeah, that makes sense. I think you have a little example that you share with your teams to help educate them on experimentation.
Kevin Yang: Yeah, this is one of the exercises that I do when I run this training. I think it's a fun exercise, and I think it'll be a fun one to do on the show. So I'll pull that up, and maybe Ashley will be the one to try to go through this, and we'll see if you [00:14:00] can get it right.
Ashley Stirrup: Sounds good.
Kevin Yang: So far, every time I've tried this, I don't think anybody has ever gotten it right. So we'll see.
Ashley Stirrup: Okay, that's a high bar. I'm gonna be humble.
Kevin Yang: So I hope you can see this.
Here, basically, this is a replica of what I do. This is an app completion rate, something really important to us, and a small relative change here is worth millions to us. What I usually do with the audience is say, okay, I'm gonna show you this time series of how the app completion rate has gone over the past 90 days.
A change happened here, a quite significant change, and it's worth millions. Can you guess where it happened? I'll give you a chance to look at this and pick a spot, and just roughly tell me.
Ashley Stirrup: Yeah.
Boy, does that look like a lot of the marketing data I look at. It's like up on Monday, Tuesday, down on Saturday, Sunday.
[00:15:00] You can tell there's really no good spot in here. If I had to guess, I would guess it was somewhere around the end of January.
Kevin Yang: Oh, end of January. Let's see if you got it. It happened right there. You're very close.
Ashley Stirrup: I was close. Yeah.
Kevin Yang: Very close. But there's really no good way to tell if you're just looking at it. A lot of times people will pick the spikes or the drops. And this is simulated data, not real data.
It looks like there's some noise reduction, but overall it's really just variance moving around.
So it's very difficult to figure out where changes occurred. But if we ran an experiment, you'd have a second line to compare against.
The orange line here is your control, and the shaded area is your impact. Having that one extra line is what lets you really figure out the impact of your change, because most of the time the stuff you're releasing to [00:16:00] market is not gonna create huge spikes that you can observe, especially with seasonality and everything.
So having a control group is super important.
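Kevin's exercise can be approximated with a small simulation. This is a sketch under made-up assumptions (a 0.60 baseline rate, a weekly cycle, a slow downward drift, shared day-level shocks, and a simulated 0.01 lift starting on a hypothetical day 60), not Chase's data or method. It also simplifies a real A/B test, which randomizes users rather than producing two daily series.

```python
import math
import random

CHANGE_DAY = 60  # hypothetical day the change ships, out of 90 simulated days


def simulate(days=90, change_day=CHANGE_DAY, lift=0.01, seed=7):
    """Simulate daily completion rates for a control arm and a treatment arm.

    Both arms share one baseline: a weekly cycle, a slow downward drift, and a
    day-level shock that hits everyone that day. Only the treatment arm gets
    the change, from change_day onward. Every number here is made up.
    """
    rng = random.Random(seed)
    control, treatment = [], []
    for day in range(days):
        weekly = 0.03 * math.sin(2 * math.pi * day / 7)
        drift = -0.0005 * day
        shock = rng.gauss(0, 0.01)  # shared by both arms on this day
        base = 0.60 + weekly + drift + shock
        effect = lift if day >= change_day else 0.0
        control.append(base + rng.gauss(0, 0.002))
        treatment.append(base + effect + rng.gauss(0, 0.002))
    return control, treatment


def mean(xs):
    return sum(xs) / len(xs)


control, treatment = simulate()

# Naive read: the treatment series after the change vs. before it.
naive = mean(treatment[CHANGE_DAY:]) - mean(treatment[:CHANGE_DAY])

# Controlled read: treatment minus control over the same post-change days,
# which cancels the weekly cycle, the drift, and the shared daily shocks.
controlled = mean([t - c for t, c in zip(treatment[CHANGE_DAY:], control[CHANGE_DAY:])])
```

With these made-up parameters, the naive before/after read comes out negative, mostly because of the drift, even though the change helped; the controlled read lands close to the simulated 0.01 lift.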
Ashley Stirrup: Yeah. That's a really great example. It really highlights that small changes can have a huge impact, especially on a business at the scale of Chase. Even a 1% change at that size is a huge one, and very hard to see with the naked eye.
Kevin Yang: Yeah. And those things are hard to move a lot of the time. When they do move at a large scale, it means a lot for the business.
Ashley Stirrup: Yeah. And it just shows that if you're a business and you're not doing those kinds of things and you're making lots of changes, there's just no way you're gonna be able to tell, with the day-to-day noise, why things are going up, and which of the 10 features you shipped moved it up a little bit and which moved it down.
So that's a really powerful example.
Kevin Yang: Yeah, and once you start to measure everything, one, you learn a lot. [00:17:00] Another thing is you start to get a comprehensive view across your entire portfolio. You start to look at things at a higher level now that you measure everything, and that really helps the organization move in the right direction.
Ashley Stirrup: Yeah. And how do you share learnings within a team or even across teams?
Kevin Yang: We do have forums, and within the platform, the information is there for people to go get. We have champions across different teams who hold their own experimentation forums. My own broader org also hosts experimentation forums. We actually host a couple of them: one for a more executive-focused audience, another for a more general audience, and one on the analytics side that is more analytics-focused. That's how we share these learnings, approaches, and practices across the firm. Last year, I also sent an [00:18:00] "experimentation wrapped" newsletter out to the firm covering some of the things we've seen across teams.
That was just a fun thing to send.
Ashley Stirrup: I love that. That's a great idea. We should do that.
Kevin Yang: Especially with the help of AI nowadays, it's much, much easier to get something done that way.
Ashley Stirrup: Yeah, that's right. That's an idea that's come up a few times in the past, and I've always loved it. It's a great way to share things, especially with experimentation, 'cause it can be very visual and the insights can be really powerful. So putting them in something like that as a year-end summary can be a great way of raising awareness of all the learnings from the year.
Kevin Yang: Yeah, that's right.
Ashley Stirrup: And so every team's a little different. I think you said that the travel business is a little different as well. Is that true?
Kevin Yang: Yeah. Even though we're a bank with financial products, we also have a really big travel business. With this travel business, the metrics are different. [00:19:00] You're not so much looking at finances.
It's a travel business, so you're looking for people to book travel, what kind of inventory they're getting, and the right strategy for rewards and redemption. Those are the kinds of things that are important to optimize for the travel business. That's why it's interesting to be in my seat: I get to work with so many different lines of business and get different perspectives from different products.
Ashley Stirrup: Yeah, you can certainly imagine the use cases would be very different for somebody who's trying to check their latest credit card statement versus someone who's doing research on going on a vacation, who might come back four or five times and think about it a bit in between. So how you measure the impact you're having would need to be very different.
Kevin Yang: Also how you're surfacing the right inventory there to the user.
Ashley Stirrup: Yeah, makes total sense. It's a very different browsing type of experience: exploration [00:20:00] versus "where's that bill from?"
Kevin Yang: Core banking, that's right. Speaking of core banking, as far as engagement goes, I sometimes say it's one of those things that's easy to measure and hard to interpret. For us, if you spend more time on the app, that may not be a good thing, because we ultimately want to build trust.
We want you to be able to complete your core banking tasks easily. For a payment flow, maybe we make it simpler for you to send a payment. But at the same time, we might also introduce speed bumps to prevent fraud. So it's hard. You need to balance, and check whether what you're doing is right for the customer and is what we actually want to happen.
Ashley Stirrup: Yeah, for sure. That makes a lot of sense. I make a lot of Zelle payments in my Chase app, and you don't want me doing it too fast and sending money to the wrong person. That's a good way to create unhappy people.
Kevin Yang: Yeah, especially [00:21:00] with Zelle. There are people trying to trick you into sending money.
Ashley Stirrup: That's a great point too. I hadn't even thought about that.
Kevin Yang: Yeah, that's one of the speed bumps I was talking about that we had considered introducing to prevent people from getting tricked.
Ashley Stirrup: That makes a lot of sense. So how do you see experimentation evolving at Chase going forward?
Kevin Yang: I think experimentation, and the industry as a whole, is gonna enter almost a golden era.
With AI coming through, everybody is going to ship faster. Whether you have the right infrastructure in place to measure the things you're shipping out is super important, and if you don't measure them right, your mistakes are going to compound.
So I think the sheer fact that more things are gonna get shipped is going to increase the need for more experiments. And you start to get into a world where things are [00:22:00] non-deterministic. I think that's another area where experimentation really comes in, and we need to figure out how to measure it.
How do you measure something non-deterministic? That's something I'm looking into, but I don't have an answer for yet.
Ashley Stirrup: Yeah, that's a super interesting topic. One of our customers, Khan Academy, has an AI tutor, and the metric they wanted to optimize was cognitive engagement. Is the student just trying to get the answer out of the AI tutor so they can finish the class as quickly as possible? Or are they asking questions that show they're trying to learn and understand the problem better?
And they were able to show that as they optimized, playing around with the prompts, different models, and all that, they increased cognitive engagement by 6%. Which, back to your point, is super hard to measure. So they actually did a lot of training and labeling on what's a good, highly engaged question versus one that's not.
And I think that's a [00:23:00] super interesting aspect of LLM-powered apps. Some of it is very easy to measure. Did they complete the form faster? That's an easy one. But was that a good chat experience? Was that a good customer service experience? Those are much harder to answer.
Did they just leave mad and decide they're gonna call the call center instead of getting it resolved in the app?
Kevin Yang: Yeah, it's the qualitative piece of it. I think it's gonna be a really tough problem for the industry to solve. And when things are going wrong, why are they going wrong? With the evals and traces, it's hard to wrap your mind around doing all of that.
Ashley Stirrup: Yeah. It's also really interesting how evals are good for QA: I put in this question, did I get the expected answer back? But they're not good at measuring how humans will interact with it, with all the nine million different variations on how they ask the same question, and whether it actually leads them to take the actions you want them to [00:24:00] take.
Evals are really poor at that, and experimentation is just critical.
Kevin Yang: Yeah. So hopefully I'll still have a job. We'll still have more things to do, and making the right decision is the really tough part of this business.
Ashley Stirrup: Yeah. I'm actually personally convinced that it's gonna create more jobs. AI's gonna create more jobs, not less. Yes, jobs will change. But when it comes to like building software, even before AI, people were only using 10% of the features of almost any software product. So it's not about churning out more and more features, which AI lets you do.
It's about how you create the best, most excellent experience, and that's where experimentation and really understanding your users and what they're trying to do is just so critical. Like you said, I think it's gonna be a golden age for experimentation.
Kevin Yang: Yeah, especially if you get into a world where people are allowed to customize their own features. [00:25:00] Then how do you measure that? How much you allow people to do is another layer to think about.
Ashley Stirrup: Yeah, especially when it comes to Chase's business. I think everybody has a different way of looking at their money, how they wanna manage it, understand their spend, and all those types of things. I hadn't thought about that before, but that really opens up some exciting possibilities for creating better user experiences.
Kevin Yang: Yeah. Everybody becomes a builder. It's a fun time.
Ashley Stirrup: Yeah. Thank you so much, Kevin, for coming on the show. I feel like we learned a ton and it's really exciting to see, you know, a billion dollars worth of progress in experimentation. That's a pretty incredible number.
Kevin Yang: Yeah, no, thank you. Thank you for inviting me. Happy to talk about it.
Ashley Stirrup: Thank you.
[00:26:00]