The Experimentation Edge

This episode of The Experimentation Edge shows how UPS's A/B testing program drove $500M+ in incremental revenue at a company with nearly 80 customer-facing applications. Dave Massey, head of the J.E.D.I. (Journey Experience and Design Innovation) team, walks through the first test that proved UX could move revenue, how he defended counterintuitive results to skeptical execs, and how an experimentation team that is small relative to the business can override opinion with data at enterprise scale.

Summary
Dave Massey walked into UPS in 2016 and immediately got pulled into a meeting about A/B testing tools. By the end of that meeting, he owned the platform, and the problem: UPS had no experimentation program to speak of. He ran one tiny button test, and then the work stalled. In 2019, senior leadership gave him a revenue target to hit: prove UX could move revenue, or the pilot dies. His first test, removing navigation from the shipping checkout flow, delivered around $35 million in incremental revenue over the course of a year. Senior leaders didn't believe it. They made him defend the results upside down and sideways. When the dust settled, the data held. Today, Massey's team has driven over half a billion dollars in incremental revenue by treating UPS.com like the e-commerce business it actually is.

Massey's approach is simple: test everything, especially what senior leaders think will work. His team, Journey Experience and Design Innovation (nicknamed J.E.D.I.), has built a reputation for saying no with data, not opinion. When a business unit demanded required recipient emails to capture customer data, J.E.D.I. ran the test in 24 hours and killed it. Conversion tanked. Two years later, the international team asked for the same feature—but framed it as a customs solution. That test passed. Same feature, different reason, different outcome. That's the edge Massey's team delivers: rigorous hypothesis design, a UX research team embedded in the experimentation workflow, and zero tolerance for untested ideas.


Timestamps
03:09 Dave's first day at UPS: inheriting an A/B testing tool with no program 
05:59 Senior leadership's ultimatum: prove UX ROI or kill the pilot 
08:38 First test result: $35M from removing navigation in checkout 
09:48 Defending the numbers: how Massey's team survived scrutiny 
11:07 Why a data-driven engineering culture made experimentation inevitable 
16:12 Team size: around 80 people prioritizing across almost 80 customer-facing applications 
19:08 The 24-hour test: when required email fields killed conversion 
22:28 Why Massey embeds UX research inside the experimentation team 
24:41 AI at UPS: treating it as a tool, not a replacement 


Takeaways
- Massey's first test removed navigation from UPS's shipping checkout flow and delivered around $35 million in incremental revenue over a year, proving e-commerce best practices apply even when the business thinks "this is just a tool, not e-commerce." 
- J.E.D.I.'s win rate stays high because UX research and experimentation teams operate under the same leader, giving the program both behavioral metrics and voice-of-customer insight before tests ever launch. 
- When senior leaders push ideas, Massey's team tests them instead of arguing—then delivers results that either validate the idea or identify three better alternatives the data actually supports. 
- The same feature (required recipient email) failed for customer data capture but passed for international customs—proof that framing and customer benefit matter more than the feature itself. 
- UPS runs everything centrally now, but the real win is that demand for testing has decentralized—business units across the company now come to J.E.D.I. asking to test their ideas.


Connect with the guest
LinkedIn: https://www.linkedin.com/in/masseycreates/
Learn more about UPS: https://www.ups.com


Sponsor
GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.

GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.

With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.
See a demo at https://www.growthbook.io/


Topics: A/B testing, experimentation, conversion rate optimization, feature flags, UX research, e-commerce experimentation, statistical significance, experimentation team building, growth experimentation, sequential testing.

What is The Experimentation Edge?

How do product teams decide what to build and what not to? The Experimentation Edge is the podcast where product, growth, and engineering leaders share how A/B testing, feature flags, and experimentation drive real business outcomes — backed by named companies and real numbers. From DoorDash's 12,000 A/B tests a year to Atlassian's experimentation-led product win to UPS's $500M experimentation team, each episode goes deep with operators running experimentation programs at scale.

Hosted by Ashley Stirrup, CMO at GrowthBook and a 25-year executive in data and experimentation. For product managers, engineers, data scientists, and growth leaders at B2B tech companies who care about experimentation culture, statistical rigor, and shipping with confidence. No marketing speak. Just operators explaining what they shipped, what moved the needle, and how experimentation reshaped their teams.

Topics: A/B testing, experimentation, growth experimentation, product experimentation, tech experimentation, feature flags, experimentation culture, statistical significance, marketplace experimentation, conversion rate optimization, experimentation at scale.

Ashley Stirrup (00:01.902)
Welcome to The Experimentation Edge. We're excited today to have Dave Massey, head of user research, personalization and experimentation at UPS. And Dave, one of the things I think is really interesting about your background is that you started off on the digital side and then kind of walked into an interesting situation at UPS. Maybe you could tell us a little bit about all that.

Dave Massey (00:23.608)
Yeah, and thanks, Ashley. I really appreciate it. I'm excited to be here and talk about my journey in experimentation. Before I came to UPS, I was on the advertising side of things and ran experimentation through advertising. But when I came into UPS in 2016, so going on 10 years now, conversion rate optimization and experimentation were really quite new to them as an enterprise.

It's funny, the very first day I walked into the company, I got brought into a meeting where they were trying to decide which A/B testing tool they wanted to go with. And I basically stood up and raised my hand, because there were a lot of things that were kind of misconstrued about what tools can and can't do. Then I guess they said, "You know how this works? OK, so you get to play with it now and also own it." From there, it was a slow go. We had a testing tool, Adobe Target, and I ran the very first test on ups.com. It was a very, very tiny button-change kind of thing. Then it kind of fell by the wayside for a little bit because other business things got in the way, and then, cut to 2019, I get called in by some of our senior leaders. And I will say we had really great senior leadership that believed in what could be done here.

But they're like, "Listen, we hear this whole UX thing is probably something we should consider." But just like at any big company, and UPS is, you know, kind of big, they were like, "Well, we need to prove that it's worth it, right? We've got to make sure the juice is worth the squeeze." And so with that, I was basically given the keys to run a pilot.

Ashley Stirrup (02:33.197)
Kind of.

Dave Massey (02:48.627)
On testing out UX improvements to see if they could move the bottom line. I was given a big old revenue amount that I had to hit. I was given one other guy at UPS to work with me, plus some support from a vendor. That was, you know, kind of how I got dropped into it, but it turned out really, really well, because our first test pretty much met our revenue goals. It was quite interesting.

Ashley Stirrup (03:26.061)
That's pretty incredible. Usually you're not gonna come out of the gate and have your very first test produce that kind of result.

Dave Massey (03:35.542)
Right, and full transparency: you can look up UPS's SEC filings, but we do a fair share of revenue. And this is something even I had to get my head wrapped around coming from the agency side of things. I dealt with big customers there anyway, like Marriott and so forth, but the sheer volume of business that UPS does is remarkable. So a 1% increase here or there at UPS, I mean, that is not couch-cushion money.

Ashley Stirrup (04:06.7)
Yeah.

Ashley Stirrup (04:14.103)
That's right. That's right. And, you know, in the years since you've been doing all this, you've been able to drive a significant amount of revenue for the company, right?

Dave Massey (04:22.207)
Yeah, so since we started the team, I mean, at this point we have delivered over half a billion dollars in incremental revenue. And that's not even necessarily accounting for some of the savings that we've also generated. But yeah, we know how to make some money.

Ashley Stirrup (04:40.353)
That's pretty incredible. Have there been certain things that you've done in particular that have really led to that kind of gain?

Dave Massey (04:49.079)
So I guess I was set up for success, essentially, when we started this pilot. A lot of A/B testing and experimentation teams, you know, they'll run campaigns, right? They'll run A/B tests on campaigns, and full transparency, that's what I did on the agency side. But that's not where we started. We started in

Ashley Stirrup (05:12.396)
Right.

Dave Massey (05:17.749)
the shipping tool for ups.com, in other words, the moneymaker for our digital channels. We were looking at improving the interface in that tool to see how that could help the bottom line. And that made it easier for me to show those gains than it would have been through an ad campaign, to be quite honest with you.

Ashley Stirrup (05:45.665)
Right, right, right, right. And you kind of applied some e-commerce best practices to that whole shipping experience, right?

Dave Massey (05:53.654)
Yeah, yeah, absolutely. I know most people don't think of UPS as an e-commerce company, right? But we do have a flavor of that, because if you go and ship a package with UPS on UPS.com, that's e-commerce in every sense of the word. You've got a flow that happens there. You've got a checkout. Granted, it may not be something physical, though I guess the label is physical. So you're buying a label, really, is what it boils down to.

Ashley Stirrup (06:19.891)
Mm-hmm. Yeah.

Dave Massey (06:24.225)
But yeah, when we started, our shipping flow wasn't thought of as e-commerce either. It was looked at like, "Hey, this is just a tool." No, it is e-commerce. So one of the first things we did was apply the most basic of e-commerce strategies to that checkout flow for shipping.

Ashley Stirrup (06:35.009)
Yeah, right.

Ashley Stirrup (06:49.163)
Yeah, yeah, it makes sense. You know, anytime you can find best practices from an adjacent industry, you want to take advantage of those. So you have a fun name for your team. Maybe you could share that and tell us where it came from.

Dave Massey (07:01.175)
Yeah, so for our legal team, I have to say it all out loud before I give you the acronym: it's the Journey Experience and Design Innovation, or J.E.D.I., team. But it's J-period-E-period. Again, I don't want the lawyers to get freaked out on me.

Ashley Stirrup (07:21.015)
That's funny. Got it. I understand. Yeah, yeah, yeah. Well, but I hope the other groups, when they have a hard problem, they say, let's bring in the Jedi team.

Dave Massey (07:32.503)
That's 100%. Over the years, we have actually created that sort of reputation within the organization. And yeah, it's fun. People are like, "Hey, has J.E.D.I. tested it?"

Ashley Stirrup (07:47.211)
Yeah, that's great. That's great. So maybe you could tell us about that first test, maybe a little more detail on what you did there.

Dave Massey (07:55.146)
Yeah, I mean, it wasn't rocket science. It was simply going back to the basics: once a customer enters the checkout flow, you remove distractions. So that's what we did. We basically removed navigation tools and stuff like that once somebody entered the shipping tool, to get them to continue through the flow and not get distracted and go somewhere else, right? It's basic stuff.

I was told before I launched the test that I was crazy, that it wasn't going to help anything and was just going to make customers angry. Well, I did it anyway. One of the great things is being able to say, listen, this is just a test. We learn from every single test; we learn what to do and what not to do, right? So there's no such thing as a bad test. And when we ran that first test, it turned out that doing that increased the conversion rate to the tune of around $35 million over the course of a year.
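Dave's figure is an annualized estimate: a conversion-rate lift translated into dollars over a year. A minimal sketch of that arithmetic, assuming the simplest possible model (incremental revenue = annual checkout sessions × baseline conversion rate × relative lift × average order value), might look like the following. The function name and every input value are hypothetical; the episode does not say how UPS computes its figures.

```python
def annualized_incremental_revenue(
    sessions_per_year: float,
    baseline_conversion_rate: float,
    relative_lift: float,
    average_order_value: float,
) -> float:
    """Estimate extra yearly revenue from a relative conversion-rate lift.

    Hypothetical model: extra orders = sessions * baseline rate * lift,
    and every extra order is worth the average order value.
    """
    baseline_orders = sessions_per_year * baseline_conversion_rate
    extra_orders = baseline_orders * relative_lift
    return extra_orders * average_order_value


if __name__ == "__main__":
    # Hypothetical inputs only; these are not UPS figures.
    estimate = annualized_incremental_revenue(
        sessions_per_year=100_000_000,
        baseline_conversion_rate=0.05,
        relative_lift=0.02,  # a 2% relative lift (5.0% -> 5.1%)
        average_order_value=20.0,
    )
    print(f"Estimated incremental revenue: ${estimate:,.0f}")
```

A model this simple ignores novelty effects, seasonality, and the uncertainty around the lift itself, which is exactly where a skeptical reviewer would push. It does show why, at high volume, even a small relative lift adds up to what Dave calls "not couch-cushion money."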

Ashley Stirrup (08:59.264)
Wow, that's incredible. And I imagine that really changed a lot of people's perspectives. You know, when you have that first surprising result where people are convinced it's not going to help, and then suddenly it does. I assume it was a wake-up call for everybody.

Dave Massey (09:11.063)
It was a wake-up call, but I will say we had to defend it, because when you show a number like that right out of the gate, everybody's like, "There's no way. We don't believe that." And credit to my team, the data folks we've got. This is so critical for any experimentation team: you've got to have the best data team you can afford. But yeah, we had to defend it upside down and sideways, had holes poked in it, everything else. And then when the dust settled, they were like, "Yeah, this is legit."
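Dave doesn't spell out which checks his data team ran. Two standard ones a skeptical reviewer tends to ask for are a sample ratio mismatch (SRM) check, which confirms traffic was actually split the way the test was configured, and a significance test on the conversion difference. A minimal standard-library sketch of both follows; the function names and counts are hypothetical, and it assumes a planned 50/50 split and a simple pooled two-proportion z-test.

```python
import math
from statistics import NormalDist


def srm_p_value(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Sample ratio mismatch check: chi-square goodness-of-fit test with 1 df.

    expected_share is the planned fraction of traffic sent to control.
    A very small p-value means the observed split is unlikely under the plan,
    which usually points to a bucketing or logging problem.
    """
    total = n_control + n_treatment
    expected_control = total * expected_share
    expected_treatment = total * (1 - expected_share)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_z_test(conv_control: int, n_control: int,
                          conv_treatment: int, n_treatment: int) -> tuple[float, float]:
    """Two-sided pooled z-test for a difference in conversion rates."""
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (rate_treatment - rate_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not UPS data.
    print("SRM p-value:", srm_p_value(50_000, 50_200))
    z, p = two_proportion_z_test(2_500, 50_000, 2_700, 50_000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A clean SRM check and a small p-value are only a starting point; a thorough review might also look at novelty effects, segment breakdowns, and whether the annualized projection holds up over time.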

Ashley Stirrup (09:48.373)
Yeah, that's great. And so I imagine you went on and did a few more tests, and some were winners and some were losers, but I would imagine after you got your second win and your third win, people started to really think, maybe we've got something here.

Dave Massey (10:01.355)
Yeah, it's true. I say a lot of times that the business now knows they ignore J.E.D.I. at their own peril. Because, listen, there have definitely been things we've tested that came from senior leadership. And this is something I pride myself and my team on: being able to push back on our senior leadership, but with data, right? Because if you just go to a senior leader who has an idea, like "I want to do X, Y, or Z," and you say, "Well, that's just a terrible idea," you're just difficult to work with. But if you come back and say, "OK, we tested your idea and it did not work well, but we learned these three other things we can do to accomplish the same goal," that changes the conversation. And it has made us really the center of excellence when it comes to proving out what to do, first and foremost for our customers, but also for the bottom line.

Ashley Stirrup (11:07.36)
Yeah, that's great. And it's interesting thinking about UPS, because on the one hand, people weren't that familiar with A/B testing when you joined. You didn't have that culture. But you did have a very data-driven culture there. And so that's a rich environment for somebody to come in with A/B testing, show a new source of data, and get people excited.

Dave Massey (11:21.057)
Very, yep.

Dave Massey (11:30.325)
Yeah, it was. I mean, we are an engineering company at our core. So for us, if you can't measure it, it doesn't matter. So yeah, we have data all over the place. But that was also a blessing and a curse, because that data lived in all sorts of different places. When I first walked in the door, we were using a data tool for UPS.com where you literally open it up.

Ashley Stirrup (11:50.817)
Yeah.

Dave Massey (12:00.318)
You want to run a report, and you come back after lunch and hope it's done. Right? It was just not good. So when we made the shift to a more modern analytics tool, it made us that much faster. And it just so happened that the shift was underway right when we launched this team as a pilot. So we were able to get that

Ashley Stirrup (12:10.251)
Yeah.

Dave Massey (12:27.221)
really granular data and kind of start connecting those dots. And it was really a game changer.

Ashley Stirrup (12:37.492)
Yeah, yeah, I can imagine. Because if you've got to run a report and wait an hour, it's pretty tough to be iterative: OK, what did this mean? And all that. So yeah, that's pretty interesting. So just coming back to e-commerce best practices for a second: as you've continued to experiment, have you found a pattern where certain things from e-commerce really transfer over well, and other things where you're like, no, UPS is different, our use case is different?

Dave Massey (13:04.523)
Yeah, I guess that's a little bit of a hard question to answer, only because, and this is what I've said since day one, we are testing humans, right? And humans are humans here, in Europe, in Asia, wherever. So we've got some tendencies that are kind of baked into our DNA. And most of the things we've looked at in terms of those, and I'll say UX patterns, not even necessarily e-commerce patterns, just UX patterns that we know work with humans, around choice architecture and those sorts of things: those standards apply to us as they would to any other e-commerce business.

Ashley Stirrup (13:57.813)
Yeah, yeah. And I would imagine you've got a super wide range of users, from somebody who ships a package once a year to somebody at a company who's in your tools all day long. So their expectations, their use cases, and what they're doing in the tool are, I imagine, pretty different.

Dave Massey (14:03.574)
Mm-hmm.

Dave Massey (14:19.701)
Yeah, and that is, I mean, given that our primary customers on ups.com and our digital channels really are the SMBs, right? We've got big enterprise customers, but they've got other systems baked in that they use. And then you do have your one-off, occasional type of shipper. But everybody tracks, right? That makes it an interesting balance for how we do things. But honestly, this is also where we have the most opportunity moving forward: the personalization side of things, understanding better who our customer is and giving them the right experience at the right time. Because maybe one day they're showing up to ship something to grandma for a birthday, and the next day they're at their day job and they're the shipping manager.

So again, it's being able to understand that. And that's where experimentation, audience targeting, and all those sorts of things can really come together.

Ashley Stirrup (15:29.226)
Yeah, and you're supporting a pretty wide range of products there, right?

Dave Massey (15:36.204)
Yeah, I think last time I checked there were almost 80 different customer-facing applications that live on some sort of UPS digital property. I would love to be able to support absolutely all of them, but we don't. We have a good-sized team, but we can't support all those things, so we have to look at the customer and business benefit. We don't have an infinite dance card, so we've got to prioritize.

Ashley Stirrup (16:06.975)
Yeah. Right. How big is your team?

Dave Massey (16:12.065)
So it depends on the day, because things shift up and down depending on what season we're in, but between designers, data folks, developers even, and our full-time contract vendor, it's around 80 folks.

Ashley Stirrup (16:33.163)
Yeah, it's a big team. But for the size of a business like UPS, it sure feels small.

Dave Massey (16:38.071)
Yeah, I mean, for as big as we are, we could be twice that and still not be able to cover everything.

Ashley Stirrup (16:45.605)
Yeah, and so what's the model there? I mean, do you try to run everything centrally, or are you trying to empower the teams to do it?

Dave Massey (16:54.081)
So right now we do run things centrally on the experimentation side. I would love, and there very well may be a possibility of, getting this sort of thing expanded into the larger part of the business. But the one thing I can say is that even though we may not have decentralized the act of experimentation, we have decentralized the want and need for testing, because now all sorts of business units and product teams are coming to us saying, "Listen, we want to test this." So the fact that we can't test everything we want to test is great in my eyes, because it means people understand it and understand the value of it.

Ashley Stirrup (17:28.149)
Yep.

Ashley Stirrup (17:44.295)
Yeah. Could you tell us about a time when you ran a test that didn't work the way you expected and maybe you had some new learnings?

Dave Massey (17:52.022)
Yeah. For one, I tell my team all the time that my gut is wrong more than it's right. So what I expect to turn out in a test is one thing. But here's a more concrete example. Senior leadership in the business was really keen on getting more customer information to help us, as a business, serve customers, right? In this instance, our shipping product team actually had a request in to build functionality to require the recipient's email.

So whoever you're sending a package to, their email would be a required field to complete that shipment. And we had real issues with this, because not everybody will necessarily have the other person's email, or want to give it out. And the business is like, "No, no, we need this. We need this so we can get customer data and retarget," and blah, blah, all that good stuff.

Ashley Stirrup (18:56.213)
Yeah.

Dave Massey (19:08.053)
The product team was worried about it, and we ran the test, and to date it is the shortest test we have ever run. It was up for about 24 hours. We saw such a decline in shipping conversion on .com that we pulled the plug on it, and we had to go back and tell the business, "You can't do this. It's not a good experience, and therefore it will cost us business." Now cut to...

Ashley Stirrup (19:27.135)
Yeah.

Dave Massey (19:38.048)
A couple of years later, our international shipping team wanted to do the same thing, but they had a completely different reason for it. When you're shipping cross-border, customs is a big deal, right? It costs people a lot of money, time, and effort in getting their shipments. They came to us and said, "Hey, listen, we want to capture emails so we can help with customs," because when something gets held up at customs, they don't go back to the shipper; they go to the recipient.

Right? They say, "Listen, you've got to deal with this." So when we made it required for international shipments and explained that it's there to help the shipment get through customs, we saw no issue. Nobody had a problem with it, because we gave them the reason why. It's not just, "Hey, we want this because we want this."
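The recipient-email test was stopped after about 24 hours because conversion dropped sharply. One common way to formalize a call like that is a harm guardrail: a one-sided test that fires only if the treatment converts significantly worse than control. The sketch below assumes a pooled two-proportion z-test with a deliberately strict threshold; the function name, the threshold, and the counts are hypothetical, not UPS's actual setup. Checking repeatedly as data accumulates inflates false alarms, so a strict alpha here is only a crude stand-in for a proper sequential testing procedure.

```python
import math
from statistics import NormalDist


def should_stop_for_harm(conv_control: int, n_control: int,
                         conv_treatment: int, n_treatment: int,
                         alpha: float = 0.001) -> bool:
    """Return True if treatment conversion is significantly LOWER than control.

    One-sided pooled two-proportion z-test. alpha is a hypothetical,
    deliberately strict threshold meant to limit false alarms from peeking.
    """
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        # No variation at all (e.g. zero conversions everywhere): nothing to flag.
        return False
    z = (rate_treatment - rate_control) / se
    # One-sided p-value for "treatment is worse": P(Z <= z).
    p_harm = NormalDist().cdf(z)
    return p_harm < alpha


if __name__ == "__main__":
    # Hypothetical first-day counts: 5.0% control vs 3.5% treatment conversion.
    print(should_stop_for_harm(1_000, 20_000, 700, 20_000))
```

The test is one-sided because the guardrail asks only "is this hurting us?", not "is this different?"; a variant that performs better never trips it.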

Ashley Stirrup (20:23.113)
Yeah. Right.

Ashley Stirrup (20:28.381)
Yeah. And what's your approach in general if you have a test that fails? When do you decide to iterate? How do you decide to learn more about, OK, this failed, but why did it fail? That type of thing.

Dave Massey (20:43.125)
Yeah, I mean, I feel like that's our every day, right? There's no such thing as a bad test, because we learn one way or the other. If something goes completely the opposite direction from our hypothesis, then we back up and, honestly, we go to the data and say, "OK, why did that happen?" Getting to the why is incredibly useful, and this kind of goes to why we've got a pretty high win rate. Part of that is because our UX research team and our experimentation teams are all under me, so they all work together. Having the voice of the customer along with those behavioral metrics gives you that real 360 view. And it's incredible, because we can anticipate more before we even get to the A/B testing side, since we've heard customers say, "Hey, this is a problem," or "This is not a problem." And when we do have one of those, I don't want to say failures, but something that didn't go the way we expected, we can go right back and talk to customers and ask them, "Why do you think this didn't work out well?" So it's an incredible thing.

And I've got to tell you, I've talked to a few other leaders of experimentation programs at other companies, and when I tell them that we have our UX team joined at the hip with our experimentation team, it kind of blows their mind. To me, that seems like 101. You would have to have that, right? But that's just me.

Ashley Stirrup (22:28.158)
Yeah.

Ashley Stirrup (22:33.14)
Yeah, yeah, that makes a lot of sense. As you look forward, how do you see the experimentation program evolving at UPS?

Dave Massey (22:41.217)
So I do hope we're able to have the technology where we can let other teams play, but still give them the best practices and standards our team uses. Because that's another thing my team at UPS is known for: the rigor we go through in setting up a test and making sure we're not blowing anything up or breaking anything with IT, but also in looking at the results and being as unbiased as possible. We have no problem saying, "Hey, this isn't the right thing to do, sorry," or "It is the right thing to do." Setting those standards for the rest of the organization, I think, is more crucial than anything. But beyond that, we need technology that is simple enough for other business units to be able to do this sort of thing, because right now I've got a pretty heavy tech team to help us run these things.

Ashley Stirrup (23:45.522)
Yeah, right. AI is always top of mind with everybody. Are you thinking about ways to apply AI to the UPS experience?

Dave Massey (23:55.432)
Absolutely. In fact, UPS, being an engineering company at its core, has been playing with AI since before it was really a buzzword. But my team as a whole looks at it as a tool, right? It's not a replacement for anything we do. It's just something to make us more efficient: getting through results faster, those sorts of things.

And we emphasize the human in the loop, right? There's nothing AI generates that just goes straight through without review. It has to go through its checks, just like something a human on our team would have to go through.

Ashley Stirrup (24:27.72)
Yeah.

Ashley Stirrup (24:35.252)
Right.

Ashley Stirrup (24:41.406)
Yeah, yeah, AI is such an interesting topic when it comes to experimentation, because it literally has the potential to affect every step of the experimentation process: from the coding, to "Have I actually created my hypothesis correctly?", to "What do these results actually mean?"

And then even the product itself could have AI in it. So it's pretty incredible all the different opportunities there are to apply AI to experimentation. And it's interesting as I talk to a lot of different guests, people are all over the map on their journey. And some are really strong in one phase and some are really strong in another. And it's going to be pretty exciting as the whole thing comes together and the whole industry lifts up to another level.

Dave Massey (25:30.347)
Yeah, and that's where, as I mentioned, we really want the technology to continue to improve so that we can expand this idea of experimentation throughout the organization.

Ashley Stirrup (25:44.306)
Yeah. Well, thank you so much for joining us today. This was a fabulous episode. I feel like I learned a lot.

Dave Massey (25:47.904)
Absolutely. It was a pleasure.

Dave Massey (25:52.119)
Well, I hope I can help somebody. But yeah, this was great. I really appreciate it, Ashley.

Ashley Stirrup (25:59.274)
I'm sure you inspired a lot of people.

Dave Massey (26:01.623)
Well, I hope so.