The Experimentation Edge | How Fin (formerly Intercom) went from weeks to hours of analysis using AI

Summary

In this episode of The Experimentation Edge, host Ashley Stirrup sits down with Raunak Kumar, senior manager of GTM analytics at Fin (formerly Intercom), to unpack how experimentation actually works when the data is messy and the traffic is thin. Drawing on nearly 12 years in marketing analytics across Atlassian, Stripe, and Fin, Raunak explains how AI tools like Claude Code have collapsed analysis from weeks to hours and freed his team to clear its experiment backlog, why declining organic search traffic and a 5x jump in untagged ChatGPT referrals are forcing teams to rethink attribution, and how the most valuable experiments are often the ones that "lose." From a Jira Service Desk bundling test that won on trials but had to be rolled back, to a Stripe contact form that was quietly blocking real buyers, this conversation is a practical guide for product managers, engineers, data scientists, and growth marketers who want to learn more from every test they run.

Chapters

0:45 Welcome and what the show is about
1:45 Raunak's role and 12 years in marketing analytics
2:45 How AI and Claude Code changed the analyst's day
4:15 LLMs, declining organic traffic, and the 5x ChatGPT jump
5:15 Two kinds of experiments at Fin: on page and off page
7:15 The Jira Service Desk bundling experiment
10:45 Why the trial winner became a rollback
11:45 Contextual onboarding turns the loser into a winner
14:45 Reading an experiment that loses
18:45 What's next: incrementality, connected TV, and testing creative

Takeaways

AI has collapsed marketing analysis from weeks to hours, and the real payoff is a cleared experiment backlog plus analysts who compete on the questions they ask, not the speed they query.
Organic search traffic is declining as ChatGPT, Gemini's AI mode, and Claude answer buyers in place; Fin saw a 5x rise in ChatGPT referrals, but LLMs don't tag that traffic, so attribution has to be proven through experiments.
A guardrail metric saved Atlassian from a costly mistake: bundling Jira Service Desk lifted trials more than 50 percent but tanked activation and paid conversion, forcing a rollback.
A failed test can hold the real winner; contextual onboarding matched to user intent roughly doubled activation and became the default variant after the bundling experiment was rolled back.
In low-volume B2B, read losing experiments for sub-segment signal; a "failed" Stripe form simplification revealed the form was blocking legitimate small-business buyers using Gmail.

Connect with the Guest

LinkedIn: http://linkedin.com/in/raunakkumar1991
Website: https://fin.ai

Sponsor

GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.

GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.

With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.

See a demo at https://www.growthbook.io/

What is The Experimentation Edge?

How do product teams decide what to build and what not to? The Experimentation Edge is the podcast where product, growth, and engineering leaders share how A/B testing, feature flags, and experimentation drive real business outcomes — backed by named companies and real numbers. From DoorDash's 12,000 A/B tests a year to Atlassian's experimentation-led product win to UPS's $500M experimentation team, each episode goes deep with operators running experimentation programs at scale.

Hosted by Ashley Stirrup, CMO at GrowthBook and a 25-year executive in data and experimentation. For product managers, engineers, data scientists, and growth leaders at B2B tech companies who care about experimentation culture, statistical rigor, and shipping with confidence. No marketing speak. Just operators explaining what they shipped, what moved the needle, and how experimentation reshaped their teams.

Topics: A/B testing, experimentation, growth experimentation, product experimentation, tech experimentation, feature flags, experimentation culture, statistical significance, marketplace experimentation, conversion rate optimization, experimentation at scale.


[00:00:00] Welcome to the Experimentation Edge, where product managers, data scientists, and engineers talk about how they make smarter decisions. I'm Ashley Stirrup, the chief marketing officer for GrowthBook, and in each episode, I'll sit down with an executive to unpack how they use experimentation and A/B testing to make better decisions.

This show is sponsored by GrowthBook, the open source experimentation platform leader. Now let's jump in and get started with our next guest.

Ashley HOST: Hello, and welcome to today's episode. I'm excited to have Raunak Kumar, senior manager of go-to-market analytics from Fin, and Fin is formerly Intercom. Welcome to the show, Raunak.

Raunak Kumar GUEST: Thanks, Ashley. Really excited to be here and chat a little bit more about my background and talk all things experiments.

Ashley HOST: Sounds great. Could you tell us a little bit about your role at Fin?

Raunak Kumar GUEST: Yeah. I'm currently the senior [00:01:00] manager of GTM analytics at Fin. My team's charter is to work with our demand gen and brand marketing teams and provide them with all their analytics and data needs. So my team is responsible for building attribution and measurement frameworks for all the different marketing campaigns and channels, and for making sure that our stakeholders and marketers are making data-informed decisions.

I also help the team run experiments, which I feel like we'll talk a lot about in this chat, and really make sure that marketing at Fin is optimizing for the best thing in terms of acquisition and driving our funnel forward.

Ashley HOST: And you have a really great background with time at Stripe and Atlassian as well, yeah?

Raunak Kumar GUEST: I did, yeah. Before Intercom and Fin, I was at Stripe for four years, doing similar work in B2B marketing analytics. Different industry, obviously: fintech. And before that I was at Atlassian for five years. I started as [00:02:00] an IC and then did bits and pieces, but essentially I was doing a lot of product marketing and brand marketing analytics.

So that's been my journey. I've been in the marketing analytics space for close to 12 years now.

Ashley HOST: Yeah. And I bet the job's changed quite a bit with the rise of ChatGPT and all the other LLMs.

Raunak Kumar GUEST: Yes. I feel that just in the last four or five months, especially since the advent of Claude Code, things have changed a lot. I'll try to keep it short, but what used to take days or weeks of data analysis work can now be done in a matter of hours. What that means is we also need to differentiate ourselves as data analysts.

So it has freed up a little bit of time on our side to do things we might not have done otherwise, and one of those things is actually experiments. Usually we would have a big backlog of experiments we just couldn't get to, but now we are able to run more of them, and the time to insight has improved a lot.

Let's say a question came in from a stakeholder, [00:03:00] like what's happening in the funnel. Back in the day, we would have to write SQL, create a pipeline, et cetera. Now, with the advent of Claude and a bit of work, and I can talk to that if you want, we are able to speed up those insights very rapidly.

So things have changed quite a bit.

Ashley HOST: Yeah. That side is super exciting. I hear from a lot of folks that organic traffic is down and attribution has gotten a lot harder to track, because so much of the buyer's journey is now going on inside of the LLMs. Are you seeing that as well?

Raunak Kumar GUEST: We are. It's so interesting that what's happening industry-wide is reflected in pretty much everyone I talk to, including us. Typically, organic traffic represented the demand for your product, and people coming to your website were a good representation of that.

But with the advent of LLMs like ChatGPT, Claude, and Gemini (and AI mode is actually the biggest one), the informational queries and all the information [00:04:00] about your product are being served right then and there. So it removes the need for folks to click through and land on your website.

Hence we are starting to see a decline in what we used to call organic search traffic. But that's been compensated for by other forms of traffic, right? Direct traffic, or people coming in from other sources. We actually saw a 5X increase in our ChatGPT-referred traffic.

So I think it's compensating. But I think we also lost a little bit of tracking in the process, because these LLMs don't necessarily tag the traffic coming in. That's one of the areas where I've been doing a little bit of experimentation, to prove that what we are putting out in the LLM world from a content standpoint is actually working, just because we don't have straightforward tracking on it.

Ashley HOST: Yeah. Yeah. Super interesting. We could have a whole episode just on that. Could you tell me a little bit about experimentation at Fin?

Raunak Kumar GUEST: Sure. As I mentioned, I support the marketing side of the world here. We also have very decent product analytics and ML [00:05:00] teams at Fin. They've done a lot of sophisticated experiments, which I am personally not qualified to talk about, so I'll stick to the marketing side.

In marketing, we run two types of experiments. One type I call on-page or on-site experiments. Let's say we want to redesign our contact sales form, or we want to move things around: we usually run an A/B test on the web page. So that's one side, and the other side is pre-website.

Let's say we have campaigns with different ad copy out there that we want to test, to see what's working and what's not. I help the team run those experiments as well. So that's how I categorize them, and it all depends on how many we are running and what the priority is at any given time.

But those are the two areas I am involved in.

Ashley HOST: Got it. And how does it typically work? If you're having a team run an experiment, is it you? Is it a marketer? Is there a data scientist involved? What are the different [00:06:00] roles?

Raunak Kumar GUEST: Yeah. I think that's also one of the things that has changed with AI. Traditionally, we would have a data analyst, a data scientist, and what we called a web engineer, who would actually configure things. But right now at Fin, I am the data person, and I just work with one web engineer to work out what we want to test, as an example, or I work with a marketer.

Then I am able to do a lot of the audience sizing, creating the queries, et cetera, with the help of Claude. So that's the setup: one data person and one stakeholder, like a web engineer or a marketer. And then we have a lot of tools, right? I can go into the depth of that, but essentially we use a lot of third-party tools that help us orchestrate these experiments.

Ashley HOST: Got it. It sounds like you're very lean and nimble on these, so that's terrific.

Raunak Kumar GUEST: Yeah.

Ashley HOST: Yeah. Could you tell us of a time when you had an experiment where you had a lot of learnings?

Raunak Kumar GUEST: [00:07:00] Sure. I'd like to talk about an experience from a prior role, back at Atlassian, where I was supporting one of our product teams, namely the Jira Service Desk team. For folks who don't know about Jira Service Desk: Atlassian is typically known as a brand for project management, with leading products such as Jira Software and Confluence.

During my time there I was supporting Jira Service Desk, which is geared toward IT, and it was actually one of the fastest-growing products at Atlassian. This is back in the 2017-2018 timeframe. The experiment we wanted to run was to supercharge the trajectory of sign-ups for Jira Service Desk.

Our hypothesis was that if we bundled the product together with our core offering, which is Jira Software and Confluence, we might see an increase in sign-ups just because we would tap into the existing traffic, [00:08:00] right? We didn't necessarily have to go and generate traffic, because Jira Software was already well established at that point.

Fifteen years of brand equity.

Ashley HOST: And so, just so I'm clear, there's the traditional Jira product and then there's the Jira Service Desk product as well. Is that right?

Raunak Kumar GUEST: That's right. And the distinction was software versus service desk.

Ashley HOST: Got it. Got it. Okay. And so you wanted to bundle those two together.

Raunak Kumar GUEST: Correct.

Ashley HOST: And what happened?

Raunak Kumar GUEST: Yeah. So the hypothesis was that it would lead to an increase in sign-ups, and the counter-metric we put in place was that retention should not drop on the core offerings, right? We shipped this and ran it for three-plus weeks. I did a lot of work at the time, in concert with data scientists, to figure out how to roll it out, because it was the main place where people would sign up.

So we designed it carefully and did a measured rollout, where we started by exposing the [00:09:00] bundled offering to twenty percent of the traffic and then gradually increased it to fifty percent. That was the whole layout. What we found three weeks later was that our sign-ups for Service Desk increased dramatically, just because anyone and everyone who signed up for the bundle would get an instance of Service Desk.

I'm not able to recollect the exact figures, but I remember very vividly that there was a step-change increase in Service Desk trials, more than fifty percent or something to that tune. But the main learning at that point was the activation drop. The counter-metrics we had designed were activation and retention.

Those took a real, solid hit, especially for our software product, because folks were essentially getting lost. The short of it is that because we were bundling things together, people were just not sure what they were actually trying in the product.

It became clear when we started looking at our activation rate. The metric we used was week-two [00:10:00] active: are they actually coming back and logging in to the product two weeks after sign-up? That was dramatically down compared to our control group.

The major learning there was that onboarding and activation messaging matter a lot. Just by bundling things together you might see an increase in trials, but it takes a hit on your activation. I also followed up with paid conversion rate: of the sign-ups, how many actually convert to a customer?

At the time they had to pay us something like twenty dollars a month. That rate also dropped between the control and the experiment. Essentially, that was the major learning. It was a winner, because my primary objective in this experiment was to see an increase in Service Desk trials, which I clearly did.

I mentioned fifty percent; my minimum detectable effect was ten percent, so it was a winner. But the counter-metric drop was significant enough that we actually had to roll back this experiment, [00:11:00] just because it would impact our revenue targets, which was a sensitive thing at the time. So that's the high level of the whole...
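[Editor's sketch, not from the episode: the decision logic described above (a primary metric that must beat its minimum detectable effect, plus a counter-metric that can veto the launch) could be expressed roughly as follows. Every count is an invented placeholder, since the guest says he cannot recall the exact figures. The function names, the pooled two-proportion z-test, and the one-sided 5 percent threshold are assumptions chosen for illustration, not a description of the tooling Atlassian used.]

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for rate_b - rate_a using a pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def decide(primary, guardrail, mde=0.10, alpha=0.05):
    """Each argument is (control_successes, control_n, test_successes, test_n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value

    c_s, c_n, t_s, t_n = primary
    relative_lift = (t_s / t_n) / (c_s / c_n) - 1
    primary_wins = relative_lift >= mde and two_proportion_z(c_s, c_n, t_s, t_n) > z_crit

    g_cs, g_cn, g_ts, g_tn = guardrail
    guardrail_hurt = two_proportion_z(g_cs, g_cn, g_ts, g_tn) < -z_crit

    # The guardrail is checked first: a significant drop vetoes even a clear primary win.
    if guardrail_hurt:
        return "roll back"
    return "ship" if primary_wins else "keep iterating"


# Hypothetical numbers: trials per visitor rise, week-two activation per trial falls.
trials = (400, 10_000, 620, 10_000)
week_two_active = (160, 400, 170, 620)
print(decide(trials, week_two_active))
```

With these made-up inputs the primary metric clears the ten percent bar comfortably, yet the function still returns a rollback because the guardrail is evaluated first, which mirrors the outcome described in the conversation.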

Ashley HOST: Got it. Sounds like some important learnings: your buyers were getting confused, probably not sure when they were in the core product and when they were in the service desk product. Did you try to iterate on that? What were the follow-on next steps?

Raunak Kumar GUEST: Yes. We were using a tool called Amplitude back then, and the immediate next step was to try to learn through some session analysis and heat mapping, just to understand the journey of these bundled-trial people and what they were clicking. After some qualitative work studying the sessions, and doing some more in-product logging and tracking, we figured out that the onboarding flow was a little bit confusing.

Folks were just not sure how to navigate to Service Desk, because we were bundling the product but it was hidden on the left sidebar. That insight led to us [00:12:00] iterating on the onboarding flow. Essentially, guide them a little bit better.

I'm sure you have come across this at some point: when you log in to a product, something will pop up and tell you exactly what it is. It's called guided onboarding. So that was the iteration, and then we relaunched the experiment with the revised onboarding.

The other thing I want to call out, which was cool at the time, is that we worked with our engineering team to do something called contextual onboarding, where we actually took the keywords coming in. Folks would type in "I'm interested in project management," or, "I'm interested in Agile Scrum management."

We were able to detect the keywords and then serve up an onboarding template based on those exact keywords. Because in project management, people are interested in either Agile or Scrum. Once we know the intent of the audience, we can come up with the exact landing experience.

So it's called contextual onboarding. These are the two main things we were able [00:13:00] to dig up through the data, and then we redid the experiment with the corrected, or I would say enhanced, onboarding flow. And I can talk to the results as well if you...
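[Editor's sketch, not from the episode: a minimal illustration of the keyword-to-template idea behind contextual onboarding. The keyword table, template names, and the `pick_onboarding` function are all hypothetical; the episode does not describe how Atlassian's engineering team actually implemented the matching.]

```python
# Hypothetical keyword-to-template table; the real keywords and templates
# are not given in the episode.
TEMPLATES = {
    "scrum": "scrum_board_onboarding",
    "agile": "agile_project_onboarding",
    "service desk": "service_desk_onboarding",
}
DEFAULT_TEMPLATE = "generic_onboarding"


def pick_onboarding(query: str) -> str:
    """Return the template for the first listed keyword found in the visitor's query."""
    text = query.lower()
    for keyword, template in TEMPLATES.items():
        if keyword in text:
            return template
    # No recognized intent: fall back to the generic flow.
    return DEFAULT_TEMPLATE


print(pick_onboarding("I'm interested in Agile Scrum management"))
print(pick_onboarding("I'm interested in project management"))
```

In this sketch the table order acts as a priority list, so a query that mentions several keywords resolves to whichever keyword is listed first. A production system would likely need a richer notion of intent than substring matching.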

Ashley HOST: Yeah. Yeah, please.

Raunak Kumar GUEST: Yeah.

Ashley HOST: How'd it go?

Raunak Kumar GUEST: The biggest winner of the two was actually our contextual one.

We saw an increase in Service Desk trials with both of them, but the contextual one was very good for activation rate, because we matched the intent of the audience. One of the other pain points of the Jira product was that there's a lot to do.

You can do many things. So this experiment also helped us identify that we should go after a use case for our acquisition of customers. In this case our activation rate, and again I'm not able to recollect the exact numbers, doubled or something.

We also saw a steady increase in Jira Service Desk customer count, and as a result we made it the default [00:14:00] production variant. So that was exciting: after a failed experiment, and then following up with some qualitative work and learning together with the product team, we didn't give up, came back at it, and found something pretty unique for our onboarding experience.

Ashley HOST: Yeah, I love that. It's such a great example of unlocking the business and the learning, and taking a loser and turning it into a winner. Just think how many teams would say, "Oh, we rolled out the new thing," and maybe they're looking at some of the short-term metrics and saying, "Look, this is a huge win," and then they move on and miss the true winner that you were able to uncover.

So that's a fabulous story.

Raunak Kumar GUEST: Thank you.

Ashley HOST: In general, how do you approach experiments that lose? When do you decide, "Okay, this is what I really wanna dig in on," and when do you decide, "Okay, this is a loser, I'm just gonna move on"?

Raunak Kumar GUEST: Yeah, that's a hard one. [00:15:00] A well-designed experiment, honestly, is one where you are able to detect something quickly enough. In B2B especially, where I play, the biggest headwind we have is the lack of traffic volume.

Had I been in B2C, you'd see millions of visitors, but in B2B it's usually thousands, as a reference point. So I think about designing the experiment so that we have some leading indicators, and whether we can see some meaningful shift in those metrics, even if they're not stat sig. Let's say we saw a three percent increase in visits and we were hoping for five: close enough.

I always look for something more in the data, even when we call it a loser just because we didn't meet the threshold we had decided on. I go and look at some of the things I mentioned. We log everything in our warehouse at a user level, obviously anonymized, but at a user level.

So I'll try to look at some engagement metrics. I like to look at some of the sample [00:16:00] session data just to see if there's anything interesting happening there. On the flip side, I've had experiments, and I can share one with you from Stripe, where we were trying to redo our inbound contact sales form and bring it more in line with industry practice, where typically you don't ask for a lot and you simplify the form to name, email address, et cetera.

There we actually made the form smaller, but we were still not seeing an increase in contact sales submissions. So clearly something was still not working. We thought that by just simplifying the form we'd see an increase in contact sales, but it was not working.

That was an obvious one for us: okay, this variation is not going to work, so how do we not waste cycles, stop it, and then re-attack the problem? So I guess my answer to your question is: try to study the experiment data. I know people might call me out: "Oh, you're doing early peeking and [00:17:00] stuff."

Typically you're not supposed to peek ahead of time. But in the real world, when you're up against quarterly timelines and executives are asking you, "Hey, what's going on?", you want to have some intuition, and that's how I approach it.
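[Editor's sketch, not from the episode: to make the low-traffic headwind concrete, the standard two-proportion sample-size approximation shows roughly how many visitors each arm needs. The 3 percent baseline conversion rate is an invented placeholder, and the 5 percent significance level and 80 percent power are conventional defaults rather than anything the guest states.]

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical 3% baseline; compare a 5% relative lift with a 50% relative lift.
print(visitors_per_arm(0.03, 0.05))
print(visitors_per_arm(0.03, 0.50))
```

Under these assumptions a 5 percent relative lift needs on the order of hundreds of thousands of visitors per arm, which is far beyond a site that sees thousands. That gap is one way to read the guest's emphasis on leading indicators and on mining losing tests for signal.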

Ashley HOST: Yeah. Yeah, it makes sense. Yeah, the whole peeking thing is tricky, but as long as you're not ending the experiment early, peeking isn't terrible. And I really liked your point on the hypothesis definition: putting a little extra time in as you define the experiment to understand what you're really expecting to come out of it, and what all the different data points are that you might look at. Maybe the overall thing's not a winner, but maybe it's a winner for a sub-segment, or maybe it's a winner on certain metrics but not others, and that leads you to new ideas on how to iterate to move more of the metrics at the same time.
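[Editor's sketch, not from the episode: a small simulation of the distinction drawn above between looking at interim data and stopping on it. It runs A/A tests (no true difference) and compares two policies: declaring a winner the first time any interim look is significant, versus judging only the final look. The traffic numbers, number of looks, and seed are arbitrary assumptions.]

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate(n_sims=500, n_per_arm=2000, looks=10, rate=0.05, alpha=0.05, seed=7):
    """Return (false positive rate if stopping at first significant look,
    false positive rate if only the final look counts)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_per_arm // looks
    stopped_early = 0
    final_only = 0
    for _ in range(n_sims):
        a = b = 0
        significant_at = []
        for look in range(1, looks + 1):
            # Both arms share the same true rate, so any "win" is a false positive.
            a += sum(rng.random() < rate for _ in range(step))
            b += sum(rng.random() < rate for _ in range(step))
            n = look * step
            pooled = (a + b) / (2 * n)
            se = sqrt(pooled * (1 - pooled) * 2 / n)
            z = (b - a) / n / se if se > 0 else 0.0
            significant_at.append(abs(z) > z_crit)
        stopped_early += any(significant_at)
        final_only += significant_at[-1]
    return stopped_early / n_sims, final_only / n_sims


peek_rate, final_rate = simulate()
print(f"stop at first significant look: {peek_rate:.3f}")
print(f"judge only the final look:      {final_rate:.3f}")
```

Because the final look is one of the looks, the first rate can never be lower than the second. Glancing at the data for intuition does not change the second number; acting on an early significant result is what moves you toward the first.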

Raunak Kumar GUEST: Yeah, it's great that you call that out. In the example I was sharing with you, the contact sales one, what we [00:18:00] did there was only allow business email addresses to submit. At a lot of companies you'll notice they only accept business email. But it turns out we have this whole slew of small businesses who still use their Gmail addresses and such. They are legitimately trying to contact us, but we were just blocking them.

So that was an insight: okay, it works for a select segment, which is enterprise, but for small business we need to rethink how we collect their intent. We don't want to miss out on them just because they don't have a formal email address.
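[Editor's sketch, not from the episode: one way to surface the kind of sub-segment signal described above is to break form outcomes down by email type. The free-mail domain list, the log format, and the function names are assumptions for illustration; nothing here reflects Stripe's actual form logic or data.]

```python
from collections import defaultdict

# Illustrative, not exhaustive.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_segment(email: str) -> str:
    """Classify an address as free mail or business by its domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    return "free_mail" if domain in FREE_MAIL_DOMAINS else "business"


def submit_rate_by_segment(attempts):
    """attempts: iterable of (email, submitted) pairs logged at the form."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [submitted, attempted]
    for email, submitted in attempts:
        seg = email_segment(email)
        totals[seg][0] += int(submitted)
        totals[seg][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}


# Hypothetical log: business addresses get through, free-mail addresses are rejected.
log = [
    ("a@acme.com", True),
    ("b@acme.com", True),
    ("c@gmail.com", False),
    ("d@Gmail.com", False),
]
print(submit_rate_by_segment(log))
```

A flat overall submission rate can hide exactly this pattern, where one segment converts well and another is blocked entirely.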

Ashley HOST: Yeah, I ran marketing for Algolia, which is an e-commerce type search engine. And one of the largest customers originally came through with a Gmail address and...

Raunak Kumar GUEST: Yes.

Ashley HOST: He still owned the admin account under his personal Gmail. And so we always reminded ourselves of that when we were tempted to say business emails only.

So it's funny what you learn and what sticks with you. So how do you see experimentation [00:19:00] evolving at Fin? Kind of what's next there?

Raunak Kumar GUEST: Yeah. At Fin, I would say we are at a pretty interesting spot, where our product is definitely maturing and no longer that nascent, with decent mind share in the market. So one of our biggest goals at hand is to further increase the mind share of our product, Fin.

Anyone who's interested in a customer service AI agent should think about Fin amongst the competitors. That's the space I deal with, and essentially we are doing a lot more work and running more experiments in this space. So to answer your question, how do I see it evolving?

We'll continue to do a lot more off-page, going back to what I explained earlier about on-page and off-page. Off-page meaning we'll do a lot of incrementality experiments, because we want to figure out which channels help us drive this mind share. For example, I just concluded a full-funnel geo lift study in the US, where we [00:20:00] did a matched-market test with five states as control and five states as test. We held back any and every media in the control states and then compared the incremental lift in our traffic and aided awareness.

So that was a winner for us. Similarly, we want to know which other marketing channels help us drive awareness. The next one I'm tackling is connected TV, because it's a hot thing right now, where people are getting ads served on Hulu, YouTube, et cetera, through Trade Desk, through Bombora, through LinkedIn, et cetera.

So that's one, and then I'll tackle other channels. Pretty much I'll be tackling a lot of off-page experiments and also a lot of creative testing, which I forgot to mention. That's one area where, with the advent of AI, things have changed. Earlier you would wait on an agency to give you copy, right?

Now anyone can go and spin up copy. The same thing is happening at Fin: we are getting tens of versions of copy for different things. How do we have a measurement framework to figure out quickly what the winner is? That's what we are also [00:21:00] going to do: run a batch of copy variants through the audience we have.

Do an A/B/C test kind of thing, and then figure out what's working and what's not. So those are the things I'm interested in and currently staffing, and I'm really excited to support them at Fin.
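[Editor's sketch, not from the episode: a rough illustration of how lift is often read out of a matched-market geo test like the one described above. It scales the test states' pre-period by the control states' trend to get a counterfactual, then compares the observed post-period against it. The traffic totals are invented, and this simplified ratio estimate is an assumption; the episode does not say which estimator or tool was used.]

```python
def geo_lift(test_pre, test_post, control_pre, control_post):
    """Relative lift in the test geos versus a control-trend counterfactual.

    Each argument is the outcome (for example, site traffic) summed over
    that group's states for the pre or post period.
    """
    # What the test states would have done if they had followed the control trend.
    expected_post = test_pre * (control_post / control_pre)
    return (test_post - expected_post) / expected_post


# Hypothetical totals: control states grow 5%, test states grow 26%.
lift = geo_lift(test_pre=10_000, test_post=12_600, control_pre=8_000, control_post=8_400)
print(f"estimated incremental lift: {lift:.1%}")
```

A real study would also need a confidence interval around this point estimate, since ten states give very few independent units, but the sketch shows the basic comparison between exposed and held-back markets.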

Ashley HOST: Yeah. That's a pretty impressive list. As a marketer myself, you're making me feel like, ah, I've got so much I need to go do. I need to find my own version of you to have on my team to do that. I think it's a really interesting point about the ability to test more iterations.

I still don't think AI is at the point where you can create visuals that are at the same level or caliber as if you were using an agency or something. But it does give you the opportunity to rapidly test a whole bunch of things and then maybe find a winner and then say, "Okay, great.

Now let's go turn that into an even more professional version that we wanna push out on more channels." So it gives you a lot of opportunities to get [00:22:00] more information and know where to invest.

Raunak Kumar GUEST: Yeah, for sure. Sorry, I lost track. Was there a question there?

Ashley HOST: Yeah, it was more just a comment on the... You were talking about doing more iterations and that you can use AI to create new visuals, right? They may not be the same caliber as what you would get from an agency, but allows you to test a lot more things and then maybe know where to spend your time with the agency, is what I was saying.

Raunak Kumar GUEST: Yeah, for sure. And the same thing is happening with landing pages. One of my aha moments at Fin has been that our marketing team, especially the demand gen team, is fully enabled to spin up a landing page. It used to take weeks and weeks of coordinating with PMM and the web team to spin up a landing page.

Now they have end-to-end tooling at their disposal, and they're launching intent-based landing pages. Let's say they identify a keyword: they'll spin up a landing page for it. So there's so much more data coming our way to track. How do we [00:23:00] actually keep up with that?

I think that's where AI is helping us a lot. It's speeding things up, but I can tell you that of fifty landing pages, only ten work. So not everything is working. We're also figuring out quickly what's working and what's not, and then we can shut the losers off and not index them.

We don't want to index things that are not working, as an...

Ashley HOST: Yeah. Yeah, but that's a great example of the power of AI: just the ability to go test so many more things, and maybe you find a handful of winners you never would've known were out there if you didn't have the ability to test a lot of things.

Raunak Kumar GUEST: Yeah, for sure. For sure.

Ashley HOST: Raunak, thank you so much for joining today's show.

You've shared a lot of great examples of things that I'm sure a lot of our listeners will wanna go apply to their businesses. So thank you so much.

Raunak Kumar GUEST: Thank you, Ashley, for the opportunity, and I hope the listeners found something valuable. Thank you.

Ashley HOST: Yes. Thank you.