Data Matas

Analytics Engineering: Internal Risk vs. External Rigor | Ft. Jack Doherty (Fresha)
The stakes are higher than ever for Analytics Engineers. When your data becomes a core, customer-facing product, the game changes.

Jack Doherty (Head of AE, Fresha) discusses the massive difference between internal and external analytics:

Risk vs. Rigor: Why internal projects can tolerate risk and move fast, while product-facing data demands DevOps-level rigor, testing, and governance.

Real-Time Data: The technical shift from scheduled batches to CDC (change data capture) to meet customer demands for speed and consistency.

The Missing Link: Why the Semantic Layer is the future of AE, crucial for codifying business logic and powering accurate AI/Chat interfaces.

A must-watch for any AE treating data as a product.

What is Data Matas?

A show to explore all data matters.

From small to big, every company on the market, irrespective of their industry, is a data merchant. How they choose to keep, interrogate and understand their data is now mission critical. 30 years of spaghetti-tech, data tech debt, or rapid growth challenges are the reality in most companies.

Join Aaron Phethean, veteran intrapreneur-turned-entrepreneur with hundreds of lived examples of wins and losses in the data space, as he embarks on a journey of discovering what matters most in data today by speaking to other technologists and business leaders who tackle their own data challenges every day.

Learn from their mistakes and be inspired by their stories of how they've made their data make sense and work for them.

This podcast is brought to you by Matatika - "Unlock the Insights in your Data"

Aaron Phethean (00:28)
Welcome to today's show. We are joined by Jack Doherty from Fresha, who leads analytics engineering and has built a team that's delivering lots of value in internal analytics, as well as building analytics into the product. He talks us through that experience of the two mindsets and the different ways of thinking and working. So without further ado, let's dive in.

Aaron Phethean (00:50)
Hello Jack and welcome to the show. Absolute pleasure to have you on. We were chatting about loads of things before the episode, but before we even get into it, why don't you just take a moment to introduce yourself and tell us a bit about how you got into analytics engineering.

Jack Doherty (01:04)
Yeah, sure thing. Well, thanks for having me, Aaron. I'm excited to chat today. So yeah, I currently look after analytics engineering at a company called Fresha. We are building software for the health and beauty industry, focusing mainly on back office, helping businesses do everything they need to do to operate, but also trying to build a marketplace for consumers to find new services to book. We're kind of a mid-size, scaling data team, currently about seven or eight engineers working specifically on AE.

And I think what's cool about what we do is that we really see data as a core part of our product. It's a very pen-and-paper industry. I'm sure you've got experience with this: barbers love to take cash, they love to take bookings on the phone, they love to have calendars in notebooks, which all feels a bit old school. And collecting and analyzing data is one of the big reasons why you would take the jump to our software.

Aaron Phethean (01:34)
Yeah.

Mm-mm.

Yeah, that's really

cool. As an industry, you can visualize it immediately. You can imagine someone keeping track of things, and the data just doesn't really exist when it's only on a piece of paper. So it's part of your proposition.

Jack Doherty (02:03)
Yeah,

For a lot of data people, it's just a given these days that stuff is digital, right? You have the data to analyze. But it's interesting working in this industry, because that's largely not true, and one of the first battles is just collecting the data to actually work with.

Aaron Phethean (02:15)
Yeah.

Yeah.

And then, thinking about the different challenges in your particular company and the size of the team, I guess where we landed is that you've got the internal analytics as part of your challenge. And then, like you said, being part of the product and giving analytics to your users became part of your challenge as well. And I think the cool thing, when we were chatting about coming on the show, is that actually

you were not initially taking on that scope, but you as a team were delivering more and more, and you got to this point where they said, oh, why don't you do more on the product side? Maybe you could take us through that journey: what were you delivering, and what did they notice you were doing well?

Jack Doherty (03:07)
Yeah, part of me thinks back to the days when we didn't have the product-facing stuff, because it was a lot easier in a lot of ways. Probably a good thing that we're providing more now.

Aaron Phethean (03:13)
Yeah.

Jack Doherty (03:17)
Yeah, I think it was actually one of the things that appealed to me about joining Fresha a couple of years ago: the fact that we work, as you said, in these two worlds. We have the kind of classic internal analytics stuff, but we also have a parallel stack that serves data directly to our users. And that's a core part of the product: people pay for it, people notice when it doesn't work, it's become part of people's workflows and expectations. It's two very different modes of thinking, right? And kind of different paces, which I think is nice to be

Aaron Phethean (03:31)
Yeah.

Jack Doherty (03:45)
able to flick between. The internal stuff is kind of

more risky; you can move quicker, and you expect more literacy from your users and more tolerance for mistakes. Whereas the product-facing stuff is a lot more considered, a lot slower: a lot more rigorous testing, a lot more making sure you don't break things, and also a lot more thinking about how users will perceive things, what mistakes they might make, how we can make this easier for them to understand, because we don't have a direct relationship with the user, right? It's all through the UI. So yeah, it's cool to think of

data as a product for people that you don't actually interact with directly. It's very different to building stuff where you're sitting next to someone and talking it through.

Aaron Phethean (04:22)
Yeah, that's interesting. And you mentioned that you're sometimes just reasoning about what the users are doing and what they're thinking. A lot of companies have analytics about the product and how it's being used. Is that also part of your scope, the kind of click analytics: they clicked that button, or they dwelled on that screen?

Jack Doherty (04:39)
Yeah,

a little bit. So we do have a product analytics stack. That's actually more driven by the product teams; honestly, they do their own thing, which is nice in that respect. But the kind of analytics that we offer to our users is about their business. So it's basically us acting as their analytics team, bringing together transactional data, sales, payments,

Aaron Phethean (04:53)
Yeah.

Jack Doherty (04:56)
customers, team member stuff, everything that we collect, and then offering it to them in a way that helps them understand what's going on. And as you said at the beginning, for a lot of people that's a proper wow moment: oh, I can see trends and patterns, I can see which location is making the most money, which day of the week is busiest, where I can afford to get more shifts from my staff or where we're not very busy. All of that suddenly becomes unlocked when you record things in the system that you're blind to otherwise.

Aaron Phethean (05:24)
Yeah.

Yeah. And I think probably most of the people listening have that more internal analytics role. Quite often I'm having discussions with people about how critical it is to deliver an accurate insight to an internal user, where, like you said, you can meet face to face. You can perhaps repair the damage if you got something a little bit wrong. The

idea of there being thousands of users out there who just judge you based on what you gave them is actually quite a scary prospect.

Jack Doherty (05:54)
Yeah,

it genuinely is. I think when engineers join the team, there is definitely this adjustment to pressing merge on something that's going to impact lots and lots of users. It makes you think, in a good way.

And you consider everything: you consider edge cases that you would never have thought of, or mistakes that you wouldn't necessarily worry about internally, that you now obsess over and make sure you test. I think it's a good practice. I mean, it feels like...

I always think of product engineering teams and the thought of pressing a button and pushing something to an AWS service or Netflix; the stakes behind that are huge, and obviously what we're doing is nowhere near that level. But to have a little peek behind the curtain at what that feels like, and how much rigor and how much confidence you need in order to press that button, has been quite fun.

Aaron Phethean (06:40)
Yeah, I think that's a journey. You know, I've been in software quite a long time, and we were chatting about what it's like to deliver product or software changes versus analytics changes, and the kind of differences.

I think in the software industry, when DevOps was an emerging term, the idea that your development team could also operate the service challenged that whole mindset across the industry. People arriving to it were suddenly confronted with: you mean I'm going to have to see my code go live? There was always this kind of wall between we hand it over and someone else has to worry about running it.

Particularly in a vendor situation. SaaS companies now often just develop and run, whereas in product companies in days gone by, we'd develop and release and then someone else would run it. When you think about that kind of change, even talking about clicking merge on a change, that same kind of workflow has become popular in analytics engineering.

I wonder if you could, from your perspective, think about the extra overhead that perhaps an internal team...

has some choice about whether they adopt it, versus the product team. An internal team doesn't have to go through this code-first workflow; there are no-code tools available in the analytics space. And then dbt has become really popular, and it's much more code and merging. I wonder if you could look at the two and compare: what's the overhead like of developing it like software?

Jack Doherty (08:09)
Yeah, I mean, it's definitely slower, right? Because you're being more cautious and the risks of getting it wrong are higher. Internally, for example, we wouldn't always test on the full size of the data; we'd do a quick CI run on a subset, and if it passes and we're relatively confident, we'll go for it and then check tomorrow morning that we haven't broken anything. Whereas for the external-facing stuff, our CI runs basically rebuild production in a separate environment.

Aaron Phethean (08:22)
Yeah.

Jack Doherty (08:36)
We have a lot of assurance in that. We have a lot more tests, which we run more often, and we're a lot more paranoid about the kinds of things we test: not just the classic not-nulls, but checking that we're not showing crazy values; lots of different, almost semantic testing to make sure we don't break anything. And also just the communication as well. So I guess the other thing is that the stuff we build in the warehouse is a

Aaron Phethean (08:45)
Okay.

Jack Doherty (09:02)
contract with a consuming app that then runs the front end and stuff. So that's another consideration: mostly, if you're building an internal dbt DAG and maybe a couple of dashboards, you can control the release cycle and be pretty confident about the downstream. But we always have to think, hang on a minute, everything's green for us, but are we breaking anything on the app side? Do they release at a certain time? Do we need to coordinate something? And even little things like time of day: there's a very clear usage pattern, so we will try and release when things are quiet and

Aaron Phethean (09:20)
Yeah.

Jack Doherty (09:29)
usage is slightly lower, whereas internally that's not a thing. You merge whenever you're ready.
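
Sidebar: the kind of "almost semantic" check Jack describes, beyond the classic not-null tests, might look roughly like the dbt-style singular test below. This is only a sketch; the model, columns, and thresholds are hypothetical, not Fresha's actual checks.

-- tests/assert_sale_amounts_are_plausible.sql
-- A dbt "singular" test: any rows this query returns are treated as failures.
-- Hypothetical model, column names, and thresholds, for illustration only.
select
    sale_id,
    total_amount
from {{ ref('fct_sales') }}
where total_amount < 0         -- negatives are suspect if refunds are modelled elsewhere
   or total_amount > 10000     -- beyond not-null: flag implausibly large values too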

Aaron Phethean (09:29)
Yeah.

And do you

find the two disciplines influencing one another? Like, if you have a developer who's perhaps not used to pushing to production, do you see their behavior changing when they're delivering internal analytics? In a good way, that is.

Jack Doherty (09:46)
Maybe, yeah, but I also think it's quite hard to flick between the two modes. If we're making a big change on the external stuff, we tend to dedicate ourselves to that, to get into that mindset and that zone. If on the same day you're doing some work internal, some work external, it can be a bit like: am I skipping something, or have I messed up? But yeah, that sense of rigor and quality has definitely spread;

things like linting and CI that we first built for the external stuff are now part of what we do everywhere.

Aaron Phethean (10:19)
Interesting. Yeah, and we were talking about some of the technologies involved. As a team you've got your platform team, and then the other folks who need to deliver.

There's a range of technologies that you're building and investing in. I think you're building semantic layers and you're delivering near real-time analytics in some cases. What does that technology stack look like on the product side versus the internal side?

Jack Doherty (10:45)
Yeah, that's the other thing, actually. The stuff that powers the product-facing side is basically our analytics stack on steroids, and seeing how far we can push it, and all the tricks for making it scale and operate in that world, has been fun. And as you said, we're moving more and more into real time, which is a whole new thing with a whole new set of challenges. The current stack is built on Snowflake; we use Snowflake for

Aaron Phethean (10:53)
Mm-hmm.

Jack Doherty (11:11)
the internal stuff too. But the first big challenge was how do we scale it to the number of queries that the app needs us to run? We ended up hacking together this multi-account, reader-account-based thing. Snowflake were convinced that they were going to get us to pay for their top-tier enterprise thing, which gives you the bigger warehouses and the higher concurrency. But some of our smart platform engineers realized that we could just provision a bunch of reader accounts and then

create our own load balancer. So from Snowflake's perspective, each warehouse just sees a reasonable number of queries, but we can scale that horizontally. You'd never do that internally, because you wouldn't need it, but it was cool to see it actually work. And then on the ingestion side, the other big shift was because the product requirements were quite strict on freshness. Our internal DAGs run every couple of hours, and that's fine for what we need, but

because we operate in so many different time zones and we didn't want our customers to have to wait for their daily sales numbers or whatever they're waiting for, we had quite strict SLAs of about 30-minute refresh times. So we had to look at our ingestion too, and that forced us to move to CDC-based rather than batch ingestion from our production databases. It really focused us on cost and optimization, and we came up with

and deployed quite a few efficiency techniques, like lambda structures and a lot of incremental models, to try and operate at that frequency without spending too much money.
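
Sidebar: the incremental models Jack mentions typically process only the rows that have arrived since the last run instead of rebuilding the whole table, which is what keeps a 30-minute refresh affordable. A minimal dbt-style sketch, with hypothetical table and column names:

-- models/fct_sales_incremental.sql
-- Minimal incremental model sketch; names are hypothetical.
{{ config(materialized='incremental', unique_key='sale_id') }}

select
    sale_id,
    location_id,
    total_amount,
    updated_at
from {{ ref('stg_sales') }}

{% if is_incremental() %}
-- On incremental runs, only pick up rows newer than what is already in the table.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}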

Aaron Phethean (12:31)
Yeah, and I obviously...

Yeah, all those non-functional things. The product, in a sense, is not really changing. The outcome for the user is not changing, but they're getting it an awful lot faster, and it's a lot more up to date. And I think as time goes on, one of the things I've noticed is that users' expectations grow, or at least stay in line with the best product they've seen. The iPhone and the other expectations that arrive with consumer devices suddenly show up in every B2B product.

So you've got your users judging you based on getting it right, but also on their expectations of it being fast: what do you mean? I just made that order, so why isn't the analytics there immediately?

Jack Doherty (13:13)
Well, this is

the other thing. Non-data users, and obviously most of our users aren't analysts, why would they be? They don't distinguish between analytics reports and other data sources. Our app is a blend of production DBs and widgets and reports that come from us. So actually one of the big drivers for real time isn't so much the need for it as the feeling of consistency:

that wherever you look, whether it's driven from a live production DB for payments or from a sales report, all of it matches up. Because nothing feels worse as a data user than two numbers that you want to be the same that don't match. And obviously we can communicate that, and we give last-refreshed times and caveats and stuff. But actually, the best experience from our perspective would just be that everything's basically the same freshness.

Aaron Phethean (13:49)
Yeah.

Jack Doherty (13:59)
And it does unlock some cool new use cases, but actually it's that consistency thing that's the biggest one. How do you tell a user, don't worry, check back in 20 minutes and it'll be right?

Aaron Phethean (14:10)
Yeah, yeah, exactly. Which is potentially one of the arguments for a stronger architecture function that thinks about how products work. Perhaps another thing that I see happening in analytics engineering, mostly internal analytics, is a kind of rejuvenation of the part of the role that's just about structures, just about models, just thinking about how things work together.

And that, I think, goes back to one of the things we discussed when we were thinking about the episode: there are a lot of habits and engineering-type concerns arriving in data that software has done for a long time. There's a kind of influence there; they're on that journey. I wonder, did you notice at some point,

with software engineering and data engineering and this arrival of engineering, that the habits were changing? What did you notice in your teams about how they developed those analytics models? One of the things we were talking about is that it seems to be behind software, or following the same kind of trend. Maybe it's leading in some ways, but what are your thoughts on that?

Jack Doherty (15:18)
Yeah, I guess as a starting point, I'm fairly new to the data world in the scheme of things. I only properly got into analytics in the last five years or so, and previously I was in more strategic or product roles, chief of staff for a bit: data-adjacent, but not actually understanding what was going on. So maybe I've got a slightly unusual perspective, or a different set of expectations, on how AE works and how things should fit together. Because it definitely feels like...

On the one hand, we've made so much progress, right? Especially in the last five to ten years: the move to cloud, all these new capabilities, the democratization of being able to do what we do, and new tools, especially things like dbt, that introduce what you can legitimately call engineering to the analytics space. I mean, the whole role of analytics engineer wasn't a thing 10 years ago. You had people doing it; they just didn't have the badge of being engineers.

So you've got all this capability and opportunity on the one hand, but also some fundamental stuff that just doesn't feel mature yet. We're missing some foundational capabilities that would really unlock the promise of all of this, because it doesn't always materialize into good products or good experiences.

Aaron Phethean (16:30)
Yeah, repeatable... yeah, that stuff goes with it, yeah.

Jack Doherty (16:35)
And I think you're right. Fundamentally, engineering is about bringing structure and helping people navigate the full semantics of a business, which for most businesses is ridiculously complicated. It's more than one person could ever hope to know, right? I think one of the fun bits, and also one of the contradictions, of being an analytics engineer is that you're sort of expected to know everything about what's going on across an organization. And obviously, as teams scale, you

compartmentalize to keep people sane. But there aren't many roles where you get to legitimately sit across everything and have the right to poke your nose into everything. You might be working on financial ledgers and accounting audits, and then the next week you're looking at the launch of a product feature and trying to work out if it's changed behaviors. I mean, there aren't many roles that have that breadth, I guess.

Aaron Phethean (17:05)
Yeah.

Yeah.

Yeah, definitely.

You're definitely not the first person I've heard talk about the overarching nature of a data team and seeing across everything. When teams or people realize that remit and that expectation, it's quite striking. It's really interesting; no one else has that kind of view.

Jack Doherty (17:47)
And

the parallel I wanted to draw with software is that that's not sustainable, presumably, right? Because you can't be an expert in everything. And I always get the sense that we're borrowing other people's work and playing it back in a way that's probably missing a bunch of nuance, because you're copying someone; you're not the person that necessarily should be owning that thing. And I think there's definitely a shift in analytics engineering towards thinking more like a platform team.

Clearly there's a need for this aggregation layer, this team that brings everything together and provides the structure, like you said, to make sense of things, to organize things, to make it accessible. And there are particular top-level exec questions; execs don't care about microservices or how the product is structured. They just want to know: how many users do we have? How much revenue are we making? Those questions naturally span several teams and loads of different data spaces, so you need a team for that. I get that.

But when it comes to the nitty-gritty of understanding the data that a particular service produces, the people that built it are obviously best placed to do that. And when we come along and ask them questions about what they built, you get incomplete answers or you miss something. And then what we're offering to the rest of the business is a poor copy of what we should be offering.

Aaron Phethean (19:02)
Mm-mm-mm. With some of those ideas about what you're across, my mind is wandering into how it plays out. You can imagine that the people who are running the company...

They want to know how many users because that should be a very well defined thing. The legal department defines that really well, because that's what the contracts say. And then obviously sales and commissions potentially hang off that, so they have a view and a vested interest in it being right,

and their decisions flow into finance. It's easy to say "across the whole company", but when you think about it, what a user is will be really clearly defined in law somewhere in the company. So a sort of ad hoc "oh, we mean that kind of user" is quite jarring for a senior person. What it's making me think is that you're sort of saying, well, that's really hard.

Jack Doherty (20:02)
Yeah, I was at a talk last week and someone was talking about this. They're a much bigger team; I think hundreds of analysts and a similar number of AEs.

Something that made the audience gasp: he said, yeah, I've given up on trying to maintain these kinds of definitions centrally. We just deal in facts, and we're comfortable with people interpreting them differently. And that felt so against a lot of the drives towards a single source of truth, where everything has to be defined one way. But it actually sounded much more pragmatic, to be honest, because there are legitimate reasons why people

Aaron Phethean (20:35)
I can see why. Yeah, I can

see why.

Jack Doherty (20:37)
want

things in different ways, right? But I think the thing that's frustrating, and maybe actually the real problem, is how do we make those definitions non-opaque and provide the context as to why something is defined in a particular way, exactly, "this is what I mean by that", in a codified way that people can interpret and challenge and offer counters to. Rather than it being a hidden black box where you get two different numbers that you expect to be the same.

Aaron Phethean (20:45)
Hmm.

This is what I mean by that.

Yeah.

Jack Doherty (21:01)
You're like, OK, yeah, cool. Customer service define a user from the first time they logged in, because that's when they tend to get tickets. The legal team define it from when they sign the contract, because that's when they legally have a relationship with them. And maybe our own product team define it from the first time they used a feature, because they don't care about anything before that. All of that is legitimate. Why can't we support all those definitions at once?
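
Sidebar: one way to "support all those definitions at once" is to codify each team's definition as its own named, documented field rather than fighting over a single flag, whether that lives in a model or in a semantic layer on top. A rough dbt-style sketch, with entirely hypothetical table and column names:

-- models/dim_users.sql
-- Each team's definition of when a user becomes "active" gets an explicit column.
-- Hypothetical sources and columns, for illustration only.
select
    u.user_id,
    min(l.logged_in_at)      as first_login_at,        -- customer service: tickets start at first login
    min(c.signed_at)         as contract_signed_at,    -- legal: the relationship starts at signature
    min(e.used_feature_at)   as first_feature_use_at   -- product: activity starts at first feature use
from {{ ref('stg_users') }} u
left join {{ ref('stg_logins') }}         l on l.user_id = u.user_id
left join {{ ref('stg_contracts') }}      c on c.user_id = u.user_id
left join {{ ref('stg_feature_events') }} e on e.user_id = u.user_id
group by u.user_id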

Aaron Phethean (21:11)
Mm-hmm.

Yeah.

So

what are you seeing as good ways of explaining that? Because this feels like another engineering concept from software: you write code, you document code, you prove it with tests, but then you write interfaces and tell people what the functionality is. In analytics, how are you communicating "what I mean by new user is this", and then the measures? How do you do that?

Jack Doherty (21:46)
Yeah, well, I could just be admitting that I'm behind the curve here, and maybe other people have solved this problem, but I'm not sure we have in a standard way. I'm not sure there's an agreed set of technologies to encapsulate that. We're heading in that direction, and there are some interesting ideas out there, but for me the missing link in the analytics stack is the semantic layer. And I love looking at different semantic layers. I think particularly with

AI and agentic stuff, we have to move in that direction, because I think a lot of data teams see their product as a bunch of database objects, right? But that's not a definition; a lot of stuff only exists at the point of aggregation, or with a particular filter, things that you can't codify in a table. And when you try to codify that stuff and do roll-ups, you lose a lot of the flexibility, because you basically have to pre-define what people are going to ask.

And in the last year especially, as we've played around more with different semantic concepts, it's been a proper "this is definitely how this should work" moment. You finally feel like you're encoding things that people actually relate to and will ask about. People don't ask about fact_sales. They ask about how much tax they paid last month. They don't know that "an active partner" comes from this monthly ledger.

Aaron Phethean (22:56)
Yeah.

Yeah, okay.

Jack Doherty (23:02)
They just want "tell me about my customers". And the semantic layer is the first place where I've felt like you're putting that into a structure that can actually be an asset, and can be shared and queried and used in lots of different places.

Aaron Phethean (23:15)
Yeah, it does feel like, as an industry, we're perhaps rediscovering those things. I definitely remember the first time I was challenged on a chart, and the user challenging it was like, well, what are the filters? What are the aggregations? And it became pretty apparent that one of the best ways is to just show them: let them see the table, let them see what you're making the count of.

And like you said, when you're aggregating that up for performance, you're naturally creating a separation from what's going on. So the semantic layer does solve that to a degree, because you get great performance, you still get to drill down to the detail behind the decision, plus you get this real-world concept of: what are these things I'm looking at, and what do they actually relate back to? Yeah, we keep reinventing those. There's no well-defined "this is how semantic layers should work" yet.

There's a whole lot of engineering going on for everyone still.

Jack Doherty (24:15)
Yeah, and I think it's about interfaces as well. I've seen a lot of teams try to say: in order to control all of this, you can only access our data in this ecosystem, and we're going to validate everything. Try taking the spreadsheet off a finance team, or try taking that HubSpot extract away from the sales team. That's not how people work in a realistic organization. You just can't enforce that stuff.

Aaron Phethean (24:24)
Mm-hmm.

Jack Doherty (24:39)
The semantic layer, for me, seems like the place where you ensure that consistency while letting people talk to your data in whatever way they want. And particularly with chat-style interfaces and talking to your data through natural language, you have to. Otherwise you just end up with some crazy SQL and lots of hallucinations. That semantic context

Aaron Phethean (24:49)
Yeah.

Jack Doherty (25:04)
is the thing that gets the most out of these agents.

Aaron Phethean (25:07)
increases the accuracy, so when someone asks something, it produces something useful. And you mentioned that chat-style interface, and you mentioned AI a little bit earlier. I wonder, then: if the semantic layer became richer and more fully functional, would analytics engineering even be a role anymore?

Jack Doherty (25:12)
Yep.

I mean, I hope so. But I think it will evolve; it's only very new, so naturally it's going to evolve. It would look very different, but it would still be equally valuable and interesting, I think. It would be more about structures, as you say. Because you still have to create the underlying datasets that define all this stuff in a rigorous way, right? So that doesn't go away.

But then on top of that, you're adding the challenge of what questions people are asking of it, and how we organize the semantics of this data in a way that works for people, so they don't have to know which tables exist but can just ask questions using the common terminology of the company and get sensible answers. I think it becomes a lot more about people and relationships and understanding different users and their needs, and maybe less about, yeah,

wrangling a million charts into what you think they want. Because in some ways you're shifting that last mile downstream, right? People actually get to make the requests, and your job becomes facilitating those requests. Not directly, but indirectly.

Aaron Phethean (26:17)
Yeah, yeah, yeah.

Yeah, I do like that sort of concept and that way of thinking about it.

If I think about how that's played out over time, it used to be that the sensing and collecting and the picking out of the metrics you were interested in was very far one way: basically, unless you collected it at the time, it was gone. And then over time we've got more and more used to collecting everything, sensing everything, and then building models and deciding what we want to know.

And it feels like, as a trend, we're more and more in that fuzzy style: collect absolutely everything and then try to really understand what's happened. The trend seems to be going even further that way, pushing it downstream to the user, in essence, who starts to define things when they think of them, and what does that mean in the data? And yeah,

perhaps the engineers get more facilitation from AI in defining those things. And if that requires less engineering skill, then users could start doing the building and modeling themselves. That might be a long time away, but that's how the trend feels, yeah.

Jack Doherty (27:34)
I think you've.

Yeah, I think you touched on something, which is that we've got too much data. At the start we didn't have very much, and now we've got so much flying around that I don't think we quite know what to do with it. And when you define the semantics of something, it forces you to think about what matters, right? Because naturally you define the things that you think matter, not everything that you can. So hopefully it reduces the surface area, because

it's very easy to create big, wide tables and then aggregate them in infinite ways. Whereas if you say, actually, here are the 10 metrics that we really care about from this entity, people may be a bit more focused.

Aaron Phethean (28:16)
Yeah.

Do you feel like in the product space, coming back to that, they're a bit more comfortable with just letting things go, like not collecting absolutely everything?

Jack Doherty (28:25)
wasn't, and you.

Aaron Phethean (28:26)
Yeah, like,

you know, not the obsession, and I find it almost is an obsession: we're going to have all the data, we're not going to let any of it go, we're going to have absolutely everything. Whereas I wondered whether in the product world they're like, no, no, it's fine, we only wanted to build that screen for some users and that's all we will collect, because it performs better and it's easier to manage. It feels like they might be more comfortable.

Jack Doherty (28:47)
Yeah, I think working with product people, everything's intentional, right? You'd never just add a feature that you didn't think anyone was going to use, or that somehow clashes with something else. Whereas I think data people do that all the time: maybe we need a slightly different table, or we need to add this, or if only we had this. Yeah, it comes back to thinking about

are you helping or are you hindering? Are you adding to the confusion? I think Ollie has some really good concepts around this: our job should be to bring clarity, not additional confusion. And just adding to the amount of possible things that you can query and learn from is not adding clarity. So I don't really like that.

Aaron Phethean (29:22)
Yeah, exactly. For everyone

listening, Jack's talking about an episode from earlier in the season where Ollie came on, the CEO of Count, and talked about how their tool looks at data and the value of BI tools. And you're right, there were tons of things that resonated with me, like, how on earth did...

Jack Doherty (29:41)
And that actually worries me about chat interfaces and opening up even more ways to ask questions. We all learned this with Google 10, 15 years ago: yeah, you can ask whatever question you want, but you have to ask the good questions to get the good answers. There's a lot of skill in knowing what to ask. And hopefully we shift more towards thinking about what we should be asking, rather than always just trying to answer stuff.

Aaron Phethean (30:01)
Yeah.

So if that plays on your mind as one of the risks of what chat interfaces to analytics look like, on the flip side, what in your mind is the best use of AI in analytics engineering? What do you see users really gaining from it?

Jack Doherty (30:24)
Yeah, well, I don't think we know yet, right? And it's such an exciting time, because I bet whatever I say now is going to be pretty redundant in a couple of years. I think there are a couple of things. Just operationally, in terms of developing and operating a data platform, I really hope some of the boring stuff gets increasingly automated. I mean, I was thinking,

Aaron Phethean (30:30)
Yeah, probably in 30 minutes; it moves that fast.

Jack Doherty (30:45)
And again, it might just be outing us as the not-very-advanced AE team, forgive us, but how we respond to test failures seems really old school. We get an alert that some nodes have failed, and then we have to go off and query it. Then we examine the DAG and figure out where it might have come from.

And then maybe even look at source repos and see who's committed recently and what might have changed. All of that reasoning feels like something an agent could do for us. Rather than getting an alert of a failure, we'd get: hey, I think this service has added a new type of credit note that's broken this, and by the way, here's a fix. That all seems very achievable. I just haven't seen anyone stitch it all together yet, with all the different metadata layers you would need. And then that's...

Aaron Phethean (31:26)
I totally agree. And I think there

is investment in that from product teams, but also from teams inside companies doing it themselves. I guess the best thing I can compare it to is a product, a front-end tool that we use, that looks at what happened as an error, looks at the different inputs, including commits and changes and the actual stack traces,

and then makes a pull request and says: okay, I saw this error happen, this might be your fix, and here's the reasoning why. That doesn't feel that far away from us in our space. And what I'm most excited about is that because our stack, generally in our community, is dbt and Snowflake and

tools that allow us to work in code, that means pull requests and changes, actually committing or proposing a change, are more available. I've always been quite negative about UI-only tools, though I could never really get a sense of why. It's because I lose the change history straight away; I can't tell what happened, or why, or what the

history of it is. Whereas this brings it all together, and it's available as context now. That means some agent can propose a change. I've been really bullish on that as a space to spend energy.

Jack Doherty (32:49)
Yeah, and it's just not the kind of work that we want to be doing, right? So we'd be happy to automate that. And yeah, of course you're still reviewing, you're checking stuff, but the legwork of getting to the answer happens somewhere else. I think Tristan from dbt was talking about two types of agents that they're thinking about: the observability agent and the development agent.

Aaron Phethean (32:54)
You still have the control, right? There's a proposed change; it's not autonomous chaos.

Jack Doherty (33:11)
And I can't wait for the observability agent. Think about how much metadata we all have to pass around data platforms: costs and usage and test failures and runtimes. All that stuff is just waiting for someone to automate it and take it off our hands, I think.

Aaron Phethean (33:24)
Yeah. Yeah. What

was the other agent you mentioned? I get the metadata one and what's going on around it, and then the engineering one is more like a code assistant. Yeah.

Jack Doherty (33:34)
Yeah, the development agent, as they call it. So yeah, they're

kind of the more classic software engineering stuff: code gen, coding assistants. But I think we in data are lagging behind, right? Because the challenge of data platforms is always that the system is not just defined by the application code. With a lot of software, if you understand the repo, the project as a whole,

you understand how it functions, you can unit test it, and you can be sure of its behavior. Whereas analytics code, more often than not, is built on assumptions about another system, whether that's the warehouse or a data source or something. So you can't just...

Aaron Phethean (34:09)
We kind of know that to be true

because we can't just develop on test environments. We have to see the real production data, because it's gappy and there are things missing. That definitely resonates with me.

Jack Doherty (34:23)
And not to sound like an absolutely broken record, but I feel like that's another semantic problem in a way. You can obviously define classic assumptions about a data set: primary keys and foreign key relationships and not-nulls and stuff. But it's very hard to codify "I expect this to represent this". I am assuming that this subscription period, for example, represents a period of time where someone had access to the product.

Aaron Phethean (34:28)
Hmm.

Yeah.

Jack Doherty (34:50)
Where do you put that? How do you test it? Often you only know you're wrong when someone says, well, that doesn't look great.
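
Sidebar: one partial answer is to write the assumption down as a test, so it at least fails loudly when the upstream system changes. If "a subscription period represents a period of access", then each period should have a start, end after it starts, and not overlap another period for the same user. A sketch with hypothetical names:

-- tests/assert_subscription_periods_are_coherent.sql
-- Encodes the assumption "a subscription period represents a period of access".
-- Hypothetical model and columns; rows returned are treated as test failures.
with periods as (
    select
        user_id,
        period_start,
        period_end,
        lead(period_start) over (
            partition by user_id order by period_start
        ) as next_period_start
    from {{ ref('stg_subscription_periods') }}
)
select *
from periods
where period_start is null
   or period_end < period_start       -- a period must end after it starts
   or next_period_start < period_end  -- periods for one user must not overlap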

Aaron Phethean (34:52)
Yeah, yeah, yeah, yeah. How do you communicate that?

Yeah, I wonder if that phrase strikes fear into everyone. Because as soon as someone says that, you think, oh god. And they're generally right; they get that real sense. Which probably means it was always able to be spotted, because they could immediately spot it. That's always uncomfortable.

Jack Doherty (35:02)
Yes.

I think that's

another contradiction in AI and data generally. When you work in data, you realize that you have to be skeptical of every data set, right? Because it's built on all these fragile assumptions, and the tiniest little condition can completely change how you think about something. One filter or one... I always come back to the example where we saw a small handful of our customers creating ridiculously high-value services, and it was making some of our metrics pretty mental.

Aaron Phethean (35:29)
Yeah.

Jack Doherty (35:46)
And it turns out they were just using us as a calendar and having some fun selling million-pound haircuts. So nothing had broken; the numbers were accurate and reflecting their behavior, but we could never have guessed they would do that. Which doesn't make any sense. But I guess people always think data, analytics, it's fact, it's rigorous, it can never be wrong. Whereas actually,

Aaron Phethean (35:51)
Classic.

Yeah.

didn't make any sense anymore. Yeah. You know what?

Jack Doherty (36:13)
data practitioners obviously know that it's always built on assumptions and pretext and context. So I think that's a funny contradiction: the best teams are open about that, right? You have to convey this sense of trustworthiness, but also, at the same time, be skeptical of everything.

Aaron Phethean (36:18)
Yeah.

Do you know what, that is just classic and probably timeless advice for data teams. Yeah, I feel like we could actually carry on for a lot longer. I've really enjoyed looking at the differences in how data teams and product

teams operate, the kind of different approaches. We might have to wrap up, though. I wonder if, in wrapping up, you could give some advice or some practical takeaways for a team that's in analytics engineering and has been asked to do some product engineering and deliver analytics there. Because you've lived that journey, that might be a really good place to leave some advice.

Jack Doherty (36:47)
Thank you.

Yeah, I would recommend it, because I think it teaches you so much about what it actually takes to build true software products, and you get a glimpse of that experience around pressing that button. As I said earlier, it's definitely invaluable. But I would take it slow: start with some very simple use cases, build confidence, build trust. Scale slowly; make sure that you're not

promising things that you can't deliver. And also, this is true of any project, but particularly product stuff: you'll build the initial model in a day and then you'll spend two weeks working on the weird edge cases, or the one record from last year that doesn't make sense, or the two customers that have found a way to configure something weird. Which I think is the classic software adage. But yeah, go for it, basically, because I think what we've seen is that

once you prove the value, software product teams realize they're very constrained by the data they have access to: their databases and the things their particular application produces. And once you open the doors to say, actually, if you wanted to, you could get a feed from these guys, or we could join your stuff with their stuff and build something new, you get a lot of creativity and new ideas coming out. And then more work to do, which is maybe a thing.

Aaron Phethean (38:19)
Yeah.

Yeah, I really,

really like that advice. If you imagine the product team, the way you've painted them, the rigor is really important. But if you had to drive everything through that same level of rigor, you'd never get the opportunity to add context and make some wider judgments. That's really interesting.

Jack, absolute pleasure. I really, really value the advice you've given people. Hopefully there are some amazing takeaways there for teams that are just getting into product or have this particular problem. But even just looking at the teams and the way they operate, I think there's tons of value in what you've shared with us today. So thank you. Thank you for coming. Absolute pleasure.

Jack Doherty (38:59)
Appreciate it. Thanks for having me.