Data Matas

"AI won't replace data engineers. But engineers using AI will."
In Season 4, Episode 2, we sit down with Julian [add last name + role] to unpack what's actually changing in data engineering — and what's about to disappear.
We get into:
→ Why dashboards as we know them are dying (and what replaces them)
→ Does AI really need a semantic layer? Julian's answer might surprise you
→ The low-code trap quietly racking up tech debt across data teams
→ Where AI is genuinely useful today: tech debt, testing, and data governance
→ The one skill that still matters most when AI can write the code
→ A spicy closing question for the next guest about cloud cost
⏱ Chapters
00:00 - Intro
01:30 - Julian's path into data
04:00 - The career pivot that changed everything
07:30 - How his team is adopting AI (and who resists)
10:00 - Does AI need a semantic layer?
13:00 - Local models are closer than you think
15:30 - Where AI is actually working: tech debt, tests, governance
19:00 - What the data industry is getting completely wrong
23:00 - The belief Julian held 3 years ago that's now wrong
26:00 - A question for the next guest

What is Data Matas?

A show to explore all data matters.

From small to big, every company on the market, irrespective of their industry, is a data merchant. How they choose to keep, interrogate and understand their data is now mission critical. 30 years of spaghetti-tech, data tech debt, or rapid growth challenges are the reality in most companies.

Join Aaron Phethean, veteran intrapreneur-cum-entrepreneur with hundreds of lived examples of wins and losses in the data space, as he embarks on a journey of discovering what matters most in data nowadays by speaking to other technologists and business leaders who tackle their own data challenges every day.

Learn from their mistakes and be inspired by their stories of how they've made their data make sense and work for them.

This podcast is brought to you by Meltano - "Unlock the Insights in your Data"

Aaron Phethean (00:00)
Hello Julian, and welcome to the show. It's so awesome to have you in person at our first event where we're actually recording. Lovely studio. So pleased to have you here. Thanks for joining us.

It's amazing to be here. Thanks so much, Aaron. Yeah, this place looks incredible.

Yeah, it's really different in person, isn't it? We were chatting earlier about the kind of stuff that happens, you know, coming into the office, what it's like actually working where the expectation is to come in now. And, you know, I think that's probably where I'd like to begin. If you could tell us a little bit about you and the company you work for and just frame it for our audience. What's your typical day like?

Sure. Well, my background is in engineering. I'm a software engineer coming from Argentina. All my life I have worked in the data industry. I started with databases, a bit of reporting, BI, analytics, and in more recent years mostly cloud computing and data engineering. And now, well, I guess we need to shift our careers again with all this AI boom and stuff.

Yeah, for sure. It's a massive time of change. It probably always has been, but this feels like a big, big leap. Thinking back, all that time in your career, was there a moment that stood out as, data, this is for me, I absolutely love this?

Yes, yes, for sure. I think it was in university. We got all sorts of subjects. And I was never much into actually creating websites or software engineering per se. But as soon as we started to dig into databases and analytics, I knew that was it.

So more the data than the... A lot of people say the connection with the organization, with the user. Was it the algorithms then?

To me it was structuring data into things that I could analyze and see. It was mostly driven by the technology, I think, as well at the time. Big data was becoming a buzzword, and you had to go really complex with, well, Hadoop or Spark at the time, creating your MapReduce jobs and so on, just to do a small aggregation, like a row count over a different set of dimensions. But the challenge was billions and billions of rows, for which we didn't have the capabilities back then. There was also a professor at uni who was passionate in the way he was teaching the subject. I think the combination of those two things really helped me drive my career towards something I really, really liked.

And so that experience working on huge data sets, not everyone gets to do that. Is that now your day job? You get that kind of scale of data?

Yes, it depends, but I think now it's not much of a bottleneck anymore, because most of these managed services can process billions of rows with little to no effort, right? We're used to a few seconds or milliseconds of response time to analyze billions of rows, and that's it. You don't have to configure your own infrastructure, you don't have to configure anything really, perhaps a few clicks and you have a proof of concept done, and you can focus perhaps a bit more on the business side rather than the tech side. So yeah, I think progressively engineers, tech leads and so on will need to be closer to the product, to the business, because technology won't be a challenge anymore, at least for the kind of problems that we're solving right now with normal enterprises.

Yeah, I agree. I mean, it's kind of incredible. Like you said, the challenge used to be all this infrastructure to do something relatively simple, and now with AI and great infrastructure you're really, you know, struggling to put it all to use, because there's so much power there, both speed of delivery but also speed of, you know, executing. Fascinating time. And I think you're right, probably the shift has to be more towards business value, which is going to be a good thing. I'm going to pause there to try your next question.

How are we doing? I think it's going well. I think that's cool. I think that's exactly what I have in mind. Like, it's conversational, all that stuff, pretty natural.

Okay.

Sorry. The microphone is getting weird. We're going on question one now. On to question two. So you've actually covered daily career. Oh. I might do, I might then the project failure or something. Sorry, can we do the question one again, because I have a... the only story that I have is there, like a personal story. Okay, so what I'll do is I'll sort of intro it into something that happened that will then lead into your story.

Alright, so if you cast your mind back to your career and the things that really drew you into data, you said you were kind of drawn to the technology. I wonder if there's a time that stood out as something that went really well, or something that you regret, or an issue. Is there something that kind of stands out to you as an experience you'd love to share with everyone? Good question. How people tell on career change.

Yes, yes I do. I think earlier on in my career I realized that my manager, or any manager, is not responsible for being your career coach. Neither is the company you work for responsible for your career growth. So I learned that I had to look elsewhere in order to uplift my career. And this particular moment was around perhaps 2016. I was working in a company at the time and cloud computing was coming out. AWS was perhaps the key player. Google Cloud Platform was starting to eat the market a little bit. And I knew it was coming not to go anywhere, but to be the new industry standard. So I knew we had to pivot our careers towards the cloud.

But there was an issue. The company I was working for didn't trust the cloud with its data. So I knew I had to look elsewhere. So through friends, a friend of a friend, they had a startup and they had a particular need. They needed to analyze some data by creating a real-time pipeline in Google Cloud. I was like, bingo, I'm going to help you. Like, cool, do you have experience? No. But I'll make it happen.

I trusted my skills, my background in engineering and so on. I said, give me one month, I'll deliver it. And that's how I did my first, perhaps, contract, a side gig, but a professional experience on the market. And I ended up creating this Dataflow and Pub/Sub real-time pipeline. And yeah, I was really proud of it. I think the takeaway is, I used that particular experience to go out there in the market and say, look, I have professional experience in Google Cloud. I can apply to the job role that you're advertising, which is Cloud Engineer.

So yeah, to me that was my pivotal moment. I really dug into the technology and gained some experience first hand.

It sounds like you then moved on from that role and you went and sought out this kind of challenge. There's always a hesitation, I find, like a company thinking, oh, they're doing something in their spare time. I've always found that to be massively productive, like you investigated the whole thing. There clearly was an industry trend they would benefit from. I wonder, going back, what advice would you give that company? Instead of the fear of the cloud, how could they have done something different, or encouraged you?

Interesting. It depends on the industry. We can't give advice to every company the same way, but there are certain industries that want to let the waters settle to see who the winners in the market are, and then go with them once they're secure and trustworthy.

This was an insurance company, perhaps closer to the banking sector.

So certainly they're risk averse.

Exactly, yeah. They had their own servers, their own mainframes, and they owned the data physically, so yes, there was a big push not to do it.

Yeah, it's an interesting challenge for that kind of organization. I spent a long time in banking and the industry around fintech, and there's a lot of narrative around, if you don't keep up, don't innovate. And at a level, they all appreciate that, but then the reality is still there. They're a risk-driven company. There are real threats to their business if they get something with their data wrong.

And let me tell you a story, actually, from what you mentioned. I call it the entrepreneurship spirit, you know, like trying to do something outside of working hours, still related to the field or perhaps within the company, but something that your manager didn't ask you to do. I think it's really important. I personally love it, and when I interview people I try to find out: will they go the extra mile, are they curious, can you see this entrepreneurial spirit that will bring you to the next level in terms of waves of new technologies, practices and so on? I think that makes the big difference, because if you are an individual contributor and you are waiting for a manager to tell you what to do all the time, it's definitely not the path to success.

I totally agree. I also look out for that kind of, you know, are they a self-starter? Are they innovative? Are they looking for new things? And it strikes me that actually with AI, that's the skill gap that everyone talks about. You know, actually knowing what to do, then instructing how to do it. Actually doing it is now so much more productive than it's ever been, but you don't necessarily have a person that thinks that way to figure out what to do. I wonder, how do you see AI adoption in your team and how are you experiencing it?

I told my team this is a non-negotiable thing. We need to learn it, we need to adapt and we need to embrace it. Take the time you need but it's not going to be a question of whether we can use it or not. It's here to stay and it's going to become the industry standard.

I don't think AI agents will end up replacing engineers, but perhaps the engineer might be replaced by another engineer who uses AI, right?

Yeah, I'd definitely say that as well. And like I said, I have this conversation quite a lot, and it's really hard to sort of pinpoint. You know, I see some engineers reluctant to take on AI as an assistant to their work.

That's a puzzle to me. Why? It's clearly great at producing certain kinds of output as an assistant. So I was like, why the reluctance to use it? Do you see that in your team? Do you have a spectrum of people who are really ready to adopt it and others who are not quite there yet?

There's a bit of everything, yes. Perhaps it comes back to the software engineering ego. You know, where it's the code they produce and they're very proud of it and they want to babysit it. They want to keep maintaining it. They want other people to perhaps appreciate it. Yeah, appreciate it, or change it in any way. And I think with AI they feel that they are being invaded, by the way that the code is produced not by themselves, and perhaps by not being able to delegate a little bit more.

Yeah, that's actually a fascinating point of view. It feels like it's more like an identity threat. Their value was this amazing, beautiful code, and suddenly a machine can produce it, so, you know, where has their value gone, in their head. That's interesting. That's actually cool. I think that's... I've got to disable this screen somehow. That's my next challenge.

Nice, thanks for going through the first question again. That was really cool actually. That is exactly what you want.

I mean, it's hard to try and style it out. Okay, I'll take time. We've kind of done AI actually changing the work quite naturally. Okay.

I'm going to have cards, but I'm just going to ask it.

Okay, Julian, this is the part of the show that gets a bit nervy for you. We're going to ask you a question that you've never heard before, you know, from our experience and things that are going on. And this comes from a conversation. I was at an event, and one of the speakers on the stage was from a smaller company. Part of their presentation was that AI doesn't need a semantic layer. That data doesn't need a semantic layer. And this had quite a reaction, which is why I'm actually asking you. If you think about AI as an assistant to producing analytics, do you think a semantic layer is necessary to achieve that outcome?

Wow, good, good, good question. Does AI need a semantic layer? I think it depends when. Today I think it does. In two years, maybe not.

And why am I saying this? I recently read this article from dbt where they were basically evaluating the accuracy of the responses of an agent doing analytics engineering with and without the semantic layer. The accuracy was really, really high, high 90s percent, with the semantic layer. But without the semantic layer, it was around the 70s, 70 percent.

To me, the way I interpret it is, let's throw more compute at this problem. So if you can get 70% right with no semantic layer, with not so much context, perhaps it's just a matter of time until the amount of tokens that it uses goes down, or the amount of thinking that it does goes higher, or in two, three years' time, when the chips go twice as fast, the amount of compute goes significantly higher. I think it's a matter of time.

Like we did before, throwing more infrastructure at problems we couldn't really solve with code. That solved it at the time. I think similarly now, it's important to have that semantic layer, but perhaps in a few years, maybe not super necessary. Although I think the politically correct answer is yes, let's do the semantic layer.

I think that was quite shocking to the audience. Maybe it was designed to shock, in that it was actually more of an attack on the semantic layer vendors. Your answer and your position is quite insightful. So anyone who's been in technology and software, you know, for any length of time has seen the progression of CPU and compute, seen the progression of networks and cloud storage, and, you know, that capacity isn't really a problem tomorrow, next year, in a few years' time. Everyone is quite worried today. There are acquisitions and there are company valuations that are skyrocketing, because this is a very finite resource that everyone's trying to make use of.

And I think that the way I was thinking as you were talking that through, there's like a context that's built up in a conversation. But there's almost like an infrastructure context that could be built up through how the questions are being asked. And maybe the efficiency comes from a kind of almost like a self-building of a semantic layer. Because that's all we're really doing with models anyway.

I'm also seeing perhaps a similar shift when it comes to running local models. We all get excited with the latest model coming out every month, now Anthropic, now GPT. But there's a race as well for running local models, and there's a big uptake from different companies that might have different requirements from a security perspective and so on. They might want to run things locally that are open source. So I've been playing around, running Chinese models that are open source on a local laptop.

They are good, but they are not as good as what we are used to using the APIs. But what I'm saying is that perhaps with the local capacities that we have in our laptops, we are perhaps a couple of years behind, at what perhaps GPT-2 was a couple of years ago. The token generation, word after word, might take a little bit longer when you're running locally. But if you extrapolate that, in a couple of years' time, what we see and are amazed by in the launches at the moment, we will have that capability locally.

Yeah, that's interesting. You're already seeing the progression. That, I think, in the past was quite hard for a person to see over the length of time that it was happening. With AI happening so fast, actually, anyone watching can see it. There was this very popular list of LLMs published at some point, and the author of the list actually had to stop when it got to more than 16,000 different models. There are just so many models, and you see these specialized ones, choosing the type of model for different tasks. That's a crazy space. It's not quite software, it's not quite data. The other thing that kind of gets me excited is the edge data challenge. It's like you've got a model running locally, but actually having the data running locally too, perhaps as a way to get rid of the token problem, you know, the networking problem in that. So yeah, the future looks quite, quite different, I think. What does today look like for you? If that's the future, practically, how are engineers using it today? And I'm wondering what advice you'd give them to do it differently, either in the current thinking or in what might be happening next.

What's working really well for us at the moment, as I see it, is a few things. We are perhaps starting to become more productive in terms of the amount of code being pushed.

Now, that doesn't necessarily mean that the code we push is as good as it was when we were doing it manually. But there are a few areas I've identified where AI is super useful for us. The first one is tech debt. Addressing tech debt has always been a constant hurdle with the business. You know, like, oh, why do we need to spend this many weeks on this? We won't see any difference. But we are the engineers that actually end up getting a call when something breaks. So we know it's not stable.

We know we need to work on it. So yes, being able to address the tech debt in ways we weren't able to before is helping us a lot. Another thing is writing tests. I think in software development and engineering we all see the benefits, and we all claim that test-driven development is perhaps the way to do our work. But in reality, we know that that's not the case.

It's a process that takes a lot of effort, perhaps twice or three times the amount of effort, and way more code, perhaps to fake or to mock the behavior that you desire. And we ended up not doing it in the industry. But with agents, I think now is a good time. The agents are really good at going through, crawling every single use case and, actually, every case, right? So that's what they're really good at. At least that's what I'm finding. So using the agents to review your code or evaluate all the possible scenarios to make it stronger, more reliable, I think that's working really well. And also, testing is a little bit boring. And yeah, finally, another perhaps boring task within data engineering, I find, is data governance. A technical person doesn't want to go digging into writing documents, analyzing permissions, or checking things like column lineage or fields that have the same type but different names. I think it's a task that takes a lot of effort, a lot of time, and it's not super rewarding. We can see the value in data governance, that's for sure, but I think agents excel at writing text. So I think with the agents helping us out drawing up the data governance, it won't be perfect, you know, but I believe that they can help us to get like 70% there.

That's amazing, because you end up doing the other 30% with very minimal effort, or perhaps that's all you need.

To anyone who's worked for me, who knows me, I've always had a very strong emphasis on testing and, you know, asserting what you think your part of the software is doing. And I've definitely seen the lack of motivation to do it. Often, like you said, it's boring work perhaps, or they can't see the benefit yet, because, you know, it's the future you're sort of protecting against, the scenarios that they've not envisaged. I do still wonder whether, even though they see the benefit, even though it's easy, thinking like that is still a bit of a challenge. You're actually getting the AI to do that piece of work. Someone without the experience won't know that, and that's the difference between someone who's using AI quite well, or an engineer who's building software quite well, and someone who isn't. I wonder how we solve that as an industry, to still get the appreciation and do these more mundane tasks. Even though they're easier, it doesn't mean that we will do them.

Totally. I think that the biggest leverage that an engineer can have is context. Context that the agent won't have because of the meetings that you went to, context in the market that you perceive but that's not documented, not written. I think that's one of the main levers that engineers should use in their natural thinking on what code should I write that the agent cannot know. Yeah.

I'm gonna do the last question I think. Little wrap up and then last question. This is, this is gonna be a killer. There are gonna be millions of views yeah? I forgot one little thing I'm Okay, what's the, what would you need to take it up?

So we do the question too, what's something that the industry is doing completely wrong? Yeah, so...

Is that the one you were thinking of? We can definitely go back to that. No, I remind.

Yeah, I don't think I'll try to read it in the beta, so maybe that's why it didn't stand out. We just done the number 3 one, which is how... So my intention was to introduce the question for the next guest phase. I'm sure we did the... If you had something around question 2, we could easily do a little bit on it, but I we did it. So we just did what's working in the industry. We're missing what is not working.

and what's going to disappear in three years. Yeah, okay. Well, let's do the data industry getting completely wrong and then I'll lead straight into the kind of question for the next guest. Okay.

Cool.

So.

Okay, so we've come into the last segment of the episode, and we've talked a lot about what the industry is doing well and what's changing the way that we work. In this part, we're going to close with a question from another guest, and you'll get a chance to ask a question to the next guest. But before we do that, I wonder if there's something that you see that we're doing completely wrong as an industry, that you'd advise everyone listening to change.

Wow, what's the industry doing completely wrong? You always need to look at the motivations, right? The industry is always following the money, but not necessarily the good practices. So what I see in the industry, at these events, is that there are so many low-code tools that promise that everyone can build a data pipeline. Which is great. They've lowered the entry barrier so low that anyone can do it. But then you have a few side effects, you know. You get vendor lock-in, then the bill keeps creeping up, and then you don't know where else to look. But also, effectively, you're digging your own grave. You're accumulating tech debt. And with this, you have a team, perhaps, creating data pipelines that are super important and critical for the business, but without much engineering background.

And I've seen teams, in large companies as well, that perhaps don't use version control. They don't write unit tests on the SQL.

They even use personal credentials or raw keys within the scripts or code they push. And I think this is really important for the industry, because why are we allowing this to happen in the data industry and not in other engineering disciplines like software engineering or civil engineering, for example? I think that data is as important as the rest. But as we were discussing before, data has always been a bit behind in terms of practices. So I think it's about time to bring engineering back into data.

Yeah, you know what, I absolutely love this. We chatted before the recording and we talked about various things. This one didn't actually come up, but your perspective, I think I share with you 100%. Low-code tools to me are always just a bad smell. I think the vendor idea was the productivity that that would enable. That's not a bad motivation. You've got someone who's not very technical, who can't deal with deployment problems, with operational change management problems, with, you know, maintaining something for the long term. It's the solution that I just totally disagree with: putting code, low code, somewhere in a UI, locked in.

I now think that with products like ours, you know, that we develop with that ethos of being able to change the code, with an AI that helps you change it, it's totally game changing. People talk about the kind of SaaS apocalypse. Well, my view is the first ones that should go are these kind of low-code tools, because they will not be necessary. And you can maybe extrapolate that to, you know, UIs in general. That might be going too far. Yeah, that's just such a cool observation.

I wonder then what you think the steps might look like for the industry to change to more engineering, because a lot of people don't think like engineers.

If we follow the money, we need to offer some sort of reward for making that happen. The value that the engineer produces should outpay whatever it costs. Otherwise, these low-code tools, or shortcuts perhaps, might continue to proliferate.

And maybe that will happen naturally. Maybe careers, employment. Maybe vendors won't be able to sign new deals because it's obvious there's a better way. And maybe that will play out naturally. Yeah, interesting. Coming to the final part then. So we have a question from the previous guest. And you've not heard this question. You get a chance to react to it. And then you'll have a chance to ask the next guest a question. So hopefully a kind of chain of questions that really helps other data leaders understand and think about things differently. So the question from a previous guest, Kevin, was: is there a belief that you held three years ago that's no longer true, and what caused you to change that belief? What do you feel now is so different to something you believed was necessary three years ago?

I'm totally blank on that one.

We might pause and ask then, in a slightly different way, to give you a chance to think about it.

So I find the question a little bit challenging. Maybe I'll riff on it. So his question verbatim was: what's the belief that you held strongly about your data work three years ago that you now think is wrong? And what came to change your mind about that belief? So maybe that constraint around the data work makes it a little bit easier to answer.

Okay. Maybe you can orient it through the dashboards. Don't you want to validate it for us? No, no, no. I think so then if you've got an idea, let's go with that. So you're a little bit of a dashboard, so maybe we can ask the question and then you can sort of dive into that. Okay.

Okay, so we're coming to the final part of the show, and this is where it gets a little bit interesting. We've got this idea that we can ask each guest to come up with a question for the next guest. So Kevin, in our last recording, came up with a question for you, and you'll get the chance to ask the next guest a question. The idea being that we can share, you know, challenge, and get people to think. So the question Kevin had for you, which you've not heard before: is there a belief that you had three years ago about your data work that you now believe to be fundamentally wrong or not true, and what caused you to change that belief?

Interesting, interesting question.

So I thought that within data engineering, the bread and butter was displaying that data and telling a story about it, you know, like building beautiful dashboards for the executives to take excellent decisions that drive revenue up to the sky. So I always thought, yeah, dashboards were the bread and butter within data engineering. But what I'm seeing now with agents, and perhaps we're not quite there yet, is that dashboards will be a thing of the past. Not all of them, but most, I would say. If you think about it, a dashboard is something relatively static, you know, it's something that tells you what happened. But you're assuming that an executive or someone had the time, effort and concentration to actually apply filters, see what's happening and then say, okay, this is what happened.

But with agents, you can go not one but two steps forward. So the agent doesn't need to look at that floor, it can look at the summarized data. It can identify dips and peaks. We are seeing that already with monitoring alerts. But with agents, you will be able to identify that and then go digging further. Go and look at the granular data to understand not only what happened but why it happened. Once the agent has a good understanding of why it happened, it can even go one step further and say, okay, what can we do to fix this or to prevent it in future cases? By looking at other sources of data or other things. So, in a similar way, as software engineers we don't really look at the logs every time to debug things like we used to. We throw that into the agent: give us a summary, or your best bet on what you think this error is about. I think the agents will become more like a replacement of a dashboard with a notification saying, look, this is what happened, this is why it happened, and these are my suggested actions for you to take, or do you want me to do it, actually?

Much more, much more narrative. I love talking about dashboards. I've definitely had a go at building them, and, you know, I'm probably one of many people who think it's a kind of lossy way of looking at the world.

As you were talking that through, you've really got my mind racing. Actually, if you think about a pilot flying a plane or someone driving a car, well, they have a gauge. They have a kind of well-defined and decided metric that governs some of their activities.

But it's fundamentally a very poor reflection of what's going on. And, you know, if that gauge is broken, that could cause serious issues. So yeah, as you were talking that through, that definitely brought to life for me the dashboard as a gauge. You know, it's just a very poor representation. But if you could comprehend all the things going on, well, you wouldn't actually even need the gauge. I'm thinking of a base jumper, or someone in a wingsuit, who, you know, doesn't have a gauge. They feel how it's flying, and there's a lot more input, and they're able to really fly their organization. That's a pretty cool vision for the future that you've inspired, in my mind at least. Coming to your question then. So the next guest, another data leader, what would you ask them, given the chance?

I have a spicy one for the next guest.

You know, we're in economic turmoil at the moment. If you were asked to reduce your cloud costs by half to allow the company to survive three months, six months more, what would you chop down and why?

Interesting. I love asking this question. And is there a particular reason it's cloud cost and not cost?

Well, within engineering, mostly we do things in the cloud. But whatever is in this data leader's realm, whether that is cloud or something else.

The reason I ask is often employees are the highest cost, far, far in excess of the cloud. So I'm going to leave the witness, I'm going to ask you a question. You're really, really awesome, and thanks for coming onto the show. This has been an absolute pleasure.

Absolutely, the pleasure was mine. Thanks so much for having me here.

Cool, great success. Cheers mate, very good.