In this episode of Data Matas, host Aaron Phethean and his guest Stéphane Burwash dive deep into what it takes to build a true data-driven culture. Recently promoted to Data Engineering Lead at Potloc, Stéphane shares his thoughts on building trusted analytics, with quality data as the foundation. The conversation digs into the hot topics of AI and self-service analytics, questioning their relevance, as well as the application of modern technologies such as Meltano and BigQuery and "the separation of church and state" in the data space. Not only that, but the two touch on the importance of the people element and emphasise the need for open and honest stakeholder management in an organisation's journey to data excellence.
Takeaways
Stéphane started his data engineering journey alone, relying on community support.
Building a community is crucial for learning and growth in data engineering.
Potloc evolved from a market insights company to a data-driven organization.
Navigating data engineering challenges requires asking questions and seeking help.
Stakeholder management is essential for successful data projects.
Technologies like Meltano and DBT are integral to Potloc's data stack.
AI is being leveraged to improve data quality and analytics processes.
Self-service analytics can empower users but requires careful governance.
Data quality issues often arise from a lack of awareness and communication.
The role of a data practitioner is to maintain a big picture perspective.
Sound Bites
"Ask questions, don't be afraid to learn."
"Everybody has been in that position."
"We shouldn't be trying to do the custom solution."
Chapters
00:00 Introduction and Background of Potloc
04:43 Role of a Data Engineer at Potloc
06:34 Data Sources and Technologies Used
09:58 Balancing Complexity and Impactful Work
15:30 Working with BI Analysts and Data Modeling
23:46 Focus on Data Quality and Maintenance
25:42 Challenges of Data Quality and Data Integrity
36:12 The Importance of Stakeholder Engagement
41:14 The Concept of Self-Serve Analytics
43:25 The Value of a Holistic Understanding of Data
47:14 The Role of Data Practitioners
48:15 Introduction
49:24 The Value of Online Communities and Asking Questions
50:22 Overcoming the Fear of Feeling Lost
50:48 The Generosity of the Data Community
52:10 Networking and Learning at Meetup Events
53:21 Building Connections and Getting Insights
From small to big, every company on the market, irrespective of their industry, is a data merchant. How they choose to keep, interrogate and understand their data is now mission critical. 30 years of spaghetti-tech, data tech debt, or rapid growth challenges are the reality in most companies.
Join Aaron Phethean, veteran intrapreneur-turned-entrepreneur with hundreds of lived examples of wins and losses in the data space, as he embarks on a journey of discovering what matters most in data nowadays by speaking to other technologists and business leaders who tackle their own data challenges every day.
Learn from their mistakes and be inspired by their stories of how they've made their data make sense and work for them.
This podcast is brought to you by Matatika - "Unlock the Insights in your Data"
Aaron:
Welcome to today's show. We're speaking with Stéphane from Potloc, who talks us through his experience of being the first data engineer and all the learning that had to happen. And his big takeaway is: never be afraid to ask questions. Let's get into it. Hello, Stéphane.
Aaron:
Welcome to the show. We'd absolutely love to hear your story, and then all about a day in the life of Stéphane at Potloc.
Stephane:
Yeah. Absolutely. Well, I'm excited to be on here, excited to demystify how it is to get started in data engineering in this modern data stack age. And, yeah.
Aaron:
I think that's probably a journey a lot of people are going on, so it could be very valuable.
Stephane:
Hopefully, people can learn from my many, many mistakes and go a bit more quickly than my slow trudging through the different portions of the data engineering stack and the data engineering workflow.
Aaron:
That sounds perfect. So maybe before we dive into that, tell us a little bit more about the company. They're not that old, are they? So perhaps the journey of the company, and you in it as well.
Stephane:
Absolutely. So we're not that old, but we're also not that new. We're a startup slash scale-up that actually just turned 10 years old. But most of that was our founders really going through the motions of starting a company, which actually began as a university project here in Montreal. And it started off as a market insights company for neighborhoods, where they were trying to establish if people wanted to start businesses in a neighborhood in Montreal.
Stephane:
And they did door-to-door surveys with people of the community to establish what kind of businesses they would like to see start in the neighborhood, whether they were interested in new businesses, and then they would create a report based on that. So that's where we started off: in the survey business, trying to source insights from the community so that they could reap the benefits of their own insights. That's actually a
Aaron:
really fascinating business. Okay. I can only imagine what they were uncovering as they dug into different regional needs, and that sounds quite cool.
Stephane:
Absolutely. And it's very fulfilling, because you actually feel you're helping the community get access to better services that are tailored to their needs; you feel like you're contributing to your own community. And that, through multiple iterations and a lot of sweat and hard work from the entire team, evolved into what Potloc is today, which is a tailor-made solution, mostly for consulting companies, where we will take care of the entire market research facet for them when they are working on a project. So for example, say you are a London-based consultancy working on a project around installing new bike lanes, and you want to sound out the community. Is the community interested in installing new bike lanes?
Stephane:
Where would they want these bike lanes? What is their opinion of existing bike lanes? You would have this entire project, which has multiple facets, one of which is sourcing the community's current opinions. And you could obviously do this yourself. SurveyMonkey exists.
Stephane:
So you just go on SurveyMonkey, craft what you think is a good survey, throw it up on your company's LinkedIn page, and then you watch the results roll in. Or you see some results come in, but you do not have the expertise to analyze that data, or to establish what is a good-quality answer versus what is a fraudster or a bot. There is a large swath of reasons why you could get incoming data which is not actually the data that you want; it is data from people that are doing surveys for money, and so on and so forth. So we basically take on all of that pain and suffering of acquiring real data from real people.
Aaron:
Yeah. Yeah.
Stephane:
Yeah. That is our niche, our expertise.
Aaron:
So you're a data professional building a business intelligence capability and technology inside a data company, which must be quite unusual, because there are a lot of data professionals there, probably with strong opinions. So tell us a little bit more about your role and what you do in analytics and data engineering inside a data company. You know, fundamentally, that's what they're dealing with as a product.
Stephane:
Well, that's the very ironic thing about my role within the current company: I work within a data company, but my role is purely internal as a data engineer. We work as a team of four data practitioners in the data engineering and analytics space within the company. So our role could really be transferred to McDonald's or Walmart or any Fortune 500 company, and we would be doing very similar things. As an analytics team, we treat the core product, which we call the Potloc app, purely as another source of data. So we work within the company on the key analytics that any company needs.
Stephane:
Revenue, cost, COGS, churn, all these different metrics to ensure that we are profitable over time, and to try to reduce time spent on different activities and optimize our workflows. And our main product offering we actually treat as a separate source, just like our CRM and our project management tool. We will just import Potloc app data and infer analytics from there.
Aaron:
Yeah. So if you include the Potloc app, one of the main sources of your data, and you mentioned CRM, if you were to count up all the other things, how many data sources are you talking about dealing with?
Stephane:
So it really depends on whether we're counting the small one-offs versus our main sources. If we're talking about our main sources, we would have our HR management tool, which lets us handle things like salary information, time off, and so on, and allows us to predict what type of load we will be able to take on in the future. Our project management tool, where we store all information regarding projects. Our CRM, where we store everything regarding sales. And then finally the Potloc app, where we store everything from costs associated with acquiring respondents, to the campaigns themselves, to information on respondents and their quality status.
Aaron:
Yeah. So as you'd expect from a data company, they sound pretty data driven. If they're looking at employee load and trying to forecast their own availability, that seems fairly advanced, actually. A lot of companies would like to be there, and you're already dealing with quite a lot of sources.
Aaron:
And, yeah, from my view of the world, it seems like the norm that you're dealing with a lot of sources to be able to get a gauge on the company from different angles, so that's pretty cool. What were the one-offs you mentioned, then? What are the one-off sources?
Stephane:
Lots of different things. We wouldn't qualify one-offs as integrations that were easier to integrate, because they aren't; they take exactly as much time as all the other sources. They're integrations that will usually be less leveraged in our general data warehouse. So I'm talking about things like Google Analytics; Jira, which is a ticket management system; Lever, which is a hiring management system. And usually what would happen with those is that only a few users within the company were actually leveraging analytics from them, or the tool already had embedded analytics and there was no requirement to perform any joined-up analytics on it. That was an uphill battle we had to tackle, because just having an existing data source doesn't mean it must belong in the data lake at all costs.
Aaron:
Yeah. Yeah. I understand that.
Stephane:
Especially when some tools, like Google Analytics, have embedded analytics. Usually you can spend a lot more time just analyzing the data in the tool itself instead of trying to put it into another tool so you can analyze it there.
Aaron:
Yeah. I mean, Google Analytics is an interesting one, possibly for quite a lot of people. What was your experience like going from Universal Analytics to the GA4 world? Was that painful for you?
Stephane:
It was less painful just because I fought very hard not to put that much Google Analytics data in the warehouse. Compared to most data sources, which are very REST-API based, with well-documented endpoints that are easy to manipulate, Google Analytics to me was fairly complex, and nobody knew what information they wanted to analyze. It was just, "try to dump as much information as you can into the warehouse, and we'll take it from there", which was not a workflow that I was comfortable with. So we very much limited the incoming data, which made the transition much easier for me, which made me very, very happy.
Aaron:
Yeah. And that's an interesting one. So from your point of view, the requirement came as "let's get all that data", and you're like, hey, there's a massive engineering challenge here without a huge benefit. So why are we even doing that?
Aaron:
That's quite fascinating.
Stephane:
Absolutely. But that's something that is very near and dear to my heart: you should not shy away from hard work, but you should also try to lean towards impactful work. Sometimes people are not aware of the engineering challenges of importing specific datasets, or they are unable to explain what their need is for specific data. And since they lack access to the raw data, they're just like, "oh, import everything, and then I'll sort out what I actually need", which is difficult on our side, because we can end up doing a lot of work for a data source that ultimately has very little value.
Stephane:
And that's why it's very fun to be in a smaller company, or in a company that actually has a culture of sharing information. We have a very vertical architecture where we can communicate directly with the stakeholders and with the requesters of information and be like: what do you need?
Aaron:
Yeah.
Stephane:
Like, I understand what you think you need, but what is it that you're trying to achieve? What is your goal? And let's work together to identify the information that you need to succeed in that goal.
Aaron:
Yeah. I know from us working together on similar technologies and being in a similar space that you're quite experienced on the engineering side, but relatively early in your career when it comes to dealing with stakeholders and requirements. And you were actually recently promoted, to recognize your seniority and that experience. What was that like, dealing with stakeholder requirements and maybe pushing back sometimes? How did you find that?
Stephane:
Well, it's not "what was that like", it's "how is it". I'm learning every day, and I'm still terrible at it. But it's been steep because, to give a bit of context, I started at Potloc as a solo data engineer. There were the beginnings of a stack, started as a one-off project, like, "oh, we're starting to have data engineering needs", which very much spawned into "we actually need a full-time person".
Stephane:
And that's when they hired me. I had finished my master's with a focus on data science, but found I was less interested in the data science space than in the data engineering space, which I considered to be closer to the business, and I got hired directly by Potloc. I started there with basically a virgin canvas. Let's get things started. And,
Aaron:
what could go wrong?
Stephane:
Yeah. Exactly. And I reported directly to the head of engineering at the time, who is now the CTO of the company. So it was very much a situation where my manager was tackling a lot of different fires at the same time, lots of more impactful portions of the business, and so he trusted me to take on the responsibilities of data engineering.
Stephane:
Basically: "I trust you to be able to manage your own time, manage your own projects, and manage stakeholder expectations." So it was very much trust placed on my shoulders, with faith in me that I would be able to do a good job, which, fingers crossed, I hope I'm doing so far. But learning to deal with stakeholders and to prioritize projects, especially in a more waterfall style of communication, where work would come through our analysts and then finally to the data engineers, was very complex. Every time you're playing that game of telephone, where a request goes from maybe a high-level executive to their more junior associates, to our analysts, to me, you're losing all that context. And at that point, it's just like, "I need this. I need that."
Stephane:
Yeah. And sometimes you need to pump the brakes and be like, hey, let's talk about this. Let's see what you actually need, and let me explain the reality on my side. You have your own view of what's complex and what's easy. Mhmm.
Stephane:
On our side, this is complex; this is a quick win. And that was an uphill battle for me, because there is a certain way to communicate with people without telling them "that is a bad idea" or "we really shouldn't do this", because that makes people shut down, as opposed to going with words like: let's look at this together. We're on the same team. I want to work on your priorities.
Stephane:
How can I help you achieve your goals? Working on just the basic language of how you communicate with people can really help create that rapport with your stakeholders.
Aaron:
You know, I definitely couldn't agree more. That whole experience is just life experience, at the end of the day. Dealing with people and requirements and trying to understand things is just hard work. It's really, really hard. So that's the
Stephane:
It's validating when you get it right.
Aaron:
Yeah. Exactly. Yeah. Exactly. And then, on the tech side, just give us a little whistle-stop tour of the kinds of technologies you're dealing with.
Aaron:
Absolutely. And then I'd love to dive into what's in the future. Let's start with: what kinds of technologies does a BI professional at Potloc use?
Stephane:
Well, as a data engineer at Potloc, basically how it works is that we have very much segmented the responsibilities of a data engineer versus a BI analyst. Our BI analysts will work exclusively from our marts forward, with our analytics tooling and so on, driving insights. So they do not use a code repository like GitHub. They will store all of their work as views or as dashboards.
Stephane:
Yeah.
Aaron:
So people aligned with the business, asking questions, delving into the data, doing the proper analysis stuff. Why would they code-manage that? Why would they take on the change-promotion overhead? "That's not what I need." Yeah.
Stephane:
Which is something that we can get to after, because I believe much more in the gray zone of the analytics engineer than in the separation of church and state between data engineering and BI. That, to me, is
Aaron:
the church and state of users and data.
Stephane:
Yes. Everybody should do everything, in my opinion, but we can get to that. First, the current state at Potloc. So if BI is working off marts and performing their analysis, data engineering does everything before that. Starting with ingestion: you know we're big fans of Meltano, so we use Meltano to perform the extraction of all of our different sources and load them into a centralized warehouse, which at this time is BigQuery.
Stephane:
And then from there, we mostly use dbt for data modeling, to create our base, our staging, our intermediate, and finally our marts. We only use dbt Core currently, which is fine for us, and we host everything in Airflow. And then finally, we also manage our entire infra stack, so everything is Terraformed in AWS. We have some ECS containers to run our Meltano stack, we have some websites up to host things like dbt docs, and for Airflow we use MWAA, which is AWS's version of a managed Airflow.
Stephane:
Mhmm. Mhmm. And beyond that, logging and all that stuff.
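To make the shape of that stack concrete, here is a rough, hypothetical sketch of the extract-load-transform sequence Stéphane describes, with each step expressed as the CLI command an orchestrator like Airflow would invoke. The tap names are assumptions for illustration, not Potloc's actual configuration.

```python
# Hypothetical sketch of the EL -> T sequence: Meltano extracts and loads each
# source into BigQuery, then dbt builds the staged models and marts.
# The tool names are real; the tap names are illustrative placeholders.

def refresh_warehouse_plan(sources: list[str]) -> list[list[str]]:
    """Return the ordered CLI commands for one full refresh (not executed here)."""
    steps = []
    for source in sources:
        # e.g. `meltano run tap-salesforce target-bigquery`
        steps.append(["meltano", "run", f"tap-{source}", "target-bigquery"])
    # dbt then models base -> staging -> intermediate -> marts in one build
    steps.append(["dbt", "build"])
    return steps

plan = refresh_warehouse_plan(["salesforce", "jira", "potloc-app"])
for step in plan:
    print(" ".join(step))
```

An orchestrator would run each command as its own task, so a failure in one source surfaces individually but in the same set of logs.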
Aaron:
So you use managed Airflow. Is that as well as your own Airflow, or is that the only one?
Stephane:
Just the managed one, because when we started, we had our own hosted version of Airflow in an ECS container somewhere. But we found that updating our DAG files and updating Airflow itself was becoming more and more of a pain point, and we were going to have to invest heavy resources into the management of that self-hosted Airflow. So we did the cost analysis, taking my salary and my time into account, and we were like: if Stéphane works on it for this amount of time, we can keep the "free" version, which will actually cost more, or we can just get started right now
Aaron:
on a Yeah.
Stephane:
cheap hosted version, which does everything we need, and that was good for us. And one of the tenets that I very much hold to when making decisions at the company is that we are not reinventing the wheel. As I said at the beginning, our team could have been at McDonald's, could have been at Walmart, could have been anywhere, which means
Aaron:
It's the same problem.
Stephane:
Yeah, it's the same problem everywhere, which means that we shouldn't be trying to build the custom solution that will revolutionize the world. We shouldn't be trying to use emerging technologies which are unproven on the market. Just take something that everybody says works. It's not fancy. It's not sexy.
Stephane:
It works. And just
Aaron:
Yeah. Yeah.
Stephane:
Go on doing impactful things for your company.
Aaron:
And I sort of detected one of the indicators that you might not be going to stick with something: you mentioned you're currently using dbt Core only. Does that mean you have some plans to move to other managed things?
Stephane:
Not at this time. It's more that we are branching out from just using dbt Core to also using custom Python scripts and the like, where we're starting to feel the stretch of some restrictions on the use cases for pure SQL. We have worked very hard to stretch SQL as much as we can, but especially for advanced statistics, where our analytics team is starting to perform more deep-dive analysis, trying to do regressions and the like, pure SQL is just not adapted. So we're starting to include Python scripts in there, which is a reason why we are very happy that we stuck with a Python-based solution for orchestration and logging, as opposed to using a tool like dbt Cloud, where we could have some alerts in there but would then have a separate stack for all of our Python scripts.
Aaron:
Yeah.
Stephane:
The fact that we currently use Airflow, an orchestration solution, with dbt running on it, but also all of our other scripts, such as Meltano and custom scripts, means we get a unified logging solution for all of our different tooling at the same time. That makes it much easier at the end of the day to go through all of your logs for your different solutions in one tool and be able to say: this went well, this went badly, and why did it go badly? And that to me is
Aaron:
the hidden cost of building, managing, and looking after a data stack: the wrong tool can create a nightmare that's sort of invisible. You know? Like, I go over there to see the problem in one space, or over there to see a problem in another space. So there's a lot to be said for seeing it in one place. I mean, obviously, you'd be preaching to the converted.
Aaron:
That's sort of the point of Matatika here, in our platform. That's what we're about. But that kind of ethos for me was very obvious when I was working in a software company, deploying the solution. How difficult it is to look after and run is not to be underestimated. Absolutely.
Aaron:
Is there still an ongoing time and maintenance cost, or is it kind of "build it and then it's pretty okay" until you reach the next hurdle, like you said, some regressions that aren't possible in SQL? How much maintenance does it require? Those are the questions.
Stephane:
There's always some maintenance. Currently, we are quite stable, and we have automated tools for doing the small maintenance of upgrading packages. What makes us very, very happy is that we have Dependabot, which gets us automated PRs for new packages and the like. We also use a package manager called Poetry. I don't know if you're familiar with it, but it allows us to get our dependencies upgraded much more quickly than if we were using a requirements.txt, because we can see our dependencies' dependencies, which makes everything regarding maintenance a lot easier.
Stephane:
Currently we have plateaued in terms of maintenance, and we are more looking to expand our offering, or expand the quality of our service, in a new direction, which is, funnily enough, data quality, everybody's bane. It was an interesting journey for us, because data quality was not on our radar eight months ago as an actual project to tackle; we were having huge issues just getting access to the data in itself. So our first big focus was simply: make analytics available. Give access to a table that has the proper column names and the proper
Aaron:
Yeah.
Stephane:
title and stuff like that, and then people can start working on it.
Aaron:
And was there one event, or was it a trend over time? Like, why did it become "oh, we need to do something about this"? What was that event?
Stephane:
Well, not an event per se, more an accumulation. You start seeing the tickets come in. And when you get one ticket a month, which is like, "hey, this data is weird, could you look into it?" Oh, I don't know if that notification came in loud and clear on my side.
Aaron:
Well, it did. We'll snip that, no worries. Perfect.
Stephane:
But, yeah, if you get one ticket a month, it's like, "oh, this data is weird, could you look into it." But slowly but surely, we were getting more and more tickets every day, and each of these tickets seems like a 15-minute ordeal. Could you look into this? It'll be easy. Just go look.
Stephane:
And finally, when you start digging, not only do you realize that the ticket you thought would take 15 minutes will actually take 4 hours of your life, but it also allows you to uncover all of the other things hidden under the floorboards that you had not seen before. And then we started realizing that we had very little awareness of what our data quality was.
Aaron:
Given that experience, it sounds like you might have started earlier if you'd known.
Stephane:
No, absolutely. But it's also about the philosophy, because we actually did start earlier. We started a data integrity project a year and a half ago, but we had no experience in how to tackle it. And so we started putting dbt warnings a bit everywhere.
Stephane:
So, just a bunch of dbt warnings. But that would, one, very much slow down development, because you have to be a lot closer to the stakeholders and spend a lot of time writing those warnings, and we had very little way to capture the warnings that we were setting. So we had no plan for how to actually analyze this data. We were just being curious, like, "oh, let's put warnings everywhere." And it was creating a lot of noise.
Stephane:
So
Aaron:
this is, just to help everyone understand, because some listeners will be technical and some won't: the upstream is delivering some data and then starts delivering data in a different fashion, and you wouldn't necessarily be aware of it. So you are using the warnings to spot new, unexpected data shapes flowing through.
Stephane:
Absolutely. But that requires that you know exactly what the upstream should be doing, which requires a lot of inherent knowledge of the data, some of which you may not have. It also requires that you have a very strict specification of the actual incoming data, with things such as data contracts, which was a term we were completely unaware of a year ago, but now work with very heavily, because they have saved us in a few instances already.
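dbt itself ships a model-contract feature for this, but the underlying idea is simple enough to sketch in plain Python: agree on the fields and types a source must deliver, and check every incoming row against that agreement before it lands. The field names below are invented for illustration, not Potloc's real schema.

```python
# Minimal, hypothetical data contract: the required fields and their types
# that an upstream source has agreed to deliver.
CONTRACT = {"respondent_id": int, "campaign_id": int, "quality_status": str}

def contract_violations(row: dict) -> list[str]:
    """Return human-readable contract violations for one incoming row."""
    problems = []
    for field, expected in CONTRACT.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            problems.append(f"{field}: expected {expected.__name__}, "
                            f"got {type(row[field]).__name__}")
    return problems

good = {"respondent_id": 1, "campaign_id": 7, "quality_status": "verified"}
bad = {"respondent_id": "1", "quality_status": "verified"}
print(contract_violations(good))  # []
print(contract_violations(bad))
```

The value is that a breach is caught, and named, at the boundary, rather than surfacing later as a vague "this data is weird" ticket.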
Aaron:
So tell us more about that, because that's one of the things I often see: we're in engineering, we're building a platform for the company, we're supplying data, but the stakeholders don't necessarily see the direct benefits of these kinds of improvements. So tell us a little bit more about how it saved your bacon. What benefit would they have perceived, and what did it mean for you?
Stephane:
Well, the benefit is usually fairly easy to show, and it's very easy when you're working with automations and things like that. People are basing automations on your warehouse, which is a simple way of talking about reverse ETL: we're not actually loading data into another location, but we are powering automations. Creating
Aaron:
something to happen, yeah.
Stephane:
Exactly, which are things that usually they cannot do from source data alone, because it's a combination of information from different sources. And when you are capable of stopping the data before it reaches an automation, or of flagging it early, it's not even usually the owner of the automation that is worried; it's whoever that automation powers, whether through Slack notifications or through loading into a different tool. Those are currently our biggest use cases. But when you're able to warn them early, "hey, there was an issue with this data, and this automation may have misfired", they are grateful that you are capable of catching a problem early.
Stephane:
When they are not grateful is when they have to come to you and ask: why didn't my automation fire? And then you're scrambling to go look at the data, look at the logs, and see: okay, this didn't happen; what was our issue? What was the data integrity problem?
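The pattern Stéphane describes, checking the data before the automation fires and warning people early instead of letting them discover a misfire, can be sketched in a few lines. The check and the notification hooks below are placeholders, not a real Potloc system.

```python
# Hypothetical gate: run quality checks first, then either fire the
# automation or notify the people it powers before they notice a misfire.
def gate_automation(rows, checks, notify, fire):
    """Fire the automation only if every row passes every check."""
    issues = [msg for row in rows for check in checks
              if (msg := check(row)) is not None]
    if issues:
        notify(issues)   # e.g. a Slack alert: "held back, and here's why"
        return False
    fire(rows)
    return True

# Example wiring, with in-memory stand-ins for the alert and the automation.
alerts, fired = [], []
def has_email(row):
    return None if row.get("email") else "row missing email"

gate_automation([{"email": "a@b.co"}], [has_email], alerts.extend, fired.extend)
gate_automation([{}], [has_email], alerts.extend, fired.extend)
print(alerts)  # ['row missing email']
```

Either branch produces a proactive signal: the automation runs, or its consumers are told why it was held back.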
Aaron:
Yeah. I can't tell you how often we hear that kind of discussion, perhaps said in different ways. But, you know, in the data space, when one small aspect of the data is wrong, the users instantly distrust the whole lot. And then if they have to come and tell you about a problem, that also loses trust. And then, yeah.
Aaron:
So this kind of proactive nature, I mean, if you were to go to them and say, "We think there might be an issue," that is building trust. It's like, "Oh, wow, these guys actually understand what's going on. They actually understand the impact to me." And, you know, you're working that much more closely together with them, and then you're now actually serving them, which is fundamentally our role.
Stephane:
Absolutely. And it's definitely something that we had to learn the hard way. It's also, in my mind, more complicated in the data space, because usually you're manipulating information that they are generating. Your stakeholders can also be your data generators. So they know a lot more about what you're manipulating than you ever will, which gives them a sense of ownership and makes them very quick to distrust your abilities if you get it wrong, because they start thinking, "Well, I could have done this. This is my data."
Stephane:
I know it. Exactly.
Aaron:
Do you know what? One of the things that strikes me, and I've seen and experienced quite a lot of this, is that if you think about the whole space, someone's capturing a record, they're progressing it through a life cycle, and then it generally ends from their perspective. That's what they implemented the system for. And the analytics or reporting, the what-happened or what-ifs, tend to come later, and there's a lot of frustration: they're like, well, why isn't this just simple? But if you went all the way back to the beginning and thought about what you were capturing and why, you weren't only capturing it to move a single thing through a process.
Aaron:
You were capturing it to understand and measure the whole thing. And if you knew that upfront and planned for that upfront, life would be so much simpler, and they'd appreciate the downstream a lot more. So that feels like a maturity that the data world has to go through yet. We're not there yet.
Stephane:
Absolutely. And I also think it comes very much to your point of getting more of an understanding of the big picture, as opposed to building the railway as you are riding it.
Aaron:
Riding it. Arriving at the station while you're still building it.
Stephane:
Exactly. It's also having a knowledge of what the value of having a data team is. Because every single portion of your company has information, and they are looking to be capable of better accessing that information, which is great value across the board, but you need to find what's most impactful. In my mind, the most impactful version of a data team is the fact that they can see the entire big picture. So they get to look at all of the teams together and then drive insights from the company as a whole, not as different parts.
Aaron:
Yeah.
Stephane:
Exactly. But if you are getting bogged down with every single individual team's data needs, where they want to be able to better manipulate their information, you are spending a lot of time enabling people to manipulate one single data source. So let's say that you have a project management tool which does not have an embedded analytics platform. Currently, your team that uses only this project management tool has two options. Either they can do mass Excel exports and just analyze in there, or they can ask the analytics team to create basically an analytics platform in the warehouse based solely on this project management tool.
Aaron:
And then, essentially, you're doing operational reporting for someone rather than analytics. That's another thing I see: oddly, if you focus on that particular problem, operational reporting requires a different rigor and a different kind of testing compared to analytics.
Aaron:
In analytics, you're happy enough with the trend, because you want to make a decision; you don't really care about the exact figures. Whereas with operational reporting, an outcome might be that you bill a client, so you need to know the exact number of jobs, that kind of thing. So there's a different rigor that's maybe not obvious to the people who are asking you to do these things.
Stephane:
No, absolutely. And it's also a different tech stack. For operations, instead of an analytical warehouse, you may want to use just a database. And instead of the ETL tools, you may want to use webhooks to make sure that you can get access to the data fast.
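The contrast drawn here, scheduled batch ETL for analytics versus per-event webhooks for operations, can be sketched in a few lines. The store shapes, payloads, and function names are illustrative assumptions, not Potloc's actual setup:

```python
# Two delivery paths for the same source data.

def handle_webhook(event, operational_store):
    """Per-event path: low latency, suitable for powering operations."""
    operational_store.append(event)  # written immediately, one event at a time

def run_batch_etl(extract, warehouse):
    """Scheduled path: higher latency, fine for analytical workloads."""
    warehouse.extend(extract())      # the whole batch lands at once
```

The design choice is latency versus throughput: the webhook path gives you each event as it happens, while the batch path is simpler to operate and good enough when the consumer only needs trends.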
Stephane:
And what I've found is that we're really changing the way that warehouses are used, from what was originally a pure OLAP analytical use case. Now people are more and more trying to shove in everything that was great about OLTP and turn it very operational.
Aaron:
Yeah. Now
Stephane:
I've had issues with this in the past, and I'm very much turning around. I'm finding more and more that there is a beauty in sharing an analytical and operational use case within the warehouse.
Aaron:
Mhmm.
Stephane:
But it can have very great benefits for the company. It just has to be a choice.
Aaron:
Yeah, exactly. Because at least they're working in the same data source, so that's a sort of good fundamental. I think there's a systematic difference between integration, like moving events between systems, and the reporting side. Certainly your tolerance for loss, for example.
Aaron:
And, yeah, it's a perennial battle, I'm sure. Many, many technical people will be nodding their heads and thinking this is exactly what they're going through. They're like, this problem is not solved here in this space, or, in fact, it might be; it should be solved over here, and actually they're solving it somewhere else. This is normal.
Stephane:
Yes, absolutely. And there's some technological aspect to it. There are some tools that will help you bridge that gap. There's tools for everything.
Stephane:
Currently, there is very much an abundance of tools. If you are looking for
Aaron:
tools... This is the other data space thing, isn't it? There are so many tools, and that obviously brings its own challenges. So let's park the tech for a second. AI.
Aaron:
What is happening at Potloc in AI? What do you think the future is? This has to be discussed because it's another thing that's absolutely everywhere. What's your opinion?
Stephane:
So every podcast must have the AI section.
Aaron:
think we've got AI.
Stephane:
Well, let's talk AI. You have not got the best resource at Potloc if you wanted to move on to the AI conversation. I have an incredible coworker, Vin, who is our head of data science at the moment, and he could have run circles around you with everything we're doing. So I apologize in advance; I'm a bad substitute.
Aaron:
So they care about AI. They want to use it. Maybe you could give us some idea of what they think the benefits might be, or how it's being used, if that's possible?
Stephane:
So we're heavily investing. We are very much doubling down. Bringing it all back, we work not only on internal analytics, but we actually have a product, which is respondent acquisition. And the art of market research is incredibly complex, and identifying high-quality responses, or the responses that will bring the most value to our clients, is very tough. We've done a lot of easy, well, not easy, but basically statistical implementations and just common-sense implementations for trying to identify fraudsters and things like that.
Stephane:
I'll give you a basic example. If somebody's IP address is in a different country from the country they say they are currently in, usually that's a very telltale sign that this person's a fraudster or this is a bot.
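The rule Stéphane gives can be written as a one-check flag. This is a minimal sketch under the assumption that the IP-to-country geolocation has already happened upstream and is just a field on the record; the field names are made up:

```python
# Flag a respondent as a likely fraudster/bot when the country resolved
# from their IP address differs from the country they claim to be in.

def flag_country_mismatch(respondent):
    """Return True when the claimed country and IP-derived country disagree."""
    claimed = (respondent.get("claimed_country") or "").strip().upper()
    ip_country = (respondent.get("ip_country") or "").strip().upper()
    if not claimed or not ip_country:
        return False  # can't judge without both signals
    return claimed != ip_country
```

In practice a signal like this would be one input among several rather than an automatic rejection, since VPNs and travel produce legitimate mismatches.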
Aaron:
Yeah, actually, interesting. And at some point you eliminate that from your results and just say, this is not high enough quality; we can't validate that it's genuine.
Aaron:
So
Stephane:
Exactly, yeah. And we have a lot of examples like that. But as you go up, especially if you're starting to look at what we call open-ended answers, so textual answers
Aaron:
Yeah.
Stephane:
Or you are trying to find the alignment between one person's responses, between their first question and their last question. Does this fit their current opinion, or are they just answering random things? Are they always selecting option A?
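One of the low-effort-respondent signals mentioned here, always selecting the same option, is known as straight-lining, and the basic check is simple. The threshold below is an illustrative assumption, not Potloc's actual rule:

```python
from collections import Counter

def is_straight_liner(answers, threshold=0.9):
    """True if one answer option accounts for >= threshold of all answers."""
    if not answers:
        return False
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers) >= threshold
```

The harder signal Stéphane describes, checking whether a respondent's open-ended answers stay consistent with each other, is exactly where a rule like this runs out and model-based approaches take over.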
Aaron:
Interesting. Yeah. Yeah. Okay. Interesting.
Stephane:
Though that becomes a lot more complex, and at some point pure engineering is unable to cope, and we have to start looking at AI. So we are heavily investing in methods to identify fraudsters in open-ended questions. We are also looking to make it easier to take an open-ended question and boil it down into core concepts, which makes it easier for analytics. And we've just released an analytics tool that basically allows you to converse with your data in a ChatGPT-like interface, allowing it
Aaron:
to go into your product, to your customers. Yeah. Okay.
Stephane:
So it's an awesome way, because it sort of closes the loop, for people to be able to analyze the information that they've gathered. But we are also very conscious that the information must be good from the get-go. The information actually is our product. So we're not only investing in the cool ways for you to interact with your information; we're investing even more in making sure that the information you are getting is of high quality.
Aaron:
Standard. That's certainly one of the spaces I'm particularly excited about as a data vendor: qualitative or unstructured information, making it structured enough to do the kind of analysis that happens downstream. That still seems like an unsolved problem. There's a lot of infrastructure to manage. There's a lot of process to manage.
Aaron:
There's a lot of new ways of working, and that's become an awful lot easier with generative AI. But it's not necessarily easier to get the outcome, or the highest quality. It's been possible for a long time, machine learning is not particularly new, but it's accessible to so many more people now because it's got a relatively simple interface. And the output's quite hard to deal with; it's still not really structured.
Aaron:
You know, asking questions through prompts on some data and expecting to be able to use it, it's got to fit into an engineering process somewhere. So I get that on the product side, and I get the requirement. Is there a use for it in BI and analytics that you can foresee?
Stephane:
Analysis of unstructured data?
Aaron:
No, the use of generative AI, or the use of AI generally. Has it still not quite found its space, or is it, "Oh yeah, I can definitely see that particular use case"? I wonder what your opinion might be, thinking about it purely from the...
Stephane:
Well, it depends what you're talking about with an AI use case. If we're talking about the use case in your end product, so you are using AI to output an analysis, yes, there is probably a use for in-depth analysis, but I would argue that that's mostly statistics. You can just use a scikit-learn regression to gather insights from your data. We've been using that for years, which is great for analytics.
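Stéphane's point that much of this is "just statistics" can be illustrated with an ordinary least-squares fit. He mentions scikit-learn's regression; the same single-feature insight falls out of a few lines of arithmetic (the numbers below are made up for illustration):

```python
# Ordinary least squares on one feature: fit y = a*x + b.
# scikit-learn's LinearRegression does the same job with more features.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x
```

For a hypothetical relationship like ad spend versus completed responses, the slope is the insight: how many extra responses each extra unit of spend buys, no neural network required.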
Aaron:
Albeit there's a user requirement to know how to use the tool or the library to get the output. So maybe there's, not a dumbing down, but a simplification of the interface: they can ask an AI to do something more intelligent than they perhaps know how to do currently. That sounds like a cool use case.
Stephane:
Which would be coming closer to things like self-serve analytics and stuff like that, which is
Aaron:
Yeah.
Stephane:
An incredibly interesting proposition, but it's also the pipe dream that every single analytics and data engineer strives towards. It is something that I've strived towards and still strive towards in some way to this day. But it's also something any senior data engineer will turn to you with their war stories about: "I can remember when I started trying to do self-serve analytics. This is everything that went wrong."
Aaron:
So I was desperate to ask this earlier, actually, and we sort of moved away from it. I wondered: who is interacting with your data? And with that question in mind, I wonder if you could speak to the self-serve analytics. I sort of have my own experience, my own war stories, of what can go wrong. So who is accessing your data, and what are the dangers of them doing it self-service?
Stephane:
So it's always a trade-off. Currently, the main stakeholders are BI analysts. They are the people that are leveraging our marts to do analysis.
Aaron:
Educated, right? They are professionals. They know what they're doing with the data, and they serve users. So I wonder what would happen if the users got access to it.
Aaron:
What do you think would happen?
Stephane:
Well, we started off with giving our accounting team and our finance team a very sandboxed, very controlled, safe environment for self-serve analytics, only with their own data. This was because they had trouble accessing information; they had trouble accessing cost and revenue coming from different sources. So that we helped them with. And that was actually a very good success.
Stephane:
Mhmm. We were capable of giving access to our finance team without them taking bad decisions, but that's because they are, by their nature, very careful people who will double-check with others. Familiar
Aaron:
with data. Obviously, they're not working with data exactly day in, day out, but mostly what they do is data. They are recording and counting things. So they've got the right kind of individual and mindset, I could imagine.
Stephane:
Yeah. And then we branched out to what we are calling our data champions initiative. So around a group of 20 people within the company, spread out through different teams. And, again, we gave them a controlled space where they could have access to information, but also access to a lot of help. This was championed by our BI team, and it's an incredible program where they accompany different key figures in the company with trainings, with support, and with office hours to
Stephane:
Help them in their self-serve analytics. And the goal was to reduce small tasks for the analytics team. So questions that could have been very easily answered, or more of that operational side where people were trying to get insights from their own information without requiring a join to another source, we tried to offload back to our clients, while very much highlighting the fact that if they were doing an in-depth analysis, they should rely on the BI professionals, because they are the people that have the experience with the tools and the experience with the wide-ranging data.
Aaron:
Yeah. So what I hear a lot is companies that want to be data driven, self-service initiatives, this kind of thing. And I think this is another great example of it: the company is a data company. The company understands data.
Aaron:
The company is pretty well data driven already, but you can still be better in lots of ways, and they can get more access to the data, and they still need some aspect of handholding. Because you might have someone in accounts who joins who's not really familiar with data, who has come from somewhere else with lots of great experience in some areas, but less experience looking at the data and the pitfalls of joining in the wrong way, or what have you. So, yeah, I can see, and I think a lot of people can get benefit from, what a data-driven company might look like: you're starting to get users closer to the data, able to do more themselves, but not expecting them to do everything. You're still serving them through a centralized function. That sounds...
Stephane:
Absolutely. And you're basically offering them a service, which is the bigger picture. And, again, that's what I think the data practitioner does. Even if you grow a larger data team where you start having embedded analysts, the analyst's job will always be to have that bigger picture in mind and to work with the entirety of the data. But in the modern world, everybody's a data practitioner in some sense.
Stephane:
You just have to make sure that you are putting on them a level of responsibility that they are capable of coping with while maintaining their other responsibilities. If 90% of their time is spent on operational work, don't give them a huge analytics workload as well, because somewhere they just won't have the time, or they may take bad decisions based on the data, which is also not a good outcome.
Aaron:
So, Stéphane, you mentioned that on your day one, it was just you, trying to figure out how to crack the data problem. And then we got on to the community and how you find other people like us and solve the problems. Tell us more. Tell us what it was like.
Stephane:
Absolutely. So when you get started, well, as I said, I got started alone in my own team with a manager who was incredibly supportive, but he had no experience in the modern data stack or the modern data space. So I was on my own. And so where do you find that support? Where do you find help?
Stephane:
My answer is everywhere. There is no single solution. There is no "this is the Reddit thread you have to be following, and you'll get all the answers from there." I was scrounging for information. So we met on the Meltano Slack, which was just the Slack of an open-source tool that I still use to this day.
Aaron:
It's great. The Slack group's great. The tool is great. The support there is great. You know, if you need someone else to speak to, they probably know how to solve it.
Stephane:
You get emotional and technological support. But the Meltano Slack actually bred the realization that, well, we should be on other Slacks. We should be on the dbt Slack, the Airflow Slack. But not only Slacks: Reddit. I went on the data engineering subreddit a lot, asking questions. I read a lot of books, and went to a lot of meetup events in person.
Stephane:
We were actually so inspired by meetup events that we started our own, so we're now in charge of the PyData Montreal and the MLOps Montreal meetups. We have one every two months.
Aaron:
That's cool. That's really cool.
Stephane:
You meet new people, but you're just scrounging for information anywhere you can. And the one piece of advice I would give to anybody that is as lost as I was, and still am to this day, in the data space is: ask questions. A lot of people are scared of asking questions on the dbt Slack or anywhere they feel like they'll be judged. They feel like their question is stupid. And I feel very strongly about this, because everybody else within that space was exactly where
Aaron:
Exactly.
Stephane:
You are now, which is: you have no idea what is going on. And the beauty of it is there are so many tools out there, and there are so many methodologies, that you are constantly put back in that position of having
Aaron:
no idea.
Stephane:
And everybody has been in that position, which makes, in my mind, everybody incredibly generous
Aaron:
with their knowledge. I find that. I think the community culture is very strong. No, no.
Aaron:
I was gonna say "at the moment," but I do feel that you can ask questions. You don't get called out for being stupid or not knowing, because, like I said, everyone was there. Everyone didn't know at one time. And maybe a little bit more so in the open-source world, in the kind of new-tools world, because basically everyone remembers that; it was just the other day.
Aaron:
It's all new. That's cool.
Stephane:
Absolutely. I can leave you with one funny anecdote. When I was starting off, I was actually very keen on finding information. So much so that I booked meetings with all the major providers of data tooling. So I did one with dbt Cloud.
Stephane:
I booked one with GCP. I booked one with Snowflake. I booked one with Starburst, all these different companies. And we would have a conversation where the understanding was they would try to sell a product to me, because we had no data stack. So they were trying to sell a product that we could integrate into our stack.
Stephane:
And in exchange, I was trying to scrounge any piece of information I could from them about how to get data
Aaron:
A fair trade-off, if you ask me.
Stephane:
Exactly. And then at the end of the call, I'd be like, okay. We're not ready because we obviously have no idea what we're doing, but I appreciate this exchange, and we'll keep in touch.
Aaron:
I love it. You know, as a data vendor, I feel very differently, because to me it's the absolute best thing in the world. We're a startup still; it's still early days. You know, a good product.
Aaron:
But if someone tells you what is missing, that's just the best thing ever. That is gold. So I think that's probably the sad thing about much more mature companies: they take the view that they're just there to sell. And that's just the wrong view; it's wasting the opportunity.
Stephane:
Creating connections and getting insights, and the sale will come when the sale will come.
Aaron:
Absolutely. Ah, brilliant. Yeah, really, really cool. Well, that's a fantastic point, Stéphane, and probably a great one to leave it on.
Aaron:
So thank you very much for coming on, sharing your war stories, sharing your experience, and telling us a bit more about what the world looks like from your point of view. I really appreciate it.