The Data Engineering Show

Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate in a world where so many different teams are accessing it.

Show Notes

What is The Data Engineering Show?

The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2 Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io

Benjamin: Welcome everyone to another episode of the Data Engineering Show. Today, we are joined by Barr Moses, who is the Co-Founder and CEO of Monte Carlo. For everyone who has not heard about Monte Carlo before, it's a data observability company, or a data observability platform, and I'm sure Barr will go into much more detail. Some high-level facts about Barr and Monte Carlo: they recently raised a $135 million Series D in May of last year, 2022. Barr is the Co-Founder. Before that, she spent time at a bunch of different places, and I'm sure she'll tell us more about that in just a minute.

Eldad, of course, thanks for joining as well, as always good to have you.

Eldad: Thanks for having me, as always. Perfect introduction. You're getting better with each episode.

Benjamin: It's amazing. All the prep work and practicing in front of the mirror is paying off.

Barr: Can I get compliments too on my progress throughout the podcast? Is that possible?

Eldad: You're amazing.

Barr: Thank you so much.

Benjamin: Our sample size is one, so I'm not sure if we can. Perfect!

Barr, do you want to basically give us the high-level story of what Monte Carlo does? What data observability is all about?

Eldad: Let's start with yourself as well.

Barr: Yeah, for sure. I'm happy to. It's a big question.

My name is Barr. I started Monte Carlo three and a half years ago. Monte Carlo's mission is to help organizations actually use data by reducing what we call “data downtime,” a term we coined. In brief, data downtime is periods of time when your data is wrong, erroneous, inaccurate, or just unusable or untrustworthy for some reason.

That is a reality that data teams everywhere experience. We're fortunate to work with hundreds of customers, ranging from Vimeo, Drata, Gusto, CNN, and the New York Times to Roche and many others, all of whom have data teams who want to deliver trusted data, and that's really hard to do if you're not thinking about data observability.

Taking a step back to my background: I was born and raised in Israel. I started my career in the Israeli Air Force, and throughout my career, I worked with data in different formats. I was actually present for what I would call this acceleration of data.

A decade ago, we said we were data-driven, but no one was really using data. We maybe collected a little bit of data, thought we were cool, and moved on with our lives.

Then, maybe 3-5 years ago, people realized that you can actually make decisions based on data, and in fact, your decisions might be better if you use data. I think that was a big change for companies. We're still in the period when we're trying to figure out how to do that. We haven't quite figured it out yet.

One of the most interesting trends, I think, and the backdrop to all of that, is what people today call data products. By that, I basically mean people using data, whether it's for internal reports, for data teams to make decisions, or for customers who are using data. Maybe that's a dashboard that your customers use, or your website. There are lots of examples of how customers actually use data, and once we start putting that data out there, it'd better be accurate.

My personal story: prior to Monte Carlo, I was at a company called Gainsight. I was fortunate to be part of the category creation story there. At Gainsight, we helped organizations reduce churn and increase renewal rates and upsell; basically, increase customer happiness based on data. I was responsible for the team that was using data, both to make decisions internally and to surface to our customers. And the problem was that the data was wrong all the time, literally all the time.

Fast forward to today, there are some examples of where data being wrong is really painful. One of the most notable examples from that time, 2016, was that Netflix was down for 45 minutes because of duplicate data. Netflix being down for 45 minutes is a really long time.

Eldad: I remember that moment.

Barr: Do you remember? Where were you in that moment?

Eldad: I was too young. I couldn't enter Netflix.

Benjamin: And also where are you like now?

Eldad: You mention Benjamin?

Barr: Eldad, you actually flew out of the picture at some point. I didn't know if…

Eldad: Oh, really?

Benjamin: You became invisible.

Barr: You literally flew out. You exited on the left. It was very dramatic. There you go. Okay, now you're back.

Benjamin: Welcome back.

Barr: Oh, now you flew out again.

Eldad: Okay, I'm going to fix it in a second. Give me a moment.

Barr: No problem.

Eldad: Please go on.

Barr: I thought it was by design. I was like, “oh, that's a cool feature.”

Eldad: Okay, either we have no electricity or the equipment doesn't work. That's the reason.

Barr: Yeah, no problem. Where were you?

Benjamin: Go on, right? Netflix being down for 45 minutes.

Barr: Yeah. Everybody should remember that day. But guess what? They were down because of duplicate data. How crazy is that? We're in a world where data being down is the main culprit, or even worse than applications being down, and in that world, you need to make sure that your data's accurate; otherwise, your applications and infrastructure are going to be down.

And that is a big change that has happened over my career in the last decade or so. I remember this was in 2016; I was responsible for a data team that was using data. And as I mentioned, the data was wrong all the time, and I tried to fix it and I was like, this is so freaking hard. I remember going into a room with a whiteboard and trying to draw things, and I'm not an engineer, and I'm like, this is so terrible. Why is this so freaking hard? And I remember asking our customers too, and they were like, “yeah, there's just no way to do this. We just have 6 eyes on every report.” And I was like, really? We need 3 or 4 different people to read every single report? That's where we were at.

Eldad: That's a BI mindset.

Barr: Yeah, exactly. Is that normal? At some point, you don't know if you're crazy, or the world is crazy, or both. Like, what's actually happening?

Basically, that inspired me and my team to try to build something: me and a person on my team named Will Robbins. He's actually at Monte Carlo today, leading work on product and customer success. So, it's pretty cool to continue that path.

The bottom line is we hacked something together that was pretty crap, to be honest, but it worked well enough and was better than anything we had at the time. Then we implemented it with some of our customers, and it worked well for them too. And I was like, “Hey, why don't we get someone who's actually an engineer to build this and see what happens? Can we build something?”

And then when we looked at our software engineering counterparts, they had solutions like New Relic and AppDynamics back in the day, and Datadog now, and you'd have to be really irrational to build an engineering team without some observability platform or observability measure. So why are data teams releasing data out into the wild without something like observability, or ways to make sure that data is accurate?

That basically inspired starting Monte Carlo. We spoke with hundreds of data teams, asking them what's keeping them up at night, and kept coming back to this problem: literally, people sweating on Monday morning because they're going to share a report and they're not sure if the data is right. I don't know if you've had that, but you can hear your heart beating and you're like, “oh man, is this going to be right? I don't know.” Who's going to call it out?

Eldad: We talked to the same people. They couldn't sleep at night. We took the data warehouse part.

Barr: Okay!

Eldad: You're saying so many things that resonate so well. Data was serving consensus and BI, and it was all about having multiple versions of the truth, and we couldn't get out of it. If you look at companies over the last four or five years, they have actually started to sell their data, utilize their data, and build products on top of their data, and we're moving from having three people looking at a dashboard, in the best case, to having tens of thousands and hundreds of thousands of people, and you can't apply the same mindset anymore.

Also, you compared Monte Carlo to Datadog and others. I don't think so. Whenever I hear Monte Carlo, it's a whole different ball game: it's less about classic observability and really all about data observability and data quality. Data quality needs to come first, before data gets out to the customer; observability comes after. So it serves different needs. This is why I love Monte Carlo: to me, it's the first company that looks at data, looks at how users use data, and rethinks observability from scratch. And it's also nice that you eventually brought in a real engineer to build something serious out of it, which is amazing, and we hear about you all the time.

So, tell us about the moment you realized that nobody understood you when you drew that on the whiteboard, including the customers. How did that feel?

Barr: I'm not sure it was that dramatic, but the experience of working with customers when the data was wrong all the time was terrible, really bad. That was hard for me personally. I would get emails from people like, WTF, what the fuck? Why is the data wrong?

As a data professional, you're like, “I had one job: just get the data right.” When you can't get that right, it's very frustrating. And when I talked to other people, they were like, “yeah, that's kind of just what happens.” We sort of came to accept that as normal. Like, yeah, it's wrong and that's okay. I remember living in that period and thinking, “there's just no way that I can accept that.” I don't think that makes sense. We have to get over that.

The days when it was fine for the data to be wrong are over; it's just not cool anymore. Data is used, for example, for reporting numbers to the Street, and there are companies that almost reported wrong numbers to the Street. That happens, as do companies that actually lose millions of dollars because the data is wrong. I'll give you an example, one of the most notable from the last few months. Unity is a gaming company that basically made one mistake with their ad data, and that one mistake cost them a hundred million dollars. One mistake! That's a big freaking deal. It's not multiple issues over a long period of time. One issue: a hundred million dollars. That's crazy! And that was made public. Think about all the issues that are not made public. They're way worse.

To give you an example that's more on the social side: Equifax, for folks who aren't familiar, is a credit score company. It basically assigns credit scores to users, which allow those users to take out loans and mortgages, basically to live their lives. It was made public that Equifax issued millions of wrong credit scores to users based on wrong data. That means millions of users have the wrong credit scores today, and literally…

Eldad: Inflation.

Barr: Yeah, exactly. They can blame inflation or whatever it is. Think about the impact bad data has on people's personal lives. Even one person who's impacted in that way is terrible. Think about millions of people having that happen; that is horrible.

I think there are a couple of trends here. One is that we're using data more and more. To your point, it's not just three people on some team looking at data once a year, maybe once a quarter; it's actually millions of people using the data, and the stakes are way higher. They're not just looking at a dashboard. They're actually making decisions based on it, or trying to get a mortgage based on it. In all those instances, the data being wrong is way, way worse.

I think there's just this point that we've crossed in the last few years. We can't look back. We can't unsee what we've seen with data now.

Eldad: We went all in on data and it's too late to go back.

Barr: Exactly.

Eldad: 20 years? From a product perspective, data is moving around so much, and there are so many hands involved: apps, scripts, processes, steps, and in each step, data gets changed, expanded, and enriched. Where do you fit in? If I'm a data warehouse or an engineer, can I use you as a data source to make sure that my data flows correctly, no matter where it's coming from? I want to use you as my formal data source. Does that connect to my JDBC driver? How does it work from a product integration perspective?

Barr: Yeah, for sure. A few things. First of all, I'll use this as an opportunity to explain the difference between data quality and data observability. A few years ago, when data teams were thinking about data quality, they were really pulling a report from SAP or something, where they had all the data in one place. You just had to dump the data once, and then you would use it once a quarter or something like that.

In those instances, I think the concept of “garbage in, garbage out” was really popularized, and that made data quality really important. I think that was the rise of data quality. During that time, it was very important for data teams or analysts to make sure that the data was accurate at a single point in time, and that's it!

But the world we're in today, which is a pretty cool world by the way in terms of how we use data, is one in which there are so many people using data. There are engineering teams upstream making schema changes or changes to code that have downstream implications. There are data engineers. There are data scientists, analytics engineers, and machine learning engineers. The list is long, and each of these people wants to use data and needs to be able to use data.

A common thing that customers ask us is: how do I democratize data health, or data quality? How do I make sure that the ability to know that data is accurate is not just for one person on the data team, but for everyone who's working with data? I think that relates to your question about where data lives: because there are so many people working with data, data also lives in different places. In order to make sure that your data is actually accurate and trusted, it's no longer sufficient to look at data in one place. You need to make sure that the data is accurate wherever it is. That may be your data lake or your data warehouse. You also want coverage of your ETL orchestration solutions, to make sure that the changes you're making there are not affecting data quality, as well as your BI. At any given point in time, you have to be aware of the changes made to data and make sure that data remains accurate and trusted.

I'll tell you a little bit about the history and evolution from my perspective. Before starting Monte Carlo, when I personally experienced this and noticed that all of my customers experienced it too, I decided to leave Gainsight and start a company. In fact, I started 3 companies in parallel, working on two other ideas that were totally different, to see what kind of pull each one had.

In those early days, in order to test this idea of data being wrong, I reached out to lots of people and asked them: does this ever happen to you? It was really hard to explain what was happening, and I was looking recently at some of the wording that I used. It was like “reports breaking” or “dashboard is wrong,” and that reality has not changed. Those words still strike a chord with data teams. You still get dashboards breaking. You still have reports going wrong. The thing is, it's not just because of one source. Because we now have data coming from your data warehouse and your data lake, going through so many different transformations, it's actually hard to trace why the report broke or why the dashboard is wrong.

But that core problem actually hasn't changed, and I think we can trace it back to years ago, when people started to use data and to ask themselves, can I actually trust this data?

Eldad: God bless, copy and paste. The source of all evil!

Barr: Exactly. That's right. Multiple sources everywhere.

Benjamin: Awesome! Can you give some concrete examples, say I integrate with Monte Carlo today? What are the actual types of insights you can provide as a product?

Barr: Yeah, totally. I'm trying not to pitch Monte Carlo here. But basically, I'll give specific…

Eldad: We're hearing about existing and future product features from Monte Carlo. So, if there are startups out there trying to compete with Monte Carlo, now would be a good time to listen carefully.

Barr: Great sound bite. Not going to happen. Just kidding!

Here's how to think about what data observability actually means, and I think we touched on a couple of those points already in very tactical terms.

One is that it has to have end-to-end coverage. By that we mean wherever your data is, it has to cover it. For some data teams, that might mean Firebolt or other competitors, not to be named, but others like AWS, Snowflake, Databricks: different types of places to store, aggregate, and analyze your data, as well as your orchestrator. So you might have dbt, just to use that as an example, and let's choose a BI tool. Let's say you have Looker. For many modern data teams, that would be a modern data stack. Having a solution that can cover all three of those categories (data warehouse or data lake, ETL orchestration, and BI) is very important. Again, because data lives in each of those places, having integrations that work with each of them is very important for your observability platform, whether that be Monte Carlo or not.

Second important thing is what does it actually do? Or how easy is it to get started with?

Oftentimes in data quality solutions in the past, everything was manual, so you had to manually specify the thresholds that you want for your rules, and you had to manually specify what lineage looks like. That doesn't cut it anymore. Not even close. Solutions need to be automated, and so within 24 hours, you need to have a view of your lineage, both upstream and downstream, both table and field level and you need to have a baseline for what healthy tables look like.

For example, if you have a particular distribution of a field, let's say there is some specific null rate. It is possible to learn automatically what the acceptable null rate is, and let users know if that's being violated, without any end user specifying it. Having these sorts of out-of-the-box machine learning elements within 24 to 48 hours of getting started is super important.

The third thing is what we call the “5 Pillars of Data Observability.”
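To make the null-rate idea concrete, here is a minimal Python sketch, assuming the “learned” baseline is simply the mean plus k standard deviations of historical daily null rates. That heuristic, k = 3, the sample history, and the function names are hypothetical illustrations, not Monte Carlo's actual model.

```python
from statistics import mean, stdev


def null_rate(values):
    """Fraction of values in a column sample that are None (missing)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def learned_threshold(history, k=3.0):
    """Upper bound on an acceptable null rate, learned from past daily rates.

    Assumed heuristic: mean + k * sample standard deviation of the history.
    """
    mu = mean(history)
    sigma = stdev(history) if len(history) > 1 else 0.0
    return mu + k * sigma


def null_rate_violated(history, todays_values, k=3.0):
    """True if today's null rate exceeds the threshold learned from history."""
    return null_rate(todays_values) > learned_threshold(history, k)


# Hypothetical history: this field is usually 1-2% null.
history = [0.010, 0.020, 0.015, 0.012, 0.018]
print(null_rate_violated(history, [None] * 20 + [1] * 80))  # True (20% null)
print(null_rate_violated(history, [None] * 2 + [1] * 98))   # False (2% null)
```

No user had to type "alert above 2.7% nulls"; the threshold falls out of the history, which is the point being made about automation.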

Those are:
Freshness.
Volume.
Schema.
Data Quality, and
Lineage.

I touched on each of them, but let me explain the history and why they've come together. I mentioned we spoke to hundreds of data leaders before we even started the company. By now, we've spoken to thousands of them. We've worked with thousands of data engineering users and hundreds of data teams, and asked them: what are the main reasons data goes wrong? And what does it look like when you're trying to troubleshoot and resolve it?

In reality, there's a lot more commonality than people think. I think one of the main objections that folks have to observability is that they think they're a snowflake and their data is different. And yeah, that's true. You are a snowflake. You're very special and your data's different, but actually, there are some patterns you share with other folks in how you use data. And so there is a certain amount of help that automation can introduce.

By codifying these 5 pillars, we are able to capture what the best data teams look for when automating observability.

The first one, “freshness,” is pretty straightforward: is your data up to date? That's the easiest way to summarize it.
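A freshness check can be sketched in a few lines of Python. This assumes you already know when a table was last updated and how often it is expected to update; the `tolerance` multiplier (alert once the lag exceeds twice the expected interval), the table name, and the function name are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def freshness_alert(table, last_updated, expected_interval, now, tolerance=2.0):
    """Return an alert message if `table` looks stale, otherwise None.

    Assumed rule: a table is stale when the time since its last update
    exceeds `tolerance` times its expected update interval.
    """
    lag = now - last_updated
    if lag > expected_interval * tolerance:
        hours = lag.total_seconds() / 3600
        return (f"{table}: expected an update every {expected_interval}, "
                f"but none for {hours:.1f} hours")
    return None


now = datetime(2023, 1, 10, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
# An hourly table that has not updated for seven hours triggers an alert.
print(freshness_alert("orders", now - timedelta(hours=7), hourly, now))
# An update 30 minutes ago is within the expected cadence: prints None.
print(freshness_alert("orders", now - timedelta(minutes=30), hourly, now))
```

A real system would learn `expected_interval` from the table's update history rather than hard-coding it.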

The second is “volume,” which is basically: is the volume of data that you have in line with historical patterns, or what you'd expect it to be? Literally, in terms of the size of the file or the number of rows.
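Volume monitoring can be approximated with a simple baseline comparison. This sketch assumes a list of recent row counts for a table and flags today's load when it deviates from their median by more than a relative `tolerance` (50% here); the median baseline, the threshold, and the sample counts are illustrative assumptions.

```python
from statistics import median


def volume_anomaly(recent_row_counts, todays_row_count, tolerance=0.5):
    """True if today's row count is far from the recent median.

    `tolerance` is the allowed relative deviation, e.g. 0.5 means +/-50%.
    """
    baseline = median(recent_row_counts)
    if baseline == 0:
        # With an empty baseline, any rows at all are unexpected.
        return todays_row_count != 0
    deviation = abs(todays_row_count - baseline) / baseline
    return deviation > tolerance


recent = [10_000, 10_200, 9_900, 10_100, 10_050]
print(volume_anomaly(recent, 4_000))   # True: about 60% below the median
print(volume_anomaly(recent, 10_300))  # False: within normal variation
```

The median is used instead of the mean so that a single past outlier load does not skew the baseline.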

The third is “schema changes.” Schema changes are a big deal, for folks who know. It's also tied to a different trend called data contracts; we might touch on that later. Schema changes cause problems. They are the bane of existence for many data teams, so tracking schema changes in an automatic way is a huge help. Fun fact: it's one of the very first things that we built in the product, because knowing about schema changes was so impactful for data teams.
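Automatic schema-change tracking boils down to diffing snapshots of a table's columns over time. Here is a minimal Python sketch, assuming each snapshot is a `{column_name: type}` dictionary (for example, read periodically from a warehouse's information schema); the function name, message format, and sample columns are hypothetical.

```python
def diff_schema(old, new):
    """Compare two {column_name: type} snapshots and list the changes."""
    changes = []
    # dict.keys() views support set operations such as - and &.
    for col in sorted(old.keys() - new.keys()):
        changes.append(f"removed column {col}")
    for col in sorted(new.keys() - old.keys()):
        changes.append(f"added column {col} ({new[col]})")
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(f"changed type of {col}: {old[col]} -> {new[col]}")
    return changes


yesterday = {"id": "INT", "amount": "FLOAT", "created_at": "TIMESTAMP"}
today = {"id": "INT", "amount": "VARCHAR", "updated_at": "TIMESTAMP"}
for change in diff_schema(yesterday, today):
    print(change)
```

A type change like `amount: FLOAT -> VARCHAR` is exactly the kind of upstream edit that can silently break downstream loads.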

The fourth is “data quality,” which you can think of as coming from the world of profiling: making sure that the data itself at the field level is accurate, with the values that you expect. I talked about null rates; there are also unique IDs, etc.

Then the fifth is “lineage,” which brings it all together. I talked a little bit about lineage, but the power of lineage is that when data goes wrong, the first thing people ask themselves is, where is the problem, and who cares about this?
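A field-level check such as “IDs should be unique” is easy to express in code. The sketch below assumes rows arrive as Python dictionaries and that the key column is called `id`; both, along with the sample rows, are illustrative assumptions. It flags the kind of duplicate data Barr mentioned earlier.

```python
from collections import Counter


def duplicate_keys(rows, key="id"):
    """Return the sorted list of key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}, {"id": 3}, {"id": 3}]
print(duplicate_keys(rows))  # [2, 3]
```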

We talked about everybody hoarding data and using a lot of data. If you're dumping data that nobody cares about and no one's using, maybe only Eldad is looking at it at 6:00 AM, that's it! No one other than Eldad. So, maybe it doesn't matter; maybe it doesn't have to be accurate. But if Ben is looking at the data, or if Barr is looking at the data, then you really want to make sure the data is accurate, or maybe your customers are using it. So it matters to be able to answer the question: “Hey, is there anyone downstream who is using this data? Who should care about it?” If so, this should be a high priority. You can then start thinking about assigning automatic severity to your data incidents and saying, “Hey, there are specific conditions under which this data is held to a higher standard for being accurate and trusted.”

Bringing it all together, I think that's what makes a strong data observability platform, and we see amazing data stories from customers. I'll just use JetBlue as an example, for folks who've been flying a lot recently for the holidays, with the snowstorm and everything; there's a lot going on. A company like JetBlue manages tons of data. They use data both to drive their operations, whether it's flight times, where your luggage is, or what your connecting flight is, and also to manage their support. The data team at JetBlue is a great story of a team that's very, very thoughtful about making sure that data's accurate, so your flight is on time, so you get your luggage, etc.

Working with the JetBlue data team has been really cool. For example, they maintain a 100% status rate on incidents every single week: literally every week, they go through each and every incident in Monte Carlo and triage it to make sure that the data quality issue is resolved, and they've been able to significantly reduce the number of data downtime incidents as a result.
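Lineage-based severity can be sketched as a graph walk: find everything downstream of the table with the issue, then raise the severity if any of it is customer-facing. This is a minimal sketch assuming lineage is available as a `{asset: [downstream assets]}` dictionary; the asset names and the three-level high/medium/low rule are hypothetical, not how any particular product assigns severity.

```python
from collections import deque


def downstream_assets(lineage, source):
    """Breadth-first walk of a {asset: [downstream assets]} lineage graph."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against revisiting and cycles
                seen.add(child)
                queue.append(child)
    return seen


def assign_severity(lineage, source, customer_facing):
    """High if a customer-facing asset is impacted, medium if anything is."""
    impacted = downstream_assets(lineage, source)
    if impacted & customer_facing:
        return "high"
    if impacted:
        return "medium"
    return "low"


lineage = {
    "raw.events": ["stg.events"],
    "stg.events": ["mart.ads", "mart.internal"],
    "mart.ads": ["dash.customer_ads"],
    "mart.internal": ["dash.internal_report"],
}
customer_facing = {"dash.customer_ads"}
print(assign_severity(lineage, "raw.events", customer_facing))     # high
print(assign_severity(lineage, "mart.internal", customer_facing))  # medium
```

The same `downstream_assets` result doubles as the “impact radius”: the list of reports and teams to notify.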

Benjamin: Awesome! Looking at data lineage especially, what does adoption of a data observability tool look like in big companies? Do you need buy-in from everyone and then integrate it across your entire stack, because that's how you get the most value? Or can a single data team in your big company integrate it with its local part of the stack and already get value?

Barr: Yeah. Every big journey starts with small steps. I'm definitely in the camp of starting small and growing from there, mostly because in any organization you want to show value and have a win story really quickly. I think that's true for anything, not just observability, but in data in particular, if you put yourself in the shoes of data teams in large organizations, they're in a tricky spot, and I'll explain why. They invested a ton in the last few years in the best data infrastructure: world-class data warehouses, data lakes, world-class infrastructure. They also invested a ton in hiring lots of people to use that data. We've seen the rise of data scientists. All of those roles have been growing a ton in the last few years.

Here's the tricky spot! Now, you need to deliver. You need to show that you can use the data, and that's a new reality, which I don't think many organizations have proven they can meet. You're on the hook to show the ROI of all that investment. Oftentimes, the ROI isn't there because people don't trust the data or can't use it.

I would say that to avoid being in that situation, it's really important to start thinking early, as you're rolling out your data infrastructure, about how to make sure people also trust the data. For larger organizations, yes, I do think that starts with identifying a particular use case or a particular team. For example, it can be a team that's supporting financial data in particular, which is very important. If you're reporting on revenue growth or customer growth, you want to make sure that there are no questions about that data. Maybe start with that. Sometimes there's a marketing team that you might want to start with. If they have tens or hundreds of millions of dollars in budget, you want to make sure that money and those resources are deployed correctly.

Another example is companies in healthcare, where you potentially have production data and want to make sure that clinical data is accurate. In all of those instances, start small, and make sure that folks have the technology but also the mindset. Think about it, going back to Eldad's point from before: maybe the difference is that in the world of Datadog, observability in engineering is something lots of folks have done before, but for data teams, it's a new motion. So understanding who's responsible for what, and what you actually do, is a totally new ballgame for most people.

So, most of the work that we actually do is helping organizations think through what a great data observability practice looks like. Oftentimes that has nothing to do with the technology. It's more about the practices and the culture in the data team.

Eldad: Love it.

Benjamin: Definitely! Take us through the action part. Say I'm a data engineer and I get an alert in my Slack channel. I don't know exactly how it works, but say this upstream data pipeline is looking weird right now. What actually happens? Is it about spotting failure early so you can keep the surface area small? What's usually the immediate action you take when you see these types of alerts?

Barr: Yeah, great question.

Yes, oftentimes teams actually love getting that information directly, and that might be in Slack or Teams, or even via email if you'd like. The large majority, I think, use Slack today. You get an alert in Slack, and that could be something like, “Hey, this table, which typically gets updated every hour, stopped updating and hasn't gotten an update for the last, let's say, seven hours.”

There are a couple of things that you can do. First of all, there's an automatically assigned severity. Maybe you see that it's high severity, which might be a situation where you drop everything and take a look at it immediately. On the contrary, maybe this is a data set that you know is not being used or looked at, and then maybe you can ignore it for now.

Maybe you can actually snooze it until tomorrow, because today you're working on a production issue and you need to come back to it later. Assuming you didn't snooze it, and you determined that this is actually high severity based on the assigned severity and the assets it impacts, what you can do next is look at the impact radius and start seeing who is impacted by this type of issue.

For this particular table, there are three reports in your BI that are actually using it, and the particular team using them is a marketing team. Maybe what you could do is tag the person on that team and say, “Hey, there's an issue here, FYI, I'm investigating.”

Then what you could start doing is click into that alert and start investigating: “What else is happening around that time? What else is happening to that table that might give me clues as to why this issue is happening?” And then maybe you see, okay, the job is running, but actually no data is arriving, so maybe there's a problem with the load that's feeding data, just as an example. Or maybe you notice that someone upstream made a schema change, and because of that schema change, data's not arriving anymore. Maybe they changed the field type, and that messed up the scheduling of the data into that particular table.

On the other hand, maybe when you look at the downstream implications, you recognize that you need to reload all the data to fix it and make the report accurate now. There's actually a lot of work that might be involved, both in understanding why this issue happened and in making sure that the data is back to normal or back on its schedule.

It's a combination of looking at the data itself, looking at metadata, and oftentimes looking at the code that's driving all of this to make sure that the data pipeline itself is healthy. So, you might be moving between those different systems. At Monte Carlo in particular, we try to bring all of those different feeds into one place, so you get both your alerts from Monte Carlo and your alerts from dbt in one place, for example. If you're using dbt in this example, you might be notified about that same issue by dbt too, and you'd see both in the same place.

Bringing all of that together, maybe upon investigation you understand the root cause of the problem. You understand how to resolve it, and now the problem is that you're not responsible for the table itself, so you can't actually fix it. You need to find the person who owns it. You can see that information as well, and then ping them and work with that person to fix the issue.

At a high level, there are three different things that we're trying to help users do.

One is to know about the problems almost in real time. Just to give you insight into this, most teams are not even aware of these issues, and so getting that alert in Slack is a big deal.

The second thing that we help with is resolving these faster. Oftentimes, data teams won't have all this information about schema changes, freshness, or volume, all the examples that I mentioned, and it would be harder for them to pinpoint what exactly is happening and why.
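To make the freshness and volume signals mentioned above concrete, here is a minimal sketch, assuming you already have a table's last load time and its recent daily row counts. The thresholds (`max_delay`, a 50% `tolerance` against the median) and the function names are hypothetical choices for illustration, not Monte Carlo's actual detection logic, which the conversation doesn't describe.

```python
from datetime import datetime, timedelta, timezone
from statistics import median
from typing import List, Optional


def check_freshness(last_loaded_at: datetime, max_delay: timedelta,
                    now: Optional[datetime] = None) -> Optional[str]:
    """Return an alert message if the table hasn't loaded within max_delay."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_loaded_at
    if lag > max_delay:
        return f"stale: last load {lag} ago (allowed {max_delay})"
    return None


def check_volume(recent_counts: List[int], latest_count: int,
                 tolerance: float = 0.5) -> Optional[str]:
    """Return an alert message if the latest load is far below the recent median."""
    if not recent_counts:
        return None  # no history yet, nothing to compare against
    baseline = median(recent_counts)
    if latest_count < baseline * (1 - tolerance):
        return f"low volume: {latest_count} rows vs. median {baseline}"
    return None


if __name__ == "__main__":
    now = datetime(2023, 1, 10, 12, tzinfo=timezone.utc)
    last = datetime(2023, 1, 10, 3, tzinfo=timezone.utc)
    print(check_freshness(last, timedelta(hours=6), now))
    print(check_volume([1000, 1100, 950, 1050], 0))
```

A job that "runs" but delivers zero rows trips the volume check even when the freshness check passes, which is the "job is running, but no data is arriving" case from earlier in the conversation.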

The third thing is that by increasing communication and collaboration on this, you're making it easier for everyone who's touching data to know about data downtime issues, and eventually to actually prevent these from happening.

Eldad: Would you say that a few years from now, we will use less of the classic black-box observability mindset plus Jira, and more of this new context-driven, lineage-driven observability? Listening to you, it's so, so different from looking at CPU or disk utilization. Someone altered a table, and that table affects a hundred Looker dashboards, but that's not what's important. What's important is that one of those dashboards serves a customer. There's no way to know that without spending three weeks and involving at least six people, using tons of tech, and companies try to do it on their own. They try to stitch it together. They try to build it, and it's really sad.
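The lineage question Eldad raises, which of the hundred dashboards behind an altered table actually serve a customer, can be sketched as a graph traversal. This assumes lineage is already available as a hypothetical mapping from each asset to its direct downstream assets, plus a set of assets tagged as customer-facing; all names below are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def downstream(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search for every asset reachable downstream of `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against revisiting and cycles
                seen.add(child)
                queue.append(child)
    return seen


def customer_facing_impact(lineage: Dict[str, List[str]], start: str,
                           customer_facing: Set[str]) -> Set[str]:
    """Keep only the impacted assets that are tagged as customer-facing."""
    return downstream(lineage, start) & customer_facing


if __name__ == "__main__":
    lineage = {
        "raw.orders": ["stg.orders"],
        "stg.orders": ["mart.revenue", "mart.ops"],
        "mart.revenue": ["dash.exec_kpis", "dash.customer_portal"],
        "mart.ops": ["dash.ops_daily"],
    }
    print(customer_facing_impact(lineage, "raw.orders", {"dash.customer_portal"}))
```

Breadth-first search is just one reasonable choice here; the point is that once lineage is captured as data, the "which customer is affected?" question becomes a query rather than a three-week investigation.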

But listening to you makes me happy because it is a real thing, and as engineering organizations are trying to become data organizations, I think they have a lot to learn from the data engineering mindset, where this is taken very seriously. So when someone is modifying something in the schema, that's like code, that's like a product release. And you have data observability that understands the schema, that understands the data warehouse, that knows what an information schema is. It's not just raw JSON pushed into some time series database, which is meaningless for most people. I love it! It's fascinating!

Barr: Yeah, I think that's spot on. I think as a data industry, if we're successful, it's because we adopted what we need from engineering practices and built that into data engineering. It's going to be impossible for us to actually really, truly become data driven if we don't do that. So, yes, and I think our approach is that starting with the observability is the right place because it's the most imminent problem for engineers.

If you look at data engineering today, the most imminent problem that they have is, “hey, the data is wrong all the time.” That's the number one thing that they have as an issue and as a result, data consumers can't trust the data and can't use the data.

I think there's a lot more that we can learn from engineering, and over time we should bring a lot of those concepts into data, but starting with observability is what we think will bring the most value to customers today.

Eldad: Nice.

Barr: And I'm curious to hear more about how you all think about it at Firebolt, so feel free to share if you're up for it.

Eldad: We feel the pain and the need to make sure that data quality is right. We've seen some really scary stories. You've mentioned some of them, but data is so actionable today that customers use data-driven products, and their businesses run on those products. If I'm subscribing to a sales optimization product, and that product gives me a recommendation on how to do my ad spending, and I go and make that ad spend based on that recommendation, and it's wrong, that affects my business, and it's real. So, it's much more serious than just looking at two different sales numbers, one coming from Salesforce and one from Excel, which is also important. We've been dealing with that for 20 years. But the beauty is that data is really driving the business now, versus just being used to understand the business.

So, data quality should be kind of a given in the core stack of any data-driven team. Listening to you and looking at what you're doing there, we really want and wish Monte Carlo to succeed big time. And having said that, maybe we can wrap up with that.

If you can share some insights, some recommendations, something with our fellow startups and our fellow users: how do you see this year shaping up? What would you recommend for startups going forward? Anything valuable would be highly appreciated.

Barr: I can share only non-valuable advice.

Eldad: Perfect!

Barr: Actually, on that note, for startups or listeners, I will start by saying that you should not listen to advice. That's my biggest takeaway. For any question that you have, in data, in startups, and otherwise, 50% of people will tell you one thing and the other 50% will tell you the opposite. So, really, I think the answer actually lies in the data. There's no one else but you to take a look at the data, listen to your customers, see what it tells you, and actually draw conclusions based on that. For folks who are asking themselves what this means for them, for their team, and for the industry: you have the data, you are closest to the customer. Pick up the phone, talk to the customer, ask them what they think. That is the data that will help you get to the answer, whatever that may be. That's my take on advice.

With that in mind, looking ahead to this year, I think this year is sort of a natural progression in the importance of data teams. Obviously, it's no secret that the market has changed, and these are difficult times for lots of folks across the economy, and yet I continue to see data teams growing stronger and stronger in organizations. That means companies are doubling down on their data strategy, building more and more data products, and investing in data teams, because they recognize that data is the foundation for making strong business decisions, gaining competitive advantages, and generally furthering business outcomes for their customers. So when we zoom out and ask where data teams are today, I think they're on the hook to deliver ROI on all of their investments over the last few years, and this is a time when they are going to face more scrutiny. I do think that folks will be asking themselves, is there actually ROI on this? And it's a great time for data teams to be able to articulate that and say, yes, we have awesome data, we have best-in-class infrastructure, and we can also trust the data, so we can actually use it, to your point, to drive the business. That's an inflection point that I'm really excited for data teams to bring. I hope 2023 will be the year that we see that. I don't think we can see it soon enough, but I do think it's an inflection point that the data industry needs to drive, and there's no better time than this year to do it, because data teams are more important than ever. Data is more important than ever, and the onus is on us to prove that today. So, I'm excited to see us do that.

Eldad: Boom!

Benjamin: Awesome! Those were amazing closing remarks. Barr, thank you so much for your time. We really appreciate it. Have a great rest of the day, and see you next time on the Data Engineering Show.

Barr: Thanks, great podcast.

Eldad: Thank you everyone, looking forward.

Benjamin: Thanks.
