The Data Engineering Show

Archana Ganapathi, Head of Data & Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies like dbt that may be outside their comfort zone.

Show Notes

Archana shares Eventbrite’s data stack modernization process and how to get engineers to adopt new technologies like dbt that may be outside their comfort zone.

What is The Data Engineering Show?

The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2 Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries contact tamar@firebolt.io

Archana Ganapathi
Head of Data & Analytics Engineering at Eventbrite

Boaz: A little bit about Archana. Archana is Head of Data Analytics, Data Engineering, and Data Science at Eventbrite. She has been there for almost a year, spent a lot of time at Splunk before, knows data inside and out, and she is a little bit upset with us for not using Eventbrite for this event. We cannot please everyone. Thanks for keeping your smile.

Archana: Will you promise me that next time around you are going to use Eventbrite?

Boaz: I do not know, talk to marketing about that.

Eldad: Disqualified us for being a startup.

Boaz: Archana, tell us about what you came to do at Eventbrite.

Archana: I joined Briteland, as we call it, about 10 months ago to lead all things data. Essentially, what this means is that traditionally we had data engineering in one place, data science sitting somewhere else, and analysts sprinkled across the board in functional business units. Eventbrite came to the realization that we need to bring all of these different roles and functions under one big umbrella, so we can think through this end to end, share context as much as possible, solve data problems and look for data opportunities more holistically, and leverage the scale of all the rich treasure troves of data we are sitting on.

Boaz: This is why Archana has one of the best titles, Head of Data Analytics/Engineering/Science at Eventbrite. We need to come up with a new name that can umbrella all of it.

Archana: It is all data.

Boaz: It is all data. So, maybe before we go back to Eventbrite: you spent part of your career at Splunk. What did you learn in that career at Splunk, across a variety of data positions, that you feel led you to this phase of being ready to take on this new challenge?

Archana: That is a very good question. I think if I reflect back, the journey started much before I even joined Splunk. During undergrad and grad school at Berkeley, a lot of the research I was engaged in was constantly centered around data. This was before big data was a thing or data science was even a formally recognized profession, so to speak. Really, it came down to how we take advantage of traditional computer science systems research and technology to allow us to scale up insights and get people value from everything we are accumulating, data of all shapes and sizes and forms. And then scale up compute, scale up storage, and make it easier to drive consumption. That was the backstory. Straight out of grad school, I thought if I do not join a startup now, I will probably never go back to it. So, I joined this late-stage startup, Splunk, which was then around 200 people. I was part of the core engineering team, building some of the platform capabilities to leverage the data that was being ingested, stored, and queried in Splunk. A few years in, I realized that maybe we were not solving the biggest pain points for our customers. So, I moved to the field to understand where the true pain points were for customers trying to use Splunk at that time. A lot of it came down to people and process gaps, but also some nice-to-haves, again, to drive self-service insights from the platform. Long story short, soon after we realized we should be drinking our own champagne, just like Slack uses Slack. Why shouldn't we use Splunk for internal insights? So I built up the data and insights team, advocated for heavy investments to instrument our product, collect even more data, and then triangulate it with instrumenting our processes to enrich the context and drive business value. After a good chunk of time, 11 years later, it was time for a change, and the pandemic was a forcing function to go and create that next chapter.

Eldad: Oh! It was a great excuse.

Archana: Yes, it was a great excuse. It was also partly about reflecting on where the gaps were in my own journey. Really thinking about the true scale of impact from data, I felt I needed exposure to something a bit more consumer rather than enterprisey, and hence Eventbrite. The mission here is to connect the world through live experiences, and what better north star for really leaning in on data to bring that to reality.

Boaz: I know that at Eventbrite there is a modernization process happening now. Tell us a little about what was in place when you came in, and what the data stack and environments look like.

Archana: Yeah, absolutely. A bit of history here. When Eventbrite started, it was primarily thinking about ticketing, so the data stack was designed primarily for ticketing, transaction management, reconciling our books, and those kinds of use cases. Over time, as the demand for insights and analytics grew, a lot of duct tape got fitted on top. In the early stages, everything got thrown into a MySQL database. Then slowly you see, okay, maybe we need some nicer pipelines here and there. So Spark on EMR was primarily used for compute, the query engine was Presto, storage was all our data moving around S3 and HDFS, and over time a lot of the metadata sat in Hive. Then Tableau was used for dashboarding and Luigi for orchestration. A lot of those were technology decisions that were right at the time, but if we really think about future-proofing the infrastructure, this is not the stack that will get us to the future state. So that is where we are right now, thinking about modernizing.
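
For readers who have not used Luigi, the orchestration style described here chains batch tasks together through file targets. A minimal, purely illustrative sketch in Python (the task names, columns, and paths below are hypothetical, not Eventbrite's actual pipelines) could look like this:

    import datetime

    import luigi


    class ExtractOrders(luigi.Task):
        """Illustrative extract step: write raw order rows for one day."""
        date = luigi.DateParameter()

        def output(self):
            # Each task declares a target; Luigi uses its existence to decide
            # whether the task still needs to run.
            return luigi.LocalTarget(f"data/raw/orders_{self.date}.csv")

        def run(self):
            with self.output().open("w") as f:
                f.write("order_id,amount\n1,42.00\n2,13.50\n")


    class AggregateOrders(luigi.Task):
        """Illustrative transform step that is daisy-chained after the extract."""
        date = luigi.DateParameter()

        def requires(self):
            # Dependencies are expressed by returning upstream tasks here.
            return ExtractOrders(date=self.date)

        def output(self):
            return luigi.LocalTarget(f"data/agg/orders_{self.date}.csv")

        def run(self):
            with self.input().open() as src, self.output().open("w") as dst:
                rows = src.read().splitlines()[1:]
                total = sum(float(r.split(",")[1]) for r in rows)
                dst.write(f"date,total\n{self.date},{total}\n")


    if __name__ == "__main__":
        # Run the small chain locally; production setups use a central scheduler.
        luigi.build([AggregateOrders(date=datetime.date(2022, 1, 1))],
                    local_scheduler=True)

Multiply tasks like these across many pipelines and the daisy-chaining, backfills, and replays Archana describes later become the day-to-day maintenance burden.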

Boaz: What do you think was your tipping point, that moment of realization of, hey, it is time to consider a change?

Eldad: You joining. It is as simple as that.

Archana: That is a part of it, but frankly, I think a lot of that demand also came from bringing data science and analytics under the same umbrella as data engineering and saying, "Hey, here is what we are trying to do. Here is where the current infrastructure is not meeting our needs." If you think about it, there is reporting, there is ad hoc analytics, and then there is really driving some of these insights from the data back into the product, data-powered product functionality, whether it is simple heuristics or fancy machine learning models. The requirements change for what needs to happen end to end to make that a reality and make it a good experience for our customers as well. I think that was the forcing function, and as part of that, we took a clean slate and said, okay, if we were to design this from the ground up, what do we need to solve for? Some of the pain points are really stuck in this chicken-and-egg loop: our platform is kind of keeling over, so we cannot support some of these use cases and scenarios, so hold off. But then, in the process of ripping off some duct tape and adding more stuff, you are still stuck in legacy, and more things snowball and cascade. Any incident or bug on the product side that impacts data quality, for instance, now becomes the data team's problem too: stop, undo, redo, replay, and it is days of cycle time to get back to a clean state. It is a domino effect where all of these happened together. And to your point, me joining was a good checkpoint to say, okay, you know what, let us figure out where we need to go.

Boaz: How do you go about that? The organization needs the existing stack supported, and then there is a huge journey ahead. How do you manage that? How many people are assigned to the new project, or are the same people working on both data stacks? Are there migration plans? Is there a deprecate-everything plan?

Archana: Yeah, that is a very good question. I am under no illusion that I have all the answers, but I can share what I have done. The first step, really, when I joined was to listen: get a better understanding of what people want to do, where they are getting stuck, where the challenges are, where their blockers are today, and then figure out how much of this is a fundamental infrastructure and technology problem, how much of it is a process issue, and how much of it is just knowledge and awareness. Do people even know where to start, and that there is a rich set of data and dashboards they can leverage already? Is there an access problem? Really just going through and figuring out where all the current challenges are, and then that turns into the requirements for what we need to build. Based on that, the first thing we realized was that we need to modernize our data warehouse. That is one decision we need to make. We also need to make it much easier to instrument and collect richer data at the right granularity, and this is upstream, back to the dev teams, to say, "Hey, this is the information that is missing that people need for what they are trying to do with the data." So, it is almost teasing apart data producers' requirements and constraints and data consumers' requirements and constraints, and then on my team's plate is: how do we build out a platform that simultaneously solves for both?

Eldad: Switching from XML to JSON.

Archana: No comments!

Eldad: Long term, long term that is long term.

Boaz: You mentioned that part of your origin story was putting the various teams under one roof. Tell us a bit about that. What happened? I mean, is there still a split between data science and data engineering? How was the change?

Archana: I think the key is really more communication, more shared context, and shared, aligned goals; that matters a lot. If you think about it, if incentives do not align, there is really no benefit to solving for something truly end to end. That is the first step, to say our biggest goal, our north star here internally, is to enable all Britelings, everyone internal to Eventbrite, to have access to the data they need to do their day jobs, arm them with the insights they need to run their own part of the business, and make sure we are that bridge between the data producers and data consumers. So, getting folks to talk to each other, put themselves in each other's shoes, and empathize with the challenges and with what each function is trying to optimize for, building that awareness, went a long way, because there were a lot of realizations that had not been happening before: "Here is where I am getting stuck." "I didn't realize that it is not a platform limitation; I just did not know that this was the best way to do it." Encouraging that dialogue.

Boaz: Now that some progress has been made since you started, what in your stack do you know for sure is on its way out? What are you getting rid of?

Archana: I think the top of my list is probably Luigi.

Boaz: Does everybody like the name?

Archana: It is a cool name. I must say that it is associated with the Mario brothers and pizza. Who doesn't like pizza?

Boaz: So Luigi is gone, what else?

Archana: I think next would really just be figuring out how we move folks from having to build pipelines in Spark to up-leveling that, so dbt as much as possible. Some of the bad behaviors were workarounds to the old stack, where folks were leveraging Presto as a way to shortcut into pipeline building, which is not the right thing to do in the longer term. That is another thing we will be doubling down on; dbt is on the radar.
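
For context, dbt models are usually plain SQL SELECT statements that dbt compiles into tables or views and wires together into a dependency graph. To keep the examples here in one language, the sketch below shows the Python-model flavor that dbt (1.3+) supports on Spark-backed and similar adapters, which can be one bridge for a team coming from Spark pipelines. The model name "stg_orders" and the columns are hypothetical; this is an illustration, not Eventbrite's actual setup:

    # models/order_totals.py -- an illustrative dbt Python model.
    # "stg_orders" and the column names are hypothetical upstream objects.

    def model(dbt, session):
        # Materialize the result as a table in the warehouse.
        dbt.config(materialized="table")

        # dbt builds the dependency graph from ref() calls, so lineage is
        # declared instead of hand-wired through ad hoc Presto queries.
        orders = dbt.ref("stg_orders")

        # On the Spark adapter this is a PySpark DataFrame, so ordinary
        # DataFrame transformations apply before dbt writes the result back.
        return orders.groupBy("event_id").sum("ticket_amount")

The equivalent SQL model would simply be a GROUP BY query in a .sql file; either way, dbt handles materialization, dependencies, and testing rather than a hand-built Spark job.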

Eldad: How do you do that? Because you have a team that writes Scala and builds engineering stuff, and then you ask them to switch to dbt, and it is such a different universe, different skill set, different people. How do you manage that transition within the team so that nobody gets freaked out too early?

Archana: I would almost turn that around. I think historically we were trying to force people who were not comfortable doing some of those things to use tools that were outside their comfort zone. Now we are saying we have the foundations to make it easier for you to do the things that you have to do. Just double-clicking on that a bit: we have a data platform and infrastructure engineering team, and an analytics engineering team. Analytics engineering should double down on driving consumption and building those gold-layer datasets, making everything easier downstream. But that was not necessarily where all their time was spent historically, just by nature of the tool stack and toolset we were using. So, I think the change is very welcome, and they are eager to modernize and learn the things that will make their lives easier.

Boaz: Also, the BI side is under Archana. Reports and dashboards will often have to be remade; are people worried about that?

Archana: That is a good question. Well, it is okay for right now, but I know that there are better tools out there that we need to lean into. I know Apun mentioned that Looker is one that they use; there are others as well that are much easier. I almost think there is not going to be one size fits all. Maybe I need to do the same exercise around what the preferred interface is for each of our data consumers. In some cases they want dashboards. Sometimes they just want the handholding: this is the thing you should focus on, and I will give you that full-service analytics experience to handhold you through insights from the data and how that should drive your decisions. The third flavor, I would say, is tools that people are already using, where they just want data fed into the tools they are comfortable with. For instance, sales prefer the CRM tools they are already using. So, the more we can push things automatically into the interfaces they are comfortable with, the more that is going to matter.

Boaz: That is going to be painful.

Archana: Well, everything is painful, but if we take it apart piecemeal and solve for each pain point, then we will see some things that we can generalize into patterns and solve for, and maybe that means bringing in a third-party tool. In other cases, they already have what they need in terms of the interface, and now it is just: how do you route things to the right people in the right way?

Boaz: What is one thing so far that you are super happy with, a specific pain that is already solved that you are just happy about?

Archana: I think people just love the partnership with the data team and really being able to lean in and say, "Tell me more proactively how I should be thinking about data," and really working through it end to end: what should I collect if these are the metrics I am solving for and optimizing for? Bringing in some of that end-to-end thinking. It is almost like being a consultant in many ways, and having more of the analysts channel their inner consultant in the way they approach solving for internal stakeholders.

Eldad: I miss the days when you just guesstimated stuff and used intuition and gut feeling, and now you get freaked out about following through on the wrong KPI. Makes sense. Life is getting more complex when data is...

Boaz: Migration stories, everybody. Some people probably think, ah, you know, I would not bother with the data stack, why should I care? One thing is for sure: if you have been building on a data stack for a while, a migration is coming at some point. So migration stories are more important than we think.

Questions from the crowd.

Audience Speaker 1: I have been consuming your data at Eventbrite for a long time and actually also using it, and that is pretty interesting. You have descriptive tags. Do you use a tool like Yokozuna and pipeline that into your warehouse? Because you can let people rank the terms that map back to the events they like most, or you let people say "I follow this" and they get notifications. So you have a feedback loop, not just for your internal consumers, but for your external users and event holders, where you can weigh which terms really map to the events people want to sign up for, register for, pay for, and convert on later. That is the area where you could literally let them follow the tags better. With a really good tool like this, super fast and text-based, you can drive more discovery of events earlier, you can get higher conversion, and they will probably pay for the marketing part.

Boaz: Hire him!!

Archana: Yeah, I am eager to hear more, but I also want to plug: I don't know if you've looked into our new marketing tools suite, but I highly encourage that.

Audience Speaker: I have downloaded the new Eventbrite app, but I was like, oh, it was painful enough to use the old app. So, I didn't look at it.

Archana: Okay, we will chat offline, but that is certainly something we are solving and moving forward on.

Audience Speaker: That is actually more useful for me. We will definitely talk about it.

Boaz: Anybody else?

Audience Speaker 2: Okay. What was the final nail in the coffin for Luigi, and why do you think there is any escape?

Archana: I would not say it is the final nail in the coffin. We still have not done that migration successfully.

Audience Speaker 2: Okay.

Eldad: There are multiple coffins here.

Archana: Yes, multiple coffins. Yes, multiple opportunities to be born again. How about that? It really comes down to the daisy-chaining of pipelines and the capabilities around automating that better, and that is where I think Airflow, frankly, has done leaps and bounds better, including in the user interface.
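
For context on that daisy-chaining point, an Airflow DAG makes the same kind of chain an explicit, scheduled graph with built-in retries and a web UI to inspect each run. The sketch below is illustrative only and assumes Airflow 2.4+; the DAG id, scripts, and dbt selector are hypothetical, not Eventbrite's actual pipelines:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Illustrative daily pipeline: extract, transform (e.g. via dbt), then refresh dashboards.
    with DAG(
        dag_id="orders_daily",           # hypothetical DAG name
        start_date=datetime(2022, 1, 1),
        schedule="@daily",               # "schedule" is the Airflow 2.4+ argument name
        catchup=False,
        default_args={"retries": 2},     # automatic retries instead of manual replays
    ) as dag:
        extract = BashOperator(task_id="extract_orders",
                               bash_command="python extract_orders.py")
        transform = BashOperator(task_id="run_dbt",
                                 bash_command="dbt run --select orders")
        publish = BashOperator(task_id="refresh_dashboards",
                               bash_command="python refresh_dashboards.py")

        # The >> operator expresses the daisy chain; the scheduler and web UI
        # track the state of every task in every run.
        extract >> transform >> publish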

Boaz: I have another question based on that. How do you feel data engineers in general respond to this? Do you feel the teams are excited to modernize, or is it more "I know what I am comfortable with, I am more concerned about the change, so I will keep doing what I do"?

Archana: That's a very good question. I have been doing a lot of thinking about that. I would say it is a 50-50 split. There are folks who are just in their comfort zone with the tools they have been using, and I know folks who have gone from one role to another who just want to follow the stack they feel happiest about or feel they are most knowledgeable about. Then there are other folks, and at least right now the data engineering team at Eventbrite is super pumped to just rip out all things legacy and modernize. It actually makes their lives easier day to day. And some of the more modern capabilities, like scaling up and scaling down to zero, the cost and performance benefits of that, and checkpoint and rollback recovery, those are the kinds of capabilities that people have lost sleep over not having. So that is where a lot of the excitement comes from, and the push for, yeah, let us do this as quickly as possible.

Boaz: Awesome.