Welcome to Energy 101 with Julie McLelland and Jacob Stiller. Join us on our mission to help raise the world's energy IQ.
0:00 Welcome to Energy 101, where we ask the dumb questions so you don't have to. So we love getting energy professionals in here and really just, um, dumbing down all the topics so we can understand.
0:13 Um, today we have Bobby Neeland. How's it going? Good. How are you doing? Well, thanks. Good. Bobby just recently started his own consulting firm called Jill Don analytics. Um, so today
0:25 we're going to be talking all about data and oil and gas and data and what they have to do with each other and the messiness and all of the fun. Yeah. Problems you solve. There's a lot to, a lot
0:35 to talk about there. Yeah. Yeah. I mean, data is one of those mundane things that is in literally every industry, every walk of life, especially these days, I guess oil and gas would be the
0:47 best example and with your expertise, the best way to, you know, describe how it works in your industry. And you can even start with, maybe not just explaining it generically It's kind of a
0:59 loaded question, but. maybe the life cycle from beginning to end. Like what is it that you do?
1:08 Yeah, no, I mean, I think even just starting off with, like, what is data? I think most people think of data as numbers in, like, a rectangular format, but I think more and more, and especially
1:16 you guys are seeing it with Collide, that almost literally everything is data now. An image can be data, a video can be data and all these things. And we can, you know, we can take PDFs now and all
1:27 these things that before were really difficult to get information out of. And at the end of the day, I think data is just, it's information that you have about, you know, different things. But I mean,
1:35 my goal is to take any and all of that and distill it down into a way that, you know, business users can interpret and use to make better decisions. You know, it's kind of the end goal. But I
1:47 mean, it's, and again, I think it's not anything new, but people said like data is the new oil kind of thing. But I think it, you know, it is a really good analogy in the sense of like you go from
1:57 like this kind of raw layer of data, and then it goes through all these processes that refine it, and then you get different products downstream of it that
2:07 are extremely useful to people. And again, it's not exclusive to the energy industry at all. I mean, it's, I don't know, you're big in the marketing side and like there's tons of people on the
2:17 big, I mean, that's actually big data there. When you start getting like click stream analytics and Google ads and all that stuff, it's like you're getting thousands of points per second or
2:29 whatever it is.
2:31 And again, like that's where being Facebook and Google and all these have been able to make billions or trillions of dollars because they have all the information on everyone in this room and
2:43 everyone out there and everything else. I mean, and now they're able to just monetize that over and over by, you know, they almost probably know more about me than I know about myself at some
2:52 level. Yeah, it's kind of scary. Colin was talking about, building this facial recognition thing, where when someone walks into the office, it'll play their theme song. And he was just talking
3:04 about how easy it is for him to build with that technology. And it's because all of our faces are already mapped. And it's terrifying. Well, I mean, even let's say on what, on Google Photos,
3:15 like I can filter on me. And then basically it finds all the photos that have my face. And it's crazy to work back. If you have like historical photos, old photos when you were in middle school
3:25 or high school, it actually did find it. Or my kids now that are growing up, right? It still recognizes that Joey is Joey, even though he was one and now he's five, you know, that all
3:34 those patterns kind of evolve. So crazy. Okay, I have a question. It's kind of top of mind. Is there a point where specifically talking about operators, where their data becomes like obsolete or
3:47 it doesn't really matter anymore? I think of like data like paper files from the 50s.
3:52 Do they need to do anything with that? Or is it kind of like you don't really need that anymore? I think with a lot of the data services, you don't need it until you do. Or, I mean, I think
4:02 forever people probably thought, I mean, and you're probably, I think y'all are still seeing it going into some of these operators. They've got file cabinets full of old documents that they still
4:10 have. And they probably thought it was useless for years and they've kind of hopefully kept it around. Some of them might have just shredded it eventually and said this is not worth
4:20 the couple of thousand pounds of everything that it's taking up. I
4:27 mean, it's probably the 80/20 rule. Most of it will probably never get touched again. But then there's going to be some stuff. Now you're seeing, say, even like Brian McDowell and Sabata
4:36 going and taking old logs and it's like, wait, this is really useful. And now he's got a data set that no one else has that people are like, I need that now because now if I want to do more
4:45 discovery or I can go back I can look through it. It's a historical analysis. Yeah, there's a lot of value there, but.
4:52 At the same time, I do a lot of work with operators and most of them are focused, maybe they might look back a year or two and then put their focus on what now and going forward. And there's not as
5:01 much of a look back process or they're going so fast that they're not looking back at it. But I think now with AI and machine learning models and everything, you want as much data and context as you
5:12 can get. So anything that you have previously that can inform better decisions now. I mean, if there's, 'cause again, even when you're drilling, there may have been certain zones that were
5:25 tapped into 50 years ago. That zone is dry, but you still have to drill through that zone. So I think there's a lot of stuff in that way too, that even if we're gonna target zones lower, we have to
5:38 drill through that zone. So there may be learnings that we got 50 years ago by drilling into that zone that we could make use of now, whether it's overpressured or, whatever, H2S, all those kind of
5:49 things. I didn't think about that. That's really interesting. Um, yeah, with, I feel like with AI at the forefront of everything, I think data is on the top of everyone's mind and just how to
6:01 clean it up. Yeah, because it's a mess. So I guess like, what is the biggest problem you see that operators are faced with when it comes to data? I think the biggest problem is, like, a
6:15 lot of them not having, like, the proper governance or centralized, you know, data sets that people can trust. Like, you know, especially now with consulting, and I've seen it
6:25 previously too, it's something in the companies, but you just see companies where it's like, Jim's got this spreadsheet and Joe's got this one and then they go to a meeting and then these answers
6:34 are conflicting and they calculated the same thing in two slightly different ways, or the data that they put into it is different, or they didn't know that they had to do this to the data
6:42 to clean it, you know, to get it in the right, you know, format, or what account codes they should have been using. You know, so I think it's kind of like that siloed data, or people working
6:53 in little silos and not seeing maybe how it all ties together, is probably the biggest thing.
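A minimal sketch of the "Jim's spreadsheet versus Joe's spreadsheet" problem described here, assuming each sheet has already been reduced to a mapping from a well ID to one calculated number. The well IDs, values, and the `reconcile` helper are hypothetical and only illustrate the idea of flagging disagreements before the meeting rather than in it:

```python
def reconcile(sheet_a, sheet_b, tolerance=0.01):
    """Return the keys where two analysts' numbers disagree or are missing."""
    mismatches = {}
    for key in sorted(set(sheet_a) | set(sheet_b)):
        a, b = sheet_a.get(key), sheet_b.get(key)
        # A value missing from either sheet counts as a mismatch too.
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical numbers: the same metric, calculated by two people.
jim = {"WELL-001": 1250.0, "WELL-002": 980.0, "WELL-003": 410.0}
joe = {"WELL-001": 1250.0, "WELL-002": 1015.5, "WELL-004": 77.0}

for well, (jim_value, joe_value) in reconcile(jim, joe).items():
    print(well, jim_value, joe_value)
```

A check like this does not say who is right; it only shows where the definitions or inputs diverged, which is where the governance conversation has to start.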
7:00 I mean, you don't have to have like a perfect data warehouse, and that's what I try to help a lot of my customers put together, or even past companies I worked at put together. But the thing is, you need
7:10 to have like certain processes and like, this is where you go get this, or, you know, even just, I think, on the governance side, the naming conventions, like, you know, what do you call
7:19 this or what does this mean to you? Because, say, one example, I learned it when I was at University Lands, but it holds true generally: you know, say completion date. To a regulatory person,
7:30 completion date actually means first production. That's like, when they file a completion date to, you know, the Railroad Commission, completion date is, like, actually the first production. But
7:39 you go talk to pretty much anyone else and they assume that means, like, when they frack the well. But again, that completion date
7:48 standard, or whatever we're going to call it, you know, came well before, you know, horizontal fracking became a thing 15 years ago. Yeah.
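A minimal sketch of the naming-convention point, assuming a tiny data dictionary that records which field each audience means by "completion date". The field names (`frac_date`, `first_production_date`), the audience labels, and the well record are hypothetical and do not come from any actual regulatory schema:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WellDates:
    """Stores both meanings of "completion date" under explicit names."""
    well_id: str
    frac_date: date              # when the well was fracked
    first_production_date: date  # when the well first produced


# Hypothetical data dictionary: what each audience means by "completion date".
COMPLETION_DATE_MEANING = {
    "regulatory": "first_production_date",
    "operations": "frac_date",
}


def completion_date(well, audience):
    """Return the date the given audience means by "completion date"."""
    return getattr(well, COMPLETION_DATE_MEANING[audience])


well = WellDates("WELL-001", date(2023, 3, 1), date(2023, 4, 15))
print(completion_date(well, "regulatory"))
print(completion_date(well, "operations"))
```

Storing the two dates under unambiguous names and keeping the mapping in one place is one way to make the definition something people agree on once, instead of rediscovering it in every report.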
7:56 So just coming up with, like, you know, it's almost like the people side of it is more complex than the actual data side. And especially, again, I may keep circling back to how AI is changing
8:05 things or making things more accessible, but like the coding side is not the hard part anymore. Like I can talk to Claude or even ChatGPT or Gemini and get good code that will probably do what I
8:16 need to do, right? But do I understand the problem? Do I understand the industry? And again, getting people to buy in and understand the value and that it's important that we stack hands and
8:30 understand what we're talking about and that we're trying to get to the same answers. Yeah, yeah, I feel like I'm very familiar with different ways to slice and dice the data to tell the story you
8:41 want. Oh yeah, you can make it say whatever you want. Like CEO, what do you want this to say? Like I can probably make it say that I can enjoy this. With the right filtering and yeah. you know,
8:51 skewing a thing or whatever, put on a log scale, it looks a lot better, you know? Yeah, yeah, yeah, that I feel like that's, for me, I mean, outside of oil and gas, just more on the
9:02 marketing and analytics side. That's like, the biggest thing is documenting how I found a number. 'Cause sometimes I go back and I'm like, what did that actually mean? And I'm sure the same thing
9:13 is happening. They'll just come up with something that they, in their head, they know. Yeah. But then down the line... Yeah. I mean, hopefully they did it in Excel and I can go, you know, peel
9:24 through there. Right, yeah. Some IF or bizarre, you know, XLOOKUP statement or something like that.
9:29 But I think, you know, maybe I can say, getting back to more of the AI side, that's where the fact that it can be kind of a black box is a hard part for, especially, engineers to kind
9:40 of get on board with, 'cause they want to understand why did it get to this answer. It doesn't necessarily matter if it gets to the right answer. Like, how do I know? How do I know it's right?
9:49 Yeah, so I guess if anyone's facing that problem, how do they go back and check? Yeah, I mean, I guess you'd have to check it. I mean, if it was trained on historical data,
10:02 then you'd probably want to. And again, this is not dissimilar to what you would have done five, 10 years ago with more traditional data science methods, but even if it's just linear interpolation
10:12 or whatever it was, something that was more explainable, but you'd still have to, you'd have like a
10:18 test set and then you'd have like a validation set and you'd want to validate that, I came up with this R-squared and this, whatever, on this set, but we need to see, does it hold true for a
10:29 separate validation set that wasn't part of the training set? Yeah. So I think even still, and it seems like, I think just talking to John when we came in, he was trying to build a little app
10:37 just so he could validate, QA stuff from Collide, that it gave good answers and so on. So
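A minimal sketch of the hold-out check described here: fit on a training portion, then see whether the R-squared holds up on a validation portion the model never saw. The data is synthetic (a straight line plus noise), and the 80/20 split, the seed, and the function names are assumptions for illustration, not a method named in the conversation:

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data: a known linear trend with random noise added.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [2.0 * x + 5.0 + rng.gauss(0, 3.0) for x in xs]

# Shuffle the row indices, then hold out 20% that the fit never sees.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, valid_idx = indices[:80], indices[80:]

slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
train_r2 = r_squared([xs[i] for i in train_idx], [ys[i] for i in train_idx], slope, intercept)
valid_r2 = r_squared([xs[i] for i in valid_idx], [ys[i] for i in valid_idx], slope, intercept)

print(f"train R^2 = {train_r2:.3f}, validation R^2 = {valid_r2:.3f}")
```

If the validation score is far below the training score, the model has likely memorized the training rows rather than learned something that generalizes.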
10:45 I mean, I think that can give some confidence, but at the same time it still doesn't give you, like, how did it get there? Now again, I think some of the stuff y'all are doing with Collide and
10:53 some different places, where it's able to actually cite its sources, is super helpful. Like, all right, this is great that you gave me this answer. And now if it takes me back to page
11:02 46 of this, you know, document, then I can feel confident that it's using the right, you know, formula or even just the right context to give me the summarized answer. Now I can feel a lot
11:13 more confident about that. Um, but I think a lot of times it makes sense, say for someone in my discipline, to try to go with the most explainable, easiest thing, even if it's slightly less
11:25 accurate, but if it's explainable, that usually you can get more buy in. Um, and I think, you know, we were talking with Jeff Kremel on the podcast about it about a month or two ago, but you
11:35 know, a slightly worse answer with, but, or a decision, but with more like actually kind of gumption or whatever behind it is way more impactful than like something that people aren't really sure
11:47 about, you know, so I mean, that confidence in the answer is a huge piece of it as well. Right, yeah, that makes sense. Yeah, I mean, you're talking about AI a lot, but, and
11:58 you know, and you specifically named like LLMs, which didn't exist a few years ago, right? Yeah. So, were you in the industry before like GPT-3 and like this whole big AI boom? Yeah, yeah,
12:11 I've been in the industry since, I mean, I guess it's almost exactly 11 years now. I was a math teacher in a high school teaching coach in the night. I always forget about that with your
12:22 background. So in the evolution of, you know, using data and energy, maybe specifically oil and gas, like what were you doing before AI? Like how manual and how much, I don't know, worse was it,
12:37 just 10 years ago? Yeah, I mean, yeah, I mean, I got in and, I always tell people, I didn't even know what a VLOOKUP was in Excel when I first got in. And I guess at that point, I would have
12:47 to have Googled it and then try to figure out these examples and then try to make it work for what I was doing. And now we can have - But now it's like I could - You can just write the VLOOKUP.
12:56 Yeah, I mean, even I remember really struggling, I was trying to learn R, which is a programming language, and luckily had some good people at ConocoPhillips, and there's one guy who kind of
13:05 unlocked this idea for me, but it took me probably a day, day and a half of banging my head and then trying to talk to the right person that might be able to help me unlock it. And then once I did
13:14 that, I kind of broke through. And I think there's value in that too, but at the same time now, it's like, if you don't have to, don't. Like I tell people all the time, like, I
13:23 mean, again, I think even talking to college kids, like their professors, who probably haven't been in the industry for a while, will tell them, no, you need to learn Python or you need to learn,
13:32 you know, this syntax and you shouldn't have to look it up. And it's like, that's not really true. I mean, like, the bigger thing is, you need to understand how coding
13:40 works and what the output is, but then even more important, like I said earlier, is understanding the problem that you're trying to solve. So I mean now, just what I'm
13:48 seeing is, and again, there's been stuff now where I'm using Claude and, you know, GPT or whatever to write a little PowerShell script or this, and like, I'm sure I can figure it out. But why do
13:58 I need to, like, prove to anyone if it takes me an hour or 30 minutes or 15 minutes to figure it out, when I can get the answer that does what I need it to do in a minute or two? You know, I mean, like
14:08 just amplifying people's intelligence, you know, just making people more productive is what I've seen, like things getting better. Before, it was Stack Overflow. But again, I would have
14:19 to take that answer from Stack Overflow and then adapt it to my use case. And now you get 80, 90% of the way there just by giving it the right prompt. But I think it's always been the
14:30 battle, and I think even back to college, like knowing how to Google was always a very useful skill, right? Knowing, you know, even how to use ANDs and ORs and quotation marks and all these
14:39 things were super important. And now it's changed to, I know how to prompt these LLMs to give me what I want. Yeah. I mean, I just helped a former employer redo their website. I would tell people,
14:49 I did a full stack web development bootcamp, but I wouldn't pay me to build a website. But now someone paid me to help them use Lovable to rebuild their website. Yeah. And it took a
14:59 four hour working session and then they took it the rest of the way. And by the middle of the next day, they're like, Hey, what do we need to do to publish this? And it was already better than
15:06 what they had in WordPress that someone built. I'm obsessed. I absolutely love it, but I think it does go back to what you said about like knowing the end goal. And especially with data, like
15:20 I feel like there's been so many times where I start with knowing my end goal, and then I start going down rabbit holes. And then I forget. And then I'm like, wait, what am I actually trying to
15:31 achieve here? And I've just like done so many different things to try to get to a number. And I'm like, what does this actually mean? There's been so many times that I've done that. So I think
15:40 it's very important to have that, like, written down. Like, this is what I want. Yeah. And then how do I get there. Yeah, with AI, you can - Well, it's funny, too, because it has
15:51 kind of evolved this way. But I remember, say before the whole LLM stuff came out, people would always say, so when you're writing code, you can write comments, right? Like, that's the part -
15:59 those don't run. But it's kind of like documentation within your code. And a lot of people would suggest, like, write out your comments and then fill in the code. Well, literally now, that's what
16:10 GitHub Copilot does. Like, you're - or I'm sure you can do it with Cursor or whatever, too. But GitHub Copilot, you can write your comment, and then it will generate the code. That's nice.
16:18 Within it. So you could literally start from that now. If you could scaffold out what you need, just plop my comments in there, and then just hit Enter. And now it generates what it thinks you
16:28 need. It's probably, again, 80, 90% of the way there, if not 100. And then you just keep iterating through.
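A minimal sketch of the comments-first workflow described here: the numbered comments are the scaffold written up front, and the code under each one is what gets filled in afterwards. This example was written by hand for illustration, not generated by any tool, and the function name and sample records are hypothetical:

```python
from collections import defaultdict


def average_daily_rate(records):
    # 1. Group the daily volumes by well name.
    volumes_by_well = defaultdict(list)
    for well_name, volume in records:
        volumes_by_well[well_name].append(volume)

    # 2. Average the volumes for each well.
    averages = {name: sum(vols) / len(vols) for name, vols in volumes_by_well.items()}

    # 3. Return the wells sorted from highest to lowest average.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (well name, daily volume) records.
records = [("A", 100.0), ("B", 50.0), ("A", 120.0), ("B", 70.0)]
print(average_daily_rate(records))
```

The comments double as a checklist: if a step's code does not do what its comment says, that is the place to iterate.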
16:35 So yeah, I think if you know what you want, again, I think to your point - I mean, even going back in the last year, two or three, say, people have been using Power Apps because it was an
16:45 easier way to create a little cell phone
16:48 I'm super bullish on, like, Lovable, or even just people being able to have access to Cursor, where if you need to build a simple little CRUD app where you want to, you know, input five different
16:58 fields and hit submit, you can build that in minutes. Yeah, same with like just creating quick MVPs of different things. Like you want to launch any feature. I think as
17:09 like these MVPs are probably better than a lot of the software that oil and gas companies or even like, you know, even consultants previously were building for them Like, and then they can actually,
17:19 if you know how to prompt it, you can get to exactly what you want without ever having to, like, iterate with someone, you know, God, I mean, it might be in America, but, God forbid, probably
17:28 India or nearshore or something like that. Like, and it would take iterations and more money. And it's like, now I can just, oh, that wasn't quite it. You know, change this, move this, you
17:36 know, change the padding here. Yeah, yeah. I mean, I just did it the other day for, like, Spotfire. I mean, I've used Spotfire literally since I came into the industry. That's what I was using at
17:45 ConocoPhillips. But the text area in Spotfire is basically a website. It's HTML, you can use JavaScript, CSS. And I had an old script, I couldn't find it, to make some of the components, like,
17:58 flexible, like Flexbox lets you move things around, it'll rescale, like how Bootstrap would do as well. But I was like, well, let me just pop this in Claude. And like, you're a Spotfire
18:07 expert and also, like, a front-end web developer guru, whatever. Like, take this original HTML that it generated from, like, their editor and make it flexible and style it with Bootstrap and pop it in.
18:20 And it was like, it works. That's amazing. Let's, it was kind of inevitable that talking about data we'd get in the weeds with a lot of jargon. And you've said, this is Energy 101. So
18:34 let's, you know, let's maybe break down. Yeah, stop me whenever and like, whoa, you're getting too far ahead of it. Let's break down some jargon for grandma who's watching I mean, you've said
18:42 everything from full stack, Stack Overflow, common terms we hear all the time, to, you know, Cursor and R. You know, me and Julie can kind of understand what that is. And then a big one, maybe to
18:55 get more on the rails with, you know, oil and gas, you said Spotfire. You know, let's maybe break down some of these, like what digital tools, I would call them, to go back to like just what does
19:08 data have to do with oil and gas or energy in general? And something I wrote down to prep, I don't know nothing about just waiting for, just waiting for you to say one of these big words, is
19:17 Spotfire and how it's kind of compared to Power BI. Sure. So can you just maybe explain what these tools are, how they help you like just from the ground up, what exactly is it going on here?
19:29 Yeah, so Spotfire and Power BI, Power
19:32 BI is Microsoft's, you know, so they're both what are called business intelligence tools. And then if we're gonna distill back probably further to what people understand, it would be like reports and
19:40 dashboards, are what they kind of do at their core. There are some fundamental differences when you get deeper into it. But yeah, Spotfire has kind of had a stranglehold on the industry
19:53 pretty much for the last, since I got in, I think even a little ahead of that, so we're probably talking 10, 15 years. And I think for good reason, there's some things that it does better
20:00 than Power BI. But at their core, I mean, both of them, they can ingest data from any kind of a source, whether it's a database or a CSV, like an Excel file, you know, whatever. And it makes it
20:14 really easy to generate charts, like a bar chart, you know, line chart, pie charts, you know, tables, you know, pivot tables, all that kind of stuff. And they're both, I think equally good
20:26 at the visualization side, but I think then you get further in, and there's certain things like, I was saying, Spotfire, like say, its text area is like a website. So at that point, you can
20:36 basically, if you know how to do web development, you can take it, you know, as far as you want to take it. Spotfire is really great at...
20:44 So another term would be GIS, which is like geospatial information systems, like map mapping. And like you can layer on like five different layers of maps and do some cool things with the data
20:56 where you size and color the data based on like other associated data columns. And you can do that in Power BI to a degree, but not I think all the layers and make it as nice there. And I think
21:09 Spotfire's a little more tailored to data science, which would be you start getting more into your predictive analytics type tools, where you can do machine learning and like different interpolation
21:21 methods and clustering. Some of it it has, like, native in the platform, and other stuff you can extend it, 'cause you can use Python or R, which are both scripting languages, within Spotfire.
21:33 Whereas Power BI is much more geared towards, like, the charts and graphs and everything. And the nice thing with Power BI for most people is it's baked into their company's enterprise
21:45 Microsoft license, so it's generally cheaper because that was already a sunk cost, if you will.
21:54 And also, you know, one thing that Microsoft does really well is makes all their things kind of work and talk together. So if I publish a Power BI report or dashboard, I can embed it in Teams or I
22:04 can put it on a SharePoint site or, you know, there's different ways of kind of like passing the data around that can make it a little bit nicer for an enterprise. And like Spotfire,
22:16 it might be a little more cumbersome, or people have to have proper licenses and it's another thing that they need to log into. So, but yeah, at the end of the day, they can take
22:26 the data, put it into them and, you know, you can do some more manipulation of the data there and then also just charts and graphs and all that kind of stuff. Yeah, so it kind of sounds like this,
22:37 these are tools that could be used kind of on anything. 100%. It isn't just for oil and gas.
22:43 Generally Power BI, especially, is definitely not geared towards oil and gas, and Spotfire is, 100%, you could use it for any data set you wanted. Now I will say, especially in the last year or two, Spotfire,
22:55 they've been sold just a couple of times via private equity and they just merged with, I want to say Citrix or some company like that. It would seem like a weird marriage, if you will, but
23:06 it's called the Cloud Group now, and Spotfire's its own business unit, and they've done a lot to come back to their roots and really focus back on oil and gas. I think they just realized,
23:17 we're not catching up to Power BI or Tableau, probably in the broader business intelligence market, but we've got such a good thing here, let's really focus on oil and gas, or these various
23:28 industries that we're really strong in, so they actually have some more visualizations and some scripts and stuff that they've baked in that are more industry specific that make it much more
23:39 attractive to use because it would be much more cumbersome to do in other tools, so. Yeah, I mean, I honestly, I wrote down like 20 different tools that I keep hearing about, and something funny
23:50 I realized is that a lot of them are kind of just owned by what I would call like the big three, like it's like Amazon, Google, Microsoft, and you just realize like why they're like trillion
23:59 dollar companies, like they aren't just like at their surface level user, like they're literally integrated in all walks of life, especially in data, it's kind of kind of crazy.
24:10 Another two comparisons I've heard, like the two I just mentioned earlier, another big one with oil and gas seems to be Snowflake and Databricks. Are they kind of on the same level just as the last two
24:22 I mentioned were? Yeah, that's definitely an interesting one.
24:27 So I mean, 'cause both of them are about a similar maturity level, I don't know, seven, eight years-ish, maybe goes back even further, but I mean, let's say definitely in the last
24:39 like decade. Um,
24:43 So they're interesting in that they're very much fierce competitors now. And they're databases or what? Yeah, what are they? We call them data platforms. Data platforms, okay. Yeah, I think
24:55 it's pop hard. Or Snowflake literally calls itself like the cloud data platform. But they're data platforms. So
25:02 they're kind of arriving at the same similar platform but from different directions. So Snowflake started off purely as a, it was like a cloud data warehouse. So basically it's like serverless.
25:15 And I guess we're gonna get into those like the cloud is basically someone else is hosting. Oh, we'll get into that. Yeah, we can talk about that. So kind of loop back on that. But
25:25 essentially like you can log in as a software as a service and you just log into Snowflake, pick which cloud you want it to run in. And like you're writing SQL. Yeah, almost immediately you just
25:35 gotta get some data in there and it's good to go. And then you can choose, like, how big your servers are,
25:41 and a lot of options there. So they started off as just a data warehouse, like writing database queries on your data, essentially. And they've been, well, then let's talk about what Databricks
25:53 started off as. Databricks started out, there's a technology called Spark, which was like an open source thing that allowed you to run, like, queries, but across multiple computers, like, so to
26:07 parallelize it. So it could take this one query, but run it, use all the cores from like multiple computers and then, you know, return an answer faster 'cause it was using more computers rather
26:20 than just if you just had one computer, you're constrained to whatever the capacity of that computer is or was. So the people that invented Spark created Databricks because Spark was hard to deploy
26:34 yourself. Like there's a ton of things out there that are open source, and again, we could probably have a whole other little thing here about what does open source mean.
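A minimal sketch of the idea behind that kind of parallel query: split the rows into chunks, let each worker compute a partial result on its own chunk, then combine the partials into one answer. This is not Spark's API. It is a single-machine analogy in which threads stand in for separate computers, and all function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts roughly equal chunks."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum_count(chunk):
    """The work one worker does on its own chunk of the data."""
    return sum(chunk), len(chunk)


def parallel_mean(rows, n_workers=4):
    """Average all rows by combining per-chunk sums and counts."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    total = sum(chunk_sum for chunk_sum, _ in partials)
    count = sum(chunk_count for _, chunk_count in partials)
    return total / count


rows = list(range(1, 1001))
print(parallel_mean(rows))
```

The point is the shape of the computation rather than the speed: threads in one Python process will not make this faster, but the same split, compute, and combine pattern is what lets a cluster answer one query using many machines.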
26:43 But basically, like, a lot of times, like deploying some of the software can even take a team to manage like the servers and keep it up and all these things. So Databricks started out, they were
26:53 managed Spark. Basically, it was like a, again, they were a software as a service, a cloud thing, and you can run Spark, and Spark was more for, like, ETL: extract, transform and load, like taking
27:05 the data, transforming it and then loading it somewhere else.
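A minimal sketch of extract, transform, and load as just described, using only Python's standard library. The CSV content, column names, and table name are hypothetical, and an in-memory SQLite database stands in for a real data warehouse:

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent well names and one missing volume.
RAW_CSV = """well_name,date,oil_bbl
Well A,2024-01-01,100
Well A,2024-01-02,
well b,2024-01-01,50
"""


def extract(text):
    """Extract: read the raw rows out of the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no volume and standardize names and types."""
    cleaned = []
    for row in rows:
        if row["oil_bbl"] == "":
            continue  # skip rows with no reported volume
        cleaned.append((row["well_name"].strip().upper(), row["date"], float(row["oil_bbl"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS production (well_name TEXT, date TEXT, oil_bbl REAL)")
    conn.executemany("INSERT INTO production VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data can be queried with plain SQL.
result = conn.execute(
    "SELECT well_name, SUM(oil_bbl) FROM production GROUP BY well_name ORDER BY well_name"
).fetchall()
print(result)
```

Keeping the three steps as separate functions makes it easier to test the cleaning rules on their own, which is usually where the disagreements about the data show up.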
27:09 And then even like, it was also good for like
27:14 data science. So you could run like some of your machine learning and stuff faster and train bigger models and stuff like that on it. Well,
27:23 fast forward and now Snowflake has been adding more of that capability on the data engineering, data science side, and the AI obviously now, and Databricks built their own data
27:36 warehousing engine. So basically arriving at a similar platform that does things similarly, but, you know, where you have almost unlimited compute, a supercomputer, if you will, behind it. And you
27:47 can run your queries on petabytes of data within these platforms.
27:54 But
27:56 yeah, they're, I think, but looking at it, I think they both still shine in what they were originally. Databricks, I'm using it for a client now, 'cause that's what they had. And the data
28:07 warehousing part is good, it's fine, but like, there's a lot more to manage on Databricks. Like, you have to be more technical to be able to actually play in the Databricks sandbox. There's,
28:15 you have to understand how to manage clusters, and
28:20 there's certain things there, but it is super powerful. Whereas, like, Snowflake, I always equate to, like, an iPhone. Like you just, you can pick it up and you can start using it and it's a little bit,
28:28 you know, more hands off. So if you've got a smaller team that maybe doesn't have like that more like computer science technical chops, like Snowflake might be more approachable,
28:39 you know, a team behind it, you know, there's a lot of power behind Databricks and I think both are really good. And it was funny, even I guess I won't name names, but I'm sure people go look at
28:51 LinkedIn, but like the company that I was working for right before I got into consulting actually had both. Because when they designed this architecture, Databricks was really good for
29:04 what I said it was good for, for the extract and transform and load and data science and all that stuff. And then Snowflake was really good for data warehousing. So they, and because it would
29:14 probably be way too cost prohibitive to refactor everything, move it over, they're using Databricks and Snowflake together. And it sounds crazy to people now because it seems like
29:22 these are just the same platforms. Yeah. Well, that and just people see them, like, going at it on social media all the time, back and forth, like exchanging blows. But at the same time, like, it's a
29:33 perfectly good architecture and it works for them. And
29:36 again, that's what they were, you know, at the time. And I'm sure they've got sales guys on both sides of the fence being like, oh, you need to, you know,
29:46 come over here whole hog or whatever, but it's probably better for them to keep it that way and keep some tension, you know, from the pricing side, like, well, I can go over to Databricks or
29:53 I can go to Snowflake. True. They have the leverage. Um, so yeah, hopefully that didn't get too far down the rabbit hole. Oh, it did. Yeah. Sorry. Yeah. Again, like I said, it was inevitable,
30:04 but I mean, if we kind of take it back to oil and gas and stuff like that, I mean, what is kind of, like you said, petabytes, you're saying these powerful machines, like, what exactly are
30:19 they doing in oil and gas that has all this like, uh, data and storage, like, I assume it's like a bunch of charts, papers, sheets, like every single one of those, like a few megabytes, like
30:30 if we're going to talk literal size, like why is this industry so high tech when most people picture it, they picture blue collar. Can we break
30:42 it down to a well being, I don't know, drilled or whatever. I mean, if you give it like - What data, what all data is coming from that? Sure, if you take five seconds and you realize, oh yeah,
30:54 they are drilling tens of thousands of feet in the ground and they have to be exact, they're not just guessing. So obviously there's data and stuff, it's like, why all the high tech? Yes, and I
31:08 think it depends on even the company and like how big that, I mean, there's people that would argue and I can't really argue against them, like most operators don't have like big data. You know,
31:19 to some people, big data is something that doesn't fit in a spreadsheet. But once we start getting into, like, at least terabytes, if not petabytes. You know, with petabytes
31:27 you're probably starting to talk about the Exxons and BPs of the world. What's a petabyte? So you have a gigabyte? A thousand terabytes? A gigabyte is. You know, well, a terabyte is
31:39 1,000 gigabytes and then, yeah, 1,000 terabytes is a petabyte, I'm pretty sure. A gig is 1,000 megabytes, it's all by 1,000. So 1,000 terabytes is a peta, peta? It's a lot. Yeah, and
31:50 again, I think the petabytes, you're probably more talking about like what I was talking about earlier with like your tech companies and stuff that are having the click stream data and like just, I
31:57 mean, exploding, the velocity that it's coming in even is just, yeah. Yeah, every second is just, you know, massive, probably bigger than what a lot of operators, you know, create ever, but
32:09 even at that,
32:12 if we want to talk about some of the bigger datasets in oil and gas, I mean, SCADA
32:17 can get pretty big. So SCADA, I guess we'll step back as like controls and automation, so like think of like the sensors and stuff in the field, and a lot of those are collecting data, I'd say
32:29 typically every 10 minutes or 15 minutes, and then you multiply that over all the wells you have and, you know, we'll call it hundreds of data points, whether it's temperatures and pressures
32:39 and volumes and all these things coming in. So that can add up pretty quickly. And those quickly turn into like billions of rows of data, I would say.
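[Editor's sketch] The "billions of rows" estimate above can be checked with quick arithmetic. This is only an illustration: the well count, the number of readings per well, and the bytes per row are assumed values chosen for the example, not figures from the conversation. Only the 15-minute polling interval and the "all by 1,000" unit steps come from the discussion.

```python
# Back-of-the-envelope SCADA volume estimate.
# Every input below is an illustrative assumption unless noted otherwise.
WELLS = 1_000          # assumed number of wells
TAGS_PER_WELL = 100    # assumed readings per well per poll (temperatures, pressures, volumes...)
POLL_MINUTES = 15      # polling interval mentioned in the conversation
BYTES_PER_ROW = 50     # assumed storage per reading (timestamp, tag id, value)


def rows_per_year(wells, tags_per_well, poll_minutes):
    """Number of readings collected in one year across all wells."""
    polls_per_day = 24 * 60 // poll_minutes
    return wells * tags_per_well * polls_per_day * 365


def bytes_to_terabytes(n_bytes):
    """Decimal units, as in the conversation: each step up is a factor of 1,000."""
    return n_bytes / 1_000 ** 4


rows = rows_per_year(WELLS, TAGS_PER_WELL, POLL_MINUTES)
size_tb = bytes_to_terabytes(rows * BYTES_PER_ROW)
print(f"{rows:,} rows per year, roughly {size_tb:.2f} TB")
```

Under these assumptions a single year lands in the low billions of rows while staying well under a terabyte, which fits the point that this looks big but is manageable on modern platforms.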
32:51 Which sounds big and it was probably really big data. Like, you know, even 10 years ago, but again with,
32:57 you know, the improvements year over year in how much compute improves and just the efficiency of software and everything, it's much more manageable. But then I think the biggest data
33:09 sets typically end up being more like on like the seismic and the geology side. There's like,
33:17 yeah, that's where you see, I think traditionally companies had, like bigger companies had their own supercomputers or their own server rooms with like massive computers or they would have to rent
33:28 space from someone who had a supercomputer and they would run these simulations. And those would take days to run on what, at the time, were super, you know, powerful computers, and
33:38 they'd have to, you know, they might have to wait a month to even get in there and then they would have to run a simulation that would take days or weeks. And then they would get the outputs back
33:47 and then if they want to run another iteration, they'd have to wait and do that again. And now, whether it's just having their own cloud through Azure or AWS,
33:56 they can spin up those things or again, like having their data in a platform like Snowflake or Databricks, I mean, you're probably not running simulations in Snowflake or Databricks, but you
34:05 can write queries on data that previously would have been prohibitive or that you couldn't have done at that time. So some of it's on the more, seismic would be more the exploration side of it,
34:18 but when you get into operations, I'd say SCADA is probably one of the bigger ones.
34:23 Some of the accounting data sets get pretty big, I mean, where you're talking hundreds of millions, maybe billions of rows, depending on the company.
34:31 But again, for Snowflake or Databricks, that's child's play. So it now enables them to run queries and do things that would have been really difficult to do 10, 15, 20 years ago.
34:43 Yeah, that's pretty crazy. So I mean, it is a lot of just small data, small bytes, but we're talking billions, trillions, and obviously that adds up. And then of
34:55 course, seismic, I mean, seismic is when you just, you literally map underground for miles, right? So I can assume that that's kind of where a lot - Yeah, and you get the data points across,
35:05 you know, this wide area. And the 3D space, right. Okay, so yeah, that kind of adds up. So we've touched on like cloud a couple of times, you know, everyone these days, like uses the cloud
35:16 on a commercial small scale to like your free two gigabytes on Google. Sure. To, you know, AWS, Azure, Google's GCP,
35:26 you know, like, so it goes for, you know, just here at Collide, you know, we pay for a Dropbox that has like 100 terabytes.
35:36 Believe it or not, in the last five years, we got it to like, we're at like 70 terabytes of like, just footage. We save everything. Yeah, and how it is as well. Or yeah, kind of the same
35:45 thing you brought up earlier, like, you know, to keep all these files from decades ago. It's like, just because, you know, again, before all this stuff happened, think of, you know, five
35:53 years ago, you know, you guys were like, we don't need these episodes anymore. But now it's like, all the transcripts and everything you can get from that is now super valuable, right? And we,
36:02 and we take advantage of that. Yeah, it kind of leads me to the thought of like, you know, I don't have to think about 70 terabytes, you know, you just kind of pay a price and Dropbox
36:12 handles it. But you couldn't say that like 10, 20 years ago. I mean, I remember when cloud, the word cloud became part of like culture and everyone's like, what even is the cloud? And it's like,
36:23 well, you know, it's just storage online, basically. But for some reason, it was so, like, mind-boggling at the time. Like what? 20 years ago, I would say cloud storage didn't exist,
36:37 right? 30 to be safe. Yeah. Like, but some industries, I'm sure even oil and gas, had all like, you know, we're talking trillions, billions, whatever, like what were they doing physically
36:52 to store it? Like everything was obviously physical, right? Like, how did we live before cloud storage? I feel like a lot of them are probably still on-prem. There are a lot that are still
37:02 on-prem, but most of them are at least probably hybrid at this point. Especially just 'cause, by virtue of having, again, an Office 365 license, you've got OneDrive and stuff like that,
37:11 where it's probably just, again, that wasn't, say, even available at the time. So you had to save things on a local file share. Yeah, can you talk about on-prem? 'Cause if we're gonna talk
37:21 about, if you're gonna look up stock footage of data, the first shot you're gonna see is the data, like, the
37:29 on-prem, Matrix-looking thing. Like, that's what you picture. And, you know, that's what they did, right? Like, yeah, you had to have, you know, a server room. Um, and again,
37:41 like, you can't even like trivialize what that was. I mean, you had to put all these computers in there, right? And like, it's to have those running. But one thing you have to do in the server
37:49 room is keep it really cool. And then you probably had multiple, you know, uh, engineers or network engineers and all these people that, you know, their job was to keep
37:59 that thing running. Um, so I mean, now you think there was a couple of salaries just assigned to keeping that up and running and secure, and then thousands of dollars in energy,
38:10 you know, bills, just, you know, keeping that room cool enough, um, so that everything didn't just like burn up. Um, but yeah, I mean, and then it was just, because now everything's just so
38:21 on demand and obviously we see it at home with, I'm like, he can watch Netflix whenever he wants, you know, just turn on whatever show he wants, whenever, but like now we,
38:28 like he said, just, oh, I just need to go put my credit card in and, you know, pay $50 more per month for, you know, however, like two terabytes of storage, or I can go buy an SD card and just plop
38:38 it in my camera. And now I've got a terabyte of storage.
38:44 Before, I mean, I've talked to multiple people. I think they even brought it up when I was on my first podcast every day with Jeremy Funk. You know, we were talking about the cloud and
38:54 maybe Snowflake, but like basically roughly for the cloud it's 23 bucks per month per terabyte. So it's 23 bucks for AWS blob storage for a terabyte. Apparently, I mean, like 20 years ago, people
39:07 were literally selling storage and like you were talking hundreds of thousands of dollars for a terabyte. It's crazy. I mean, even commercially you can remember to go pick up like a 32 gigabyte SD
39:19 card for like your Sony camcorder was like with inflation, like hundreds of dollars. Now they're like, it's like, you can have like a pile of them just sitting like in a random place and they all
39:32 cost like $5 a pop. Yeah. You know, like, no, it's nuts. Like we got so efficient and good at storage that like a terabyte is literally nothing. And we even have it on micro SDs. I think they're
39:43 at like two, three terabytes now. No, probably. I mean, like I grabbed a one terabyte micro SD for this camera that we use for my daughter's like, softball games, like this game changer camera.
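[Editor's sketch] To put the rough "23 bucks per month per terabyte" figure quoted above into perspective, here is a small calculation. The price is the ballpark number from the conversation, not a current quote from any provider, and the 70 TB archive is the footage size mentioned earlier in the episode.

```python
# Rough cloud storage cost using the ballpark figure from the conversation.
PRICE_PER_TB_MONTH = 23.0  # USD per terabyte per month, as quoted (approximate)


def monthly_cost(terabytes, price_per_tb=PRICE_PER_TB_MONTH):
    """Monthly storage bill in USD for a given number of terabytes."""
    return terabytes * price_per_tb


footage_tb = 70      # footage archive size mentioned earlier in the episode
petabyte_tb = 1_000  # 1 PB = 1,000 TB

print(monthly_cost(footage_tb))        # 1610.0
print(monthly_cost(footage_tb) * 12)   # 19320.0 per year
print(monthly_cost(petabyte_tb))       # 23000.0 for a full petabyte
```

Even a full petabyte at this rate costs less per month than the hundreds of thousands of dollars per terabyte that the speakers recall from twenty years ago.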
39:52 Right. And like just plop it in. Like I haven't even come close to hitting the limit on it. Shout out game changer 'cause I'm baseball dad and that's just a whole different thing. Just everyone's
40:01 a live streamer now. Yep. So funny, I could swallow on accident something that would have cost hundreds of thousands of dollars when I was, you know, a kid. No, yeah, I mean, just the
40:14 improvements in storage and compute power, I mean, have made so many things possible, and I guess if you wanna make an analogy to, say, oil and gas, I mean, everyone knew that the oil in shale
40:27 was there. They just didn't think it was accessible. And then the technology, well, I guess that confluence of oil price and gas price made it economic to do, but then the technology came along
40:37 that allowed them to do it, and now we could get oil and gas out of there. Like, I think AI and some of these things have always been what people wanted to do. And like people have had the
40:46 imagination to think about it for
40:49 at least the last hundred years, if not thousands of years of what would be possible if I could have a machine that could do this for me, right? Like, but now we've hit this inflection point of,
40:58 we have so much compute power and we have so much storage that we can actually train models on the entire internet, like we've done with the GPTs and Claudes and all that, where they just literally
41:09 have access to all the information that's ever been generated ever. Yep, I brought up in John's podcast, when we talk about LLMs and I was like, you know, I think decades ago, even like science
41:20 fiction, I was like, we all predicted that there would be a magical machine that you can ask anything and it'll answer it. And that's the era we live in now. So as we kind of wrap up things
41:30 here, can we get some kind of, like, insightful, hopeful thing or just something to pick your brain, like, what do you think can exist in our lifetime in like a decade or two? That like
41:42 is not possible right now, especially like in data.
41:47 Man, I mean, I don't think we're that far off from, again, just being able to ask questions, I mean, you can in a way, but I think it's just going to get better and better, but just literally,
41:56 literally asking questions of the data and getting like a truthful answer back from it But I think
42:03 like with quantum computing, I kind of on the, you know, you're starting to, you're starting to hear some like bubbles about that, but like, that's just going to turn everything on its head,
42:12 because I mean, being able to, it's going to break any encryption that we have now for passwords, like, like within minutes or seconds, like, so it's just, and again, I think I'm sure I hope,
42:24 you know, cyber security is going to evolve at the pace of, you know, the compute power as well because hopefully you'll be able to use quantum computing against itself, right? But
42:35 it's kind of, I mean, it's kind of scary on it. I mean, you just hope people are going to use it for good, you know, with all this stuff. And the scary thing is too, like, I'm more than
42:45 positive that the government and other governments will have access to those things before it ever hits commercial hands, right? So
42:53 be really interesting to see where that goes. But I guess maybe that wasn't the most optimistic. No, I think we already started off, like, super nihilistic in the beginning with all the AI
43:04 stuff. So, you know, that's just the world we live in. We all understand that we are getting a little too powerful as a society. Yeah. And I think it's one of those too, when
43:13 you're really close to it as a, you know, data practitioner, like, you know the good and bad, you know, like, when you know too much about something, you're like, you know, you even see
43:22 people, again, I mean, take myself, for example, I love technology, but then when I'm going to buy appliances, I'm like, give me the thing with the least moving parts. Yeah, we bought our first
43:33 house 11 years ago or 10 years ago. I was like, I want a washer dryer that just works and I want it to work for the next 15 to 20 years. So we bought a Speed Queen with a crank dial 'cause it turns
43:44 and it washes and it dries. Like, and I haven't had any problems with it. That's good life advice. Like if you get a bag of chips, there shouldn't be 40 ingredients in the chips. It should
43:53 be whatever's fried, and seasoning and oil. And that's it. Yeah.
43:59 Yeah, and so it's just stuff like that where it's like, yeah, I love to play around with technology and yeah, it would be cool for me to have a Samsung fridge with the camera that shows me when my
44:08 milk is too low and I always think - I just had a friend get that and it is absolutely insane. Yeah, I want one. But then like in three years, if it stops, you know, cooling things, then I'm
44:19 gonna be really freaking mad. Yeah. So sometimes it's just like, give me the thing that's the most simple thing. And I think there's part of that too, where I, you know, in the work that I do, I try
44:30 to make it as simple as possible or reduce the moving parts because any extra thing you add is complexity and sometimes it's necessary, but a lot of times you're adding unnecessary complexity. You
44:42 have to simplify, yeah. Well, you wanna add anything else or do a quick little end session here before you call. Yeah. All right, what time is it? So, you good? So I like to, I'm trying to
44:53 incorporate like on-screen little graphics, little fun games at the end And I came up with one last second for you, and I think you can already see where this is going. I'm assuming you deal with
45:02 all kinds of files and stuff. So what I did is I got all the suffixes, is that what you would call these? Yeah. Okay. And describe, you know, guess what, just kind of describe it. And then,
45:13 but more challenging, thanks to
45:19 John not knowing what GPT stands for, I got a little inspired to see if you know the acronyms of these files you see every day. So why don't we start with GPT? Do you know that one? I did
45:32 when you guys explained it to John and now I totally forgot. All right, well, let's just get into our second here. So what is a PDF?
45:40 It's like a, it's almost like a version of an image file, but sometimes, if it's not an image PDF, it can have actual text encoded in a binary way. Right, of course. Do you know what it stands
45:52 for? But when it's pointed to get one word to something data format, let's see. I've seen, oh, portable document format. Okay.
46:02 I didn't know that. I know, we're all about to learn just like, oh, okay, what is this, an XLS? An Excel file. Of course, you love those, right? Or hate those. Love to hate them? Exactly.
46:12 No, Excel's great. It's not going anywhere. And,
46:18 but I like it from like a
46:22 proof of concept standpoint. Like if someone gives me, hey, I built this in Excel, but now can we turn this into something that. I can take the logic that they built and like now incorporate that,
46:31 but yeah,
46:34 I've also seen people put things into spreadsheets and then they're tallying it up on the calculator next to it, and like, you're not using that properly. Let me help you. I think if you, like,
46:45 looked up the top keyword on the Collide database, I said data and that data. Sorry. I know that gets to you. I'm just kidding.
46:54 I don't like that. I don't like it. I don't say that. It just slipped out. It goes back to the last Energy 101. I know. I think spreadsheet is, like, said the most. Like every single
47:04 keynote, panel, like, the word spreadsheet is said like 20 times. All right, .wav. That's like a sound, a music file. Yeah. You know what it stands for?
47:16 Wait, I want to draw your visual or it's going to say it's waveform audio
47:24 audio. Waveform audio. Okay. Pretty simple. Where's the V come from? Um, exactly.
47:31 Should be one. PNG, we love PNG, image files. Yeah, what does it stand for? Something graphics. Portable. Hey, oh. Network graphics. Okay, okay. Portable Network Graphics. JPEG, classic.
47:46 Yeah, another image file. Portable, graphic, junior portable. Juicy, let's see. Joint Photographic Experts Group. I think this is incorrect. I think I got this from the wrong source. This is
48:01 not a unique fact. This is not a unique fact. There probably are some JPEGs that are juicy, but
48:07 that's definitely not. Okay, okay. .py, getting into your territory. Yeah, Python file. Python. Yeah, we talked about this in John's episode. Python is like the go-to script. Yeah, it's just a
48:20 Swiss Army knife. Yeah. Hold on, back to JPEGs. That is what it stands for, and it is the name of the committee that standardized - The image format. Wow. God bless him. All right, I think
48:33 one or two more. MPEG. That's like a movie, like that's an old school. I've never heard of MPEG. An old school, Moving Picture Experts Group. And fun fact, the, like, MP4, MP3, the M stands
48:48 for MPEG. So I guess MP4 is technically like Moving Picture Experts Group four or something. I don't know. You can see. I didn't know that. The nomenclature of all this is just very
49:00 interesting. Is that it? Oh, this is a personal one of mine. Do you know what XMLs are? Yes. I hear they're called buddy files. Have you ever heard that term? And I'll explain why I hear it. I
49:12 haven't heard that, but yeah, it's a very nested thing. It's kind of a brother to HTML,
49:18 which would be like, but it's a markdown language. Markup language. Extensible Markup Language. So yeah, it's like XML files are always like one byte,
49:29 right, they're like, I never see them large or anything. Oh no, you haven't seen the right ones. Okay, yeah, it's so weird. It's like, I download a lot of things all the time, like in
49:37 folders, like zip folders. And when you unzip them, there's always like XML files and they're like trash, basically. Yeah, there's a lot of those, but I mean, so before JSON, XML was how they
49:46 would pass the data around with APIs. Okay. And even, like, WITSML is a form of that, which is the format in which they stream most of the drilling data now. Oh, wow, I should have pulled
49:59 that one. Yeah, like every time I shoot a video, it creates an MP4 and an XML file. And yeah,
50:09 I think that probably has some metadata associated with that. Yeah, that's what it is. And that's why everyone just deletes them. 'Cause yeah, they say like, some people are like, oh yeah, you
50:16 can keep it because in the future it might be useful. Like I don't, I'm like, whatever. But no, I mean, you could have like huge XML documents that like contained, you know, data, like much
50:25 like you would have JSON now. Yeah, it was XML back then, so. All right, well, here's our last one. A YAML file. YAML, do you know what it stands for?
50:37 John pulled this one, I think this is a, I would give you a lot of money if you can guess this one. Yeah, what is, I've never heard. Ain't markup language? That's real. YAML ain't markup
50:49 language. I don't, it's probably just like some nerd who wanted to come up with it. And what is this, what is this used for? It's a .yml. Yeah, it's used a lot these days for, like,
51:01 configuration files and stuff like that. So actually there's a tool called dbt that I use a lot and they use YAML files, which, you know, it's somewhat structured, but it's a
51:10 little more human readable. But like you can put, you can identify like certain sources. It's kind of parsable, but it's,
51:22 you know, it's used a lot for like configuration like people will store like now people are almost like writing YAML files that can be turned into code, 'cause the code can parse it and turn it into,
51:32 but it might be more approachable for someone, you know, if you're not as code heavy, you can configure a file, and then it can spit out a chart or a graph or something like that.
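[Editor's sketch] The idea described above, a human-readable YAML file that code parses and turns into something else, can be illustrated roughly as follows. The config layout and every key name here are hypothetical and invented for this example; they are not dbt's actual schema. The parsing uses PyYAML, a third-party package assumed to be installed (`pip install pyyaml`).

```python
import yaml  # PyYAML, a third-party package (assumed installed)

# Hypothetical config: all keys and values are made up for illustration.
CONFIG_TEXT = """
chart:
  title: Daily oil volume
  type: line
  x_column: date
  y_column: oil_bbl
sources:
  - name: scada
    table: tank_levels
  - name: accounting
    table: monthly_volumes
"""

# safe_load parses the text into plain Python dicts, lists, and strings.
config = yaml.safe_load(CONFIG_TEXT)


def describe_chart(cfg):
    """Turn the parsed config into a one-line description of the chart to build."""
    chart = cfg["chart"]
    return f"{chart['type']} chart '{chart['title']}': {chart['y_column']} vs {chart['x_column']}"


source_names = [source["name"] for source in config["sources"]]
print(describe_chart(config))
print(source_names)
```

Someone who is not code heavy only edits the YAML text; a real tool would hand the parsed values to a plotting or query layer rather than just printing a description.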
51:42 The only other one you needed to put on there was GIF,
51:46 and we could say, we can argue over how you say that. How do you say it? I think GIF, 'cause that's what I heard, this is a proper way to say it. I feel like that's peanut butter. No, I'm a
51:54 GIFer. I'm a GIF. And you know, I have room to change. Because it's graphics interface. So get, yeah. Format, whatever, but like, graph. I know, like, it's interchange, I'm wrong. But you
52:06 know, I used to say niche all the time, and then I switched to niche. Yeah. So maybe one day I'll come around. I used to say GIF, but then I think the person who created it, I found out
52:16 it's supposed to be GIF, so now I'm on. He's wrong, so. Yeah, I hear that. Yeah, well, I guess, yeah. The moral of the story is just to be open-minded, You know, wow. I'm hosting.
52:29 Hi, Bobby, that's it, we did it. Wait, I forgot to tell everyone, you're also a co-host of the Energy Bytes podcast where you deep dive into this. I'm allowed to go deeper there. Yeah.
52:43 Yeah, you kinda went into Energy Bytes territory today, but like I said, I don't think you can really avoid that with these topics. Well, and if you wanna learn more, you can always reach out to
52:52 me.