Welcome to Chuck Yates Got A Job with Chuck Yates. You've now found your dysfunctional life coach, the Investor Formerly known as Prominent Businessman Chuck Yates. What's not to learn from the self-proclaimed Galactic Viceroy, who was publicly canned from a prominent private equity firm, has had enough therapy to quote Brene Brown chapter and verse and spends most days embarrassing himself on Energy Finance Twitter as @Nimblephatty.
0:19 All right, dude, we've got to start this off. Someone was, it was,
0:25 you know, stalking me online as they do, as they do. Found Chuck Yates needs a wife and has decided you're the single funniest person on the planet.
0:37 I'll take it, because I don't even remember who wrote that. Was that Basel or was that you? No, I, I actually had to Google ancient Roman torture methods.
0:50 So here's what we're going to do. Assuming you're okay with it, I'm going to have Jacob just cut in a few of those scenes. That was, that was so good. Man, we have to do something about Chuck. Is
1:00 that right? What do we do? We could kill him, you know, maybe run him over with a truck and make him drown in his own blood. Nah, we can't do that. Chuck's podcast is our most valuable asset.
1:13 What if we sewed him into the interior of a dead donkey so that only his head was exposed? I mean, we could strip him naked, cover him in honey, stuff him into like a large basket and seal him
1:23 in there with like an active beehive. Dude, what are you talking about? Maybe we could have him trampled by a pack of elephants or eaten by a hungry pack of rats or like a good old fashioned
1:32 crucifixion might be nice. Yeah, I appreciate the props from her, but I also had nothing to do with any of it. I just said some lines. I didn't even look at the camera. I'm just looking at my
1:44 monitor. Oh, it's the best. Literally reading the lines off of my computer the whole time. Oh my God, it was so good. Now, when we're out talking to people about AI, we always say our data
1:55 scientist is a former frac engineer. Is that actually true? Yeah. I've been meaning to check that. So tell me your background. I'm not sure we've ever done this. Yeah, so I was a mechanical
2:05 engineer for undergrad, and the summer before I graduated, I got an internship, my first oilfield exposure, working for a small operator as a production intern down in the
2:18 booming metropolis of Smackover, Arkansas, where I lived in a trailer. Near El Dorado? El Dorado, yeah, that's right outside of there. OK. Smackover Field. So lived in a trailer,
2:28 worked across from the office - or lived across the yard all summer. And tiny - I'll never forget, the bank closes at 4:30. You know, we got off at 4, so we
2:38 had 30 minutes to go put our checks in the bank. Nice. You know, a whole very, very small town kind of deal. So then when I graduated, I knew I wanted to get in the industry and started as a
2:48 fracking engineer for cut up in the Fayetteville back when people were still doing stuff in the Fayetteville.
2:54 That dates you quite a bit. But yeah, so then did that for a little bit, moved to Houston, did technical sales and support for frac jobs from Houston for a couple different companies, and then
3:04 got into the gauge side of things, surface and downhole data. And, you know, for a lot of my existence around Collide slash Digital Wildcatters, I was asleep on the couch.
3:17 How did Colin actually find you? Um, no, I've known Colin since, I think I found him on LinkedIn back in like '18 or '19. And, uh, there was, I think it was one of the URTeC or DUG conferences in
3:32 Dallas. And we were up there. He made a post on LinkedIn that they were going to be there. I was like, Hey man, we should just grab drinks or meet up or whatever. I'm going to be out there
3:40 Because, you know, what he was trying to do sounded really cool, and I thought it was really needed at the time. And so the night that we met, we ended up, me, him, Jake, Kevin Zatterfield,
3:51 shout out to Kevin. Uh, we went out. I think we ended up, it was like Kevin broke his arm on a, he broke his elbow on a, uh, razor, you know, one of the electric scooters. He just, we were
4:02 all cruising around and he completely ate it. Um, and you know, alcohol was involved in this. There was definitely some alcohol involved. Well, the statute of limitations has run on
4:11 that. Yeah, that's how I knew it was definitely broken 'cause he was many whiskeys deep at like two or three in the morning. He's like, Man, it still really hurts. It's like, Yeah, it's
4:18 probably broken. Anyway, we ended up at like Waffle House or IHOP at four a.m. I was like, These guys are pretty cool. And so we've been friends ever since. You know, I was originally a
4:28 client of DW doing different marketing stuff and then came in as a consultant initially. I had started my own kind of consulting and data stuff, and anyway, that very quickly turned into a full-time
4:41 thing. So I just saw on my Slack that I've been here for over three years now. So it's crazy. It's been a crazy three years. It is crazy. So I'm glad I'm not lying when I'm out talking to people.
4:54 You really are a frack engineer. So the thing we get when we're out talking to people, and you know, one of the famous lines that we use, is you can't deploy AI onto a banker's box, so you actually
5:07 have to get data
5:10 into a digital, usable format. And I think that actually is where things need to start. Yeah, well, and that's the thing too, is like even once you get it into a digital format, right?
5:22 It's like that use case, or that client we're doing the POC for now, right? It goes back 40 years, right? There's data, handwritten notes, on these production or drilling reports from the
5:34 80s, right? And so it's like, just because it's digital still doesn't necessarily mean that you can do stuff with it. If you can't read the PDF yourself while you're looking at it and it's in this
5:46 like script cursive, it's probably not a good candidate, 'cause it's just not gonna be able to pull it. But if you can read the handwriting, there's a very high likelihood that we can extract
5:55 that. And so it's crazy because you're literally talking about decades. And I couldn't say that three months ago. No, yeah, you're literally talking about decades worth of operational know-how,
6:07 well-specific, field-specific notes. We always talk about the great crew change and the knowledge retention and stuff, and it's like, at least historically speaking, that's where a lot of that
6:17 knowledge was in the actual documents themselves, right? And so being able to expose that data to clients that literally didn't know
6:28 what was in those boxes to begin with, and now they can search through the entire thing with a few mouse clicks, is pretty cool. And so, okay, so I'm the oil and gas company client guy. I have banker boxes
6:39 full of stuff. And you've kind of talked me into, I need to get modern and join the real world. What am I thinking? What am I doing? What are the issues? Yeah, well, no. So I mean, that's
6:53 the other thing too, is it's like, whether you do it now or in five years or in 10 years, you're gonna have to do it at some point. You're gonna have to digitize those files if you wanna use them
7:02 Otherwise, just throw 'em in the trash or sell 'em to somebody at the end of the day, really, right? The big thing, you know - one, scanning them is one part of it.
7:13 Being able to actually know, like, label each file or each document generally correctly is another big one. Um, for this, this client, we partnered with, uh, Caso Land, shout out to Scott and
7:27 those guys, um, they did all the scanning and not only the scanning, but they actually catalog them and name them and put them in files and folders and group them by asset. And it was very, it's
7:38 much easier to deal with and manage that way, when you have a general idea of what the document is, than PDF one, PDF two, PDF three, and it's like, well, who knows? Um, because I've, I've also
7:50 seen other clients where it's like, they kind of did it in-house and they just took every file for the one asset and scanned it into one PDF. It's like, that's terrible
8:03 because it's a bunch of data in one file, and ideally you'd have that broken out into each file - each document has its own file - because the big thing with the language
8:16 models, it all boils down to a data problem. And so understanding the metadata about the document, or if it's this type of document - take a post-job report, since we were talking about frac. With a
8:28 post-job report, I know that there's going to be totals for every single frac stage: for my chemicals, my proppant, my average pressures, my average rates, my mins and maxes, and all of that
8:38 stuff. And so right now, the typical workflow is someone gets that post-job report in a PDF, and then they go manually enter that data into WellView or some database. It's like,
8:50 in the future, there's no reason a human has to do that at all. You can just extract the data from the report and push it to where it needs to go. Also, it's in your language model as well, so
9:00 it's all there and available. And so being able to understand what type of document you're looking at allows us to get even better answers on the back end, so that if we need to extract additional data explicitly, we can.
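To make that concrete, here's a minimal sketch of the idea - the stage totals, well API, and field names below are hypothetical placeholders, and in a real pipeline the JSON would come from the extraction step (say, a language model reading the post-job report) rather than being hard-coded:

```python
# Hypothetical sketch: take per-stage totals already extracted from a
# post-job report (say, by a language model returning JSON) and load them
# into a database so nobody has to re-key them into WellView by hand.
import json
import sqlite3

# Placeholder payload standing in for what the extraction step returns.
extracted = json.loads("""
[
  {"stage": 1, "proppant_lbs": 250000, "avg_pressure_psi": 8200, "avg_rate_bpm": 90},
  {"stage": 2, "proppant_lbs": 248000, "avg_pressure_psi": 8350, "avg_rate_bpm": 88}
]
""")

conn = sqlite3.connect("frac_jobs.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS stage_totals (
        well_api TEXT, stage INTEGER, proppant_lbs REAL,
        avg_pressure_psi REAL, avg_rate_bpm REAL
    )
""")
conn.executemany(
    "INSERT INTO stage_totals VALUES (?, ?, ?, ?, ?)",
    [("42-000-00000", s["stage"], s["proppant_lbs"],   # placeholder well API
      s["avg_pressure_psi"], s["avg_rate_bpm"]) for s in extracted],
)
conn.commit()
conn.close()
```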
9:10 So contract dedications are another one, right? You've got all these contracts, and they're related to wells, but they're related by the lease or
9:20 by the section-township-range; they're not related explicitly to the well. So the guys on the land and contract side have to go through contract by contract, find that dedications table, pull out
9:32 the section township range or the lease name or whatever, and then go find all the wells that are actually within that. And then they know that, okay, that's the contract for just this bucket of
9:40 wells. Well, that takes a lot of time for a person to do by hand. I mean, that's literally a conference room table thing. Yeah, opening to Exhibit A, because Exhibit A is
9:50 always the dedicated acreage under the marketing agreement. Yeah. And so being able to do that just with a computer - yeah, okay, it's this type of file, go extract this table, then from that extraction,
10:00 go and find all the wells that are in that lease, or within that geospatial area - we can do that automatically now, which is pretty cool. 'Cause when I'm out there talking to people, what I always say is the
10:14 magic is in our data pipelines. You have to have the domain experience so that you can chunk stuff and embed stuff, slap metadata on it. And somebody said, I don't know what that means, Chuck. And
10:25 I said, well, good, because I happen to have John on the podcast in just a few minutes. So tell me what that actually means, John. No,
10:34 so the, I would say the biggest, again, going back to the data problem, right? Like documents - some data has relationships, some doesn't, right? So the contract example,
10:45 right? Each contract relates to wells, but via a lease. And so you have to understand that kind of hierarchy of, okay, you've got your leases, and then within a lease, we can have pads, and
10:55 within a pad, we have multiple wells, and so on and so forth. And so having that type of understanding allows you to either extract metadata from the document or generate your own metadata about
11:07 the document that allows you to continue to give better answers, right? So, for example, if you're the client and you want to know what wells
11:17 the contract applies to, well, traditionally, you'd have to go physically look all of that up. Well, if I can do what I just said and say, okay, I'm going to pull out the dedications table,
11:27 and then I'm going to go find all the wells that are related, or that are within that lease, now I can add those wells, or the API numbers or whatever, as metadata to that contract, so that when someone
11:37 asks, hey, what wells are related to this contract - hey, here's the relationship from this contract to the wells that are related to it directly. And so it's all data on the back end, right?
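As a rough illustration of that enrichment step - the lease and well records here are made-up placeholders, and the matching logic stands in for whatever the real pipeline does:

```python
# Hypothetical sketch: tag a contract chunk with the wells that fall inside
# its dedicated acreage, so a question about a well can be routed straight
# to the right contract.

# Pretend this came out of the dedications table extracted from the contract.
dedication = {"lease": "SMITH A", "section": "12", "township": "2N", "range": "3W"}

# Pretend this is the well master list (API number plus location attributes).
wells = [
    {"api": "42-000-00001", "lease": "SMITH A", "section": "12", "township": "2N", "range": "3W"},
    {"api": "42-000-00002", "lease": "JONES B", "section": "14", "township": "2N", "range": "3W"},
]

# Wells relate to the contract via the lease or the section-township-range,
# not directly, so resolve that relationship once and store it as metadata.
related_apis = [
    w["api"] for w in wells
    if w["lease"] == dedication["lease"]
    or (w["section"], w["township"], w["range"])
    == (dedication["section"], dedication["township"], dedication["range"])
]

contract_chunk = {
    "text": "...dedication language from the contract...",
    "metadata": {"doc_type": "marketing_contract", "related_wells": related_apis},
}
print(contract_chunk["metadata"])  # related_wells -> ['42-000-00001']
```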
11:48 Like, people think that you just dump everything in there and it magically gives you an answer. It'll give you an answer because that's what they're trained to do, not because it's the right
11:56 answer. And so, in our experience, the biggest, the most important part of your pipeline, or your RAG, is getting that metadata and those relationships structured and built into
12:09 it in a way that when you ask a question, it understands, okay, it's asking about a lease, not a well, or it's asking about a well, not a lease, or if it's offshore, it's in this block versus
12:21 this - there's all the nuances of the industry and stuff, and that's really where having domain expertise matters significantly, 'cause you're essentially reverse engineering the answer based off of a
12:33 question. A lot of the time, you don't know what the question's going to be because you're not the client, and so you can take your best guesses, but understanding what those questions are gonna
12:42 be allows you to understand how to break the data down, what metadata you need so that when they ask a question a certain way, it's available to be filtered or searched against so that you get the
12:54 best answer ultimately. So do this real quick, just in case mom's listening: what's RAG? RAG is retrieval augmented generation. And so what RAG is, is just a different architecture where you
13:07 leverage an existing language model, but on top of your data. The way that we do it is we essentially say, if it isn't in this data set, tell the user I don't know. And so RAG allows you to kind of
13:19 eliminate this concept of hallucinations by constraining the answers to your documentation. So the worst thing that it could do is cite the wrong source, but it's never gonna make up something. So it
13:33 could give you a bad answer, but it's a bad answer still grounded in the data that is yours. With a traditional language model, they're trained on a giant, it's like the best general - And when
13:48 you say a language model, you mean like ChatGPT? ChatGPT, Grok, Gemini, all of the big guys, right? So RAG still uses a language model, but it just uses a language model to
13:56 summarize the answer from the search that you performed. A lot of people get that confused. In a rag, the last step is you basically pass the subsection of text. So this is what a chunk is. If I
14:08 took a 100-page document and I broke it into 100 chunks, that would be chunking it by page. So each page is just literally - here's my giant block of text from that page, or tables, or whatever
14:20 else is in that document. And so the chunks just allow you to narrow down the context of the answer depending on what type of questions and answers you're going to have But ultimately, you're doing
14:31 a search based off your question. A language model is not involved at this point. You search across all these chunks, so all the parts of every document that you have in the RAG. It gives you the
14:42 top k, which is a variable - so let's just say five. So it'll give you the top five best answers, or chunks, that it found within the search. And then it gives those to the language model and
14:54 says, basically, summarize this information that I'm giving you. Don't use your training data, don't use anything else other than what I'm giving you to generate an answer. And so if
15:04 we don't pass it anything, it knows. And it says, OK, well, we don't have access to that information. So it's not going to tell you a lie, whereas a traditional language model is built to give
15:15 an answer. Not the right answer, not the best answer, an answer. And so that's why you see it when
15:30 it makes a mistake, and it's like, oh, you're right, you're absolutely right. It does that every single time when you correct it, because it is trained to give the user an answer, right?
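A stripped-down sketch of that flow - chunk by page, search, take the top k, and only answer from what was retrieved. The `llm()` call and the word-overlap "search" are stand-ins for the real model and the real vector search, and the page text is toy placeholder data:

```python
# Hypothetical sketch of the retrieval step described above: search the
# chunks first, then hand only the top-k hits to the language model.

def chunk_by_page(pages):
    # Simplest possible chunking: one chunk per page.
    return [{"page": i + 1, "text": text} for i, text in enumerate(pages)]

def search(chunks, question, k=5):
    # Stand-in for a real vector/keyword search: score by shared words.
    q_words = set(question.lower().split())

    def overlap(chunk):
        return len(q_words & set(chunk["text"].lower().split()))

    ranked = sorted(chunks, key=overlap, reverse=True)
    return [c for c in ranked[:k] if overlap(c) > 0]

def llm(prompt):
    # Placeholder so the sketch runs; a real system calls an actual model here.
    return "(model answer constrained to the retrieved context)"

def answer(question, chunks):
    hits = search(chunks, question)
    if not hits:
        # Nothing retrieved: say "I don't know" instead of letting the model guess.
        return "I don't know - that isn't in the documents."
    context = "\n".join(f"[p.{c['page']}] {c['text']}" for c in hits)
    prompt = (
        "Answer ONLY from the context below. If it isn't there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

pages = ["Stage 12 screened out at 9,400 psi.", "Total proppant placed: 4.1 million lbs."]
print(answer("What pressure did stage 12 screen out at?", chunk_by_page(pages)))
print(answer("Who operates the lease next door?", chunk_by_page(pages)))
```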
15:34 And I mean, I think ultimately - because I had the head of AI for Intel on the podcast about a year and a half ago, and we were talking about RAG versus the foundational models. And back
15:46 then, he was like, half the world thinks foundational models will ultimately solve it all. They'll get so good that ChatGPT will just give you the answer. And then he also said, but equally smart
15:58 people in the other half of the world think you need RAG architecture for that context. And I've come to believe, and I think the world's there too, that ultimately you need that RAG, one, for the
16:11 precision of the answer, like you're saying. But two, at the end of the day, cost. Yeah, if I am searching just this database, that's a lot cheaper than, hey, go search the whole internet, you know,
16:23 and those three websites over in Sri Lanka, right? You know, that's another big point here: cost is going to matter at some point. Yeah, no, that's very, very valid, right?
16:35 Like, you're already starting to see some of the AI tools changing their pricing models or upping the price. Or even GPT-5 - it has this auto-thinking feature where it
16:47 intelligently routes your question to whichever model it thinks will give the best answer. But then I've seen a bunch of people publishing data on this, and it's like the output, you know, the number of
16:59 tokens, which is ultimately how you get charged for these things, with the new models - that's the gift card, right - is like two to three times the amount of what it was with the last iteration.
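As a back-of-the-envelope illustration of why output token counts drive the bill - the per-token prices here are made-up placeholders, not any vendor's actual rates:

```python
# Placeholder prices, not real vendor rates - just to show the arithmetic.
PRICE_PER_1K_INPUT = 0.005   # dollars per 1,000 input tokens (hypothetical)
PRICE_PER_1K_OUTPUT = 0.015  # dollars per 1,000 output tokens (hypothetical)

def request_cost(input_tokens, output_tokens):
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Same question, but a model that "thinks out loud" emits ~3x the output tokens.
print(round(request_cost(2000, 500), 4))   # terse answer
print(round(request_cost(2000, 1500), 4))  # verbose, reasoning-heavy answer
```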
17:10 And so it's like they're basically, inherently, forcing users to use more tokens so that they make more money. But, you know, another one on the cost side with RAG versus language
17:20 models, language models have a cutoff date, traditional foundational models have a cutoff date, right? So because they have to be trained, it's not this dynamic thing where you can just keep
17:30 adding data to it and it magically works the minute you add it to it. With a RAG, the beauty of RAG, and one of the main reasons we went with RAG for this solution is because it can be this living
17:42 thing where I point it at a folder or a group of folders or an entire directory of data. And anytime a file gets added to that, I can run it through my pipeline, index it, and it's ready and available
17:52 for search within minutes. Whereas with a foundational model, it would take you months of training to add a new data set. And so those are the big differences there.
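A minimal sketch of that "living index" idea - watch a folder and only run files that haven't been seen before through the pipeline. The `extract_text` and `index_chunks` helpers are stand-ins for the real extraction and embedding steps, and the folder name is a placeholder:

```python
# Hypothetical sketch: keep a RAG index in sync with a folder by only
# processing files that haven't been indexed yet.
import json
from pathlib import Path

WATCHED_DIR = Path("well_files")          # folder documents get dropped into
STATE_FILE = Path("indexed_files.json")   # remembers what's already indexed

def extract_text(path):
    # Stand-in for the real extraction step (PDF parsing, OCR, etc.).
    return path.read_text(errors="ignore")

def index_chunks(path, text):
    # Stand-in for chunking, embedding, and writing to the vector store.
    print(f"indexed {path.name} ({len(text)} characters)")

def sync():
    seen = set(json.loads(STATE_FILE.read_text())) if STATE_FILE.exists() else set()
    for path in sorted(WATCHED_DIR.glob("**/*")):
        if path.is_file() and str(path) not in seen:
            index_chunks(path, extract_text(path))
            seen.add(str(path))
    STATE_FILE.write_text(json.dumps(sorted(seen)))

if __name__ == "__main__":
    sync()  # run on a schedule or from a file watcher to pick up new documents
```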
18:05 There's lots of really cool stuff and a lot of very valuable things from large language models, foundational models specifically, coding is one of them. But getting to the depth of my company's
18:18 specific data and the way that we phrase things, the verbiage that we use, all of that stuff - a foundational model is never gonna get to that level of specificity, at least unless
18:30 someone just explicitly focuses on that for the industry, and I'm not saying that that won't happen. But it's still, a foundational model is trying to be a generalist in a topic, or multiple
18:41 topics, right? It can go deep in certain topics, right? Like they've got it passing the bar and certain medical exams and stuff, but it still doesn't have the logic. Like, they have thinking modes
18:55 and
18:57 they have this logic tree and stuff that they go through, but there's been a bunch of studies also that show that the logic starts to break after it gets one or two steps deep. And so it's not logic
19:09 like you and I have. So a foundational model will have all of the knowledge in the world, but it's still like a toddler and understanding what it actually means. It doesn't have the understanding.
19:18 It just has the data. And so that's a big piece of it. So go back to data, 'cause I wanna make this point and I'm gonna say this as a statement, but it's obviously a question. I think one of the
19:33 things you were basically saying is at the end of the day, you're chunking, embedding, slapping metadata on
19:42 the data as it's ingested in such a way that a vectorized database can understand relationships between stuff and
19:54 I think ultimately the promise of AI search is great. I mean, it's nice not to waste time, but ultimately it's automating a workflow. And I think you almost have to start here so that, and let's
20:10 go back to the marketing contracts. We may have marketing contracts that are pegged to CPI. So we need to go out and hit CPI and my database has to know those relationships to be able to create that
20:24 workflow. Am I right on that? Yeah, it has to know the database, it has to know what's in the database, the types of fields, the structure, all the relationships, all of that stuff.
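To make the CPI-pegged contract example concrete, a toy sketch of the escalation math that the workflow would automate - the rate and index values are placeholders, and in practice the current index would be pulled from an external source (the kind of thing an API or MCP-style connector handles):

```python
# Hypothetical sketch: escalate a CPI-pegged contract rate.
# All numbers are placeholders, not real contract terms or CPI values.
base_rate = 0.35         # $/MMBtu fee at contract signing (placeholder)
cpi_at_signing = 290.0   # index value when the contract was signed (placeholder)
cpi_current = 310.0      # latest index value, fetched externally (placeholder)

escalated_rate = base_rate * (cpi_current / cpi_at_signing)
print(f"Escalated rate: ${escalated_rate:.4f}/MMBtu")
```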
20:35 And so that, to me, is one of the biggest promises of a language model: traditionally we have either raw data or what I call contextualized data, so reports, some kind of text data. And typically we don't
20:47 have them in the same place, right? So it's like an engineer is looking at the raw data, looking at it in the dashboard, on the Spotfire chart, whatever. But then it's not till after the job is
20:56 done that the actual post job report comes in that has all the details of like, okay, this is what we did. Here's the notes and handwritten stuff, or input notes from the guys in the field that
21:08 contextualize the numerical data. And so being able to have all of that in one place in a language model really makes things rather interesting because you can now blend your internal data sets with
21:19 the numerical data sets that you also have instead of them traditionally being kind of siloed away from each other. It also allows you to pull in external data like CPI or crude prices or stock price,
21:30 whatever, right? There's so many APIs out there to pull data into different places and that's what MCP basically is, is a way for you to get data from an API using a language model so that the
21:42 language model can ingest it and use it and understand it and things like that. So it's really exciting from that perspective just because traditionally our industry has been so siloed from a data
21:52 perspective. It slows so much stuff down, right? Like, someone in accounting needs some specific thing from an engineer and the engineer doesn't have time to get it, but the data's in there. And so
22:02 if you just had a tool that allowed the person in accounting to search for the answer, like a RAG, they get that without having to ask anybody or, you know, wait. And then also
22:16 being able to automate that: I know the accountant on the 17th needs to know this number, and bop, it just shows up. Yep. Without even having to ask. Or even the exceptions in workflows,
22:27 right? Like in the regulatory workflow that we're doing with the G-10s and the W-10s, right? It's pretty straightforward. You're just looking at the last production test for each well and
22:37 you're reporting that. Well, a lot of the times, you know, there are wells where they haven't done a production test in the last six months and they need to, but they don't realize
22:45 that they haven't done one until after they start filling out this report. Now you can run it - you dump your notice in from the Railroad Commission and it runs and it pulls in and it will clearly
22:56 show you, okay, here's the wells that you've got data for, for that well test. Here's the wells that are missing well tests. So go get some well tests before this report or this notice is due to
23:06 the Railroad Commission. Otherwise you're going to get fined. And so it's like, those things would have been found out last minute because it takes them hours to do those workflows just because of
23:15 how they're kind of set up. But now you can run that in literally less than 30 seconds and have direct marching orders for everyone on your team: okay, here's the wells that we've got the data
23:26 for, here's the wells that are missing data, now we need to go dispatch our guys in the field to run some well tests before this date so that we can get this submitted and everybody's happy and we
23:34 don't get fined. There's just so much of that, you know, that we don't even know exists a lot of the time, but you uncover it as you go through these workflows.
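A bare-bones sketch of that exception check - given each well's last production test date, flag the stale ones before the filing deadline. The well names, dates, and six-month window are placeholders:

```python
# Hypothetical sketch: flag wells whose last production test is older than
# six months, so testers can be dispatched before the notice is due.
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=182)
as_of = date(2024, 6, 1)  # placeholder "as of" date

# Pretend this came from the production database (placeholder wells and dates).
last_tests = {
    "Well 1H": date(2024, 4, 20),
    "Well 2H": date(2023, 10, 3),
    "Well 3H": None,  # no test on record at all
}

needs_test = [
    well for well, tested in last_tests.items()
    if tested is None or as_of - tested > SIX_MONTHS
]
print("Wells missing a current test:", needs_test)
```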
23:43 Well, we will not be ignorant going forward with it. So let's go back to data. So, um,
23:52 I, you know, I show up with my bankers boxes. You're telling me I need to digitize it. We're going to scan. We're going to figure out how to do best practices in terms of tagging what it is, et
24:07 cetera. How do I deal with this problem? We were talking to one company and I'll get this somewhat wrong. I think they said they had 39 terabytes worth of data on their system. 29 of the terabytes
24:23 were duplicates. And
24:26 that makes sense, right? You download something, you sit there and mess with it, you store it on your hard drive, you don't put it back on the system. All that. How do we deal with that problem?
24:37 Yeah, there's some programmatic ways that we can deal with that on the back end. I won't get into it - it's not that much of a secret sauce by any means - but there's ways that you can essentially
24:47 identify how unique a document is at a text level. And then as you ingest, you can compare that with what you've got in your index and see if it matches, and if it matches, then you just don't index
25:00 it. I mean, we can do something - I mean, 'cause AI can really help on this process, as opposed to five years ago, when you were having to do that manually. Potentially, yeah, there's some
25:11 hacky ways to do that without having to have a person involved, fortunately.
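One simple version of that trick - fingerprint the normalized text and skip anything the index has already seen. Exact-match hashing here stands in for whatever fuzzier uniqueness check a real pipeline might use:

```python
# Hypothetical sketch: skip exact duplicate documents by fingerprinting
# their normalized text before indexing them.
import hashlib

already_indexed = set()  # fingerprints of everything indexed so far

def fingerprint(text):
    # Normalize case and whitespace so trivially different copies still match.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def maybe_index(name, text):
    fp = fingerprint(text)
    if fp in already_indexed:
        print(f"skipping {name}: duplicate of something already indexed")
        return
    already_indexed.add(fp)
    print(f"indexing {name}")

maybe_index("daily_report.pdf", "Daily drilling report  -  spud date 3/1")
maybe_index("daily_report_copy.pdf", "daily drilling report - spud date 3/1")
```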
25:24 So that's a lot of what we're focused on. Colin is a big proponent of doing things that don't scale, right? But you get to a point with software, especially at the enterprise level, where you've got to make it scalable in some form or fashion. And so we'll always take on projects that are new
25:35 or people are like, oh, there's no way you could do that. But the main point of that is not only can you do it, but can you do it in a way that's enterprise and production-code worthy. And so
25:48 that's another big one, you know - that's the difference between me writing a script that I just run locally on my laptop versus me writing a full-blown app that has, you know, Entra integration and SSO and
25:59 all the security and the permissions and management of the users and all the other stuff. And so that's, I'd say that's one of the big things there is like you've got to understand that there's a
26:11 give and take, always. And so most things are feasible. It just depends on how much money and time. If it's a finance thing, we ought to take Chuck's file over John's file. If it's a frack job,
26:24 we ought to take John's file over Chuck's file. I kind of get it. What else do people need to be thinking about with data? Yeah, I think another one that just generally speaking, people are
26:36 starting to look at. But as someone who's sat through just hours and hours, probably days or weeks, of after-action review meetings - on the operations and engineering side, we'll do
26:48 these meetings where something bad or good happened, and then we all sit down and we review what went well, what went bad, why all of this stuff. And people spent, I mean, you spend lots of time
27:01 putting those presentations together, getting all the data together and stuff, but the actual knowledge and the value from those conversations is spoken, right? And so like putting that PDF in
27:11 there is only gonna capture a fraction or that slide deck in there is only gonna capture a fraction of what was actually discussed and the knowledge that was actually shared in that meeting. So
27:21 having just an iPhone sitting there recording the meeting, you can now take that data and transcribe it. And now you actually have, you know, the industry, company, domain, expertise,
27:32 knowledge stored forever. And
27:36 I think - Yes, searchable. Yes, searchable, discoverable. There for every person, from now into the future, forever, you know, potentially to have access to. And I think that's a really
27:45 interesting thing in our industry, given the whole crew change and all that stuff.
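For the recording-and-transcribing piece, a small sketch using the open-source Whisper model - this assumes the `openai-whisper` package is installed, and the file names are placeholders:

```python
# Sketch: transcribe a recorded after-action review so the discussion itself
# becomes searchable text, not just the slide deck that went with it.
# Assumes the open-source openai-whisper package (pip install openai-whisper).
import whisper

model = whisper.load_model("base")            # small general-purpose model
result = model.transcribe("aar_meeting.m4a")  # placeholder file name

with open("aar_meeting_transcript.txt", "w") as f:
    f.write(result["text"])  # drop it somewhere the indexing pipeline will pick up
```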
27:59 I'm personally super optimistic that LLMs and
28:00 RAG will help with that. I'm also terrified, as an engineer, that, like, you know, you can't find service companies now that can run a gel job. Like, that's all we did when I started in the
28:05 field was most things were cross-link gel. And so it's like, you can't find companies that have hydration units anymore, just the equipment, no less the people to do those jobs. And so like that
28:14 terrifies me that we're gonna have a whole nother like generation that comes through the industry that's gonna have to relearn stuff that guys 20, 30 years ago already knew, but now we've retired
28:24 them. Well, you will appreciate this. So we were out at pro-frack yesterday 'cause they were doing kind of exhibit day, showing all their stuff, and they've got a sand company. And they had
28:36 different grades of sand. And so I went up and I was like, Jacob, roll the camera on me. I go, engineers used to tell me, I need very coarse sand to cut a big path through the rock so that the
28:48 natural gas and the oil could flow through, to create permeability. And then some engineer who must have been stoned or something said, dude, let's put some talcum powder down there and see
29:00 what happens. And lo and behold, it worked a lot better. Two things, one, mad respect for the engineer to actually experiment, measure, test it. But the other half is, you're totally full of
29:13 shit. You didn't know what you were talking about back then. Yeah, no - I don't know this definitively, but I'm adamant that that's what happened, because 100 mesh is a byproduct of 30/50 and
29:26 40/70 proppant when you're mining it. And EOG had their own sand mine in South Texas where they were mining 30/50 and 40/70 because that's what everybody was using. Well, if you're sitting there and the
29:39 pile of 100 mesh that you've got just continues to get bigger because it's a byproduct of these other two products that you're actually using. At some point, I feel like someone was just like, Hey,
29:49 it's free, we've got it sitting there. Let's see what happens. Doesn't cost us anything. Literally. 'Cause that's what an ops engineer would do. They don't care, they don't, I mean, they don't go talk
29:58 to Reservoir and say, Hey, what are my analog wells based on? Yeah, I also think we as an industry figured that out, and this was after a long time of - I have the same deal, right? Like, I
30:08 can't tell you how many lunch and learns, webinars I sat through from, you know, Sancho Bain and all the ceramic proppant companies preaching about how you needed crush strength because, you know,
30:18 the reservoir pressures were too high and you would put sand down there and it would just crush itself. And it's like, come to find out that doesn't matter at all. Like, because any sand that we
30:28 put down there is infinitely more permeable than the shale. And like, at the end of the day, that's what ultimately matters. And so, you know, 100 mesh versus 3050 are both ultimately
30:39 superhighways compared to what the shale is from a perm perspective. And so it doesn't really matter. But I think a lot of it is more about - 'cause I saw an article this week from Exxon
30:51 talking about, you know, pumping
30:54 coke ash. So, another byproduct of power gen or coal plants that they probably have sitting around as a waste, that's free, that they need to get rid of anyway. And someone was like,
31:07 hey, maybe we add this to a frac. I don't know if that's how it happened or not, but they're showing, you know, anywhere from a five to 15 percent production increase by adding that. And it's because
31:16 it's lighter than traditional 100 mesh. So it actually gets in places where it wouldn't be able to get, or doesn't settle, at least generally speaking. That's the concept. I haven't looked at any of that
31:26 data, but yeah, it's fascinating stuff. It was fun out there. So, okay, so what else do we need to be thinking about when we're thinking about data? And what I might do is say, what are we
31:44 potentially gonna look back five years from now and go? Oh my gosh, I wish we had done this. So,
31:53 even if you don't immediately use them today, just digitizing your paper documents, like again, it's, to me, it's almost like a piece of tech debt on the, you know, programming software side.
32:05 It's there, it doesn't go away. No matter how long you wait, it's just lurking over you. And so, you know, if you can build that into your budget - you're gonna have to build it into your budget
32:15 at some point, or just get rid of it, like throw it away. 'Cause there's - Totally go ahead, Bill. Right, yeah. At some point, it's just literally taking up space, and so if you're,
32:26 if you have any plans of trying to use that data, go ahead and at least get it organized, put into a digital format, 'cause the way things are going is that there's a very strong chance in the next
32:35 year or two, you'll be able to leverage a lot more of that data than you would have in the past. Like, in the past, you'd have to use OCR, you'd have to use these very finicky, kind of very touchy
32:47 techniques to get data out. And that's no longer the case. It's much easier to get data out of, you know, PDFs - people think PDFs are structured, but a PDF is just a
32:58 dumpster where people throw any kind of data - and so extracting the data from those is not easy by any means. But the tooling around that over the last year or two has gotten so good that it's only going to keep getting better.
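As one minimal example of that tooling, a sketch that pulls whatever text layer a PDF already has and flags the pages that would need OCR or a vision model instead - pypdf is a real, commonly used library, and the file name is a placeholder:

```python
# Sketch: pull the text layer out of a PDF page by page with pypdf
# (pip install pypdf). Scanned or handwritten pages come back empty here;
# those are the ones you'd route to OCR or a vision-capable model instead.
from pypdf import PdfReader

reader = PdfReader("drilling_report.pdf")  # placeholder file name
for number, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").strip()
    if text:
        print(f"page {number}: {len(text)} characters of extractable text")
    else:
        print(f"page {number}: no text layer - send to OCR / vision extraction")
```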
33:09 So, and I want to be nuanced in how I say this, because everybody's cloud bill is through the roof and they think it's expensive, but storage actually
33:19 doesn't cost anything. Storage continues to get cheaper. I mean, my brother has been doing mechanical integrity of hard drives since he graduated from Rice. He has kind of sat at the same office through Compaq
33:31 computers. Yes. Every once in a while, a supplier steals him away and he goes away for a year or two to get a raise. And he comes back and now he's at HP or whatever it is these days. But I mean,
33:39 it's ridiculous how much data you can store.
33:46 And the deal with AI is you can go search that data. So I mean, we're literally driving search costs to zero. I mean, yeah, I think the other big one is, and we see this all the time when we go
33:59 into clients, there's always someone from the production, SCADA, IoT department that's like, I want to optimize my artificial lift or my ESPs or whatever. It's like, a language model is
34:12 explicitly built for language. It is not built for math. And the way that ChatGPT does math now is it writes a Python script, and then it executes the Python script, because Python is actually good
34:23 at math and built to do computations. And then it gives the answer back to the language model. And that's how it does math. And so it's like
34:31 being able to see GPT do something and then assuming that that is how everything works is a bad way of looking at these things, I think. You know, the
34:46 interesting thing with numeric data for language models is being able to query it, right? Connect to my production database and then let me ask questions about it as a non-technical person, not
34:55 having to write SQL or know anything about databases, just being able to ask information about my database and getting it that out. And I think being able to have that across a company, where all
35:06 different departments can ask their questions of the same datasets and get their answers directly and immediately is like truly game changing for how quickly we're gonna be able to do stuff in the
35:17 future. And traditionally it's like, okay, that person has to go take their problem to someone in data or IT. And then that gets put in like a queue, right? They have to create a ticket. And
35:29 then the person who's generating that, or once they get to the ticket, the person who's actually doing the work has to go call the person who put in the ticket to get all the details about exactly
35:37 what they want. Then they have to go write the query, and then they give them back the data. And that could be a day, that could be weeks. In the future, you
35:45 just ask the question and you get your answer and just keep moving.
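A rough sketch of that ask-in-plain-English pattern - the language model's only job is to turn the question into SQL, and the database does the math. Here the generated SQL is hard-coded as a stand-in for the model call, and the tiny production table is placeholder data:

```python
# Hypothetical sketch: natural-language question -> SQL -> answer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (well TEXT, month TEXT, oil_bbl REAL);
    INSERT INTO production VALUES
        ('Well 1H', '2024-05', 4200),
        ('Well 2H', '2024-05', 3100);
""")

def question_to_sql(question, schema):
    # Stand-in for the model call: given the schema and the user's question,
    # a real system would return generated SQL (and validate it as read-only
    # before running it against anything that matters).
    return "SELECT well, SUM(oil_bbl) FROM production GROUP BY well;"

sql = question_to_sql(
    "How much oil did each well make last month?",
    schema="production(well, month, oil_bbl)",
)
for row in conn.execute(sql):
    print(row)
```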
35:56 Interesting. I think that's another really important point - it's just an education piece. Language models are statistical models that predict the next best word or set of words, right? They have nothing to do with numbers. And so trying to use them for that - it's a tool. It's a tool in a toolbox, right? Machine learning models are the
36:07 same thing. Machine learning models are built for that. They're built for feeding in a bunch of historic time series and numeric data to predict what it's going to do in the future. Like, that's
36:17 literally what they're built for. So if you're looking for those types of things, that's a machine learning problem, not a language model problem. If you want to query your data and just ask
36:24 questions about it to find answers in a database, that's a great language model use case. But I think just understanding how they work at kind of a base level and understanding the limitations of
36:35 these things is really important. Because if you listen to or you're watching any of the GPT, Claude, Gemini battle in public or in marketing, they do everything for everybody, right? Like,
36:45 that's also part of the problem with tech is they over-promise, they promise the world, they make these demos explicit for this one very specific thing and then they spent God knows how much time
36:54 doing that just so they could show a demo. And then everyone's like, oh well, now they can all do this. And it's like, not necessarily, right? We always overestimate technology in the short
37:04 term and then underestimate it in the long term. Completely agree. Yeah, I mean, 'cause the, I mean, the internet now is ubiquitous and like literally embedded everywhere. It's in our phone,
37:16 it's in our watch, and you don't even, you almost don't even know it's there. Yeah. And I think, I think Altman's right. He was on Theo Von of all places talking about it. I need
37:27 to go watch that actually. Well, two interesting things he said about AI. One, it's just gonna be embedded everywhere and you won't even notice it. And my running joke whenever I talk about it
37:37 is, yeah, when you, you know, leave the bar and you drive through Taco Bell on the way home, your toilet's gonna know to warm the seat on the way up. I mean, it's gonna get to that point where
37:49 it's gonna be embedded in everything. The other thing he said that's really interesting that I'll get you to comment on. Maybe here's what we'll close on is Sam Altman thinks that
38:02 your searches into your AI models ought to be protected like attorney-client privilege. They ought to be private, 'cause right now they're not. If the government shows up at OpenAI and says, I want to see
38:15 the searches somebody ran, they have to turn it over. Yeah, and I think that got even,
38:21 I guess less restrictive would be the way to say that. Within the last couple of weeks because there was a lawsuit, I guess it was a class, I don't know if it was class action or not, but
38:31 anyway,
38:33 yes, that's a terrifying thing, because right now, yeah, they can get your entire history, you know. And it's still the very early days of this thing, right? So people who are in there just messing
38:46 around could potentially be flagged as doing something bad when they're just trying to, you know - there's. Yeah, God knows I don't want to see my Reddit queries. So, but I do agree with that.
38:55 That it is, you know, it's information that you're using in the privacy of your own private place. At least people think that it's in their own private place. It's happening in OpenAI's cloud,
39:08 wherever they're hosting their stuff. But ultimately, yeah, understanding the privacy around these models and what access they have and the history, how long they store your data, all of that stuff
39:19 is incredibly important, especially when you're talking about enterprise-level stuff, 'cause no one wants their data - no one wants next month's earnings call - to be accidentally exposed through a
39:29 ChatGPT query. Right. So that's - no, but I do believe the best implementations of any kind of AI are almost invisible to the user, right? Like there's all kinds of stuff on your phone
39:39 happening. Like Apple's really good about that, with the new iOS, you can go in and search through your photos just with plain text, right? It's like there's a language, there's a vision model
39:51 underneath the hood, looking at all the photos and has already labeled all those photos on here. And so it's like, you didn't know, you had no clue, but the feature that you get is the search,
40:01 but the model is running all the time. And so that, to me, is like the epitome of a beautiful kind of implementation of AI because the user only sees the benefit, they get none of the frustration
40:11 up front. You don't have to be a prompt expert. You don't have to be an expert at all. You just type in, I wanna see a flower. Yeah, literally. Yeah, okay, I'll confess this. And
40:24 anytime we're out talking to clients, they'll always say, well, can you talk to this? Can you talk to this? And I always say, every time we've had to talk to something, John has been able to
40:38 make that work. It's just what shade of red does John turn? And when he's really red, he doesn't talk to me, he doesn't tell me how much of a problem it is. But so far, we've been able to talk
40:50 to all the databases, all the other legacy software, so far so good. Yeah, no, I mean, I think that's really, that's the thing I'm most excited about with what we're doing, with what other
41:00 people are doing in the language model spaces. There is just still so much siloed data in this industry. Like one client we're working with has recently moved to the cloud version of the software.
41:11 And then because they're in the cloud version of the software, somehow they magically don't have API access to their data anymore. When that was like one of the biggest promises of cloud is that you
41:20 can now provide access to the data anywhere you need it. And so, but it's just the perfect microcosm of what the entire industry has been dealing with for the past 20 plus years is software vendors
41:31 that get you in and then
41:34 cause an excessive amount of friction for you to do your job,
41:39 even though you're the one paying them for the service. And so it's like, hey, if we can start with Enverus, with production, if we can start integrating these things on the back end, then the world is
41:49 our oyster as far as that goes with the energy industry. But, you know, there's still a ton of softwares out there that you can't, or they don't want you to, or you have to pay more to get access
41:59 to just the API - you know, dumb little stuff. And you're paying tens or hundreds of thousands of dollars a year for this. And at the end of the day, it's still your data. That's the craziest part
42:08 about all this to me. Thank you, sir, may I have another? Yeah, it's, you know, the operator's data. It's your data. But if you're buying software from vendors that only allow you to
42:17 access it under very explicit conditions, then there's a problem with that. Like, I can access my freaking washing machine, my dishwasher, on my phone. But, you know, if I'm a production
42:29 engineer at an operator, I have to be on my company's specific network or on a VPN, and then have to have multi-factor, like all this extra stuff, to get into it. Some of that security is good and valid.
42:41 But from the software perspective, the software companies that are going to get ahead are the ones that are going to allow people to use their data from the software where they want to ultimately
42:51 end up using it. And so I think the industry historically has kind of the incumbents have rested on their laurels because they got in first and fast and they just didn't have to do development or add
43:04 these things that the users wanted. But now that there's more and more software coming out, tech has continued to infiltrate the industry. That's changing. Well, and I've totally drunk the
43:12 Kool-Aid because when you use a GPT interface and you use natural language, it's just intuitive. Yeah. And a lot of the switching cost, I think, on legacy software was just, I know how to use
43:26 this for sure. Well, and that's the potential cool part, I think, in the short term is that language models layer on top of those softwares, right? So like you're not having to change user
43:37 behavior. in the software that they're using every day, it's just that data is now connected to all of the other data via a chat interface. And so over time, I think, there's no telling what the
43:48 software market looks like in five, 10 years because of how this changes things. 'Cause I do think it will significantly change a lot of the structures of the kind of foundational software tools
43:57 that we use, but. Right. The only key is it needs to change where we're rich.
44:04 That's the plan, Chuck. I've been through too much. John, you were cool to come on, man. Yeah, man.