The technological moat is eroding in the AI era. What new factors separate a successful startup from the rest?
Aurimas Griciūnas, CEO of SwirlAI, joins the show to break down the realities of building in this new landscape. Startup success now hinges on speed, strong financial backing, or immediate distribution. Aurimas warns against the critical mistake of prioritizing shiny tools over fundamental engineering and the market gaps this creates.
Discover the new moats for AI companies, built on a culture of relentless execution, tight feedback loops, and the surprising skills that define today's most valuable engineers.
The episode also looks to the future, with bold predictions about a slowdown in LLM leaps and the coming impact of coding agents and self-improving systems.
Follow the hosts
Follow Atin
Follow Conor
Follow Vikram
Follow Yash
Follow Today's Guest(s)
Connect with Aurimas on LinkedIn
Aurimas' Course: End-to-End AI Engineering Bootcamp
Check out Galileo
Try Galileo
Agent Leaderboard
AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.
Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.
Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.
00:00:00:00 - 00:00:23:16
Unknown
Enterprises do not work well with open source. Enterprises need a very mature solution. When you build an infrastructure company, you need real engineering, right? You cannot vibe code. There are many factors and requirements from enterprise companies, like security and stability, and everything needs to be top notch.
00:00:23:18 - 00:00:53:23
Unknown
We are back on Chain of Thought. I am your host, Conor Bronsdon. Today we're joined by a guest that many of you may know and who I've been following for quite a while: Aurimas Griciūnas. Aurimas is someone you've probably seen on LinkedIn, maybe on X. If you're interested in AI, you've absolutely seen some of his incredible charts; he shares these graphics and the insights that he brings around everything from the observability stack for AI to how AI agents are being dealt with in the enterprise.
00:00:54:01 - 00:01:17:18
Unknown
He makes some really unique and interesting content, and he has a deep background in the trenches of data and AI, having been everything from a data analyst and machine learning engineer to MLOps engineer and Chief Product Officer at Neptune AI. Today, he is the CEO and co-founder of SwirlAI, consulting and building agent ecosystems for clients. He's a prolific content creator.
00:01:17:18 - 00:01:47:06
Unknown
As we've already mentioned, he's launched his own course, the End-to-End AI Engineering Bootcamp, to train the next wave of builders. Aurimas, it's a pleasure to have you on the show. Welcome to Chain of Thought. Hey Conor, thank you for having me here; I'm super glad to have this conversation with you. I know we'd planned to start on some more general topics, but given that you just told me that the first cohort of your End-to-End AI Engineering Bootcamp is wrapping up right now,
00:01:47:08 - 00:02:13:12
Unknown
I'd love to hear from you: how has the cohort been? How's the new course going? I do believe that it is going really well. There are a lot of learnings that I'm taking away from the first cohort and definitely bringing into the second one. The first learning probably is that there is a lot to cover when it comes to AI engineering, especially end-to-end AI engineering.
00:02:13:14 - 00:02:48:15
Unknown
You're covering systems from the very simplest ones up to interacting agentic agents: building agents, communication protocols, deployment, observability, right? So probably eight weeks is not enough if you are also working a full-time job at the same time. So now the realization is that, yes, you can deliver the material in eight weeks, but the commitment for a learner who is actually learning those materials should probably be around, I don't know, six months.
00:02:48:17 - 00:03:16:04
Unknown
So the materials should be reviewed after the bootcamp. I think that's one of the realizations. But in general, it is a very hands-on bootcamp, and it seems like people like it, and I think everyone is taking a lot of hands-on experience away from it. Do you think there would be a challenge, though, with the pace of innovation and change that is occurring in AI,
00:03:16:04 - 00:03:34:10
Unknown
if you were to do that type of six-month course, where, oh look, so much has changed around the frameworks you're applying and there might be new tools you want to bring in? How do you approach this, given that you have your own day-to-day approach to learning, and then these cohorts that you're also working with today?
00:03:34:12 - 00:03:57:16
Unknown
So what it would actually mean is that the course, the cohort itself, would still be eight weeks; it would not span six months. But if you want to get deep into the topics and properly apply them in practice, you should probably take around six months and spend another four months on top of the eight weeks of the bootcamp to properly learn the material.
00:03:57:18 - 00:04:25:00
Unknown
Now, since I am also providing lifetime access to the materials, when I update them for the next cohorts, the updates are also available to previous learners, who can get back up to speed with all of the changes in the industry. And you've mentioned observability and evaluations as key areas within the AI space, and increasingly, I think, we're seeing more and more conversation about this.
00:04:25:02 - 00:04:53:03
Unknown
You, in fact, mentioned to me that you've considered starting a company in that area but ultimately decided against it, calling the market too packed. Could you walk us through that thought process and how you were thinking about the AI infrastructure market today? So when I decided to try and look into the space, I was leaving Neptune, and it was kind of natural for me to try and maybe build something very similar, in a similar space,
00:04:53:05 - 00:05:16:14
Unknown
more in the application layer. That's why the initial decision was to actually research the space. But then, even after the first few weeks, we found 20-plus companies that are doing observability and evals, right? And that's apart from the hyperscalers, who are also doing it as well.
00:05:16:14 - 00:05:40:23
Unknown
Right. So there are quite a lot, and all of them are also trying to cover the space end to end, because it's really hard to pinpoint what will be really important in the next few months. I think that's why all of those companies are trying to do observability and evals and experiments and prompt registries, and maybe some of them routers as well.
00:05:40:23 - 00:06:04:16
Unknown
Right. So connecting end-to-end traces and having end-to-end observability and evals of the system. So there was really probably no unique space to tackle that hasn't already been picked up. And then there are probably also 20 companies in stealth building those solutions, and half of them are open source and available to self-host for free.
00:06:04:18 - 00:06:36:12
Unknown
Speaking of open source, you are one of the folks who correctly anticipated the need for agent interconnection. I've seen you talking about it for months now, maybe even back into 2024, and that space is obviously now being tackled by open standards like A2A and AGNTCY, which have been donated to the Linux Foundation, and obviously MCP. How do you see the open source movement within AI changing the calculus for founders who are trying to build venture-backed companies?
00:06:36:14 - 00:07:10:13
Unknown
A few thoughts here. So, in the first place, currently it is not easy to found a company and be successful, right? You either build something that grabs attention and reaches some sort of escape velocity within the first months after you start building, or you have really strong backing from VCs, so you have a lot of money and can actually build a big team and roll out enterprise operations properly, or you have distribution from day one.
00:07:10:14 - 00:07:45:03
Unknown
I guess those are the only ways you can easily start a successful startup today. When it comes to open source, I'm a strong believer in open source, but it is really hard to make an open source product a profitable business. Yeah, and enterprises do not work well with open source, right? Enterprises need a very mature solution.
00:07:45:05 - 00:08:16:18
Unknown
And that's also the reason why it is hard to build an infrastructure company: when you build an infrastructure company, you need real engineering, right? You cannot vibe code, because there are many factors and requirements from enterprise companies, like security, stability, being able to deploy on-prem, the support that comes with it, and enterprise features, and everything needs to be top notch.
00:08:16:19 - 00:08:46:20
Unknown
And then some of the companies also need the ability to hyperscale on your side, because they're big companies; they might be ingesting a lot of data, and if your infrastructure is not specifically built for that, then you will not be able to succeed in the enterprise space. So do you see this differentiation between the capital-rich folks, or the folks who have at least raised a lot of capital to take on infrastructure companies,
00:08:46:22 - 00:09:21:17
Unknown
kind of taking a very different approach from people who are, as you put it, vibe coding their way to success and maybe using their own built-in distribution to try to quickly generate revenue? How do you see these dynamics playing out in the market? We'd love to explore that with you. So when it comes to vibe coded tools and products, I think this can mostly be successful in B2C types of products, because you're quickly capturing attention from the broad public with some sort of new idea.
00:09:21:19 - 00:09:44:20
Unknown
And when it comes to building enterprise products, I still think that VC-backed companies with large amounts of cash will be the winners, unless you really build something really, really great, really fast, and then you get a lot of money and hire hundreds of engineers to refactor your vibe code. What's your take on this?
00:09:44:22 - 00:10:09:09
Unknown
Good question. I think you're spot on that it really depends on the space. Every time I see someone trying to vibe code their way to a business solution, I just assume, maybe unfairly, that it's not going to scale. Okay, sure, this may work for a certain dev tools segment, or maybe a single AI SKU, if you have the ability to just kind of put a credit card in.
00:10:09:11 - 00:10:28:13
Unknown
But once you start going up against competitors and larger deals, and there are actual frameworks being applied, okay, how are your security and compliance protocols? Are you meeting our needs in these specific areas? Do you have the role-based access control we need? All the things that enterprises, or even just larger scale-ups, are looking for.
00:10:28:15 - 00:10:50:06
Unknown
I would expect you to see some major challenges there. And I do think there's a potentially viable path, and maybe we're seeing this play out a bit: vibe code your way to a cool demo, try to raise money off that cool demo, and then actually hire engineers to create the whole thing. I wouldn't be surprised if there are quite a few companies doing that today.
00:10:50:06 - 00:11:21:12
Unknown
I'm not going to name names, but that also creates a lot of hidden risks for founders who choose this kind of high-intensity path; it may help you get to that raise, and maybe that's what you need, but there's a lot of pressure that comes with it as well. But I guess if a really strong engineer is doing vibe coding, and usually they are not doing pure vibe coding but AI-assisted coding, which is a very efficient way to work, then maybe a tool coded in this kind of way could actually succeed, right?
00:11:21:12 - 00:11:47:20
Unknown
You can build something good, then you hire a team quickly once you get VC money, and then you scale out. I think it's a viable approach if you have that engineering talent within the founding group. If you are coming in as a non-technical founder and you're expecting to be able to just vibe code your way to initial success, I would be hesitant, because I feel like you'll introduce so many issues.
00:11:47:22 - 00:12:09:20
Unknown
Because, to your point, I think you need to treat AI like a partner; you can't simply say yes to everything it suggests. It's very easy to refactor things in the wrong direction and introduce a ton of long-term challenges. So yeah, I do agree with you: if someone comes in with expertise, understands what they're doing, and wants to use AI as a partner today,
00:12:09:20 - 00:12:33:15
Unknown
I think that's a fantastic use case for it. And I think we'll see, and are already seeing, a lot of folks take on founding with AI as a key partner in their initial build-out, in their demo. And then I think the challenge will be, okay, how do you translate that into being a scaling company? Obviously a million people have written books on that.
00:12:33:15 - 00:13:09:17
Unknown
I'm not going to try to pretend I'm Paul Graham and say, oh, here's the approach you want to take. But the inception point of going from idea to MVP feels like it needs to move so fast today. I think it's a big opportunity for founders, but it of course brings a lot of pressure as you start bringing in those VC dollars and other backing. Even if you have raised and, you know, built a successful vertical app, maybe even gotten to a million in ARR, there's also this copy risk being introduced: oh, this could be copied easily.
00:13:09:17 - 00:13:26:19
Unknown
Now it's a lot easier to just say, oh great, let's take what our rivals are doing and do the same thing. How do you create that defensible moat? That's where my mind is at now. It feels like having a clever idea, or early traction, has obviously never been enough on its own.
00:13:26:19 - 00:13:56:20
Unknown
You still have to execute; there are so many things that have to go right. But I wonder if the moat of technology is actually less defensible today. It is, I guess. And you touched on this point previously: when you're doing an enterprise sale, it's not like you're just coming in solo, a single company trying to sell. The enterprise sales process is very well known.
00:13:56:22 - 00:14:18:19
Unknown
It has very known patterns. You would be benchmarked against ten others, and the best vendor would be chosen, right? And then there are a few risks. Either you're already entering a very hot market, and then how do you become better than the others? No one will pay just because you are a known person, right?
00:14:19:00 - 00:14:54:15
Unknown
They will pay because the product is good and better than the others. So you need to have all of those features and implement them in a more efficient way than your competitors. And now that this copy risk exists, a new product can very quickly be, you could say, vibe coded. If you have a very strong engineering team with AI at their side, maybe five strong engineers, then, you know, it's very easy to build even, for example, observability infra.
00:14:54:21 - 00:15:17:03
Unknown
Right. If you are a really great engineer, and you're not the only engineer at the company, and you use AI to some extent, you can very quickly build an observability and eval tool that rivals tools like LangSmith, for example, or Langfuse, right? Or yours; I don't know, I've never used Galileo, but maybe I'll try it out and let you know what I think.
00:15:17:03 - 00:15:39:16
Unknown
Yeah. So I guess let's make this practical, then. If you were advising another founder today and they have an idea and they're like, how should I get started? How should I approach this? We've talked a bit about a couple of these flashpoints that are now occurring, where it's easier to get to MVP and maybe easier to copy.
00:15:39:18 - 00:16:00:19
Unknown
What's your advice to that founder who is coming in with a unique idea and trying to think through how they should approach it? So I think this is also how many VCs think: it's not all about the problem, right? It's a lot about the founder herself or himself,
00:16:00:21 - 00:16:23:17
Unknown
and what this really means is, can that person very quickly pivot and adjust to the changes in the market? Usually the first idea is not a great idea, so I would really try to figure out how good their pivoting ability is. That's the first one. The next one is, how do you sell the product?
00:16:23:17 - 00:16:59:17
Unknown
How do you market it? Because that's probably even more important than the product itself, at least at the very beginning. Distribution and reach are key. So I wouldn't even be too strict on the idea the person is trying to build, and would rather see which part of the market those founders are targeting, and maybe look a little bit back into their histories: what they have been doing before and how they think about the industry in the first place. I completely agree.
00:16:59:19 - 00:17:24:02
Unknown
I'll say my most successful angel investments so far have both been instances where the founders pivoted and said, we didn't quite have this right initially, but we saw the potential of how smart and driven and thoughtful these people were, and they found the path. And I think that's true of most folks who are doing investing: it's, you know, founder first.
00:17:24:04 - 00:17:53:09
Unknown
Who are they? Will they actually take you down this path? Do they have the grit and determination and the mental ability to think outside the box, but then also bring order to the chaotic ideas that they're putting out into the world? And the real decision points probably come once you actually put your first idea into the market, and then you get the feedback, and then maybe some paying customers, and then you kind of figure out what needs to be done next.
00:17:53:11 - 00:18:19:02
Unknown
I think that's one of the really exciting parts for a lot of entrepreneurs in the space right now: it's so much easier to get to MVP and start getting that feedback faster, so you can say, oh, I did this completely wrong, or, oh great, we've got something here, let's see where this goes. And it's creating this intense market pressure for speed, both within larger companies and for entrepreneurs.
00:18:19:04 - 00:18:55:15
Unknown
How have you integrated this focus on teaching what's important, and this idea of focus, which I would argue is increasingly important in a world where there's just so much happening, so much information, and the cost of generating code or content is drastically decreasing? How do you bring that into your course and your work as you help advise and understand what folks should be focusing on? So my course is really focused on fundamentals.
00:18:55:17 - 00:19:25:10
Unknown
It's not a tool-focused course, even though we are of course using the popular tools. To take a simple example: I am using LangGraph throughout the entire build-out of, let's say, the capstone project. But at the same time, I'm not using LangSmith; I'm using Instructor for structured outputs, and I'm using Instructor wrappers inside of LangGraph nodes.
00:19:25:10 - 00:20:04:20
Unknown
So it's teaching people that structured outputs are important, that this is how you can achieve them and how it works; you shouldn't rely on those abstractions that hide how structured outputs are actually achieved. Also, not using the tool bindings of the frameworks, but actually prompting the LLM to produce tool suggestions by giving it tool descriptions. So anyway, the main point is that the course is specifically about teaching fundamentals and real infrastructural patterns along the way.
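For readers who want to see the pattern in code: below is a minimal sketch of structured outputs via Instructor inside a plain function that could be wired up as a LangGraph node. The TicketTriage schema, the model name, and the node shape are illustrative assumptions, not material from the bootcamp.

```python
# Minimal sketch (not course material): structured outputs with Instructor
# inside a plain function that could serve as a LangGraph node.
# The schema and model name below are illustrative assumptions.
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())  # Instructor wraps the OpenAI client

class TicketTriage(BaseModel):
    category: str   # e.g. "billing", "bug", "feature_request"
    urgency: int    # 1 (low) to 5 (critical)
    summary: str

def triage_node(state: dict) -> dict:
    """Takes graph state, returns a state update: the LangGraph node contract."""
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=TicketTriage,  # Instructor validates (and retries) against this
        messages=[
            {"role": "system", "content": "Triage the user's support ticket."},
            {"role": "user", "content": state["ticket_text"]},
        ],
    )
    return {"triage": result.model_dump()}
```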
00:20:04:20 - 00:20:29:07
Unknown
For example, observability is taught on day one, and then we carry observability and evals throughout the eight weeks and teach how to evaluate each different system. Yeah, so I think fundamentals are really important. Now we are also seeing the flop of GPT-5; I think it is already a flop, right? It's not as great as promised.
00:20:29:09 - 00:20:55:00
Unknown
So I think this iterative improvement of LLMs is starting to happen, right? So AGI might not be so close. And I think the old things, the things which are two years old, are still very important in building agentic systems, like properly understanding how to context engineer. And by the way, context engineering, I think, is a very important topic,
00:20:55:00 - 00:21:28:17
Unknown
and very often overlooked, and then learning how to build agent ecosystems, because it's not as easy as it looks; when you're building demos, you're usually not doing any context engineering. So yeah, these fundamental things, I think, are very important. Are there particular gaps that you're noticing in the experiences or fundamental skills of folks who are either trying to grow their skills working with you, or people who are out in the market today?
00:21:28:19 - 00:21:59:06
Unknown
You mentioned observability and evaluation; obviously, we share the viewpoint that those are crucial and need to be day-one instrumentation pieces that continue throughout the entire lifecycle of your application or agent. But I'm curious if there are particular gaps that you're noticing out in the market today where people aren't really paying attention to fundamental skills. So it depends on which part you're referring to. Is it the bootcamp itself? Because the bootcamp is naturally not for the top, top engineers.
00:21:59:06 - 00:22:23:23
Unknown
I think I'm looking more broadly here: what are you seeing in the market as far as potential gaps? Definitely evals. There is still a gap, right? It's not an emerging topic, it's already an old topic, but not everyone is adopting the practice of full eval-driven development yet, even though it is crucial in building these systems.
00:22:24:01 - 00:22:53:00
Unknown
Then I think over-reliance on some orchestrators is bringing problems once the systems are starting to mature, because then you need to go back to base software engineering without using any wrappers. So people are taking too long to ship in some cases; the first versions are not being rolled out soon enough.
00:22:53:02 - 00:23:33:19
Unknown
The human feedback is not getting brought back into the system soon enough. And definitely business understanding: the people building those systems are not always very close to the business, and they want to build something shiny and adopt some cool tech, right, but not necessarily solve a business problem. And then projects and products start being deprioritized because they are not growing in business value. Teams are building agentic systems in the basement for five months,
00:23:33:19 - 00:23:58:02
Unknown
then they come out and the system doesn't solve a problem. Like we just said: you have to get out there faster and start getting that feedback, or else you're creating a risk point for yourself, because you could be building in a silo. Most companies don't succeed that way. I don't think so. I also think that we are still early in the MCP days, and sometimes MCP is being overused.
00:23:58:04 - 00:24:20:15
Unknown
It brings the most value when you have remote servers, right? But I don't think we are yet at the point where we can actually utilize remote MCP servers properly, at least without significant engineering. So sometimes just using tools within your code is also a good idea; you don't need MCP for everything.
00:24:20:17 - 00:24:59:06
Unknown
Yeah, you don't always have to be using the new hotness; it doesn't have to be shiny. Sometimes fundamentals are fundamentals for a reason. You mentioned context engineering as one point of emphasis in the market today, and I think there's been quite a bit of conversation around it. I mean, there are the folks who are saying, oh, prompt engineering is the way, and a lot of folks who treated that as everybody's short-term solve, and then vibe coding and context engineering; there have been all these terms thrown about and different approaches discussed as the new approach we should be taking.
00:24:59:08 - 00:25:34:23
Unknown
And I'm curious, from your perspective, when you talk about context engineering as important, and using that system-focused lens, what would be your advice to engineers who are maybe under-utilizing this or haven't explored it yet? Okay. So the first very important point is probably that where I was saying prompt engineering, I was always thinking context engineering from day one, because if you are building agent ecosystems, you cannot build agent ecosystems without context engineering.
00:25:35:01 - 00:26:02:10
Unknown
Right. So for agent builders, context engineering equals prompt engineering, because you need to store the actions somewhere and you need to compress the actions, because the context window is just exploding if you are building systems with multi-turn conversations. So I do have suggestions on what needs to be implemented while performing context engineering.
00:26:02:10 - 00:26:30:20
Unknown
If you are building a multi-turn agentic system, you will have felt this: the agent running for too long, the context window exploding to a few hundred thousand tokens per single run. And then you have, you know, maybe five runs, each with 200,000 tokens of input; then it takes 50 seconds to complete, and that's a lot of tokens.
00:26:31:02 - 00:27:09:08
Unknown
Yeah. So where to focus: probably the main ideas that I love in context engineering are the ability to compress the conversation history, and being able to discard unnecessary actions or store them in a so-called scratchpad, where you can later pick them up; maybe writing all of the state to files on disk, but then picking only what you really need for the specific nodes in your agent system.
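To ground the numbers above: five runs at 200,000 input tokens each is a million tokens per task, which is why compression and selective loading matter. Here is a minimal sketch of the two moves Aurimas names, summarizing older turns and keeping a scratchpad on disk; thresholds, file paths, and helper names are illustrative assumptions.

```python
# Hedged sketch of two context-engineering moves: compress old conversation
# turns into a summary, and keep a scratchpad file so each node loads only
# the state keys it needs. Thresholds and paths are illustrative.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
SCRATCHPAD = Path("scratchpad.json")

def compress_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Replace all but the last few turns with a one-message summary."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Summarize this conversation in under 200 words:\n"
                              + json.dumps(old)}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Earlier turns, summarized: {summary}"}] + recent

def write_scratchpad(key: str, value) -> None:
    """Park actions/state on disk instead of keeping them in the context window."""
    data = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    data[key] = value
    SCRATCHPAD.write_text(json.dumps(data))

def read_for_node(keys: list[str]) -> dict:
    """A node pulls only the keys it actually needs, not the whole state."""
    data = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    return {k: data[k] for k in keys if k in data}
```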
00:27:09:10 - 00:27:33:21
Unknown
Then, when it comes to tool usage, it's just the regular pattern: you cannot avoid adding those additional tokens inside of your prompt, because if you don't do that, your tool calls will start failing, and you will not be able to retrieve structured outputs correctly, right? Yeah. Tool optimization has been a big area of focus for us,
00:27:33:21 - 00:28:00:22
Unknown
I'll say, when we've been looking at agent reliability and observability: understanding how we can better suggest opportunities to improve tool usage within apps, because it's a super common problem. As you're alluding to here, one of the key places agents fall down is that an agent will just try to use the wrong tool over and over, or it will get stuck in this loop of trying to solve a problem without going back to first principles and thinking it through.
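As a concrete illustration of the plain-prompting alternative Aurimas described earlier, tool descriptions in the prompt rather than framework tool bindings, here is a hedged sketch; the tool names, arguments, and JSON reply shape are invented for the example.

```python
# Hedged sketch: explicit tool descriptions in the prompt instead of framework
# tool bindings. Tool names, args, and the JSON reply shape are invented.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = {
    "search_hotels": "Find hotels. Args: city, check_in, check_out.",
    "book_hotel": "Book a hotel by id. Args: hotel_id, check_in, check_out.",
}

def pick_tool(user_request: str) -> dict:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    prompt = (
        "You may call exactly one of these tools:\n"
        f"{catalog}\n\n"
        'Reply with JSON only: {"tool": "<name>", "args": {...}}\n\n'
        f"Request: {user_request}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # keeps the reply parseable
    )
    return json.loads(resp.choices[0].message.content)
```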
00:28:00:22 - 00:28:21:23
Unknown
And so, yeah, there are some really obvious failure patterns to address there. And what kind of low-hanging fruit do you have to suggest to builders? Well, I'll say check out galileo.ai. And maybe we'll contribute a lecture to your next bootcamp,
00:28:21:23 - 00:29:04:17
Unknown
actually. Happy to; I'm happy to chat more about that. That'd be fun. Of course, let's do it. Yeah, we've been doing stuff around tool optimization in the platform: basically looking to see if we can identify, from an agent graph or from a product with a bunch of traces, okay, what's consistently happening here? If we aggregate these different traces together, can we identify, and basically use our inference engine on the back end to suggest, fixes to agents, where we can say, oh, we're seeing this tool issue? Maybe, for example, your application is supposedly booking you a trip, and it
00:29:04:17 - 00:29:28:04
Unknown
is looking at, you know, Trivago and Expedia, and it's always trying to use Trivago first, even if it fails, instead of thinking of that as, you know, the secondary option to book a hotel. How do we change the weights on that? So we've been doing some of that through automation with other platforms, and through what we're calling our insights engine, where we're basically feeding the kind of evals dataset that people have established on the platform,
00:29:28:04 - 00:29:54:15
Unknown
so their metrics, the traces or logs, the annotations they may have, into our judge. And the judge is then suggesting things, and then you can provide human feedback on the suggestions. So it's working pretty well so far. I think there's a lot more potential there, honestly; we're just scratching the surface. I think our long-term goal would be something that creates an automated feedback loop, where it's just like, oh yes, let me go vibe code my app through this.
00:29:54:15 - 00:30:15:16
Unknown
Or vibe improve, I guess. So it's just, okay, great: here's the eval, here's the improvement, let me take a really quick look, cool, great, check. But that's the longer-term dream, I think. So this is almost like automated error analysis, right? Yeah, we're trying to automate root cause analysis for errors.
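A rough sketch of what such automated root cause analysis could look like: aggregate failing traces and ask an LLM judge for a suspected shared cause plus a human-reviewable suggestion. The prompt, trace format, and function names are assumptions, not Galileo's actual implementation.

```python
# Rough sketch (not Galileo's implementation): aggregate failing traces,
# ask an LLM judge for a shared root cause, and record a human verdict.
from openai import OpenAI

client = OpenAI()

def suggest_root_cause(failing_traces: list[str]) -> str:
    joined = "\n---\n".join(failing_traces[:20])  # cap the context size
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "These agent traces all failed their evals:\n"
                       f"{joined}\n\n"
                       "Name the most likely shared root cause and one concrete fix.",
        }],
    )
    return resp.choices[0].message.content

def record_verdict(suggestion: str, accepted: bool) -> dict:
    """Human-in-the-loop feedback on the suggestion, kept for the next round."""
    return {"suggestion": suggestion, "accepted": accepted}
```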
00:30:15:18 - 00:30:37:15
Unknown
And it's not 100% solved yet by any means, but we're starting to make some real strides with it. And you'll see us start to do it through MCP as well, where you can access these different catalogs of different error types and, you know, pull them in through MCP, to where it's like, oh, here,
00:30:37:15 - 00:31:04:17
Unknown
great, we went and ran the eval, here's a suggestion, awesome, let me approve it. But that's all very much on our beta-testing side of things right now. If I can, I would say that observability tooling in general is great, right? But it's not the hardest problem to solve. When you're building agent ecosystems, the hardest problem to solve is to actually create those eval datasets.
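Since building eval datasets comes up as the hardest part, here is one hedged way to bootstrap one from production traces: sample runs, keep input/output pairs, and leave slots for human labels. The trace and record schemas are illustrative assumptions.

```python
# Hedged sketch: seed an eval dataset from production traces. The trace and
# record schemas below are illustrative assumptions.
import json
import random

def seed_eval_dataset(traces: list[dict], n: int = 50) -> list[dict]:
    """Sample traces into records that a human reviewer then labels."""
    sampled = random.sample(traces, min(n, len(traces)))
    return [
        {
            "input": t["input"],
            "observed_output": t["output"],
            "expected_output": None,  # filled in during human review
            "label": None,            # e.g. "pass" / "fail" after review
        }
        for t in sampled
    ]

if __name__ == "__main__":
    with open("traces.jsonl") as f:
        traces = [json.loads(line) for line in f]
    with open("eval_seed.json", "w") as f:
        json.dump(seed_eval_dataset(traces), f, indent=2)
```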
00:31:04:23 - 00:31:39:16
Unknown
Yeah, really, really hard. And so, from what I hear, you're not only targeting the actual improvements to the system given the eval dataset, but also somehow clustering the traces themselves? Yeah, we've got a couple of different ways we're doing it. Part of it's through just new views: we have an aggregated graph view, for example, where you can look at multiple traces at once and kind of see, I wish I had a good example handy right now, but you can just see, okay, what paths did the agent take throughout this?
00:31:39:18 - 00:31:55:03
Unknown
How much are they overlapping? Where are there problems? And then we're trying to do that in a much more automated way, as you point out. So I'm really interested to see where it goes. It's the stuff that gets me excited about the platform. It's like, okay, observability as a base layer:
00:31:55:04 - 00:32:12:07
Unknown
get that right, okay, and then try to do evals really well, and then ideally that should feed an improvement mechanism. Right now, that improvement mechanism is engineers going in and kind of tuning things themselves. But okay, the more we can do to just make it really easy for them to go, oh yes, great,
00:32:12:07 - 00:32:42:21
Unknown
we see this, it's identified very quickly, we can go solve it very quickly. I think that's where this whole evaluation-informed space is going to really expand: into driving improvement. I agree completely; this is really the piece of the puzzle which is currently kind of missing, because way too many hours are going into figuring out the eval datasets, like 70% of our projects
00:32:42:23 - 00:33:07:01
Unknown
sometimes. Yeah, we're getting faster at it. And I think the fact that we have our Luna-2 small language models fueling some of our eval metrics helps. I mean, the challenge with those right now, and maybe not by the time the episode comes out, but for the moment, is that we have to fine-tune those models to get them really accurate. But they're much cheaper and faster than if we were using an LLM call across the board,
00:33:07:01 - 00:33:23:08
Unknown
so you can do this much more cheaply and much more effectively. And I'll tell you, though, and I may have to edit this out depending on which episode this comes out in: we are going to go live with letting anyone fine-tune their metrics using SLMs on the platform, which will hopefully make things go way faster.
00:33:23:10 - 00:33:41:22
Unknown
But this is all, again, at the edge of my thoughts; we're not quite done with it yet, and we're hopefully figuring it out. So that's the exciting stuff, that's the fun stuff. And what is your take on all of these OpenAI open source models coming out now, so you can actually pick one up and fine-tune it?
00:33:42:00 - 00:34:04:19
Unknown
I'm a big fan. Yeah, I'm very pro open source models, in part because I think if we don't open source most models, we're going to get to a point where a couple of companies are just going to monopolize in the long run, and I think that's really negative for the broader economic picture and the broader ecosystem of software.
00:34:04:21 - 00:34:40:17
Unknown
So I think the opportunity with smaller models to do more specialized tasks, and for fine-tuned open source models, is huge. And I'll say our Luna-2 models were originally based off of models that we took in, redid, and fine-tuned. So yeah, I think the future idea of having a model that is cheap to run, can run on your own hardware, and really enables people to have cheap, excellent inference that is fine-tuned to their tasks is very exciting to me, because, while I know it's not going to solve AGI, right,
00:34:40:17 - 00:35:04:08
Unknown
like, that would need much more reasoning and many more GPUs thrown at the problem, I think it can tactically solve a lot of problems, as long as it is fed initially by these broader frontier models that we're spending all this money to get right. But, I don't know, what's your take? So I believe that there's a need for fine-tuning, even in agent systems.
00:35:04:10 - 00:35:32:03
Unknown
We need to fine-tune for specific routes. But, no buts: I think that these open source models by OpenAI will be a great leap forward for all of this research. Yeah, I'm even thinking of getting my hands on one of those NVIDIA Sparks, maybe putting it on my table and playing around with open source myself. I'll also say we've been...
00:35:32:03 - 00:35:53:20
Unknown
Oh, sorry, go ahead. Yeah, no, what I was saying is: we did this; I don't know if you saw our research about our agent leaderboard. We did an original version of it back in February, and then we just did an update recently, basically looking at tool selection quality for different LLMs across a variety of agentic scenarios that were aligned towards the enterprise.
00:35:53:22 - 00:36:19:01
Unknown
So, okay, here's a finance scenario, here's banking, healthcare, insurance, investment, telecom, with the idea being, okay, let's try to actually identify: are these LLMs effective within customer support agents that have to be very specialized for these different areas? And so we looked at both tool selection quality and then action completion, with the idea being: did it actually complete what you wanted and solve your problem?
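For intuition, the two metrics described reduce to simple ratios over graded runs; this sketch assumes made-up field names, not the leaderboard's actual schema.

```python
# Hedged sketch of the two metrics as described: tool selection quality and
# action completion. Field names are made up, not the leaderboard's schema.
def tool_selection_quality(steps: list[dict]) -> float:
    """Share of graded steps where the chosen tool matched the expected one."""
    graded = [s for s in steps if "expected_tool" in s]
    if not graded:
        return 0.0
    hits = sum(1 for s in graded if s["chosen_tool"] == s["expected_tool"])
    return hits / len(graded)

def action_completion(runs: list[dict]) -> float:
    """Share of runs where the user's goal was judged achieved end to end."""
    return sum(1 for r in runs if r["goal_achieved"]) / len(runs) if runs else 0.0

# Example: a banking-support run that picked the right tool once out of twice.
steps = [
    {"chosen_tool": "lookup_balance", "expected_tool": "lookup_balance"},
    {"chosen_tool": "open_ticket", "expected_tool": "freeze_card"},
]
print(tool_selection_quality(steps))                  # 0.5
print(action_completion([{"goal_achieved": False}]))  # 0.0
```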
00:36:19:03 - 00:36:41:14
Unknown
And honestly, one of the most impressive models we looked at in the most recent round was Kimi K2, which came out a few weeks ago as we're recording this, from Moonshot. And yeah, Qwen 2.5, their 72B, and then Kimi K2 both did really well in our analysis. So I think there is a big opportunity for open source models to feed a lot of this.
00:36:41:14 - 00:37:01:04
Unknown
And I'll say, probably by the time this episode comes out, we'll definitely have added these new open source OpenAI models to it, because I think it's very exciting to see the ecosystem catching up. I know Gemini and Claude and GPT are gonna jump ahead again, but it's like, okay, let's make sure that we're not leaving the open models behind, so to speak.
00:37:01:06 - 00:37:18:21
Unknown
Well, I've taken us completely off track here as we talk about agents, but it's been a ton of fun. I do want to ask you a bit more about how you're thinking through the future of the space. Obviously, we've been talking about open source a bit; we've been talking about the fundamental skills an AI engineer needs.
00:37:18:23 - 00:37:43:04
Unknown
But honestly, one of the things that really made me want to have you on the show is how good the graphics are that you make on LinkedIn, and how you make these complex AI concepts accessible. And I'd love to understand, as you peek into the future and think about what's ahead: what are the big misunderstood ideas or emerging trends that you're excited to showcase to the community next?
00:37:43:04 - 00:38:18:18
Unknown
What are you thinking about? So there are a few areas that I think are underrepresented, especially in content creation. One of them, and maybe it is a step back, is data engineering for AI applications: connecting the data layer to the actual application layer. Because there are a lot of talks now about data engineering being kind of left behind, even though data engineers are doing most of the work to make these systems actually run,
00:38:18:20 - 00:38:43:17
Unknown
and then there is no supporting content for that. So I was thinking maybe I should actually step in that direction a bit, because talking just about agentic system designs, everyone is doing that today, right? It's becoming really, really boring. Yeah. Okay, well, let's dive in there a bit.
00:38:43:19 - 00:39:08:20
Unknown
I mean, personally, I'll say one of the things I've thought about with data engineering is that I almost feel like we are just recreating names for subtasks of what data engineering does with so much of what we've been saying about AI for the last year. What's your take on data engineering and what needs to be done to make sure it's getting the attention it deserves, and also that it's being effective?
00:39:08:22 - 00:39:30:11
Unknown
It has always, always been a problem. I was a data engineer for 4 or 5 years in my career, I think, also leading data engineering teams, and even back then machine learning was taking the center stage and data engineering never was. So it's either system design or machine learning, or now AI and AI engineering.
00:39:30:12 - 00:39:58:15
Unknown
Yeah. But now, I don't agree with people who are saying that AI engineering is just data engineering. That's not true. Data engineering is about piping the data to where it needs to live; AI engineering is about building those agentic system designs on top of the data that you have. So it's definitely not the same discipline. How to keep data engineering in the spotlight?
00:39:58:17 - 00:40:20:15
Unknown
To be honest, I don't know. This is a long-running problem that we are facing; the whole industry isn't fighting for data engineering enough. Maybe education, just general education; maybe running bootcamps about it, because people are interested in data engineering. For some reason it's simply not taking the spotlight, because it will never be hot.
00:40:20:15 - 00:40:43:16
Unknown
Unfortunately, data engineering is not the money machine that we are looking for. That does leave it a little out of the spotlight, it's true. You've got to find that money machine to really get the attention you deserve. Because the data engineers are seen as costs; we are not really producing revenue, in a sense.
00:40:43:18 - 00:41:02:06
Unknown
Okay, so what do you do? More clearly show, I guess, more graphs showing the money saved? That might help maybe. But yeah, money saved is also not hot; money made is hot right now. Is money made even hot today? Looking at the burn rate of some of these companies, it's like, well, yeah.
00:41:02:08 - 00:41:30:17
Unknown
All right. What other predictions do you have about, let's call it, the next six months? I don't want to make you think too far out, because it starts to get really blurry at that point. But as you think through this massive wave of agent conversation people are having, and, I would argue, some of the overhype that's happening around agents: I'll say personally, I'm kind of with Karpathy on this idea that, yeah, this is the year of agents, but it's also going to be a decade of agents; we're not solving this tomorrow.
00:41:30:19 - 00:41:50:13
Unknown
What are the things you're thinking about, though, as we head towards this next stage of AI development? So, six months. A few months ago, I would have said that six months is really a very, very short amount of time, and still a lot of things could happen. But now we are seeing the slowdown in the improvement of LLMs.
00:41:50:13 - 00:42:23:18
Unknown
So I think less and less stuff will be happening in this amount of time as we move forward. So in general, what we will not see, definitely, is any big leaps in AGI. And I think we will not see distributed multi-agent systems in production yet, even though everyone is talking about A2A and how it will change the world because companies will start exposing agents as services.
00:42:23:20 - 00:42:57:21
Unknown
Not in six months; I don't believe in that. It's too hard to build multi-agent systems, for various reasons, one of them being just regular observability: it's really hard to instrument a distributed system, especially when there are long-running agentic systems behind those distributed services. So what I'm really looking forward to is coding agents, CLI coding agents, improving, because they are already quite good.
00:42:58:00 - 00:43:29:03
Unknown
And I mean the agents that do not require writing any line of code. So, like Claude Code? Yeah, exactly. They are already quite good, so I'm really looking forward to how this develops. And I had a chat with a very brilliant engineer a few days ago about this idea of writing specifications and allowing your agents to write your code, really making the microservices idea come to life.
00:43:29:03 - 00:43:51:10
Unknown
The code is useless, right? You can throw it away, and you can just rebuild your entire service with the next iteration. So I'm looking forward to the next iteration of these coding agents, because I think there's something here, and it will definitely change software engineering as it is. But the industry, I think, will now start moving slower,
00:43:51:10 - 00:44:12:18
Unknown
so six months is not that long a time frame. Do you think we need to basically take a new leap in scaling, as in massively more energy and tons more GPUs, to take the next step? Or do you think this is an inherent challenge with AI hardware today, and there's a need for a non-transistor-based architecture or a new architecture?
00:44:12:23 - 00:44:36:21
Unknown
What's your take on what's going to get us past this wall, maybe a little bit of a wall, we're hitting? So you need to take one of two sides, right? Either you believe that LLMs will allow us to push forward through this AGI barrier that we are facing, or you take the side that you will need a different kind of architecture on the model side, on which AGI will be based.
00:44:36:23 - 00:44:59:08
Unknown
I'm rather on the second one. I think LLMs will not bring us to AGI, so it might not even be a hardware problem; it might be the actual model problem, a model architecture problem. Does it mean that this new breakthrough model, if we find it, will require as much compute as we are currently building for?
00:44:59:08 - 00:45:24:07
Unknown
I don't know, but I think that we will need this kind of computing firepower for inference anyway, even with the kinds of models that we currently have. That's a good question. I think I agree. I would name three challenges. One, as I mentioned, is energy.
00:45:24:07 - 00:45:46:12
Unknown
I think we're going to need simply a lot more chips, and I think we're going to hit a wall in the next year or two where we just realize, hey, we need more nuclear reactors, we need more energy sources here. We simply can't build the number of data centers we want, and have the type of power grid we want, with our current setup.
00:45:46:12 - 00:46:08:17
Unknown
So I think there's a massive investment needed there. We're starting to see that with, for example, Microsoft investing in reopening the recently shuttered Three Mile Island reactor. We're seeing folks talking about small nuclear, people talking about investing in natural gas in different areas, and solar and green energy being brought up. But I think that's a limiting factor we have to consider,
00:46:08:17 - 00:46:24:08
Unknown
and one that, outside of the hyperscalers, I don't see talked about as much as maybe it should be. Maybe it's because it's not really our problem if we're not a hyperscaler, but, as folks in data science, we should be cognizant of it. And that obviously aligns with the secondary problem: yeah, we are still going to need more inference.
00:46:24:08 - 00:46:45:17
Unknown
We are still going to need more chips; we're going to need more data centers. And I definitely think that's an inherent challenge today with LLMs. But I've always been a skeptic about this idea of getting to AGI just by brute-forcing with LLMs, and I look forward to being proven wrong.
00:46:45:17 - 00:47:05:14
Unknown
Well, we'll see, but I agree with you: I think we're going to need to fundamentally make some change to the architecture. The current LLM movement is incredibly useful, LLMs are very useful, and we've made massive advances. But if we look at the basics of it, we're not truly creating thinking machines
00:47:05:16 - 00:47:33:14
Unknown
in the way that has been hyped at times. We're creating machines that are doing fantastic things for prediction, that have incredible memories and datasets and do unique things, but mostly they're predicting what should go next versus, I think, creating something new. And so it just feels to me like making a true breakthrough here, a fundamentally different paradigm,
00:47:33:16 - 00:47:54:06
Unknown
will take just that: a new way of exploring the problem. We're going to create a lot of business value, we're going to create really interesting systems out of this, and we can solve a lot of problems, but I don't know that we're going to truly redefine how thinking works. And that, to me, feels like a different step.
00:47:54:06 - 00:48:14:01
Unknown
So I guess it's also a question of how you define AGI, right? Like, do I think an LLM, I don't know, GPT-6 or Gemini 6 or whatever, could become a strong enough knowledge worker that it can just solve most business problems that are kind of inherent today? Sure, I think that's possible with the current architecture.
00:48:14:03 - 00:48:38:11
Unknown
And I think that's a huge amount of value and worth shooting for. But I don't think it's going to create a new model of physics, I guess, is how I'd put it. Yeah. There's one more area that I think could work but forgot to mention: self-improving agents. Agents still kind of seem like they are the way to go,
00:48:38:11 - 00:49:05:06
Unknown
at least short term, with the ones that we do have right now. How do you make the agent rewrite its own code? I think this is where we will also need a lot of inference, and this is where a lot of the nuclear reactors should be going. But how do we make an agent so that it can write new agents, spawn new agents, like evolutionary algorithms do, right?
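As a toy illustration of the evolutionary idea, not a working self-improving system: score a population of agent prompts, keep the best, and spawn mutated children. The evaluate and mutate callables are stand-ins a real system would back with an eval harness and an LLM rewrite step.

```python
# Toy sketch of the evolutionary idea: select the best agent variants and
# spawn mutated children. evaluate() and mutate() are stand-ins a real
# system would back with an eval harness and an LLM rewrite step.
import random

def evolve(population: list[str], evaluate, mutate,
           generations: int = 5, survivors: int = 2) -> str:
    """population holds agent system prompts; higher evaluate() is better."""
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[:survivors]                 # selection
        children = [mutate(random.choice(parents))   # variation
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return max(population, key=evaluate)

# Toy demo: "evaluate" favors shorter prompts; "mutation" appends an instruction.
best = evolve(["You are a helpful agent."] * 4,
              evaluate=lambda p: -len(p),
              mutate=lambda p: p + " Be concise.")
```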
00:49:05:09 - 00:49:26:16
Unknown
Yeah, that's one potential. But I think this will be an area of active research in the next few years, two years maybe; not six months, a little bit later. But I'm definitely looking forward to this one. Yeah. I think it's really easy, because there are so many exciting things happening in the space, particularly in recent years, to expect that, oh, we're going to solve this immediately,
00:49:26:21 - 00:49:47:00
Unknown
it's going to be solved tomorrow. And in some cases, I've been surprised by the problems that are being solved. But agent swarms, and again, maybe I'll be proven wrong on this by the time this episode comes out: it feels like truly having self-improving, self-growing agent swarms that are very successful at solving an actual enterprise's problems, not just doing a cool demo,
00:49:47:02 - 00:50:00:12
Unknown
is going to take a bit of time still to get right, because of these inherent challenges we talked about. Yes, we can vibe our way to a demo on this; we can quickly, you know, pull something together. But actually solving a business problem in a way that makes money is a different challenge.
00:50:00:12 - 00:50:32:10
Unknown
So, I just want to say thank you so much for joining me today. It's been such a fun conversation, and I've really appreciated you bringing us through your thought process and talking about some of the things you're thinking about. Where can listeners go to follow you and learn more about all the great stuff you're creating and thinking about? So, you can find me on LinkedIn, you can find me on X, and you can also find my newsletter, which is at newsletter.swirlai.com.
00:50:32:12 - 00:50:57:18
Unknown
And very soon I will also be starting to post YouTube videos, so you can also start checking my YouTube channel, which is still empty; it's been there for more than a year and, I think, it's still empty. But in the upcoming month, for sure, there will be some fresh videos coming out. Fantastic! Well, I'm excited to watch those, and I highly recommend to all of our listeners:
00:50:57:18 - 00:51:18:22
Unknown
definitely follow Aurimas on whatever platforms you're active on; his work is deeply valuable, and his thought process is, I think, excellent. And while you're at it, make sure you're subscribed to the Chain of Thought podcast on whatever platform you listen on, whether that's LinkedIn, where you can follow Galileo or Chain of Thought, or Spotify, or Apple Podcasts. If you're listening there, we love a rating
00:51:18:22 - 00:51:33:18
Unknown
and a review; it makes a huge difference. If you're on YouTube right now, you can see our nice smiling faces as we discuss the future of AGI and everything else. I just want to say thank you so much, everyone, for listening, and thank you so much for joining us today. It's been an absolute pleasure.
00:51:33:20 - 00:51:46:16
Unknown
Thank you for having me.