"In the next three to five years, every piece of software that is built on this planet will have some sort of AI baked into it." - Atin Sanyal
Chain of Thought is back for its second season, and this episode dives headfirst into the possibilities AI holds for 2025 and beyond. Join Conor Bronsdon as he chats with Galileo co-founders Yash Sheth (COO) and Atindriyo Sanyal (CTO) about major trends to look for this year: AI finding its product "tool stack" fit, generation latency decreasing, AI agents and their potential to revolutionize code generation and other industries, and the crucial role of robust evaluation tools in ensuring the responsible and effective deployment of these agents.
Yash and Atin also highlight Galileo's focus on building trust and security in AI applications through scalable evaluation intelligence. They emphasize the importance of quantifying application behavior, enforcing metrics in production, and adapting to the evolving needs of AI development.
Finally, they discuss Galileo's vision for the future and their active pursuit of partnerships in 2025 to contribute to a more reliable and trustworthy AI ecosystem.
Chapters:
00:00 AI Trends and Predictions for 2025
02:55 Advancements in LLMs and Code Generation
05:16 Challenges and Opportunities in AI Development
10:40 Evaluating AI Agents and Applications
16:07 Building Evaluation Intelligence
23:41 Research Opportunities
29:50 Advice for Leveraging AI in 2025
32:00 Closing Remarks
Show Notes:
AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.
Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.
Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.
[00:00:00]
Introduction to AI in Software Development
Atindriyo Sanyal: In the next three to five years, every piece of software that is built on this planet will have some sort of AI baked into it. It's a new era of software development, right? Like, these are new paradigms, new ways to build software. And if you turn the clock back to the 1980s, to how we used to build traditional software, we're kind of there with the new era of AI software. There's no eval tooling, there's no machinery around, you know, building a piece of software in a robust way. People are using caveman tools, you know, just looking at vibe checks and eyeballing outputs. That's how we used to do, you know, software development in the earliest days, when software became a thing. So we're kind of back to square zero, in a way.
Welcome to Season Two of Chain of Thought
Conor Bronsdon: It's a new year, and we are back with a new season of the [00:01:00] Chain of Thought podcast. Welcome to season two of Chain of Thought. I'm Conor Bronsdon, head of developer awareness at Galileo, and I'm delighted to be joined once again by my fellow hosts and the co-founders of Galileo.
We've got two out of three here today.
Yash Sheth, COO, and Atindriyo Sanyal, CTO. Atin, Yash, thanks so much for joining me today.
Yash Sheth: Excited for the new year. Let's do it.
Conor Bronsdon: Yeah, it's great to be here in 2025.
AI Trends and Predictions for 2025
Conor Bronsdon: I think there's so much excitement happening in AI coming out of this incredible growth year we saw in 2024. What do you expect the theme for 2025 to be with AI?
Automation and AI in Software
Yash Sheth: If one word comes to mind, it's automation. So far, we have been really leveraging this technology to answer questions, to make technology more conversational, and while that has been a great start to adopting language models as part of software, the real ROI is going to come [00:02:00] from leveraging this technology to automate so many workflows out there across industries.
Conor Bronsdon: Atin, what's your take on that?
Atindriyo Sanyal: Yeah, I think AI will find two kinds of fit in 2025. One is it will start getting towards product-market fit, where we'll start seeing some user benefits and, you know, these LLM systems will start reaping some business results. But it will also start seeing product tool stack fit. What that means is we spent a lot of 2024, you know, putting early prototypes into production, but the tool stack was still evolving and still immature.
A lot of libraries come in and go out. We'll start seeing a lot of the engineering and systems around the LLM starting to mature, which will lead to building more practical and better systems.
Conor Bronsdon: Yeah.
Advancements in LLMs and Code Generation
Conor Bronsdon: Is there a particular advancement that you're anticipating, that you're thinking, [00:03:00] hey, we're seeing the early signs of this, and this is going to happen later this year?
Atindriyo Sanyal: I think, from my perspective, a lot of focus will be on getting LLMs to actually do work and take action, as opposed to just showing some generation that looks smart, which is where agents come in. A lot of the foundational models are also focusing on multi-modality, so that's the other thing that'll sort of pick up. And it'll be interesting to see a hybrid system where, you know, the LLMs are taken to the next level: they start actually achieving goals for users, not only taking language as input, but also images and audio.
Yash Sheth: Just one more thing I'd like to add there: the ability for LLMs to generate high-quality outputs in a lower token count, basically, you know, being able to generate things faster, is going to improve drastically. The Gemini 2.0 Flash model is just a quick example that came out not too long [00:04:00] ago, last year.
And, you know, we're going to see a trend in that. The biggest thing, as I mentioned earlier, and to Atin's point as well: if true automation is going to be unlocked, and LLMs are going to be able to make, you know, API calls, execute code and process large amounts of data, multi-modal, images, audio, etc., then generation latency has to come down, and we're already seeing that trend happening.
Atindriyo Sanyal: Totally. One more additional point to the automation bit, since Yash mentioned code generation and code-related use cases: one thing I'm personally very excited about is taking code generation beyond just generating boilerplate template code, which is kind of the low-hanging fruit. I know we got the statistic recently that 25 percent of Google's code is now generated by AI. A lot of that, if you put a magnifying glass to it, you'll see is stuff that they would have had to write anyway, but code that's essentially boilerplate. But we'll take steps towards [00:05:00] better code understanding and, you know, more nuanced generated code, which is not only, you know, a couple of for loops and basic things, but code that really understands the context around it. So I'm very excited to see, you know, progress on that front.
Challenges and Opportunities in AI Development
Conor Bronsdon: Do you think you're going to be ready to implement that autonomous AI dev agent this year, Atin? Or not quite yet?
Atindriyo Sanyal: I'm excited to. I do think a lot of this was kind of, you know, we had prototypes released in 2024, which many people sort of criticized and thought, oh, these are just toys. But we'll truly see some very incredible advancements towards that. The most exciting thing, I think, is there'll still be a human in the loop, or a developer in the loop, needed, because writing the code is just probably 10 percent of a software engineer's job.
There's design and a lot of other things. So it'll free the developer up from writing code. And at some point we'll get fully [00:06:00] automated code writing, kind of like autonomous driving. And then you can just focus on, you know, connecting boxes with arrows and building awesome systems.
Conor Bronsdon: At the same time, though, as someone who enjoys writing code occasionally, even if I'd personally be the worst one on this call at actually doing it, I don't know that I want to fully free up my code gen. I want to spend my time, as you point out, on the kind of higher-level, more strategic pieces. It's that boilerplate that I'm really excited to get rid of, where it's like, great, let me get to stage one here more rapidly.
Let me make my update from one framework to another more rapidly. And I'm curious if there are particular use cases, whether it's something with software developers or something else, that you're hearing about from customers: what they want to see from AI, whether it's this year or in the coming future.
Yash Sheth: In terms of particular use cases, it really depends on the industry and the vertical. Like, you know, what I'm hearing, right from the inception of the generative [00:07:00] AI landscape here, is: can we use this technology to convert COBOL and Fortran code in financial services, running on mainframes, to something more performant, maybe Rust or, you know, even Python for that matter? I know there's a lot of effort being put behind that. It's just that, you know, I'm not seeing a lot of financial institutions really talking about it, because it's a sensitive topic.
You don't want to change the world's transaction capabilities overnight. But, you know, I'd love to see some amazing progress on that front this year, because that's truly going to be transformational for the world's economy.
Atindriyo Sanyal: And I fully agree. It's very interesting, the point Yash mentions about translating code from, you know, old-school legacy systems, but also some of the modern software that we've built, potentially in programming languages which were chosen for reasons other than performance. And you end up building so much tech [00:08:00] debt, tech debt that is like a cancer in every organization you go to. And it leads to slow systems. So it'll be very interesting if we can totally automate code translation into languages like C++, which are just natively 1000x faster than some of the more application-layer languages.
Conor Bronsdon: I absolutely think that's one of the biggest current opportunities. It's one of the big things Amazon did in 2024 and trumpeted during their, I believe, Q3 earnings: hey, you know, our Q software generation tool allowed us to upgrade from, I think it was, Java 8 to Java 17, a massive savings across the board.
I had a conversation with LinkedIn's VP of engineering, Aarathi Vidyasagar, a fantastic leader on this front who thinks very deeply about developer experience in particular. And she thought about it from the perspective of what they're trying to do at LinkedIn and said, okay, we don't want our devs to spend their time having to do this major translation.
That's not exciting work. Let's [00:09:00] free up their time for new features. Let's free up their time for more exciting parts of their role, and kind of help with that translation layer of, great, let's get into C, or whatever it is we want to upgrade to. I totally think you're both spot on there: this is a major opportunity, both right now with all the copilots of the world, but also with agents here pretty shortly, if not already.
Yash Sheth: Yeah. I mean, speaking of agents, right, I think until now there's been a lot of focus around code and code capabilities for these language models. And the big reason is that even if you look at the biggest, you know, highest-valued companies, a lot of their OPEX goes into developer costs. Like, you know, developers are some of the most expensive resources, and that's where a lot of spend goes.
A lot of innovation happens there. So if you can free up time, there's massive ROI to be unlocked. And if you look industry-wise, there are tens of thousands of people processing transactions and documents in [00:10:00] various verticals, whether it's financial services again, or even, you know, telecom, or regulatory, or defense. There's so much manual inspection happening.
And the reason why these things haven't come up yet is because agents are just being productionized this year. That's where true automation will help all of these folks get upskilled, so they're not doing the manual, grungy tasks of inspection, and can almost develop on top of this technology: okay, this automation is already helping me review all of this. What can we build more on top of that to help the end user?
Evaluating AI Agents and Applications
Atindriyo Sanyal: Just to add to that point, I think this is where it also underscores the need for better evaluation tooling for these kinds of LLM applications. Number one, these systems are more complicated, with agents and various other components in the mix; it's not just you querying [00:11:00] an LLM. And number two, you're not just getting a textual output as a generation; you're actually executing an action. So, you know, the penalty you pay for the right action versus the wrong action is potentially much higher than for, say, a simple generation, which is why you need robust evaluation tooling that can be achieved at scale, and at an efficient cost as well.
Conor Bronsdon: Yeah. Atin, I'd love for you to unpack a bit more how you're thinking through the approach to evaluating agents in particular, as we see that as such a theme here in 2025: the rise of agents, the opportunity for agents. How do you think businesses should be thinking through agentic evaluations and the different pieces of that process?
Atindriyo Sanyal: I'd love to hear from Yash as well, but from my perspective, there are a few different components to [00:12:00] this. Number one is the accuracy of the metrics that help you truly evaluate the task at hand. And second is customizability, because no two tasks are exactly the same, such that one metric can, you know, be faithful to them at a hundred percent accuracy. It's like one size fits some. So that's where the platform comes in; rather, it's like inversion of control, giving the developer the power to use these ingredients, but build custom evaluations on top of them, which can adapt to your use case. Which brings me to my third point, which is adaptability. Once you productionize these applications, it's similar to machine learning in general: we would productionize models, the data would change over time, the models would meet new data, there would be drift, and you'd want to address the drift by taking action. Similar here, right? As users are using [00:13:00] the system, data is changing. The patterns of usage are changing. So your metrics also need to evolve over time; they might lose their accuracy otherwise. So all this needs to be baked into an evaluation platform. But Yash, I would love to hear your take as well.
Yash Sheth: I've always stated that when we're productionizing AI, the rigor in AI has gone from curating the best datasets and fine-tuning the most accurate model, to using out-of-the-box models that are amazing and really spending that time to create the best set of metrics. And this is beyond guardrails, because, you know, when we think about guardrails, it's typically these prompts or these instructions that we can set in the system prompt for the model, but the model may choose to disregard those guardrails at some point, right?
So it's very important to measure it. If we can do an amazing job at quantifying the behavior of your application through metrics, [00:14:00] that's what frees us up to scale that application in production. Now, why is that even more important for agents? Because with these API calls, the tool calling, the code execution, there are going to be irreversible changes that agents make out there.
And for that, being able to run these metrics that quantify what good behavior looks like, and enforce that in real time in an agentic flow, is going to be absolutely critical. We have what we call Galileo Protect, and essentially what that is, is a control plane for your generative AI application.
You may call it a firewall for some security threats, but it's more so a control plane where, as soon as you detect bad behavior, you can immediately create a metric to detect it and prevent your application from that bad behavior within seconds. That's going to be absolutely critical for every agentic application out there.
Conor Bronsdon: And as you both point out, this is an area where [00:15:00] Galileo is putting a lot of time in. You know, we're looking at not only the step level (hey, is this agent making the right tool selections? Is it doing the correct actions?), but also the turn level: is it performing these actions in the right order? And then also, is the final result accurate? And we can get more granular around that; I'm sure we'll have a broader discussion there. But I would also encourage folks who are interested in thinking through these kinds of agentic frameworks to check out our recent episode, our last one of season one, with Vinnie from Twilio, where he goes in depth on how Twilio is thinking through their platform for their customers to build AI agents, and how Galileo is enabling them with evaluations and observability every step of the way as they build those agents.
We're really glad to have them as a partner and very excited to continue to grow that relationship.
Galileo's Role in AI Advancements
Conor Bronsdon: And as we think ahead to the rest of 2025, I'd love to give the audience some context on some of the other more forward-thinking initiatives [00:16:00] we are working on behind the scenes, if not already starting to bring on the scene. Atin, Yash, I'd love to hear from you both. Maybe Yash, if you want to start: what do you view as the top priorities for Galileo and for AI evaluations in 2025?
Yash Sheth: Galileo is squarely focused on building evaluation intelligence for the trust layer in the generative AI stack. What does that mean? To build evaluation intelligence, we have to help our users firstly solve that measurement problem: how do we quantify your application's behavior into metrics? And being able to do that quickly, within minutes, and accurately, is the first thing that's important. The second most important component of evaluation intelligence is, you know, what use are these metrics if they're just offline? Like, you know, we need to be able to scale these metrics and enforce them in production at scale.
Because again, with agentic evaluation, it's not just people talking to a chatbot. There are going to be many [00:17:00] transactions that happen: maybe for every data entry in a table, or every code file that is updated, there is an agentic flow that gets kicked off. So when we think about these applications and enforcing metrics at scale, those are going to be the top two priorities.
And, you know, I'd love to have Atin also talk more in that direction.
Atindriyo Sanyal: Yeah. I mean, here's a hot take from me. I think in the next three to five years, every piece of software that is built on this planet will have some sort of AI baked into it. And it's kind of like a new era of software development, right? Like, these are new paradigms, new ways to build software. And if you turn the clock back to the 1980s, to how we used to build traditional software, we're kind of there with the new era of AI software. There's no eval tooling, there's no machinery around, you know, building a piece of [00:18:00] software in a robust way. People are using caveman tools, you know, just looking at vibe checks and eyeballing outputs.
That's how we used to do, you know, software development in the earliest days, when software became a thing. So we're kind of back to square zero, in a way. That's why evaluation is super critical. It took us many decades to get to the level of sophistication in the tooling around traditional software that has allowed us to literally change the world; 90 percent of the world today has used some sort of software in their life. And to get to that with AI, I think the time frame will be much shorter, because we are just more advanced as a species. But in the next 5 to 10 years, we'll certainly see a revolution in the engineering and the machinery around these LLMs, and there'll be progress on both fronts: you'll have better software, but also better models.
And that combination will be [00:19:00] very exciting to see, with all the possibilities.
Conor Bronsdon: That is a spicy hot take, and I think it'll take longer. I think there's still a lot of room for deterministic systems for at least the next several years, but I'm excited to see if you get proven right here. I will owe you a meal if you are; you said five years was your top end.
So let's keep an eye on it, because we'll check back here on, what will that be, season seven for us? And we'll have a review of the different hot takes we've had at that point.
So as you think about this advancement of AI and how fast the space is moving, as you pointed out, it is arguably iterating faster than any prior technological advancement, because in and of itself, we are building self-learning models that are helping speed development in a lot of ways. How is Galileo going to contribute to the advancement of AI this year and beyond?
Atindriyo Sanyal: My perspective is that we think of not just evaluations as a necessity, but [00:20:00] scalable evaluations. So we're tackling the problem not just from the perspective of, hey, we need to give the user something that is accurate and actionable, but also: how do we do it at scale and allow the user to take their application literally to the world? They get a customized suite of evaluation methodologies that scales with their data, that scales with their application. So I think the focus for us is on two fronts. One is, of course, we have a scalable platform that allows a user to use a lot of the metrics, and to create and customize metrics on Galileo. But we also work on baseline metrics that we offer in the product dirt cheap. For example, we have the only hallucination detection model and algorithm that literally works at zero dollars, and there's no one in the industry that does that. A lot of research and a lot of [00:21:00] brainpower has gone into building methods like that. So we'll continue to push the envelope on both: you know, research into newer, more advanced, cheaper ways of achieving high-accuracy evals at scale, but also offering a world-class platform that scales with the user and allows them to design test cases and security measures that adapt to their application. Like Yash talked about, AI security is such a critical thing. I think the software engineering, the SDK and API layers that we offer, become very critical, but also latencies, right? Like, we're the only ones who offer a very low latency, both P50 and P95, that actually works for applications beyond a certain throughput. But I'll pass the baton to Yash to talk more about that and also get his take.
Yash Sheth: Yeah, absolutely. I think, in terms of advancing AI, right, again, I go back to the fundamentals: [00:22:00] advancing AI means increasing AI adoption in software across the board. Now, as AI replaces parts of software, or even augments software (as of today, we're already seeing a lot of that), having a trust layer that can speed up the adoption and automation of AI-powered software is going to be critical. That includes CI/CD, monitoring and the firewall, you know, the typical trust layer of the software stack. That's changing massively, and we see how every single hyperscaler out there, and model provider, is making evaluations front and center.
That's because, you know, that's super, super important to adoption. Now, to Atin's point about scaling: today, a lot of the evaluation is happening via LLMs, because there aren't any, you know, stochastic or statistical metrics that can evaluate these applications, right? And we're all aware of how LLMs are being [00:23:00] used in this space.
Now, how can we push the state of the art to a point where things can actually scale to millions and millions of requests, to thousands of QPS of traffic, at a cost point, a price point, that can scale as well? No one wants to double their OpenAI bills, and no one wants to have an evaluation system that takes 10 seconds to evaluate one prompt.
Conor Bronsdon: I think you're spot on there. We would all like to decrease those bills if we had our druthers. And I also think it speaks to the value of the proprietary research Galileo has done around our ChainPoll and Luna methodologies. Are there opportunities that you see on the research front to continue to further that unique proprietary advantage Galileo has been leveraging, and hopefully help the entire industry do these evaluations across the board, but in a cheaper and more scalable way?
Atindriyo Sanyal: We've been [00:24:00] working on two aspects of that research. One is building better, higher-accuracy foundational models that actually measure things like RAG hallucinations and task completions and all the things that people care about, at a reasonably high baseline accuracy that's respectable. But then beyond that, making them fine-tunable. And to the point of fine-tuning, that's the second aspect of our work, which we term Luna flow. It's essentially a framework, almost like a metrics authoring slash fine-tuning system, that caters to anyone who has a loose definition of what they want to evaluate. They can literally start with a natural-language text definition, and from there, we've built the proprietary tech to be able to create a high-accuracy metric that adapts to their data and takes feedback. There's RLHF happening behind the scenes, but there's also an AutoML layer [00:25:00] in the loop, which fine-tunes to the data. So you can bring your data to the table, upload it to the Galileo system, and magically the metric accuracy improves over time. So that's a lot of the machinery our engineering team works on, and then doing that at scale. Those are the two or three main areas of focus for us.
Conor Bronsdon: Yash, is there a particular aspect of Galileo's research around Luna or ChainPoll that you think maybe we haven't talked about enough and should spend more time on?
Yash Sheth: Oh, absolutely. I think one of the most amazing recent launches is around how quickly we can adapt our out-of-the-box metrics to the use case itself: the whole continuous learning flow of identifying not only what needs to be measured, but also understanding your data and how we can tweak the metrics to best represent your task.
Another [00:26:00] piece of research that Galileo has developed over the years is our capability of measuring semantic drift in the traffic. How can that help curate the best datasets? How can that help in identifying skews in our traffic? Because, as you know, when we think about good observability in production, these applications that we're building are sometimes so broad that users can use them in varied ways over time.
If we can capture that meaningfully and give users strong workflows, very, very strong workflows (and, you know, Atin talked about the Luna flow; that is basically a workflow to keep their metrics layer and their datasets up to date and most representative), then teams will feel most confident in delivering those applications in production.
Conor Bronsdon: Do either of you see opportunities for [00:27:00] us to partner with other players in the AI space this year to increase that research opportunity, or that technology opportunity?
Yash Sheth: Absolutely. I mean, we've shown some early partnerships through our Series B announcement last year with, you know, Databricks and ServiceNow, and we have the partnerships with, you know, the cloud providers, obviously. But I think on the technology front, there's a lot happening behind the scenes where we work with the vector DB providers, we work with the model providers, and we are essentially working on building a model- and technology-agnostic system.
Our focus is to help the application developers. By partnering with these technology providers, developers can integrate our systems into an end-to-end flow that can help them leverage these models at a higher scale. Today, a lot of the POCs are stuck in [00:28:00] the POC phase, not able to go into production, because of the missing trust layer.
And that's where we're calling on all partners to actually work with us to embed this trust layer in the stack, so we can jointly help developers unlock more value and scale: from the model perspective, from the vector DB perspective, from the agentic framework perspective.
Conor Bronsdon: To your point, that's the same kind of work we've already done with Databricks and Google Cloud and others, and there's definitely an opportunity to continue to expand that and have Galileo continue to form this trust layer across the ecosystem. Atin, how about yourself? Are there any other particular collaborations that you foresee or want to pursue?
Atindriyo Sanyal: I mean, anything that gets us closer to the user is a valuable collaboration, in my opinion. Like, one of the issues has been that the backbone of a lot of even the industrial research is academic [00:29:00] benchmarks, and, you know, not to throw shade at academia at all. I mean, they're doing fantastic work.
In fact, a lot of this revolution comes from universities and academia. But what's lacking is industrial benchmarks, and standards which are more practical for industry use cases. So any kind of partnership that helps us get closer to developers, or acts as a channel to thousands of developers, so that we can connect the dots and sort of navigate an ever-changing toolscape and ecosystem: I think that will really help us, and it'll be great for Galileo to build holistic evaluations that, you know, work for many use cases, many users.
Conor Bronsdon: Excellent. Well, guys, I've very much enjoyed having the chance to connect with you here to kick off 2025.
Advice for Leveraging AI in 2025
Conor Bronsdon: If you could close with one piece of advice to businesses and engineers who are looking to leverage AI in 2025, what would that piece of advice be? Yash, if you [00:30:00] want to start.
Yash Sheth: I think the one piece of advice, and I'll kind of harp on my point of establishing rigor in the workflows: when we're adopting AI in our applications, it's very easy to build a cool POC and start to launch it out there, but not having that rigor is the big mistake that most people make.
So however you want to implement it, one big piece of advice would be: start quantifying the behavior of your application into metrics early on, because that is going to be essential as you scale these applications.
Conor Bronsdon: How about you, Atin?
Atindriyo Sanyal: Get Galileo for your LLM evaluation needs, really. I mean, that's my advice. But anyway, on a serious note, I truly second Yash's point: the rigor has been completely and grossly missing. And that has been kind of the bane of the developer's experience, where they have this magic ball in their hand which can do so much.
The possibilities are [00:31:00] endless, and it's so easy to get to a prototype so quickly. And then everything falls apart in shambles, because you don't have a robust evaluation framework. And there are so many issues to cite, right? Like the discrepancy between the data that you test with versus the real-world data that hits your application, or the cost and the scale of your evaluation methodologies, including manual eyeballing: there's no dollar value to it, but the amount of time it takes is just impractical and untenable. So robust evaluation is sort of the unlocking power. It's kind of like going back 10 years, right? The one big thing missing in massive cloud adoption was security, and once there were security solutions, it was just free flow; cloud became a universal thing. I think evaluations are that same thing for AI.
Conor Bronsdon: [00:32:00] Fantastic.
Closing Remarks and Listener Engagement
Conor Bronsdon: Well, thank you both so much for joining me to kick off season two of Chain of Thought. And for our listeners, we are so excited to be back this year with you. We hope you've been enjoying the show, and we'd love for you to let us know what else you would like to see. Is there a guest you want us to have on?
Is there a topic you want us to cover? Are we wrong about something and you need to tell us? Let us know. We have open comments on Spotify, you can reach out to us on LinkedIn, and you can reach out to us on X/Twitter. We'd love to hear from you, and we hope you're all having a fantastic start to the year. Atin, Yash,
thanks again for joining me.
Yash Sheth: Thanks, Conor. Looking forward to an amazing season two of the podcast. And yes, it's such an exciting space. Please comment and give us suggestions, you know, so we can bring in the experts here. This is meant to be a place where we discuss the most important things for gen AI.
So yeah, looking forward to a great season two, Conor. And welcome to 2025.
Conor Bronsdon: Love it.
Atindriyo Sanyal: Thank you.
[00:33:00]