Chain of Thought | AI Agents, Infrastructure & Engineering

Will 2025 be the year open-source LLMs catch up with their closed-source rivals? Will an established set of best practices for evaluating AI emerge?
This week on Chain of Thought, we break out the crystal ball and give our biggest AI predictions for 2025. Listen as Sara Hooker, VP of Research at Cohere and Head of Cohere for AI predicts a trend towards smaller, more optimized AI models; Craig Wiley, Senior Director of Product, Mosaic AI at Databricks, dives into the future of multimodal AI; and Galileo’s CEO, Vikram Chatterji, shares his predictions, including the rise of open-source LLMs.

Chapters:
00:00 Introduction
02:01 Vikram's top 3 predictions
06:19 AI and nuclear energy
08:30 Giving power back to the people
13:46 Craig's predictions
20:46 The "era of toolification"
30:38 Sara's predictions
35:07 AI safety

Follow:
Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/
Yash Sheth: https://www.linkedin.com/in/yash-sheth-/
Conor Bronsdon: https://www.linkedin.com/in/conorbronsdon/
Sara Hooker: https://www.linkedin.com/in/sararosehooker/
Craig Wiley: https://www.linkedin.com/in/craigwiley/

Show notes:
Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0


What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.

2025 Predictions ft. Sara Hooker, Cohere; Craig Wiley, Databricks; and Vikram Chatterji, Galileo
Introduction and Recap of Previous Episodes
Conor: [00:00:00] Welcome back to Chain of Thought, everyone. I'm Conor Bronsdon, Head of Developer Awareness at Galileo, and I'm here with Vikram Chatterji, co-founder and CEO of Galileo. Vikram, great to see you.
Vikram: Likewise, Conor. Good to see you again.
Conor: It's been a lot of fun kicking off this show with you. In the first couple of episodes, we've obviously focused on what's happening today in AI, with episode one talking a bit about regulation, the evolution of AI agents, and how Writer is winning with enterprises with May Habib. Congrats to them, by the way, on their just-announced $200 million Series C at a $1.9 billion valuation, fantastic stuff. And then obviously the key efforts from Google, Anthropic, and OpenAI with their AI technology and the new products they're releasing.
Predictions for AI in 2025
Conor: But today you and I really want to focus on the future and look ahead to what we can expect from AI in 2025. At our recent Productionize 2.0 [00:01:00] conference with other AI leaders across the industry, we spoke with Sara Hooker, VP of Research at Cohere and Head of Cohere for AI, and Craig Wiley, Senior Director of Product for Mosaic AI at Databricks, about their predictions for what gen AI will look like in 2025 and what the future holds. We'll share those conversations in the second half of this episode, but first, Vikram, I want to hear from you. What are you anticipating from AI in 2025?
Vikram: Great question. And it's a timely moment to talk about this, given that it's Q4 and people are reflecting on the whole year that's passed and everything that's happened. I feel like so much happened this year. People thought that at the end of 2023, and they've been even more surprised now at the end of 2024. Even for folks like me who have been in the industry looking at language models for the last decade, it's been an exciting and exhilarating year to see all the advancements. It's also a good time to piece the different threads together to see where the [00:02:00] trajectory is going.
Vikram's Predictions for AI Ecosystem
Vikram: The way I see it, there are three different things I'm excited about that I'm going to position as predictions for next year, for where we'll be by the end of 2025.
The first one is more at an ecosystem level. It's not just the models, but everything else around them that makes AI click in the enterprise and for developers. At the ecosystem level, I'm still very bullish on open-source models. We've been publishing the Hallucination Index from Galileo every couple of quarters, and what we've been seeing through that is that open source is catching up really fast. My prediction is that by the end of next year, there will be open-source LLMs that catch up with the best closed-source models of today. Think of the Anthropic Claude models, which are super expensive. I don't even know who's going to use them, but they're very expensive with a large context window. I almost think of them as concept models today, right?
Here's the art of the possible: some people can use them, but this is what you can do. Within the next year, I feel there will be open-source LLMs that catch up with that [00:03:00] for fairly large context windows on a majority of tasks. You'll look back at this year and say, wait a second, this was only possible if I paid an arm and a leg to make it available to me. But now I can just use this open-source model and get it to where I need it to be, which is amazing and really good for the developer community. That's the ecosystem level.
Market and Developer Level Predictions
Vikram: I also think at the market level there's a really interesting shift that's happened with AI, where the previous rules of how products get adopted have changed a lot, right?
If you look at how mobile apps came around, adoption was really great for some apps, especially on the social and local side of things, like DoorDash and Facebook and WhatsApp. They started to generate a crazy amount of revenue very quickly, but it still took them a while to get to, let's say, a hundred million in revenue. My prediction is you'll see a lot of what used to be called wrapper apps last year, which people used to diss a lot. I think we'll see at least five to ten of [00:04:00] those applications reaching a hundred million in ARR by the end of 2025. Some of them are already well on their way. This is something I'm very excited about, because it completely breaks the notion of how you build a business. And I also think it's going to be really great for the developer community to come together and build more applications in the space.
And the last one, the third one, is at the developer level. This year has been a lot about understanding how you actually trust these applications and how you know they can work really well at scale. A large part of that is just evaluations, and we've heard from the community that evaluations are hard.
That's because developers are having to act like data scientists right now: they have to experiment and figure out whether something is actually ready to ship, and that takes a level of rigor that software engineers are quickly getting attuned to. So my prediction at the developer level is that by the end of next year, developers will align on best practices for AI system evaluations.
It's not going to be so much of a knowledge bottleneck anymore; it's just going to be a process [00:05:00] bottleneck, which obviously tools like Galileo are here to help with and support. I think by the end of next year it's going to be a very different place in terms of maturity for understanding how to evaluate and monitor your systems at scale.
Conor: I totally agree. I think we're seeing that trend happen already, with people talking about evaluation-driven development, and this conversation is starting there. More and more folks are looking to the evaluation intelligence layer at companies like Galileo to help solve and drive that. In every conversation we have, we hear from folks saying, oh, we've seen early success, we need to fine-tune our model, now we need to evaluate in production and pre-production. Like, how do we make this work? It's pretty clear there's a major opportunity for the industry to rapidly improve, particularly as we're seeing a bit of a slowdown in the size of these large LLMs as they hit their maximum, for the moment, in training data and what they're actually able to do. It's giving open source a chance to catch up, and it's giving us all a chance to say, okay, how can we now [00:06:00] fine-tune this to work particularly well for specific needs? That's where we're also seeing SLMs have their day again.
Vikram: Yeah, absolutely. But those are my predictions, Conor. I'm curious, what are your predictions for the coming year?
Conor's Predictions: AI and Nuclear Energy
Conor: I don't know that I have anything revolutionary, or at least I'm not going to be the first one to say this, but I think 2025 is going to be the year of AI and nuclear energy. We've seen this trend kick off already here in 2024 with Microsoft's deal to restart Three Mile Island's Unit 1 reactor, and with Amazon, Google, and others making strides on leveraging nuclear power, particularly for AI data centers. I expect to see significant continued momentum on building nuclear to support additional AI data centers. We've seen major banks come together saying, hey, we're going to fund nuclear projects. We've seen advancements in micro-reactors, and there's a clear signal from regulators that they're going to open up the opportunity to build more nuclear energy. That's going to help with our baseload broadly, but in particular it's going to [00:07:00] drive our ability to create larger and more important models. So I'm hopeful we'll see more reactors updated and reactivated, alongside continued investment in those micro-reactors. It's a huge opportunity for the industry to help move the power grid forward while also solving for the future of AI workloads, and I think it gives us a massive scaling opportunity.
Vikram: I totally agree. That's going to be very exciting on all fronts.
The New Class of AI Builders
Conor: And I also agree with you about this new class of AI builders; it's something we've talked a lot about. I think we're going to see a continued expansion of what it means to be someone in software and in AI. The capacity of Cursor, or even just using Claude or ChatGPT to help you code, helps you put together an app, let alone a presentation or something more basic. There's a complete revolution happening where we're now much more easily able to communicate with computers. There isn't this major barrier of, hey, you have to [00:08:00] learn this new language. You can now come in and do at least basic stuff, and increasingly much more, with natural language. Along with the scaling opportunity from more energy, more data, more large models, and more small models, whether open source or closed source, there's a massive new wave of people coming in saying, oh, let me just build something with this.
It just shows the opportunity for the industry.
Vikram: Absolutely. And I feel like this also gives the power back to the people in a very exciting way. I keep going back to mobile all the time because I was early in the world of Android, in the 2011 to 2012 timeframe, and it was really exciting to see all the developers getting psyched about mobile, building out apps for this form factor.
But AI is slightly different, because it's not really a new form factor you carry in your pocket. It's actually more interesting because it's everywhere; it's ubiquitous. You can put it into an app you already have, or you can create net-new products from the ground up. So it is [00:09:00] much more ubiquitous.
The barrier to entry is dropping a lot. Cost was a barrier, and now that's coming down as well. And I think that as a result, people can get access to the ability to build even sooner. As an example, I was talking to a friend of mine who has a 10-year-old son, and for his 10th birthday they asked him what he wanted. They suggested something generic, like a PlayStation. But he actually asked for access to GPT Pro, and he wanted to know more about how you can build websites using GPT models. That was interesting, because the parents were like, how do you even know about this?
But the child said, that's what all my friends are doing; they're building sites for their favorite characters. So if you think about what the kids of today are trying to do, that also points to where the future is going. The barrier to entry for building things is just dropping a lot more.
It's just going to [00:10:00] be super exciting, which is why I feel the outcomes are also going to be pretty large in terms of actual revenue gains for businesses and operational gains for users. I also think this whole notion of the ROI of AI, which is a very enterprise problem right now that a lot of people are grappling with, is more of a level-setting of expectations. It's going to be smoothed out to some extent because of the maturity that's coming in terms of what AI can actually do versus what it can't do just yet. So I'm excited about all of us collectively getting to a level of maturity with AI that doesn't quite exist yet.
Conor: Are there particular applications or types of applications that you think are going to have the most significant impact, through AI in the next year or two?
Vikram: Any of them; that's the beauty of it, right? If you look back at machine learning, even at Google, when we were trying to figure out where we could apply NLP, broad scope was the game plan for my team. We started thinking about the repeatable tasks people do [00:11:00] over and over again that are super manual today, where there's a lot of text data; in CV, there was a lot of image and video data, and speech data too.
Given how pervasive unstructured data is, the list is really long of places where people have to go through documents one by one. Just think about it: through your day, you're likely interacting with a bunch of people doing exactly that. Go to your closest UPS or FedEx store, and there are probably ten people doing something like that. Go to a supermarket, and there's probably a person looking at every single label and putting it in front of a machine. There are so many tasks whose efficiency can be massively improved just by using generative AI.
So I'm excited in general about people who are subject matter experts in the retail industry, the financial services industry, or specific parts of the healthcare industry, where there are massive inefficiencies, especially in the US, coming in, pairing up with a developer, and building something really exciting and interesting here.
The opportunities are really incredible. And I feel like some of the use cases we can think of right now will actually surprise us by the end of next year [00:12:00] in terms of how ubiquitous they've become and how we've just started to get used to them around us.
The here and now, obviously, is things like code generation, which is a massively saturated market; there are a lot of players out there, and it has emerged as a really good use case. There are also other services, like legal services and RPA, that have come up as very obvious areas where gen AI can be very influential.
And, as we know, at Galileo we have a lot of financial services and telco customers who are doing deep dives into where there are outages, using natural-language-to-SQL use cases for that. It's been fascinating: all of this is from the last six to eight months, where these use cases have come up and proliferated across these organizations.
So in the next 12 to 14 months, there's going to be so much more happening.
Conor: Awesome. I love the optimism. I think it's a really exciting time to be in AI. And for folks who are listening and maybe just starting to get into this industry, maybe not full time [00:13:00] yet, this is the time to jump in, because there's a massive opportunity ahead of us.
You'll hear more about it shortly from Craig Wiley at Databricks and Sara Hooker at Cohere after the break. Vikram, thanks so much for coming on again.
Vikram: Thanks for having me, Conor.
Conor: While you're listening, if you're on Spotify or Apple Podcasts, do Vikram and me a quick favor: click on the show, rate the podcast five stars, and leave a little review if you're on Apple. We're so thankful to all of you who have already done so. We're seeing a ton of interest in the podcast, and we can't wait to share the show with more AI builders around the world.
Craig Wiley's Insights on GenAI
Conor: I hope you enjoy this conversation with Craig Wiley from Databricks, who joined us alongside Yash Sheth, COO and co-founder at Galileo, for a conversation between the two of them about their predictions for AI.
Yash: The whole idea of this session is to get your perspective, Craig, on how you and the Databricks team see the GenAI space evolving and where it's headed. Let's dive into [00:14:00] some of the revelations and learnings from 2024.
Craig: Yeah, the learnings have been good ones, right? If I look back to a year ago at this time, it seemed like the only topic was RAG. It was like, hey, can I RAG this, can I RAG that, can I RAG the internet, can I RAG my company? How can I RAG everything? And then as we got into 2024, what folks found was that a prompt plus some docs doesn't equal a system I can bet my company's reputation on. So all of a sudden there was this massive influx of interest in how to drive accuracy.
What we see in the market is that the systems that can actually run at production-quality grade, with real ramifications if they get it wrong, are not, hey, I've got a vector store and a prompt and it's good enough. Often these are multi-component, real compound or agentic systems that require [00:15:00] significant investment and significant tuning to really perform the way we'd like. And I think that learning, that this is going to take more than just some embeddings and a prompt, was really the one that powered a lot of the product development and a lot of the enterprise energy over the last six to twelve months.
Yash: Right. From 2023, when we were all building amazing, exciting POCs with GenAI, this year was definitely about where we can actually get real ROI in production. You alluded to that a little bit here, but what was off from the beginning in terms of expectations, and what were the hurdles that people brought up to you?
Craig: I think in a lot of ways, the technology itself got in its own way for a little while there. What I mean by that is: when ChatGPT launched in November of 2022, the data science community looked at it and was like, oh, they trained a giant transformer, that's pretty cool, and they made it available, that's even cooler. The [00:16:00] engineers and software developers, it seems, were looking at it going, I've been using GitHub Copilot and that's legit; who knows what this new API is going to let me achieve?
But it was really the C-suite and the CEOs who all looked at each other and went, wait a minute, all we need to do to make computers do exactly what we want is tell them what to do. So there was this moment of maximum expectation of simplicity, with very little knowledge of what it actually takes to get these things grounded, to make them aware of all the problems, and to squeeze all the entropy and hallucination out of the systems. A lot of us spent 2023 watching startups show up out of nowhere with these great one-click offerings: oh, you bring your text, and with one click I'll give you a RAG endpoint.
What I've been seeing this year is customers showing up with those one-click RAG POCs, disappointed, [00:17:00] saying, hey, the demo when the vendor came to us was amazing, and it was super easy to build, but at scale and broadly it just doesn't do what I need.
So we're helping companies start to wrap their arms around that problem and understand: hey, how well do you know your domain space? What does your golden dataset look like? How are you doing evaluation? Are you using LLM judges? All these questions just weren't there a year ago, when everybody thought this would be a one-click wonder of a technology. Instead, I think we all now know that while this technology will continue to simplify abstractions for folks, actually building production-quality AI systems that you're willing to bet your company's reputation on takes a lot of investment and a lot of hard work.
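The checklist Craig mentions, a golden dataset plus judge-based scoring, can be sketched as a small evaluation loop. This is an illustrative sketch only: the token-overlap judge stands in for a real LLM judge, and the function names and dataset fields are invented, not any particular product's API.

```python
# Golden-dataset evaluation loop (illustrative sketch).
# `overlap_judge` is a simple token-overlap stand-in for a real LLM judge.

def overlap_judge(answer: str, reference: str) -> float:
    """Score 0..1: fraction of reference tokens present in the answer."""
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 1.0

def evaluate(system, golden_set, judge=overlap_judge, threshold=0.7):
    """Run `system` over every golden question and report the pass rate."""
    scores = [judge(system(item["question"]), item["reference"])
              for item in golden_set]
    return {"scores": scores,
            "pass_rate": sum(s >= threshold for s in scores) / len(scores)}

# Toy golden set and a lookup-table "system" for illustration.
golden = [
    {"question": "Which region handles EU orders?", "reference": "Dublin"},
    {"question": "What is the return window?", "reference": "30 days"},
]
answers = {
    "Which region handles EU orders?": "The Dublin region handles those",
    "What is the return window?": "Returns are accepted for 30 days",
}
report = evaluate(lambda q: answers.get(q, ""), golden)
```

In practice you would swap `overlap_judge` for an actual LLM judge call and track per-question scores over time, not just the aggregate pass rate.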
Yash: We've also been seeing how, beyond embeddings and prompts, a lot of infrastructure is being set [00:18:00] up to take actions and generate the ROI we expect from these agents, not just be that one-click wonder. From that perspective, where do you see the industry headed?
Craig: I think we're at a really interesting point. Earlier this year, function calling started to really show up as a new capability, and when we talk to customers, you can almost delineate a level of maturity by whether or not function calling is being used aggressively.
With function calling, this idea of models using other tools or other models, and this kind of agent interaction, I really think this is where we're starting to see some of the real innovation. Because now I can take this hard problem: hey, my model is performing well, except it's not performing well over here in this space. What if I were to deploy a couple of specialized tools for that space? Now, instead of my model naively [00:19:00] going over there trying to figure out the right answer, there are a couple of purpose-built capabilities for it to utilize in that area of high entropy, of challenging topical or domain space.
That's the piece I'm really excited about: seeing how folks are able to start stringing these pieces together, and starting to see these models and agents get a lot smaller. By smaller I mean not only, hey, does size make serving these things cheaper, but if this bot or agent I'm building has a very narrow set of capabilities I'm actually asking of it, I can tune it much more effectively than a very large one, and then I can orchestrate it as part of a bigger bunch, as opposed to trying to boil the ocean with one single my-company-GPT-dot-com or something like that.
This idea of tools, and increasingly of agent interaction, is one where I really think we're going to see a lot of [00:20:00] performance improvement as we come into the new year.
Yash: So prediction one: 2025 is going to be an era of toolification. Fingers crossed, it seems.
Craig: I hope so, because right now it does feel like one of the most promising capabilities for really driving a stepwise improvement in the accuracy of these systems.
Yash: Exactly. And going from that, it seems the smart thing to do is to be specific about the use cases, to go deeper into specific use cases versus building something very broad. To that effect, I'd love to get your thoughts on how teams are rethinking scope and keeping that kind of focus in place.
Craig: This idea of focus, of working in a much narrower sense, is where we're seeing a lot of the value come from for organizations. If I sit there and say, hey, my objective with this endpoint or this model or chatbot is to cover a massively broad area, then things like developing evaluation [00:21:00] schemas, and developing my own ability to understand that domain, parse it, and break it down, really start to become the headwind in many cases, right? Whereas if this is a domain that's somewhat tractable: maybe I do want to build that customer service chatbot, but I'm going to limit it to my order and delivery pipeline this time, and next time have it focus on product specifications and product challenges, breaking up these different silos of data, customer need, and usage. Instead of trying to build one chatbot to rule them all, maybe have something focused on, hey, how do I respond to customers with questions about supply chain, versus customers with questions about product performance, product capabilities, or troubleshooting. As we start to see folks break some of these things up and attack them [00:22:00] in a more targeted way, I think evaluation, and a lot of the challenges that come with it, becomes easier to approach. It gets a little easier to start putting wins on the board when you're able to build and deliver that one thing at really high quality, as opposed to having the giant bot that underwhelms across a series of use cases, if you will.
Yash: Speaking of use cases, what I've been seeing across our customers is that a lot of the use cases can have multimodal aspects to them in some ways. What are your thoughts there?
Craig: Multimodal. Man, oh man, it's exciting. I was really excited to see the new Llama models launch; clearly multimodal is coming. But when I think about multimodal, I'm thinking about video generation and video understanding, live voice chat, and all of these kinds of things where the modality [00:23:00] is a real luxury to get to think about. To be perfectly honest, though, collectively we're not parsing PDFs all that well. It would be one thing if this modality really were, hey, I can take in all senses at all points in time and fully understand them. But to be perfectly honest, I'd be really excited if insurance companies could parse claims docs successfully.
We're helping companies start to understand: hey, I've been able to extract all this text, and I have these tables and these images. Not only how do I interact with them, parse them, and chunk them, but how do I make sense of them within my system? Do I describe them and then treat them as text? Do I use a multimodal system, and what are the benefits of that? Today, the cost of multimodal is still high enough that unless you need it, it's probably not the right direction. But as this technology continues to get [00:24:00] faster and cheaper, and as the next generation of hardware comes through, I'm really excited to see whether we'll be able to move there.
It's funny: with LLMs, I was talking to a data scientist recently who said, my challenge with LLMs is that the folks building them are sloppy with the data; they just pour all the data in. For a classic machine learning scientist, yeah, it might feel funny to just pour all the text we have into a model, but folks often find that can be valuable. In that regard, getting to a place where we could do the same for multimodal would be super exciting. But to be honest, as far as 2025 goes, my hope is that we exit 2025 with an understanding of PDFs that's close to a human's.
Yash: Yeah, there are just so many use cases that have PDF manuals and books and illustrations where we're not parsing all the knowledge from charts and graphs. Even if I scan a newspaper, a literally printed newspaper, and [00:25:00] give it to the LLM, there are certain visual components that don't get caught. It's those times when the text is discussing the visual and the visual is discussing the text, back and forth, that I think it becomes a requirement that we figure this out more effectively.
And lastly, Craig what about governance and, thinking of trustworthiness when it comes to and a lot of what we've talked about is, in service of getting, the model, the entire journey, I system to operate accurately and reliably reducing scope, thinking about toolification function calling, some of the challenges with multimodal, I think it all boils down to us running, an accurate and trustworthy system in production.
This is an area where obviously Databricks, takes a really opinionated stance on governance. I really think that, as these technologies continue to increase and continue to get more and more data hungry, [00:26:00] the requirement to understand the flows of data that you have and the lineage and kind of the dag of data traveling through your own usage patterns as well as the systems you're building and the ability to really govern that, right?
The ability to ensure that, hey, whoever's asking this question of my system, am I able to confidently say that, hey, they have access or they have permission to view any of the training data that went into this system and any of the data that I'm injecting through the prompt and, that is, maybe it's really straightforward if this is a public endpoint, but if this is an internal endpoint, that's Seeks to talk about HR challenges, or seeks to, dive into my JIRA tickets, or something like that.
Ensuring that governance is met at every step. And I say it may be easier looking externally, but certainly we've seen plenty of leakages even without Gen AI [00:27:00]. And being able to have just an extraordinarily high degree of confidence. Here's the deal: legislation and compliance is coming, and the more impressive and exciting this technology gets, the more likely compliance is going to come and have a giant impact.
And so ensuring that you're ready for that: hey, what data am I consuming? Where is that data coming from? How am I consuming it? Being able to really reflect that back for each system, and even each prompt that each system responds with, is going to be a requirement as we move forward in time, as opposed to maybe the luxury it's been in the past.
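The permission check Craig describes can be sketched in a few lines. This is a toy illustration, not Databricks' actual governance API; the class names, group labels, and document IDs are all hypothetical stand-ins for the idea of filtering retrieved context by caller permissions while keeping lineage for audit.

```python
# Minimal sketch: before a RAG system injects documents into a prompt,
# verify the caller may see each one, and record where each came from.
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    source: str                     # lineage: where the data came from
    allowed_groups: frozenset       # who may read it

@dataclass
class AuditedContext:
    context_ids: list
    lineage: list                   # reflected back per response, per Craig's point

def build_context(user_groups: set, retrieved: list) -> AuditedContext:
    """Keep only documents the caller is permitted to see."""
    permitted = [d for d in retrieved if user_groups & d.allowed_groups]
    return AuditedContext(
        context_ids=[d.doc_id for d in permitted],
        lineage=[d.source for d in permitted],
    )

docs = [
    Document("hr-001", "hr_system", frozenset({"hr"})),
    Document("eng-042", "jira_export", frozenset({"eng", "hr"})),
]
# An "eng" caller only sees the JIRA document; the HR one is filtered out.
answer = build_context({"eng"}, docs)
```

The point is structural: the permission filter and the lineage record live in the same code path, so every answer can "reflect back" what data it consumed.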
Awesome. Really quick, I know we're a few minutes over, but Craig, I want to end this amazing discussion with one hot take from you: at the end of 2025, if you were to wave a magic wand, what would you want to happen? Yeah, other than PDFs? That's a good one, right? No, here's the deal.
I met with a group of [00:28:00] banks recently, and I asked them, hey, how many of you have Gen AI capabilities in production? And they all raised their hands. And I said, how many of you have more than one? And almost all of them raised their hands. And I said, how many of them are ROI positive? And all of their hands went down.
And this is something we've got to help the industry succeed at. And I'm super impressed with the work that Galileo is doing. Whenever I have folks who are concerned about the performance of their systems in production, I often have them call you guys, because I think there are few companies better positioned to help with some of those challenges.
And I'm really excited about the partnership we continue to have. But as I look forward, I'm looking forward to that moment next year when I can look out at a group of banks or a group of other enterprises and say, hey, how much are you making because of Gen AI?
How is Gen AI positively contributing to your bottom line? And I'm really excited about us starting to see that set of use cases come through next year. [00:29:00] Amazing. That's a great way to end this. And thanks for the partnership as well, Craig. I hope together we can make a difference and make that a reality next year.
Absolutely. Thanks again, and congrats to all the folks watching: you've chosen an extraordinarily relevant meeting to stay tuned to. Have a great day. Thanks, Yash.
Sara Hooker's Predictions for 2025
Conor: Next up, you'll hear from Sara Hooker at Cohere about what she expects to see out of AI in 2025.
I'm really happy to invite Sara today, and thanks for coming. It's such a pleasure to be here, so really looking forward to this chat. Amazing. The exciting topic we have today is predictions for 2025 when it comes to generative AI; the space is moving so, so rapidly.
I'll open it up to you. I would say for 2025, there are a few things. I think the realm of optimization is going to expand: you can optimize in the data space, you can dynamically change your data pools, the algorithm. But also, tooling is going to [00:30:00] become more pronounced. I think people have a very difficult time evaluating the caliber of tooling; when we talk about tooling, it means different things to different people.
And to be honest, a lot of the time you're bumping into the infrastructure of the internet, so it's really unclear how you evaluate when everything feels very tied to the specificity of a given API. And so that's wide open: how do you actually evaluate whether these algorithms are doing a good job? I think multimodal is going to be crucial. A few questions there: how do you understand if the model's uncertain when there are multiple different modalities?
This is important from a safety perspective, but we struggle with it within individual domains, and multimodal compounds the complexity. So what I think we'll see there is very interesting work trying to get at, in many ways, the core question: for us as humans, having multiple inputs, audio, visual, motion, [00:31:00] actually helps us calibrate uncertainty, because for a given second of inputs we can understand, does this match, or is something off?
And I think we're going to see really interesting work there trying to do the same for our algorithms. And for me, as someone who works a lot on safety, I think that's a very interesting direction.
How do you think the work we're doing now with adaptation of these models is going to help unlock some big progress in the multimodal space, and even the tooling space, right?
As we were talking about before with Craig as well: all of these capabilities of the LLMs are enhanced with tools, and there are so many different tools that are used by agents to actually do mission-critical work. There's the ability of these models to adapt to diverse APIs, or to multimodal content, like just a PDF of a diagram with a table, and the model being able to understand that.
What do you think we need to do now to be able to unlock these [00:32:00] capabilities next year?
Yeah, I would say, with the tooling, I think people underestimate this. Tooling is often thought of as a pure research question: how do you enable tooling?
The reality is you are bumping into things. Every API has been designed in a different way. So what that means is you're either going to have to get much better at on-the-fly exploration: how do you figure out how the API differs? How do you automatically adjust? Which is traditionally something our algorithms are not as good at.
Or you're going to have to pick and choose your tools and saturate the data space. Both of these are interesting possibilities, but I think the real question, the one that's more fun to think about, is who innovates with a mixture of both: who's able to gain enough data for some tools while leveraging an exploration strategy for others?
And it's not a trivial problem. And so I actually think, while tooling will continue to dominate over the next year, I suspect if we find ourselves here next year catching up, [00:33:00] we'll still see some tools being used quite well because people have just saturated the data space.
But in terms of making things work for your tool for a given business, I think it's a formidable challenge, and it means getting that exploration piece to work.
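Sara's "mixture of both" can be caricatured as a simple dispatch rule: exploit tools whose data space you've saturated, explore novel APIs on the fly. This is a deliberately toy sketch of the decision, not any real agent framework; the tool names, counts, and threshold are invented for illustration.

```python
# Toy dispatcher: pick a strategy per tool based on how much usage data
# has been collected for it.
def choose_strategy(tool_name: str, examples_seen: dict, threshold: int = 1000) -> str:
    """'exploit' for well-covered tools, 'explore' for novel or rare APIs."""
    if examples_seen.get(tool_name, 0) >= threshold:
        return "exploit"   # call patterns can be learned from saturated data
    return "explore"       # probe the API's schema and adjust on the fly

coverage = {"search_api": 50_000, "new_crm_api": 12}
assert choose_strategy("search_api", coverage) == "exploit"
assert choose_strategy("new_crm_api", coverage) == "explore"
```

The interesting research question she points at is everything hidden behind the "explore" branch: automatically discovering how an unfamiliar API differs and adjusting to it.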
Yeah. One last thing about 2025. A lot of your focus is on safety. Regulations aside, on the fundamental science of these LLMs and their safety, what are some of the breakthroughs that you're hoping for in 2025?
One aspect of research that we've done a lot of work on, and that I think is very interesting, is how do you optimize for multiple objectives? Because safety is very interesting, right? At any one time, safety is both global, there are some concepts on which there's typically consensus across regions, like no violence generated, or really strict guidelines [00:34:00] around sexual and CSAM content, these are typically agreed on and universal wherever you operate, but there's also the specificity of local harms. And this is a very important technical challenge: how do you align a model to both your global and your local alignment, in a way that's flexible, where you can adapt the model to each domain separately in each jurisdiction? And so a lot of what we've been doing in this area is preference alignment, which is really this gold dust that happens at the end of training, where you're steering your model.
And how do you do that according to multiple objectives? I think this is very important, and it's a very crucial building block for actually creating models which can be used in many different settings.
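One way to picture the global-plus-local objective Sara describes is a weighted blend of reward signals during preference optimization. This is a hand-wavy sketch, not Cohere's method; the reward functions, regions, keyword checks, and the 0.7 weight are all illustrative stand-ins.

```python
# Blend a universal safety objective with a region-specific one when
# scoring a candidate completion.
def combined_reward(completion, global_reward, local_rewards, region, w_global=0.7):
    """Weighted mix of a global objective and a per-jurisdiction objective."""
    g = global_reward(completion)
    l = local_rewards[region](completion)
    return w_global * g + (1 - w_global) * l

# Toy reward models: keyword checks standing in for learned reward models.
global_rm = lambda text: 0.0 if "violence" in text else 1.0
local_rms = {
    "eu": lambda text: 0.0 if "personal data" in text else 1.0,  # local rule
    "us": lambda text: 1.0,                                      # no extra rule
}

score = combined_reward("a helpful answer", global_rm, local_rms, "eu")
```

A completion that passes both objectives scores 1.0; one that only violates the EU-local rule still keeps its 0.7 global share, which is exactly the flexibility she wants: swap the local term per jurisdiction without retraining the global one.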
We've been thinking about, you know, having better preference models, even as a mode of observability: if we designed or [00:35:00] deployed a system to do X, Y, and Z, three different objectives, are we aligning to those objectives? Some of the baby steps we've taken there are metrics and algorithms like instruction adherence.
If people are giving instructions from their objectives to the LLM, this is what I expect to see, is the output actually adhering to those? These are just baby steps, but they totally resonate with the direction of adapting the model. From a preference alignment perspective, it seems like a very exciting space.
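The shape of an instruction-adherence style check can be sketched simply. To be clear, this is not Galileo's actual metric; real versions would typically use an LLM-based judge, whereas the substring check and example strings here are invented placeholders for the idea of scoring an output against each stated instruction.

```python
# Score what fraction of the user's stated instructions the output satisfies.
def instruction_adherence(instructions, output, satisfies):
    """Return the fraction of instructions the output adheres to, in [0, 1]."""
    if not instructions:
        return 1.0
    met = sum(1 for inst in instructions if satisfies(inst, output))
    return met / len(instructions)

# Toy judge: an instruction counts as "met" if its phrase appears verbatim.
toy_judge = lambda inst, out: inst.lower() in out.lower()

score = instruction_adherence(
    ["cite sources", "use bullet points"],
    "Here are bullet points... I will cite sources inline.",
    toy_judge,
)
```

Here one of the two instructions is matched verbatim, so the score is 0.5; swapping `toy_judge` for a stronger judge is where all the real difficulty lives.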
Yeah, and this is also very crucial, what you're describing: let's say you align on a set of preferences and you're tailoring them. You want to know, can you then track progress against your new set of objectives? So it's super important. It's actually critical. It's part of the same problem space.
That's awesome. So we made three predictions, but now let's get on to the hot takes. Being fortune tellers right now: a year from now, what is the [00:36:00] one thing you would want to see?
I think we're going to see dual trends. I think models are going to get bigger, and this is mainly for multimodal; I think people are going to push the boat out and see how far they can go. But I also think models are going to get a lot smaller as we better leverage optimization in a broader way.
So, optimization in the inference space, and also data optimization, where we stratify our dataset. And that's pretty exciting. I do a lot of work on efficiency at scale, and so for me it's super interesting to think about how we do more with less. And what's your hot take? You can't get away without one.
Off the cuff, I think one of the biggest bottlenecks I've seen, and something I deeply care about, is teams being able to adopt this technology for every single use case, and I love the idea of adaptation for these LLMs. One of the big things, if you look at what every business is thinking about, is: there were a lot of POCs this last [00:37:00] year, we've been trying to productionize a lot of them this year, but it's been taking much longer than we expected. Next year, I would want there to be a world where every AI developer
knows exactly what they need to do to leverage this technology confidently. And this is intentionally high level, because there are a few different aspects to it. In a nutshell, it's basically a combination of having the right trust layer, having the right regulations, and having the right foundational capabilities of these models to adapt.
The combination of the three will give us a much better framework and accelerate adoption of Gen AI in the world. Yeah, and I do fundamentally believe in that adaptability. One of the things I see is: how do end stakeholders quickly shape and provide feedback to guide and align some of these models, efficiently,
so it doesn't take [00:38:00] much time? I think this is one of the most fun problems to work on. Amazing. So excited, and so happy to speak to you. This was an amazing conversation. Thanks a lot, Sara.
Closing Remarks and Call to Action
Conor: Make sure to subscribe to Chain of Thought wherever you get your podcasts,
and don't forget to check out Galileo's YouTube channel, where you can watch every episode of the podcast as well as events like Productionize. And don't forget to connect with us on Twitter or LinkedIn at @rungalileo, or find Vikram there as well; he has a lot of interesting things to say. So let us know what you think.
We'd love to hear from you about the show and we'll see you next week.