How AI Is Built

In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.

Key Takeaways
  • Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.
  • The success of AI projects post-delivery is ensured through monitoring, but automatic retraining of generative AI applications is not yet common due to evaluation challenges.
  • Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.
  • The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.
  • Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.
  • Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.
  • Jonathan’s ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, semantic routing, DSPy, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.
Key Quotes
"I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."
"To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."
Jonathan Yarkoni
Chapters
00:00 Introduction: Extracting Value from Unstructured Data 
03:16 Flexible Tailoring Solutions to Client Needs 
05:39 Monitoring and Retraining Models in the Evolving AI Landscape 
09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities 
17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development 
28:29 Dream Tech Stack for Generative AI

unstructured data, textual data, automation, weather prediction, data cleaning, ChatGPT, AI disruption, legal, education, software engineering, marketing, biotech, immediate results, cutting-edge solutions, tech stack

What is How AI Is Built?

How AI is Built dives into the different building blocks necessary to develop AI applications: how they work, how you can get started, and how you can master them. Build on the breakthroughs of others. Follow along, as Nicolay learns from the best data engineers, ML engineers, solution architects, and tech founders.

Nicolay Gerold (00:00.514)
Hey buddy, welcome back to another episode of How AI is Built. Today we have another very special guest. Jonathan Yarkoni is the founder of Reach Latent. He specializes in helping companies extract value from unstructured data by analyzing millions of documents and extracting the key information. We prefer the hard problems and we work with minimal guidance. Now, what

this would be most beneficial to are companies that are looking to increase productivity. More to the point, companies that have a lot of textual data, such as contracts, documents, and/or are looking to do automations, that's where we shine, using GenAI to kind of do those automations or processes.

And you already said like you thrive on hard problems. Could you give like one or two examples? What were like the gnarliest problems you have tackled yet? Yeah, that's a great question. I think that the place we love being is actually coming to companies who maybe have already tried to solve that problem on their own. And for some reason, they've kind of not given up because they're still looking to solve the problem.

But they've kind of exhausted all possible paths internally. And then it's very ripe for like discussion. They know where they want to get. They know what they've tried and maybe hasn't worked or has partially worked. And we're kind of tasked with taking it to the next level. Some of like the hardest problems we've worked on are ones where the companies kind of want to stay incognito, but I'll still give like an example of those.

And we had a company who was working on weather prediction. And they were trying to figure out anomalies of very rare weather events so that they could go on and sell that data. Now, the nature of weather data is it's hard. It's unpredictable. You need a lot of subject matter expertise. And you also need to employ

Nicolay Gerold (02:22.39)
a lot of interesting strategies such as, you know, handling skew in the data. And that was a really long process of really getting their data and doing the necessary data cleaning and sitting with the subject matter expert from their side, extracting all of the information, translating it into what we already have, what we need. And we also went through a longer process of like recommending them to acquire external data sources that we found. And eventually

developed a model, initially a very simple model using linear regression and decision trees and stuff like that. And later down the line, after a couple of months and successful projects, diving into deep learning and really taking it to the next level. So that's one of, I think, the hardest projects that we've had, but also the most interesting.

Would you actually, it sounds really like this process diagram you have in like AI, which is like business understanding, data understanding, data modeling, and then like the training part and then feedback. Do you have like a fixed process for how you break down new projects or is it like different depending on who and what you're working on? We have an overall framework, but we don't believe it's like a one-size-fits-all.

Some people are looking for us to take everything and be the drivers on the process. So we allow ourselves to cut a lot of corners and other people, for instance, really want to be involved and feel like they're in control. And we understand that, which is why we switch it up a bit. The other aspect of that is the nature of the problem.

Not all problems require you to do a data collection, a data cleaning process, stuff like that. In some projects, especially generative AI projects, whatever data you have, the models are very tolerant of dirty data. And if the objective of the company is doing something within a limited timeframe, or if performance just isn't as important,

Nicolay Gerold (04:43.103)
maybe we can just skip the data cleaning and do that. So we have a framework. We use, I think, the best standards of sitting with the stakeholders, doing a data collection process, understanding the data, cleaning the data, and working with a sample data set, training a model, and so on and so forth. But the point is that we don't go through all the steps for the sake of going through all the steps. We try to do the

minimum necessary steps in order to have a successful project. And a successful project a lot of the time means generating a model, supplying companies with an API, or figuring out whatever they were trying to do. How do you, I'm really curious about that, how do you ensure the success after the project? Especially nowadays as the data is shifting.

And also like new models come out, new models like GPT-4 or other families of models come out. Do you take on new projects or do you build in like automatic updates, retraining into your delivery? Great question. Models, as you said, are constantly evolving and new models are constantly being released. And I expect this to be the case for at least the next two years to come.

We definitely work in that environment and we take it into account. And we tell customers to be very cognizant of what's out there and to develop in relation to what's out there. And that is both like, what other companies maybe are building, a SaaS product that you can simply adopt. Yeah. If somebody's building the exact same thing as a SaaS product and they already have an MVP running, you might not want to use an MVP, but you've got to think:

If it's going to take you three months to do the project, where will they be in three months? So that's the first thing. And we see it as our job to not only take everything towards like a custom development, but also to recommend the customer, you know, this component, maybe you want to leverage a third-party provider. And when we do hands-on development, which is again most of what we do on a day-to-day basis, build custom solutions, we take it into account. We take it into account both

Nicolay Gerold (07:05.661)
that models will be changing over time. And also we take it into account that maybe if your problem is almost solvable, it will probably look very much different in three to four months' time when new models come, when more specific models are released. So on the one hand, we prefer always starting with the best model, especially when we're doing a proof of concept. On the other hand, we're very cognizant

of the fact that if you have constraints such as privacy or security or cost constraints and you want to use one of the smaller models, say Mistral, which we've found a lot of success with, that's okay. Those models are going to get better over time. In terms of constantly monitoring and retraining: monitoring, yes, definitely. We put things in place.

Automatically retraining generative AI applications, not so much. There's a bit too much that ties into a comprehensive robust solution for it to be easily retrainable. You always want to add more data. Yes, if you're doing things such as RAG, you can update the database and it will still work. Switching the model in an automatic fashion,

the way that it was done with deep learning and is still done nowadays, not yet the case with generative AI. It's still very much a question of how you actually evaluate models. And until we figure that out and until there are good standards for evaluation of models, we're not going to be automatically retraining and switching. I'm not even sure we will reach a point where we have fixed standards because it's so use case specific.

When we are talking about generative AI, we most often are generating text. So we have to evaluate from the perspective of the use case.
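
To make Jonathan's RAG point concrete: refreshing the knowledge base is just embedding and inserting new documents, with no retraining involved. Below is a minimal, illustrative sketch, assuming the official openai Python client; the model names and the toy in-memory store are placeholders, not the stack Jonathan uses.

```python
from openai import OpenAI  # assumes the openai>=1.x client and OPENAI_API_KEY set

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    # Embedding model name is illustrative; any embedding model works the same way.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

# Toy in-memory "vector database": (embedding, text) pairs.
store: list[tuple[list[float], str]] = []

def add_documents(docs: list[str]) -> None:
    """Updating the knowledge base = embed + insert. The generation model is untouched."""
    store.extend(zip(embed(docs), docs))

def answer(question: str, k: int = 3) -> str:
    q = embed([question])[0]
    # Rank by dot product; a real system would use a proper vector database.
    ranked = sorted(store, key=lambda pair: -sum(a * b for a, b in zip(pair[0], q)))
    context = "\n".join(text for _, text in ranked[:k])
    chat = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return chat.choices[0].message.content
```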

Nicolay Gerold (09:15.847)
The ChatGPT moment especially, how do you actually see that impacting the AI space in general or at large? That's a great question. I think ChatGPT had two major impacts. First of all, it became commonplace to think about adopting this technology. People are highly familiarized with something that

was rather niche just a year and a half ago. The second thing is that in turn it made everybody work on problems in AI. VCs saw that it was hyped, invested money, and now we have a lot of companies working on this, aside from academia, even though the main drivers of the technology are actually not academia, it's actually private companies.

I think we've only just seen the tip of the iceberg. I'll give an example of the way I think it unfolds. So to give a tangible example, let's take agents, which is all the rage right now in terms of the latest capabilities of AI. Agents came out as a concept in mid-2023. People started building agents. Then LangChain created an abstraction for building agents.

Different types of agents were created, and ways to use them. And VCs caught up to it. People like me and you came and pitched like, we're going to build the next agent definition company. They got funded, they hired people, and now they're working on it. And you have great companies like CrewAI, which are building a specific product for agent definition. And LangGraph just came out like two months ago. We haven't yet seen

a fleshed-out project which was built with these new companies. And the way I see adoption happening is in waves. I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of

Nicolay Gerold (11:39.755)
opportunity to be had. I spoke about this and will probably speak about this a bit later. A lot of the projects we do are internal, like process automation and search systems and chat systems and support systems. All of those are not customer facing. And the obvious driver for that is there are still problems with the models, the fact that they talk off topic or the fact that they hallucinate.

In 2025, I see another adoption wave coming, post all of the success stories of 2024. And given that the models have evolved and things such as hallucinations are not as much of a problem, we're probably going to graduate to the next level.

Yeah, I think especially on the hallucination front, like on the one hand, for applications which are more B2C, like Midjourney-type stuff, the hallucinations are a feature and not a bug. And for the other stuff, we're actually seeing more and more interesting structured generation libraries, which are really enforcing constraints through regex, through state machines and other interesting concepts.
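
As a rough illustration of the idea Nicolay describes: dedicated libraries such as Outlines or Guardrails enforce the constraint during decoding, while this hand-rolled sketch only validates the finished output and retries. It assumes a hypothetical `call_llm` helper rather than any specific API.

```python
import re

# Hand-rolled approximation of structured generation: libraries like Outlines or
# Guardrails enforce the constraint while decoding; here we just check afterwards.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. require an ISO date

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat-completion call you use; not a real API."""
    raise NotImplementedError

def constrained_generate(prompt: str, pattern: re.Pattern, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        out = call_llm(prompt).strip()
        if pattern.fullmatch(out):
            return out
        # Feed the violation back so the model can correct itself on the next try.
        prompt += f"\n\nYour last answer {out!r} did not match the required format. Reply with the date only."
    raise ValueError("model never produced output matching the constraint")
```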

Nice. And what industries or domains do you think are most ripe for disruption by especially generative AI? Okay, so I'm going to give a list. I'm going to try and not name all industries. The ones which I think are most ripe for disruption by AI are ones which are text-heavy right now. Text models have made a leap. There is a lot of investment that has gone into them;

they're at the point where they're ready to be used. I say this kind of as opposed to diffusion models, which are great, but have only recently just made the leap where they can be used in production. And even that, mostly for specific use cases; there's a lot that will come in the future. So just to recap, text-heavy industries are ripe: legal, education, software engineering, and marketing. Another one where I think there's a lot of noise,

Nicolay Gerold (13:54.079)
and it makes sense, is biotech. You've probably seen most of the interesting results as of late, generating new proteins with all kinds of unique functionality. That's probably going to have another impact. And soon thereafter, once diffusion models really mature, and mature means a couple of things. First of all, we have to have hyperrealism,

which I think really only came about with Midjourney v6, which, I don't remember exactly, but was like three, four months ago. And now we have to make them adhere to style. And I think that will have a large impact on the entertainment industry, because the entertainment industry has to be hyper-realistic. It has to control the style, because you have your own brand if you're creating a game. And obviously things such as

video generation, everything happening with Sora and all of its competitors. It's going to impact gaming and content generation at large. I love to think about this idea of where this is going. And one thing I think is imminent is Netflix on demand, like a hyper-personalized Netflix on demand. You're just going to be sitting in front of the TV. And I don't know, I think it's three to four or five years. On my Twitter, I have a thread that I constantly follow

and update. I think three to four years from now, you're going to be sitting in front of the television and you're going to be able to say, I really want to see another Game of Thrones season. But you're not going to wait until HBO starts producing it and sending it to Netflix, because that's the subscription that you're subscribed to currently. You're just going to be able to tell the machine, I want another episode of How I Met Your Mother.

I really liked this series in the 90s, Seinfeld. Make me another, an 11th season. I don't understand why they stopped at the 10th. I don't agree with Seinfeld. It's not, you know, stop at the peak. Give me more content. I hope it can also take like the last season of Game of Thrones out of the context window so I don't have to continue with that. Yeah, for sure. But yeah, especially I think

Nicolay Gerold (16:23.405)
It sounds like it's still rather on the B2C side. I think like in biotech, it's challenging to get an external view on what's actually happening behind the scenes, because it's such an opaque industry. And I would love to actually see more, or have a better peek into that industry, what's actually happening and what's being tried. But I think in biotech, it's like the output always has to go through

several additional steps to actually really validate it, as opposed to like images and text I'm generating, where nearly everyone can be the judge of that. For sure. Just, you know, seeing how long it's taken CRISPR technology to mature. It's not only about the technology. It's not only about productionizing it. Yes. At the end of the day, bringing something to market in biotech takes exponentially longer than software. And

I'm looking at GenAI especially, but maybe even before that, also AI solutions. How do you actually balance delivering like immediate results and developing cutting-edge, more research-heavy solutions? Because often it's like tricky, especially if you're working for external clients. That's a great question.

I think like at heart, we always want to work on the cutting edge and experiment as much as possible, which is why we like taking like hard problems. But on the other hand, delivery is like a core value for us. So we don't take this lightly. What we do in order to balance is we try to frame everything as a cost value proposition and really involve the customer in the decision -making process.

When we work with customers, a lot of the times we have, you know, syncs on a good cadence, which normally means once a week or sometimes even twice a week. And there we try to constantly evolve the cost-value metrics and to plan with the customer. So we'll say things like, listen, after we've run some tests and we've seen some preliminary results, we think there are

Nicolay Gerold (18:45.229)
two paths that you can take. One, we're able to invest two weeks, use simplistic models and known methods, and we'll probably reach a good point. Or we can take a month, month and a half, try something really cutting-edge, involve some really new techniques, but it's a risk. It might

not succeed more than the simple method, it might not succeed at all. And so ultimately, we constantly explore cutting-edge and experimental techniques, but it's mostly when the customer is willing to take the risk.

Especially with agents, this is my pet peeve with agents. You never can tell whether they actually work reliably on any use case. And often, if you use them and they work really well in some situations, then they are complete garbage in others. Yeah, to that I'll say that we build things incrementally. I don't think we've ever tried to develop something with an agent and had to scrap it at the end of the day.

I will say that you never know how far you can go with agents. So, you start from the basics and you start from prompting and eventually you graduate at the end of the day to working with agents, and there's a lot that can be done there. Especially like the multi-agent paradigm that's kind of up and about right now. So I've never had to really scrap an agent, but yeah, they can be unreliable at times.

Did you ever decide to not go with an agent, not so much because of performance, but rather because the performance-cost trade-off wasn't really there? So: no because of performance, yes because of cost. And I mean, you know how agents work, but at the end of the day, they...

Nicolay Gerold (20:52.013)
they go through a couple of steps, they plan and then they execute. And when they execute, they can also verify and they can react on it and basically repeatedly make requests to the same LLMs. And you do control it to some extent, but it's not as clear cut as when you work with, let's say, a RAG pipeline. You know exactly how many steps, you know exactly how many requests to an LLM are going to be made.
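
A back-of-the-envelope way to see why that matters for unit economics; all numbers below are purely illustrative, not figures from the project Jonathan mentions.

```python
# Illustrative cost comparison: a RAG pipeline makes a fixed number of LLM calls
# per query, an agent loop makes a variable (and sometimes large) number.
PRICE_PER_CALL = 0.01  # assumed blended cost per LLM request, in USD

def rag_cost(queries: int, calls_per_query: int = 1) -> float:
    # Retrieval itself is not an LLM call; generation is one known call per query.
    return queries * calls_per_query * PRICE_PER_CALL

def agent_cost(queries: int, avg_iterations: float, calls_per_iteration: int = 2) -> float:
    # Each loop iteration may plan, act and verify, so several requests each time.
    return queries * avg_iterations * calls_per_iteration * PRICE_PER_CALL

print(rag_cost(10_000))                        # roughly 100 USD for 10k queries
print(agent_cost(10_000, avg_iterations=6.0))  # roughly 1,200 USD for the same 10k queries
```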

So it has happened to us once in the past, like, yes, the agent was working pretty well, but it had to make so many requests in order to make it work that the unit economics didn't make sense at the end of the day. Yeah. And because you mentioned it before, the cost and value drivers: beyond like time saved, what value drivers do you see with GenAI?

Nicolay Gerold (21:51.602)
There are a lot. Yes, productivity and time saving is a great opportunity that GenAI offers. Aside from that, you're basically able to do things that you weren't able to before. We now have these LLMs and the main thing that they have is higher reasoning capabilities. So suddenly you can have

systems at your fingertips that just weren't there before. And yes, initially it's a time save, if I can have an automatic system searching over the whole corpus in my company so that you can get the answer on the fly. So you say, well, a human would have had to read a million documents and only then figure out the answer. So that's a time accelerant, but not really. I mean, that's just...

something that you couldn't do before. The human couldn't read all the documents and probably couldn't find the answer. So that's kind of creating new value from scratch. It's not a time saver. It's something that you weren't able to do before and now you can.

Nicolay Gerold (23:06.657)
Yes. And looking at the generative AI space, like with all the different use cases you've already implemented, what is actually missing from the space? Like what would you love to see implemented, either as closed source, like as a managed service, but also as open source?

So, I mean, I think I'm very much keen on the progression that's out there right now. I'd like to see more, like I'd like to see better open source models. I'd like to see better small models and I'd like to see better task-specific models. So those are like three resources that I'd like to be out there. I think ultimately at the end of the day,

having more open source models is, A, going to drive the industry as a whole. It's going to make us better understand how the models operate. Smaller models are just good because they run faster and they're cheaper. And task-specific really feels like the way to go for me. At the end of the day, this comes from economics, but markets specialize. And the same thing applies here. Another thing which I'd really like to see, which is not so much

a resource that we'd be getting, but more a framework for better planning. I think agents are getting there. There's a lot of discussion right now, like, you know, what is actually planning? What does good planning look like? So those are like the four things I'd like to see. Yeah. And I always like to say, I think AI and data, it's like, it's a game of knowledge:

so many different tricks, hacks, little configurations you can set to actually squeeze the last few percent of accuracy and performance out of it. What are the little tricks and hacks you have picked up in all your projects that you now use almost instantly from the get-go? First of all, you said that and it immediately made me think about prompt engineering. We get asked this question so much, like, okay,

Nicolay Gerold (25:18.397)
You do so many projects. You've obviously done a lot of prompt engineering. Tell me how. I've actually had consultation sessions: log on to the Zoom and tell me, now teach me how you guys do prompt engineering. And I think to some extent, yes, it's a lot of concepts that you have to adhere to and know, and also practice makes perfect. But it's also made out of so many little

tips and tricks, and you have to use a chain of thought and you have to know how the model reacts to some things. And I think a lot of like tips and tricks are in the world of prompt engineering, and stuff like, you know, using the hack of simple chain of thought, just think this out step by step, or using emotional stimuli, like saying my job depends on it,

or actually using ChatGPT to revise and critique my prompts. That's a great hack. You write a prompt, you think it's good, but it's probably disorganized and probably has some ambiguity in it. And if you tell ChatGPT, you know, review my prompt, it's able to do that; organize my prompt, it's able to do that. It's able to do a lot of things. Also try to, as a rule of thumb,

overuse ChatGPT in your day-to-day. I create content, I create project plans, I create a blog post every now and then. Obviously I'm going to use ChatGPT. It's not because I'm trying to game the system. It's the new way of working. You know, I also tell this to companies that we work with, that they should really encourage

the workforce to adopt the technology rather than kind of penalize people for using it. What do you think, if you have to go and write a project management plan today? Let's say it's not something you've done a million times in the past. Don't you think ChatGPT has some kind of sense, and has seen quite a lot of project management plans, and can probably start from a better point than you?

Nicolay Gerold (27:42.047)
Learn how to work with the system, give it guidance. I promise you from personal experience: yes, I used to be a team lead in various companies, but I wasn't a project manager. And I'm not used to, you know, writing six, seven, or even longer pages of project management, you know, detailing everything. But working with ChatGPT, you can get some pretty impressive results and it helps.
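
The prompt-review trick Jonathan mentions above, asking the model to critique and reorganize a draft prompt, can be as small as the sketch below; it assumes the official openai Python client, and the model name is just a placeholder.

```python
from openai import OpenAI  # assumes openai>=1.x and OPENAI_API_KEY set

client = OpenAI()

def critique_prompt(draft_prompt: str) -> str:
    """Ask a model to review and tighten a prompt before you rely on it."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any strong chat model works
        messages=[
            {"role": "system",
             "content": "You are a prompt engineering reviewer. Point out ambiguity, "
                        "missing constraints and poor organization, then return a revised prompt."},
            {"role": "user", "content": draft_prompt},
        ],
    )
    return resp.choices[0].message.content

print(critique_prompt("Summarize the attached contract and flag risky clauses."))
```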

And maybe as a closing question, what is, for one, like your dream tech stack at the moment for generative AI, and what is also something you would have to add to the tech stack? Like, what is a wish you have?

Nicolay Gerold (28:29.901)
So there are a lot of things that we use right now when we're developing GenAI, just to throw a couple of them out there: Instructor, Guardrails, semantic routing, DSPy, LangChain, LlamaIndex, LangGraph, things such as obviously GPT-4 and GPT-3.5 that we heavily rely on. One thing I'm really looking forward to is more evaluation stacks.

And yes, there's Evals by OpenAI and there's RAGAS, which is an open source framework which we rather like for chat applications. And then there are third-party providers such as Vellum and Traceloop, which are also great because they have aspects of no-code, low-code, which really allow you to constantly evaluate and monitor

and probably in the future also take action based on stuff like that. And I think we spoke about this thing in the beginning. It's a great closing question. To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology.
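
For reference, a RAGAS-style offline evaluation of a chat or RAG application typically looks roughly like the sketch below. The imports and column names follow older ragas (0.1.x) releases and may differ in newer versions, so treat it as the shape of the workflow rather than a pinned recipe; it also assumes an OpenAI key in the environment, since RAGAS scores with an LLM under the hood.

```python
from datasets import Dataset            # pip install datasets ragas
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One hand-written row for illustration; in practice you log real traces.
eval_ds = Dataset.from_dict({
    "question": ["What does the indemnification clause cover?"],
    "answer":   ["It covers losses arising from third-party IP claims."],
    "contexts": [["Section 9: The vendor shall indemnify the client against "
                  "third-party intellectual property claims arising from use of the software."]],
})

result = evaluate(eval_ds, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores you can track from release to release
```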

Yeah, I think I had the founders of RAGAS on episode five, and they just also released a new model for generating synthetic data sets to test RAG stacks, which is really interesting. And they also, I think they built it on a Qwen model, a really small one, I think like 1.7 billion parameters. And I tried it on a weekend and it works really well. So one really interesting use case, or more,

it was interesting how we chose to make progress on that. We were working with a security-oriented customer. They sell software to governments and they wanted the ability to query all of the data they collect in free text. Now they have certain things that they're like trying to identify, and obviously they help governments kind of mitigate security risks.

Nicolay Gerold (30:53.535)
So they're looking for stuff like, if somebody is planning something malicious on the internet or makes a post and stuff like that and insinuates something bad, they want to be able to catch it. And they want to allow analysts to use the system in free text. Now, that's great. And it's loosely defined, but we can work with that. Like we're game for working with minimal guidance. However,

There was a problem of acquiring data, because working with governments and even municipalities and stuff like that, they're not so quick to give their data for training. They're happy to adopt your system, but you're still going to have to produce it on your own. What we thought of, or rather my partner had this brilliant idea: let's use synthetic data.

We've done it in the past. We've done it a lot with image generation. We had an interesting medical customer way back when, where synthetic data was beneficial. But we really had to get started quickly here. So we had the brilliant idea of taking non-IP-licensed, sorry, non-copyrighted books,

mystery books, which revolve around potentially harmful events, and just trying to work with that data and figuring out if we could catch all of the times where either a malicious conversation was taking place or an insinuation of a malicious conversation was taking place. And that was great. We took it all the way to production as the initial system.

And then later on, we allowed them to train with their own data on local servers and everything. But synthetic data allowed us to get that project up and running within a month. And like two months in, they were already able to deploy it on the customer's environment. Nice. And what were the things you used to generate the synthetic data? Can you go into some detail?

Nicolay Gerold (33:14.221)
Most of the time now we adhere to the best practice, which is using the strongest model in order to create the synthetic data. And right now the strongest model is GPT-4, with Claude probably being runner-up and good at specific use cases. So we use a very strong model in order to create the synthetic data. And if it's really at scale, we might consider other models, probably GPT-3.5, to create the data.
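
A minimal sketch of that generation step, roughly in the spirit of the security use case described above; the model name, prompt and labels are illustrative, not the actual client setup.

```python
import json
from openai import OpenAI  # assumes openai>=1.x and OPENAI_API_KEY set

client = OpenAI()

def generate_synthetic_examples(topic: str, n: int = 20) -> list[dict]:
    """Use a strong model to fabricate labelled examples when real data is scarce or sensitive."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in for "the strongest model available"
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                f"Generate {n} short fictional chat excerpts about {topic}. "
                "Label each one as 'malicious' or 'benign'. "
                'Return JSON of the form {"examples": [{"text": "...", "label": "..."}]}.'
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)["examples"]

examples = generate_synthetic_examples("planning a heist in a mystery novel")
```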

It's pretty good, pretty easy, very easy to automate, very easy to get the amount of data that you need. And what filters did you use to actually filter out bad examples from the created dataset? Great. So first of all, we're like fans of RAGAS and there's a concept of aspect critique, and you can also create

a custom metric, which we're very much fond of. You can create your own definition of what is good or bad and have other frameworks such as RAGAS score it for you. I'll give an example. Let's say you created a lot of data which you intend to use to summarize niche topics in chemistry. You go and you create a lot of data, and

there are a couple of metrics that you can measure to see if these synthetic examples are good or bad. One of them, in the context of a summary, would be: was the key data kind of retained in the summary? Another one would just be flat out, like, is it a summary? Is the output string shorter than the input string? If not, it's definitely not a good example, because it's basically not a summary.

This is a rudimentary example, but that's what we do at large. We create the data, and then we use, most of the time, either RAGAS or RAGAS together with a custom metric in order to figure it out. And it's not perfect, but the world doesn't revolve around perfect. It revolves around value.
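
In code, the rudimentary summary filters Jonathan describes, plus an LLM-judge stand-in for the kind of custom critique metric he mentions, could look something like this; the model name and prompts are illustrative.

```python
from openai import OpenAI  # assumes openai>=1.x and OPENAI_API_KEY set

client = OpenAI()

def is_plausible_summary(source: str, summary: str) -> bool:
    # Flat-out structural check: a summary should at least be shorter than its input.
    return 0 < len(summary) < len(source)

def retains_key_facts(source: str, summary: str) -> bool:
    """Custom critique metric scored by a model, in the spirit of a RAGAS aspect critique."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": ("Does this summary retain the key facts of the source? "
                        "Answer strictly YES or NO.\n\n"
                        f"SOURCE:\n{source}\n\nSUMMARY:\n{summary}"),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def filter_examples(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Cheap checks first, then the model-scored check for survivors.
    return [(src, out) for src, out in pairs
            if is_plausible_summary(src, out) and retains_key_facts(src, out)]
```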

Nicolay Gerold (35:36.299)
Perfect. And if people feel inspired and want to get in contact with you to build an AI solution, or they just want to follow you, where can they do that? Yeah. So first of all, you know, if people want to work with us, if they want either our development services or consulting services, we're obviously happy to jump on a call and figure out where there's value to be had. And if they want to reach out to us, I'm available on LinkedIn always.

Jonathan Yarkoni, you can easily find me on LinkedIn. The other place is, if you want to reach out directly by mail, it's info at reachlatent.com, which coincidentally is also our website, www.reachlatent.com.