AI & its practical application in business

In this conversation, Bas Alderding and Koen Ter Velde, founders of AI development agency SevenLab, discuss advancements in AI including multi-agent architectures, OpenAI's new "Swarm" release, and other AI tools that have potential applications for business. They explore the benefits of multi-agent systems, recent developments in video models from Meta, and Tesla's robot innovations. The conversation also touches on the application of AI in support systems, emphasizing the effectiveness of specific AI agents in orchestrating complex tasks.

Takeaways:
- Multi-agent architectures allow AI agents to handle specialized tasks, enhancing overall efficiency.
- Meta’s new MovieGen model and the open-source Pyramid Flow promise significant advancements in video generation.
- OpenAI’s Swarm aims to make building AI agents easier and more practical.
- Tesla’s new robot technologies, including the Optimus humanoid, showcase the growing capabilities of AI in robotics.
- The efficiency of AI support systems can be significantly improved with multi-agent setups.
- Different AI models have specific tasks, leading to better performance when properly orchestrated.
- Faster AI response times are expected to dramatically improve user experience in AI-driven applications.

Chapters:
00:00 Introduction to Multi-Agent Systems
02:00 Recent AI News: Meta's MovieGen and Pyramid Flow
05:00 Flux 1.1 Pro Image Generation Model Overview
06:20 Tesla's Robotics Update: Robotaxi, RoboVan, and Optimus
09:00 OpenAI's Swarm and Multi-Agent Architectures
12:00 Example Use Cases for Multi-Agent AI Systems
18:00 Practical Examples from SevenLab's AI Agents
21:00 Multi-Agent Systems in Support Scenarios
24:00 Improving AI Response Times for User Interactions
26:20 Conclusion and Contact Information

Creators & Guests

Host
Bas Alderding
Co-founder of SevenLab
Host
Koen ter Velde
Co-founder of SevenLab

What is AI & its practical application in business?

A set of AI courses and news items from AI development agency SevenLab.dev

Bas Alderding (00:05)
Hey, welcome everyone. I'm Bas and this is my colleague Koen, and we're both founders of the AI development agency SevenLab, located in the heart of Amsterdam. Today we'll chat about AI agents and multi-agent architectures and see how they can be used for your business. And we will look into the newly released Swarm tool from OpenAI. After that, we'll do a deep dive into how this

agent structure works and how it can be applied to your own business processes. So yeah, maybe we'll start with some quick news of the week, Koen, and then we'll dive into some of the details. Every week we have a quick call with our team to go through the latest AI news, and every week brings something new. And this week we had quite a few items.

So one of the first items we put on the list is a couple of new and interesting video models. Meta just released MovieGen. Maybe you can elaborate a little bit on that, Koen. It was your news item after all. Yeah.

Koen Ter Velde (01:24)
It's my topic of the week indeed. Yeah, they announced a new video model. It's not available yet, but they say, and it also looks like, it's on par with Sora. And Sora is the promised video model made by OpenAI. I think they announced it almost a year ago already, but it's still not publicly available.

Bas Alderding (01:50)
Yeah?

Yeah, and I heard that's because, or at least that's the rumor, that it's super resource-intensive to deploy that model, right? And that you can do all kinds of weird stuff with it as well.

Koen Ter Velde (02:04)
Well, I read it was mainly because they were scared of the things you can generate with it. So right now they're working with artists and newspapers and so on, testing how to generate results with these models in a safe manner, because the results were so good that it was really hard to determine whether a video is AI-generated or not. And with, for example, the elections coming up in the United States,

Bas Alderding (02:11)
Hmm.

Koen Ter Velde (02:34)
It's really weird if you have videos circulating online and you don't know whether they're real or not.

Bas Alderding (02:41)
No?

Koen Ter Velde (02:41)
But yeah, the Meta video gen model is looking quite nice as well.

Bas Alderding (02:49)
Yeah, and it's released as open source, right? So you can do anything with it you would like. And you'll probably get all these uncensored models again that are then uploaded to this website called Hugging Face, right, where they have all the uncensored models. And yeah, I'm really curious to see what people will do with it.

Koen Ter Velde (02:53)
Yeah.

Yeah.

Yeah.

Yeah, but they also launched Pyramid Flow, which is an open-source video gen model. And that one is already launched and available on GitHub.

Bas Alderding (03:23)
And is it released by Meta then or another company?

Koen Ter Velde (03:27)
To be honest I don't know which company released it.

Bas Alderding (03:30)
OK. Yeah, and another thing I read is that, again in the image area, Flux 1.1 Pro was released recently. And Flux was already one of the best-performing open-source image generation models. This new version is another iteration, and I tested it this morning.

And yeah, the results were really impressive. What I found until now with most of the image models is that they don't follow your prompt really well. But with this one, the more specific you are, the better the images you get, which is what you would expect from a prompt. But in my opinion, most of the image models haven't been following that very well. Like, if you try DALL-E 3 from OpenAI, it's garbage compared to this

Koen Ter Velde (04:03)
No.

Bas Alderding (04:23)
Flux model, for instance. And that makes it much more suitable for all kinds of marketing purposes as well, I think.

Koen Ter Velde (04:32)
Did you also test it with text in the image?

Bas Alderding (04:37)
Not yet, but that's one of the improvements they made. So yeah, you can show text in an image as well. And you might have noticed that if you use ChatGPT to generate an image with DALL-E 3, all the letters and sentences were basically gibberish. And 1.1 Pro from the Flux model works really well. I tested it on together.ai, which is...

Koen Ter Velde (04:52)
No, it's just gibberish.

Bas Alderding (05:05)
an AI infrastructure provider. And yeah, a tip for anyone that wants to test-drive it: you get $5 of free credit as well, so you don't need to enter your credit card or anything to generate some fun images.
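
For anyone who wants to try that test drive, here is a minimal sketch of what an image call via Together's Python SDK might look like. The exact model identifier and response fields are assumptions, so check together.ai's documentation for the current names.

```python
# pip install together -- and set TOGETHER_API_KEY in your environment.
# The model identifier and the response field below are assumptions;
# check together.ai's model listing for the current Flux 1.1 Pro name.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.images.generate(
    prompt="Poster of a vintage bicycle against an Amsterdam canal house, "
           "with the text 'AI AGENTS' painted on the wall",
    model="black-forest-labs/FLUX.1.1-pro",  # assumed identifier
    width=1024,
    height=768,
    n=1,
)

# Depending on the model, the image comes back as a hosted URL or base64 payload.
print(response.data[0].url)
```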

Koen Ter Velde (05:09)
No.

Because the word "pro" in the model name implies that it's a paid model. Yeah, I haven't used it yet, so...

Bas Alderding (05:27)
Yeah, yeah. No, no, no. So Together AI is just the platform provider. It's not their model, but it's called Pro, I think maybe because you can now use it professionally. But I'm not entirely sure of that. Yeah, yeah.

Koen Ter Velde (05:36)
Yeah.

Nah, I'm gonna, yeah.

Yeah, so it's just sounding better. Yeah.

Bas Alderding (05:52)
Some of the other news came from Tesla. Our team was a bit excited about all the robotaxis and the Robovan, and the Tesla Optimus humanoid robot that was presented. This morning we pulled up some videos of Optimus in our stand-up.

everyone was very excited to see the robots dance and answer all these questions in a very human-like fashion. And I think...

Koen Ter Velde (06:19)
Yeah, but I think the weirdest thing about the announcements was that Elon Musk called the Robovan the "Robo-vahn". Robo-vahn, I don't know why, but...

Bas Alderding (06:27)
Yeah, yeah, yeah, yeah.

Yeah, you have this science fiction movie that was recently released, and I think also in Westworld, the sci-fi series, "Rehoboam" or "Roboven" is the AI overlord of the world or something. I think it's called something like that. Yeah, maybe that's the reason. But yeah, today we mainly want to dive into some of the

Koen Ter Velde (06:43)
Yeah.

Reboot.

Or Roboven, maybe that's it then. Yeah.

Bas Alderding (06:59)
Yeah, multi-agent topics. And what inspired us was that OpenAI released something called Swarm. Yeah, OpenAI is called OpenAI, but they rarely release open models. This, though, is again an open-source tool where you can quickly build agents. This was already possible long before OpenAI released this, but I think this is a step

for them to move more in the agent direction, because in ChatGPT, for instance, you don't really have the agent structure yet. Of course you can talk with the AI there, but it cannot do tasks on your behalf in the background. So this is what I would call an agent.

You give it a task, it does some stuff, and it comes back later with an answer or it has completed X amount of tasks for you. And currently, the state of ChatGPT is mainly a question back and forth, maybe with an action to an external tool in between, but not really something that works for you. This is what we call agents: a tool that

does stuff for you in the background and can do planning and then execution of those steps. And of course, we at SevenLab have been building agents for quite some time now. And the next level of an agent is a multi-step or multi-agent architecture, where you have...

multiple AI agents that have their own plan of action and their own tasks to do, which you then combine into one big answer at the end. So just like you have a team of colleagues: each has their own task, everyone makes their own plan, but you have one output in the end and one question that comes in from the management, for example.

So yeah, we will not pull up an article about Swarm from OpenAI, but a different article we have. I think it's interesting to put it on the screen and walk you through it, to explain a little bit more how this exactly works, but mainly also how this can be applied to

a concrete business process, for example.
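
As a concrete illustration of the Swarm tool mentioned above, here is a minimal sketch of its handoff pattern, closely following the examples in the openai/swarm repository. The agent names and instructions are made up for illustration.

```python
# A minimal sketch of the Swarm handoff pattern, based on the examples in the
# openai/swarm repo (pip install git+https://github.com/openai/swarm.git).
from swarm import Swarm, Agent

client = Swarm()  # uses your OpenAI API key under the hood

def transfer_to_planner():
    # Returning another Agent from a function hands the conversation over to it.
    return planner_agent

triage_agent = Agent(
    name="Triage agent",
    instructions="Decide whether the user needs the planner, then hand off.",
    functions=[transfer_to_planner],
)

planner_agent = Agent(
    name="Planner agent",
    instructions="Break the user's request into concrete steps and answer.",
)

response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "Plan the steps to automate our support inbox."}],
)
print(response.messages[-1]["content"])
```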

Koen Ter Velde (09:27)
Yeah, and then I think it's good to add that right now AI models are at a level where they can be primed to perform certain specific tasks really well, but they're not able to handle multiple different tasks in the same setup, in the same model. And that's why multi-agent frameworks are

relevant right now, because then you can create multiple agents which are really primed and trained and have tools to perform a really specific task. So if you look at this picture, this is an article from LlamaIndex, and LlamaIndex is a data framework we're using to create agents with large language models. It's an open-source framework. And they released an article about this multi-agent

setup in July this year. So this is a bit older than the Swarm release by OpenAI this week. But here they made a setup where you have a concierge agent that you talk to, and this agent then transfers your message to the orchestrator, and the orchestrator's only task is to select the right agent to be used.

Each of these sub-agents has its own specific instructions and its own specific tools, and they communicate back to the orchestrator, which then communicates back to the concierge, and that goes to the actual end user. So if you look at the setup here, it's about asking for your account balance when you have an account at a bank

and looking up stock prices. And they wrote down an entire script, which is detailed here below, where you can see how a person interacts with this agent. So you, for example, ask: okay, I would like to transfer money. Then this message is passed on to the orchestrator.

The orchestrator is first going to check: are you authenticated? Because when you're not authenticated, you're not allowed to transfer money. So based on your authentication status, the auth agent is selected, the agent that is going to check if you are the person you say you are. So first, you need to provide your password in this example. It's a really simple example, so it's not how it's going to work in the real world.

Bas Alderding (12:20)
Yeah, exactly. Like, I wouldn't let an AI check if I'm authenticated. If we were to build something like this, I think we would probably not put this into the AI agent, but in code, to make sure all the data you might have access to is at least secured by fixed rules. Because we all know that

Koen Ter Velde (12:47)
Yeah.

Bas Alderding (12:48)
some of the AI models are not 100% accurate. And if they authenticate you wrongly or they make a mistake, then yeah, it would be quite critical in this case, at least when transferring money. But yeah.

Koen Ter Velde (13:02)
Yeah.

Yeah, so this is just an example with a really high-level implementation, and authentication, like Bas said, is not going to be done by an AI agent yet. Yeah, not yet, not yet. So when you're linked to this auth agent, which does the authentication, you need to provide your username and then your password. And when those two things match up, you're authenticated, in this example.

Bas Alderding (13:13)
The best use case.

Koen Ter Velde (13:35)
Then, based on this authentication state, the process of authentication is completed. And then you are linked again to the orchestrator, which then knows: this user is authenticated, I can go on with the next step. And in this next step, the first thing is to check the account balance, because if you don't have money, you can't transfer money. So then the balance agent is selected, and this balance agent

can, based on the ID of the user or the account, reply with the balance of the user. So based on the authentication state, and also the fact that you know there's money in the account, you can go on with the actual step of transferring money. So the balance agent replies with: hey, you have $1,000 of credit.

So you can transfer money if you want. Do you want to continue? Well, then the user says: okay, I want to continue, I want to transfer money. So the orchestration agent, the one that's hovering above all the other agents, then selects the transfer-money agent. And this transfer-money agent can then help you with transferring the money to another bank account.

And this example really shows that each of these agents has really strict instructions and a really scoped task. And together they can facilitate the process of a transaction at a bank, for example, which is a theoretical example. It's not a real-life thing yet.
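
A stripped-down sketch of the routing logic described here, in plain Python rather than the LlamaIndex code from the article: each "agent" is just a function with one narrow job, and the orchestrator only decides which one runs next.

```python
# Illustrative only: in the LlamaIndex article each of these is an LLM agent
# with its own tools, but the orchestrator's routing idea is the same.
state = {"authenticated": False, "balance": None}

def auth_agent(user_input):
    # Narrow task: check credentials (trivially here; never do this with an LLM).
    state["authenticated"] = user_input == "secret-password"
    return "Authenticated." if state["authenticated"] else "Wrong password."

def balance_agent(_user_input):
    # Narrow task: look up the account balance.
    state["balance"] = 1000
    return f"Your balance is ${state['balance']}."

def transfer_agent(user_input):
    # Narrow task: execute the transfer.
    return f"Transferred {user_input} to the target account."

def orchestrator(user_input):
    # Only job: pick the right sub-agent for the current state.
    if not state["authenticated"]:
        return auth_agent(user_input)
    if state["balance"] is None:
        return balance_agent(user_input)
    return transfer_agent(user_input)

# A user asking to transfer money walks through auth -> balance -> transfer.
for message in ["secret-password", "what's my balance?", "$200"]:
    print(orchestrator(message))
```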

Bas Alderding (15:23)
Yeah, and maybe, like, the orchestrator agent gives all these agents a task and they report back to the orchestrator, right? So they can work simultaneously. So your performance is not really impacted either, because the orchestrator just waits until

it has all the feedback or thinks it has all the feedback and then it can come up with the final answer, for example. But maybe to give a practical example, because I just had a quick discussion with a colleague of ours and he was building an AI agent that was supposed to give support in an e-learning environment for one of our clients.

One of the problems was that we built an agent with a single task: to answer questions about the e-learning for the end user. So someone doing the e-learning might have a question about some of the theory explained in there. But one of the things that must not happen is that someone asks a question that's also a homework question.

And the agent would then reply with an answer. So in 80% of the cases it was going well with this single agent, and it wouldn't answer any questions about homework.

But now we had the issue that 20% was still going wrong in this case. So what we came up with was actually a multi-agent structure, where we have one agent that pre-checks whether each question is a homework question or not. And that's its only job.

And after that, we have the final agent that has access to all the information, which only gets activated when the question passes the homework check. So in this case it doesn't happen simultaneously but sequentially. But still, you can see the impact of

a dedicated AI agent, or a dedicated AI tool, that has a specific instruction, because then it's primed to handle certain tasks. And that's also when you get the best possible output from that particular tool. That also means that an agent with a very large instruction tends to perform worse than

an agent with a very specific task or instruction. I think that's, yeah, a real practical example, something we even did this morning, but it makes such a big difference when you really think this through for most practical solutions. So yeah.
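
A minimal sketch of that sequential two-agent setup, using the OpenAI Python SDK. The prompts and model name are illustrative, not the actual client system.

```python
# Sequential pipeline: agent 1 only does the homework pre-check,
# agent 2 only answers when the question passes that check.
from openai import OpenAI

client = OpenAI()

def is_homework_question(question: str) -> bool:
    # Agent 1: its single job is the homework check.
    check = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": (
                "You check questions for an e-learning platform. "
                "Answer only YES if the question is a homework or assignment "
                "question, otherwise answer NO."
            )},
            {"role": "user", "content": question},
        ],
    )
    return check.choices[0].message.content.strip().upper().startswith("YES")

def answer_question(question: str) -> str:
    # Agent 2: only runs when the question passed the pre-check.
    if is_homework_question(question):
        return "I can't answer homework questions, but I can explain the theory."
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You explain the course theory to learners."},
            {"role": "user", "content": question},
        ],
    )
    return answer.choices[0].message.content

print(answer_question("Can you explain what a multi-agent architecture is?"))
```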

Koen Ter Velde (18:13)
Yeah.

And I think the example you gave makes a clear distinction, because in the example I talked about, with the account balance and transferring money, there's one orchestrator that is directing all the agents to do their tasks. And the example you gave is more sequential. It's more like a data pipeline that always goes in the same direction.

Bas Alderding (18:48)
Yeah.

Koen Ter Velde (18:49)
And there's no overarching agent that controls the process.

Bas Alderding (18:54)
Yeah, so they don't really talk to each other. The first agent passes the data to the second agent that then gets called, but they don't work in parallel in the background. And working in parallel is actually what you want in most cases. That's exactly what a normal human team does most of the time, or at least you hope they do, that they don't wait for each other. So yeah, that's a big difference.

Koen Ter Velde (19:01)
Yeah.

Yeah. Yeah.

Yeah, but I think it also really depends on what you want to achieve. So if you really need to make sure that, for example, the question being asked is not a homework question, and that's the go or no-go for the rest of the flow, then it's maybe not something you want to handle with an orchestrator above it.

Bas Alderding (19:42)
Yeah, yeah. And then maybe, I think we've covered the article now for a bit. Do we maybe also have a couple of other practical examples we can think of where we are going to apply this?

Koen Ter Velde (19:59)
Well, I think this is a really relevant case for support agents, where you have one support agent which handles the communication, which is responsible for the right tone of voice, the right structure, the right language, and that then communicates with an orchestrator that delegates the tasks to specific agents.

Bas Alderding (20:26)
Like, can I see the orchestrator as a manager in that sense? Like the classical structure where... Yeah.

Koen Ter Velde (20:30)
Yeah, it's some sort of router. It's a router that directs data or questions to specific other agents, or colleagues. Let's call them AI colleagues. And in the support example, you can have one agent that is trained on and linked to the knowledge base of the company, so that agent can answer questions about all the data linked to it.

You can have another agent that is linked to the ticketing system, so this agent can answer questions about tickets or maybe even create tickets.

You can have multiple agents that each have their own specific domain and tools. And as a user, your conversation is only with the actual support agent, but behind this agent there are multiple specific agents.

Bas Alderding (21:28)
Yeah, cool.

Yeah, I guess this would work in a lot of different use cases, right? So while you're talking to the support agent, one tool can pull in some information from the knowledge base

with all the previously asked questions or the knowledge built up by the organization. You can have another tool that pulls in the ticket times from the ticketing system you're using for support. You can have another one that pulls in some contextual data about the customer, or the history of

Koen Ter Velde (21:46)
No.

Bas Alderding (22:01)
support requests from that specific customer. And then together, they make sure the orchestrator has the right info to give the best answer possible.
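
A sketch of that tool-based support setup, using OpenAI function calling: one support agent, with each tool wrapping a different back-end. The tool names and the stubbed data are made up for illustration.

```python
# One support agent, several tools; the model picks which back-end to query.
from openai import OpenAI
import json

client = OpenAI()

def search_knowledge_base(query: str) -> str:
    return "Stubbed KB article about password resets."     # would query the real KB

def get_open_tickets(customer_id: str) -> str:
    return json.dumps([{"id": 812, "status": "open"}])      # would query the ticket system

tools = [
    {"type": "function", "function": {
        "name": "search_knowledge_base",
        "description": "Search the company knowledge base.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "get_open_tickets",
        "description": "List open support tickets for a customer.",
        "parameters": {"type": "object",
                       "properties": {"customer_id": {"type": "string"}},
                       "required": ["customer_id"]}}},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "What's the status of my ticket? I'm customer 42."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # a real system would loop until the agent is done
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = {"search_knowledge_base": search_knowledge_base,
              "get_open_tickets": get_open_tickets}[call.function.name](**args)
    print(call.function.name, "->", result)
```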

Koen Ter Velde (22:08)
Yeah. Yeah.

Yeah, but I do think that it's like working in a team. The more people you add to the team, the more communication lines there are. So there's more risk of miscommunication, or there needs to be a lot of communication to even make the team work. And I think the same applies to this setup, where you have one orchestrator with, let's say, 10 different sub-agents,

but the output of one agent can really affect what another agent is doing. So I think, well, it's not necessarily the solution to just add 10 more agents to your multi-agent setup.

Bas Alderding (22:46)
No.

No, but if you have an orchestrator, the orchestrator can actually let an agent know that it needs to do the actions again, or the lookup again, to redo what it thought it was doing right. And I think that's also one of the main advantages. But it can take longer before you get an answer, because you have all this back and forth between the agents. And I think in practice...

Koen Ter Velde (23:21)
Yeah.

Bas Alderding (23:26)
it would be more practical.

if the AI models answered a little bit faster, because most of the cutting-edge models answer about as fast as what you see in ChatGPT, for instance. But you also have other providers that provide inference as a service, as they call it. So inference is the AI talking. And they host all of these models, but can then output that text or

images or whatever, like 100 times faster than what you would see in ChatGPT. And then the communication is almost instant, and then it doesn't really matter anymore how much we communicate.

Koen Ter Velde (24:15)
Why doesn't OpenAI use such a tool then?

Bas Alderding (24:20)
Yeah, I don't know, probably because they don't have the dedicated hardware for it. They mainly focus on training new models, I think, and not really on the output of those models at the moment. I assume they are looking into that, but they have a lot of hardware to train models, which is totally different from the hardware needed to run the models at speed, or let the models produce output at speed.

Koen Ter Velde (24:28)
Yeah.

Yeah.

Bas Alderding (24:49)
But I got promised by one of those providers, called Groq, that in the next few weeks, for us as developers, these quick models will be unlocked without the rate limits that were applicable until now. So I expect in the coming weeks that we as developers get access to this, and then, yeah, we can build applications with instant output, which would

Koen Ter Velde (25:05)
Yep.

No.

Bas Alderding (25:18)
dramatically improve the user interaction with these kinds of tools.
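
For reference, a sketch of what calling such a fast-inference provider looks like: Groq exposes an OpenAI-compatible endpoint, so the standard OpenAI client can simply point at it. The model name here is an assumption; check the provider's current model list.

```python
# Fast inference via an OpenAI-compatible endpoint; only the base URL,
# API key, and model name change compared to calling OpenAI directly.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # assumed model id on Groq at the time
    messages=[{"role": "user", "content": "Summarise multi-agent architectures in two sentences."}],
)
print(response.choices[0].message.content)
```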

Koen Ter Velde (25:24)
Yeah, yeah, because I think in the example I gave from the support agent, if you're chatting with an agent and it takes 20, 30 seconds for each answer to come, then at one point, or emailing, yeah. Yeah, yeah. Yeah.

Bas Alderding (25:36)
Yeah, or emailing. Like, how nice would it be if you get an instant answer back instead of waiting days? And we're already there; like, now it takes two minutes or something if you do a really comprehensive lookup. So I'm not talking about the chat interface; you handle that a little bit differently.

Koen Ter Velde (25:49)
Yeah.

Yeah.

Bas Alderding (25:58)
But yeah, we do have applications where the processing time by the AI is 10 minutes to generate a long-form document of hundreds of pages, for instance. And how nice would it be if that's reduced to one second? Yeah. But I think for today, yeah, we'll just leave it at a

Koen Ter Velde (26:15)
Couple of seconds, yeah.

Bas Alderding (26:21)
quick update on the news and this whole multi-agent structure. And I would say, if you're curious about how this can work specifically for your business, you can find our contact details in the description of the podcast or on YouTube. And yeah, I think, yeah, this is it for today, right? Yeah. Okay. Well, thank you, Koen.

Koen Ter Velde (26:44)
See you next week, I think. Yeah. Sounds good.

Thank you, Bas. See you next week. Ciao.

Bas Alderding (26:52)
Yeah, no worries until next week.