Software Should be Free

# Summary

In this conversation, Tim Abell and David Sheardown explore the challenges and innovations in AI coding assistants and the overwhelming landscape of AI tools available for software development.

The dialogue delves into the nuances of using AI in coding, the potential of multi-agent systems, and the importance of context in achieving optimal results.

They also touch on the future of AI in automation and the implications of emerging technologies.

# Takeaways
  1. AI is reshaping the workplace, requiring adaptation from professionals.
  2. Understanding engineering problems requires a structured approach.
  3. AI coding tools are rapidly evolving and can enhance productivity.
  4. Providing clear context improves AI coding results.
  5. Multi-agent systems can coordinate tasks effectively.
  6. The landscape of AI tools is overwhelming but offers opportunities.
  7. Understanding the limitations of AI tools is crucial for effective use.
  8. Innovations in AI are making automation more accessible.
  9. It's important to balance AI use with traditional coding skills.
  10. The future of AI in software development is promising but requires careful navigation.

# Full details

In this episode of Software Should Be Free, Tim Abell and David Sheardown delve into the rapidly evolving landscape of AI-powered coding assistants. They share hands-on experiences with various AI coding tools and models, discuss best practices (like providing clear project context vs. “vibe coding”), and outline a mental model to categorize these tools. Below are key highlights with timestamps, followed by a comprehensive list of resources mentioned.
Episode Highlights
  • 00:05 – Introduction: Tim expresses feeling overwhelmed by the proliferation of AI coding tools. As a tech lead and coder, he’s been trying to keep up with the hype versus reality. The discussion is set to compare notes on different tools they’ve each tried and to map out the current AI coding assistant landscape.
  • 01:50 – Tools Tried and Initial Impressions: David shares his journey starting with Microsoft-centric tools. His go-to has been GitHub Copilot (integrated in VS Code/Visual Studio), which now leverages various models (including OpenAI and Anthropic). He has also experimented with several alternatives: Claude Code (Anthropic’s CLI agentic coder), OpenAI’s Codex CLI (an official terminal-based coding agent by OpenAI), Google’s Gemini CLI (an open-source command-line AI agent giving access to Google’s Gemini model), and Manus (a recently introduced autonomous AI coding agent). These tools all aim to boost developer productivity, but results have been mixed – for example, Tim tried the Windsurf editor (an AI-powered IDE) using an Anthropic Claude model (“Claude 3.5 Sonnet”) and found it useful but “nowhere near 10×” productivity improvement as some LinkedIn influencers claimed. The community’s take on these tools is highly polarized, with skeptics calling it hype and enthusiasts claiming dramatic gains.
  • 04:39 – Importance of Context (Prompt Engineering vs “Vibe Coding”): A major theme is providing clear requirements and context to the AI. David found that all these coding platforms (whether GUI IDE like Windsurf or Cursor, or CLI tools like Claude Code and Codex) allow you to supply custom instructions and project docs (often via Markdown) – essentially like giving the AI a spec. When he attempted building new apps, he had much more success by writing a detailed PRD (Product Requirements Document) and feeding it to the AI assistant. For instance, he gave the same spec (tech stack, features, and constraints) to Claude Code, OpenAI’s Codex CLI, and Gemini CLI, and each generated a reasonable project scaffold in minutes. All stuck to the specified frameworks and even obeyed instructions like “don’t add extra packages unless approved.” This underscores that if you prompt these tools with structured context (analogous to good old-fashioned requirements documents), they perform markedly better. David mentions that Amazon’s new AI IDE, Kiro (introduced recently as a spec-driven development tool) embraces this “context-first” approach – aiming to eliminate one-shot “vibe coding” chaos by having the AI plan from a spec before writing code. He notes that using top-tier models (Anthropic’s Claude “Opus 4” was referenced as an example, available only in an expensive plan) can further improve adherence to instructions, but even smaller models do decently if guided well. (A sample spec in this style appears just after this list.)
  • 07:03 – Community Reactions: The conversation touches on the culture around these tools. There’s acknowledgment of toxicity in some online discussions – e.g. seasoned engineers scoffing at newcomers using AI (“non-engineers” doing vibe coding). Tim and David distance themselves from gatekeeping attitudes; their stance is that anyone interested in the tech should be encouraged, while just being mindful of pitfalls (like code quality, security, or privacy issues when using AI). They see value in exploring all levels of AI assistance, provided one remains pragmatic about what works and stays cautious about sensitive data.
  • 29:57 – Models + 4 Levels of AI Coding Tools: Tim introduces a mental model to frame the AI coding assistant ecosystem. The idea is to separate the foundational models from the tools built on top, and to classify those tools into four levels of increasing capability:
    • Underlying Models: First, there are the core large language models themselves – e.g. OpenAI’s GPT-4, Anthropic’s Claude (various versions, including the fast “Sonnet” models and the heavier “Opus” models), Google’s Gemini model, as well as open-source local models. These are the engines that power everything else, but interacting with raw models isn’t the whole story.
    • Level 1 – Basic Chat Interface: Tools where you interact via a simple chat UI (text in/out) with no direct integration into your coding environment. ChatGPT in the browser, or voice assistants that can produce code snippets on request, fall here. They can write code based on prompts, but you have to copy-paste results – the AI isn’t tied into your files or IDE.
    • Level 2 – Agentic IDE/CLI Assistants: Tools that deeply integrate with your development environment, able to edit files and execute commands. This includes AI-augmented IDEs and editors like Windsurf Editor (a standalone AI-native IDE) and Cursor (AI-assisted code editor), as well as command-line agents that can manipulate your project (like the CLI versions of Claude Code, OpenAI Codex, or Gemini CLI). At this level, the AI can read your project files, make changes, create new files, run build/test commands, etc., acting almost like a pair programmer who can use the keyboard and terminal. (For example, Windsurf’s “Cascade” agent mode and Cursor’s agent mode allow multi-file edits and running shell commands automatically.)
    • Level 3 – Enhanced Context and Memory: Tools or techniques focused on feeding the model more project knowledge and context (sometimes dubbed “context engineering”). The idea is to improve the AI’s understanding of your codebase by supplying documentation, requirements, or summaries in a structured way. For instance, some setups use special files (like a Claude.md or project brief) or memory windows to inject relevant information into the prompt. Tim mentions he experimented with Windsurf’s Memories feature (which lets you pin important notes for the AI), and techniques like giving the AI an architecture overview so it knows, for example, “all parsing should happen in Parser class, not in the Executor.” In theory, this level yields better coherence and adherence to architecture by giving the model a persistent knowledge base. In practice, Tim admits his results with memory/context features have been hit-or-miss so far – though some users report great success when this is done right (e.g. Anthropic’s Claude is known for handling large context windows). Better context management is seen as an area where more refinement is needed, but tools are emerging to help (even the base models are evolving to handle 100K+ tokens).
    • Level 4 – Multi-Agent Orchestration: The cutting edge is having multiple AI agents with specialized roles collaborating on tasks – essentially an “AI team.” This might involve one agent acting as a code writer, another as a tester, another as a project manager, etc., coordinating via a framework. Tim notes that this space is just beginning to be explored, and it’s hard to know how much of it is hype vs. real productivity gain. Nevertheless, they mention a few examples: AutoGen Studio (a Microsoft open-source tool for prototyping multi-agent workflows), a terminal tool called Claude Squad (which can run multiple Claude or other agent instances in parallel sessions), and the LangChain framework (commonly used to chain together LLMs and tools, often cited for agent coordination). These solutions aim to let agents divide-and-conquer coding tasks or give feedback on each other’s work. Tim hasn’t personally “gone to level 4” in his workflow yet – it’s a bit uncharted – but it’s an area to watch as some on social media claim big wins by letting agents handle entire projects. (A minimal multi-agent sketch appears just after this list.)
  • 37:05 – Rapid Evolution of Copilot and IDEs: The discussion returns to GitHub Copilot, noting how much it has improved from its early days. Originally Copilot was a simple autocomplete on a single model; now it has a Chat mode and an “Agent” mode that behaves more like Cursor/Windsurf. In VS Code, Copilot can now browse the project, edit multiple files, and follow high-level instructions, not just line-by-line suggestions. David mentions that in VS Code you can even choose between multiple underlying models for Copilot (e.g. GPT-4 or Claude Sonnet models), and Visual Studio is catching up as well. An extension called AI Toolkit in VS Code further allows power-users to play with many models side-by-side (including hooking up local LLMs via Ollama, if you have the hardware – though David’s attempt with a smaller quantized Qwen coder model showed its limitations). Essentially, the gap between official tools like Copilot and third-party ones is narrowing as features converge. They joke how fast everything moves – “one month” in AI feels huge – features like agent modes that started in Windsurf/Cursor quickly made it into mainstream tools. (A small local-model sketch appears just after this list.)
  • 58:00 – Agents Taking Actions: By the late stage of the episode, they marvel at how far the “agentic” abilities have come. One anecdote: tools now exist (e.g. some VS Code agents or Copilot prototypes) that can autonomously browse the web or perform actions on your behalf, given permission. For example, an agent could log into your Salesforce or check an external API to fetch data needed for coding a feature – basically doing the boring data gathering or repetitive setup tasks for you. This blurs the line between coding assistant and general AI agent. It’s powerful but a bit unnerving – you have to really trust an AI to let it, say, place orders online or manipulate production data! The hosts recognize real utility in offloading mundane tasks to AI, as long as there are safeguards and oversight.
  • 1:00:00 – Conclusion: Tim and David wrap up by acknowledging how many different tools and models they’ve mentioned (indeed, a dizzying number!). This explosion of options is “not even scratching the surface” – new entrants seem to pop up every week. It’s challenging for developers to know which tools will last or prove truly useful, and which are just hype. Their advice is to keep exploring and sharing notes. They anticipate the landscape will look very different even 3–6 months from now, so a follow-up discussion will be needed. In the end, they express cautious optimism: staying on top of AI coding tools is effortful but likely worth it, as these assistants could meaningfully improve developer productivity if used wisely.
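
To make the “context-first” approach concrete, here is a minimal sketch of the kind of PRD-style instruction file described at 04:39. The project, stack choices, and rules below are hypothetical examples, not from the episode; the same shape works as a CLAUDE.md or as custom instructions for Codex CLI or Gemini CLI:

```markdown
# PRD: Task Tracker (hypothetical example)

## Tech stack (do not deviate)
- Backend: .NET Web API
- Frontend: Vue 3 with Tailwind CSS

## Features (v1)
1. Create, edit, and complete tasks.
2. Filter tasks by status and due date.

## Rules for the assistant
- Do NOT add packages (NuGet or npm) without asking me first.
- Match the existing code style.
- Ask before deviating from this architecture.
```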
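
And to make Level 4 less abstract, here is a minimal sketch of the two-agent pattern using Microsoft's AutoGen. This assumes the pyautogen 0.2-era Python API; exact configuration keys vary by version, and the model name and task are placeholders:

```python
# Minimal two-agent loop with AutoGen (pyautogen ~0.2 API; details vary by version).
# Assumes: pip install pyautogen, and a valid API key.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]}  # placeholders

# The "coder" agent writes code; the user proxy executes it and feeds results back.
coder = AssistantAgent("coder", llm_config=llm_config)
runner = UserProxyAgent(
    "runner",
    human_input_mode="NEVER",  # fully automated loop, no human in the middle
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

# The proxy drives the conversation until the coder reports the task is done.
runner.initiate_chat(coder, message="Write and test a Python function that parses ISO dates.")
```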
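
Finally, on the local-model point at 37:05: a minimal sketch of driving a locally hosted model through Ollama's Python client. This assumes the `ollama` pip package; the model name is a placeholder for whatever you have pulled locally:

```python
# Chat with a locally served model via Ollama (https://ollama.com).
# Assumes: pip install ollama, and a model pulled locally, e.g. `ollama pull qwen2.5-coder`.
import ollama

response = ollama.chat(
    model="qwen2.5-coder",  # placeholder: any locally pulled coder model
    messages=[{"role": "user", "content": "Write a function that reverses a linked list."}],
)
print(response["message"]["content"])
```
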
# Resources Mentioned

🤖 General LLMs and Interfaces
  • ChatGPT (OpenAI) – Web-based chat interface for GPT-3.5/GPT-4.
    👉 https://chat.openai.com
    ℹ️ https://openai.com/chatgpt
  • Claude (Anthropic) – Claude model family (Opus, Sonnet and Haiku tiers; the episode references Sonnet 3.5/3.7, Sonnet 4 and Opus 4). Used in Claude Code and Windsurf.
    👉 https://claude.ai
    ℹ️ https://www.anthropic.com/index/claude-3
  • Claude API (Anthropic Developer Hub) – Documentation and access to Claude API.
    👉 https://docs.anthropic.com/claude
  • Gemini (Google DeepMind) – Google's LLM family (Gemini 1.5 Pro, Flash, etc.)
    👉 https://deepmind.google/technologies/gemini/
  • Gemini CLI – Command-line tool to interact with Gemini LLMs.
    👉 https://github.com/google-gemini/gemini-cli
🧠 IDE & Coding Tools
  • GitHub Copilot – AI pair programmer for Visual Studio, VS Code, Neovim.
    👉 https://github.com/features/copilot
    ℹ️ https://docs.github.com/en/copilot
  • Copilot Chat in VS Code – AI agent chat integrated with file editing and IDE actions.
    👉 https://code.visualstudio.com/blogs/2023/07/20/copilot-chat-preview
  • Cursor Editor – AI-native code editor built on VS Code with Claude/GPT support.
    👉 https://www.cursor.sh
    ℹ️ https://github.com/getcursor/cursor
  • Windsurf Editor – Claude-powered standalone AI IDE with agent mode (aka "Cascade").
    👉 https://windsurf.ai
    ℹ️ https://github.com/windsurf-ai/windsurf (may be outdated or archived)
  • Claude Code – Anthropic's CLI tool for agent-style code generation/editing.
    👉 https://github.com/anthropics/claude-code
  • OpenAI Codex CLI – OpenAI’s open-source terminal-based coding agent.
    👉 https://github.com/openai/codex
    (Note: distinct from the original Codex model, which was deprecated in favour of GPT-4-era models.)
  • Manus – Autonomous AI coding agent from the team behind Monica.
    👉 https://manus.im
  • Kiro (AWS) – Amazon’s spec-first AI development IDE.
    👉 https://kiro.dev
🧩 Multi-Agent Systems & Context Engineering
  • AutoGen / AutoGen Studio (Microsoft) – Open-source framework and low-code UI for prototyping multi-agent workflows.
    👉 https://github.com/microsoft/autogen
  • Semantic Kernel (Microsoft) – SDK that underpins much of Microsoft’s agent tooling, now with multi-agent support.
    👉 https://github.com/microsoft/semantic-kernel
  • LangChain / LangGraph – Frameworks for chaining LLMs and tools, often cited for agent coordination (Python and JavaScript SDKs).
    👉 https://www.langchain.com
    ℹ️ https://github.com/langchain-ai/langchain
  • CrewAI – Role-based multi-agent framework (“crews” of agents with goals and tasks).
    👉 https://github.com/crewAIInc/crewAI
  • Claude Squad – Terminal tool for running multiple Claude Code (or other) agent sessions in parallel.
    👉 https://github.com/smtg-ai/claude-squad (unverified)
  • n8n – Workflow automation platform with visual multi-agent building (agents, memory, tools, MCP).
    👉 https://n8n.io

Creators and Guests

David Sheardown (Guest)

What is Software Should be Free?

Tim Abell & (sometimes) David Sheardown look for ways to help you solve your problems with computer things.
Send us a voice message: https://www.speakpipe.com/ssbf

- https://0x5.uk/
- https://twitter.com/davidsheardown

Tim Abell (00:05)
Hello and welcome to Software Should Be Free with myself and my buddy David Sheardown. Hello David. Hello. Right, I want to talk about AI coding tools because I am feeling completely overwhelmed. I have been desperately trying to keep up with what the hell's going on as a coder and tech lead and someone who wants to build side projects and what have you. I absolutely cannot ignore.

David (00:11)
Hello there.

Tim Abell (00:35)
the AI hype, much as I would very much like to. So I've tried Windsurf using Claude, not, what is it called? Yeah, Claude, Sonnet 3.7 or something, and that was reasonably good, but nowhere near the like 10x that everyone was talking about. And I've been following a bunch of the LinkedIn.

influencers and things it suggests to me and it's like a super polarised debate. There's like a bunch of people saying it's all hype, there's very little substance here and there's a bunch of people repeatedly saying this is amazing, it's changed our code, we're all so much more productive. I'm not quite ready to dismiss all that out of hand. What I kind of wanted to go over today is like you and me David have been both independently like

trying out different tools and kind of trying to get a lay of the land. So I wanted to compare notes on what we've been trying and what we've been figuring out. So yeah, give me a bit of intro on like what's your path into AI and what's been catching your interest and what your current approach is.

David (01:50)
Yeah,

yeah. Well, obviously, you know, because it's, you know, it's been a quite a Microsoft centric sort of stuff. Obviously, GitHub Copilot, ⁓ you know, is my ⁓ sort of go-to because just that's what I've got used to. It's not wrong, bad or indifferent. In fact, a lot of the models that have come along now, you know, you've got Claude Sonnet 4 in there.

in the mix and all that sort of stuff. And to be honest, it does for me, that does a reasonable job for how I use it. And then there's the kicker, isn't it? But I've played around with Claude Code and OpenAI's Codex and Gemini CLI of late as well. There's others as well, even Manus I think I've had a bit of a play with.

But the command line tools are interesting, but you can see where there's an extra little barrier for some people. Although I don't think it is really, you know, just using it from a command line, it's fine. You're just really putting instructions in on you in a different way. But I think behind all of this of late, what I've, I think had more success. Now it depends, because if you're...

if you're already in an existing application, like I've got one at the moment, I've got an application that's got a web front end, it's actually quite an old, it's just native vanilla JavaScript and believe it or not, I think it's even got a bit of jQuery in there. But it's CSS, HTML, you know, which isn't going anywhere soon anyway. But interestingly, stuff that I wanted to...

make changes to or some additions. Of course, everything comes down to the prompt and the context that you're giving it. But that has actually done a really decent job for me. It's done stuff in JavaScript that I sort of remember, but then I've forgotten because if you're not using it every day, I don't use a lot of this stuff in my day job, as it were.

But to do that stuff, it's generated stuff within minutes that would take me a few hours fumbling around, if I'm honest. And certainly when it got to CSS, like how can I make this CSS cleaner and all that sort of stuff? And I know enough of CSS to know where it could go wrong. I'm not saying that I wouldn't detect anything, but...

Tim Abell (04:22)
Mm.

David (04:39)
But for existing apps, that's what I've been using GitHub Copilot because it's just there. And like I said, it does a reasonable job for that moment. if I was to write, like I've tried, I've tried to write a couple of apps being very specific with like a PRD, a product requirements document. So as you've seen and everybody else has probably seen is that

you have these sort of PRDs or requirements specs or whatever, but all of these coding platforms, Claude Code, Windsurf, Cursor, the GUI IDEs, as well as all those command line tools, all take custom instructions in Markdown, don't they? So if you can, and again, it's sort of weird because it's sort of like, well, isn't that what we did and do?

Anyway, it's like when you're building an application, unless you're just playing around with a toy, you want to get the requirements straight. You know, so where I found it really worked very well is where I gave a PRD, basically. I said, look, this is the stack I want to use. And I decided, OK, I want to use a .NET Web API backend or whatever, you know, flavor you want to use.

the latest Vue at the front end, you know, maybe Tailwind or something like that for the CSS style or whatever it was. But so I gave it the stack. I gave it a very clear overview as you would when you're writing a requirements document. And there's the challenge. I think if you're not used to that, well, that's where you're missing, you know, you're missing a huge benefit of what the AI can do. And I've tried it now.

the same PRD within Claude Code. I think I tried it in OpenAI's Codex, the command line version, because I haven't got 200 quid a month ⁓ to have the pro version. And I think recently I tried it in Gemini CLI as well. They all did slightly different variations, but not much, which was surprising. But it did stay to the frameworks that I said. And I said, don't use

Tim Abell (06:55)
Hmm.

David (07:03)
any additional packages unless you ask me, you know, and I can verify what I need. But it did actually build a reasonable attempt. And I think that that's where I can see and I think everybody else, we talked about Kiro a little bit, the Amazon one. It seems to be, you know, context first, you know, rather than, dare I say, vibe coding, where it's just one shot prompts and

Tim Abell (07:15)
Mm

David (07:31)
things like that. Look, it's okay for a bit of fun, you know, I created a lunar lander game in HTML, I say I, I did, I, but I literally just did that through just prompting and it came out with a reasonable effort, but that was just a bit of fun. But I think going back to the context stuff, yeah, I think, you know, if you can provide that ultimately it's, what would you do if you wanted to go outsource this to a

company to build something. You'd have clear requirements, wouldn't you? So I think that's where I found all of these tools are slightly better in certain things, slightly worse. Again, if you've got the money, Claude Opus 4, but it's only available in the expensive plan, I believe, on Anthropic's site.

Tim Abell (08:02)
Hmm.

Yeah.

David (08:28)
from what I've seen, and I've only seen it secondhand, because I, again, I just don't have that type of cash to throw at those sorts of things. But it seemed to really, really sort of up the game of how it's actually developed the code, and it stuck to those rules, those custom instructions very clearly. But I'm noticing even with the...

So, not so much the free versions, because you run out of tokens very quickly. Gemini CLI seems to be one of the most liberal ones with a free plan. But even with like, you know, the 20 quid a month type plans or, you know, like say even with Copilot at 10 US dollars a month or something, you know, the access to those models is doing, you know, I think a really reasonable sort of job. But

It's picking what works for you, I think, isn't it, ultimately?

Tim Abell (09:28)
Have you, so you talked about ⁓ sort of old things and existing things and being able to go quicker with stuff you're less familiar with. You talked about like new things that kind of a bit throwaway, like getting something smallish from scratch. I did something similar. I wanted a Cosmos DB emulator.

And I thrashed out... Which way around did I do that? Well, anyway, I basically got Windsurf with Claude... Sonnet three and a half or something to write it and it, I think... Did I... I might have used GPT for some of the early design and I did... Like I... I hadn't called it the PRD but I take a similar approach which is...

I create a readme that explains what the thing is, like I would anyway. ⁓ I like the ARD, the architecture requirement document. No, ADR, sorry. Architecture Decision Record. Making things up, just like an LLM. I've liked that for quite a while, of like, this is why we made a decision to do this particular thing in technology and...

David (10:45)
Hey!

Tim Abell (10:56)
this is what else we considered, this is why the trade-offs are what they are, so that it solves the problem of someone coming back later and going, can we change this? This is wrong. And you're like, go read the ADR for that. And either the assumptions that were written down in it still stand, and no, we shouldn't change it, and you can argue against it if you wish, but with a good base of knowledge of what was discussed at the time. Or, well, we chose it because of this. This is no longer true.

yes, we should change it. Maybe prioritize that. And that was reasonably effective. So initially, it looked like quite a big speed up. It did create ⁓ some ⁓ remarkable amount of half decent code at reasonable speed. It was a bit vibe code-y because I was trying to get it done.

very fast because I wanted to use it for a client project and if it took too long there wouldn't be any point and I was only doing it in evenings and what I found was that it kind of a bit like a cheap overseas outsourcer it would kind of push the boundaries when it came to doing what I'd asked so it

It did some, it created a lot of good stuff at speed, it, because it knew the Cosmos DB API surface area, it actually already knew more than I did about how to replicate that in a fake NuGet package, as in a NuGet package that contains a fake Cosmos, not a fake package. This package is fake.

David (12:43)
Yeah.

Tim Abell (12:50)
So it kind of did some good stuff, but it also did some really crazy stuff. It was test driven. It would not manage to make the test pass. It would detect the test was passing. Sorry, it would detect the test based on some input values and then it would just put the right output value. So it made the production code entirely just for a very specific case in the tests, which is ridiculous.

and did the thing of apologizing profusely, which is just bizarre. And the other thing that it did is it didn't really manage to stick to the architecture very well. And this is, again, before I'd even heard about some of these newer approaches, but it would... This particular library had a query parser because you can write SQL for Cosmos. So it had to understand that.

And then it had a query executor which took the parsed representation and would actually run the queries using LINQ on the in-memory list that it had. And one of the things that it got wrong that I didn't realize for quite a while while it was coding was because the tests were passing, because there were more kind of integration tests. There were unit tests of the bits as well. It was putting some of the...

It couldn't work out how to do some of the parsing using the parser library I was using. So it put some of the parsing logic in the executor. So it basically would parse it into kind of big chunks, and then something that should have been parsed into tokens, it didn't, it just passed it as a string, and then there was a bunch of parsing logic that it stuffed into the executor. I didn't catch that and it took blooming ages to unpick that. And trying to persuade it to unpick it, because it basically didn't really know how to implement it

David (14:37)
Mmm. Mmm.

Tim Abell (14:41)
Every time I tried to get it to put it back in the right place, it just couldn't do it. So that cost me quite a lot of time. And I think that's, from what I've seen so far, that seems to be a bit of a weakness of these things when you get into these complexity areas is like getting it to keep going with some of these slightly more complicated things. And I had a similar issue on a client code base where I was using an HTTP mocking library.

David (14:45)
Yeah.

Mm.

Tim Abell (15:11)
to simulate an endpoint for a test, a third party endpoint for a test. And it would, and there was, I put an abstraction over the top of it to make the tests cleaner. So instead of like having a whole load of boilerplate in the test for how to set up this mock, it was a higher level of abstraction for what we cared about in this API. So it was nice and clean, but it just never quite got the hang of the fact that even when I'd repeatedly tell it and refer it back to what I'd said before.

It kept writing the raw mock library calls in the test code, which then didn't compile because it was doing it in the wrong place. It wasn't updating the abstraction with what it needed. So that was kind of frustrating. I, know, part of the reason that I've continued to explore is like, I keep hearing people online saying, that's so productive.

David (15:53)
Hmm. Yeah.

Tim Abell (16:08)
and I'm aware that the state of the art is moving so fast and it's really hard to unpick what's real from what's... So have you, I mean, I've heard you talking before on your show and on your Just Five Minutes podcast and what have you about some of the tools you've been coming across. ⁓ How much have you pushed the AI in terms of doing kind of more complicated stuff?

David (16:18)
Yeah, yeah, yeah, yeah.

Tim Abell (16:37)
Have you run into any of these kinds of behaviours? And have you been trying any of this more advanced, like multi-agent and the... I don't know what you call it, but you know where you have all these files that tell it what's what. How much have you done with this?

David (16:52)
Yeah, yeah, yeah,

yeah. Well, certainly, you know, because they're all getting much of a muchness with all the agent mode and stuff like that. So where I've used the agent mode, ⁓

Tim Abell (17:05)
So hang on,

so by agent mode, you mean its ability to directly edit the files and right, okay. So that was...

David (17:10)
Yep. Directly

edit the files, the command prompt, you know, if you're brave enough, of course. But ultimately that's where, you know, that's where some of these gains are. But yeah, I guess the thing is in my sort of day job, I don't get to come across those more, maybe really advanced stuff. Although I have had some... ⁓

Tim Abell (17:15)
right

David (17:40)
where I've used the agent mode to actually, like I said, where it's saying, well, I need to add this, a new page to the website. It's got to have this, it's got to have that. And the agent mode has gone away, built everything, looked at my code base and said, well, this is the style that you're using. Even though it could, like I said, there's some old even jQuery stuff that was in there, which it's not.

breaking anything, you know, for what it needs to do, it's okay. Yes, it could do with being upgraded, but it looked at my code base and actually matched it. And I was actually quite surprised because it could have easily gone off and actually said, ⁓ well, this is a better way of doing it. We'll do it natively with all the latest JavaScript. You know, you don't need to do that as jQuery, but it did. It kept it because I asked it to. I said, you know, keep the same,

you know, here's an example page that I've got to do xyz ⁓ and some of it was fairly complex. It's like dynamic data grids that it creates and things like that. So it's not just trivial stuff. ⁓ It's not rocket science either, but where I just left it to run, ⁓ it might have taken about 10, 12 minutes or something. yeah, mean, there's times when

Tim Abell (18:53)
Hmm.

David (19:09)
And to be honest, in the stuff that I've tried, and some of that was relatively complex in fairness, ⁓ but I sort of found that it did actually give me that big performance boost because even though there was a couple of things, there was a few minor things that I thought, well, okay, I'm not quite sure of that because I usually do it like this, but as soon as I corrected that, ⁓ you know, it did sort it out, but...

You're absolutely right. I think it's, how do you get it? And it's, I suppose it's like, if you were working in a team with other developers, I was gonna say junior developers, but it doesn't, it's not even that. Sometimes there's gonna be misinterpretation, but you're right. There are gonna be things where you think, like you mentioned, writing a whole implementation in a test, where it really should be the...

the test stubs or whatever you want to call it, you know, the abstraction is wrong and all that sort of stuff. The interesting thing, like I said, where I've seen it, but only secondhand so far, some of the more... I was going to say, well, they're not because they're changing so much, but some of the newer models, like I said, Claude Sonnet 4 and Claude Opus 4, which...

Tim Abell (20:12)
Hmm.

David (20:37)
The Opus 4 one, like I said, you have to be in, you know, the lots-of-money-a-month club for that one. But what I've seen, and the way that it reasons and that agent mode working, it's been really trained specifically on coding. So it is an LLM still, but the way that it's been trained has been very specific to, like, coding. You know, some of the components

David (21:06)
Fairly complex apps that it can build do look, like I say, this is secondhand, so it's not like, you know, really tangible on your computer. But it does look as if that is like another level up, you know. But yeah, like you said, I think it goes back to, I tend to find the agent mode stuff is good when you've got a very clear cut

Tim Abell (21:13)
Hmm.

David (21:36)
set of instructions, like I say, set of requirements, and you are really clear about don't deviate from this. Agent mode works really well, but I tend to find if you're just building an app from scratch, then again, as long as your requirements are really clear about the tech stack and the features and like I said, all the other bits that go along with when you would write a spec anyway.

then it can do a reasonable job. But I tend to find that even though it's tempting, isn't it, just to click a button and go agent mode, hey, you know, it's going to do this. ⁓ I tend to do it for the initial scaffolding of stuff. But then I tend to like drip down into just chat rather than agent mode for maybe the refinements, as it were, because

Tim Abell (22:20)
Mm-hmm.

Right.

Mm-hmm.

David (22:32)
I have seen it where in agent mode I've said, you know, I want to add this feature, let it, I want it to do this, this, this. And it's gone and done that. But in the meantime, like you said, with your Cosmos emulator, it's gone and changed stuff that you don't see straight away. And it doesn't necessarily break tests. It doesn't necessarily break the app. And then you look at it thinking,

Tim Abell (22:49)
Yeah.

David (22:56)
what the heck did it do that for? You know, it's like then you roll it back and there's another thing, which I know, you know, I don't need to say anything to you about this, but it's, you know, the ultra... well, it always was important, source control, wasn't it? But literally, you know, to have branches or, you know, at least version control there. So if it goes and does that stuff up, you can roll back in a fine-grained way to not lose, you know

Tim Abell (22:58)
Mm-hmm.

Hmm.

David (23:26)
lose the whole stuff that you've built. And that's the problem when it's in agent mode. It's doing so many different things in one sweep that it's not necessarily doing little source control checkpoints along the way. I don't know. I haven't tried whether it would, but I suppose it would, you know, yeah. Yeah. I was going to say, I think if you said, look, after each, I don't know how you'd partition it, but you know what I mean? If there was a logical...

Tim Abell (23:41)
I expect you would if you gave it permissions and tell it that's how to operate. You could get it to create commits. I was thinking about doing that.

Yeah, just commit every

change. Let it run.

David (23:57)
Yeah, commit every change.

Yeah, yeah, yeah. I mean, you're going to have a big history, then that's not a bad thing, is it?

Tim Abell (24:03)
Yeah. But if you do

that on a branch and then you merge it in as one feature, that could be okay.

David (24:09)
Yeah, true, true.

Yeah, yeah. So I think there's the things. I think it's all back to, I mean, you're a real developer. I'm not, but you know, but you can see, and this is where I think, isn't it? Where, you know, if you're just using these tools without, and I don't want to sound elitist because I think what AI is doing is giving, you know, it's giving people that, you know,

Tim Abell (24:33)
Mmm.

David (24:39)
don't necessarily have all of those skills up front. And there's no harm if you're just building stuff vibe coded, you know, it's like I said on one of the podcasts before, you know, it's like, you know, you got to start somewhere, you've got to build a little bit of Lego, work out what doesn't work, what works. But if you are really then interested in that, instead of just being, dare I say, lazy and just sitting back and thinking, hey, look, it built that, but...

Is it, is it right? Is it secure? Is it blah, blah, You know, but if it, if it then gives you that boost to actually want to go deeper and find out what it's doing about this and that and the other, then I don't think that's a bad thing. It's like, it's like learning.

Tim Abell (25:22)
Yeah, totally. Lots of

people get their start fiddling with a bit of HTML and be like, I can change something. And then they realise that they need a server to do more stuff and go learn that. Yeah, I think that it's making it more accessible. And I think that's a really good thing.

David (25:27)
Yeah! Yeah! Yeah!

Yeah.

Yeah,

yeah, yeah. I think, but again, I think the guardrails and stuff that, you know, like especially in an enterprise company, but any company really, it's like you said, just having that common sense and that oversight. And even though, you know, dare I say, you know, there's some of the JavaScript,

If I looked at it, I'd have to look at it a bit deeper to actually fully understand some of the stuff, but you sort of know enough to know that it looks right. You've got, you've got that feeling that yeah, it's not so far off your, you know, radar. ⁓

Tim Abell (26:15)
Yeah.

Yeah, it's not enumerating

all the IP addresses on the internet and trying to request... robot.

David (26:24)
Yeah, yeah.

it was one, you know, you're hearing loads of these things. I heard one just the other day where I think it was Lovable or Replit or one of those online ones where, using like Supabase or something like that, you know, the Postgres database. And I think it was like 12 days into this app. Sorry?

Tim Abell (26:44)
was this the, it dropped my database? Was

this the one that dropped my production database? And then apologized. Oops.

David (26:49)
Yeah, yeah, exactly. was it. Yeah. So, mean,

oops, you know, that's like a, but you see, that's the thing, isn't it? It's those types of things where you sort of got to know enough of what's going on to be able to look for those things and look for those. Like you said, with all of these, because there's GitHub Spark now, you know, which are these, again, a bit like a Lovable, you know, from GitHub and stuff like that.

And they're all great. They're making like micro apps or something, I think they're calling it, because they're all running essentially in sandboxed mini VMs or whatever. I don't know the infrastructure, but it's a bit like Lovable and all that sort of stuff. But interestingly, it then raises quite a few other questions, doesn't it? Like about, you know, because we're in the UK and Europe and all that sort of stuff, there's GDPR, like.

If you've just got an app just to take everybody's email addresses and you haven't got any way of looking after that, managing it or even giving the notifications about, you know what I mean? There's a lot of stuff that you can come unstuck with very quickly, can't you? You know, so, yeah.

Tim Abell (28:00)
Yeah, yeah, definitely. And like,

I definitely wouldn't like take the toys away because of it, because it's, it's not, it kind of accelerates a lot of things, but you know, businesses have always been better or worse and people make mistakes. And like, if you expect perfection out of the gate, you're just going to end up with nothing. ⁓ And that's, that's not better. You know, people have to be educated on GDPR and security and what have you. And the more conversations about that, the better. ⁓ I have to say like,

David (28:22)
Yeah. Yeah.

Yeah.

Tim Abell (28:30)
It's definitely important to say about the tone of discussion. One of the things I've noticed on LinkedIn, in my feed at least, is that the debate is extremely polarised. And if you move then on to Reddit, some of it's quite toxic. Like you're saying, there's a bit of hating on vibe coding. I mean, engineers are a bit renowned for hating on non-engineers. It's a bit of a weakness of the profession, and it's rather a...

David (28:48)
Yeah.

Tim Abell (28:59)
unsightly side of things. And I think it's important to say, like, I, you know, I just want to understand the tools, what works, what doesn't work. And I've always taken the view that anyone who's interested in the tech, like I just want to help them do whatever they can do with it. And, you know, I'll try and warn them about any pitfalls like GDPR and security issues and what have you. And it's tricky because

You know, people can only take on so much at once. And because, like you say, I've been deep, neck deep in it all for so many years, you know, a lot of the things that I have come to see as best practice, like that didn't come overnight, you know, I did it without it and was like, that sucks. How can I fix that? And just repeat. So, you know, if I just like give a complete download of all of that.

David (29:31)
Yeah, yeah, yeah.

Tim Abell (29:57)
to somebody who's just getting started, it's completely overwhelming. ⁓ But I definitely want to encourage all newcomers. So I'm starting to get from the sort of discussion we're having, I'm starting to get a bit of a mental model to frame kind of what's available. So we've got the underlying models, which is starting to be, basically that's quite a separate thing. So the models like GPT, Claude Sonnet, Gemini,

David (30:04)
Yeah, yeah, yeah.

Tim Abell (30:26)
all the local models, I mean, there's tons of them. And then you plug those into various tooling that allows you to do the job. And I'm thinking of like a maturity model or like an advanced, what would be like simple to advanced, like levels of maturity. So like level one would be interacting in a web browser or like a voice agent on your phone or something where it's literally just text back and forward, you know, and it can produce code snippets, but you can only look at them on the screen.

Level two would be the agentic thing, is like it can now, whether it's an IDE or whether it's the CLI, kind of doesn't matter, but it's like either way it can edit files, it can run command line commands. So it can search for files, it can see what's in them, it can modify the files. So that's like level two. And then from what we're talking about, the kind of bleeding edge of

what everyone's talking about with AI coding specifically. Level three, I'm thinking, is like the ability to feed it good context. So a Claude.md file, PRDs, Readmes, ADRs, and kind of having that tidy. And I think there's a nuance about how it's injected as well. So I was listening to a, was it YouTube? I think it was YouTube, about using Claude Code.

David (31:39)
Yeah, yeah, yeah, yeah.

Tim Abell (31:56)
and they were saying something along the lines of some of this stuff gets injected along with the system prompt which gives it a different priority. I might be wrong about this, don't quote me. Because I noticed in Windsurf, like it has a memories feature which I think is a similar kind of thing. Just to go off on a little bit of a tangent, I didn't have a lot of luck with memories. Like I did try using memories to say use the abstraction layer in this test library.

David (32:06)
Okay, yeah.

Tim Abell (32:26)
I tried using it to say all of the parsing goes in the parser. Duh. And it still didn't really improve matters. That was just my experience on that. So that's level three, is better context. I kind of feel like I haven't nailed that one. It sounds like people are getting really good results. I personally haven't had amazing results yet. And then level four is the multi-agent stuff. So the coordination, giving agents roles and...

David (32:31)
Yep.

Tim Abell (32:55)
This is the stuff that I'm kind of becoming aware of, but I haven't even got a clue where to start. Does that sound like a good mental model for the space we have?

David (33:05)
It is,

I think so. I think, I think the key bit there is, and like I said, no buzzwords, buzzwords, buzzwords, you know, context engineering or whatever you want to call it. It doesn't matter what moniker you give it, but that, that is the, and like you said, the nuance of how it injects that from the way that you describe the system prompt and stuff. I think that, yeah, it's a very good view of the mental model for sure. And I think.

Tim Abell (33:34)
Right.

David (33:35)
That's where, like I said, even in my limited sort of view of working with this is once you're very detailed, like I said, it goes back decades, doesn't it? Writing a good requirements document is always, well, always is maybe not the right thing, but generally going to give you better results, you know, because it's very clear, you know, you've got very good

Tim Abell (33:52)
Mm.

David (34:03)
guidelines about what to do. And also, of course, in that requirement, it's about what not to do as well. Don't include, like I said, like I did with mine is I've made it very specific to say, if you think you need extra packages, whether it's JavaScript packages or NuGet packages in .NET or whatever it is, ask me first because I want to double check those things. So you've got

You've got as much or as little oversight as you want to give. But I think, no, going back to what you said, yeah, that's a pretty good mental model, I think.

Tim Abell (34:44)
Okay, so in terms of the landscape of tooling that's out there, just an overwhelming amount of stuff going on. I think I'm starting to be able to fit it into these buckets. So obviously, ChatGPT fits into the web bucket. Windsurf fits into the agentic, the IDE automation bucket.

David (34:45)
Yeah.

Yeah. Yep.

Tim Abell (35:08)
Some of the recommended ways of using Claude Code seem to fit into the third bucket of context engineering. Perhaps that's the right term, I don't know. And then there are some specific tools like AutoGen Studio, there's a repo called Claude Squad, someone's mentioned something called LangChain for coordinating multiple agents.

David (35:33)
Yeah, yeah, LangChain, yeah.

Yeah.

Tim Abell (35:38)
It seems like people are getting the best results, based on the hype machine of social media, when they go all the way to level 4. I can't entirely pick apart whether they're just, like, hyping it. I mean, it's hard to disentangle incentives because, you know, if I wanted to be more successful, I would probably rebrand myself as the AI Whisperer and tell everybody I could do their projects in five minutes flat.

David (35:52)
Hmm.

Tim Abell (36:08)
which, but I wouldn't do that unless I was damn sure I could make it work.

David (36:08)
Yeah.

Yeah, I was

going to say confident that you could. Yeah, yeah, yeah, yeah. Yeah.

Tim Abell (36:17)
And maybe that's a failing of me in business. I don't know.

But so I will try to pick apart like which tools should I be pursuing? Like my task this morning was to have another crack at playing with Claude Code to see if that improves on my experience with Windsurf, especially in the light of Windsurf's bizarre acquisition and stuff.

David (36:22)
Buh-bye!

Tim Abell (36:43)
and I think they might be potentially getting left behind a bit technically as things are just moving fast. So many things. One... this is a complete tangent. You mentioned Copilot. So, GitHub Copilot, so you're saying that you can use that with any of the backend models now? Or most of the...

David (37:05)
Well, a lot of them, and you can even use it, and you have been able to for a while, actually, is you can use like Ollama and local models as well. But yeah, it's... yeah, well actually Copilot in Visual Studio, but VS Code, because I think VS Code, I guess I'm just guessing here, is maybe quicker to adapt for some of this stuff. So you tend to find

Tim Abell (37:07)
Right.

Yeah.

Right.

And that's Copilot in VS Code.

David (37:35)
a lot of the advances happen in VS Code before they hit big Visual Studio. But even Visual Studio, yeah, you can pick between Gemini Pro 4 or 3.5 rather, or Claude 4 and certainly in VS Code and ⁓ even better if you dip into an extension called AI Toolkit in VS Code, that gives you a playground to test.

quite a lot of the models and you can actually do the testing of the models in parallel as well, which is quite handy. So, you know, but you can ultimately, it's got all the major models that you can plug in there.

Tim Abell (38:10)
Yeah. Okay. Yeah.

Okay, yeah,

because I think my original experience with GitHub Copilot was pretty poor, but I think that was before they were allowing multiple models and I don't think it could do very much. So, Windsurf was a big step up from that. But... But it's changed. Right. So, Copilot behaves a lot more like Windsurf now, does it? So, it reads the files, it gets the context, it feeds it to the model properly. Oh, that's interesting. God, it moves so fast, doesn't it?

David (38:32)
Yeah, yeah. Cursor and Windsurf were, but then I think it really spurred VS Code on and Copilot. And of course, it does. It does. Yeah, yeah. Yeah, it's had agent.

Yeah, yeah, I know. It's had agent mode in VS Code and now in Visual Studio, by the way, for the last, what, month or more. But certainly in VS Code, it's had agent mode and those different models for

a good few months now. But also of course they did

Tim Abell (39:03)
Yeah. And is that the same sort

of model of there's two modes of interaction, you've got your auto-complete, but you've also got a chat window where you can...

David (39:12)
Yeah, you've got, you've still, you've certainly got chat and you've got agent mode as well. Yeah. Yeah. Yeah. Yeah.

Tim Abell (39:17)
Yeah,

in the agent mode you can still talk to it, you can still say, I want you to do this, and then it can go and edit the files. Confusingly they call it Cascade in Windsurf.

David (39:25)
Yeah, yeah, yeah, yeah. Yeah,

yeah, that's right. It's like, like, it is very much like that. But like I said, you can, you can hook it up to local models if you really want to try that. And if you've got a decent machine to run those models on, because I tried it with, did I try it with Qwen or something like that, which is one of the coding ones, Qwen coder or something locally. But because I haven't got squillions of

Tim Abell (39:37)
Yeah. Yeah.

David (39:54)
dollars to buy the top of the range Nvidia graphics cards or even anywhere to put these things. You end up using a smaller-parameter model. And even if you try to use quantization, which is what they use a lot on these models, that's a topic for another day maybe. But it worked. It did work. It did work from VS Code.

Tim Abell (40:08)
Right.

David (40:23)
you know, it wasn't brilliant, but then again, you know, you pay for what you get, don't you? You know, but certainly, yeah, in VS Code especially, you can certainly do agent mode. You can switch between a lot of the different models, including, like I said, Claude 4. You don't get things like Opus 4 from Anthropic yet. Maybe they will, you never know, but that's the big money bit where you have to pay for.

Tim Abell (40:28)
Mm.

Yeah, yeah, I was reading it might become much bigger money soon. They've been running at a loss and yeah, maybe they need to make some profit. Yeah, it's going to be very, I mean, to follow that tangent a bit of the money thing, it's related to the current hype. It'll be interesting to see how long the, like, investing with no return

David (40:52)
But ⁓ yeah.

Yeah, yeah, yeah, yeah, yeah.

Tim Abell (41:18)
keeps going for because I mean they're raising just eye-watering amounts of money like you know it's 40 50 billion here 50 billion there and it's just staggering amounts of money

David (41:19)
Yeah, yeah.

It's incredible isn't it? Yeah, they've

all got their own printing presses haven't they I'm sure.

Tim Abell (41:31)
Right. Yeah.

So, you know, if they keep needing to raise those kinds of money, you know, at what point did the investors say, OK, where's our return on investment? And does the return on investment justify those valuations or are we going to see a whole load of them go bust? you know, what's, what do they have to suddenly flip into these? You know, I mean, because it's hard to know where the value sits in terms of pricing, because

You know, if the claims, if the biggest claims are kind of real, then, you know, there's a lot of value. People already pay hundreds of thousands for developers. So if they can accelerate, there's definitely value there. You know, what's the underlying real cost? Because they don't really publicize that. It's a lot of people guessing and trying to work out from first principles based on, you know, hardware, data center purchases, GPUs and what have you. Like I was

I was listening to Last Week in AI, which I think you recommended, is a great podcast for a bit of hype-free info on what's been going on. It's more on the kind of research side and the investing side, which is really interesting. And one of the facts that caught my ear was that one of the bits of hardware that you can get for, I think it's for training basically.

David (42:40)
⁓ yeah, yeah, yeah, yeah.

Tim Abell (43:01)
Is it inference? Fancy words, these. And one single rack in a data center, which is crammed with the right kind of kit for this, had a power draw of a megawatt, which blew my mind. I've stood in front of a rack, you know, I'm old enough to have put servers in a rack myself in Level 3 hosting.

David (43:19)
Wow.

Tim Abell (43:28)
And you know, I've watched the little power strip with its built-in amp meter and I've watched it hovering at about 10 amps ⁓ for not even a full rack and thinking, cool, that's a lot of power for compute. just crazy amounts of power. And you know, that turns into money, never mind the hardware costs and everything else. So, you know, to run these as profitable businesses, it's, you know, I definitely think...

David (43:43)
Yeah.

Yeah.

Yeah.

Tim Abell (43:58)
It's definitely worth making the most of it for building your own assets and products while the party's cheap. Yeah, because they might start charging more or they might start going out of business. It's definitely a window of opportunity at the moment. Anyway, I've distracted myself completely from what I was going to ask you. So, my understanding from what I'm saying is that...

David (44:04)
While you've got the power.

Mmm.

No, no, that's fine, that's fine.

Tim Abell (44:27)
the further down these levels of maturity you get, the better results you get, if the hype is to be believed. So have you tried any of these multi-agent things?

David (44:33)
Mm-hmm.

I've had a play with LangChain, you know, to see, but, you know, which is generally Python-centric, but they do have a JavaScript sort of SDK as well. But all of these, by the way, before you even go into that, what I was going to say is that it's always, what is it really that you need that multi-agent to do? Because I was going to say,

Tim Abell (44:46)
We have.

Hmm.

David (45:12)
you've now got in the Microsoft world as well, there's AutoGen and stuff like that, but you've got Semantic Kernel that's now got multi-agent. So it's at the low level and funnily enough, Semantic Kernel sort of sits underneath a lot of the other stuff basically in the Microsoft world, but LangChain and LangGraph, I think there's a whole conversation on that as a separate thing really, isn't it?

but CrewAI is another one. But the interesting thing is these things can do, you give them a goal, you can have multiple agents that can run in parallel or in a sequence because you need one thing to complete before something else happens. But you could have parallel agents going out looking, because obviously they're all tying into tools or

dare I say MCPs, you know, now, ⁓ because it makes it easier to glue together the APIs anyway. But the idea is that, you know, if you've got a clear picture why you need a multi-agent, because like, like they say, you know, that do you really need a, you know, a multi-agent sort of framework? Well, yeah, you might, but there's a good chance that you might not, you know, so.

Tim Abell (46:23)
Mm.

Right. So my,

the reason that caught my interest is like, I heard someone saying something along the lines of like, you have one agent be the architect. One agent is dealing with tasks. And like I, the impression I got was that it could potentially fix some of these shortcomings. So the thing I was saying about how it ended up putting the code in the wrong place, like if you have the architect checking its work, like I, I wouldn't potentially have to do that. And maybe it could get it right.

David (46:42)
Yep. Yep. The manager. Yeah. Yeah. Yeah. Yeah.

Yeah.

Tim Abell (47:03)
I mean, I do worry about the sheer number of tokens or whatever it is it's going to use to do it.

David (47:03)
Yeah, yeah, yeah.

I think these things like Copilot, Claude, whatever, whatever, whatever, these agent mode platforms, let's call them that, I think as they are getting to that point, but you're absolutely right with a lot of these multi-agents, you can have like an agent manager. In fact, the best way I've seen it, believe it or not,

sort of displayed, if you want to call it that, is n8n, which is an automation platform. They've come on leaps and bounds and really adopted the whole AI agent thing, so you've got a very visual way of connecting agents with memory and with tools, including MCPs. And the way you can visually put that together is really good, because you can very clearly see that

this agent manager is looking after these other agents: one's going to go and try something, another one's going to check it, and so on. And to build a multi-agent setup with that oversight, like you said, the architect, I think that's one of the nicer interfaces that I've seen, rather than having to get down to the code level, although, you know...

They're all starting to do this anyway. But n8n, definitely. And of course you can run it in a Docker container locally, which is nice.

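As a rough sketch of that "one agent tries, another checks" loop, here is a plain-Python worker/architect pair. It uses the OpenAI Python SDK purely as an example model API (assuming `pip install openai` and an `OPENAI_API_KEY` in the environment); the prompts and the retry logic are illustrative, not how n8n or any particular platform actually wires it up.

```python
# Illustrative worker/architect review loop; not any specific product's design.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do for the sketch
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content or ""

def run(task: str, max_rounds: int = 3) -> str:
    feedback, attempt = "", ""
    for _ in range(max_rounds):
        # Worker agent: produce (or revise) the work.
        prompt = task if not feedback else f"{task}\n\nReviewer feedback:\n{feedback}"
        attempt = ask("You are a coding agent. Do exactly what is asked.", prompt)

        # Architect agent: check the attempt (e.g. "is the code in the right place?").
        verdict = ask(
            "You are the architect. Reply OK if the attempt satisfies the task "
            "and the project conventions; otherwise list concrete problems.",
            f"Task:\n{task}\n\nAttempt:\n{attempt}",
        )
        if verdict.strip().upper().startswith("OK"):
            return attempt
        feedback = verdict  # go round again with the architect's notes

    return attempt  # out of rounds; hand back to a human

print(run("Write a Python function slugify(title) that lowercases and hyphenates it."))
```

Note the cost Tim raised a moment ago: every review round is two more full model calls, so token usage scales with the number of retries.
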
Tim Abell (48:40)
So is that,

that like you basically have to build that your build the setup yourself and like spend ages experimenting with what kind of combination of agents works or is it? No?

David (48:50)
No, well, you can get n8n itself running very quickly, and that's the other good thing: they've got loads of good examples, single-agent example workflows and then multi-agent workflows. They've got a lot to kick you off with to start with, a lot of templates, basically, that you can start to look at.

Interestingly, the thrust of it, if you like, tends to be multi-agents going out and researching a company or whatever, all that type of stuff. But there's some more code-centric stuff that I've seen with it as well. So yeah, everything from creating a Telegram bot that talks to n8n, and...

Tim Abell (49:28)
Right, yeah.

David (49:44)
It can go and check your calendar and all that sort of stuff, and come back with a nice summary response. So you can do all those things, but you can go right up the scale to, like I say, marketing, sales, financial stuff. And there are people using it to look at things like code samples as well. So I haven't looked into that in

too much detail, because, as I say, I think a lot of these platforms like Copilot and Claude Code and all that sort of stuff are starting to head that way anyway. And of course they're more specific to the coding viewpoint. So, yeah.

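For a flavour of the glue involved: a common n8n pattern is a workflow that starts with a Webhook node (the Telegram-bot and calendar-summary examples tend to work this way), so triggering it from outside is just an HTTP POST. Port 5678 is n8n's default when run locally; the webhook path and payload below are made up for illustration.

```python
# Trigger an n8n workflow from outside via its Webhook node.
# Assumes n8n is running locally (e.g. in Docker) on its default port,
# and that some workflow exposes a Webhook node at the hypothetical
# path "daily-summary".
import requests

resp = requests.post(
    "http://localhost:5678/webhook/daily-summary",  # path is workflow-specific
    json={"request": "check my calendar and summarise today"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # whatever the workflow's final node responds with
```
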
Tim Abell (50:25)
Mm.

Yeah,

because I could spend a bunch of time putting Lego together, if it's... Because, like I said, my focus is purely on what's the optimal coding setup. You know, I don't necessarily care whether it's 'replace 10 coders' kind of good, or 'slightly psychotic assistant' good. I just want to make sure I'm not missing a trick here. So...

David (50:36)
Yeah. Yeah.

Tim Abell (51:01)
What about this AutoGen Studio? I heard that mentioned. Have you heard anything about that one? What do you know about it?

David (51:05)
Yeah, yeah, yeah. Well, that's another

one from Microsoft Research. And fairly recently, a few months ago, it went through a bit of a split, where the original team sort of went off. I don't know where that's at now, but Microsoft have certainly carried on with it on the Microsoft side. But, well, it's basically...

Tim Abell (51:22)
Hmm.

So what's the purpose of AutoGen Studio?

David (51:35)
If you look at these things like LangChain and CrewAI, at the foundation level it's essentially a coding framework for generating multiple agents and orchestrating them, ultimately. But AutoGen has also got, and to be honest I haven't looked at it in the last month or two, but it had a visual sort of multi-agent builder, and it started to look

Tim Abell (51:48)
Right.

David (52:04)
quite interesting, a bit like what I said about n8n: a more visual way of building it. But underneath it's still writing code for those multiple agents, again whether they're in parallel or in sequence. Like LangChain and CrewAI, it's giving you the coding framework to build those multiple agents. So...

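For contrast with the visual builders, this is the kind of code-level blueprint David is describing, using AutoGen's well-known two-agent pattern. It follows the older 0.2-style `pyautogen` API; the project has been reorganised since, so treat it as illustrative rather than current.

```python
# Classic AutoGen two-agent setup: an assistant that writes code and a
# user proxy that executes it and feeds the results back.
import autogen

config_list = [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]  # placeholder key

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated; no human in the loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The proxy sends the task; the two agents then converse (and run code)
# until the assistant signals the task is complete.
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that prints the first 10 Fibonacci numbers.",
)
```
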
Tim Abell (52:29)
Okay, so these

multi-agent things at the moment are mostly kind of generic, build whatever you want, for whatever use case, so it's up to you to figure out what you're trying to achieve. So it's not that any of these is 'this is the way to write code', but you could potentially use them for that. And people have been. Okay.

David (52:48)
Yeah. Yeah.

Yeah, yeah. And that's where it seems to be quite an interesting time, where people are adapting stuff. It's the typical thing of 'oh, I didn't realise you could start to do that with that'; people are finding new ways to work with these things. So yeah, generally it's AutoGen and stuff like that, LangChain, CrewAI, and there's obviously a number of others,

Tim Abell (53:03)
Yeah.

David (53:18)
but they're all giving you that blueprint, if you like, in code. But some of them, like I said, are giving you visual ways of representing those multiple agents as well. So, yeah.

Tim Abell (53:31)
Yeah, because the discourse

online has been fairly toxic, and quite often a pattern I see fairly regularly as people interact with each other is: whether the original post is 'AI good' or 'AI bad', someone in a comment will say, you know, my experience was I tried it and it was a lunatic and it wasn't actually that helpful. And then someone will reply saying,

'well, you're clearly just not using the AI right', and maybe not even give any more away than that. They're like, 'I get 100x results, it's amazing for me', which is not very helpful. Or they might say, well, you're just not using it right, my setup is to use this, that, and the other tool that no one's ever heard of, and then leave it at that. And you can't really tell. Have they come across some...

Is it just, well, if you go and grab that tool, you'll be good? Or is it that you grab that tool, but they've spent ages customising their prompts? And you also can't tell: is their setup generating reams and reams of awful tech debt, or is it genuinely producing high-quality stuff? It's just so hard to tell.

David (54:51)
Yeah, yeah. I think that's ultimately it: it's like, until you get into the weeds with any of this stuff, isn't it? It's really difficult. But that's where, like I said, depending on what you want to actually... Sorry about the bleep, sorry, I haven't been able to turn those off. In fact, I might have to...

Tim Abell (55:04)
Mm.

Yeah

David (55:20)
Goodbye, Shirley. Sorry, I'm getting hassled by the day job; I'm sure you are as well. But ultimately, that's why I've played with n8n, to actually look at that sort of world. Because, by the way, underneath the visual AI agent building, n8n actually uses LangChain in the background, which is quite interesting.

Tim Abell (55:21)
Yeah, me too.

or I just, yeah.

David (55:49)
but you can at least visualise what you're trying to build, even if you're just prototyping it, really. And then, as you find out the way you're connecting to tools, MCP servers or whatever, you can sort of see, like you said, when you get down into the weeds, where that's going to work or not. But yeah.

Tim Abell (56:09)
Cool. All right, before

we wrap up: WarmWind OS, what do you know about that?

David (56:15)
Well, yeah, like I said, it's really weird. I don't usually... and maybe I'm completely wrong, and I probably am, but when I saw it... Because you're right, at the moment I'm on a waitlist, I haven't actually used it. But the interesting proposition is that it's essentially your own virtual AI server sort of thing. So it really looks to me...

The 'OS' bit is a bit of a play, you know, because ultimately it's going to be essentially a VM-ish type of environment. But what they're doing is quite interesting: they're training on different apps, so that instead of you just using a prompt or context to go and generate stuff... This is twisting it a little bit, so it's not necessarily for generating code, although

maybe it could, but it can actually use your mouse and keyboard. So one example I had is an ERP system that you need to go and check an invoice in, or check if a delivery has been created, something like that. All of this stuff, the way that WarmWind OS seems to work, is that you can register those apps and it'll actually use those apps as if it was a person.

Tim Abell (57:42)
Hmm.

David (57:43)
And some of the demos that they show, if it lives up to the demos that they show, do look really interesting, because it takes a slightly different view of how you can actually talk to the AI. But actually, they're all starting to do it now, aren't they? You've seen, like you said, the Comet browser, and ChatGPT.

ChatGPT's got Agent Mode now, funnily enough, where it can surf the web for you and make orders on websites, if you trust it. So all of these are starting to go that way: you've got all these actual interactions as if it was a person, and that's what WarmWind OS looks like. But it just felt too subtle for me to start with, when I first looked at it.

Tim Abell (58:12)
Yep.

Right.

David (58:39)
And I went back and watched the video again; yeah, at first I sort of didn't really get it. And then it suddenly clicked: for a lot of this stuff, like if you're having to go and check Salesforce and look for something, sure, you could write integrations to do this, but it can actually go and use Salesforce, and know what to look for, given your context and prompts. And it can go and check in your ERP system, or do

something in Excel or whatever, but it can actually interact with all of those apps, which is why it popped out at me. Because you can actually see where it's got real utility for the boring, mundane, data-entry sort of stuff, but it also goes a bit deeper than that. So yeah, it's actually quite an interesting one to keep an eye on, I think. Yeah.

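To make the "uses the apps as if it was a person" idea concrete, here is a toy observe-decide-act loop. pyautogui is a real screen-automation library, but `decide_next_action` is a hypothetical vision-model call, and none of this is how WarmWind OS is actually built; it's just the general shape these computer-use agents share.

```python
# Toy computer-use loop: look at the screen, ask a model what to do,
# then drive the mouse and keyboard. Purely illustrative.
import pyautogui

def decide_next_action(screenshot) -> dict:
    """Hypothetical: send the screenshot to a vision model and get back
    e.g. {"op": "click", "x": 412, "y": 230} or {"op": "done"}."""
    raise NotImplementedError  # supply your own model call here

def run_task() -> None:
    while True:
        shot = pyautogui.screenshot()  # observe the current screen
        action = decide_next_action(shot)
        if action["op"] == "done":
            break
        if action["op"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["op"] == "type":
            pyautogui.write(action["text"], interval=0.05)
```
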
Tim Abell (59:15)
interesting.

Cool. All

right, let's get back to our respective day job things. Yeah.

David (59:40)
Yes, unfortunately, as

ever, you know... yeah, no, it has been good to talk again, actually. Yeah, yeah, definitely.

Tim Abell (59:44)
That's been really useful. Thanks for exploring the space a bit with me.

Cool, right, well, until next time. Keep exploring.

David (59:53)
We'll have to. Indeed. Yeah, well, I was just about to say, I was

going to say we'll have to do a follow-up and see how things have actually changed in three months or six months. I'm sure the landscape will have changed. Yeah, it's going to be quite a different world, isn't it? Yeah.

Tim Abell (1:00:05)
Yeah, 60 minutes.

Yeah, I mean, like we've mentioned so many different tools

and models and things, and I'm sure that's not even scratching the surface; I think they're popping up left, right and centre. It's trying to work out what matters, what doesn't, which ones to pay attention to. It's certainly challenging at the moment, but I think worth it. We'll see where it lands. Yeah, cool. All right, well, I'll speak to you again soon. Thank you ever so much. Thanks to our listener for joining us.

David (1:00:27)
Yeah. Yep. Yep. Yep.

Yeah, that's okay. No, it's good to talk to you again. Yeah.

Tim Abell (1:00:39)
Right, and that doesn't include me listening back later, which I will. Cool, alright, well, that's it for Software Should Be Free. The AIs are not free, and they might be about to get more expensive, unless you run them locally, in which case you need to spend some money on some hardware. So goodbye from him and goodbye from me. Until next time on Software Should Be Free.

David (1:00:39)
Hey!

Okay.

Yeah.

Okay, take care.