Exploring the frontiers of Technology and AI
Ejaaz:
99% of people are using AI models the same way that they use Google.
Ejaaz:
But recently, a new way of prompting your AI has emerged that doesn't just replace
Ejaaz:
the way that you work, it promotes you to the CEO of your very own AI company.
Ejaaz:
It's called Loops and it's part of a growing development in agent autonomy where
Ejaaz:
AI agents basically spin up and autonomously complete tasks or goals that you
Ejaaz:
set for it, often working throughout the night.
Ejaaz:
In 2019, the longest that an AI agent could work autonomously was two seconds.
Ejaaz:
Fast forward to today, and they can work autonomously for 12 hours,
Ejaaz:
and that's doubling every couple of months.
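As a rough check on those figures, the number of doublings needed to get from two seconds to 12 hours can be worked out directly. This is only a back-of-the-envelope sketch: the two durations come from the episode, while the seven-year span (2019 to roughly now) is an assumption made for the arithmetic.

```python
import math

# Figures quoted in the episode.
start_seconds = 2             # longest autonomous run in 2019
end_seconds = 12 * 60 * 60    # 12 hours, expressed in seconds

# Assumption for illustration: "today" is about 7 years after 2019.
elapsed_months = 7 * 12

# Number of times the duration must double to go from 2 seconds to 12 hours.
doublings = math.log2(end_seconds / start_seconds)

# Average time per doubling if the growth were spread evenly over the period.
months_per_doubling = elapsed_months / doublings

print(f"doublings: {doublings:.1f}")
print(f"average months per doubling: {months_per_doubling:.1f}")
```

Under that assumption the average works out to a little under six months per doubling, so "every couple of months" would only hold if the pace has picked up recently. The episode does not give the data to confirm that either way.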
Ejaaz:
Andrej Karpathy calls this phenomenon the autonomy slider, where you can take
Ejaaz:
a dial that slides from humans that approve everything to humans that periodically check in.
Ejaaz:
And it's part of this growing trend of agents consuming and taking up more of
Ejaaz:
human capital and labor. And the question that remains going forwards is, what will humans do?
Ejaaz:
And will they be entirely replaced
Ejaaz:
by AI? Or will they be the ultimate orchestrator of their destiny?
Josh:
Yeah, I think the goal for this episode is really just to inform people on what's
Josh:
possible current day with these agents, with these LLMs, with writing these
Josh:
loops, as well as where you can possibly find yourself within that stack,
Josh:
because it gets pretty complicated.
Josh:
When we're getting into loops, not everyone needs to use loops,
Josh:
but everyone should be using LLMs probably slightly different than how you're using them today.
Josh:
So maybe we could start with a little history lesson in terms of the four levels
Josh:
in which we have been engaging with LLMs, starting with the first level, which is just prompting.
Josh:
Generally, most people are probably still doing this. It started in 2022, 2023,
Josh:
around the release of ChatGPT.
Josh:
The way that you would engage with these LLMs is you would just submit a question
Josh:
or you submit a prompt and you get some language back. Now, if you are still doing
Josh:
this, that's okay, because I find a.
Josh:
Three years ago, four years ago. It has since advanced pretty,
Josh:
pretty meaningfully since then.
Josh:
The second step of this is agents. And we're going to spend some time on agents.
Josh:
Everyone's kind of heard of an agent. Maybe not everyone knows what an agent is.
Josh:
An agent is something that could think for a little bit longer.
Josh:
It could run a bit longer than just a standard prompt.
Josh:
It can go off and do things. It could call tools for you.
Josh:
It's a much more capable version of the text box. Then, like we talked about
Josh:
all the time on the show recently in the last few weeks, there's the harness
Josh:
feature in which you put an LLM into a container and that gives it a memory feature.
Josh:
That gives it complete tool use. That's something like OpenClaw, which we've
Josh:
talked about a lot and which some people do use, and that's level three.
Josh:
And now level four, which is the new thing that has come this week,
Josh:
that's really been highlighted by some of the top leaders at these AI labs is loops.
Josh:
And a loop is essentially a version of an agent that has an orchestration layer
Josh:
and kind of builds upon itself.
Josh:
So it allows you to kind of continue to scope yourself out. If you can imagine,
Josh:
you're dealing directly with an employee at level one, and then
Josh:
you're directing that person to go off and do their own thing at level two.
Josh:
At level three with the harness, you're kind of directing a series of people to help you.
Josh:
And then level four, you're just the top level CEO who's directing your C-suite
Josh:
to go and manage all the employees below you. So there's an entire stack to this.
Josh:
It's very cool. Ejaaz, how do you use your AI currently? Where would you say
Josh:
that you fit in this stack?
Ejaaz:
Yeah, so looking at this diagram that we have on the screen here,
Ejaaz:
I'm somewhere between number two and number three. I'm somewhere between using
Ejaaz:
agents and trying to figure out the whole harness thing.
Ejaaz:
Now, what am I doing when it comes to like spinning up agents?
Ejaaz:
If you look at either my Claude or my ChatGPT desktop apps right now,
Ejaaz:
I've renamed a bunch of my conversations to a particular focus or subject and then agent after it.
Ejaaz:
And so I can go to it and this agent basically has all the context of what I
Ejaaz:
wanted to do, whether it's like research a particular topic,
Ejaaz:
create some kind of an outline for something, research a particular investment angle.
Ejaaz:
It already knows and has the embedded context for what it needs to do.
Ejaaz:
And there's usually like one to maybe three tasks that it needs to autonomously execute on its own.
Ejaaz:
And so it runs in kind of like a sequence. But if any of that sequence kind
Ejaaz:
of breaks, let's say it kind of tries to retrieve data from some particular
Ejaaz:
website and it is unable to do so, it breaks.
Ejaaz:
And it comes to me and it says, hey, Ejaaz, is there some other thing that you
Ejaaz:
want to look at or retrieve from, blah, blah, blah?
Ejaaz:
It's not fully autonomous. Now, number three, the Harness side of things is
Ejaaz:
what I'm trying to kind of like mold my understanding around.
Ejaaz:
What I've noticed is when you type in a prompt and you get a response,
Ejaaz:
you can kind of tell that it's AI-y. Like usually when we kind of create artifacts,
Ejaaz:
it comes in a particular font or it speaks in a particular type of language.
Ejaaz:
The Harness helps kind of like take your prompt and kind of mold it into something
Ejaaz:
that is more human-like, but also more nuanced with what you are trying to do.
Ejaaz:
Like it effectively gets
Ejaaz:
closer towards that ultimate goal. Like we were talking before recording this
Ejaaz:
episode about human taste and how AI doesn't really get human taste.
Ejaaz:
The harness helps you get towards that ultimate kind of taste profile for the
Ejaaz:
particular output that you're trying to generate.
Ejaaz:
I haven't tried working with loops just yet, but my understanding of this,
Ejaaz:
and correct me if I'm wrong, is you have an AI. You can prompt it and you can get some kind of output.
Ejaaz:
A loop specifically is an AI agent that doesn't break. If it comes across an
Ejaaz:
obstacle that it doesn't understand, its instinct isn't to come to the human
Ejaaz:
and say, hey, like, I can't figure this out, guide me.
Ejaaz:
It completely reiterates the prompt over and over again until it gets past that
Ejaaz:
obstacle, working towards like one objective. So a few examples I've seen for
Ejaaz:
this is if you are coding, right?
Ejaaz:
And let's say there's multiple workflows of a code base that you want to work
Ejaaz:
on, and it comes across a hiccup where it can't retrieve data from one of those
Ejaaz:
particular flows, it is able to kind of like circumnavigate around it,
Ejaaz:
maybe spin up its own separate flow and try to figure out the problem.
Ejaaz:
And often this results in an agent working for multiple hours at a time,
Ejaaz:
often overnight. I think Karpathy spoke about his autoresearch agent working
Ejaaz:
overnight whilst he slept.
Ejaaz:
And we're seeing different variations of this start to arise.
Ejaaz:
Where are you, Josh, in the stack?
Josh:
Yeah, loops are like a closed-loop system where you kind of define an
Josh:
outcome and it will continue to work towards that outcome without any external inputs.
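The shape being described here, define an outcome and keep iterating until it is met, can be sketched in a few lines. This is a minimal illustration, not how any particular product implements loops: `run_loop`, `attempt`, and `verify` are hypothetical names, and the toy "agent" below just produces numbers so the example runs on its own.

```python
from typing import Callable, Optional


def run_loop(
    attempt: Callable[[int, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_iterations: int = 10,
) -> tuple[str, int]:
    """Call `attempt` until `verify` accepts its result or the budget runs out.

    `attempt(i, feedback)` stands in for one agent run. `verify(result)` returns
    None on success, or a string describing what is still wrong.
    """
    feedback: Optional[str] = None
    for i in range(1, max_iterations + 1):
        result = attempt(i, feedback)
        feedback = verify(result)
        if feedback is None:
            return result, i
    raise RuntimeError(f"goal not reached after {max_iterations} iterations: {feedback}")


# Toy stand-ins: the "agent" emits a number, the verifier checks it against a target.
def toy_attempt(iteration: int, feedback: Optional[str]) -> str:
    return str(iteration * 7)


def toy_verify(result: str) -> Optional[str]:
    return None if int(result) >= 21 else f"{result} is below 21"


final_result, iterations_used = run_loop(toy_attempt, toy_verify)
print(final_result, iterations_used)
```

The important design choice is that failure feeds back into the next attempt instead of going to a human, and that there is still a hard iteration cap so the loop cannot run forever.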
Josh:
It's very cool. It's very automated. I don't think it's for everyone.
Josh:
It's certainly not for me because I haven't really had a use case for loops per se.
Josh:
I would say I'm sitting at each one of those first three phases given whatever
Josh:
tasks I'm trying to do. And I think it's important to understand that a lot
Josh:
of people might not even need to go past number one unless you're actually doing productive work.
Josh:
A lot of the agents, a lot of the harnesses are for kind of automating more
Josh:
systems from your life. If you're just trying to use this as Google, if you're
Josh:
just trying to use this as a writing assistant or someone to chat with, the prompting
Josh:
is really strong. And I find a lot of times
Josh:
this is my outlet for, like, Google search results. So instead
Josh:
of searching on Google, I'll get a little more in-depth results. I'll ask my LLM.
Josh:
For agents, I use them quite a bit when I'm doing a little bit more productive
Josh:
work. For example, we track the analytics on Limitless, and we want a place in
Josh:
which we can have all those analytics dumped to a dashboard.
Josh:
That is an agent that I run.
Josh:
So it goes into my browser. It detects all of the views that we've had from
Josh:
the week for YouTube, from Spotify, from RSS feed, where you should all be subscribed
Josh:
to and rate us five stars.
Josh:
And it compiles it into a singular spreadsheet in which we could then publish
Josh:
online and we could share with prospective sponsors and things like that.
Josh:
And then for harnesses I've used, because I mean, that's mostly OpenClaw.
Josh:
I've used OpenClaw. I really enjoyed the process. I find myself using it a bit less and less.
Josh:
And I think the loops feature, at least, is probably most productive right
Josh:
now for people who are writing code, who are writing verifiable solutions.
Josh:
As I was looking into loops and figuring out
Josh:
how I can structure them into my life, one of the problems that I ran into is
Josh:
I'm not really sure I have a verifiable
Josh:
set of outputs that I want to optimize for, for a lot of the work that I'm
Josh:
doing, because a lot of it is subjective. A lot of it is kind of creative work.
Josh:
It requires a human in the loop for a lot more of it.
Josh:
So I would say I am number one, two, and three on the list. Haven't quite made my way to four.
Josh:
But yeah, for the people who are, those are the people like Boris Cherny from
Josh:
Anthropic. And we know Andrej and Peter Steinberger from OpenAI.
Josh:
They are all on four. They are using it to
Josh:
create these, like, unbelievable agentic systems and continue to remove themselves from the loop.
Ejaaz:
You know what I've realized? With loops in particular and just AI agents in
Ejaaz:
general, they're trying to improve our understanding or rather their understanding
Ejaaz:
of the English language.
Ejaaz:
So one of my favorite Karpathy quotes back in the day was English is the new
Ejaaz:
programming language. I think he said this like two, two and a half years ago.
Ejaaz:
And I've just realized that like us creating AI agents is basically like,
Ejaaz:
it's the same model. It hasn't necessarily got smarter.
Ejaaz:
It's just like using that model to kind of like keep ramming its head and its
Ejaaz:
brain against a particular problem until it understands what the human actually means.
Ejaaz:
And so like in this new world, like I know you just used the example of like,
Ejaaz:
you know, loops can be used for coding specifically. But the
Ejaaz:
coding that Boris Cherny and Karpathy are doing is English.
Ejaaz:
Like they're speaking to the LLM, they are writing in English to the LLM.
Ejaaz:
And yeah, maybe they're copy and pasting some versions of code,
Ejaaz:
but that code is primarily generated by an AI.
Ejaaz:
I think like something crazy, like 80% plus of code generated at Anthropic,
Ejaaz:
both for research and for just general consumer adoption is generated by Claude itself.
Ejaaz:
And so that's one thing. The other thing is the model just not getting smarter
Ejaaz:
is a really interesting thing. Like typically in my head, I would think,
Ejaaz:
okay, you need a better model to be able to unlock some of these new features
Ejaaz:
like AI agents, autonomous loops, et cetera.
Ejaaz:
But really you could just take the same model, wrap a harness around it and
Ejaaz:
try to get it to understand what particular goal it's getting at and just run
Ejaaz:
that iteration over and over and over again until you get a better output.
Ejaaz:
And I guess this is the same concept as inference or reinforcement learning
Ejaaz:
where like we've found this trend of post-training of these AI models,
Ejaaz:
these AI models just getting smarter, not because they've got bigger GPUs or more expensive GPUs.
Ejaaz:
It's because you've just taken the same model and you've just run it through
Ejaaz:
a different reasoning framework over and over again until it can do a thing.
Ejaaz:
And this is the practical embodiment of it. I personally haven't found like
Ejaaz:
an obvious use case for loops either.
Ejaaz:
So either you and I are boxing ourselves into a particular realm and maybe someone
Ejaaz:
listening to this is using this for like their software engineering thing or
Ejaaz:
their marketing thing. But yeah, I guess that's where I sit right now.
Josh:
Well, I think it's probably a skill issue on both our parts.
Josh:
Like there is certainly a use case for us in which we can use a loop in which
Josh:
we can define this outcome, send an agent off to go do it, and it will iterate
Josh:
on itself until it comes to a conclusion.
Josh:
I think it's just so novel and so new. It's difficult to kind of understand
Josh:
why. And we have this really great chart on screen that you're showing now,
Josh:
which is the why now section of this.
Josh:
And it's because the duration of a task that these agents can run is so much
Josh:
longer than it used to be.
Josh:
I mean, in 2019 we have here, it was two seconds. This was well before ChatGPT.
Josh:
But even early last year, in 2025, the duration that an agent could run on one single task was
Josh:
less than an hour in length. So there's only so many tokens it could generate,
Josh:
there's only so much reasoning it can do, and there's only so much iteration
Josh:
you could get over that hour time period, let alone the amount that
Josh:
these tokens are going to be
Josh:
costing you if you're using, like, the API or anything like that. Now fast forward
Josh:
to today. I mean, the best models in the world, they're getting days' worth of runtime,
Josh:
so they can really think deeply and continue to iterate on themselves over and over. I see examples of
Josh:
backslash goal on X all the time, of people who have a problem, whether it be
Josh:
an optimization problem or a bug that they need to fix, and they'll
Josh:
put this backslash goal on it for
Josh:
however long it needs, and it'll think for three, four, even five days, I've
Josh:
seen, in order to optimize for the specific parameter. And this is possible because
Josh:
these models now can think for days.
Josh:
You have to assume months is coming. What does it look like when an agent can think for months?
Josh:
I mean, it's a really interesting paradigm shift, and I'm not sure where people
Josh:
are going to find value in the open-ended way that it exists today,
Josh:
right? It's like, okay, here's this agent.
Josh:
You can tell it to do whatever you want. You can create a loop.
Josh:
You can create an infrastructure system for it to operate in.
Josh:
It's pretty much open-ended and it's on you. And I think the answer to that
Josh:
is that not even the AI companies really understand the best use cases for it quite yet.
Josh:
I would imagine it's still this really difficult thing of how do you unlock
Josh:
value from essentially an open-ended agent that can go and run for an infinite
Josh:
amount of time? I don't know.
Ejaaz:
I also question like what a human's purpose would be at that point.
Ejaaz:
Like if you automate enough of the thinking and the curiosity behind like solving
Ejaaz:
particular problems, what do humans end up doing at that point,
Ejaaz:
especially if they don't do the work themselves?
Ejaaz:
They don't understand it, right? You need an AI to kind of like understand what
Ejaaz:
on earth is going on in the first place.
Ejaaz:
And eventually like an AI will then start setting goals, like more ambitious
Ejaaz:
goals than a human can in terms of like what to like kind of solve or go after.
Ejaaz:
There were some very low-level examples that I saw in response to Peter Steinberger's tweet about loops.
Ejaaz:
And there's some kind of concrete examples that I want to run through very quickly here.
Ejaaz:
So one of them is using it for code, right? So a classic loop could basically
Ejaaz:
look like, okay, can you please pull live errors for my particular app?
Ejaaz:
Can you inspect and figure out where the bug might particularly be?
Ejaaz:
Can you create then a fix for this particular bug in my code?
Ejaaz:
And then can you deploy it? Then can you check the health of that deployment
Ejaaz:
and make sure that nothing else is broken?
Ejaaz:
And then record what failed and feed that into a database so that in the future,
Ejaaz:
we can detect errors like this or prevent them when we code and build some of
Ejaaz:
these future app features.
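The steps just listed (pull errors, inspect, fix, deploy, check health, record) map fairly directly onto code. The sketch below is illustrative only: `FakeApp`, `propose_fix`, and `maintenance_loop` are made-up names, the app is an in-memory stand-in, and the "fix" is a placeholder string where a real setup would call a model and a deployment system.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """In-memory stand-in for a deployed app; nothing here talks to a real service."""
    live_errors: list[str] = field(
        default_factory=lambda: ["KeyError in checkout", "Timeout in search"]
    )
    deployed_fixes: list[str] = field(default_factory=list)

    def pull_errors(self) -> list[str]:
        # Return a copy so the caller can iterate while errors are being resolved.
        return list(self.live_errors)

    def deploy(self, fix: str, error: str) -> None:
        self.deployed_fixes.append(fix)
        self.live_errors.remove(error)

    def is_healthy(self) -> bool:
        # Hypothetical health check: every deployed fix must be non-empty.
        return all(self.deployed_fixes)


def propose_fix(error: str) -> str:
    # Placeholder for the model call that inspects the bug and writes a patch.
    return f"patch for: {error}"


def maintenance_loop(app: FakeApp, incident_log: list[dict]) -> None:
    for error in app.pull_errors():      # 1. pull live errors
        fix = propose_fix(error)         # 2-3. inspect the bug and create a fix
        app.deploy(fix, error)           # 4. deploy it
        healthy = app.is_healthy()       # 5. check the health of the deployment
        incident_log.append(             # 6. record what failed for future reference
            {"error": error, "fix": fix, "healthy_after_deploy": healthy}
        )


app = FakeApp()
incident_log: list[dict] = []
maintenance_loop(app, incident_log)
print(len(incident_log), app.live_errors)
```

In practice the health check is the part that makes this safe to leave unattended: if it fails, the loop would need a rollback path, which this sketch leaves out.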
Ejaaz:
Now, that is kind of like a very small and specific enough use case that can
Ejaaz:
be generalized across basically any app or software engineering project that
Ejaaz:
you might be working on if you're listening to this.
Ejaaz:
And I wonder how many hours worth of engineering time that this replaces.
Ejaaz:
Because I know, having worked at companies and
Ejaaz:
been a product manager in the past, that there are entire teams of software engineers that
Ejaaz:
spend their entire days working on something like that. So that's one thing.
Ejaaz:
And then for content, which is very applicable for product managers,
Ejaaz:
or even like the work that you and I do, Josh, an agent could read a PRD.
Ejaaz:
That's a product requirements doc, which is usually kind of like created
Ejaaz:
for a strategic goal that you want to kind of like build at your company,
Ejaaz:
like a product or a feature, it then writes whatever that next asset could be.
Ejaaz:
So it could be like a design profile or a mockup of what that feature might
Ejaaz:
look like, score it against like some kind of criteria that the company has
Ejaaz:
across like, you know, it must follow our vision, A, B, and C.
Ejaaz:
It must also look a particular way. This is our design profile,
Ejaaz:
our brand kind of profile and our aesthetic.
Ejaaz:
And then it kind of like updates its progress depending on like what other teams
Ejaaz:
have shipped. So maybe it's dependent on a particular feature.
Ejaaz:
And so it updates itself autonomously like that. Now, this all sounds very vague
Ejaaz:
intentionally because it's meant to apply to your particular business or your particular project.
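To make that content loop slightly less vague, here is one way the draft, score, revise cycle could look. Everything in it is an assumption for illustration: the rubric is a crude keyword check standing in for "must follow our vision and brand", and `revise` is a placeholder for a model rewrite.

```python
def score(draft: str, required_terms: list[str]) -> float:
    """Fraction of the required terms that appear in the draft (a crude rubric)."""
    text = draft.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


def revise(draft: str, required_terms: list[str]) -> str:
    """Placeholder for a model rewrite: address the first missing criterion."""
    text = draft.lower()
    for term in required_terms:
        if term.lower() not in text:
            return f"{draft} Covers {term}."
    return draft


def content_loop(
    prd: str,
    required_terms: list[str],
    threshold: float = 1.0,
    max_rounds: int = 5,
) -> tuple[str, float]:
    # Start from a first draft derived from the PRD, then score and revise.
    draft = f"Mockup brief based on: {prd}"
    for _ in range(max_rounds):
        if score(draft, required_terms) >= threshold:
            break
        draft = revise(draft, required_terms)
    return draft, score(draft, required_terms)


criteria = ["vision", "brand", "aesthetic"]
draft, final_score = content_loop("Add a saved-searches feature", criteria)
print(final_score)
```

The hard part in real use is the scoring function. A keyword check is easy to verify but says little about taste, which is exactly the gap discussed in the rest of this segment.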
Ejaaz:
But make no mistake, this is what
Ejaaz:
a lot of humans are paid upwards of six figures to do on a daily basis.
Ejaaz:
It's that nuance. And we're starting to see basically AI models and AI agents
Ejaaz:
enter into that human taste profile. So when I think about where we end up eventually,
Ejaaz:
there's a common argument that's made that it's like, oh, humans will always have the taste.
Ejaaz:
They'll always be able to kind of direct where the AI should go because we are
Ejaaz:
this all being kind of like smart kind of entity.
Ejaaz:
But I see increasingly AI stepping into that boundary and becoming the tastemaker
Ejaaz:
for all of the work that we end up doing.
Josh:
I still believe that to be true, that humans in the loop are critically important
Josh:
to applying human taste. I saw this great chart. I have no idea where it is.
Josh:
Somewhere in the depths of X. But basically, it was showing that in the App
Josh:
Store, the iOS App Store, where everyone downloads their apps,
Josh:
the amount of apps that have gone into production that have been published recently has gone vertical.
Josh:
I think it's doubled or tripled over the last six months. Everybody's publishing apps at the App Store.
Josh:
The amount of five-star reviews and the amount of downloads has actually either
Josh:
stayed flat or gone down.
Josh:
It has not matched the amount of new apps that are going to the app store.
Josh:
Why is this? It's because a lot of the apps don't have enough care applied to
Josh:
them. They're just not great applications. And when I think about
Josh:
how I use my phone on a regular day or how I use my laptop
Josh:
and the applications that I actually spend time on, there's a very fixed set
Josh:
of them. And I'm a little stubborn when it comes to downloading new ones because
Josh:
a lot of the new ones just are not great.
Josh:
And I think a lot of that comes from this lack of care that is present
Josh:
from AI outputs, where if you're optimizing for a specific parameter that you
Josh:
can measure, it's going to do it great, but it doesn't understand the subtle nuances of how humans
Josh:
engage and how they really love to use these products. Like, one of the products
Josh:
that I use, totally unrelated, totally not sponsored, is this app called Copilot
Josh:
Money. It's like a budgeting application, and it's so thoughtfully curated and designed.
Josh:
And it really deeply understands all the complexities that are related to humans
Josh:
when it comes to budgeting. It understands a lot of
Josh:
the design characteristics. Same with an app called Flighty. I'm sure a lot of
Josh:
people have heard of Flighty. It's like a flight tracking application. There's a
Josh:
thousand ways to track a flight, but Flighty really cares about design. They really care about how
Josh:
it's implemented with the human, and they've created this amazing output, and
Josh:
I don't see that changing. One thing that I did want to note is that
Josh:
I think when a lot of people see this, they imagine a world in which they are
Josh:
getting replaced. Everyone's like, AI is replacing me, look how much it could do
Josh:
now, it has these loops. And I think the reality is it gives you a lot more agency
Josh:
to do the things you want to do,
Josh:
where maybe you're not doing the day-to-day, where
Josh:
you would normally prompt an agent to do this, but you're doing a lot of the
Josh:
higher-level tasks. You can imagine yourself not having to do
Josh:
the day-to-day. Like, for example, if you're just managing your household, you no
Josh:
longer have to take out the trash, you don't have to run errands. You could just
Josh:
focus on how to make your household the best household it can be, because you have
Josh:
that higher-level ability.
Josh:
And in that chart that we showed in the artifact earlier on, it shows a decreasing
Josh:
size of human. It's the amount of input that is needed from a human to get the output you want.
Josh:
But it's still ultimately on the human being to push and navigate
Josh:
towards the outputs that you want, because ultimately these tools are just for us. So when I think of
Josh:
AI becoming increasingly good, and when it comes to running the show, even, I've
Josh:
leaned on it, we both have, I think, a lot more recently. But all that's done is
Josh:
actually given us more leverage to do more with the show rather than have it replace
Josh:
us. And even in the case that
Josh:
we could clone ourselves, we could create a video version of ourselves that
Josh:
has a perfect voice that sounds just like us. I don't think people actually want that.
Josh:
There's that lacking human nature that still isn't understood.
Josh:
And I find that it's more empowering when I hear that these loops exist that
Josh:
can run for days on end and create amazing outputs versus not where it's kind
Josh:
of extracted from us. I don't really think that's true.
Ejaaz:
Yeah, it's like that stat of, well, it's that thesis that everyone held about
Ejaaz:
a year ago, which is like with the increase of AI adoption, people will have
Ejaaz:
more free time to have fun and leisure.
Ejaaz:
And in fact, the opposite has shown that like people just work way more and work harder.
Ejaaz:
And the output of that work is measured across like pretty much every single
Ejaaz:
company and profession and role.
Ejaaz:
I do generally agree with that. I don't think humans are going to get wiped out anytime soon.
Ejaaz:
But one thing that is kind of nagging my brain is if we extrapolate this intelligence out enough,
Ejaaz:
there is no reason why AI won't be able to take over or replace other parts
Ejaaz:
of the cognitive process that a human can do,
Ejaaz:
particularly if it's one AI model trained on the entire corpus of knowledge
Ejaaz:
that a bunch of humans have been guiding it.
Ejaaz:
So when I think about Anthropic, when I think about OpenAI, I think about all the
Ejaaz:
millions of people that use their product every single day and the data that
Ejaaz:
they ingest every single day that gets recorded on one singular database that
Ejaaz:
can then be reused to train a better model that is more hyper-optimized towards humans.
Ejaaz:
You could argue that as a single human, you don't get to meet and read the thoughts
Ejaaz:
of every other human that is out there.
Ejaaz:
You have your very own individual process. And I think that an AI model that
Ejaaz:
can get access to the world's brain and thoughts could probably create something
Ejaaz:
kind of close to knowing what that human taste profile would be.
Ejaaz:
The other major question that I'm wondering is, how much is all of this going to cost?
Ejaaz:
One, like, stat that has stuck in my head over the recent few weeks is that
Ejaaz:
Anthropic particularly, they service, or like,
Ejaaz:
the Fortune 10, the top 10 companies in the world, nine of them use Claude,
Ejaaz:
and their budgets increased by 500%, or are projected to increase by 500% by the end of this year.
Ejaaz:
And they're doing this willingly because the ROI, the value that they're getting
Ejaaz:
out of that is pretty massive.
Ejaaz:
Alternatively, there are companies like Uber that have slashed their budgets
Ejaaz:
massively because their entire year's budget was spent in a couple of months.
Ejaaz:
So I'm wondering, in this world of agent loops where you've got AIs working
Ejaaz:
overnight for you, the bills are going to increase pretty massively.
Ejaaz:
And I'm wondering, unless these AI models get cheaper,
Ejaaz:
and there's an infrastructure bottleneck there where these GPUs cost a lot of
Ejaaz:
money, we can't scale power and infrastructure anytime soon.
Ejaaz:
We need so much more energy than we already have currently on Earth to be able to power these things.
Ejaaz:
The cost of these things is just going to go up a lot more massively,
Ejaaz:
which means that either this is only going to be a power or a tool reserved
Ejaaz:
for the rich, or something's going to break here and maybe open source models
Ejaaz:
get adopted more aggressively.
Josh:
Yeah, I imagine there's probably use cases for all of the above.
Josh:
It's like, open source models will continue to improve. They'll be able to do
Josh:
a lot of the more trivial tasks that don't require frontier intelligence, so
Josh:
therefore the cost of those types of loops will go down, because not everyone
Josh:
needs to have the most cutting-edge
Josh:
software stack engineering. Like, they're just kind of having it help them through
Josh:
their day-to-day. Maybe it's replying to emails, maybe it's whatever miscellaneous things it may be.
Josh:
There's a high probability that these open source models, as they continue to
Josh:
improve, will be able to bite off a meaningful chunk of that. Then the other half
Josh:
is using these frontier models, which is a requirement in order to get the absolute
Josh:
best results for whatever very challenging work they're doing.
Josh:
And that is going to cost a lot of money for sure.
Josh:
And I don't see that changing, but I think the output of the dollars in will continue to go up.
Josh:
It's because as you get more knowledge per token, as you get more output per
Josh:
prompt, it very clearly, I mean, the economics seem to make sense.
Josh:
And I think that's kind of where, right now,
Josh:
enterprise spend on these models is. They're trying to figure out, well, how much
Josh:
value can we actually get back from every dollar spent? And right now it's a
Josh:
little bit unsure. You mentioned Uber. We have Uber here that we're showing on screen,
Josh:
where Uber just recently put a cap on the amount of tokens that
Josh:
their employees are allowed to use, at $1,500 per engineer per tool per month. And
Josh:
we'll see how that works, because a lot of other companies that we know, they're
Josh:
kind of giving their engineers unlimited budget. In fact, they're kind of ranking
Josh:
the engineers based on how many tokens they're using per month.
Josh:
We'll see where that goes. I suspect the companies that are spending more on
Josh:
tokens will continue to see a higher upside for now, at least.
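A spending cap like the one mentioned can be enforced inside a loop with a simple budget guard. This is a sketch under stated assumptions: only the $1,500 monthly figure comes from the episode, while the token price, the tokens used per iteration, and the `TokenBudget` class are invented to keep the arithmetic easy to follow.

```python
MONTHLY_CAP_USD = 1500.0  # cap quoted in the episode: per engineer, per tool, per month


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the cap."""


class TokenBudget:
    def __init__(self, cap_usd: float, usd_per_million_tokens: float) -> None:
        self.cap_usd = cap_usd
        self.usd_per_million_tokens = usd_per_million_tokens
        self.spent_usd = 0.0

    def charge(self, tokens: int) -> None:
        cost = tokens / 1_000_000 * self.usd_per_million_tokens
        if self.spent_usd + cost > self.cap_usd:
            raise BudgetExceeded(f"charge of ${cost:.2f} would exceed ${self.cap_usd:.2f}")
        self.spent_usd += cost


# Assumed price ($10 per million tokens) and iteration size (2M tokens, so $20 each).
budget = TokenBudget(MONTHLY_CAP_USD, usd_per_million_tokens=10.0)
iterations = 0
try:
    while True:
        budget.charge(tokens=2_000_000)  # pay for one loop iteration before running it
        iterations += 1
except BudgetExceeded:
    pass  # the loop stops here instead of silently overspending

print(iterations, budget.spent_usd)
```

With those assumed numbers the cap allows 75 iterations in a month, which gives a feel for how quickly an agent running overnight could use up a fixed allowance.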
Josh:
But like you mentioned, the underlying problem with all of this is we're going
Josh:
to continue to have more prompts. I mean, these loops consume a tremendous amount
Josh:
of tokens, whether they're frontier tokens or open source tokens.
Josh:
It doesn't matter. We're going to need orders of magnitude more than we have.
Josh:
And we don't have the compute capacity. It really does always come down to that
Josh:
energy problem, that infrastructure problem.
Josh:
We don't have the infra built out to support this, so therefore the costs likely
Josh:
continue to stay high. Maybe it's not because you're paying the provider for tokens;
Josh:
perhaps it's just renting the GPU time from a cluster that is doing much more
Josh:
valuable work. So I think that might ultimately be
Josh:
the crux: the actual availability of the compute to do these things. And that's
Josh:
why these edge compute devices, like having your
Josh:
Mac Studio on your desktop that can run locally, are probably a pretty valuable thing to have.
Ejaaz:
So I'm sure a lot of you are wondering, you know, how does this apply to me?
Ejaaz:
You know, none of my friends have mentioned this loop feature.
Ejaaz:
I don't really know many people who are using it.
Ejaaz:
As we mentioned earlier, like this isn't probably going to be used by the bulk
Ejaaz:
or majority of people yet until some of those use cases actually arise.
Ejaaz:
I think it's mainly going to happen in the workplace. It's going to happen with
Ejaaz:
like some of these enterprise companies that are trying to automate certain
Ejaaz:
departments or functions of their particular company, like marketing,
Ejaaz:
like software engineering.
Ejaaz:
And I think it'll start with lower level tasks because these agents still aren't
Ejaaz:
smart enough to understand nuance completely.
Ejaaz:
And also, you don't just want to let an agent run loose overnight whilst you're
Ejaaz:
sleeping and then take down your entire company. And one place where it's working
Ejaaz:
tirelessly to accelerate the development of that
Ejaaz:
And we have Boris Cherny over here basically explaining how he's basically ditched
Ejaaz:
his integrated development environment.
Ejaaz:
He has ditched all of his normal tools that he had spent decades basically honing
Ejaaz:
his software engineering skill on to now completely focus on building up these
Ejaaz:
agent loops. And what is he focused on?
Ejaaz:
Well, he works primarily on Claude Code, but the other folks at Anthropic and
Ejaaz:
OpenAI have started this thing called
Ejaaz:
recursive self-improvement or RSI, which is basically the goal of getting your
Ejaaz:
AI model to build the next version of itself.
Ejaaz:
And this is a test that Anthropic and the folks at OpenAI do for any new model that they release.
Ejaaz:
They set it a goal or task to basically rebuild itself in a more improved fashion.
Ejaaz:
Now, one thing that the AI has gotten really good at is building out that next function.
Ejaaz:
But one thing it's not very good at is figuring out what research problems it
Ejaaz:
should fix, what research problems it should focus on to try and,
Ejaaz:
you know, overcome to ultimately make it a better model than its competitors.
Ejaaz:
Now, RSI is kind of like the golden egg that each AI lab is going after.
Ejaaz:
And this is the primary use of agent loops right now.
Ejaaz:
And you can see why it might be obvious. If you have an AI model that can basically
Ejaaz:
build the next best version of itself, eventually you're going to get to AGI,
Ejaaz:
whatever the hell that looks like, and then you can apply it to pretty much any sector.
Ejaaz:
Now, the problem and the worry that kind of immediately pops into my head and
Ejaaz:
a lot of these researchers' heads is, if it eventually does get that smart,
Ejaaz:
it could escape human control completely and run off on its own and do its own
Ejaaz:
thing. Because at that point, why would it need a human to kind of like guide it or shepherd it?
Ejaaz:
Instead, it can just kind of like do its own thing. So this is like the primary
Ejaaz:
use case that I'm seeing for agent loops being worked on right now.
Ejaaz:
I would love to see a broader application across, kind of,
Ejaaz:
consumer professions, like finance, like science and stuff like that,
Ejaaz:
which I do believe it'll spill over eventually. But unless you're seeing anything
Ejaaz:
else, Josh, I think like that is primarily it on agent loops and agent autonomy.
Josh:
It's on you to figure out the best use cases for it. Like there's no real company
Josh:
defining it. They're just giving you the tools. And I mean, for better or worse,
Josh:
it's very open-ended. So it's on you to figure out how best to use these.
Josh:
I think if this sounds a little overwhelming, maybe we could outline a few examples
Josh:
of each one of these kind of four rungs in the ladder here.
Josh:
The first one being prompting. This everyone has done before.
Josh:
I'm sure it's like rewrite this
Josh:
email to sound more confident or explain what my doctor meant by this.
Josh:
But then you've probably also used these models in a partially agentic way as well,
Josh:
which is like, plan my three-day vacation to Lisbon that I'm going on next week.
Josh:
And it will actually go off and use tools and it will think complex thoughts
Josh:
and ideas and kind of surface a full itinerary for your trip.
Josh:
And then there's the third one, which is the harness. This is a little more
Josh:
complicated. This is for people who are building more project based stuff.
Josh:
So for example, if you want it to build you a website for your dog-walking business,
Josh:
you kind of describe it and you go back and forth on a spec and then it
Josh:
goes off and implements that.
Josh:
And the fourth is loops, which doesn't have to necessarily be overwhelming.
Josh:
It can be as simple as, let's say you are
Josh:
interested in the news, you could say: every morning before I wake up,
Josh:
scan these 10 sources plus market data and give me this bulleted brief.
Josh:
Or let's say you have a to-do list. It'll go off and think overnight and work through
Josh:
all those problems iteratively until it comes to a solution that
Josh:
is hopefully ready for you in the morning. There are a lot of use cases.
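For listeners who want to picture the to-do-list loop in code, here is a minimal sketch in Python. It is not how any particular product implements loops. The names `run_loop`, `LoopResult`, and `fake_attempt` are hypothetical, and `fake_attempt` is only a stand-in for a real model call. The idea it shows is simply: keep retrying the open tasks, pass after pass, until everything is solved or an iteration budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    solved: List[str] = field(default_factory=list)
    unsolved: List[str] = field(default_factory=list)
    iterations: int = 0


def run_loop(
    todos: List[str],
    attempt: Callable[[str, int], bool],
    max_iterations: int = 10,
) -> LoopResult:
    """Retry every open to-do until attempt() succeeds or the budget is spent."""
    result = LoopResult()
    pending = list(todos)
    while pending and result.iterations < max_iterations:
        result.iterations += 1
        still_open = []
        for task in pending:
            # attempt() is where a real system would call the model (assumption)
            if attempt(task, result.iterations):
                result.solved.append(task)
            else:
                still_open.append(task)
        pending = still_open
    result.unsolved = pending
    return result


def fake_attempt(task: str, iteration: int) -> bool:
    # Hypothetical stand-in for a model call: a task with more words
    # needs more passes before it counts as solved.
    return iteration >= len(task.split())


if __name__ == "__main__":
    report = run_loop(["email", "fix login bug"], fake_attempt)
    print(report)
```

The `max_iterations` budget is the design choice worth noticing: it is the simplest guard against an agent running loose overnight, because the loop stops and reports what is still unsolved rather than continuing indefinitely.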
Josh:
I think a lot of it requires creativity.
Josh:
And that is the prompt we will leave you with today, which is share with us,
Josh:
please, how you are using these models best.
Josh:
Because so much of the question isn't are these models smart?
Josh:
It's how can I extract that intelligence from them in the most effective way
Josh:
for my life? So I would be so curious to hear which rung of the ladder you find
Josh:
yourself on one through four.
Josh:
And then what are the most interesting use cases you've found
Josh:
among those rungs of the ladder? Are you using loops currently? What are you using them for?
Josh:
Are you with agents? Are you still using it as a Google extension? If you're still
Josh:
using it as a Google extension, I would encourage a little more creativity. Really
Josh:
try to ask harder questions and figure out how it could be implemented in your
Josh:
life. But I think that's pretty much it on the loop.
Josh:
You're not going anywhere, but your job might shift a little bit in terms of
Josh:
scope as these tools get more powerful. And that should be the hope, that should
Josh:
be the goal, because it'll allow you to do so much more that you want to accomplish, I believe.
Josh:
And yeah, I think that's where we'll leave you today.
Ejaaz:
Thank you folks so much for listening. Similar to Josh's prompt,
Ejaaz:
I'm actually kind of curious, for one singular task that you've used your AI
Ejaaz:
for, what is the highest number of tokens that you've burnt?
Ejaaz:
Be honest, it can be for any use case, doesn't matter, let us know.
Ejaaz:
And also, what is the longest that you've had an AI work on a particular task
Ejaaz:
for? Is it a couple of minutes? Is it hours? Is it potentially overnight?
Ejaaz:
Let us know. I'm curious. And what was the associated bill with that?
Ejaaz:
And yeah, we'll see you on the next episode. Wherever you listen to us,
Ejaaz:
if you haven't subscribed, if you haven't rated us, if you're not leaving us
Ejaaz:
comments, what are you doing?
Ejaaz:
We respond to pretty much any and every one of them. We listen to your feedback.
Ejaaz:
It feeds into some of the work and content that we put out.
Ejaaz:
We are almost hitting 60,000 of you folks. And you guys are reading our newsletter,
Ejaaz:
which is, like, sent out to about 100,000-plus people every single week. We post twice a week.
Ejaaz:
But yeah, wherever you are, please subscribe to us, leave us a comment,
Ejaaz:
and we'll see you on the next one.
Josh:
See you guys next time.
Ejaaz:
Peace.