1
00:00:00,000 --> 00:00:03,750
I regret saying this because in many ways this is a good idea, but I think

2
00:00:03,750 --> 00:00:07,200
people are going way too far on the, like throw more tokens at the problem.

3
00:00:12,450 --> 00:00:14,130
Welcome to Screaming in the Cloud.

4
00:00:14,160 --> 00:00:15,420
I'm Corey Quinn.

5
00:00:15,600 --> 00:00:20,730
I'm joined today by Dexter Horthy, the CEO and co-founder of HumanLayer.

6
00:00:20,790 --> 00:00:23,070
And by all accounts, he appears to be human.

7
00:00:24,000 --> 00:00:24,720
Thanks for joining me.

8
00:00:25,349 --> 00:00:26,849
Dude, I'm so stoked to be here.

9
00:00:27,479 --> 00:00:30,599
This episode is sponsored in part by my day job, Duckbill.

10
00:00:30,599 --> 00:00:33,810
Do you have a horrifying AWS bill?

11
00:00:34,050 --> 00:00:35,940
That can mean a lot of things.

12
00:00:36,150 --> 00:00:39,239
Predicting what it's going to be, determining what it

13
00:00:39,239 --> 00:00:43,080
should be, negotiating your next long-term contract with

14
00:00:43,080 --> 00:00:47,099
AWS, or just figuring out why it increasingly resembles a

15
00:00:47,570 --> 00:00:51,200
phone number, but nobody seems to quite know why that is.

16
00:00:51,500 --> 00:00:53,180
To learn more, visit

17
00:00:53,180 --> 00:00:55,070
duckbillhq.com.

18
00:00:55,370 --> 00:00:58,250
Remember, you can't duck the Duckbill

19
00:00:58,280 --> 00:01:03,650
bill, which my CEO reliably informs me is absolutely not our slogan.

20
00:01:04,470 --> 00:01:07,289
So for those who have not had the pleasure of encountering

21
00:01:07,289 --> 00:01:11,310
your particular, we'll call it perspective, what is it you'd say

22
00:01:11,310 --> 00:01:12,180
it is you do here?

23
00:01:14,190 --> 00:01:14,730
Amazing.

24
00:01:14,730 --> 00:01:18,750
So I, uh, am obsessed with getting the most out of AI.

25
00:01:18,840 --> 00:01:21,690
How do we take whatever the current models we have outside

26
00:01:21,690 --> 00:01:25,020
of training and fine-tuning and, like, task-specific stuff,

27
00:01:25,080 --> 00:01:27,960
what can we, as engineers who are not working in a big lab,

28
00:01:28,500 --> 00:01:30,960
do to push these models to their limits?

29
00:01:31,170 --> 00:01:34,229
Most recently, in the last like six to nine months, most of that

30
00:01:34,229 --> 00:01:37,140
has been around coding agents because I think it's one of the most

31
00:01:37,169 --> 00:01:40,470
misunderstood and also has the highest ceiling if you do it right.

32
00:01:40,770 --> 00:01:43,830
It seems to me like this is one of those areas where

33
00:01:44,535 --> 00:01:47,595
you are taking a half hour outta your day to have this conversation with

34
00:01:47,595 --> 00:01:50,595
me, and during that half hour, the whole game is gonna change again.

35
00:01:50,774 --> 00:01:52,755
This isn't an area where you can hold still.

36
00:01:52,755 --> 00:01:56,115
Uh, a year ago I had a whole bunch of problems that, oh,

37
00:01:56,175 --> 00:01:58,845
these are things that the coding tools will struggle with.

38
00:01:59,024 --> 00:02:02,175
I'll just keep that as sort of a personal benchmark and well, I, I ran out.

39
00:02:02,580 --> 00:02:03,810
You ran out of personal benchmarks.

40
00:02:03,810 --> 00:02:05,160
What did your benchmark used to be?

41
00:02:05,280 --> 00:02:09,630
Do some analysis of 150 megabytes of JSON, so I can have

42
00:02:09,630 --> 00:02:14,190
discussions with models about my Twitter corpus from a seven year run.

43
00:02:14,579 --> 00:02:17,610
There were build, build weird backend systems

44
00:02:17,610 --> 00:02:19,770
for me that just sort of started working.

45
00:02:19,920 --> 00:02:21,329
I replaced my Adobe

46
00:02:21,400 --> 00:02:24,730
Creative Cloud subscription by building a custom podcast

47
00:02:24,730 --> 00:02:28,840
recorder into a, into a web app that I use for the Monday

48
00:02:28,840 --> 00:02:31,930
podcast that I record for the Last Week in AWS podcast.

49
00:02:32,260 --> 00:02:34,720
It's basically a bunch of workflow tools of things

50
00:02:34,720 --> 00:02:37,000
that, well, that's hard, that's what smart people do.

51
00:02:37,390 --> 00:02:38,140
I still have some though.

52
00:02:38,140 --> 00:02:40,660
I mean, I have a Bloomberg keyboard on my desk at work,

53
00:02:40,750 --> 00:02:42,790
uh, which has a fingerprint reader that if you don't pay

54
00:02:42,790 --> 00:02:45,085
Bloomberg, you can't read, there's nothing on it on the Mac.

55
00:02:45,635 --> 00:02:48,125
Uh, Claude Code went nuts on it and apparently there's some

56
00:02:48,125 --> 00:02:50,525
encryption thing it needs to basically be able to break through.

57
00:02:50,555 --> 00:02:53,615
So, you know, I either need to get someone with an actual Bloomberg

58
00:02:53,615 --> 00:02:57,875
subscription and do a wire capture on it, or I can just put that back on

59
00:02:57,875 --> 00:03:01,505
the well until cryptography falls, I suppose I'll have to live with it.

60
00:03:02,100 --> 00:03:05,850
Yeah, and you probably don't wanna get caught asking a frontier model

61
00:03:05,850 --> 00:03:09,480
to, uh, reverse engineer something that you're supposed to be paying for.

62
00:03:09,480 --> 00:03:12,510
It's a good way to get, uh, banned from Anthropic for a while.

63
00:03:12,630 --> 00:03:13,530
No, seriously?

64
00:03:13,530 --> 00:03:17,100
Interesting, because there's nothing about this that I use

65
00:03:17,100 --> 00:03:19,320
this as a standard keyboard, it has a fingerprint reader in it.

66
00:03:19,320 --> 00:03:20,370
I want to use the fingerprint.

67
00:03:20,990 --> 00:03:21,620
The end.

68
00:03:21,710 --> 00:03:23,750
This is not about stealing things from Bloomberg.

69
00:03:23,750 --> 00:03:26,390
To be clear, there's nothing unethical in this request.

70
00:03:26,390 --> 00:03:28,460
It's, I, I would love to be able to use the

71
00:03:28,460 --> 00:03:30,290
fingerprint reader built into my keyboard.

72
00:03:30,560 --> 00:03:30,980
The end.

73
00:03:31,220 --> 00:03:31,400
Yeah.

74
00:03:31,400 --> 00:03:35,060
The evals are getting harder to find good ones that the models can't solve.

75
00:03:35,450 --> 00:03:38,990
I still do have a couple, and like I've built this actually this like

76
00:03:39,230 --> 00:03:42,380
sort of personal mental model of like every time I'm doing something

77
00:03:42,380 --> 00:03:46,250
with AI that becomes so hard that I either end up spending, like,

78
00:03:46,635 --> 00:03:50,385
a ton of time going back and doing like 30 different sessions just to understand

79
00:03:50,385 --> 00:03:54,315
the problem and then another 10 sessions to actually figure out the solution.

80
00:03:54,615 --> 00:03:58,905
I will like flag that and I have a little journal of things that AI is not

81
00:03:58,905 --> 00:04:02,955
good at solving and then I come back to that Git repo at that Git SHA

82
00:04:02,985 --> 00:04:05,895
and every time there's a new model I say, can you one-shot this problem?

83
00:04:06,315 --> 00:04:07,725
Can you actually go figure out

84
00:04:08,205 --> 00:04:08,475
the problem?

85
00:04:08,475 --> 00:04:10,815
Do you have an example? I'm sure this example will age like fine milk.

86
00:04:11,025 --> 00:04:12,345
It's been working for about six months.

87
00:04:12,345 --> 00:04:14,655
There is a race condition bug

88
00:04:15,045 --> 00:04:19,815
in the current version of the HumanLayer open source. We ended up forking that

89
00:04:19,815 --> 00:04:23,715
open source repo and making it closed source for now, just because open source

90
00:04:23,715 --> 00:04:27,195
is a little bit, it's going through its own weird moment right now and ours.

91
00:04:27,255 --> 00:04:27,615
Yeah.

92
00:04:27,615 --> 00:04:30,375
Open source will be going through weird moments for 30 years, but I hear you.

93
00:04:30,525 --> 00:04:30,735
Yeah.

94
00:04:30,795 --> 00:04:33,495
Our, our vision does not require us to be open source.

95
00:04:33,495 --> 00:04:35,385
It's an extra set of distractions that we just

96
00:04:35,385 --> 00:04:37,515
don't wanna worry about right now, so we can focus.

97
00:04:37,905 --> 00:04:40,335
But if you currently pop open the current version of HumanLayer,

98
00:04:40,425 --> 00:04:43,905
if you can get a model to one-shot the race condition between

99
00:04:44,310 --> 00:04:50,670
the Tauri Rust native app, the Vite front end that, uh, it serves, the

100
00:04:50,670 --> 00:04:55,380
Golang daemon that runs locally, that launches Claude Code sessions that

101
00:04:55,380 --> 00:04:59,880
launch a stdio MCP server that loops back to the daemon that serves

102
00:04:59,880 --> 00:05:03,450
approval requests to the front end, and all the way back through all that

103
00:05:03,450 --> 00:05:07,230
chain, and your model can one-shot the solution to that race condition.

104
00:05:07,230 --> 00:05:08,039
I know what it is.

105
00:05:08,039 --> 00:05:09,030
We haven't pushed the fix to it.

106
00:05:09,030 --> 00:05:10,020
We fixed it in our closed source.

107
00:05:10,560 --> 00:05:14,700
But that is, that is one of my evals that I, uh, every time I want

108
00:05:14,700 --> 00:05:17,880
to test a new model or test a new workflow, we throw it at that.

109
00:05:18,150 --> 00:05:20,340
And the correct answer is that workflow is insane.

110
00:05:20,340 --> 00:05:21,690
Have you considered not doing that?

111
00:05:21,810 --> 00:05:24,030
We, I mean, so this is the other problem with AI slop, is we

112
00:05:24,030 --> 00:05:27,360
haven't talked about problems with AI slop, but, uh, we tried

113
00:05:27,360 --> 00:05:30,450
the, like don't read the code thing for about six months and

114
00:05:30,450 --> 00:05:32,730
found ourselves running away from it with our hair on fire.

115
00:05:33,435 --> 00:05:34,844
And this may be a skill issue.

116
00:05:35,235 --> 00:05:37,155
I find that it, it's odd because when I, when I

117
00:05:37,155 --> 00:05:39,735
do backend stuff or infrastructure stuff, I often

118
00:05:39,735 --> 00:05:41,925
have to slap the chainsaw out of the thing's hands.

119
00:05:41,925 --> 00:05:44,445
But on front end, eh, it, I don't know anything

120
00:05:44,445 --> 00:05:46,275
about front end, so I assume it's right.

121
00:05:46,545 --> 00:05:48,435
It feels like the blast radius might be smaller

122
00:05:48,885 --> 00:05:52,544
a little bit, but also front end is very, like, once your front end

123
00:05:52,544 --> 00:05:55,515
becomes super tangled, I mean, it was both backend and front end and how

124
00:05:55,515 --> 00:05:59,055
they talked together that caused us to throw out this entire code base.

125
00:05:59,415 --> 00:06:02,475
We could have fixed it, but we decided there were other architecture things

126
00:06:02,475 --> 00:06:06,675
we needed to rethink anyways, so it would be easier to start Greenfield and

127
00:06:06,675 --> 00:06:09,375
throw it out and start over, which is a thing you were never supposed to do.

128
00:06:09,375 --> 00:06:10,665
And with AI you can do more

129
00:06:10,665 --> 00:06:10,785
of.

130
00:06:10,785 --> 00:06:12,225
AI makes that a lot better.

131
00:06:12,285 --> 00:06:15,525
I found that, oh, this thing that I built to serve

132
00:06:15,525 --> 00:06:18,255
a particular purpose and fix a problem that I have.

133
00:06:18,820 --> 00:06:21,730
Uh, no longer serves that purpose 'cause of requirements change or something.

134
00:06:21,730 --> 00:06:22,150
Great.

135
00:06:22,240 --> 00:06:25,810
Throw it out, baby, bathwater and all. The baby's floating face down.

136
00:06:25,810 --> 00:06:26,260
It's fine.

137
00:06:26,440 --> 00:06:28,690
And we're gonna go ahead and start over from

138
00:06:28,690 --> 00:06:31,720
scratch. That, that used to be a three-week project.

139
00:06:31,750 --> 00:06:33,700
Now it's, it'll be done by the end of my coffee break.

140
00:06:33,850 --> 00:06:35,320
I remember the second job I ever had.

141
00:06:35,320 --> 00:06:39,160
I started and I came into a three month refactor.

142
00:06:39,160 --> 00:06:40,630
That was on month six.

143
00:06:40,660 --> 00:06:42,430
And it was like, we're gonna upgrade all the frameworks.

144
00:06:42,430 --> 00:06:42,910
We're gonna pause

145
00:06:42,910 --> 00:06:45,100
feature dev. The CTO convinced the CEO

146
00:06:45,690 --> 00:06:47,310
that it was gonna be okay and it would be over

147
00:06:47,310 --> 00:06:49,320
quickly, and it had to happen no matter what.

148
00:06:49,320 --> 00:06:51,720
He had like bargained with the, with the product leadership

149
00:06:51,720 --> 00:06:54,990
of the company to be allowed to spend a couple months like

150
00:06:55,350 --> 00:06:57,865
upgrading and cleaning things up and improving, removing tech debt.

151
00:06:58,545 --> 00:07:00,915
And of course it went twice as long and like my first

152
00:07:00,915 --> 00:07:03,345
week was like, okay, this thing is due on Friday.

153
00:07:03,345 --> 00:07:06,345
Everyone has lost patience and it is now a death march

154
00:07:06,345 --> 00:07:08,535
for the next two weeks to actually get this thing out.

155
00:07:08,535 --> 00:07:12,375
And of course, we shipped a million bugs and we eventually, like, recovered.

156
00:07:12,375 --> 00:07:14,115
But yeah, like you're not supposed to do that.

157
00:07:14,205 --> 00:07:16,305
When an engineer says, we need to rewrite this thing, you're supposed

158
00:07:16,305 --> 00:07:18,550
to tell them to go read a book about why you shouldn't do that.

159
00:07:19,485 --> 00:07:22,305
You have a background doing the DevOps, SRE dance, which means

160
00:07:22,305 --> 00:07:27,105
that you're often the voice of moderation in a, in dev environments

161
00:07:27,135 --> 00:07:29,835
where everyone wants to build features and do exciting things.

162
00:07:29,835 --> 00:07:31,515
You're like, Hey, let's make this sustainable.

163
00:07:31,515 --> 00:07:32,205
Let's slow down.

164
00:07:32,205 --> 00:07:34,605
Let's be conservative with things like databases, file

165
00:07:34,605 --> 00:07:37,365
systems, the stuff that leaves a mark when it breaks.

166
00:07:37,545 --> 00:07:41,235
Now it seems like you're almost championing acceleration of features.

167
00:07:41,235 --> 00:07:41,995
What was that transition like?

168
00:07:42,385 --> 00:07:43,555
You say I'm like a DevOps

169
00:07:43,555 --> 00:07:48,835
SRE. I have done plenty of DevOps and SRE. I did a ton in the Kubernetes world at, I

170
00:07:48,835 --> 00:07:51,895
was at a startup called Replicated for like seven years where we helped people

171
00:07:51,895 --> 00:07:55,255
package up their Kubernetes app and ship it to other people's data centers.

172
00:07:55,675 --> 00:07:58,525
But I, I would frame it less as like the voice of reason.

173
00:07:58,525 --> 00:08:01,345
I've always been a, like, impatient, fast,

174
00:08:01,345 --> 00:08:04,615
like, let's ship value, let's, you know, be scrappy and, like,

175
00:08:05,400 --> 00:08:10,140
figure out, like, what risks are tolerable and what corners should never be cut.

176
00:08:10,140 --> 00:08:10,650
Of course,

177
00:08:10,680 --> 00:08:12,690
how do we be responsible in our irresponsibility?

178
00:08:12,780 --> 00:08:17,020
I played a lot of StarCraft II growing up, uh, or StarCraft one and two.

179
00:08:17,594 --> 00:08:20,565
And I forget who said this, but like, it's, it's an incredible

180
00:08:20,565 --> 00:08:25,515
exercise in like early stage companies, not obviously large, like not

181
00:08:25,515 --> 00:08:28,395
just like seed, like all the way through A, B, C, whatever, because

182
00:08:28,395 --> 00:08:33,135
it forces you to make hard decisions with incomplete information.

183
00:08:33,395 --> 00:08:36,605
And it forces you to do that hundreds of times a minute.

184
00:08:36,845 --> 00:08:37,564
Oh, absolutely.

185
00:08:37,564 --> 00:08:40,324
I, I, one of the hard lessons for me when we're building Skyway over

186
00:08:40,324 --> 00:08:43,564
at Duckbill has been, we are willingly accepting technical debt.

187
00:08:43,714 --> 00:08:48,125
That is something we are doing with our eyes open on it, and we're, we're

188
00:08:48,125 --> 00:08:51,305
making the decisions that will not ideally screw us over later, but

189
00:08:52,020 --> 00:08:54,689
if we get to that point, we can fix the technical debt.

190
00:08:54,689 --> 00:08:56,100
And if we don't, it won't matter anyway.

191
00:08:56,220 --> 00:08:59,490
So that took a bit of change in my perspective.

192
00:08:59,670 --> 00:09:01,680
'cause historically I was never at a company this early.

193
00:09:01,680 --> 00:09:03,900
I was in after product market fit.

194
00:09:04,110 --> 00:09:04,470
Okay.

195
00:09:04,470 --> 00:09:07,079
Developers have taken the environment as far as they can.

196
00:09:07,260 --> 00:09:08,910
Everything's on fire all the time.

197
00:09:08,910 --> 00:09:09,990
Can you help us?

198
00:09:10,045 --> 00:09:10,890
Yes, I can.

199
00:09:10,890 --> 00:09:14,339
Basically, my entire job and career have been paying off technical debt.

200
00:09:14,699 --> 00:09:15,000
Yeah.

201
00:09:15,000 --> 00:09:15,569
And it's really fun.

202
00:09:15,569 --> 00:09:16,770
I love paying off technical debt.

203
00:09:16,770 --> 00:09:20,040
I mean, so coming back to your question of, like, how did you go from

204
00:09:20,305 --> 00:09:22,525
the more conservative voice of reason to, like, hey, we

205
00:09:22,525 --> 00:09:24,220
need to figure out how to accelerate things, is, like,

206
00:09:25,110 --> 00:09:26,969
I would frame it less as DevOps, SRE.

207
00:09:27,030 --> 00:09:30,630
I would frame it as like I've been building software factories my entire

208
00:09:30,630 --> 00:09:35,219
career, like not on purpose, but I always looked up the most to the

209
00:09:35,219 --> 00:09:39,120
engineers that maintained the software factory, whatever part of it it was.

210
00:09:39,120 --> 00:09:42,390
Whether it was the environment that the like system that allowed you to spin

211
00:09:42,390 --> 00:09:46,110
up like temporary testing sandboxes with a full stack so that a PM could

212
00:09:46,110 --> 00:09:49,920
look at it, or the CI/CD pipeline or the thing that did the automated testing.

213
00:09:49,980 --> 00:09:52,319
That was always the most fascinating thing for me because

214
00:09:52,680 --> 00:09:55,200
I, I saw early on the people who invested

215
00:09:55,200 --> 00:09:57,480
in that would have compounding returns.

216
00:09:57,750 --> 00:10:00,750
You write the feature, you get a feature, you improve the factory

217
00:10:00,750 --> 00:10:04,830
10%, well, you get, you know, 20% of your time back the next day

218
00:10:04,920 --> 00:10:07,110
and you can spend half of that making the factory even better.

219
00:10:07,110 --> 00:10:08,820
And the other half of it writing more code.

220
00:10:08,820 --> 00:10:11,850
And this is how, like, Will Larson was like, An Elegant Puzzle.

221
00:10:11,855 --> 00:10:14,820
There's like this part of the curve where you have, you have invested

222
00:10:14,820 --> 00:10:18,540
so much in the thing that builds the thing that you're now just, like,

223
00:10:19,185 --> 00:10:20,745
leaving everybody behind in the dust.

224
00:10:20,895 --> 00:10:23,595
So I am curious when you take a look now, since what you do

225
00:10:23,595 --> 00:10:27,165
more or less is telling people how to effectively work with

226
00:10:27,165 --> 00:10:29,715
AI coding agents, what are people getting wrong the most?

227
00:10:29,865 --> 00:10:32,175
What can we take away from this as far as, oh, I'm gonna

228
00:10:32,175 --> 00:10:34,635
get better results with Claude Code after listening to you?

229
00:10:34,785 --> 00:10:38,865
I, I regret saying this because in many ways this is a good idea, but I think

230
00:10:38,865 --> 00:10:42,315
people are going way too far on the, like throw more tokens at the problem.

231
00:10:42,400 --> 00:10:44,560
Are we talking about gstack without mentioning gstack?

232
00:10:44,829 --> 00:10:47,500
Uh, we're talking about Gas Town, gstack, Ralph Wiggum.

233
00:10:47,500 --> 00:10:50,290
Any number of good ways to throw more tokens at a problem.

234
00:10:50,439 --> 00:10:52,959
And in general, if you design the problem correctly,

235
00:10:53,140 --> 00:10:56,740
throwing more tokens at it may be helpful, especially if

236
00:10:56,740 --> 00:10:59,290
you can create good deterministic back pressure, right?

237
00:10:59,290 --> 00:11:02,020
The reason why Ralph Wiggum was able to create this cursed programming

238
00:11:02,020 --> 00:11:05,140
language with a model that was not that, you know, like a Sonnet 3.7

239
00:11:05,140 --> 00:11:09,310
or, like, pre, pre, like, "everyone else thinks AI is good" model,

240
00:11:09,870 --> 00:11:11,760
is because it was building a programming language

241
00:11:11,820 --> 00:11:14,580
and a programming language is infinitely verifiable.

242
00:11:14,640 --> 00:11:17,400
You write code in the language, you try to compile it, compiler breaks.

243
00:11:17,400 --> 00:11:19,650
You go fix the compiler, the compiler works.

244
00:11:19,650 --> 00:11:20,310
You run the program.

245
00:11:20,310 --> 00:11:21,000
Program breaks.

246
00:11:21,000 --> 00:11:23,340
You go fix the whatever the compiler is putting in.

247
00:11:23,340 --> 00:11:25,620
But it's like it's very easy for the model to check

248
00:11:25,620 --> 00:11:27,360
its work and tell if it's done a feature, right?

249
00:11:27,930 --> 00:11:30,390
Not a lot of problems have that characteristic

250
00:11:30,390 --> 00:11:32,580
and people are trying to apply these techniques

251
00:11:32,580 --> 00:11:34,410
that worked really well, throwing more tokens at

252
00:11:34,410 --> 00:11:37,575
the problem for these, like, very verifiable problems,

253
00:11:38,835 --> 00:11:40,785
at problems that are not verifiable.

254
00:11:41,235 --> 00:11:46,305
That is, it also feels like that that is what everyone is doing to a point

255
00:11:46,305 --> 00:11:50,325
where now we're seeing token capacity constraints from the major providers.

256
00:11:50,415 --> 00:11:52,185
Anthropic, as of this recording, has done some

257
00:11:52,185 --> 00:11:56,235
strange things with session windows and double usage.

258
00:11:56,475 --> 00:12:00,650
Part of me wonders if that is a byproduct of people throwing tokens at problems.

259
00:12:01,270 --> 00:12:02,350
That's interesting.

260
00:12:02,530 --> 00:12:04,390
The, the whole Anthropic thing of like, okay, we

261
00:12:04,390 --> 00:12:07,450
need to control OpenClaw usage and we need to make sure

262
00:12:07,450 --> 00:12:11,080
that hey, people are taking our subsidized inference.

263
00:12:11,080 --> 00:12:13,900
And only my general take on that whole thing is like if

264
00:12:13,900 --> 00:12:16,570
Anthropic wants to give a discounted plan and tell you

265
00:12:16,570 --> 00:12:19,420
how you can and can't use it, like that's their prerogative.

266
00:12:19,450 --> 00:12:21,670
Everybody I know who is serious, all of our enterprise

267
00:12:21,670 --> 00:12:23,170
customers, they're paying per token anyways.

268
00:12:23,775 --> 00:12:27,584
And it's like cool, like no one, no one promised you cheap inference.

269
00:12:27,584 --> 00:12:29,204
Nobody owes you cheap inference.

270
00:12:29,204 --> 00:12:31,755
You can say what you will about anti competitiveness, right?

271
00:12:31,755 --> 00:12:36,015
Like the example that Theo gave me was actually pretty good is like Amazon wants

272
00:12:36,015 --> 00:12:39,974
to kill Diapers.com, so they just take the same product and sell it cheaper.

273
00:12:39,974 --> 00:12:42,435
They sell it at a loss because they can afford to.

274
00:12:42,735 --> 00:12:45,495
And then one day when that, when all those like, you know, one-off

275
00:12:45,495 --> 00:12:48,375
businesses are out of business, then they can charge whatever they want.

276
00:12:48,390 --> 00:12:48,449
Uh,

277
00:12:48,930 --> 00:12:51,569
that's why I am interested in a lot of the

278
00:12:51,569 --> 00:12:54,150
local LLM uh, research that's being done.

279
00:12:54,150 --> 00:12:57,000
I, I want to be able to have a coding agent that runs locally and

280
00:12:57,000 --> 00:12:59,850
uses, makes tool use and sure it's gonna be slower and it might

281
00:12:59,850 --> 00:13:04,319
not be as great, but a lot of what I do isn't that complicated.

282
00:13:04,410 --> 00:13:08,520
Go ahead and modernize the, uh, version of Python

283
00:13:08,520 --> 00:13:12,270
this dumb little script is written in, is the sort of thing that,

284
00:13:12,300 --> 00:13:15,540
okay, that takes half an hour and basically heats up my laptop.

285
00:13:15,720 --> 00:13:16,560
I don't care as much.

286
00:13:17,130 --> 00:13:17,939
Yeah, that makes sense.

287
00:13:18,765 --> 00:13:21,855
So what are you seeing as emerging trends these days

288
00:13:21,885 --> 00:13:23,745
other than, you know, throwing tokens at things?

289
00:13:24,105 --> 00:13:24,435
I don't know.

290
00:13:24,435 --> 00:13:27,495
Every other person I talk to is like accidentally reinventing Gas Town from

291
00:13:27,495 --> 00:13:30,315
first principles, but I don't know, I don't know if I wanna say that's a trend.

292
00:13:30,315 --> 00:13:35,205
It's just a, like, there is a thing that engineers like to do, which is to

293
00:13:35,205 --> 00:13:39,015
glue systems together and, and see how they work and improve them over time.

294
00:13:39,020 --> 00:13:40,965
And you start with three prompts and then you wake

295
00:13:40,965 --> 00:13:42,645
up the next day and suddenly you have a hundred.

296
00:13:43,785 --> 00:13:45,285
You're the only one that knows how to use it.

297
00:13:45,734 --> 00:13:50,444
For me, something that I've begun to deeply appreciate about agents is

298
00:13:50,444 --> 00:13:54,885
one of the things I looked for when I was interviewing SREs once upon a time,

299
00:13:55,244 --> 00:13:59,564
where you, you start throwing a problem at them and seeing how deep they go.

300
00:13:59,785 --> 00:14:01,375
And the, the right way to get through an interview

301
00:14:01,375 --> 00:14:03,355
like that is never give up, never surrender.

302
00:14:03,505 --> 00:14:07,735
So I will see these things, oh, I can't, I don't have access to that.

303
00:14:07,735 --> 00:14:09,535
So here's what I'm gonna do instead to get to the

304
00:14:09,535 --> 00:14:11,875
reason that I'm, that this thing is misbehaving.

305
00:14:12,145 --> 00:14:14,365
I've seen it start pulling TCP dumps.

306
00:14:14,365 --> 00:14:16,135
I've seen it start packet crafting.

307
00:14:16,314 --> 00:14:19,105
It's doing ridiculously in-depth things.

308
00:14:19,105 --> 00:14:21,655
I haven't seen strace yet, but I'm waiting for it, where

309
00:14:21,655 --> 00:14:25,464
it's using very deep tools to get at the answer.

310
00:14:25,735 --> 00:14:28,135
Uh, in many cases, past a point of reason.

311
00:14:28,635 --> 00:14:32,474
But it's, it's doing a lot of the stuff that I would do if I weren't lazy.

312
00:14:32,775 --> 00:14:36,795
I care about figuring out why I have this non-deterministic delay on

313
00:14:36,795 --> 00:14:40,814
an API that I built, but not enough to actually go diving into it.

314
00:14:40,814 --> 00:14:42,824
But I can turn this thing loose and it'll tell me.

315
00:14:43,635 --> 00:14:47,175
This episode is sponsored by my own company, Duckbill.

316
00:14:47,475 --> 00:14:52,635
Having trouble with your AWS bill? Perhaps it's time to renegotiate a contract

317
00:14:52,635 --> 00:14:53,085
with them.

318
00:14:53,445 --> 00:14:55,575
Maybe you're just wondering how to predict

319
00:14:55,575 --> 00:14:58,815
what's going on in the wide world of AWS.

320
00:14:58,875 --> 00:15:01,515
Well, that's where Duckbill comes in to help.

321
00:15:01,725 --> 00:15:04,455
Remember, you can't duck the Duckbill

322
00:15:04,455 --> 00:15:07,125
bill, which I am reliably informed by my

323
00:15:07,125 --> 00:15:10,665
business partner is absolutely not our motto.

324
00:15:10,740 --> 00:15:14,160
To learn more, visit duckbillhq.com.

325
00:15:14,880 --> 00:15:17,910
The adoption of Claude Code was the first thing that

326
00:15:17,910 --> 00:15:20,610
made me believe that CloudWatch was actually useful.

327
00:15:21,045 --> 00:15:23,655
CloudWatch is incredibly powerful, incredibly

328
00:15:23,655 --> 00:15:27,555
useful with a user interface that is garbage.

329
00:15:27,915 --> 00:15:30,015
It's the data structure underneath everything

330
00:15:30,015 --> 00:15:32,925
good, but it itself, it is terrible to work with.

331
00:15:32,925 --> 00:15:34,245
But agents do not care.

332
00:15:34,515 --> 00:15:34,845
Exactly.

333
00:15:34,845 --> 00:15:35,985
Agents don't care what it looks like 'cause

334
00:15:35,985 --> 00:15:38,295
they're just plumbing through JSON anyways.

335
00:15:38,564 --> 00:15:44,235
I remember a tweet I saw when I first got back on Twitter in like 2015 or 2016.

336
00:15:44,730 --> 00:15:48,270
And it was a tweet from Coda Hale, and the picture was like, it was one

337
00:15:48,270 --> 00:15:51,000
of those CloudWatch charts where you just have like three little dots and

338
00:15:51,000 --> 00:15:54,720
one line because it's like not filling in the gaps between everything.

339
00:15:55,199 --> 00:15:59,040
And like the caption was like CloudWatch was a technical marvel.

340
00:15:59,069 --> 00:16:00,780
Like it's incredibly powerful.

341
00:16:00,930 --> 00:16:03,810
But how did anyone look at this and say, yes, this is good.

342
00:16:03,810 --> 00:16:05,130
This is what we should ship to customers.

343
00:16:06,120 --> 00:16:07,560
In October in 2018.

344
00:16:07,590 --> 00:16:10,380
Uh, CloudWatch is of the devil, but I must use it.

345
00:16:10,950 --> 00:16:14,130
And I wound up talking about how it violated

346
00:16:14,130 --> 00:16:17,700
every one of AWS's then-14 leadership principles

347
00:16:19,710 --> 00:16:22,380
and that was how I met the then GM of CloudWatch.

348
00:16:22,380 --> 00:16:23,700
And they fixed a lot of it.

349
00:16:23,730 --> 00:16:26,670
It's still not great, but it's not the nightmare

350
00:16:26,670 --> 00:16:28,860
tire fire that it was back in those days.

351
00:16:29,430 --> 00:16:31,079
I do miss aspects of this,

352
00:16:31,469 --> 00:16:32,939
of old CloudWatch.

353
00:16:33,420 --> 00:16:33,599
Yeah.

354
00:16:33,599 --> 00:16:36,240
Back then, when you got something like this working,

355
00:16:36,689 --> 00:16:37,949
it was because you really cared.

356
00:16:38,250 --> 00:16:40,470
You suffered for it to get it out the door.

357
00:16:40,530 --> 00:16:43,230
Now it feels like that barrier has been lowered, which is, I wanna

358
00:16:43,230 --> 00:16:46,350
be clear, a good thing, but it's having a bunch of knock-on effects.

359
00:16:46,380 --> 00:16:49,560
Uh, GitHub is on fire based upon the sheer number

360
00:16:49,560 --> 00:16:51,870
of commits and agents stuffing things into it.

361
00:16:52,110 --> 00:16:54,689
They're not helping themselves by, whenever it comes back up

362
00:16:54,689 --> 00:16:57,449
for half a second, babbling about Copilot, and then it falls over.

363
00:16:57,449 --> 00:17:00,240
People can draw connections that aren't necessarily there.

364
00:17:00,510 --> 00:17:05,220
I, I do think that they finally showed up in a way, and maybe this is just like

365
00:17:05,220 --> 00:17:09,960
me being too terminally online, but like some VP from GitHub came online and

366
00:17:09,960 --> 00:17:12,690
on Twitter he is like, here's the problem, here's what we're doing about it.

367
00:17:12,690 --> 00:17:13,619
We know it's an issue.

368
00:17:13,619 --> 00:17:15,139
Like, here's what I can say about it.

369
00:17:15,915 --> 00:17:19,425
Yeah, and it was like, oh, I'm no longer worried about this problem.

370
00:17:19,454 --> 00:17:22,694
It's a shame that it took people complaining online for 24

371
00:17:22,694 --> 00:17:25,484
hours a day for weeks straight for them to come out and do that.

372
00:17:25,484 --> 00:17:25,635
There is

373
00:17:25,635 --> 00:17:29,024
a corporate comms lesson in here, and that's very Microsoft, where my issue with

374
00:17:29,024 --> 00:17:32,774
Azure security for a long time was not the security issues, which aren't great.

375
00:17:32,774 --> 00:17:34,905
Let's be clear here, but my problem was the

376
00:17:34,905 --> 00:17:37,845
complete stonewalling silence coming out of Redmond.

377
00:17:38,084 --> 00:17:40,215
Uh, I yell at AWS about this all the time

378
00:17:40,215 --> 00:17:43,754
when they say nothing. They are far too big now

379
00:17:43,820 --> 00:17:48,770
to get the benefit of the doubt. They're a nearly $3 trillion

380
00:17:48,770 --> 00:17:53,240
company that is going to have the worst assumed about

381
00:17:53,240 --> 00:17:56,600
them until they start talking, at which point, oh, okay.

382
00:17:56,870 --> 00:17:58,730
Now, sure, some people aren't gonna believe what they say.

383
00:17:58,730 --> 00:18:00,919
Some people are always gonna want to needle 'em, and I get that.

384
00:18:01,280 --> 00:18:03,439
But at least they're trying at that point instead

385
00:18:03,439 --> 00:18:05,450
of, well, maybe if we shut up, they'll go away.

386
00:18:05,899 --> 00:18:10,550
Do you think we're going to get an agent optimized GitHub,

387
00:18:10,580 --> 00:18:13,399
or do you think someone else is gonna have to build that?

388
00:18:14,205 --> 00:18:21,375
I am cynical in that this is gonna make me sound ancient, but Git was a marvel.

389
00:18:21,375 --> 00:18:23,355
It was a distributed tool for source control, and

390
00:18:23,355 --> 00:18:25,004
the first thing we did is centralize it again.

391
00:18:25,155 --> 00:18:25,545
Awesome.

392
00:18:25,845 --> 00:18:29,655
It is not that hard in isolation to run a Git repo.

393
00:18:29,655 --> 00:18:32,175
It is a static web server with a few extra bits.

394
00:18:32,355 --> 00:18:35,655
It's all the ecosystem stuff on top of it that starts getting tricky.

395
00:18:35,939 --> 00:18:38,310
It's the, the fact that it sparks off agents, the fact

396
00:18:38,310 --> 00:18:41,340
that it does webhooks, the RBAC, which is no small thing.

397
00:18:41,639 --> 00:18:44,700
The fact that it can track issues, the pull

398
00:18:44,700 --> 00:18:47,220
request model, the discussions around it.

399
00:18:47,490 --> 00:18:50,760
A part of the problem even now is describing what GitHub is exactly.

400
00:18:50,939 --> 00:18:53,730
So some aspects are trivial to replace, uh, for agent scale.

401
00:18:53,760 --> 00:18:56,310
Others, I don't know, boss, that's a heavy lift.

402
00:18:56,685 --> 00:18:59,655
I have a couple friends who are like crazy system

403
00:18:59,805 --> 00:19:03,885
engineers and like last year they built a Git server from

404
00:19:03,885 --> 00:19:06,680
scratch in Rust that is like fully protocol compliant.

405
00:19:07,409 --> 00:19:11,190
And also has like REST APIs for every Git protocol

406
00:19:11,190 --> 00:19:13,409
operation and it's like super performant.

407
00:19:13,440 --> 00:19:15,240
They built it for like vibe coding infrastructure.

408
00:19:15,270 --> 00:19:18,030
It is like every single project on v0, Lovable, all these,

409
00:19:18,090 --> 00:19:20,190
they don't, those aren't, they're companies like that.

410
00:19:20,190 --> 00:19:23,340
Every single time someone opens a browser, you need to create a Git repo.

411
00:19:23,875 --> 00:19:24,985
Now, there are two problems with this.

412
00:19:24,985 --> 00:19:26,785
They have a great shot, but there are two problems with this.

413
00:19:26,845 --> 00:19:27,745
Oh, several actually.

414
00:19:27,835 --> 00:19:31,525
One is everyone can build a tool that solves their particular problem.

415
00:19:32,125 --> 00:19:34,585
Hell, and, and hell is other people's requirements.

416
00:19:34,645 --> 00:19:35,935
I've been down that road enough.

417
00:19:36,264 --> 00:19:38,725
So here's my pitch for you is like, what is the

418
00:19:38,725 --> 00:19:44,335
minimal set of APIs needed to create a headless GitHub?

419
00:19:45,075 --> 00:19:48,045
So that anybody who wants to can kind of vibe code the

420
00:19:48,045 --> 00:19:50,715
front end part, which is like, you know, code still matters,

421
00:19:50,715 --> 00:19:53,235
but like you can't break everybody else's infrastructure.

422
00:19:53,235 --> 00:19:56,625
You can't like, and you can throw it out and rebuild it pretty quickly.

423
00:19:57,075 --> 00:19:58,935
What is the bare set of operations you

424
00:19:58,935 --> 00:20:01,305
need to create something that I can build.

425
00:20:01,365 --> 00:20:02,205
I'm not gonna rebuild GitHub.

426
00:20:02,355 --> 00:20:05,205
I'm not gonna vibe code my own Git server, but if you give

427
00:20:05,205 --> 00:20:08,115
me a really reliable backend that fits the right interface.

428
00:20:08,655 --> 00:20:11,925
I'll happily like build my own front end on it and integrate

429
00:20:11,925 --> 00:20:15,405
it into my vibe coded CRM manager plus project manager plus

430
00:20:15,405 --> 00:20:17,460
like the thing I'm using to run my business, like my

431
00:20:18,105 --> 00:20:21,465
custom SaaS that is built on like solid bones in

432
00:20:21,465 --> 00:20:23,865
the backend, but I bring the information together

433
00:20:23,865 --> 00:20:24,585
how I like.

434
00:20:25,005 --> 00:20:28,215
JGit out of the Eclipse project supports a native Git

435
00:20:28,215 --> 00:20:31,515
repository backend of an S3 bucket or other object store.

436
00:20:31,545 --> 00:20:37,215
So technically that would qualify like S3 is pretty solid.

437
00:20:37,215 --> 00:20:39,735
You're not gonna beat that from a raw infrastructure perspective.

438
00:20:40,125 --> 00:20:40,695
Okay.

439
00:20:40,935 --> 00:20:43,275
And if you don't have too much traffic, 'cause you're

440
00:20:43,275 --> 00:20:45,915
only hosting your own version of it, you could just

441
00:20:45,915 --> 00:20:48,825
run Git on top of S3, and as long as you could run

442
00:20:48,825 --> 00:20:50,835
Git on top of a, uh, Linux box on a Pi

443
00:20:50,835 --> 00:20:52,935
somewhere and just use SSH as your interface.

444
00:20:53,640 --> 00:20:56,070
I guess if you were gonna build this as a product for

445
00:20:56,070 --> 00:20:58,500
other people to... Right, hell is other people's requirements.

446
00:20:58,500 --> 00:21:01,320
Well that's where it gets tricky is because, okay, why?

447
00:21:01,320 --> 00:21:04,110
So you have your friends building this in Rust for vibe coding purposes.

448
00:21:04,170 --> 00:21:04,890
Awesome, great.

449
00:21:05,250 --> 00:21:07,200
Why would I use that instead of vibe coding my own?

450
00:21:07,290 --> 00:21:08,790
Well, so they didn't vibe code this, they,

451
00:21:08,790 --> 00:21:11,700
they like wrote every token by hand. A year ago,

452
00:21:11,730 --> 00:21:13,230
I was like, you guys gotta get on this Claude Code thing.

453
00:21:13,230 --> 00:21:14,820
And they were like, no, it's not good enough.

454
00:21:14,820 --> 00:21:15,900
Our code is perfect. And I'm...

455
00:21:16,500 --> 00:21:17,250
now I'm like, wow.

456
00:21:17,250 --> 00:21:21,570
There are a shrinking number of pieces of software that meet that standard.

457
00:21:21,810 --> 00:21:23,340
There's also a network effect to GitHub.

458
00:21:23,460 --> 00:21:24,450
Everything integrates with it.

459
00:21:24,840 --> 00:21:26,070
The ecosystem is the hard part.

460
00:21:26,070 --> 00:21:27,720
This is why you'll never replace Salesforce either.

461
00:21:27,840 --> 00:21:31,500
It's not the API on top of a database, it's the ecosystem.

462
00:21:31,860 --> 00:21:32,790
I'll take it a step further.

463
00:21:32,790 --> 00:21:34,230
I don't like MCPs for most things.

464
00:21:34,230 --> 00:21:38,520
Like AWS has five or six MCPs that I, I find useless because you've already

465
00:21:38,520 --> 00:21:43,620
got the AWS CLI and in theory, the models already know how to do this.

466
00:21:44,045 --> 00:21:44,885
Which is awesome.

467
00:21:45,245 --> 00:21:48,275
Watching it stumble through trying to get the parameters right, just like I do.

468
00:21:48,275 --> 00:21:51,455
It's like, oh, computers, they're just like us, uh, is fun.

469
00:21:51,455 --> 00:21:53,585
From my perspective, in a cynical, sad way,

470
00:21:53,885 --> 00:21:56,375
sort of an ant farm situation, right?

471
00:21:56,795 --> 00:21:57,125
Yeah.

472
00:21:57,275 --> 00:21:59,735
It can do everything it needs to do without going

473
00:21:59,735 --> 00:22:02,255
down the MCP path, that clutters the context window.

474
00:22:02,585 --> 00:22:06,845
So yes, and I think this is one of the most common complaints about MCP.

475
00:22:06,845 --> 00:22:08,735
I think my pushback on that would be like.

476
00:22:09,480 --> 00:22:13,950
That is only true if you have a Bash tool, and in a lot of cases, you want

477
00:22:13,950 --> 00:22:19,050
to run an agent without a bash tool for safety, security, reliability.

478
00:22:19,050 --> 00:22:21,720
I actually think one of my predictions is by the end of 2026,

479
00:22:21,720 --> 00:22:24,810
most agents are gonna remove the Bash tool and replace it with

480
00:22:24,810 --> 00:22:29,700
something either like more narrow and scoped or some minimal

481
00:22:30,120 --> 00:22:33,810
Bash-like thing that has a lot less, uh, flexibility.

482
00:22:34,170 --> 00:22:37,590
I think we're gonna find out because that's a really interesting point of view.

483
00:22:37,765 --> 00:22:39,540
A, a challenge that I would have here in your

484
00:22:39,540 --> 00:22:41,940
shoes, trying to help people use these tools better.

485
00:22:42,600 --> 00:22:45,000
Why don't I just put on my enterprise pants,

486
00:22:45,540 --> 00:22:49,290
do an evaluation that's 18 months, and by that point, we're in a brave new

487
00:22:49,290 --> 00:22:52,680
world again, because this stuff is iterating so quickly, why wouldn't I just

488
00:22:52,680 --> 00:22:55,920
wait for the foundation models to improve and solve these problems for me?

489
00:22:56,354 --> 00:23:00,854
Well, if you need 18 months to make a decision, then you probably should.

490
00:23:00,975 --> 00:23:05,415
I think that the reason that I wrote that paper about

491
00:23:05,415 --> 00:23:08,385
context engineering a year ago, that was like basically

492
00:23:08,385 --> 00:23:11,385
like, Hey look, I built a thing for the agent ecosystem.

493
00:23:11,385 --> 00:23:14,354
Turns out nobody who's shipping vertical AI to the

494
00:23:14,354 --> 00:23:16,574
enterprise and actually like delivering results

495
00:23:16,965 --> 00:23:18,615
is using any of that stuff.

496
00:23:18,645 --> 00:23:20,715
They're all ignoring the bitter lesson.

497
00:23:20,715 --> 00:23:24,225
They're all building very specific prompts and pipelines and workflows

498
00:23:24,225 --> 00:23:28,545
to improve the capabilities of today's models, was 'cause I, I really

499
00:23:28,545 --> 00:23:34,485
believe now that there will always be a frontier for the model, right?

500
00:23:34,485 --> 00:23:35,235
And it's very jagged.

501
00:23:35,685 --> 00:23:38,685
You have certain things it can do at 40% accuracy, certain things it can do at

502
00:23:38,685 --> 00:23:42,885
99% accuracy, and everything in between, for every single task under the sun.

503
00:23:43,215 --> 00:23:47,024
From coding to healthcare to law, to every

504
00:23:47,024 --> 00:23:48,375
single thing you could wanna do, right?

505
00:23:48,615 --> 00:23:50,325
Well, except for the thing that whatever listener is

506
00:23:50,325 --> 00:23:52,485
listening to this and saying, well, that's the thing I do.

507
00:23:52,485 --> 00:23:55,245
Therefore, it could never truly be replaced by a computer.

508
00:23:57,465 --> 00:23:58,004
Yes.

509
00:23:58,004 --> 00:23:59,264
Many such cases.

510
00:23:59,655 --> 00:24:02,385
Probably our entire pitch right, is like, Hey, there's things the

511
00:24:02,385 --> 00:24:04,455
models are good at and the things that the models aren't good at,

512
00:24:04,455 --> 00:24:06,225
and we don't think they're gonna get good at them anytime soon.

513
00:24:06,960 --> 00:24:08,940
And so we are obsessed with building workflows

514
00:24:08,940 --> 00:24:10,980
of like, how do you give humans more leverage?

515
00:24:10,980 --> 00:24:11,400
Right?

516
00:24:11,520 --> 00:24:14,610
Where are the parts where like, yes, a model may eventually get this

517
00:24:14,610 --> 00:24:17,070
right, or if you throw enough tokens at the problem, the, the model

518
00:24:17,070 --> 00:24:20,970
might get it right, but the performance is still low enough that like,

519
00:24:21,000 --> 00:24:25,560
if you put a human in here, it is high leverage for a human to read it.

520
00:24:25,560 --> 00:24:25,800
You know?

521
00:24:25,890 --> 00:24:28,230
For example, read a 200 line markdown doc that

522
00:24:28,230 --> 00:24:30,180
summarizes a code change we're gonna make.

523
00:24:30,405 --> 00:24:34,350
And iterate at the 25,000 foot level before going down into the

524
00:24:34,350 --> 00:24:37,290
weeds and writing the thousand or 2000 lines of code or whatever.

525
00:24:37,290 --> 00:24:37,650
It's,

526
00:24:38,010 --> 00:24:42,180
so we've encountered an inflection point recently where it happened

527
00:24:42,180 --> 00:24:45,660
very quickly, where open source projects got a bunch of security

528
00:24:45,660 --> 00:24:49,770
reports that were AI powered, slop nonsense, and that was terrible.

529
00:24:50,250 --> 00:24:52,200
And at some point, now

530
00:24:52,725 --> 00:24:54,165
they're still getting a bunch of them, but they're

531
00:24:54,165 --> 00:24:57,735
all valid and good and actual security problems.

532
00:24:57,945 --> 00:25:02,024
People are turning off their bug bounty program just because they need to.

533
00:25:02,054 --> 00:25:04,935
They need to deal with the influx of this and cynically, they

534
00:25:04,935 --> 00:25:09,165
didn't budget for this, which I get, but it's wild now where it

535
00:25:09,165 --> 00:25:11,895
feels like I could take Claude Code, throw it at some well-known

536
00:25:11,895 --> 00:25:14,985
tool, like, "Great, find the following type of security problem.

537
00:25:15,044 --> 00:25:16,784
Go," with a little bit of steering.

538
00:25:17,054 --> 00:25:17,264
Yeah.

539
00:25:17,264 --> 00:25:21,975
The supply curve for discovered CVEs has shifted way to the right.

540
00:25:22,334 --> 00:25:26,504
It's become much, much cheaper, faster, and easier to

541
00:25:26,504 --> 00:25:30,705
find vulnerabilities, and so basic macroeconomics, right?

542
00:25:30,705 --> 00:25:31,419
The price must fall then.

543
00:25:32,115 --> 00:25:34,034
Like, everyone's gonna need to cut their

544
00:25:34,034 --> 00:25:36,705
bug bounty from $200 a finding to $2 a finding.

545
00:25:37,094 --> 00:25:39,645
And then at some point it's like, well, all right, I have a zero

546
00:25:39,645 --> 00:25:42,735
day that gets me remote access to any EC2 instance out there.

547
00:25:43,034 --> 00:25:45,915
Like I don't care what the bug bounty is because that's worth millions

548
00:25:45,915 --> 00:25:49,064
and millions and millions of dollars of a zero day on certain markets.

549
00:25:49,064 --> 00:25:51,195
Similar to I have an iPhone zero day.

550
00:25:51,284 --> 00:25:52,215
Uh, okay.

551
00:25:52,844 --> 00:25:55,334
Maybe that's basically, do you want to do

552
00:25:55,334 --> 00:25:56,570
the right thing or do you want to be rich?

553
00:25:57,304 --> 00:26:00,335
I would like to believe there's a path to do both.

554
00:26:00,425 --> 00:26:00,995
I do too.

555
00:26:00,995 --> 00:26:01,955
I have to sleep at night.

556
00:26:02,105 --> 00:26:02,345
Yes.

557
00:26:03,365 --> 00:26:06,665
But this does tie back to something you said at the beginning where as

558
00:26:06,665 --> 00:26:09,605
I'm using this to figure out what those USB codes are, whenever I swipe

559
00:26:09,605 --> 00:26:12,665
my, uh, finger on the, uh, fingerprint reader built into the keyboard.

560
00:26:12,845 --> 00:26:13,235
You're right.

561
00:26:13,235 --> 00:26:15,695
If I'm starting to try, like use the steal Bloomberg stuff, as you

562
00:26:15,695 --> 00:26:19,145
mentioned, that could wind up getting me turned off by Anthropic

563
00:26:19,595 --> 00:26:23,014
security research, though clearly that is not happening at scale.

564
00:26:23,254 --> 00:26:25,534
How is this being navigated by the providers?

565
00:26:25,949 --> 00:26:28,679
I listened to a really good podcast with Boris Cherny, with

566
00:26:28,679 --> 00:26:31,709
Ryan Peterman, and he talks about like just some of the safety.

567
00:26:31,709 --> 00:26:33,389
It was a very short snippet of it, but they're talking

568
00:26:33,389 --> 00:26:36,510
about the safety requirements and safety is not just like,

569
00:26:36,945 --> 00:26:39,375
is the model gonna go Terminator and kill us all?

570
00:26:39,405 --> 00:26:42,165
It's like they have test environments, they have models they

571
00:26:42,165 --> 00:26:44,445
haven't shipped because they found, so someone found out that

572
00:26:44,445 --> 00:26:47,895
the model would, if you prompted it, like not even that hard,

573
00:26:48,045 --> 00:26:50,505
you could get it to help you develop a biological weapon.

574
00:26:50,745 --> 00:26:51,795
It's for a novel.

575
00:26:52,065 --> 00:26:52,455
Yes.

576
00:26:53,505 --> 00:26:54,225
Yeah, exactly.

577
00:26:54,225 --> 00:26:55,305
I'm, I'm writing sci-fi.

578
00:26:55,335 --> 00:26:56,895
Uh, how would, how would you do this?

579
00:26:57,195 --> 00:26:59,985
It, it's the same problem you have in all security scenarios, right?

580
00:26:59,985 --> 00:27:05,565
Where there's a huge asymmetry of like an attacker has to find one tiny hole.

581
00:27:06,120 --> 00:27:09,179
And the defender has to cover all infinite

582
00:27:09,179 --> 00:27:12,090
potential holes in the security boundary.

583
00:27:12,570 --> 00:27:17,100
I do not envy the model providers here. We are dealing with, in many ways,

584
00:27:17,100 --> 00:27:18,510
what is a frontier ethics problem.

585
00:27:18,915 --> 00:27:19,995
Frontier ethics,

586
00:27:20,055 --> 00:27:21,435
right versus wrong.

587
00:27:21,615 --> 00:27:25,365
For example, putting content, even the training of the models, putting a, uh,

588
00:27:25,365 --> 00:27:29,145
blog post that you write out, that you wrote by hand out on the internet for

589
00:27:29,145 --> 00:27:33,705
anyone who comes by to read. Great, awesome. Models come and train on all of it.

590
00:27:33,765 --> 00:27:36,075
Well, okay, now is that acceptable use?

591
00:27:36,105 --> 00:27:36,885
Is it not?

592
00:27:37,125 --> 00:27:39,495
Because that is how humans wind up learning things.

593
00:27:39,615 --> 00:27:40,845
It's only a question of scale.

594
00:27:41,135 --> 00:27:43,385
Maybe that doesn't make sense, but it does seem to me that we

595
00:27:43,385 --> 00:27:46,175
are pushing ethical boundaries and frontiers all the time with

596
00:27:46,175 --> 00:27:48,665
ways that copyright wasn't designed to deal with.

597
00:27:49,295 --> 00:27:50,825
Yeah, it's, it's super interesting.

598
00:27:50,825 --> 00:27:55,235
There's like a, there's like a price, there's like now baked into our

599
00:27:55,235 --> 00:27:59,315
ethics of like, what is acceptable reuse of someone else's material.

600
00:27:59,655 --> 00:28:03,105
There is a like price we put on of like, Hey, if you're gonna go

601
00:28:03,105 --> 00:28:06,285
read an article and then spend three hours yourself slaving over a

602
00:28:06,285 --> 00:28:09,435
blog post that has some quotes and citations and it's well made and

603
00:28:09,435 --> 00:28:13,095
it's well written and you put a lot of effort into it, that's okay.

604
00:28:13,335 --> 00:28:15,795
But if someone else just slops out a bunch of copy that's

605
00:28:15,795 --> 00:28:18,055
like, I don't wanna say it's unethical, but it's like,

606
00:28:18,945 --> 00:28:20,925
it is not valued human behavior.

607
00:28:20,985 --> 00:28:23,325
Like we are all smart enough to realize that like we,

608
00:28:23,325 --> 00:28:26,295
we as humans value like effort and investment and like

609
00:28:26,295 --> 00:28:28,605
what makes art good is not what the thing looks like.

610
00:28:28,605 --> 00:28:30,915
I mean, part of it is it has to look good, but like you look at

611
00:28:30,915 --> 00:28:33,615
a painting in a museum, part of what makes it good is the story

612
00:28:33,615 --> 00:28:36,765
that went into it and the emotion and energy that went into it.

613
00:28:36,765 --> 00:28:37,635
That makes you appreciate it.

614
00:28:37,965 --> 00:28:38,145
Yeah.

615
00:28:38,145 --> 00:28:39,075
That's how it makes you feel.

616
00:28:39,645 --> 00:28:39,975
Yeah.

617
00:28:40,035 --> 00:28:41,655
I mean, we talked about technical writing a lot.

618
00:28:41,655 --> 00:28:43,515
I, I do want to quickly come back to your

619
00:28:43,515 --> 00:28:45,020
question 'cause I think, I think I would like...

620
00:28:45,689 --> 00:28:49,229
We both love tangents and this is my third cold brew of the day.

621
00:28:49,260 --> 00:28:52,469
But you asked something about like, why invest in all of these

622
00:28:52,469 --> 00:28:55,199
workflows and, and, and prompting and, and getting the most

623
00:28:55,199 --> 00:28:58,110
outta the models today if they just get smarter in a generation.

624
00:28:58,379 --> 00:29:00,120
And then all of that is now irrelevant.

625
00:29:00,449 --> 00:29:03,840
Yeah, I got my, my 2024 book, ChatGPT for Dummies.

626
00:29:03,870 --> 00:29:06,510
Uh, why can't I just use that for all my prompting tips?

627
00:29:06,689 --> 00:29:08,429
Well, so I, I think there's, there's an interesting

628
00:29:08,429 --> 00:29:11,729
like, set of skills that are translatable across models.

629
00:29:12,210 --> 00:29:16,260
They're not translatable across like building harnesses or workflows around

630
00:29:16,260 --> 00:29:21,090
models for a specific task, but understanding like how transformer based

631
00:29:21,090 --> 00:29:26,130
attention works and the quadratic nature of attention and the like increasing

632
00:29:26,130 --> 00:29:29,610
cost and decreasing quality of results you get as you put more and more

633
00:29:29,610 --> 00:29:34,530
into the context window is a skill set that will be relevant no matter how.

634
00:29:34,980 --> 00:29:37,860
Like as long as we have transformer based attention and

635
00:29:37,860 --> 00:29:41,490
nobody has been able to come up with an attention model.

636
00:29:42,255 --> 00:29:43,455
That beats transformers.

637
00:29:43,485 --> 00:29:44,265
They have linear attention.

638
00:29:44,265 --> 00:29:45,105
We have Mamba, Jamba.

639
00:29:45,105 --> 00:29:48,075
It's like, yes, you have achieved linear attention, but

640
00:29:48,075 --> 00:29:50,895
you have somehow regressed on everything else, like all

641
00:29:50,895 --> 00:29:53,505
the tasks and the usefulness is not, is not there yet.

642
00:29:53,835 --> 00:29:56,235
And so I think there's this skillset that like if people

643
00:29:56,235 --> 00:29:58,725
are working with ai, you have kind of three options.

644
00:29:58,905 --> 00:30:02,775
You can kind of like yolo out prompts and just be like, cool.

645
00:30:02,775 --> 00:30:05,985
It's not worth trying anything more than just take the smartest model

646
00:30:05,985 --> 00:30:08,715
and do the minimum effort and see what it can do and be happy with that.

647
00:30:09,254 --> 00:30:11,625
Or you can like learn how to push those models

648
00:30:11,625 --> 00:30:15,105
10 to 15% further on specific tasks, right?

649
00:30:15,105 --> 00:30:17,834
And maybe you make them worse at certain tasks and better at other tasks

650
00:30:17,925 --> 00:30:20,804
by the way that you prompt them or the way you like stitch together context

651
00:30:20,804 --> 00:30:24,405
windows in a workflow and then the next frontier model gets, comes out.

652
00:30:24,675 --> 00:30:28,395
And it's better in every way than all of the custom code you wrote.

653
00:30:28,784 --> 00:30:31,995
But those skills of understanding how context windows work and how

654
00:30:31,995 --> 00:30:35,985
attention works and how to get more out of a model today is still

655
00:30:35,985 --> 00:30:38,235
gonna translate and it's gonna enable you with a little bit of work.

656
00:30:38,715 --> 00:30:42,044
But if you're constantly like at the frontier trying to push things

657
00:30:42,044 --> 00:30:45,465
to their limits, if you understand these things and you invest in

658
00:30:45,465 --> 00:30:49,274
this like core intuition about LLMs, you will always be able to

659
00:30:49,274 --> 00:30:53,534
generate a solution that is 10, 15% better, maybe 50% better to

660
00:30:53,534 --> 00:30:57,195
specific task, because you're kind of applying these base concepts.

661
00:30:57,284 --> 00:30:59,835
And so people tell me like, Dex, this is all gonna get bitter-lessoned.

662
00:30:59,835 --> 00:31:00,075
And I'm like,

663
00:31:00,600 --> 00:31:03,600
I think that's how we get to AGI. I mean, swyx said this too,

664
00:31:03,600 --> 00:31:07,410
is like the way we get to AGI is we continually like ignore

665
00:31:07,410 --> 00:31:10,050
the bitter lesson and try to make these things better.

666
00:31:10,260 --> 00:31:12,120
And that's how we learn what the next generation

667
00:31:12,120 --> 00:31:13,410
of model needs to do over and over again.

668
00:31:14,689 --> 00:31:17,205
That is fractally weird, if that makes sense.

669
00:31:17,445 --> 00:31:18,165
It's a little weird.

670
00:31:18,165 --> 00:31:19,245
We'll see how it plays out.

671
00:31:19,365 --> 00:31:21,885
The, the cynical thing you could say is like, here we are engineers

672
00:31:21,885 --> 00:31:24,885
trying to make sense of this crazy new world that's moving so, so fast

673
00:31:24,885 --> 00:31:27,794
and trying to figure out how we can add value to a thing that's there.

674
00:31:27,794 --> 00:31:31,395
And then retcon justifying of like, no, it's worth putting in this effort.

675
00:31:31,395 --> 00:31:33,254
'cause the next models will be smarter, but I'll be able

676
00:31:33,254 --> 00:31:36,014
to make them even smarter over and over again until AGI.

677
00:31:37,100 --> 00:31:38,840
If people wanna learn more about what you're up to and how

678
00:31:38,840 --> 00:31:40,940
you view the world, where's the best place to find you?

679
00:31:41,149 --> 00:31:43,460
If you want the cutting edge stuff, just follow me on Twitter.

680
00:31:43,460 --> 00:31:46,399
I'm Dex Horthy, D-E-X-H-O-R-T-H-Y.

681
00:31:46,580 --> 00:31:48,830
And then, you know, we're building products in this space.

682
00:31:48,830 --> 00:31:50,360
You can go to humanlayer.dev.

683
00:31:50,389 --> 00:31:51,800
We will be launching soon.

684
00:31:51,800 --> 00:31:52,399
I know, I get.

685
00:31:52,650 --> 00:31:55,590
You can come hang out at our discord, but it's literally just a wall of

686
00:31:55,590 --> 00:31:58,710
angry people asking me like, when the heck are you gonna launch this thing?

687
00:31:58,740 --> 00:32:00,900
We're kind of in private preview with a small group.

688
00:32:00,960 --> 00:32:03,570
We are looking forward to giving it to more people soon.

689
00:32:03,570 --> 00:32:06,060
But if you go to humanlayer.dev, you can sign up on the list, you'll get the

690
00:32:06,060 --> 00:32:10,050
launch announcements and you can uh, see some of the fun stuff we're hacking on,

691
00:32:10,380 --> 00:32:12,420
And we'll put links to that in the show notes.

692
00:32:12,870 --> 00:32:13,170
Dex.

693
00:32:13,170 --> 00:32:14,940
Thank you so much for taking the time to speak with me.

694
00:32:14,940 --> 00:32:15,630
I appreciate it.

695
00:32:15,990 --> 00:32:18,900
This was a delightful journey around a bunch of places I did

696
00:32:18,900 --> 00:32:21,360
not expect to be talking about, but I had fun the whole way.

697
00:32:21,480 --> 00:32:22,830
That's the entire point.

698
00:32:23,400 --> 00:32:26,550
Dex Horthy, CEO, and co-founder of Human Layer.

699
00:32:26,820 --> 00:32:29,940
I'm Cloud economist Corey Quinn, and this is Screaming In the Cloud.

700
00:32:30,000 --> 00:32:31,920
If you've enjoyed this podcast, please leave a five

701
00:32:31,920 --> 00:32:33,990
star review on your podcast platform of choice.

702
00:32:34,350 --> 00:32:36,990
Whereas if you've hated this episode, please leave a five star

703
00:32:36,990 --> 00:32:40,020
review on your podcast platform of choice, and then have your model

704
00:32:40,110 --> 00:32:42,750
write a dumb comment on that platform, and then we'll just wait

705
00:32:42,750 --> 00:32:45,389
for a smarter model to come along that can dunk on you right back.