1
00:00:00,010 --> 00:00:02,310
My general advice to people is start with Postgres before you

2
00:00:02,310 --> 00:00:04,720
start with any of these specialized databases, whether that's

3
00:00:04,720 --> 00:00:07,330
a document database or any of those others, start with Postgres.

4
00:00:07,550 --> 00:00:09,840
And if then you run into scaling issues or something else, go read

5
00:00:09,840 --> 00:00:12,290
the documentation because you're probably doing it not optimally.

6
00:00:18,259 --> 00:00:21,670
Welcome to Fork Around and Find Out, the podcast about

7
00:00:21,670 --> 00:00:24,790
building, running, and maintaining software and systems.

8
00:00:37,735 --> 00:00:40,305
Welcome to Fork Around and Find Out. I am Justin

9
00:00:40,305 --> 00:00:42,485
Garrison, and with me as always is Autumn Nash.

10
00:00:42,505 --> 00:00:45,065
And today on the episode, we have Steve Pousty,

11
00:00:45,295 --> 00:00:47,595
principal developer advocate at Voxel51.

12
00:00:47,785 --> 00:00:50,425
In this episode, you're going to learn about how to embed things

13
00:00:50,505 --> 00:00:53,305
in your head for vector databases or something like that.

14
00:00:53,315 --> 00:00:57,104
It's because I saw Steve at All Things Open in 2024.

15
00:00:57,144 --> 00:01:01,785
I walked into your talk, Steve, about vectors, databases, and embeddings.

16
00:01:02,025 --> 00:01:03,845
And I had been asking at every

17
00:01:04,035 --> 00:01:04,614
talk,

18
00:01:04,965 --> 00:01:05,895
What is a vector database?

19
00:01:05,895 --> 00:01:06,485
Why do I need it?

20
00:01:06,485 --> 00:01:07,655
And none of them gave me a good answer.

21
00:01:07,655 --> 00:01:09,185
And then you gave great descriptions of it.

22
00:01:09,185 --> 00:01:14,594
So I said, you need to come tell the rest of the people because 2025 everyone

23
00:01:14,594 --> 00:01:18,004
learned they didn't have a vector database and for some reason they needed one.

24
00:01:18,035 --> 00:01:19,015
And so I wanted to have you.

25
00:01:19,134 --> 00:01:22,235
Do you know how many podcast guests he's asked that question?

26
00:01:22,615 --> 00:01:22,905
And it's

27
00:01:23,235 --> 00:01:24,155
been a running joke.

28
00:01:24,205 --> 00:01:24,435
Yes.

29
00:01:24,505 --> 00:01:24,885
Really?

30
00:01:24,995 --> 00:01:26,385
And they just like, they can't explain it.

31
00:01:26,385 --> 00:01:26,645
Well,

32
00:01:26,685 --> 00:01:27,355
most of them can't.

33
00:01:27,395 --> 00:01:27,575
Yeah.

34
00:01:27,575 --> 00:01:30,635
They, they can't explain at least to the, the, the dumb

35
00:01:30,635 --> 00:01:33,985
down simpleness that you did for me, which was great.

36
00:01:34,745 --> 00:01:36,735
This is because people don't care about pedagogy.

37
00:01:37,044 --> 00:01:37,874
That's what I have to say.

38
00:01:38,265 --> 00:01:40,094
I love when people get excited about like,

39
00:01:40,484 --> 00:01:42,265
you just, you missed his whole face.

40
00:01:42,265 --> 00:01:43,905
Like he lit up and he was like, I'm ready.

41
00:01:43,905 --> 00:01:46,135
Call me in for vector database right here.

42
00:01:46,145 --> 00:01:47,035
Like I got this.

43
00:01:47,265 --> 00:01:50,704
I got, this is exciting because I've also talked about it enough.

44
00:01:50,955 --> 00:01:55,565
I figured out the, let's start from things you know, which

45
00:01:55,565 --> 00:01:59,255
is pedagogy to things you don't know and make that bridge

46
00:01:59,255 --> 00:02:02,024
explicit so that you can ground yourself in what you do know.

47
00:02:02,054 --> 00:02:02,414
So that's

48
00:02:02,414 --> 00:02:03,854
why I give talks and do stuff.

49
00:02:03,854 --> 00:02:05,714
That is my favorite moment.

50
00:02:06,135 --> 00:02:09,884
Working in tech is when you take something and you like go down the rabbit

51
00:02:09,884 --> 00:02:12,895
hole and you finally get to the point where you understand it in a way

52
00:02:12,895 --> 00:02:16,445
that you could use an analogy to like, either make it a real world thing

53
00:02:16,445 --> 00:02:20,195
or something that people will understand, but to explain a technically

54
00:02:20,195 --> 00:02:23,184
deep concept, that's like the, like, I love that light bulb moment.

55
00:02:23,185 --> 00:02:23,925
I live for it.

56
00:02:24,355 --> 00:02:26,785
Before we go too deep down the rabbit hole, I want to do a little

57
00:02:26,785 --> 00:02:29,285
bit of housekeeping for the podcast, just for everyone listening in

58
00:02:29,445 --> 00:02:33,685
FYI, we are going to be moving this podcast to a monthly cadence.

59
00:02:34,045 --> 00:02:35,145
Uh, we also

60
00:02:35,440 --> 00:02:37,410
are going to be editing the show ourselves, which

61
00:02:37,420 --> 00:02:40,610
means I don't have enough time to bleep out any words.

62
00:02:40,680 --> 00:02:43,000
So there might be a little more, hopefully

63
00:02:43,000 --> 00:02:44,170
the levels are still going to be fine.

64
00:02:44,180 --> 00:02:45,750
There might be a little more distractions.

65
00:02:46,080 --> 00:02:49,860
A lot more ums and likes possibly, but definitely

66
00:02:49,860 --> 00:02:52,170
some more things that would have been bleeped before.

67
00:02:52,170 --> 00:02:54,209
So if you're listening to this show with kids or in a

68
00:02:54,209 --> 00:02:57,399
public space, that's a, you don't want that going on.

69
00:02:57,409 --> 00:03:00,079
Then just FYI, probably this episode going

70
00:03:00,079 --> 00:03:01,699
forward, we might have some of that.

71
00:03:01,699 --> 00:03:04,129
And I'll, I'll try to warn some people for the next couple of episodes in

72
00:03:04,130 --> 00:03:07,620
case you miss one, but I just want to put that out there because we have been

73
00:03:07,630 --> 00:03:10,690
for the last three months paying editors, which have done a fantastic job.

74
00:03:10,740 --> 00:03:12,190
Uh, it's just a, a lot of.

75
00:03:12,810 --> 00:03:14,010
We don't have, we can't afford it.

76
00:03:14,549 --> 00:03:16,720
And we're, so we're moving, moving the podcast

77
00:03:16,720 --> 00:03:18,140
a little bit more to make it sustainable.

78
00:03:18,769 --> 00:03:21,540
So you mean everyone doesn't teach their kids like bad

79
00:03:21,540 --> 00:03:24,450
songs and NWA and just tell them to not say the bad words?

80
00:03:24,549 --> 00:03:24,799
No,

81
00:03:25,589 --> 00:03:26,739
I just let them listen to it.

82
00:03:26,739 --> 00:03:27,729
And I don't say anything at all.

83
00:03:27,759 --> 00:03:29,129
I mean, not, not, not NWA.

84
00:03:29,189 --> 00:03:30,939
That's not the kind of rap I like, but in other

85
00:03:30,939 --> 00:03:33,829
songs, I'm more of an old school East coast guy.

86
00:03:33,920 --> 00:03:34,170
So.

87
00:03:35,135 --> 00:03:35,485
East Coast?

88
00:03:35,735 --> 00:03:37,355
You live on the West Coast, Steve!

89
00:03:37,355 --> 00:03:38,115
Yo, I'm from

90
00:03:38,395 --> 00:03:41,595
f ing New York, don't you give me that b h!

91
00:03:41,595 --> 00:03:44,214
Yo, I'm not here for that, I'm not here for that.

92
00:03:44,215 --> 00:03:44,385
I

93
00:03:45,755 --> 00:03:48,544
mean, I am definitely a West Coast 90s,

94
00:03:48,594 --> 00:03:51,104
like, person, but we love you anyway, Steve.

95
00:03:51,115 --> 00:03:51,885
Thanks, that's really nice.

96
00:03:52,185 --> 00:03:55,155
I mean, I like some West Coast, like, like The Pharcyde, but

97
00:03:55,155 --> 00:03:57,755
The Pharcyde feels kind of like East Coast though, right?

98
00:03:57,785 --> 00:03:59,505
Like, it's not that same kind of How dare

99
00:03:59,505 --> 00:03:59,665
you?

100
00:03:59,865 --> 00:04:00,865
That's where you were gonna pick.

101
00:04:03,030 --> 00:04:04,040
What's wrong with The Pharcyde?

102
00:04:04,120 --> 00:04:05,390
Okay, Steve, do you drink coffee?

103
00:04:05,460 --> 00:04:06,140
Generally, no.

104
00:04:06,500 --> 00:04:06,750
Is that

105
00:04:06,750 --> 00:04:07,170
a good or a bad thing?

106
00:04:08,680 --> 00:04:10,430
I'm going to start keeping a hash over here on my wall.

107
00:04:10,430 --> 00:04:12,130
I know, we need a tally on who

108
00:04:12,130 --> 00:04:15,390
uses Mac and Linux, and who drinks coffee.

109
00:04:15,750 --> 00:04:19,319
Linux, Linux, Linux.

110
00:04:19,399 --> 00:04:23,520
Okay, so, the only reason I run Windows is because I play a lot of video games.

111
00:04:23,650 --> 00:04:27,280
And so, video games, I have a Steam Deck and I can run some

112
00:04:27,280 --> 00:04:30,090
of the games on there, but if you want to run like a triple

113
00:04:30,090 --> 00:04:33,560
A, like CoD or something like that, I need a Windows machine.

114
00:04:34,240 --> 00:04:38,019
If I could be Linux every day in and day out, Linux day in and

115
00:04:38,019 --> 00:04:38,059
day out.

116
00:04:38,059 --> 00:04:40,540
Shout out to Bazzite, uh, the Linux distro,

117
00:04:40,550 --> 00:04:42,870
which I absolutely love on my Steam Deck.

118
00:04:42,940 --> 00:04:46,080
See, that's why I run like Linux on a dev desktop.

119
00:04:46,255 --> 00:04:49,295
And use a Mac because I like it when everything syncs together

120
00:04:49,325 --> 00:04:51,695
and I don't have to think about stuff in my personal life.

121
00:04:52,495 --> 00:04:56,055
Well, so here's the problem for me with using Mac. Mac's my least

122
00:04:56,055 --> 00:04:58,425
favorite operating system, by the way, because it's close enough to

123
00:04:58,425 --> 00:05:01,855
Linux without being Linux, that all my muscle memory is frustration.

124
00:05:02,070 --> 00:05:05,820
That's what I love about it though, because it gives me the continuity,

125
00:05:06,229 --> 00:05:10,640
but I can use all my, like, Unix commands, or Linux commands, on Unix.

126
00:05:10,669 --> 00:05:13,859
Yeah, but it's Z shell and stuff like it's like, it's, it's BSD.

127
00:05:13,869 --> 00:05:14,640
So no.

128
00:05:14,710 --> 00:05:15,270
Once you use

129
00:05:15,279 --> 00:05:17,280
sed, if you don't have GNU sed, you're done.

130
00:05:17,309 --> 00:05:18,999
Like, it's just like, I'm, as soon as I

131
00:05:18,999 --> 00:05:20,639
go to sed and on a Mac, I'm like, I'm out.

132
00:05:20,639 --> 00:05:21,399
I'm installing

133
00:05:21,449 --> 00:05:22,169
PowerShell.

134
00:05:23,080 --> 00:05:23,430
Oh, no, no.

135
00:05:23,430 --> 00:05:24,450
I don't do any of that kind of stuff.

136
00:05:24,460 --> 00:05:24,469
Oh,

137
00:05:24,820 --> 00:05:26,940
I was going to say like, how would that be better?

138
00:05:26,940 --> 00:05:27,090
Like,

139
00:05:27,310 --> 00:05:30,429
no, no, no, no, no, let's, let's be calm and clear here.

140
00:05:30,969 --> 00:05:32,320
He was like, well, too far, too

141
00:05:32,710 --> 00:05:33,799
far, too far, too fast.

142
00:05:34,260 --> 00:05:38,019
No, I mean, I played with PowerShell just a little bit, but like

143
00:05:39,389 --> 00:05:40,330
sound effects.

144
00:05:40,930 --> 00:05:41,330
Sorry.

145
00:05:41,330 --> 00:05:42,940
I mean, PowerShell, I mean, it.

146
00:05:43,609 --> 00:05:45,880
Kudos to Microsoft for at least moving off of DOS.

147
00:05:46,219 --> 00:05:49,180
So, kudos to them for that, but I mostly have WS... If I

148
00:05:49,310 --> 00:05:52,229
need to do that kind of stuff on my Windows box, it's WSL.

149
00:05:52,740 --> 00:05:56,229
Which is a way better Linux environment than a Mac is ever going to give you.

150
00:05:56,299 --> 00:05:57,110
And so it's just like

151
00:05:57,110 --> 00:05:58,649
Okay, I don't want to be in this religious war.

152
00:05:58,650 --> 00:05:59,780
I'm clocking out.

153
00:05:59,780 --> 00:06:01,349
I'm just letting you know.

154
00:06:01,799 --> 00:06:02,169
You see,

155
00:06:02,580 --> 00:06:07,099
Justin just leads you into spicy and Steve was not, like, not today, Satan.

156
00:06:07,150 --> 00:06:07,599
Not today.

157
00:06:07,599 --> 00:06:08,260
Today.

158
00:06:08,299 --> 00:06:09,109
I've been down this

159
00:06:09,109 --> 00:06:09,449
road before.

160
00:06:09,449 --> 00:06:11,390
Next, you guys are going to be like, what's your favorite IDE?

161
00:06:11,640 --> 00:06:15,659
And if any, if you guys say anything, don't say it because if you say anything,

162
00:06:15,989 --> 00:06:18,669
yes, if you say anything other than a JetBrains product, I'm hanging up.

163
00:06:18,669 --> 00:06:19,049
Okay.

164
00:06:19,050 --> 00:06:19,569
See?

165
00:06:19,820 --> 00:06:25,350
We, I was, I'm glad Justin, he has to be like the super special

166
00:06:25,370 --> 00:06:29,659
snowflake of IDEs, operating systems and all of the things.

167
00:06:30,020 --> 00:06:32,560
And he doesn't drink coffee and he has to have Dr. Pepper,

168
00:06:32,560 --> 00:06:35,879
which means next week we will be walking up and down the

169
00:06:35,880 --> 00:06:38,670
streets of California, trying to find him Dr. Pepper.

170
00:06:38,699 --> 00:06:39,340
I'll bring my own

171
00:06:39,410 --> 00:06:39,980
Dr. Pepper.

172
00:06:40,010 --> 00:06:40,360
It's fine.

173
00:06:40,660 --> 00:06:43,090
You said that last time when you did not bring enough Dr. Pepper.

174
00:06:43,120 --> 00:06:43,520
That's true.

175
00:06:43,520 --> 00:06:44,090
It wasn't cold.

176
00:06:44,090 --> 00:06:44,980
That was the problem.

177
00:06:45,240 --> 00:06:47,360
Anyway, back to the topic at hand.

178
00:06:47,539 --> 00:06:49,190
I just use Neovim.

179
00:06:49,310 --> 00:06:53,670
It's mostly because I do remote desktop, like I just,

180
00:06:53,670 --> 00:06:55,080
y'all didn't see Steve's eyebrows.

181
00:06:55,130 --> 00:06:57,220
Like you just missed a gem.

182
00:06:57,240 --> 00:07:00,530
Steve's eyebrows, the judgment and his eyebrows.

183
00:07:00,570 --> 00:07:00,980
Like

184
00:07:01,530 --> 00:07:04,440
I have never given IntelliJ a fair chance.

185
00:07:04,530 --> 00:07:08,085
Like I tried VS Code for years.

186
00:07:08,085 --> 00:07:10,819
Also, I'm really bitter that they got rid of Atom and like Atom was

187
00:07:10,819 --> 00:07:11,209
like my

188
00:07:11,209 --> 00:07:11,404
favorite.

189
00:07:11,555 --> 00:07:12,705
And I'm going to bite my tongue on that

190
00:07:12,705 --> 00:07:13,005
one, too.

191
00:07:13,215 --> 00:07:14,695
Now I don't like either one of you, but, um,

192
00:07:17,755 --> 00:07:18,825
the thing with Neovim.

193
00:07:18,825 --> 00:07:21,165
So look, here's the thing I'm going to say about like those command line

194
00:07:21,165 --> 00:07:23,815
people, like Vim, Emacs people, which I don't want to even get into that.

195
00:07:24,095 --> 00:07:24,925
Emacs different.

196
00:07:24,935 --> 00:07:25,945
That's a religion.

197
00:07:25,985 --> 00:07:27,845
Like that is joining a cult.

198
00:07:28,214 --> 00:07:28,604
Okay.

199
00:07:28,655 --> 00:07:30,125
But the thing is, I get it.

200
00:07:30,145 --> 00:07:32,835
Like I've seen some of them, like when I was working on OpenShift

201
00:07:33,055 --> 00:07:36,215
back in the day at Red Hat, like there was the lead engineer.

202
00:07:36,285 --> 00:07:38,695
He was just like, I have vi with all these mappings.

203
00:07:38,695 --> 00:07:42,415
And he was just like, and I was like, that looks miserable.

204
00:07:42,415 --> 00:07:42,815
I don't want to

205
00:07:42,815 --> 00:07:46,794
waste.

206
00:07:46,795 --> 00:07:48,715
Ever since I've been a kid, come on Legos.

207
00:07:51,470 --> 00:07:54,830
Can Steve just be in the background of all our podcasts?

208
00:07:54,860 --> 00:07:56,730
And then like, when someone says something, Steve

209
00:07:56,730 --> 00:07:58,850
will just drop in with the like sound effects.

210
00:07:58,850 --> 00:07:59,819
I love it so much.

211
00:07:59,890 --> 00:08:00,920
Is that called Foley?

212
00:08:00,920 --> 00:08:01,480
Are those Foleys?

213
00:08:01,490 --> 00:08:02,250
Is that what those Foleys are?

214
00:08:02,250 --> 00:08:02,870
Foley engineers.

215
00:08:02,870 --> 00:08:03,120
Yeah.

216
00:08:03,280 --> 00:08:04,209
Yeah, yeah, yeah.

217
00:08:04,490 --> 00:08:04,830
Thunder.

218
00:08:04,830 --> 00:08:08,979
See, look, if tech doesn't work out, Steve

219
00:08:08,989 --> 00:08:11,159
can moonlight as like a Foley engineer.

220
00:08:11,310 --> 00:08:11,710
Totally.

221
00:08:12,160 --> 00:08:13,980
So yeah, we could get stuck like this for a long time.

222
00:08:14,010 --> 00:08:16,610
I don't, I don't know if it's true, but from what I've

223
00:08:16,660 --> 00:08:19,940
already observed of my, of my hosts, it seems like we might.

224
00:08:20,225 --> 00:08:23,915
All three be somewhere on the ADHD

225
00:08:23,915 --> 00:08:26,164
spectrum.

226
00:08:26,165 --> 00:08:31,095
And so, you know, bottom lining is kind of like, uh, moving and jumping.

227
00:08:31,434 --> 00:08:33,414
You didn't have to attack us like that.

228
00:08:33,794 --> 00:08:34,195
I'm attacked

229
00:08:34,394 --> 00:08:34,965
like that every day.

230
00:08:35,235 --> 00:08:37,605
I wasn't diagnosed until I was 52.

231
00:08:37,614 --> 00:08:39,194
Do you know how hard that was?

232
00:08:39,294 --> 00:08:40,514
We can talk about that for a while.

233
00:08:40,714 --> 00:08:42,674
We're nine minutes in and this is our eighth topic.

234
00:08:42,685 --> 00:08:42,914
So

235
00:08:44,915 --> 00:08:45,235
none of them.

236
00:08:45,745 --> 00:08:46,625
And it's not linear.

237
00:08:46,640 --> 00:08:47,509
I

238
00:08:47,510 --> 00:08:49,180
think at this point, this is what's expected.

239
00:08:49,189 --> 00:08:52,689
Like if we actually held a topic too long, I think they'd be disappointed.

240
00:08:53,319 --> 00:08:54,069
We get bored.

241
00:08:54,140 --> 00:08:55,849
I know I am the adult usually.

242
00:08:55,849 --> 00:08:57,060
So I am going to try to bring it back.

243
00:08:57,739 --> 00:08:58,089
Let's go back.

244
00:08:58,160 --> 00:08:58,390
Let's go

245
00:08:58,390 --> 00:08:58,439
back.

246
00:08:58,439 --> 00:08:59,199
Let's go a little bit back.

247
00:08:59,885 --> 00:09:04,385
Back to the topic of, of embeddings, vector databases, uh, not IDE stuff.

248
00:09:04,445 --> 00:09:06,205
I was going to make an old man joke, but

249
00:09:06,205 --> 00:09:08,775
like, like you are the old man of the podcast.

250
00:09:09,085 --> 00:09:09,215
I'm

251
00:09:09,215 --> 00:09:09,935
the responsible one.

252
00:09:09,935 --> 00:09:10,224
I know.

253
00:09:11,365 --> 00:09:13,374
Technically though, I bet you I'm older, but let's go.

254
00:09:13,374 --> 00:09:13,575
I'm sure

255
00:09:13,575 --> 00:09:15,714
you're, it's fine, but we won't.

256
00:09:16,845 --> 00:09:17,584
We're going to get stuck.

257
00:09:17,584 --> 00:09:20,415
Next thing's going to happen if we stay focused, because otherwise

258
00:09:20,415 --> 00:09:22,545
we're going to start talking dating advice and stuff like that.

259
00:09:23,685 --> 00:09:25,344
I mean, we'll get

260
00:09:25,344 --> 00:09:26,094
there at some point.

261
00:09:26,324 --> 00:09:26,635
We will,

262
00:09:27,165 --> 00:09:30,525
but so like in good ADHD fashion, that will be our reward.

263
00:09:31,185 --> 00:09:32,994
We will talk about vector databases.

264
00:09:33,025 --> 00:09:35,625
And when we get through that topic, we're allowed,

265
00:09:38,085 --> 00:09:40,015
we'll get to talk about other things.

266
00:09:40,415 --> 00:09:41,655
Okay, let's set a timer.

267
00:09:42,330 --> 00:09:45,090
Even before we started recording, Steve hinted that he had some

268
00:09:45,090 --> 00:09:47,330
parenting advice, which we're going to leave that for the end.

269
00:09:47,480 --> 00:09:48,700
Like, we got to stick to this whole thing.

270
00:09:48,710 --> 00:09:48,820
And then

271
00:09:48,820 --> 00:09:51,040
he like, didn't tell us, like, it was like when someone's

272
00:09:51,040 --> 00:09:53,360
like, Hey, I got something really important to tell you.

273
00:09:53,360 --> 00:09:54,680
And then they're like, I'll tell you later.

274
00:09:54,709 --> 00:09:55,149
And then you just.

275
00:09:55,150 --> 00:09:55,990
Did you

276
00:09:55,999 --> 00:09:57,099
use Pomodoros?

277
00:09:57,100 --> 00:09:59,219
He did!

278
00:10:00,599 --> 00:10:01,420
It's not even a tomato or anything.

279
00:10:01,469 --> 00:10:03,060
I'm trying to set a 15 minute timer.

280
00:10:03,060 --> 00:10:03,643
Is that

281
00:10:03,643 --> 00:10:03,935
15?

282
00:10:03,935 --> 00:10:04,809
15? I think

283
00:10:05,390 --> 00:10:05,654
that's 15.

284
00:10:05,814 --> 00:10:08,714
It's the adulting of the podcast has been taken over by Steve.

285
00:10:08,735 --> 00:10:10,255
Alright, here we go, it's a timer.

286
00:10:10,255 --> 00:10:13,464
Oh crap, it went to the wrong one.

287
00:10:13,475 --> 00:10:14,895
Okay, there, 15 minutes.

288
00:10:14,895 --> 00:10:15,795
We got 15 minutes.

289
00:10:15,935 --> 00:10:19,785
We have to talk about vector databases and vectors for the next 15 minutes.

290
00:10:19,954 --> 00:10:21,015
Let's start with embeddings.

291
00:10:21,064 --> 00:10:21,365
Okay.

292
00:10:21,595 --> 00:10:22,745
Describe an embedding.

293
00:10:22,934 --> 00:10:24,675
Okay, and we have to start with embeddings,

294
00:10:24,675 --> 00:10:26,055
otherwise vector databases don't make any sense.

295
00:10:26,145 --> 00:10:30,365
Okay, so the idea behind an embedding is there's two types of

296
00:10:30,365 --> 00:10:33,734
data that you put on your computer, structured and unstructured.

297
00:10:33,955 --> 00:10:34,175
Right.

298
00:10:34,445 --> 00:10:37,405
Structured data is something like a database table or an Excel spreadsheet.

299
00:10:37,655 --> 00:10:38,305
It's really easy.

300
00:10:38,325 --> 00:10:40,995
And it's like all numbers and little strings and stuff.

301
00:10:40,995 --> 00:10:43,845
So it's really easy for the computer to say, Oh, I know what to do with this.

302
00:10:43,845 --> 00:10:45,465
I know what to do: is two bigger than three?

303
00:10:45,684 --> 00:10:46,154
Yes.

304
00:10:47,565 --> 00:10:49,775
It's wrong, but three is bigger than two,

305
00:10:50,265 --> 00:10:52,675
but it can, that's a bad, that's a bad,

306
00:10:52,825 --> 00:10:54,005
that's what the AI would say.

307
00:10:54,045 --> 00:10:54,425
That's right.

308
00:10:54,825 --> 00:10:54,955
Exactly.

309
00:10:54,955 --> 00:10:57,935
Cause it can't tell numbers, but the point being that in normal space,

310
00:10:57,945 --> 00:11:01,035
normal computing stuff we do, it's all small stuff that's structured.

311
00:11:01,045 --> 00:11:03,524
Like the computer knows what to do with it inherently.

312
00:11:04,010 --> 00:11:07,939
At a certain level, I know I'm fudging that, but unstructured data

313
00:11:07,939 --> 00:11:12,949
is things which don't actually have that kind of, this is an

314
00:11:12,959 --> 00:11:16,130
integer, this is, you know, 16 characters, this is whatever that

315
00:11:16,130 --> 00:11:19,660
is, and there is no real easy way to represent that in a computer.

316
00:11:19,670 --> 00:11:20,920
So examples of this are.

317
00:11:21,335 --> 00:11:22,665
Photographs, right?

318
00:11:22,724 --> 00:11:23,435
A photograph.

319
00:11:23,485 --> 00:11:24,025
Yes, it has.

320
00:11:24,385 --> 00:11:26,275
It's made up of numbers, but there's

321
00:11:26,275 --> 00:11:27,444
no way that you can just look at them.

322
00:11:27,464 --> 00:11:31,255
If I just showed you the matrix, which was the number for each of the pixels,

323
00:11:31,255 --> 00:11:34,244
you can't look at that and go, Oh, yeah, I know exactly what that's showing me.

324
00:11:34,304 --> 00:11:35,925
And there's nothing that a computer can do with images.

325
00:11:35,925 --> 00:11:39,945
Like if you say, is this kitten picture cuter than this kitten picture, right?

326
00:11:39,975 --> 00:11:41,165
It can't answer that question.

327
00:11:41,165 --> 00:11:44,275
It has no way of, there's no semantics around that because it's unstructured.

328
00:11:44,505 --> 00:11:47,175
They can say, is this image the exact same or very similar?

329
00:11:47,305 --> 00:11:48,264
But other than that, yeah.

330
00:11:48,475 --> 00:11:49,645
Computers don't know what to do with the rest of it.

331
00:11:49,665 --> 00:11:53,225
And other examples of this, I'm going to be talking a lot about vision, probably

332
00:11:53,225 --> 00:11:55,605
mostly around this stuff, just because that's what I do now for my day job.

333
00:11:55,965 --> 00:11:58,595
But this is the exact same thing we do with all

334
00:11:58,595 --> 00:12:02,074
the AI stuff, what people call LLM stuff, right?

335
00:12:02,084 --> 00:12:06,805
So the book, you know, what is the general theme of this book?

336
00:12:06,864 --> 00:12:08,414
A computer can't tell you that inherently.

337
00:12:08,415 --> 00:12:11,165
Can you give us a general like overview of what you do currently,

338
00:12:11,185 --> 00:12:14,075
just so we have like the context, whatever you can talk about, like.

339
00:12:14,450 --> 00:12:15,130
I can talk about everything.

340
00:12:15,530 --> 00:12:16,440
I'm a developer advocate.

341
00:12:16,440 --> 00:12:18,499
So I work at Voxel51.

342
00:12:18,780 --> 00:12:22,410
And so Voxel51 is, it was started by a bunch of computer vision

343
00:12:22,410 --> 00:12:25,389
scientists who are like, we need this platform to do this.

344
00:12:25,420 --> 00:12:28,079
Oh, Hey, a bunch of other people really need this platform to do this.

345
00:12:28,090 --> 00:12:31,410
Cause like working with, for all of these models, all this

346
00:12:31,410 --> 00:12:34,460
computer AI stuff, even if you've taken any statistics,

347
00:12:34,920 --> 00:12:37,010
you understand that the data is the most important thing.

348
00:12:37,444 --> 00:12:37,665
Right?

349
00:12:37,665 --> 00:12:40,775
Like if you have messy data, if your data is not clean, it doesn't matter

350
00:12:40,775 --> 00:12:44,245
how fancy your model gets, it's still going to basically produce garbage.

351
00:12:44,545 --> 00:12:48,185
And so what Voxel51 is, is this piece, like, you pick a sensor

352
00:12:48,734 --> 00:12:51,405
and you pick a question, like what's your question that you

353
00:12:51,405 --> 00:12:54,855
want to ask, and then we kind of bind together the whole process

354
00:12:54,855 --> 00:12:57,765
of, you know, tuning your data, cleaning your data, making sure

355
00:12:57,765 --> 00:13:00,835
it's okay, annotating your data, like all the pieces in between.

356
00:13:00,845 --> 00:13:03,165
And then once you're training your model, fine-tuning

357
00:13:03,165 --> 00:13:06,024
your model, and then you put it into production, right?

358
00:13:06,025 --> 00:13:06,574
And then we don't.

359
00:13:07,720 --> 00:13:09,220
Put it into production, but then we'll help

360
00:13:09,220 --> 00:13:10,920
monitor it as it comes out of production.

361
00:13:10,930 --> 00:13:13,569
Like, why are these, what is the class of all these errors

362
00:13:13,569 --> 00:13:15,859
we keep getting with our vision model and stuff like that?

363
00:13:15,860 --> 00:13:19,839
And so my job as a developer advocate is to be a bridge, right?

364
00:13:19,839 --> 00:13:23,499
So I'm the bridge to help you understand what Voxel

365
00:13:23,510 --> 00:13:25,830
does, which hopefully I did an okay job right there.

366
00:13:26,120 --> 00:13:29,100
And then go out into the world and help people understand that.

367
00:13:29,285 --> 00:13:32,065
And understand computer vision more, rising tide lifts all boats.

368
00:13:32,605 --> 00:13:34,895
But then also like when I teach a workshop or when I will talk

369
00:13:34,895 --> 00:13:37,355
to Justin at All Things Open and he says, Hey, blah, blah, blah.

370
00:13:37,355 --> 00:13:41,555
I tried FiftyOne, and then I come back and say to the engineers, Hey, you know,

371
00:13:41,555 --> 00:13:43,964
I've talked to a bunch of people and they're having problems with this.

372
00:13:43,964 --> 00:13:45,405
Or I hear a lot of people are doing this.

373
00:13:45,405 --> 00:13:48,295
We should really think about putting this into our project or product.

374
00:13:48,365 --> 00:13:51,835
We have an open source project and then we have an enterprise product.

375
00:13:52,865 --> 00:13:53,495
And so that's what I do.

376
00:13:53,495 --> 00:13:54,455
And then I'm also like.

377
00:13:55,109 --> 00:13:57,319
A bridge between sales and engineering and support.

378
00:13:57,319 --> 00:14:01,980
Like we advocates generally are bridges between lots of different groups.

379
00:14:01,980 --> 00:14:02,560
And that's what we do.

380
00:14:02,610 --> 00:14:03,400
So does that make sense?

381
00:14:03,730 --> 00:14:04,890
Did I give it a good explanation for that?

382
00:14:04,970 --> 00:14:05,120
Okay.

383
00:14:05,849 --> 00:14:08,079
You can do embeddings for all sorts of unstructured things.

384
00:14:08,079 --> 00:14:12,499
Like, so like big text, lots of text, PDF document, a book, several

385
00:14:12,499 --> 00:14:15,220
paragraphs, even like for most of us who've worked with computers,

386
00:14:15,220 --> 00:14:18,250
we did that exercise of like comparing two strings, right?

387
00:14:18,260 --> 00:14:19,060
That's pretty easy.

388
00:14:19,420 --> 00:14:23,110
But other than saying these three paragraphs are exactly the

389
00:14:23,110 --> 00:14:25,490
same, or there's this, here's all the characters that are

390
00:14:25,490 --> 00:14:28,580
different, there's not much you can do that has meaning with it

391
00:14:28,620 --> 00:14:31,840
in the same way you can say, is two bigger than, less than three?
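
As a rough illustration of the point above, the sketch below compares two short passages the only ways a computer can without a model: exact equality and a character-level diff ratio. The sample sentences and the helper name `literal_comparison` are made up for this example. The two sentences mean nearly the same thing, but neither measure has any notion of that.

```python
import difflib

# Two hypothetical passages that say nearly the same thing in different words.
passage_a = "The cat sat on the mat."
passage_b = "A kitten rested on the rug."


def literal_comparison(x: str, y: str) -> tuple:
    """Return (exactly_equal, character_level_ratio).

    The ratio comes from difflib.SequenceMatcher and lies in [0, 1].
    It measures shared characters, not shared meaning.
    """
    ratio = difflib.SequenceMatcher(None, x, y).ratio()
    return x == y, ratio


same, ratio = literal_comparison(passage_a, passage_b)
print(f"exactly equal: {same}, character overlap: {ratio:.2f}")
```

Both numbers describe the characters only. Nothing here can say whether the passages are about the same thing, which is the gap embeddings are meant to fill.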

392
00:14:32,425 --> 00:14:33,165
I got it right that time.

393
00:14:33,485 --> 00:14:38,565
So videos, audio, all these things that are generally big and

394
00:14:38,565 --> 00:14:42,774
that our brains find super easy to process,

395
00:14:43,135 --> 00:14:45,425
but computers don't have an inherent way to process them.

396
00:14:45,444 --> 00:14:47,265
So that's unstructured data, right?

397
00:14:47,265 --> 00:14:48,524
And so for a long time, there wasn't

398
00:14:48,524 --> 00:14:49,994
anything we could do with unstructured data.

399
00:14:50,395 --> 00:14:52,985
And then there are some simple things we could do,

400
00:14:53,245 --> 00:14:55,985
but what comes along are these neural networks, right?

401
00:14:56,335 --> 00:15:00,395
And so basically what an embedding is, is you take that

402
00:15:00,395 --> 00:15:03,285
unstructured data, you shove it into the neural network.

403
00:15:03,395 --> 00:15:08,135
It has already been trained to pick out what they call features.

404
00:15:08,515 --> 00:15:09,729
So important things.

405
00:15:10,330 --> 00:15:12,319
So things that are, another way of saying is

406
00:15:12,319 --> 00:15:14,380
things that have semantic information, right?

407
00:15:14,380 --> 00:15:16,230
That have, like, meaning. You'll hear the word semantics

408
00:15:16,230 --> 00:15:19,180
thrown around a lot, and features thrown around a lot.

409
00:15:19,210 --> 00:15:22,120
And at the other side of it, it spits out a vector.

410
00:15:22,699 --> 00:15:25,059
Usually it's like 512.

411
00:15:25,060 --> 00:15:26,510
It could be up to 2000.

412
00:15:26,520 --> 00:15:30,130
It depends on what, but it's a vector of numbers typically between

413
00:15:30,130 --> 00:15:33,019
minus one and positive one, some sort of floats, but there's

414
00:15:33,019 --> 00:15:36,085
all sorts of other ways to do stuff with it. But what's encoded in

415
00:15:36,085 --> 00:15:41,265
that vector is the semantic meaning of that unstructured text.

416
00:15:41,655 --> 00:15:44,045
And one of the properties of these is that things that are

417
00:15:44,045 --> 00:15:46,605
more similar, unstructured things that are more similar,

418
00:15:47,535 --> 00:15:51,515
should, in that 512 dimensional space, be closer to each other.

419
00:15:51,615 --> 00:15:52,525
So let me, let me try to

420
00:15:52,865 --> 00:15:55,265
explain this back in the way that I'm trying to think about it.

421
00:15:55,585 --> 00:15:59,495
If I have a text file and I run it through SHA-256, I

422
00:15:59,495 --> 00:16:01,595
would get a string of letters and numbers out of it.

423
00:16:01,655 --> 00:16:04,664
And if I change one character in that text file,

424
00:16:04,965 --> 00:16:08,294
theoretically, I get a completely different SHA-256 out of it.

425
00:16:08,304 --> 00:16:12,475
Like, a good hash is completely different, you know, even with small changes.

426
00:16:12,985 --> 00:16:15,265
And in this case, we want like the opposite of that, which is

427
00:16:15,265 --> 00:16:18,675
like, I changed one character or one word in that text file.

428
00:16:18,814 --> 00:16:20,555
And I want something that looks kind of like

429
00:16:20,830 --> 00:16:24,260
the thing that came out before, so I could say, oh, you know what?

430
00:16:24,260 --> 00:16:26,060
Like 80 percent of this is the same, right?

431
00:16:26,060 --> 00:16:26,770
Like that hashes.

432
00:16:26,810 --> 00:16:30,860
And so this, this vector is just a series of 512 numbers or

433
00:16:30,890 --> 00:16:34,500
more or less, but generally 512 numbers between zero and one.

434
00:16:34,720 --> 00:16:37,299
And then that's our quote unquote hash in this example.

435
00:16:37,460 --> 00:16:38,669
And I can say, oh, look, like,

436
00:16:38,880 --> 00:16:40,670
you know, 30 percent of those are pretty close.

437
00:16:40,680 --> 00:16:43,540
So that's probably similar to this

438
00:16:43,560 --> 00:16:45,969
other thing that has similar characteristics.

439
00:16:46,410 --> 00:16:48,860
In a very broad sense, that's exactly what it is, right?

440
00:16:48,860 --> 00:16:50,200
I mean, one of the things you should be aware

441
00:16:50,200 --> 00:16:52,500
of is embeddings, calculating embeddings.

442
00:16:52,509 --> 00:16:53,690
That's what they call these vectors.

443
00:16:54,000 --> 00:16:56,130
These vectors, they call them embeddings.

444
00:16:56,270 --> 00:16:57,349
It is a compression technique.

445
00:16:57,359 --> 00:16:58,859
It's a lossy compression technique, right?

446
00:16:58,890 --> 00:17:04,200
I've taken a big chunk of text and converted it into 512 numbers.

447
00:17:04,200 --> 00:17:07,040
And it's lossy because I can't go exactly

448
00:17:07,050 --> 00:17:09,069
back from that vector to the original text.

449
00:17:09,149 --> 00:17:09,510
text.

450
00:17:09,860 --> 00:17:11,360
I remember in the early days, people were

451
00:17:11,360 --> 00:17:14,720
using hashes as vector embeddings, right?

452
00:17:14,720 --> 00:17:17,579
Like they would just say, I'm going to hash it and that's a vector embedding.

453
00:17:17,829 --> 00:17:20,310
Some other examples of vector embeddings that people

454
00:17:20,319 --> 00:17:23,030
have come across are principal components analysis.

455
00:17:23,060 --> 00:17:24,999
I don't know, did any of you do stats?

456
00:17:25,149 --> 00:17:25,389
Nope.

457
00:17:26,120 --> 00:17:27,489
It's a statistical technique.

458
00:17:27,489 --> 00:17:29,510
Like when you have tabular data, numbers and all that

459
00:17:29,510 --> 00:17:32,580
stuff, you do matrix manipulations and then you basically

460
00:17:33,040 --> 00:17:35,830
end up with vectors where each vector is orthogonal to the others.

461
00:17:37,800 --> 00:17:40,550
The first vector captures the most variation in the data.

462
00:17:41,080 --> 00:17:44,580
The second vector captures like the second most

463
00:17:44,640 --> 00:17:47,180
variation in the data orthogonal to the first one.

464
00:17:47,560 --> 00:17:49,359
That's another way of creating an embedding, right?

465
00:17:49,360 --> 00:17:53,339
It's basically taking some data and then reducing the amount of numbers in it.

466
00:17:53,350 --> 00:17:53,509
But

467
00:17:53,509 --> 00:17:58,260
the general idea of an embedding is just a way to almost fuzzy match

468
00:17:58,435 --> 00:17:59,985
something else that is similar,

469
00:18:00,045 --> 00:18:01,545
but it turns out it's better than that.

470
00:18:02,195 --> 00:18:03,885
So it's not just for matching.

471
00:18:03,885 --> 00:18:04,715
So this is the important part.

472
00:18:04,735 --> 00:18:06,715
So first, let me explain why they're called embeddings.

473
00:18:07,205 --> 00:18:10,694
So the reason, cause I just call them vectors, but the reason why some

474
00:18:10,705 --> 00:18:13,064
people in the machine learning community who need to rename everything

475
00:18:13,065 --> 00:18:15,354
that's already well known in the statistics community to some funky name.

476
00:18:15,355 --> 00:18:18,915
The reason why they call it embeddings is because, and this is related

477
00:18:18,915 --> 00:18:24,370
to that similarity, that vector represents the coordinates of

478
00:18:24,370 --> 00:18:28,300
that piece of unstructured information in a 512 dimensional space.

479
00:18:28,659 --> 00:18:30,820
So you're taking that unstructured information

480
00:18:30,820 --> 00:18:34,550
and embedding it into that 512 dimensional space.

481
00:18:35,070 --> 00:18:35,300
So it's

482
00:18:35,300 --> 00:18:35,870
a point.

483
00:18:36,259 --> 00:18:37,029
Yes, a point.

484
00:18:37,249 --> 00:18:40,094
It is the point coordinate in a 512 dimensional

485
00:18:40,094 --> 00:18:42,455
space, which no one can visualize, right?

486
00:18:42,475 --> 00:18:43,435
But, but

487
00:18:43,554 --> 00:18:45,594
I can visualize three coordinates, right?

488
00:18:45,594 --> 00:18:49,134
And say, okay, if this was a three-characteristic

489
00:18:49,384 --> 00:18:52,314
vector, I could say it goes in this point, and if I say a

490
00:18:52,314 --> 00:18:56,314
cat is, you know, this point, then a leopard might be

491
00:18:56,314 --> 00:18:57,304
a point close to it.

492
00:18:57,334 --> 00:18:57,664
Yes.

493
00:18:57,844 --> 00:18:59,495
So I mean, you can actually do that, right?

494
00:18:59,554 --> 00:19:02,164
There's no reason you have to spit out a 512 dimensional vector.

495
00:19:02,164 --> 00:19:03,965
You could just spit out a two dimensional vector if you wanted.

496
00:19:04,465 --> 00:19:04,675
Right?

497
00:19:04,675 --> 00:19:06,655
And that would be like through the neural network.

498
00:19:06,655 --> 00:19:08,215
It would be looking for features and doing stuff.

499
00:19:08,215 --> 00:19:10,465
But then at the end it would say, here's the X

500
00:19:10,465 --> 00:19:12,715
and here's the Y. And that's where it ends up.

501
00:19:12,774 --> 00:19:15,835
And so, yes, that's what should be happening in that dimensional space.

502
00:19:15,835 --> 00:19:17,425
And if you wanna just think in two or three

503
00:19:17,425 --> 00:19:19,585
dimensions, cats should end up near cats.

504
00:19:19,760 --> 00:19:21,360
And if you put in a picture of a dog,

505
00:19:21,390 --> 00:19:23,040
the picture of the dog should be farther away.

506
00:19:23,310 --> 00:19:25,910
And if I put in a picture of a motorcycle, that should be far away from

507
00:19:26,030 --> 00:19:29,230
either, farther away from both and somewhere else in that dimensional space.

508
00:19:29,480 --> 00:19:32,600
How do vectors and graphs, graph databases relate?

509
00:19:32,980 --> 00:19:36,529
So beyond my expertise, A, this is, this is one thing you have

510
00:19:36,529 --> 00:19:38,890
to be good at as a dev advocate: saying, I don't really know.

511
00:19:39,305 --> 00:19:40,175
But I'll make stuff up.

512
00:19:40,195 --> 00:19:40,205
I

513
00:19:40,275 --> 00:19:42,285
think everyone in technology should be good at that.

514
00:19:42,295 --> 00:19:42,935
That's real.

515
00:19:42,965 --> 00:19:43,845
That's so real.

516
00:19:44,085 --> 00:19:46,225
It is because a lot of damage could be avoided.

517
00:19:46,325 --> 00:19:48,244
Or at least say I'm making this part up.

518
00:19:48,245 --> 00:19:50,105
I mean, I think something like that.

519
00:19:50,335 --> 00:19:53,565
So what I've seen is people actually care about graph

520
00:19:53,565 --> 00:19:56,544
databases in this space a lot as well, because one of the

521
00:19:56,545 --> 00:19:58,885
things that graph databases do is capture relationships

522
00:19:59,439 --> 00:20:00,360
between things.

523
00:20:00,479 --> 00:20:04,120
And so when you're looking at relationships between paragraphs and

524
00:20:04,120 --> 00:20:07,290
stuff like that, people want to kind of ground it with a graph database

525
00:20:07,699 --> 00:20:11,189
because it actually has that. Like, you could say, this picture

526
00:20:11,189 --> 00:20:14,389
of this cat, this picture is the picture of the parent of that cat.

527
00:20:14,409 --> 00:20:16,679
Or you could say, this is a, you can do stuff with it.

528
00:20:17,149 --> 00:20:19,209
Honestly, though, I haven't played as much with them.

529
00:20:19,610 --> 00:20:22,879
Neo4j does have an embedding extension, right?

530
00:20:22,899 --> 00:20:25,050
So it's, it's obviously something different.

531
00:20:25,479 --> 00:20:28,779
It's just a way to take a piece of unstructured data and turn it into

532
00:20:29,460 --> 00:20:29,950
numbers.

533
00:20:30,350 --> 00:20:32,690
What it's trying to do is take unstructured data and

534
00:20:32,690 --> 00:20:35,500
turn it into an array of numbers that captures the most amount

535
00:20:35,500 --> 00:20:39,409
of meaning from the original unstructured data.

536
00:20:40,019 --> 00:20:40,939
That's its goal.

537
00:20:41,300 --> 00:20:44,490
I think one of the things I've learned about graph databases is just how

538
00:20:44,500 --> 00:20:49,209
bad they are at things that aren't explicitly put into the database, right?

539
00:20:49,209 --> 00:20:52,590
Like if I have rows in a, you know, SQL database or something,

540
00:20:52,590 --> 00:20:54,460
like I can't just like make up, like, oh, what would be the

541
00:20:54,660 --> 00:20:57,110
customer between these two, like it can't, it can't do that.

542
00:20:57,110 --> 00:20:58,930
It can't make up something in between.

543
00:20:59,260 --> 00:21:01,730
And graph databases are similar to that where it's like, Oh, we're

544
00:21:01,730 --> 00:21:04,030
going to capture all this information about the things you put

545
00:21:04,030 --> 00:21:07,170
in here, but we can't, if you give us something new, I can't tell

546
00:21:07,170 --> 00:21:09,879
you like what the relationships might be with those other things.

547
00:21:09,889 --> 00:21:11,329
And that's where it seems like

548
00:21:11,550 --> 00:21:14,770
vector databases and embeddings fill that gap.

549
00:21:14,800 --> 00:21:16,320
And that's, that was my understanding.

550
00:21:16,320 --> 00:21:20,000
Oh, and actually, just go ahead and

551
00:21:20,000 --> 00:21:22,110
describe vector databases and how they relate to that.

552
00:21:22,310 --> 00:21:24,590
No, I can't yet because there's really, there's a whole bunch of really

553
00:21:24,590 --> 00:21:27,800
fascinating things about embeddings that I really want to cover because

554
00:21:28,149 --> 00:21:30,490
vector databases are interesting, but like embeddings are even more interesting.

555
00:21:31,009 --> 00:21:33,969
Embeddings, you can do algebra with embeddings.

556
00:21:34,620 --> 00:21:37,520
So this part is kind of really fascinating that you can do this.

557
00:21:37,870 --> 00:21:42,800
What you can do is you can say king minus woman.

558
00:21:43,245 --> 00:21:48,375
And it will return queen or you could say things like it

559
00:21:48,385 --> 00:21:51,375
knows that relationship through algebra, like you can add

560
00:21:51,375 --> 00:21:54,165
and subtract embeddings and it will obey that relationship.

561
00:21:54,615 --> 00:21:58,315
I think I might not have gotten that exactly right, but I do

562
00:21:58,315 --> 00:22:01,545
know that you can do algebra like man is to woman as king is to queen.

563
00:22:01,554 --> 00:22:03,445
It knows that relationship, right?

564
00:22:03,445 --> 00:22:06,375
So that you can say things like king minus woman equals queen, or

565
00:22:06,635 --> 00:22:09,535
king plus woman equals queen or something like that.

566
00:22:09,535 --> 00:22:12,355
I think it may be plus so king plus woman equals queen.

567
00:22:12,725 --> 00:22:15,054
Have you ever seen it get something really, really wrong?

568
00:22:15,595 --> 00:22:17,484
So here's the thing, of course.

569
00:22:17,744 --> 00:22:19,094
I mean, the algebra could be, of course.

570
00:22:19,385 --> 00:22:22,545
The thing is, remember the part that was in the middle that makes the embedding?

571
00:22:23,350 --> 00:22:24,920
It's a neural network, right?

572
00:22:24,920 --> 00:22:29,070
And everybody, especially all the tech bros, want you to think these

573
00:22:29,070 --> 00:22:32,110
neural networks are actually thinking and they're kind of like humans

574
00:22:32,110 --> 00:22:34,590
and they're smart and they have a soul and they don't want to die.

575
00:22:35,169 --> 00:22:38,010
They're all really basically just really, really, really fancy

576
00:22:38,555 --> 00:22:39,504
regression equations.

577
00:22:39,955 --> 00:22:42,105
I mean, I think people really need to understand.

578
00:22:42,135 --> 00:22:43,574
And this is the part that I think is important.

579
00:22:43,764 --> 00:22:46,385
I mean, if you want to go a little bit more sophisticated,

580
00:22:46,725 --> 00:22:49,215
it's a whole bunch of nonlinear regression equations

581
00:22:49,215 --> 00:22:51,924
at each node in the neural network is what it's doing.

582
00:22:51,924 --> 00:22:54,144
So it's this big nonlinear system of equations,

583
00:22:54,405 --> 00:22:56,514
but it's still a statistical fitting to the data.

584
00:22:57,205 --> 00:23:01,965
So what people are calling hallucinations, now, we're out of embedding space.

585
00:23:01,995 --> 00:23:04,705
Now we're into model, like model and prediction space, but what

586
00:23:04,705 --> 00:23:06,665
they're calling hallucinations, they're not hallucinations.

587
00:23:06,995 --> 00:23:08,435
Hallucinations are what humans do.

588
00:23:08,925 --> 00:23:10,334
It's a model error.

589
00:23:10,475 --> 00:23:11,185
It mispredicted.

590
00:23:11,975 --> 00:23:12,755
That's all it did.

591
00:23:12,765 --> 00:23:16,345
It made an error in predicting and they will always make

592
00:23:16,355 --> 00:23:18,675
errors in predicting because they're a statistical model.

593
00:23:18,685 --> 00:23:22,425
There's no way they can't, the way they're made.

594
00:23:22,655 --> 00:23:24,995
If anyone tells you that they can get rid of what they're calling

595
00:23:24,995 --> 00:23:29,895
hallucinations by using this technique or that technique, they're wrong,

596
00:23:30,015 --> 00:23:33,345
and run because, oh, we did 15 minutes, that was pretty good.

597
00:23:33,855 --> 00:23:34,155
Um,

598
00:23:34,395 --> 00:23:37,415
I'm so proud of this, but I think like that's kind of, I

599
00:23:37,415 --> 00:23:40,565
love that for one you said, Hey, I'm gonna explain this,

600
00:23:40,565 --> 00:23:43,415
but I might get parts of it wrong and I don't know this.

601
00:23:43,415 --> 00:23:45,900
And then just to like, I think people think

602
00:23:45,995 --> 00:23:48,275
that we're against AI and I use AI every day.

603
00:23:48,335 --> 00:23:52,955
I use it to make life easier all the time, but we can't keep

604
00:23:53,430 --> 00:23:56,820
selling these products and selling this thing for what it's not.

605
00:23:56,850 --> 00:23:58,909
It doesn't think, it doesn't have emotions.

606
00:23:58,919 --> 00:24:02,929
It's really fancy math and a bunch of, like, regression,

607
00:24:02,929 --> 00:24:03,960
like you said, you know what I mean?

608
00:24:03,969 --> 00:24:07,759
And if we just use it for what it is and we understand that, then

609
00:24:07,759 --> 00:24:11,070
we know where to put it and how to use it and how to use it safely,

610
00:24:11,080 --> 00:24:14,819
you know, but I just don't get why we can't just use it and call

611
00:24:14,819 --> 00:24:18,100
it what it is. Like, why is there such a hard time with that?

612
00:24:18,270 --> 00:24:21,660
Well, I can give you one reason why. One is copyright violation.

613
00:24:21,925 --> 00:24:24,235
They're all pretty sure that they're scraping

614
00:24:24,235 --> 00:24:26,925
copyrighted information and not reporting it.

615
00:24:26,965 --> 00:24:31,915
And the thing is, if they can portray it to be human, like

616
00:24:31,915 --> 00:24:35,214
human-like, and it's learning, then it's not violating copyright.

617
00:24:35,524 --> 00:24:38,495
Like there's nothing stopping, I'm not violating copyright if I read

618
00:24:38,964 --> 00:24:42,565
the Encyclopedia Britannica and then tell you information I learned

619
00:24:42,575 --> 00:24:44,865
from it because I've taken it and consumed it and I've learned it.

620
00:24:44,875 --> 00:24:46,125
That's not a copyright violation.

621
00:24:46,860 --> 00:24:51,010
If I though, say I'm quoting the Encyclopedia Britannica and I keep quoting

622
00:24:51,010 --> 00:24:53,720
it without referencing it, then it's a copyright violation, right?

623
00:24:53,850 --> 00:24:57,609
Every 90s CD listed the bands that influenced them, right?

624
00:24:57,609 --> 00:24:59,160
They're like, you read the like five different

625
00:24:59,170 --> 00:25:00,870
artists that like were the most influential.

626
00:25:00,870 --> 00:25:03,029
And if that wasn't a person, that would be a copyright violation.

627
00:25:03,660 --> 00:25:05,039
Here's what's wild to me.

628
00:25:05,580 --> 00:25:10,120
It would make it a more reliable source and it would be easier to use, right?

629
00:25:10,120 --> 00:25:10,580
Like, okay.

630
00:25:10,600 --> 00:25:12,810
So when we were all in college and high

631
00:25:12,810 --> 00:25:15,259
school, we had to quote our sources, right?

632
00:25:15,629 --> 00:25:15,959
If

633
00:25:16,125 --> 00:25:19,014
it gave us an answer and then said, I got it from this source.

634
00:25:19,014 --> 00:25:24,584
Like, we figured out how to move with the times in multiple other areas. Before,

635
00:25:24,584 --> 00:25:28,125
people used to sell CDs and they used to get money in a certain way.

636
00:25:28,435 --> 00:25:31,725
Then they had to figure out how to make money off of like iTunes.

637
00:25:31,784 --> 00:25:32,554
And you know what I mean?

638
00:25:32,905 --> 00:25:35,995
At what point do you figure out how to pay a little bit back,

639
00:25:36,014 --> 00:25:39,074
but you are giving, you're establishing the trust of knowing,

640
00:25:39,094 --> 00:25:42,945
like, let's say it says, hey, I think you mean this.

641
00:25:43,195 --> 00:25:44,524
I got it from this source.

642
00:25:44,814 --> 00:25:47,164
It would help us to trust those models and use them

643
00:25:47,164 --> 00:25:49,544
in a better way because it's citing its sources.

644
00:25:49,594 --> 00:25:52,874
And then it would also help our kids and just the future

645
00:25:52,874 --> 00:25:55,684
and everyone that's using these to know where it's going.

646
00:25:55,684 --> 00:25:56,914
Like, it's just wild to me.

647
00:25:56,914 --> 00:25:58,534
This is like a billion dollar industry.

648
00:25:58,544 --> 00:26:03,254
We're willing to sink so much money into it and you could almost legitimize it.

649
00:26:03,415 --> 00:26:04,514
But they're like, no,

650
00:26:05,495 --> 00:26:08,455
because if they legitimize it, they won't get the valuation.

651
00:26:08,544 --> 00:26:09,364
One is copyright.

652
00:26:09,385 --> 00:26:11,985
The other, I think, is they won't get the valuations they get, right?

653
00:26:11,985 --> 00:26:15,974
They keep wanting to say that we've got AGI and we're not even close to AGI.

654
00:26:16,125 --> 00:26:16,255
If

655
00:26:16,255 --> 00:26:17,595
you make something understandable,

656
00:26:18,139 --> 00:26:20,389
then it's like, oh, I can't dream what it might be.

657
00:26:20,389 --> 00:26:21,980
It's like, Oh no, I understand how that works now.

658
00:26:22,030 --> 00:26:22,590
I don't really, it's

659
00:26:22,600 --> 00:26:23,070
wild.

660
00:26:23,070 --> 00:26:26,620
how much the black box means magic in tech, you

661
00:26:26,620 --> 00:26:29,040
know, like it's like back when they had it, like, yeah.

662
00:26:29,040 --> 00:26:33,129
Like remember when like people thought the cloud was magic and like, yeah.

663
00:26:33,130 --> 00:26:35,719
And people had to start going around and making the shirts

664
00:26:35,719 --> 00:26:38,020
that said, bro, it's just somebody else's Linux server.

665
00:26:38,050 --> 00:26:38,149
You know,

666
00:26:39,040 --> 00:26:40,280
the cloud is somebody else's server.

667
00:26:40,840 --> 00:26:45,240
So much of it is like, you'll hear people talk about this new cool thing.

668
00:26:45,240 --> 00:26:47,389
And you're like, that's the same thing we've been doing.

669
00:26:47,499 --> 00:26:48,539
With this guys,

670
00:26:48,739 --> 00:26:49,439
it's just someone else's.

671
00:26:49,519 --> 00:26:50,809
I don't have to manage the servers anymore.

672
00:26:50,809 --> 00:26:53,350
I mean, so I'm violently agreeing with you, and

673
00:26:53,350 --> 00:26:55,590
I don't think it's going to change, unfortunately.

674
00:26:55,629 --> 00:26:57,529
And I know we're supposed to be talking about embeddings,

675
00:26:57,549 --> 00:26:59,539
but everybody gets to this eventually as well.

676
00:26:59,549 --> 00:27:01,419
Do you want me to explain what an LLM is doing?

677
00:27:01,419 --> 00:27:02,479
Has someone explained that to you?

678
00:27:02,709 --> 00:27:03,159
Not yet.

679
00:27:03,229 --> 00:27:04,360
I have two other questions.

680
00:27:04,489 --> 00:27:04,879
Okay.

681
00:27:05,159 --> 00:27:07,129
Because I want but I want to say it because it's

682
00:27:07,139 --> 00:27:10,129
do that because I really think I would love to contribute

683
00:27:10,139 --> 00:27:13,370
to educating people because it like, you know what I mean?

684
00:27:13,409 --> 00:27:13,689
Like, yes.

685
00:27:14,049 --> 00:27:17,849
And the reason I want to do this is for the exact reason you said,

686
00:27:18,120 --> 00:27:22,920
which is people then can understand why it's hallucinating or what it's doing.

687
00:27:22,929 --> 00:27:25,280
And then maybe we can work on tech to make it better.

688
00:27:25,280 --> 00:27:27,170
But apart from that, you can just tell where

689
00:27:27,170 --> 00:27:28,300
to use that tool.

690
00:27:28,320 --> 00:27:29,400
It is a tool.

691
00:27:29,410 --> 00:27:30,050
Let's use it.

692
00:27:30,090 --> 00:27:31,150
I'm for using it.

693
00:27:31,150 --> 00:27:33,539
Let's just not use it where we're going to hurt people and make

694
00:27:33,540 --> 00:27:35,580
dumb, like, implementations, like

695
00:27:35,910 --> 00:27:36,600
two questions.

696
00:27:37,110 --> 00:27:38,940
Have you all played Infinite Craft?

697
00:27:39,440 --> 00:27:40,030
I love that.

698
00:27:40,030 --> 00:27:42,570
This was your question that you stopped him from explaining it.

699
00:27:42,570 --> 00:27:44,200
And this is exactly

700
00:27:44,369 --> 00:27:48,270
what he was saying, where you can do algebra with

701
00:27:48,865 --> 00:27:49,455
the embeddings.

702
00:27:49,455 --> 00:27:50,665
And that's literally what this topic.

703
00:27:50,665 --> 00:27:51,069
Okay, dad.

704
00:27:51,069 --> 00:27:52,684
So that, that is exactly what it's like.

705
00:27:52,684 --> 00:27:54,365
I'm going to put it in the show notes.

706
00:27:54,365 --> 00:27:55,585
Fun fact about Justin though.

707
00:27:55,615 --> 00:27:56,265
Like really?

708
00:27:56,655 --> 00:27:59,894
Like he pretends like he's just like super funny and just

709
00:27:59,895 --> 00:28:03,755
so spicy takes, but underneath the spiciness in that beard,

710
00:28:04,135 --> 00:28:07,445
he was like a math major. And sometimes Justin has the most

711
00:28:07,750 --> 00:28:11,840
interesting, random, I read these three books and I connected

712
00:28:11,840 --> 00:28:14,500
them. Like, his brain works like a vector and graph database.

713
00:28:14,530 --> 00:28:15,199
It's fire.

714
00:28:17,400 --> 00:28:18,940
So Infinite Craft, you haven't played it.

715
00:28:18,940 --> 00:28:21,289
It's literally just doing algebra on the embeddings.

716
00:28:21,290 --> 00:28:23,050
And it's great because it does exactly that where

717
00:28:23,050 --> 00:28:25,869
you're like woman plus, you know, king is queen, right?

718
00:28:25,869 --> 00:28:28,190
It's like you can do that math and you can try to, like, create

719
00:28:28,200 --> 00:28:30,500
everything. Really fun, but you're not allowed to play it right now.

720
00:28:30,500 --> 00:28:31,130
On the podcast.

721
00:28:31,140 --> 00:28:32,410
You would not say another word.

722
00:28:32,460 --> 00:28:34,570
My kids played it for like three weekends straight.

723
00:28:34,600 --> 00:28:34,890
It was

724
00:28:34,890 --> 00:28:35,210
hilarious.

725
00:28:35,220 --> 00:28:36,370
It's called Infinite Craft.

726
00:28:36,430 --> 00:28:38,500
I will send the link, but not right now.

727
00:28:38,550 --> 00:28:38,740
Okay.

728
00:28:38,740 --> 00:28:39,080
I don't know.

729
00:28:39,080 --> 00:28:39,920
I just want to know the name, but

730
00:28:39,920 --> 00:28:40,170
Google

731
00:28:40,170 --> 00:28:40,370
it later.

732
00:28:40,529 --> 00:28:40,759
Okay.

733
00:28:40,760 --> 00:28:41,149
Go ahead.

734
00:28:41,300 --> 00:28:41,759
I want it.

735
00:28:41,769 --> 00:28:43,709
So my kids will be busy for three weeks, but

736
00:28:44,960 --> 00:28:50,019
the nodes and, and how things become, how a feature becomes a number.

737
00:28:50,019 --> 00:28:52,209
Basically, here's my thinking of it.

738
00:28:52,209 --> 00:28:55,359
How I think about it in my head is we basically take

739
00:28:55,995 --> 00:28:58,985
a picture of a cat and we chop it up into all the features that we think.

740
00:28:58,985 --> 00:29:01,774
And we put it in like a, you remember the Plinko machines from,

741
00:29:02,054 --> 00:29:05,254
yeah, where, like, the ball, the token falls through all the pins.

742
00:29:05,414 --> 00:29:07,204
And I kind of think of all the nodes as those

743
00:29:07,204 --> 00:29:09,485
pins that like put it one way or another.

744
00:29:09,494 --> 00:29:13,425
And in my opinion, like the way I visualize it is a weight

745
00:29:13,764 --> 00:29:16,344
is just making that pin slanted one way or another, right?

746
00:29:16,344 --> 00:29:18,844
It, like it, it bends it to one side or

747
00:29:18,844 --> 00:29:21,165
another and say, Hey, this should be weighted.

748
00:29:21,370 --> 00:29:21,929
a certain way.

749
00:29:21,929 --> 00:29:25,459
I want this thing to fall a certain way more often than not.

750
00:29:25,769 --> 00:29:28,419
And in my head, that's how I kind of visualize it as okay.

751
00:29:28,419 --> 00:29:31,100
A node is all these little nails on a board and the weights

752
00:29:31,100 --> 00:29:33,199
are just, sometimes we just block off entire pieces.

753
00:29:33,219 --> 00:29:35,129
We're like, you're not, we're, we are weighting

754
00:29:35,159 --> 00:29:36,600
out this thing and you're not allowed to do that.

755
00:29:36,600 --> 00:29:40,860
But also when we use the model, we don't hit all the pins.

756
00:29:41,179 --> 00:29:43,999
Like, we're only hitting certain pins for certain features.

757
00:29:44,009 --> 00:29:47,820
So, like, we cut up the picture of the cat into 200 little tokens, and

758
00:29:47,820 --> 00:29:50,810
we're only sending it through parts of the board to get a certain thing out.

759
00:29:51,590 --> 00:29:52,769
How wrong am I?

760
00:29:52,920 --> 00:29:55,389
And do you have a different way of describing nodes to people?

761
00:29:59,710 --> 00:30:02,380
Yeah, I don't have a specific way, but I can

762
00:30:02,400 --> 00:30:04,000
add some, a little bit more flavor on that.

763
00:30:04,000 --> 00:30:05,350
I don't think you're necessarily wrong, right?

764
00:30:05,350 --> 00:30:06,670
I mean, that's a good way of thinking about

765
00:30:06,670 --> 00:30:09,670
it: you sent in this group of pixels.

766
00:30:09,680 --> 00:30:11,830
Where is it going to end up when it comes out?

767
00:30:11,830 --> 00:30:12,150
Right?

768
00:30:12,410 --> 00:30:15,520
At least in vision models, it's a little bit easier because we can actually

769
00:30:15,699 --> 00:30:18,119
look at what the activate, when you activate the different neurons, you

770
00:30:18,119 --> 00:30:22,550
can see what comes up. As you're moving up layers in the neural network, for

771
00:30:22,550 --> 00:30:25,730
a fully connected neural network, you're building higher level features.

772
00:30:26,190 --> 00:30:29,180
So like, if you look at the first level of a computer vision neural network,

773
00:30:29,180 --> 00:30:33,605
it's like finding straight lines or maybe a gradient or things like that.

774
00:30:33,795 --> 00:30:37,555
And then what happens at the next level is it combines some of

775
00:30:37,555 --> 00:30:41,135
those different neurons together to form, let's say a curve, right?

776
00:30:41,135 --> 00:30:43,585
Cause that's multiple, a curve that has colors going through it or something.

777
00:30:43,855 --> 00:30:45,984
And as you keep going up, you start getting things like noses.

778
00:30:45,985 --> 00:30:47,754
And then actually what you'll start getting is

779
00:30:48,185 --> 00:30:50,715
some of those nodes, like a nose and a couple eyes, will

780
00:30:50,715 --> 00:30:53,555
combine and then now you've gotten a face feature, right?

781
00:30:53,555 --> 00:30:57,165
And so towards the end you're getting very close to the things

782
00:30:57,165 --> 00:30:59,694
that you want to actually get out of the model in the end

783
00:31:00,085 --> 00:31:03,305
And that's the big difference between deep networks and wide networks, right?

784
00:31:03,305 --> 00:31:04,974
Cuz like a wide network is just gonna be

785
00:31:04,975 --> 00:31:06,705
like straight lines at different angles.

786
00:31:06,705 --> 00:31:10,385
It can't really do the abstractions, necessarily. You have to categorize

787
00:31:10,415 --> 00:31:14,965
every possible thing that you want to become an embedding, versus

788
00:31:14,965 --> 00:31:19,075
a deep neural network where like I can build eight straight lines to

789
00:31:19,075 --> 00:31:21,895
make a triangle and a different shape and a nose or something, right?

790
00:31:22,035 --> 00:31:22,355
Yes.

791
00:31:22,395 --> 00:31:22,675
Yes.

792
00:31:22,915 --> 00:31:25,504
And I'm not an expert on that deep level stuff just to let you

793
00:31:25,505 --> 00:31:28,904
know, but everything I also do know about talking to those people

794
00:31:28,904 --> 00:31:32,145
who are at that level is it's kind of magical in the sense, right?

795
00:31:32,195 --> 00:31:33,185
It's black box.

796
00:31:33,285 --> 00:31:36,055
Oh, so this is another topic we should talk about.

797
00:31:36,305 --> 00:31:40,805
There's two reasons why you build mathematical models or statistical models.

798
00:31:41,405 --> 00:31:42,025
One

799
00:31:42,515 --> 00:31:45,755
is I want to actually understand what's happening.

800
00:31:46,270 --> 00:31:52,180
Like, if I increase the temperature, what actually happens to that thing, right?

801
00:31:52,180 --> 00:31:55,860
So I don't care if I actually get the prediction exactly right, what I care most

802
00:31:55,860 --> 00:32:00,180
about is the relationship between temperature and the rate of reaction, right?

803
00:32:00,220 --> 00:32:03,019
I might not be able to predict most accurately, but I really know what

804
00:32:03,019 --> 00:32:05,900
it means to change the temperature and I understand the mechanism.

805
00:32:06,500 --> 00:32:08,900
And then there's another type, which is prediction, like, I don't.

806
00:32:09,470 --> 00:32:13,270
Well, I'm not, I'll, I don't care about what is going on under the hood.

807
00:32:13,340 --> 00:32:14,680
I just want to know tomorrow.

808
00:32:14,680 --> 00:32:16,070
What's that stock price going to be?

809
00:32:16,450 --> 00:32:17,779
And should I put money into it or not?

810
00:32:17,809 --> 00:32:19,430
I don't care how you figure that out.

811
00:32:19,689 --> 00:32:23,620
Just, I want you to be as accurate as possible with that stock price, right?

812
00:32:23,689 --> 00:32:25,280
And those are two different ways of modeling.

813
00:32:25,280 --> 00:32:29,509
And to be clear, all this stuff that we're talking about here is all black box.

814
00:32:29,640 --> 00:32:32,080
I don't care about the relationship underneath.

815
00:32:32,140 --> 00:32:35,700
I just want you to give me as accurate as a prediction as possible, right?

816
00:32:35,700 --> 00:32:39,290
So these models are not built for understanding what's the relationship between.

817
00:32:39,749 --> 00:32:44,649
It's just, if I give you this, get really accurate at giving me this, right?

818
00:32:44,840 --> 00:32:46,129
So that's important to understand.

819
00:32:46,129 --> 00:32:48,329
So I think that's part of why we don't know what's going

820
00:32:48,330 --> 00:32:50,299
on underneath the hood, because they haven't been built

821
00:32:50,660 --> 00:32:54,390
to be the kind of model where you can figure out the relationship.

822
00:32:54,390 --> 00:32:56,120
They're just like, just get really good.

823
00:32:56,350 --> 00:32:59,290
And if I show you this face, is this the person I should let into the computer?

824
00:32:59,910 --> 00:33:00,120
Right.

825
00:33:00,169 --> 00:33:01,450
That's all they really want to be.

826
00:33:01,450 --> 00:33:03,509
I don't care if you've got the nose angle, exactly.

827
00:33:03,510 --> 00:33:03,930
Right.

828
00:33:04,190 --> 00:33:06,200
Whatever you do, which can also be bad though.

829
00:33:06,329 --> 00:33:08,070
You want to hear a funny story about that with computer vision?

830
00:33:08,999 --> 00:33:10,720
I can tell Autumn loves funny stories.

831
00:33:10,720 --> 00:33:10,890
Yeah.

832
00:33:10,940 --> 00:33:12,980
She doesn't like to laugh, but she does like funny stories.

833
00:33:13,180 --> 00:33:14,270
I do like to laugh.

834
00:33:14,270 --> 00:33:14,710
I love it.

835
00:33:14,910 --> 00:33:15,940
Autumn, you were laughing your

836
00:33:16,090 --> 00:33:17,000
ass off just two seconds ago.

837
00:33:17,000 --> 00:33:17,460
I'm kidding.

838
00:33:17,660 --> 00:33:17,960
Okay.

839
00:33:18,430 --> 00:33:19,910
What are you like from California or something?

840
00:33:19,910 --> 00:33:21,876
What are

841
00:33:21,876 --> 00:33:23,650
you from California?

842
00:33:23,760 --> 00:33:24,230
We don't laugh.

843
00:33:24,610 --> 00:33:24,980
Sorry.

844
00:33:24,980 --> 00:33:27,480
That's a, that's a little bit aggressive, Stephen.

845
00:33:29,549 --> 00:33:32,630
I'm just saying it's been very, it was a good

846
00:33:32,650 --> 00:33:36,370
10 to 12 years of cultural acclimation for me

847
00:33:36,370 --> 00:33:37,870
coming from the East Coast to the West Coast.

848
00:33:38,090 --> 00:33:39,350
I can just talk about that later too.

849
00:33:39,350 --> 00:33:42,760
Just because you guys are grouchy and we have tacos, leave us alone.

850
00:33:43,579 --> 00:33:45,460
We have better pizza and we're more direct.

851
00:33:45,510 --> 00:33:49,830
Okay, so back to... And bagels too.

852
00:33:50,130 --> 00:33:52,880
Um, but you guys are, but I'll give you tacos and

853
00:33:52,880 --> 00:33:53,389
sushi.

854
00:33:53,390 --> 00:33:54,230
No, tacos and

855
00:33:54,230 --> 00:33:54,750
sushi.

856
00:33:54,800 --> 00:33:56,480
Tacos and sushi are life.

857
00:33:56,850 --> 00:33:58,010
Yeah, I give you, I'm with you on that.

858
00:33:58,120 --> 00:33:58,860
Okay.

859
00:33:59,255 --> 00:34:00,265
You're going to tell us a funny story.

860
00:34:00,305 --> 00:34:02,765
You're going to tell us about how someone, someone found out.

861
00:34:02,765 --> 00:34:03,535
Oh, vision models.

862
00:34:03,665 --> 00:34:04,105
Thank you.

863
00:34:04,875 --> 00:34:05,855
All three of us together.

864
00:34:05,855 --> 00:34:06,645
We make one brain.

865
00:34:06,705 --> 00:34:08,894
The thing was, there were these people who were

866
00:34:08,894 --> 00:34:11,275
like, Oh, let's see if we can tell dogs from wolves.

867
00:34:11,615 --> 00:34:13,755
Like, can we train a model that can actually say, this is a wolf.

868
00:34:13,764 --> 00:34:14,275
This is a dog.

869
00:34:14,285 --> 00:34:15,465
Cause people are always like, Oh, is that dog

870
00:34:15,474 --> 00:34:17,875
part wolf, whatever reason they wanted to do it.

871
00:34:18,255 --> 00:34:19,785
And they got this model and it was

872
00:34:20,385 --> 00:34:21,195
awesome.

873
00:34:21,685 --> 00:34:25,475
Like it predicted like with 99 percent accuracy on new

874
00:34:25,475 --> 00:34:28,875
data sets that like, yes, this is a dog and this is a wolf.

875
00:34:29,525 --> 00:34:30,704
And how can this be?

876
00:34:30,704 --> 00:34:32,155
How can we have created such a great model?

877
00:34:32,525 --> 00:34:33,704
And then when they actually looked at your

878
00:34:33,704 --> 00:34:35,655
data and this is tying back to data again.

879
00:34:36,225 --> 00:34:38,515
All the pictures of wolves had either snow or

880
00:34:38,515 --> 00:34:41,055
forest in the background and very few of the dogs did.

881
00:34:41,655 --> 00:34:43,845
Yeah, so all the model had actually been

882
00:34:43,845 --> 00:34:45,955
trained on is if you see a forest or snow.

883
00:34:46,015 --> 00:34:46,385
Yes.

884
00:34:46,395 --> 00:34:48,614
If you see a forest or snow: wolf.

885
00:34:48,805 --> 00:34:49,944
If you see if you don't.

886
00:34:50,365 --> 00:34:51,445
Yeah, exact same story.

887
00:34:51,455 --> 00:34:52,024
Someone was training a

888
00:34:52,024 --> 00:34:52,464
model

889
00:34:52,810 --> 00:34:55,380
on, on, if something was cancerous or not, and

890
00:34:55,380 --> 00:34:56,980
it was literally health, like a health related.

891
00:34:56,980 --> 00:34:59,310
They were training this model and they kept getting it right.

892
00:34:59,490 --> 00:35:01,580
This model did the exact same thing: oh, guess what?

893
00:35:01,760 --> 00:35:05,569
All of the pictures with a ruler in it were cancerous because they

894
00:35:05,569 --> 00:35:07,920
were the ones that were, like, in a doctor's office; it wasn't

895
00:35:07,960 --> 00:35:10,230
like someone just randomly taking a picture or something, right?

896
00:35:10,230 --> 00:35:10,969
We're like, Oh, now we

897
00:35:10,970 --> 00:35:12,380
actually need to know how big that tumor is.

898
00:35:12,380 --> 00:35:13,550
So we got to put a ruler in there.

899
00:35:13,550 --> 00:35:13,960
Oh, look

900
00:35:13,960 --> 00:35:16,100
at, look, this is how we decided it wasn't the model.

901
00:35:16,120 --> 00:35:16,440
And that's

902
00:35:16,480 --> 00:35:17,050
innocuous.

903
00:35:17,050 --> 00:35:17,100
Yeah.

904
00:35:17,190 --> 00:35:18,830
Those examples are kind of innocuous about

905
00:35:18,830 --> 00:35:20,880
that, but this has also happened a bunch.

906
00:35:21,040 --> 00:35:22,540
Oh, we can talk about open source AI too.

907
00:35:22,840 --> 00:35:25,560
This is a part of the important part about why

908
00:35:25,790 --> 00:35:28,410
any open source AI needs to include its data.

909
00:35:28,420 --> 00:35:29,619
There was something similar to that.

910
00:35:29,620 --> 00:35:33,800
I think with health and race, I forget which races and which

911
00:35:33,800 --> 00:35:38,329
health item, but it was basically because one of the races

912
00:35:38,359 --> 00:35:41,810
had more disease in it than the other, if it knew it was that race.

913
00:35:41,915 --> 00:35:43,975
Or something like that, or something like that, it would predict

914
00:35:43,985 --> 00:35:46,415
that that person had the disease, even if they didn't have it,

915
00:35:46,415 --> 00:35:49,545
just because it had learned, this is more likely to happen.

916
00:35:49,545 --> 00:35:51,605
So I'm going to weight more towards that because of

917
00:35:51,605 --> 00:35:53,355
the way the data set was unbalanced or something.

918
00:35:53,355 --> 00:35:55,554
Just think about how that worked in like the mortgage

919
00:35:55,554 --> 00:35:58,054
scandal with Wells Fargo and several companies.

920
00:35:58,055 --> 00:35:58,665
You know what I mean?

921
00:35:59,004 --> 00:35:59,304
Like.

922
00:35:59,645 --> 00:36:02,835
And it's crazy because I argue all the time that, like,

923
00:36:02,875 --> 00:36:05,875
people try to put these fancy applications on top of

924
00:36:05,875 --> 00:36:09,055
bad data and I'm like, how many examples do we need?

925
00:36:09,205 --> 00:36:09,735
How many examples do we need?

926
00:36:10,225 --> 00:36:11,775
But it's ongoing though.

927
00:36:11,775 --> 00:36:14,375
I mean, the LAION data set, this was a very famous, and they wanted,

928
00:36:14,760 --> 00:36:18,080
you know, open data set, and they said it was great, and it turns

929
00:36:18,080 --> 00:36:21,250
out because they had actually opened the data, some researchers

930
00:36:21,250 --> 00:36:23,740
had found there was a whole bunch of kiddie porn in it, right?

931
00:36:23,750 --> 00:36:24,220
Like there.

932
00:36:24,430 --> 00:36:26,719
Yeah, there was like a whole bunch of kids because it was such a huge data set.

933
00:36:26,720 --> 00:36:29,530
There was no way for a human to actually go through and check it all.

934
00:36:29,530 --> 00:36:30,809
But it was like this large scraping.

935
00:36:31,330 --> 00:36:32,829
And the only reason we could tell that is because

936
00:36:32,830 --> 00:36:34,540
somebody could actually go and look at data, they said.

937
00:36:34,640 --> 00:36:37,800
But after that, they said we're closing the data, but we fixed it.

938
00:36:38,260 --> 00:36:39,490
And so now there's no way to know.

939
00:36:39,490 --> 00:36:40,984
And so the thing is, yeah.

940
00:36:41,645 --> 00:36:45,825
This is a huge debate in the open source community and in the AI community.

941
00:36:45,845 --> 00:36:47,355
What does it mean to be open source AI?

942
00:36:48,255 --> 00:36:51,385
I've been also doing a lot of research on that because I'm writing

943
00:36:51,385 --> 00:36:54,755
a talk about security and how to make your code secure and like

944
00:36:54,784 --> 00:36:57,864
writing open source like repositories and how to keep them secure.

945
00:36:58,245 --> 00:37:01,685
And it's so interesting that, like, you would think that, you know,

946
00:37:01,685 --> 00:37:05,075
like most enterprise companies are like, it has to be, all of our

947
00:37:05,075 --> 00:37:07,995
code has to be behind closed doors and we don't want anyone to

948
00:37:07,995 --> 00:37:11,405
look at it, but there's also so much evidence showing that a lot

949
00:37:11,405 --> 00:37:14,925
of open source problems have been found because the data is open.

950
00:37:14,925 --> 00:37:17,584
So it's like a really interesting balance of what you want

951
00:37:17,805 --> 00:37:19,985
to be seen in public with open source and whatnot.

952
00:37:20,305 --> 00:37:21,535
Let me just want to say one thing on this though.

953
00:37:21,695 --> 00:37:22,155
Just one thing.

954
00:37:22,185 --> 00:37:23,085
I won't go deep into it.

955
00:37:23,575 --> 00:37:26,434
I just don't think everybody needs to be open source though.

956
00:37:26,475 --> 00:37:29,655
Like I love open source and I'm a huge, some products

957
00:37:29,664 --> 00:37:31,185
should definitely not be open source

958
00:37:31,244 --> 00:37:33,154
and some data sets should not be open source, right?

959
00:37:33,155 --> 00:37:34,564
Like if your data set

960
00:37:35,140 --> 00:37:37,810
is using proprietary, like, people's health information

961
00:37:37,810 --> 00:37:39,520
and it was personally identifying for those people.

962
00:37:39,560 --> 00:37:40,240
Please do not like

963
00:37:40,510 --> 00:37:40,820
our government.

964
00:37:41,190 --> 00:37:41,460
Yeah,

965
00:37:41,460 --> 00:37:45,249
plus

966
00:37:47,509 --> 00:37:48,440
this is being recorded.

967
00:37:48,440 --> 00:37:49,340
What are you thinking?

968
00:37:52,539 --> 00:37:54,480
So, but just don't call yourself open source and that's fine.

969
00:37:54,530 --> 00:37:56,350
You want to open the weights up and give us the weights.

970
00:37:56,360 --> 00:37:56,470
Great.

971
00:37:56,659 --> 00:37:56,819
I mean,

972
00:37:57,149 --> 00:37:58,140
they're called open AI.

973
00:37:58,530 --> 00:37:59,550
Like we have a problem.

974
00:37:59,600 --> 00:38:01,750
Oh, that's a very basic problem with them in general.

975
00:38:02,160 --> 00:38:04,050
It's like them and Meta are the two worst.

976
00:38:04,350 --> 00:38:05,322
Meta keeps wanting to... They can't

977
00:38:05,322 --> 00:38:05,990
tell what they want.

978
00:38:06,010 --> 00:38:07,050
Do you want to be a non profit?

979
00:38:07,050 --> 00:38:07,613
Do you want to be open?

980
00:38:07,613 --> 00:38:08,269
Do you not want to be open?

981
00:38:08,270 --> 00:38:08,380
They

982
00:38:08,380 --> 00:38:08,820
want profit.

983
00:38:09,040 --> 00:38:09,690
They want it all.

984
00:38:09,840 --> 00:38:10,730
Two recommendations.

985
00:38:10,800 --> 00:38:13,490
Anyone listening, read the book, The Alignment Problem.

986
00:38:13,529 --> 00:38:16,680
It was the best, like it had all the examples of everything

987
00:38:16,690 --> 00:38:19,410
wrong with like aligning your data with the outcomes you want.

988
00:38:19,820 --> 00:38:20,410
Fantastic book.

989
00:38:20,410 --> 00:38:22,970
And also go watch the movie The Mitchells vs.

990
00:38:22,970 --> 00:38:23,580
The Machines.

991
00:38:23,945 --> 00:38:24,865
I love that

992
00:38:25,175 --> 00:38:25,255
movie!

993
00:38:25,485 --> 00:38:27,815
Oh my whole, the whole problem breaks

994
00:38:27,815 --> 00:38:29,585
down to like them trying to get a vision model

995
00:38:29,585 --> 00:38:32,625
that can detect a dog versus a muffin or whatever.

996
00:38:32,755 --> 00:38:34,305
Like, that's how they beat the, it was great.

997
00:38:34,475 --> 00:38:34,924
Go watch that.

998
00:38:34,925 --> 00:38:35,055
Dude, it

999
00:38:35,055 --> 00:38:38,204
was, that is the cutest movie, like.

1000
00:38:38,314 --> 00:38:38,724
It was great.

1001
00:38:39,744 --> 00:38:40,514
Netflix original.

1002
00:38:40,534 --> 00:38:43,375
I ignore my kids movies like 90 percent of the time.

1003
00:38:43,385 --> 00:38:45,495
If they turn that movie on, I'm like, move out of the way.

1004
00:38:45,545 --> 00:38:46,015
Move.

1005
00:38:46,195 --> 00:38:47,045
Where's the popcorn?

1006
00:38:47,455 --> 00:38:49,385
Also, my favorite part is the tech bro gets

1007
00:38:49,405 --> 00:38:51,725
kidnapped by his own robots and it makes me so happy.

1008
00:38:53,645 --> 00:38:55,955
But Autumn, this, the fact that you just said that,

1009
00:38:55,985 --> 00:38:57,955
makes me think you're showing your kids the wrong movies.

1010
00:38:57,965 --> 00:38:59,994
Do you not show them the whole Miyazaki series?

1011
00:39:00,860 --> 00:39:01,290
What's that?

1012
00:39:02,100 --> 00:39:02,480
What?

1013
00:39:02,930 --> 00:39:05,170
Autumn, you're leaving the podcast right now.

1014
00:39:05,290 --> 00:39:05,880
I'm sorry.

1015
00:39:07,440 --> 00:39:10,330
You have, you have, you have, you have cat

1016
00:39:10,760 --> 00:39:13,289
headphones and you, and you have like, It's

1017
00:39:13,289 --> 00:39:16,450
funny when I tease you guys, but I don't like it when you do it back to me.

1018
00:39:16,450 --> 00:39:16,889
Oh,

1019
00:39:16,890 --> 00:39:17,570
Autumn.

1020
00:39:17,580 --> 00:39:18,890
You have just, Don't

1021
00:39:19,050 --> 00:39:20,509
give me the Steve eyebrows.

1022
00:39:20,510 --> 00:39:21,699
Those

1023
00:39:21,699 --> 00:39:22,199
are for Justin.

1024
00:39:22,779 --> 00:39:22,869
Okay.

1025
00:39:23,810 --> 00:39:24,130
Autumn.

1026
00:39:24,280 --> 00:39:25,950
What I will say, Autumn, is we've, you've,

1027
00:39:26,250 --> 00:39:27,320
Actually, I'll say it in a positive way.

1028
00:39:27,560 --> 00:39:29,210
Autumn, let me introduce you to this great

1029
00:39:29,530 --> 00:39:31,300
new world that you and your kids can share.

1030
00:39:32,050 --> 00:39:33,860
So we'll do it off podcast, but.

1031
00:39:34,159 --> 00:39:37,469
Steve, your expressions are just the top tier.

1032
00:39:37,470 --> 00:39:37,980
Not Princess

1033
00:39:38,010 --> 00:39:38,600
Mononoke.

1034
00:39:38,850 --> 00:39:39,740
Just FYI, don't start there.

1035
00:39:39,750 --> 00:39:40,540
How old are your kids?

1036
00:39:41,230 --> 00:39:41,579
How old

1037
00:39:41,579 --> 00:39:41,659
are your kids?

1038
00:39:42,379 --> 00:39:44,510
Seven, five and 11.

1039
00:39:44,670 --> 00:39:45,020
Okay.

1040
00:39:45,070 --> 00:39:46,690
So the 11 year old may be Princess Mononoke.

1041
00:39:47,460 --> 00:39:49,710
Listen, Miyazaki is a Japanese animator.

1042
00:39:49,995 --> 00:39:56,615
But he writes the most glorious movies ever. Like, the imagination.

1043
00:39:56,645 --> 00:39:59,335
There are strong female characters, which is important for your boys.

1044
00:39:59,595 --> 00:40:04,235
Almost all of them have really strong female characters, even the leads as well.

1045
00:40:04,504 --> 00:40:08,040
They're just... Yes, it's amazing. Start.

1046
00:40:08,090 --> 00:40:10,890
My suggestion is start with My Neighbor Totoro.

1047
00:40:10,960 --> 00:40:12,180
All the kids will like that one.

1048
00:40:12,180 --> 00:40:13,800
And then your five year old may not get freaked out.

1049
00:40:13,800 --> 00:40:15,280
I started my kids a little too young.

1050
00:40:15,290 --> 00:40:16,150
I'm like, Spirited Away.

1051
00:40:16,150 --> 00:40:16,529
And the other is

1052
00:40:16,730 --> 00:40:17,610
very, it was a little, yeah.

1053
00:40:18,030 --> 00:40:19,870
But Totoro, Kiki's Delivery Service.

1054
00:40:19,889 --> 00:40:20,839
Fantastic.

1055
00:40:20,929 --> 00:40:23,299
The best thing you need to know about Miyazaki is he had

1056
00:40:23,299 --> 00:40:26,130
a quote about, they showed him AI generated animation

1057
00:40:26,159 --> 00:40:28,590
and his quote was, it was an insult to life itself.

1058
00:40:28,945 --> 00:40:32,655
I think this might be how artists actually finally get paid their worth

1059
00:40:32,685 --> 00:40:35,905
because AI is going to ruin art so bad that they're going to be like,

1060
00:40:36,245 --> 00:40:39,815
now, everything that someone makes by hand, we'll pay a bunch of money for.

1061
00:40:39,815 --> 00:40:42,005
Also, Justin, you did a great job because you

1062
00:40:42,005 --> 00:40:44,635
just found Steve randomly with a vector talk.

1063
00:40:44,635 --> 00:40:47,435
Like, you found all this personality in a vector talk?

1064
00:40:47,845 --> 00:40:49,335
I didn't know he was from New York, though,

1065
00:40:49,415 --> 00:40:50,705
and so it was a little unfortunate there.

1066
00:40:50,705 --> 00:40:53,195
I couldn't get all wins, but we're, we're getting Whoa!

1067
00:40:53,195 --> 00:40:56,434
Whoa!

1068
00:40:56,435 --> 00:40:58,865
We got, we got Wait, but I think the personality

1069
00:40:58,865 --> 00:41:01,895
is because he came from New York, so we got, you know, there you go.

1070
00:41:02,065 --> 00:41:03,725
Bing, bing, oh, sound effect, bing, bing, bing.

1071
00:41:03,905 --> 00:41:04,505
We accept

1072
00:41:04,505 --> 00:41:05,555
you for who you are.

1073
00:41:05,775 --> 00:41:06,345
Thank you.

1074
00:41:06,355 --> 00:41:08,844
We're not just canceling people and firing them just

1075
00:41:08,844 --> 00:41:11,795
because of... We got, we got roughly 20 minutes left.

1076
00:41:11,935 --> 00:41:14,244
All right, so look, so once you've calculated these

1077
00:41:14,244 --> 00:41:16,300
embeddings, there's all sorts of cool things you can do.

1078
00:41:16,310 --> 00:41:18,630
You can actually use these embeddings to replace your information.

1079
00:41:18,640 --> 00:41:20,880
Like people are now doing analysis just straight

1080
00:41:20,880 --> 00:41:22,290
on embeddings, not the pictures themselves.

1081
00:41:22,290 --> 00:41:22,580
They throw it.

1082
00:41:22,590 --> 00:41:24,919
Once they calculate the embedding, they throw the picture away and they do their

1083
00:41:24,919 --> 00:41:28,230
analysis on the embedding because it's got all the relevant information out.

1084
00:41:28,329 --> 00:41:28,619
Okay.

1085
00:41:28,890 --> 00:41:33,700
So now we've done a whole bunch of these embeddings, every picture you.

1086
00:41:34,070 --> 00:41:36,000
If you send the picture through an embedding

1087
00:41:36,010 --> 00:41:38,430
model, you'll always get the same numbers out.

1088
00:41:38,460 --> 00:41:39,910
Same picture, same numbers always.

1089
00:41:39,910 --> 00:41:40,700
As long as you don't

1090
00:41:40,770 --> 00:41:42,170
change the model, you don't change the picture.

1091
00:41:42,210 --> 00:41:44,350
As long as you don't change the model, you can put any picture you

1092
00:41:44,350 --> 00:41:48,710
want, and all of those images will get mapped into that space, right?

1093
00:41:48,710 --> 00:41:52,294
And so now you've got, let's say, 10,000 embeddings.

1094
00:41:52,755 --> 00:41:54,305
You're like, what do I do with all these embeddings?

1095
00:41:54,715 --> 00:41:58,305
Like I said before, embeddings that are closer in space should be more similar.

1096
00:41:58,955 --> 00:42:03,455
And so what vector databases do is they allow you to take those embeddings,

1097
00:42:03,925 --> 00:42:09,025
put them into a database, and then create an index so that you can

1098
00:42:09,025 --> 00:42:12,585
quickly search who's close to this embedding that I just passed in.

1099
00:42:12,925 --> 00:42:14,805
So, like, find a picture similar.

1100
00:42:14,975 --> 00:42:17,595
Or find text similar or an abstract similar.

1101
00:42:18,095 --> 00:42:20,525
Take your, your unstructured thing and put it through the exact

1102
00:42:20,525 --> 00:42:23,315
same embedding model that you use to create your other embeddings.

1103
00:42:24,005 --> 00:42:26,785
Get the embedding that comes out, ask the database,

1104
00:42:26,785 --> 00:42:30,224
Hey, what embeddings are closest to this embedding?

1105
00:42:30,354 --> 00:42:32,224
And it'll give you back if you say, I want five,

1106
00:42:32,224 --> 00:42:34,014
it'll give you back the five closest things.

1107
00:42:34,015 --> 00:42:35,794
So

1108
00:42:35,875 --> 00:42:39,225
basically an embedding would be something good if you had

1109
00:42:39,225 --> 00:42:41,515
data that you didn't want to keep and have to worry about

1110
00:42:41,515 --> 00:42:44,205
securing and it could be something, you know what I mean?

1111
00:42:44,205 --> 00:42:49,214
Like, because then you wouldn't have to care or worry as much about like,

1112
00:42:49,214 --> 00:42:52,374
Hey, I have all this data and I don't want to be responsible for it.

1113
00:42:52,375 --> 00:42:53,545
I don't want to pay the storage.

1114
00:42:53,545 --> 00:42:55,805
Could you just keep the embeddings?

1115
00:42:56,280 --> 00:42:57,170
and use those?

1116
00:42:57,530 --> 00:42:59,180
There's a reason I'm not in the security field.

1117
00:42:59,320 --> 00:43:02,800
I will say, tentatively, yes, in the same way that to

1118
00:43:02,800 --> 00:43:06,630
secure the embeddings, but I wonder if that would be less to store.

1119
00:43:06,680 --> 00:43:07,290
It's the same

1120
00:43:07,290 --> 00:43:09,400
as, yeah, it's the same kind of thing though, right?

1121
00:43:09,430 --> 00:43:11,159
Like, it's any of those kinds of things where I

1122
00:43:11,159 --> 00:43:13,459
can compress and retain the essential information.

1123
00:43:13,460 --> 00:43:15,939
I can't, it's not the exact, right?

1124
00:43:15,940 --> 00:43:19,940
So I can't go back to saying, yes, this was a picture, exactly a picture of

1125
00:43:19,940 --> 00:43:23,510
a cat, but I can say, This is a cat like picture that was originally there.

1126
00:43:23,520 --> 00:43:24,690
Yes, you could do that.

1127
00:43:24,930 --> 00:43:28,890
It's a way to compress your data while retaining as opposed to something

1128
00:43:28,890 --> 00:43:32,510
like in a hash, which doesn't have any semantic meaning in it, right?

1129
00:43:32,510 --> 00:43:33,280
It's just a hash.

1130
00:43:33,590 --> 00:43:38,410
This says compress the data, but retain a bunch of the stuff that's important

1131
00:43:38,430 --> 00:43:42,470
to me, which I, we haven't gotten to this yet, but I want to get all the

1132
00:43:42,470 --> 00:43:45,890
way through. In vector embeddings, the model you use is actually important

1133
00:43:46,360 --> 00:43:49,560
in terms of what actually comes out in the embedding vector as well, right?

1134
00:43:49,560 --> 00:43:52,660
It's not like I can throw it into any model and I'm

1135
00:43:52,660 --> 00:43:54,629
going to get the exact same kind of features to come out.

1136
00:43:54,659 --> 00:43:56,229
It depends on how that model was trained.

1137
00:43:56,569 --> 00:43:59,690
So when you pick an embedding model, you want to pick an embedding model that

1138
00:43:59,995 --> 00:44:03,195
has both the architecture that you would want, but also was trained

1139
00:44:03,195 --> 00:44:06,615
on data similar to the stuff you want to ask questions about, right?

1140
00:44:06,615 --> 00:44:09,785
So like if I trained a model on like, so a very

1141
00:44:09,785 --> 00:44:12,455
famous data set is the MNIST image data set.

1142
00:44:12,655 --> 00:44:14,584
It's those handwritten numbers, which I'm

1143
00:44:14,585 --> 00:44:16,255
sure you've seen over and over again, right?

1144
00:44:16,655 --> 00:44:19,995
So if I train a, uh, computer vision model on that,

1145
00:44:20,095 --> 00:44:22,275
but I want to actually tell apples from oranges

1146
00:44:22,445 --> 00:44:24,555
or good apples from apples that have rot on them.

1147
00:44:25,225 --> 00:44:28,025
A model trained on that data set's not going to do really well, right?

1148
00:44:28,025 --> 00:44:30,365
Because it never saw those things.

1149
00:44:31,014 --> 00:44:32,674
It only saw handwritten numbers.

1150
00:44:32,685 --> 00:44:36,044
So you really do have to pay attention to the model that you use when

1151
00:44:36,044 --> 00:44:38,465
you create these embeddings, because you can create a whole bunch of

1152
00:44:38,465 --> 00:44:41,705
embeddings, but they might not actually show what you want it to show, right?

1153
00:44:41,715 --> 00:44:44,625
And you can put them all the way through, get

1154
00:44:44,625 --> 00:44:46,265
similarities, but they may not actually be things.

1155
00:44:47,590 --> 00:44:49,600
Because the model doesn't even know that they were, was

1156
00:44:49,600 --> 00:44:51,380
never trained to know that those were similar things.

1157
00:44:51,920 --> 00:44:53,410
A lot of this stuff, if you're going to do

1158
00:44:53,410 --> 00:44:55,720
it internally is a lot of experimentation.

1159
00:44:56,270 --> 00:44:56,509
Right?

1160
00:44:56,509 --> 00:44:57,589
This isn't like writing code.

1161
00:44:57,589 --> 00:44:58,740
And did I get the right output?

1162
00:44:58,759 --> 00:45:00,290
It's, let's try to run some of this stuff.

1163
00:45:00,400 --> 00:45:01,240
Let's try this model.

1164
00:45:01,249 --> 00:45:03,030
Let's see if we add this many more parameters.

1165
00:45:03,030 --> 00:45:05,399
How much does it, if we use a bigger model, yes, it's more

1166
00:45:05,399 --> 00:45:07,410
accurate, but it takes this much longer to run and we have to use

1167
00:45:07,410 --> 00:45:10,190
this much GPU, and it's like, there's all these things, there's all

1168
00:45:10,190 --> 00:45:12,310
these trade-offs and things that you need to pay attention to.

1169
00:45:12,520 --> 00:45:15,520
I'm giving you the simplified version of this whole thing, just so you can

1170
00:45:15,710 --> 00:45:16,580
get the framework, but

1171
00:45:16,640 --> 00:45:17,360
No, but that helps.

1172
00:45:17,370 --> 00:45:20,460
And I also think that it's another example of when

1173
00:45:20,460 --> 00:45:23,170
you need the education to use these things, right?

1174
00:45:23,210 --> 00:45:24,720
Because it's effective.

1175
00:45:24,729 --> 00:45:26,810
It sounds like it could be used to have

1176
00:45:26,849 --> 00:45:29,190
new, more efficient ways to do these things.

1177
00:45:29,599 --> 00:45:32,469
But you need this education to make sure you pick the right tool.

1178
00:45:32,469 --> 00:45:33,339
So rad.

1179
00:45:33,399 --> 00:45:33,599
Yeah.

1180
00:45:33,600 --> 00:45:34,810
Cause that's definitely one of the things

1181
00:45:34,810 --> 00:45:36,060
you're going to need to pay attention to.

1182
00:45:36,340 --> 00:45:36,560
Okay.

1183
00:45:36,580 --> 00:45:37,930
So then you put them in the vector database.

1184
00:45:38,130 --> 00:45:39,750
One of the reasons you want to put them in a

1185
00:45:39,750 --> 00:45:42,569
vector database rather than just calculating them and...

1186
00:45:43,270 --> 00:45:45,350
If you didn't have the vector database and you wanted to do

1187
00:45:45,350 --> 00:45:48,010
that similarity query, you'd have to keep all your embeddings

1188
00:45:48,020 --> 00:45:52,590
in memory and then actually you can calculate it exactly

1189
00:45:52,620 --> 00:45:55,570
right using an exact thing or you could build an index on top.

1190
00:45:55,940 --> 00:45:59,760
Most databases, the two nice parts about them are, one, their low resource consumption.

1191
00:46:00,325 --> 00:46:00,625
Right.

1192
00:46:00,655 --> 00:46:04,415
Compared to keeping everything in memory. And two, they have an index, right?

1193
00:46:04,415 --> 00:46:06,335
So you don't have to brute-force calculate the distance

1194
00:46:06,335 --> 00:46:08,895
between everything all the time, which you can do.

1195
00:46:08,895 --> 00:46:10,175
And that'll give you an exact search.
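
To make the "brute force" option concrete, here is a minimal sketch of an exact search: measure the distance from the query to every stored embedding, sort, and keep the closest. The three-dimensional vectors and their labels are made up for illustration, and the helper names are hypothetical. A database index exists to avoid exactly this full scan.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nearest(query, embeddings, k=2):
    """Brute force: score every stored vector, then keep the k closest."""
    scored = [(euclidean(query, vec), key) for key, vec in embeddings.items()]
    scored.sort()
    return [key for _, key in scored[:k]]

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "rotten apple": [0.8, 0.3, 0.1],
    "orange": [0.1, 0.9, 0.0],
}

nearest = exact_nearest([0.9, 0.15, 0.0], embeddings)
print(nearest)
```

The cost grows with the number of stored vectors, because every query touches all of them.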

1196
00:46:10,475 --> 00:46:14,725
Most of the vector searches that you do now in a database... I use

1197
00:46:14,725 --> 00:46:17,895
Postgres because why not Postgres, you should Postgres everything,

1198
00:46:18,134 --> 00:46:20,815
just like you should use IntelliJ, you should use Postgres because

1199
00:46:21,615 --> 00:46:23,545
you all know relational already, probably

1200
00:46:23,545 --> 00:46:24,935
you've used some sort of relational database.

1201
00:46:25,115 --> 00:46:26,125
Postgres is open source.

1202
00:46:26,125 --> 00:46:27,245
It's owned by a foundation.

1203
00:46:27,265 --> 00:46:29,345
So nobody's going to be like, well, we closed-sourced that.

1204
00:46:29,684 --> 00:46:30,665
You guys are out of luck now.

1205
00:46:31,025 --> 00:46:33,755
And then the third reason is there's a ton

1206
00:46:33,755 --> 00:46:35,664
of plugins and I'm like a big geospatial guy.

1207
00:46:35,665 --> 00:46:37,675
That's actually my really deep background.

1208
00:46:38,114 --> 00:46:41,034
And there's a, it's like the gold standard for geospatial

1209
00:46:41,035 --> 00:46:43,465
analysis in a database is PostGIS, which is in there.

1210
00:46:43,495 --> 00:46:44,435
Also, I feel like

1211
00:46:44,755 --> 00:46:49,115
you, you want one database, you want performance, you want

1212
00:46:49,470 --> 00:46:54,210
something that like, there's a lot of good databases and you need to

1213
00:46:54,210 --> 00:46:58,260
pick the right database for the tool, but relational databases are

1214
00:46:58,260 --> 00:47:02,499
what we teach in school, first of all. Non, uh, like, well, not, I won't

1215
00:47:02,499 --> 00:47:06,590
say non-relational 'cause that's like a lie, NoSQL databases, well,

1216
00:47:06,770 --> 00:47:09,280
Like there, you still have to learn the

1217
00:47:09,280 --> 00:47:11,010
access patterns, which people don't know.

1218
00:47:11,010 --> 00:47:13,070
So even though it might be the better database, they'll

1219
00:47:13,080 --> 00:47:16,480
dump it in and not do the right things that they need to do.

1220
00:47:17,180 --> 00:47:20,510
So if you have to make a choice that you need something that's slightly

1221
00:47:20,539 --> 00:47:25,230
more NoSQL and not as much structure, but you want it to perform well.

1222
00:47:25,640 --> 00:47:28,310
And like your team has like a relational database

1223
00:47:28,310 --> 00:47:30,670
background and the fact that you don't want a license

1224
00:47:31,060 --> 00:47:31,960
pulled from underneath you.

1225
00:47:31,990 --> 00:47:34,080
Postgres has so many

1226
00:47:34,970 --> 00:47:35,520
data types.

1227
00:47:35,540 --> 00:47:35,830
Yeah.

1228
00:47:35,860 --> 00:47:39,019
And then like, not just that, but, bro, half of the relational databases

1229
00:47:39,019 --> 00:47:42,040
have Postgres somewhere underneath them with a bunch of cool tuning

1230
00:47:42,040 --> 00:47:44,750
and like an interface, which is why they're so expensive.

1231
00:47:44,990 --> 00:47:45,840
We can get on that whole thing.

1232
00:47:45,840 --> 00:47:48,119
But my general advice to people is start with Postgres before

1233
00:47:48,120 --> 00:47:50,629
you start with any of these specialized databases, whether that's

1234
00:47:50,629 --> 00:47:53,240
a document database or any of those others, start with Postgres.

1235
00:47:53,520 --> 00:47:55,040
You might've noticed that engineers in

1236
00:47:55,040 --> 00:47:57,720
general think they need the fastest, biggest

1237
00:47:57,875 --> 00:47:59,515
whatever server there's going to be.

1238
00:47:59,515 --> 00:48:02,035
And then they get purchased, and then they're using

1239
00:48:02,035 --> 00:48:04,125
like 2 percent resource utilization the entire time.

1240
00:48:04,495 --> 00:48:06,944
So start with Postgres, try to, and if then you run

1241
00:48:06,945 --> 00:48:08,594
into scaling issues or something else, go read the

1242
00:48:08,594 --> 00:48:10,995
documentation, cause you're probably doing it not optimally.

1243
00:48:11,485 --> 00:48:13,365
After you've read the documentation and tuned your database

1244
00:48:13,365 --> 00:48:15,615
again, if you still can't get it to work, then look at something

1245
00:48:15,615 --> 00:48:17,775
specialized, but just start with Postgres from the beginning.

1246
00:48:17,875 --> 00:48:18,115
Okay.

1247
00:48:18,355 --> 00:48:21,415
So back to the databases and the similarity search, right?

1248
00:48:21,415 --> 00:48:23,105
So those are the reasons you want it in the database.

1249
00:48:23,215 --> 00:48:24,245
Once you've done this,

1250
00:48:24,500 --> 00:48:25,720
the important part with these kinds of

1251
00:48:25,720 --> 00:48:28,220
searches is you're not doing an exact search.

1252
00:48:28,380 --> 00:48:30,550
If most of your questions are exact questions,

1253
00:48:30,810 --> 00:48:32,470
like how much money did we make last year?

1254
00:48:33,250 --> 00:48:35,830
You don't want a vector database for that, right?

1255
00:48:35,870 --> 00:48:38,380
That's like, that's relational or whatever.

1256
00:48:38,380 --> 00:48:39,440
Yeah, that's analytics

1257
00:48:39,440 --> 00:48:43,359
and relational, which again is like giving people the information to make

1258
00:48:43,359 --> 00:48:46,630
the right tools for the job because they think it's a black box of magic.

1259
00:48:47,060 --> 00:48:47,830
And I'm just like,

1260
00:48:48,470 --> 00:48:49,370
Right, exactly.

1261
00:48:49,390 --> 00:48:51,860
So for this one, there's questions like,

1262
00:48:52,295 --> 00:48:56,665
what is like this thing, or what thing is close to this thing, or show me

1263
00:48:56,685 --> 00:49:01,335
other things like this thing, or clustering, or looking for anomalies, right?

1264
00:49:01,385 --> 00:49:04,034
It's not like looking for exact opposites or anything like that.

1265
00:49:04,034 --> 00:49:07,865
You're looking for something that is not similar, like credit card transactions.

1266
00:49:07,895 --> 00:49:08,415
You could do that.

1267
00:49:08,415 --> 00:49:10,595
You could build an embedding around credit card transactions

1268
00:49:10,595 --> 00:49:13,345
that was both like the picture of the receipt and all sorts

1269
00:49:13,345 --> 00:49:15,655
of other things so that it like comes up with some vector.

1270
00:49:15,655 --> 00:49:16,095
I don't know.

1271
00:49:16,345 --> 00:49:18,535
And you can say, look, show me the closest

1272
00:49:18,635 --> 00:49:20,405
credit card transactions for this person

1273
00:49:20,820 --> 00:49:21,710
to this new one.

1274
00:49:22,130 --> 00:49:25,210
And if the distance between those is really far, you could be

1275
00:49:25,210 --> 00:49:28,740
like, this is probably not one of their credit card transactions.

1276
00:49:28,740 --> 00:49:28,950
Right.

1277
00:49:29,299 --> 00:49:31,260
And so that's what would happen with that.

1278
00:49:31,270 --> 00:49:35,150
I, I think there's different ways to measure the distance as well.

1279
00:49:35,650 --> 00:49:37,830
Um, just so you know, and then they have different properties to them.

1280
00:49:37,840 --> 00:49:39,350
The one that most people use is cosine.

1281
00:49:39,359 --> 00:49:40,510
So it just looks at the angles.

1282
00:49:40,709 --> 00:49:43,200
It does really well because mathematical reasons.

1283
00:49:43,410 --> 00:49:44,980
But you can also do things like Euclidean

1284
00:49:44,980 --> 00:49:46,820
distance, even in 512 dimensional space.

1285
00:49:46,820 --> 00:49:50,090
You know, like, Euclidean distance is like the hypotenuse of a triangle.

1286
00:49:50,460 --> 00:49:54,950
A squared plus b... the square of the hypotenuse is a squared plus b squared.

1287
00:49:55,860 --> 00:49:57,850
That thing, like straight line distance,

1288
00:49:57,890 --> 00:49:59,620
but you can do it in all sorts of space.

1289
00:49:59,970 --> 00:50:02,149
There's also like, some people call it Manhattan, some other

1290
00:50:02,149 --> 00:50:04,760
people call it Taxi distance, which is like if you had to

1291
00:50:04,760 --> 00:50:07,200
drive around blocks, right, rather than straight lines.

1292
00:50:07,470 --> 00:50:08,210
So those are those guys.
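
The three distance measures mentioned here can be written out in a few lines. This is a sketch using plain Python lists as vectors; the function names are illustrative rather than any particular library's API, and cosine distance is taken as one minus cosine similarity, which is one common convention.

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); ignores vector length, looks only at direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance: the hypotenuse rule extended to any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Taxi distance: sum of the per-axis differences, like driving around blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

origin, corner = [0.0, 0.0], [3.0, 4.0]
print(euclidean_distance(origin, corner))   # 3-4-5 triangle
print(manhattan_distance(origin, corner))   # 3 blocks over plus 4 blocks up
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # perpendicular vectors
```

The same functions work unchanged on 512-dimensional vectors, since they only loop over paired coordinates.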

1293
00:50:08,440 --> 00:50:08,960
There's all sorts of

1294
00:50:09,250 --> 00:50:10,820
New Yorker does make you funny.

1295
00:50:11,280 --> 00:50:12,690
Yeah, but I'm not from New York City.

1296
00:50:12,749 --> 00:50:13,399
Thank goodness.

1297
00:50:13,459 --> 00:50:15,600
Um, I don't like I'm from Rockland County.

1298
00:50:16,109 --> 00:50:19,419
Do you know where Westchester is like White Plains that area?

1299
00:50:19,419 --> 00:50:23,440
Like I'm like, it's a commuting suburb, 45 minutes north of New York City.

1300
00:50:23,660 --> 00:50:26,620
But it's in New York, but it's just like a suburb of it.

1301
00:50:26,740 --> 00:50:27,140
Yes.

1302
00:50:27,200 --> 00:50:29,210
And if you're from New York City, I'm from upstate.

1303
00:50:29,780 --> 00:50:31,780
And if you're from upstate, I'm from New York City.

1304
00:50:32,030 --> 00:50:32,250
Right.

1305
00:50:32,250 --> 00:50:36,130
It's like one of those places where our claim to fame of the town that I'm from

1306
00:50:36,170 --> 00:50:40,350
is we were one of the places supposedly that they went to on Sex and the City.

1307
00:50:40,500 --> 00:50:40,970
Ooh.

1308
00:50:41,640 --> 00:50:42,450
I've watched that show.

1309
00:50:42,670 --> 00:50:43,040
You did?

1310
00:50:43,070 --> 00:50:43,290
Okay.

1311
00:50:43,290 --> 00:50:46,190
So you remember Carrie had like the woodworker boyfriend?

1312
00:50:46,740 --> 00:50:47,130
Yeah.

1313
00:50:47,160 --> 00:50:48,209
Not Mr. Big, but the other one.

1314
00:50:48,209 --> 00:50:49,089
I forget what his name was.

1315
00:50:49,410 --> 00:50:51,530
The one who like had a dog and like they bricked

1316
00:50:51,560 --> 00:50:53,170
down the bridge, the apartment between them.

1317
00:50:53,230 --> 00:50:53,830
Oh yeah, yeah.

1318
00:50:54,120 --> 00:50:54,600
Whatever his name was.

1319
00:50:54,910 --> 00:50:55,890
Anyway, he had a cabin.

1320
00:50:56,280 --> 00:50:56,970
He was my favorite.

1321
00:50:57,130 --> 00:50:57,820
Yeah, mine too.

1322
00:50:57,890 --> 00:50:59,230
But he's from Northern Exposure too.

1323
00:50:59,230 --> 00:51:00,054
He was in Northern Exposure.

1324
00:51:00,054 --> 00:51:00,379
Was he

1325
00:51:00,430 --> 00:51:00,900
Aidan?

1326
00:51:01,100 --> 00:51:01,610
But Aidan?

1327
00:51:01,850 --> 00:51:02,450
Yes.

1328
00:51:02,690 --> 00:51:03,050
Yes.

1329
00:51:03,350 --> 00:51:03,800
So a yes.

1330
00:51:03,800 --> 00:51:04,010
He was the

1331
00:51:04,010 --> 00:51:06,710
only one that wasn't toxic and total garbage, but keep going.

1332
00:51:06,950 --> 00:51:07,280
Yes.

1333
00:51:07,370 --> 00:51:10,190
So Aidan had a cabin in the woods, was supposedly

1334
00:51:10,190 --> 00:51:12,140
in the woods, like far out in the middle of nowhere.

1335
00:51:12,260 --> 00:51:15,350
And Carrie goes to visit him and she calls up, I think Miranda

1336
00:51:15,350 --> 00:51:17,120
or somebody who's complaining, I had to go all the way to

1337
00:51:17,120 --> 00:51:19,010
New Jersey to get a cup of coffee, blah, blah, blah, blah.

1338
00:51:19,520 --> 00:51:22,360
The name of the town was Suffern, which is where I'm from.

1339
00:51:22,360 --> 00:51:23,440
So it's an easy play on words.

1340
00:51:23,440 --> 00:51:26,440
I'm Suff... I'm in Suffern, I'm suffering here in Suffern.

1341
00:51:26,830 --> 00:51:28,780
But there was Starbucks all over the place and it was.

1342
00:51:29,240 --> 00:51:29,270
They're

1343
00:51:29,270 --> 00:51:32,210
just lying out here in Sex and the City, and also, like, maybe

1344
00:51:32,210 --> 00:51:34,150
you had to go all the way out there because he was the only

1345
00:51:34,150 --> 00:51:36,700
one that wasn't toxic and hot garbage, but okay, whatever.

1346
00:51:36,930 --> 00:51:39,020
And actually had a connection to nature and real things.

1347
00:51:39,050 --> 00:51:41,260
Okay, anyway, I think we've covered what

1348
00:51:41,260 --> 00:51:43,840
you use vector databases for in general.

1349
00:51:43,850 --> 00:51:47,340
So their use case is generally around similarity, not exactness.

1350
00:51:47,620 --> 00:51:50,005
And you're using it for things which are unstructured,

1351
00:51:50,015 --> 00:51:53,065
generally, one of the things that you might have heard about

1352
00:51:53,105 --> 00:51:56,155
a lot lately is this thing called creating a RAG system,

1353
00:51:56,485 --> 00:52:01,355
Retrieval Augmented Generation, that requires a vector database.

1354
00:52:01,525 --> 00:52:04,294
The idea behind a RAG is... Could

1355
00:52:04,295 --> 00:52:07,274
you explain RAG versus, was it A G?

1356
00:52:07,335 --> 00:52:09,015
What is it, AG or... Oh, agentic.

1357
00:52:10,035 --> 00:52:10,455
Yes.

1358
00:52:10,755 --> 00:52:11,295
Between the two.

1359
00:52:12,505 --> 00:52:12,655
Okay.

1360
00:52:12,655 --> 00:52:14,425
So here's my under... I'll just give you a quick one on agentic

1361
00:52:14,425 --> 00:52:16,375
'cause I haven't really worked on agentic systems that much.

1362
00:52:16,555 --> 00:52:21,295
Agentic to me is a fancy... from what I've seen, is a fancy way of saying

1363
00:52:21,295 --> 00:52:25,465
we've written a set of microservices using LLMs, is basically what it is.

1364
00:52:25,465 --> 00:52:26,190
So when you say it's a daemon

1365
00:52:26,815 --> 00:52:30,290
Yeah, it's a daemon that runs a daemon that does AI stuff and it's like, oh, this.

1366
00:52:30,925 --> 00:52:34,325
You send in a request to the agentic system, which is probably

1367
00:52:34,325 --> 00:52:38,505
going to have a specialized LLM that's going to say, Oh, given that

1368
00:52:38,505 --> 00:52:42,365
type of question, this is the kind of LLM you want to answer this.

1369
00:52:42,374 --> 00:52:45,035
So it's going to send that query off to that other LLM.

1370
00:52:45,625 --> 00:52:48,525
And if it's a multi step problem, we'll say, okay, I'll get that answer first.

1371
00:52:49,090 --> 00:52:51,770
It'll come back to that same agent again, and then it'll send it off to

1372
00:52:51,770 --> 00:52:55,680
another agent, another microservice that handles the next part of the question.

1373
00:52:55,710 --> 00:52:57,950
And then it reassembles the answer and sends it back.

1374
00:52:58,570 --> 00:53:00,750
That's the general idea of agentic.

1375
00:53:00,780 --> 00:53:03,539
Maybe there's something more exciting, but given the history of how

1376
00:53:03,840 --> 00:53:07,289
people have bragged about stuff, I'm basically thinking it's mostly

1377
00:53:07,289 --> 00:53:11,180
like a microservice architecture for connecting LLMs together.

1378
00:53:11,410 --> 00:53:12,450
That's an agentic system, right?

1379
00:53:12,450 --> 00:53:15,480
So something smart enough, or that has some logic, to say, put these together and

1380
00:53:15,885 --> 00:53:19,115
bring it back into a coherent whole. RAG is completely different.

1381
00:53:19,235 --> 00:53:23,175
The other option to RAG is fine-tuning one of your models, right?

1382
00:53:23,235 --> 00:53:28,754
So fine-tuning means I take somebody's model, like the Llama model, and I

1383
00:53:28,895 --> 00:53:33,165
chop off the last layer, or I open up the weights again, and then I give it

1384
00:53:33,175 --> 00:53:38,665
my own data and I retune parts of the model to be more focused on my data.

1385
00:53:39,090 --> 00:53:42,730
My questions, the stuff on, so for example,

1386
00:53:42,820 --> 00:53:47,030
Justin is going to build an IoT-specific LLM.

1387
00:53:47,430 --> 00:53:48,520
Is that, is it all that stuff in the

1388
00:53:48,520 --> 00:53:51,029
background for IoT or is it more for woodshop?

1389
00:53:51,260 --> 00:53:52,720
It's, it's all different things.

1390
00:53:52,770 --> 00:53:53,449
Okay, sure.

1391
00:53:54,100 --> 00:53:55,729
Justin is going to build a It's

1392
00:53:55,730 --> 00:53:57,119
called ADHD Hobbies.

1393
00:53:57,199 --> 00:53:58,049
Exactly.

1394
00:54:00,050 --> 00:54:01,130
It's a pegboard of stuff.

1395
00:54:01,320 --> 00:54:01,530
It's a

1396
00:54:01,530 --> 00:54:04,680
pegboard of This was January of 2023.

1397
00:54:04,730 --> 00:54:07,229
This is... So he's going to build a maker,

1398
00:54:08,315 --> 00:54:10,455
like a maker LLM that answers all sorts of questions

1399
00:54:10,455 --> 00:54:12,625
around makers, not just the general internet, right?

1400
00:54:13,235 --> 00:54:15,485
And so what he would do is he'd have a whole bunch of

1401
00:54:15,485 --> 00:54:18,725
question and answer and text pairings around maker stuff.

1402
00:54:18,755 --> 00:54:21,485
Like you take maker magazine and scan that and do some other

1403
00:54:21,485 --> 00:54:24,864
stuff, scrape some maker sites specifically, and then open up

1404
00:54:24,865 --> 00:54:28,435
the last couple of layers of the model and retrain or do that

1405
00:54:28,435 --> 00:54:31,155
same training process basically again, but with his new data.

1406
00:54:31,665 --> 00:54:34,315
So the model gets all this really good information

1407
00:54:34,315 --> 00:54:37,034
it learned early on in the earlier layers.

1408
00:54:37,330 --> 00:54:40,230
We're leaving that kind of structure, like how sentences are

1409
00:54:40,230 --> 00:54:43,420
put together or how, what are features and images I care about.

1410
00:54:43,800 --> 00:54:47,170
But the later layers are now getting tuned specifically

1411
00:54:47,610 --> 00:54:49,709
to the questions and the data that he's giving it.

1412
00:54:49,990 --> 00:54:51,279
And that's called a fine-tuned model.

1413
00:54:51,350 --> 00:54:52,869
And that's pretty cool.

1414
00:54:52,919 --> 00:54:54,249
And it's fun once you do it.

1415
00:54:54,329 --> 00:54:57,690
What RAG is, is I don't have an ML, machine learning, staff

1416
00:54:58,230 --> 00:55:00,020
really there at my company.

1417
00:55:00,210 --> 00:55:01,970
I don't have that kind of hardware.

1418
00:55:02,550 --> 00:55:05,260
So I, but I want to see if I can take advantage of some of this stuff.

1419
00:55:05,750 --> 00:55:09,639
And the problem with using something like OpenAI or Claude or any of these really

1420
00:55:09,690 --> 00:55:12,870
large foundational models is they were trained on the whole internet, right?

1421
00:55:12,870 --> 00:55:14,770
Or some huge corpus of text.

1422
00:55:15,140 --> 00:55:18,780
So the example I really wanted to build that was perfect for RAG, one of

1423
00:55:18,780 --> 00:55:22,830
my friends when I was at VMware started using OpenAI as a dungeon master.

1424
00:55:22,990 --> 00:55:25,060
And so he would be like, we're doing this campaign.

1425
00:55:25,650 --> 00:55:28,770
These are the use cases that we need to use OpenAI for.

1426
00:55:28,800 --> 00:55:29,830
Like, you know what I mean?

1427
00:55:30,200 --> 00:55:31,630
It's not going to hurt anybody.

1428
00:55:31,640 --> 00:55:35,470
It's not responsible for any crazy decisions, but we're going to be efficient

1429
00:55:35,470 --> 00:55:39,069
and let humans live their best lives and spend more time doing fun stuff.

1430
00:55:39,459 --> 00:55:42,520
What you'd notice when he was doing it is it kept forgetting

1431
00:55:42,530 --> 00:55:44,960
things and it also didn't know all the rules, right?

1432
00:55:44,960 --> 00:55:46,420
So he had to keep saying that was close.

1433
00:55:46,650 --> 00:55:49,130
But remember kobolds will attack usually like

1434
00:55:49,130 --> 00:55:50,850
this and then their hit dice is like this.

1435
00:55:51,315 --> 00:55:53,775
And then it would generate a random number and do stuff, but he had to keep,

1436
00:55:53,885 --> 00:55:57,405
it wasn't really getting it, which is a perfect system for what RAG is.

1437
00:55:57,415 --> 00:56:01,184
So what RAG, it stands for retrieval augmented generation.

1438
00:56:01,375 --> 00:56:03,785
We were working on a demo and I've actually calculated the embeddings.

1439
00:56:03,815 --> 00:56:06,424
I just, I actually had to get a job rather than doing consulting.

1440
00:56:06,774 --> 00:56:08,764
Um, and I actually had to get paid rather

1441
00:56:08,765 --> 00:56:09,945
than just doing something that was fun.

1442
00:56:10,234 --> 00:56:11,775
But, so what I did is we found all the

1443
00:56:11,775 --> 00:56:14,535
fifth edition D&D manuals in Markdown.

1444
00:56:14,775 --> 00:56:17,875
And so what I did is I took all the manuals and created embeddings

1445
00:56:18,235 --> 00:56:22,815
for every Markdown header section from the D&D manuals.

1446
00:56:22,815 --> 00:56:24,925
Like we had the player's guide, the dungeon master's

1447
00:56:24,925 --> 00:56:26,955
guide, a couple of the monster manuals, a couple of

1448
00:56:26,955 --> 00:56:30,425
campaigns, and I made embeddings for all of them, right?

1449
00:56:30,484 --> 00:56:32,465
And then I stuck those embeddings in a database.

1450
00:56:33,084 --> 00:56:38,430
And so now what we can do is, when the user says, "Hey, I

1451
00:56:38,440 --> 00:56:43,340
open the door to..." before we send that to OpenAI,

1452
00:56:43,880 --> 00:56:45,240
we intercept that query.

1453
00:56:45,430 --> 00:56:49,439
We take that query, create an embedding for that query, then

1454
00:56:49,449 --> 00:56:54,050
use that to query our database for information in all the

1455
00:56:54,050 --> 00:56:56,880
guides that is similar to the query the user was asking.

1456
00:56:57,119 --> 00:57:00,130
And now when we send that query on to OpenAI, we say, okay,

1457
00:57:00,615 --> 00:57:01,535
here's the original query.

1458
00:57:01,545 --> 00:57:04,985
And then underneath it, we say something like for context, and then we

1459
00:57:04,985 --> 00:57:09,164
include the information that came from the database that was tied to you.

1460
00:57:09,165 --> 00:57:11,585
Like we stored the original text along with the embeddings,

1461
00:57:11,914 --> 00:57:14,855
and we include that original text now with the user's query.

1462
00:57:14,954 --> 00:57:17,955
Now, when we send that to OpenAI, OpenAI says, Oh, right.

1463
00:57:18,155 --> 00:57:19,545
We're not talking about the whole internet.

1464
00:57:19,905 --> 00:57:22,445
We're talking about this very specific stuff.

1465
00:57:22,805 --> 00:57:26,075
And I'm going to focus my answer particular to, and I, and I

1466
00:57:26,085 --> 00:57:28,465
have more recent information, like you're talking about kobolds.

1467
00:57:28,495 --> 00:57:31,055
I maybe crawled a couple of things on kobolds, but

1468
00:57:31,055 --> 00:57:33,505
now you're giving me like paragraphs about kobolds.

1469
00:57:33,754 --> 00:57:36,915
So I know more about kobolds to give my answer back to you.
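
The retrieval step described above can be sketched roughly as follows. Everything here is a stand-in: `toy_embed` is a bag-of-words counter rather than a real embedding model, the two "manual sections" are invented placeholder text, and the list plays the role of the database. The final prompt string is what would then be sent on to the LLM; no actual API call is made.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical manual sections; the original text is stored next to its embedding.
sections = [
    "Kobolds attack in packs and rely on traps and ambushes.",
    "A long rest restores hit points after eight hours of downtime.",
]
store = [(toy_embed(text), text) for text in sections]

def build_prompt(user_query, top_k=1):
    """Embed the query, fetch the most similar stored text, append it as context."""
    query_vec = toy_embed(user_query)
    ranked = sorted(store, key=lambda row: cosine_similarity(query_vec, row[0]), reverse=True)
    context = "\n".join(text for _, text in ranked[:top_k])
    return f"{user_query}\n\nFor context:\n{context}"

prompt = build_prompt("How do kobolds attack the party?")
print(prompt)
```

In a real setup the list would be a vector-enabled table and the ranking would be a similarity query against it, but the shape of the flow is the same: intercept, embed, retrieve, append.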

1470
00:57:37,265 --> 00:57:37,915
So like,

1471
00:57:38,815 --> 00:57:41,865
embedding is like extra information kind of, is it

1472
00:57:41,865 --> 00:57:44,015
like a text file or like what, like just basically

1473
00:57:44,035 --> 00:57:48,145
extra information to help kind of give it more context.

1474
00:57:48,234 --> 00:57:52,084
The embeddings used in this case is you have this database of really

1475
00:57:52,085 --> 00:57:55,375
relevant information to the user's question that OpenAI doesn't

1476
00:57:55,395 --> 00:57:58,685
have access, either doesn't have access to or it's not really focused on.

1477
00:57:59,025 --> 00:58:01,835
So you want to find the most relevant information in your

1478
00:58:01,835 --> 00:58:05,455
database to help give more flavor to the user's query.

1479
00:58:06,140 --> 00:58:06,470
Okay.

1480
00:58:06,470 --> 00:58:10,430
So does it then just use your database or does it use your database

1481
00:58:10,450 --> 00:58:13,880
and the whole internet, but just uses your database to enrich the data.

1482
00:58:14,210 --> 00:58:16,780
And now we get to what is an LLM actually doing.

1483
00:58:17,109 --> 00:58:21,019
So this is why you need to understand what an LLM does right at its essence.

1484
00:58:21,109 --> 00:58:21,980
And if you

1485
00:58:22,360 --> 00:58:24,980
substitute time for position in a sentence,

1486
00:58:25,670 --> 00:58:28,960
an LLM is an autoregressive time series model.

1487
00:58:29,120 --> 00:58:30,320
This is the exact same thing.

1488
00:58:30,330 --> 00:58:33,610
So what happens is you send in your query.

1489
00:58:34,280 --> 00:58:35,620
So here's our query, right?

1490
00:58:35,620 --> 00:58:37,119
It's this chunk of text over here.

1491
00:58:37,330 --> 00:58:42,679
What we ask the model to do is to start predicting words.

1492
00:58:42,759 --> 00:58:44,259
So I don't know how they come up with their first word.

1493
00:58:44,259 --> 00:58:45,479
They've got some sort of magic thing where

1494
00:58:45,479 --> 00:58:46,929
they come up with their first word, right?

1495
00:58:47,610 --> 00:58:50,160
That's going to answer your question, but they come up with one word at a time.

1496
00:58:50,160 --> 00:58:51,480
They don't predict the whole sentence at once.

1497
00:58:51,480 --> 00:58:53,200
That's why you see it scroll across

1498
00:58:53,200 --> 00:58:55,390
sometimes. It takes that first word.

1499
00:58:55,875 --> 00:58:57,465
Let's just put that first word in, "the."

1500
00:58:57,645 --> 00:58:59,665
Let's just say for some reason it came up with "the," I don't know how it did it.

1501
00:59:00,585 --> 00:59:01,125
It's got "the."

1502
00:59:01,135 --> 00:59:04,085
Then it says, okay, I got it, and it's time for me to predict the next word.

1503
00:59:04,835 --> 00:59:08,265
I'm going to predict this next word based upon the, the word

1504
00:59:08,265 --> 00:59:13,354
before, and also constrained by the stuff that you passed in.

1505
00:59:13,605 --> 00:59:13,935
Right.

1506
00:59:13,985 --> 00:59:17,255
So when they trained the LLM, it learned all these relationships

1507
00:59:17,255 --> 00:59:21,025
between words and how they appear in sentences and which

1508
00:59:21,025 --> 00:59:23,454
words appear more often closer to each other and all that stuff.

1509
00:59:23,974 --> 00:59:27,085
So what it's doing is it's saying, okay, given these words that

1510
00:59:27,085 --> 00:59:29,375
you told me in your query and this word I predicted, I'm going

1511
00:59:29,375 --> 00:59:32,925
to predict this is the most, this is the likeliest next word.

1512
00:59:33,035 --> 00:59:35,415
And then the next word is given.

1513
00:59:35,465 --> 00:59:37,925
It can be given these two words or just the word before or

1514
00:59:37,925 --> 00:59:41,065
whatever, or the rest of the sentence, given what you passed

1515
00:59:41,065 --> 00:59:46,220
in and those words, What is the next most likely word?

1516
00:59:46,920 --> 00:59:49,280
And it just keeps doing that over and over again.

1517
00:59:49,280 --> 00:59:52,670
It just keeps predicting words one at a time as it goes down a chain.
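
The loop described here can be sketched in a few lines of Python. This is a toy under loud assumptions: the table `NEXT_WORD_PROBS`, its probabilities, and the function `generate` are all made up for illustration, and the toy only looks at the previous word, whereas a real LLM conditions on the whole prompt plus everything it has produced so far.

```python
# Toy next-word table: previous word -> candidate next words with
# invented probabilities. Nothing here comes from a real model.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cobalt": 0.7, "cat": 0.3},
    "a": {"cat": 1.0},
    "cobalt": {"mine": 0.8, "blue": 0.2},
    "cat": {"<end>": 1.0},
    "mine": {"<end>": 1.0},
    "blue": {"<end>": 1.0},
}


def generate(max_words=10):
    """Pick the likeliest next word, one word at a time, until <end>."""
    words = []
    previous = "<start>"
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS[previous]
        # Greedy choice: always take the highest-probability candidate.
        next_word = max(candidates, key=candidates.get)
        if next_word == "<end>":
            break
        words.append(next_word)
        previous = next_word
    return words


print(" ".join(generate()))
```

Each pass through the loop produces exactly one word, which is why the output appears to scroll across the screen.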

1518
00:59:53,070 --> 00:59:54,410
Can I ask a quick question?

1519
00:59:54,600 --> 00:59:55,070
Of course.

1520
00:59:55,339 --> 00:59:59,370
Why does it use the compute power or, you know, space to think, to

1521
00:59:59,370 --> 01:00:02,540
predict the next word, instead of just taking in all the information

1522
01:00:02,570 --> 01:00:05,790
and then looking through whatever data to do it, like, what is the

1523
01:00:05,790 --> 01:00:08,720
purpose of the prediction instead of just taking it all in at once?

1524
01:00:09,040 --> 01:00:11,460
No, it's taken all these, these words in at once.

1525
01:00:12,220 --> 01:00:16,550
The ones you passed in, and it's using those words plus the words that came

1526
01:00:16,550 --> 01:00:20,610
before to constrain the probability of whatever the next word is going to be.

1527
01:00:21,510 --> 01:00:22,910
Oh, so the next word of the output?

1528
01:00:23,210 --> 01:00:24,459
Yes, of the output.

1529
01:00:24,469 --> 01:00:25,270
Okay, the output, got it.

1530
01:00:25,480 --> 01:00:25,970
Sorry, sorry,

1531
01:00:25,970 --> 01:00:26,219
sorry.

1532
01:00:26,219 --> 01:00:29,635
So it's going to be like "the cobalt," and then it's going to put, what's

1533
01:00:29,635 --> 01:00:31,975
the next word that's going to come after "cobalt," given that this is

1534
01:00:31,985 --> 01:00:35,955
the information, the question you asked me, and all the relationships

1535
01:00:35,955 --> 01:00:39,435
I've learned in the word, and the first two words were "the cobalt,"

1536
01:00:39,575 --> 01:00:43,055
what's the next most likely word in that sentence? Does that make sense?

1537
01:00:43,445 --> 01:00:45,965
That does make sense, and it makes sense the way that

1538
01:00:46,054 --> 01:00:49,084
it just constantly thinks as it spits out more information

1539
01:00:49,084 --> 01:00:51,645
because it's predicting as it goes on, so that makes sense.

1540
01:00:52,490 --> 01:00:56,390
So, and this is, this is how the beauty of ADHD, we're going to tie this all

1541
01:00:56,390 --> 01:00:59,860
back to stuff all the way from the beginning about the safety and what we

1542
01:00:59,860 --> 01:01:05,060
should be able to do, it's the way it's implemented currently is what's causing

1543
01:01:05,060 --> 01:01:10,049
the problem, because the way that we use most of those models is we tell it.

1544
01:01:10,270 --> 01:01:12,170
It doesn't matter how much confidence you have in that

1545
01:01:12,170 --> 01:01:15,139
word, you could, you know, your probability could be 0.1

1546
01:01:15,140 --> 01:01:18,780
that that's the right next word, I still want you to put it

1547
01:01:18,780 --> 01:01:22,020
in there, you still have to make a complete sentence for me, I

1548
01:01:22,020 --> 01:01:24,639
don't care what probability the next word is, it could be 0.9,

1549
01:01:24,640 --> 01:01:27,620
that you're really confident that that's the next word, or it could be 0.1,

1550
01:01:27,620 --> 01:01:30,420
I have no idea, but this is the best I got. And so what ends

1551
01:01:30,420 --> 01:01:33,500
up happening is, that's how an error gets introduced, right?

1552
01:01:33,520 --> 01:01:35,850
Like we force it to predict a low confidence

1553
01:01:35,860 --> 01:01:38,200
word or it knew it was a low confidence word.

1554
01:01:38,490 --> 01:01:41,100
Statistically, it knew, but it still had to put something in.

1555
01:01:41,100 --> 01:01:41,960
So it puts it in.

1556
01:01:42,239 --> 01:01:44,810
Well, now we're going to predict the word after that.

1557
01:01:44,880 --> 01:01:47,370
Are there any models that don't force it to do that?

1558
01:01:47,760 --> 01:01:50,340
The programmers could when they go to output it so that

1559
01:01:50,550 --> 01:01:52,940
when that's getting done. OpenAI used to have this

1560
01:01:52,940 --> 01:01:55,550
really great example and I can't for the life of me find it.

1561
01:01:55,710 --> 01:01:56,770
And I wish everybody would just

1562
01:01:56,955 --> 01:02:02,235
do it now: they would color-code, like, highlight each word

1563
01:02:02,645 --> 01:02:05,645
given the probability that the model thought that was the right word.

1564
01:02:05,975 --> 01:02:09,785
So the sentence would look like yellows and greens and reds based

1565
01:02:09,785 --> 01:02:12,065
on the probability that the model thought that was the right word.

1566
01:02:12,065 --> 01:02:16,115
So if we're looking, it's not exactly perfect for us, but if you see like a

1567
01:02:16,115 --> 01:02:19,685
red in a sentence that seems kind of fishy to you, you could be like, oh, well,

1568
01:02:19,745 --> 01:02:21,945
that's probably a low-probability word.

1569
01:02:22,055 --> 01:02:23,245
This sentence is probably off.
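
The color-coding idea can be sketched like this. It is a rough illustration only: the function names `color_for` and `annotate` and the 0.7 and 0.3 cutoffs are assumptions picked for the example, not anything a vendor ships, and the per-word probabilities are invented.

```python
def color_for(probability):
    """Map a word's probability to a traffic-light color (assumed cutoffs)."""
    if probability >= 0.7:
        return "green"
    if probability >= 0.3:
        return "yellow"
    return "red"


def annotate(tokens):
    """Pair each (word, probability) with the color a UI could highlight it in."""
    return [(word, color_for(probability)) for word, probability in tokens]


# Invented probabilities for a three-word output.
sentence = [("the", 0.9), ("cobalt", 0.5), ("mine", 0.1)]
for word, color in annotate(sentence):
    print(f"{word}: {color}")
```

A red word does not prove the sentence is wrong, but it tells the reader where to look first.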

1570
01:02:23,275 --> 01:02:25,055
This totally goes back to what we were saying.

1571
01:02:25,055 --> 01:02:27,265
They want it to be the magic box so bad,

1572
01:02:27,275 --> 01:02:29,865
but that would give more trust and more.

1573
01:02:30,225 --> 01:02:35,044
And like, I would rather you, I would pick and probably pay for a model

1574
01:02:35,355 --> 01:02:39,454
that would tell me when it's unsure, tell me where it got it from and

1575
01:02:39,484 --> 01:02:44,304
what are the other possibilities, because then you have context, I don't

1576
01:02:44,304 --> 01:02:46,434
think you're ever going to get it to tell you where it got it from.

1577
01:02:46,840 --> 01:02:50,040
And I'll, I can explain that one in a second, but you can get this uncertainty.

1578
01:02:50,280 --> 01:02:52,100
You can also, they have other, like, if you

1579
01:02:52,100 --> 01:02:53,580
play with some of the examples on OpenAI.

1580
01:02:53,690 --> 01:02:53,820
Maybe

1581
01:02:53,820 --> 01:02:55,480
not tell you where it got it from, but

1582
01:02:55,480 --> 01:02:57,770
tell you if it's a reputable source, right?

1583
01:02:57,770 --> 01:02:59,680
Like a reputable source into context.

1584
01:02:59,850 --> 01:03:03,479
It still can't get that for you, because the only way you could

1585
01:03:03,479 --> 01:03:06,614
get that is you could constrain the data that it was trained on

1586
01:03:06,735 --> 01:03:09,195
to only be what you consider reputable sources.

1587
01:03:09,275 --> 01:03:12,015
I mean, I would still take that too, you know, because like, look

1588
01:03:12,015 --> 01:03:15,245
at, like, I usually really agree with Mark Cuban on a lot of things.

1589
01:03:15,245 --> 01:03:19,165
I think he's great, but he basically told all of Bluesky a week

1590
01:03:19,165 --> 01:03:22,894
or two ago that you don't need to go to college or learn things.

1591
01:03:22,894 --> 01:03:25,194
You can just kind of

1592
01:03:25,690 --> 01:03:30,340
use AI, because AI knows so much, then you can do all these highly skilled jobs.

1593
01:03:30,340 --> 01:03:30,900
And it's like,

1594
01:03:30,930 --> 01:03:31,760
you can do importantly,

1595
01:03:32,160 --> 01:03:32,850
exactly.

1596
01:03:33,000 --> 01:03:35,110
And I think AI can be a learning tool.

1597
01:03:35,130 --> 01:03:38,340
Cause sometimes when I'm tired of just reading and doing stuff, I use it to

1598
01:03:38,340 --> 01:03:42,600
kind of almost play games with, to like ask questions and help me get started.

1599
01:03:42,600 --> 01:03:43,809
Like, how would you phrase this?

1600
01:03:43,809 --> 01:03:44,510
Or like how to make

1601
01:03:44,730 --> 01:03:49,140
my ADHD brain sound a little bit more coherent, but I still have to go back,

1602
01:03:49,520 --> 01:03:53,050
read it, make sure it's saying what I need it to say and do all these things.

1603
01:03:53,050 --> 01:03:57,189
And like, it's almost a disservice to the fact that this could help kids.

1604
01:03:57,190 --> 01:03:58,760
It could help even kids with learning disabilities.

1605
01:03:58,760 --> 01:04:01,310
It could help people like get more into things.

1606
01:04:01,320 --> 01:04:03,659
It really could be an educational tool, but we're

1607
01:04:03,769 --> 01:04:07,340
selling it as a Bible and not an education tool.

1608
01:04:07,360 --> 01:04:08,070
You know what I mean?

1609
01:04:08,080 --> 01:04:09,230
And like selling it as

1610
01:04:09,525 --> 01:04:11,585
"you don't have to be educated" is a lie.

1611
01:04:11,635 --> 01:04:16,555
Selling it as this can be your new tool to get educated would be helpful.

1612
01:04:16,695 --> 01:04:16,985
Like,

1613
01:04:17,005 --> 01:04:19,434
so I wrote a blog, I wrote two blog posts, why I don't

1614
01:04:19,435 --> 01:04:22,225
like the current LLMs, and then I wrote one on why

1615
01:04:22,225 --> 01:04:24,404
I do like them, like what I'm excited about for them.

1616
01:04:24,844 --> 01:04:26,885
And one of the things I'm excited about, and this is

1617
01:04:26,885 --> 01:04:29,485
probably back again, related to the ADHD stuff and all

1618
01:04:29,485 --> 01:04:31,705
the neural stuff, but writing has always been hard for me.

1619
01:04:31,705 --> 01:04:33,265
I think I probably have a writing disability.

1620
01:04:34,495 --> 01:04:37,205
To me, these LLMs are a writing calculator.

1621
01:04:37,680 --> 01:04:41,390
Yes, that's exactly what I use it for, because I have all these thoughts,

1622
01:04:41,390 --> 01:04:43,590
so I put all the thoughts out there, and then I'm like, how would you

1623
01:04:43,590 --> 01:04:47,440
say this coherently, and like in full, and then I go back, read it, add

1624
01:04:47,450 --> 01:04:49,760
what I want, because it always takes out stuff that's important, and it

1625
01:04:49,789 --> 01:04:54,059
doesn't have context, but it's like wild, because we could give the kids,

1626
01:04:54,060 --> 01:05:00,410
or us, or ADHD folks context to use these tools to help us to do things.

1627
01:05:00,460 --> 01:05:01,060
Just like a math

1628
01:05:01,060 --> 01:05:01,720
calculator, right?

1629
01:05:01,720 --> 01:05:03,590
You have to, you have to learn math.

1630
01:05:03,905 --> 01:05:06,325
You have to learn the basic operations, you can't get out of it,

1631
01:05:06,625 --> 01:05:09,115
but when you go to do advanced calculations,

1632
01:05:09,115 --> 01:05:11,455
we're not going to penalize you from being able to do that advanced

1633
01:05:11,455 --> 01:05:14,625
thing just because you can't do this early like this, you have

1634
01:05:14,695 --> 01:05:17,614
trouble actually physically in your brain doing these early things.

1635
01:05:17,915 --> 01:05:19,174
It's the same thing like looking at a blank

1636
01:05:19,215 --> 01:05:21,944
page for me and saying, write a full essay.

1637
01:05:22,885 --> 01:05:23,275
Sure.

1638
01:05:23,275 --> 01:05:25,105
Guarantee of like terribleness,

1639
01:05:25,325 --> 01:05:25,595
dude.

1640
01:05:25,595 --> 01:05:30,425
I've been in like this email, like talk,

1641
01:05:30,515 --> 01:05:33,514
total just paralysis for like three days.

1642
01:05:33,514 --> 01:05:35,675
And I'm just like, I literally was like,

1643
01:05:35,764 --> 01:05:37,635
tell me where you would start with this.

1644
01:05:37,665 --> 01:05:38,505
And then wrote like

1645
01:05:38,970 --> 01:05:41,720
a ton of, like, I got so much stuff done because I just need

1646
01:05:41,720 --> 01:05:46,089
it to get the like, blank page paralysis out of the way

1647
01:05:46,090 --> 01:05:48,100
and even if it's completely wrong, you can at least

1648
01:05:48,140 --> 01:05:50,660
look at it and be like, Oh, they got this part wrong.

1649
01:05:50,680 --> 01:05:51,339
And then this thing is

1650
01:05:51,340 --> 01:05:52,079
so much stuff.

1651
01:05:52,080 --> 01:05:53,299
I had to give it more context.

1652
01:05:53,340 --> 01:05:57,310
I had exactly like, but it was the point of, I wasn't stuck for hours

1653
01:05:57,529 --> 01:06:00,970
staring at a blank page and with like, I can edit great.

1654
01:06:01,380 --> 01:06:01,510
Yeah.

1655
01:06:01,510 --> 01:06:03,010
A lot of people are better editors than

1656
01:06:03,010 --> 01:06:05,110
they are writers, and you can't express

1657
01:06:05,395 --> 01:06:07,385
what it is that you want, but you know when it's not

1658
01:06:07,385 --> 01:06:09,515
the right thing and you can, you can nudge it back.

1659
01:06:09,515 --> 01:06:10,155
But it's also

1660
01:06:10,185 --> 01:06:12,255
executive functioning of just getting started.

1661
01:06:12,255 --> 01:06:14,275
And it seems like such a daunting task.

1662
01:06:14,605 --> 01:06:17,875
And when you feel like, okay, I don't have to do too much more.

1663
01:06:17,875 --> 01:06:20,394
I just need to add this in and make sure it didn't forget things.

1664
01:06:20,395 --> 01:06:23,414
And then exactly.

1665
01:06:23,415 --> 01:06:28,075
I've never let it just write anything for me, but it's a great start.

1666
01:06:28,075 --> 01:06:29,305
So I feel unstuck.

1667
01:06:29,425 --> 01:06:32,315
That's basically what an LLM is doing under the hood.

1668
01:06:32,565 --> 01:06:34,305
And that's why we could make them better.

1669
01:06:34,320 --> 01:06:37,250
But we don't, I, it could start to do, we could start to do

1670
01:06:37,270 --> 01:06:39,710
things like predict the whole sentence before you send it to me.

1671
01:06:39,710 --> 01:06:42,000
And if there's more than like three words in there that are below 0.1,

1672
01:06:42,000 --> 01:06:43,930
just tell me you don't know.
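
That rule (hold the whole sentence back, count the shaky words, and refuse if there are too many) is simple to sketch. The function name `answer_or_abstain` and the sample probabilities are hypothetical; the 0.1 cutoff and the "more than three words" limit are the numbers from the conversation, not tuned values.

```python
def answer_or_abstain(tokens, threshold=0.1, max_low=3):
    """Return the sentence, or "I don't know" if too many words are low confidence.

    tokens is a list of (word, probability) pairs for a fully predicted sentence.
    """
    low_confidence = sum(1 for _, probability in tokens if probability < threshold)
    if low_confidence > max_low:
        return "I don't know"
    return " ".join(word for word, _ in tokens)


# Invented example: one sentence the model is sure of, one it is guessing at.
confident = [("cobalt", 0.9), ("is", 0.8), ("a", 0.9), ("metal", 0.7)]
shaky = [("cobalt", 0.9), ("was", 0.05), ("found", 0.04), ("in", 0.03), ("1492", 0.01)]

print(answer_or_abstain(confident))
print(answer_or_abstain(shaky))
```

The catch is that the model has to finish predicting the sentence before anything is shown, so this trades away the word-by-word streaming.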

1673
01:06:44,119 --> 01:06:45,829
dude, I would pay so much for that.

1674
01:06:45,829 --> 01:06:49,079
Like, you know what I just, it's wild that it could, like, you're telling me

1675
01:06:49,079 --> 01:06:51,830
that we got these scores and everything, and they're just sleeping on it and

1676
01:06:51,839 --> 01:06:55,269
like, and they're all doing the same stuff and not differentiating themselves.

1677
01:06:55,269 --> 01:06:56,179
And these are wild,

1678
01:06:56,420 --> 01:06:59,650
clear ways they could differentiate themselves and take the

1679
01:06:59,650 --> 01:07:02,340
market with hundreds of millions of dollars being dumped in.

1680
01:07:02,340 --> 01:07:03,470
And they're like, no, it's good.

1681
01:07:03,470 --> 01:07:04,470
We want to be the

1682
01:07:04,470 --> 01:07:05,029
best.

1683
01:07:05,030 --> 01:07:07,609
You see them fighting on those leaderboards over a 0.1

1684
01:07:07,609 --> 01:07:09,580
or like a 1 percent improvement or a 0.1

1685
01:07:09,580 --> 01:07:11,419
percent improvement or whatever.

1686
01:07:11,759 --> 01:07:12,150
And it's like.

1687
01:07:12,535 --> 01:07:15,275
Uh, that's like, you know, little bits on the end of the pencil.

1688
01:07:15,285 --> 01:07:16,875
The thing that I need is better features

1689
01:07:16,875 --> 01:07:21,855
and like, it's a whole, it's very tech bro

1690
01:07:21,944 --> 01:07:23,674
typical, is what I would say.

1691
01:07:24,035 --> 01:07:26,635
Before you drop Steve, we have to hear your parental, uh,

1692
01:07:26,665 --> 01:07:28,325
advice and the other things that we were talking about at

1693
01:07:28,325 --> 01:07:29,995
the beginning of the show that you were going to give us.

1694
01:07:30,185 --> 01:07:30,374
Right.

1695
01:07:30,374 --> 01:07:30,955
But there's one other

1696
01:07:30,955 --> 01:07:33,205
thing I still want to say about the model, but I want to explain to you

1697
01:07:33,205 --> 01:07:36,265
why you'll never get, cause I know, I understand why you want sources.

1698
01:07:36,555 --> 01:07:37,635
And I said that in my blog post.

1699
01:07:38,435 --> 01:07:39,755
You can't get sources out of it.

1700
01:07:39,915 --> 01:07:41,195
You took regression though, right?

1701
01:07:41,235 --> 01:07:43,025
Like you remember, do you remember regression, right?

1702
01:07:43,415 --> 01:07:47,694
So what comes out of the regression is an equation, y equals mx plus b, right?

1703
01:07:47,745 --> 01:07:49,935
Like if I increase temperature by this much

1704
01:07:49,965 --> 01:07:51,675
plus some error, I'm going to get this.

1705
01:07:51,995 --> 01:07:55,635
I can't, what it has is like one of those equations under the hood.

1706
01:07:55,635 --> 01:07:57,834
There's a big, huge equation under its hood.

1707
01:07:58,584 --> 01:08:01,850
And I can't go back and say, hey, if we're doing height versus

1708
01:08:01,850 --> 01:08:05,240
weight, tell me exactly which height made you predict this weight.

1709
01:08:05,310 --> 01:08:06,850
You can't do that with a regression model and

1710
01:08:06,890 --> 01:08:09,290
you can't do that with these models either.
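
The regression analogy can be made concrete with a small least-squares fit. This is a sketch with made-up numbers, and `fit_line` is a hypothetical helper written for the example. The point is that training keeps only the slope m and the intercept b, so two different datasets can end up as the same equation, and nothing in that equation says which data points produced it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Two different (invented) training sets...
model_a = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
model_b = fit_line([0, 10], [1, 21])

# ...collapse into the same two numbers. The data points are gone.
print(model_a)
print(model_b)
```

An LLM has vastly more than two parameters, but the same thing holds: the training text is blended into the numbers, not stored alongside them.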

1711
01:08:09,519 --> 01:08:11,860
I can't go back and say, because all the words and

1712
01:08:11,860 --> 01:08:14,900
relationships get mixed up into, like, a large equation.

1713
01:08:15,350 --> 01:08:19,795
And so you can't go back and say, which article specifically taught

1714
01:08:19,795 --> 01:08:23,215
you this relationship to this word or whatever it is, you can't.

1715
01:08:23,625 --> 01:08:24,835
It's just the nature of the beast.

1716
01:08:25,145 --> 01:08:28,265
You can't. RAG can help some with that, though, and things like

1717
01:08:28,265 --> 01:08:31,494
that, because you can give it a source and say, please include these

1718
01:08:31,495 --> 01:08:35,604
sources in the result like that, like related to your stuff, because

1719
01:08:35,834 --> 01:08:37,905
that's what that's exactly what like the Bing search does, right?

1720
01:08:37,905 --> 01:08:39,075
Like Bing search is doing that.

1721
01:08:39,115 --> 01:08:40,755
It's like, it's going to fetch a bunch of searches, it does a

1722
01:08:40,755 --> 01:08:43,485
web search and then generates an answer based on that web search.
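
The "give it the sources and ask it to cite them" step might look roughly like the sketch below. Everything here is an assumption for illustration: `build_prompt` is a hypothetical helper, the documents are invented stand-ins for whatever a search step returns, and the prompt wording is just one plausible phrasing. No real search engine or model API is called.

```python
def build_prompt(question, documents):
    """Assemble a prompt that carries numbered sources along with the question."""
    lines = [
        "Answer the question using only the sources below.",
        "Cite the sources you used by their bracketed number.",
        "",
    ]
    for number, doc in enumerate(documents, start=1):
        lines.append(f"[{number}] {doc['title']}: {doc['text']}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Invented search results standing in for a real retrieval step.
results = [
    {"title": "Cobalt overview", "text": "Cobalt is a hard, silvery metal."},
    {"title": "Battery materials", "text": "Cobalt is used in many battery cathodes."},
]
print(build_prompt("What is cobalt used for?", results))
```

The sources in the answer come from the retrieved documents in the prompt, not from the model's training data, which is why this works where asking the model for its training sources does not.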

1723
01:08:43,495 --> 01:08:44,335
And it's like, Hey, yeah.

1724
01:08:44,445 --> 01:08:47,475
These are all the sources that I use to generate this other answer,

1725
01:08:47,495 --> 01:08:50,075
but then everyone has to use Bing and that's just not as terrible.

1726
01:08:50,215 --> 01:08:51,505
Bing is such a bad search engine.

1727
01:08:51,855 --> 01:08:54,335
So, I mean, I want to keep wanting to, what's your

1728
01:08:54,335 --> 01:08:55,245
parenting advice?

1729
01:08:55,505 --> 01:08:55,755
Okay.

1730
01:08:55,755 --> 01:08:57,285
So the dating advice, that was the one that you

1731
01:08:57,285 --> 01:08:59,525
were first, you were like, why can't guys just be.

1732
01:08:59,545 --> 01:09:02,295
Okay, so here's my advice.

1733
01:09:02,395 --> 01:09:05,635
This is what I gave to my friend who, she had some

1734
01:09:05,635 --> 01:09:07,655
anxiety issues and stuff and she kept dating guys.

1735
01:09:08,395 --> 01:09:08,575
Yeah.

1736
01:09:08,595 --> 01:09:10,845
So they were all, most of the guys she was dating.

1737
01:09:10,845 --> 01:09:12,904
And so what I said to her, like they didn't understand

1738
01:09:12,905 --> 01:09:14,824
it and they were like, why are you getting so upset?

1739
01:09:14,824 --> 01:09:15,145
And blah, blah, blah.

1740
01:09:15,145 --> 01:09:15,484
All this stuff.

1741
01:09:15,485 --> 01:09:18,044
I said, listen, I'll call her Mary.

1742
01:09:18,134 --> 01:09:18,774
Even though that's not her name.

1743
01:09:19,334 --> 01:09:19,774
Mary, listen.

1744
01:09:20,155 --> 01:09:26,155
If you can find a guy who's divorced with kids and you see his relationship

1745
01:09:26,155 --> 01:09:30,275
is good with his kids, that's the kind of guy you want to be with.

1746
01:09:30,395 --> 01:09:31,944
That's literally what I look for.

1747
01:09:32,005 --> 01:09:32,875
Do they exist?

1748
01:09:32,875 --> 01:09:36,285
And then like, how do I like, just where do they live?

1749
01:09:36,295 --> 01:09:37,824
Do, do they hide under rocks?

1750
01:09:37,824 --> 01:09:38,565
Like I just.

1751
01:09:38,835 --> 01:09:40,055
Do you have to look for nerdy ones?

1752
01:09:40,785 --> 01:09:41,635
No, those are mean

1753
01:09:41,635 --> 01:09:42,045
to me.

1754
01:09:42,215 --> 01:09:43,225
Those are mean to me.

1755
01:09:43,245 --> 01:09:43,595
Do they have

1756
01:09:43,625 --> 01:09:43,915
kids

1757
01:09:43,935 --> 01:09:44,285
though?

1758
01:09:44,305 --> 01:09:44,585
Yes.

1759
01:09:44,845 --> 01:09:45,625
I refuse.

1760
01:09:45,625 --> 01:09:46,845
They don't even like their kids.

1761
01:09:46,855 --> 01:09:47,485
I broke.

1762
01:09:47,485 --> 01:09:47,695
Okay.

1763
01:09:47,695 --> 01:09:47,805
So

1764
01:09:47,805 --> 01:09:48,455
that was the point.

1765
01:09:48,495 --> 01:09:49,165
What was the thing?

1766
01:09:49,165 --> 01:09:49,695
I said,

1767
01:09:49,735 --> 01:09:52,065
okay, but they don't, the two don't go together.

1768
01:09:52,405 --> 01:09:52,955
Yes, they do.

1769
01:09:53,555 --> 01:09:54,225
Oh,

1770
01:09:54,345 --> 01:09:57,335
you know what, Steve, you send me a list of your friends.

1771
01:09:57,375 --> 01:09:57,665
Okay.

1772
01:09:57,665 --> 01:09:58,409
It's

1773
01:09:58,409 --> 01:09:59,154
hard.

1774
01:09:59,154 --> 01:09:59,464
It's hard.

1775
01:09:59,464 --> 01:10:00,474
It's hard to find.

1776
01:10:00,984 --> 01:10:02,725
I'm just saying they don't

1777
01:10:02,725 --> 01:10:03,305
exist.

1778
01:10:03,305 --> 01:10:04,355
I have searched.

1779
01:10:04,775 --> 01:10:05,985
I'm about to create a bot.

1780
01:10:06,005 --> 01:10:07,595
That's AI that does my dating.

1781
01:10:07,655 --> 01:10:07,865
Okay.

1782
01:10:07,865 --> 01:10:10,235
I really do have to run to go get the kids, but Steve,

1783
01:10:10,465 --> 01:10:11,195
I will come back.

1784
01:10:11,415 --> 01:10:14,325
Okay, when you come back, I will, but just say, look, when you

1785
01:10:14,335 --> 01:10:18,065
look at their dating profile and they say they have kids, look if

1786
01:10:18,065 --> 01:10:21,335
they have pictures of themselves with their kids in their profile.

1787
01:10:21,655 --> 01:10:23,395
It's like the using a picture of a puppy.

1788
01:10:23,485 --> 01:10:25,204
I know, but it's better than just dating some guy.

1789
01:10:25,215 --> 01:10:27,995
I mean, you could get lucky and find some guy who's never had kids.

1790
01:10:28,185 --> 01:10:29,534
We have to, we have to.

1791
01:10:29,595 --> 01:10:29,975
Okay.

1792
01:10:29,995 --> 01:10:30,355
Okay.

1793
01:10:30,545 --> 01:10:30,895
Okay.

1794
01:10:30,895 --> 01:10:33,225
I have, I look, we had, we need to just

1795
01:10:33,255 --> 01:10:35,245
look, Steve, I will be back in 10 minutes.

1796
01:10:35,875 --> 01:10:37,465
Thank you, Steve, so much for coming on.

1797
01:10:37,800 --> 01:10:40,900
Thank you everyone for listening and Steve, where can people find you online?

1798
01:10:41,430 --> 01:10:43,470
I'm the Steve zero on Bluesky.

1799
01:10:43,570 --> 01:10:46,330
You can find me as the Steve zero on GitHub.

1800
01:10:47,050 --> 01:10:49,429
I am in multiple different Slack communities.

1801
01:10:49,470 --> 01:10:52,200
I've worked in Kubernetes before, so I'm in the Kubernetes Slack.

1802
01:10:52,580 --> 01:10:53,859
I'm in the geospatial Slack.

1803
01:10:53,880 --> 01:10:55,789
I'm in the Voxel 51 Discord.

1804
01:10:56,230 --> 01:10:58,584
Where else can they find me? On Voxel 51.

1805
01:10:58,945 --> 01:11:02,825
We did put you in the Bluesky starter pack for Fork Around and Find Out guests.

1806
01:11:02,835 --> 01:11:06,335
So if people are looking for you, they can check it out there.

1807
01:11:06,365 --> 01:11:06,625
So

1808
01:11:06,705 --> 01:11:08,465
I was also going to say, you can find me on YouTube

1809
01:11:08,484 --> 01:11:10,945
also, but I don't know if I'm the Steve zero on YouTube.

1810
01:11:11,635 --> 01:11:13,665
Um, I, and some of these talks of the stuff we talked

1811
01:11:13,665 --> 01:11:17,895
about today are actually, I have YouTube talks, you know.

1812
01:11:18,340 --> 01:11:19,740
I linked to the recorded talk.

1813
01:11:20,820 --> 01:11:22,320
I will put that in the show notes too.

1814
01:11:22,350 --> 01:11:22,560
Once we

1815
01:11:22,740 --> 01:11:24,480
get, I'll send it to you so you can have it in the show notes.

1816
01:11:24,690 --> 01:11:24,940
Okay.

1817
01:11:25,410 --> 01:11:27,809
Thank you everyone for listening and we will talk to you again soon.

1818
01:11:28,050 --> 01:11:28,300
Okay.

1819
01:11:28,510 --> 01:11:28,740
Bye.

1820
01:11:28,750 --> 01:11:29,180
Thanks.

1821
01:11:44,240 --> 01:11:47,239
Thank you for listening to this episode of Fork Around and Find Out.

1822
01:11:47,559 --> 01:11:49,709
If you like this show, please consider sharing it with

1823
01:11:49,709 --> 01:11:52,889
a friend, a coworker, a family member, or even an enemy.

1824
01:11:52,999 --> 01:11:55,100
However we get the word out about this show

1825
01:11:55,310 --> 01:11:57,510
helps it to become sustainable for the long term.

1826
01:11:57,805 --> 01:12:01,485
If you want to sponsor this show, please go to fafo.fm

1827
01:12:01,525 --> 01:12:05,055
slash sponsor and reach out to us there about what

1828
01:12:05,055 --> 01:12:07,255
you're interested in sponsoring and how we can help.

1829
01:12:08,534 --> 01:12:11,715
We hope your systems stay available and your pagers stay quiet.

1830
01:12:12,235 --> 01:12:13,405
We'll see you again next time.