1
00:00:00,060 --> 00:00:00,780
Nikolay: Hello, hello.

2
00:00:00,780 --> 00:00:01,960
This is Postgres.FM.

3
00:00:01,960 --> 00:00:06,360
As usual, my name is Nik, Postgres.AI, and my co-host is Michael,

4
00:00:06,400 --> 00:00:07,240
pgMustard.

5
00:00:07,240 --> 00:00:08,040
Hi, Michael.

6
00:00:08,440 --> 00:00:09,180
Michael: Hi, Nik.

7
00:00:09,320 --> 00:00:15,640
Nikolay: And we have an unexpected guest today, Simon Eskildsen, CEO and

8
00:00:15,640 --> 00:00:17,020
co-founder of turbopuffer.

9
00:00:17,240 --> 00:00:18,020
Hi, Simon.

10
00:00:18,340 --> 00:00:19,640
Simon: Thank you for having me.

11
00:00:19,660 --> 00:00:19,900
Nikolay: Yeah.

12
00:00:19,900 --> 00:00:25,320
Thank you for coming and it was very unexpected because we mentioned

13
00:00:25,320 --> 00:00:29,340
turbopuffer last time and you messaged me on Twitter and I think

14
00:00:29,340 --> 00:00:33,260
it's a great idea sometimes to look outside of traditional Postgres

15
00:00:33,260 --> 00:00:33,760
ecosystem.

16
00:00:34,280 --> 00:00:37,180
I think it's beneficial for everyone, should be.

17
00:00:37,500 --> 00:00:39,780
So yeah, thank you for coming for sure.

18
00:00:39,860 --> 00:00:40,680
It's a great idea.

19
00:00:40,680 --> 00:00:41,340
I think.

20
00:00:41,420 --> 00:00:43,680
Simon: Yeah, the origin story is kind of funny and it was only

21
00:00:43,680 --> 00:00:44,800
a couple of days ago.

22
00:00:45,040 --> 00:00:48,500
There's a script where, if turbopuffer is mentioned

23
00:00:48,520 --> 00:00:51,140
anywhere, then I'll get an email or a summary.

24
00:00:52,100 --> 00:00:55,120
And you guys were discussing last time, different ANN, like both

25
00:00:55,120 --> 00:00:57,660
in Postgres and outside and turbopuffer was mentioned.

26
00:00:57,940 --> 00:01:00,840
And so I just DM'd you and asked, hey, I'd come on, chat

27
00:01:00,840 --> 00:01:04,120
about Postgres, chat about MySQL, chat about like databases and

28
00:01:04,120 --> 00:01:07,280
when you choose one over the other and when Postgres breaks for

29
00:01:07,280 --> 00:01:09,700
some of these workloads that we've seen and when it's great.

30
00:01:10,440 --> 00:01:11,320
And now it's what?

31
00:01:11,320 --> 00:01:13,640
Yeah, 2 or 3 days later and we're on.

32
00:01:13,940 --> 00:01:15,360
Nikolay: Including the weekend actually.

33
00:01:15,360 --> 00:01:18,480
So the podcast was out on Friday and we recorded this on Monday.

34
00:01:18,480 --> 00:01:21,340
That's, that's the turnaround that everyone should try

35
00:01:21,340 --> 00:01:22,040
to achieve.

36
00:01:22,660 --> 00:01:23,300
I like it.

37
00:01:23,300 --> 00:01:23,480
Simon: Yeah.

38
00:01:23,480 --> 00:01:26,100
And you've had chickens hatch in the meantime.

39
00:01:26,200 --> 00:01:26,880
I got,

40
00:01:27,100 --> 00:01:31,060
Nikolay: that's why my camera is overexposed, because the whole

41
00:01:31,060 --> 00:01:33,760
night I used this camera to broadcast.

42
00:01:33,820 --> 00:01:35,640
I didn't have time to tune it properly.

43
00:01:35,740 --> 00:01:37,420
But again, like, thank you for coming.

44
00:01:37,420 --> 00:01:38,540
This is about databases.

45
00:01:38,680 --> 00:01:42,560
This time probably not so much about Postgres, but definitely

46
00:01:42,560 --> 00:01:46,520
we should talk about vectors, right, and ANN.

47
00:01:47,180 --> 00:01:52,380
And maybe we should start from a distance and talk about your background.

48
00:01:52,920 --> 00:01:55,120
And I've heard some MySQL is involved, right?

49
00:01:55,120 --> 00:01:57,420
Can you discuss it a little bit?

50
00:01:57,720 --> 00:01:58,400
Simon: For sure.

51
00:01:58,740 --> 00:02:04,080
Yeah, my background is I spent
almost a decade at Shopify scaling

52
00:02:04,120 --> 00:02:07,120
mainly the database layer there,
but pretty much anything that

53
00:02:07,120 --> 00:02:10,380
would break as the scale increased
and increased through the

54
00:02:10,380 --> 00:02:10,880
2010s.

55
00:02:11,520 --> 00:02:14,820
Shopify, like most companies started
in the early 2000s, was on

56
00:02:14,820 --> 00:02:15,320
MySQL.

57
00:02:15,800 --> 00:02:19,400
So a lot of the work that I did
was, was with MySQL, but also

58
00:02:19,400 --> 00:02:23,300
every other database that Shopify
employed, like Redis, Memcached,

59
00:02:23,600 --> 00:02:27,600
Elasticsearch, and a ton of others,
Kafka, so on.

60
00:02:27,600 --> 00:02:28,780
So it's been a long time there.

61
00:02:28,780 --> 00:02:30,720
When I joined, it was a couple
of hundred requests per second.

62
00:02:30,720 --> 00:02:33,960
And when I left, it was in the
millions and very very intimate

63
00:02:33,960 --> 00:02:35,580
with the data layer there.

64
00:02:35,660 --> 00:02:40,080
I was on the last resort pager
for many many years and that has

65
00:02:40,080 --> 00:02:44,060
informed a lot about how I write
software today and a couple

66
00:02:44,060 --> 00:02:47,200
years ago I started a company called
turbopuffer because I thought...

67
00:02:47,200 --> 00:02:49,340
Nikolay: Before that, sorry for
interrupting.

68
00:02:49,340 --> 00:02:53,000
I remember Shopify, actually, we
always recommend the

69
00:02:53,000 --> 00:02:53,500
article.

70
00:02:53,860 --> 00:02:58,520
Shopify had a couple of blog posts
about UUID and how UUID version

71
00:02:58,520 --> 00:03:01,080
4 is not good and version 7 is
much better.

72
00:03:01,500 --> 00:03:05,460
And it doesn't matter, MySQL or
Postgres, the mechanics behind

73
00:03:05,460 --> 00:03:06,680
the scenes are the same.

74
00:03:06,680 --> 00:03:08,100
It's how B-tree behaves.

75
00:03:08,480 --> 00:03:12,920
So I remember, and I think, Michael,
we mentioned that article

76
00:03:12,920 --> 00:03:14,940
on podcast a few times as well.

77
00:03:15,040 --> 00:03:16,720
Michael: We have mentioned it, Nik,
for sure.

78
00:03:16,720 --> 00:03:18,760
And I think Shopify has come up
before.

79
00:03:18,760 --> 00:03:20,740
I thought, is it a Vitess shop?

80
00:03:20,740 --> 00:03:23,740
I thought it might have been 1
of the early Vitess adopters.

81
00:03:24,140 --> 00:03:26,320
Simon: Shopify is using a little
bit of Vitess, but not very

82
00:03:26,320 --> 00:03:26,820
much.

83
00:03:26,820 --> 00:03:29,480
Vitess was not around when we
did all the sharding back in the

84
00:03:29,480 --> 00:03:30,900
early 2010s.

85
00:03:31,120 --> 00:03:34,460
And so we did it all at the, at
the, at the application layer.

86
00:03:34,460 --> 00:03:37,120
I wasn't part of the actual sharding
decision, but I was part

87
00:03:37,120 --> 00:03:39,260
of a lot of the sharding work over
the years.

88
00:03:39,900 --> 00:03:42,400
And it's funny because at the time
I know that they looked at

89
00:03:42,400 --> 00:03:45,200
a bunch of proxies and all of the
businesses they later looked

90
00:03:45,200 --> 00:03:46,500
at had gone out of business.

91
00:03:46,720 --> 00:03:48,380
It's not a great business, unfortunately.

92
00:03:48,800 --> 00:03:51,720
But everything was done in Ruby
land through a, just a module

93
00:03:51,720 --> 00:03:54,320
called sharding, and it did a lot
of things and a lot of monkey

94
00:03:54,320 --> 00:03:55,420
patches into Rails.

95
00:03:55,760 --> 00:03:59,060
But let's talk about this, like
UUID v4 thing, because I think

96
00:03:59,060 --> 00:04:03,420
if we wanted to do a pros and cons,
MySQL versus Postgres, I

97
00:04:03,420 --> 00:04:08,560
spent quite a bit of time with
both and this 1 actually, to my

98
00:04:08,560 --> 00:04:10,960
knowledge, only really matters
for MySQL.

99
00:04:10,960 --> 00:04:13,440
Well, it actually matters for Postgres
as well, but in a different

100
00:04:13,440 --> 00:04:13,940
way.

101
00:04:14,020 --> 00:04:17,600
So on MySQL, the primary key dictates
how the B-tree is laid

102
00:04:17,600 --> 00:04:19,080
out for the primary key, right?

103
00:04:19,080 --> 00:04:20,400
So for the entire row.

104
00:04:20,740 --> 00:04:24,720
So if you have UUID v4, it's completely
randomly scattered along

105
00:04:24,720 --> 00:04:25,340
the B-tree.

106
00:04:25,460 --> 00:04:27,340
So whenever you're doing an insert,
right?

107
00:04:27,340 --> 00:04:29,760
It will just kind of fall somewhere
random in the B-tree, which

108
00:04:29,760 --> 00:04:33,160
makes the updates very expensive
because if you're adding

109
00:04:33,160 --> 00:04:35,860
10 rows at a time, you're not
updating, you're doing

110
00:04:35,860 --> 00:04:39,860
10 insertions that land
in 10 different leaves, well,

111
00:04:39,860 --> 00:04:42,740
you're just doing a lot more disk
I/O and your write amplification

112
00:04:42,740 --> 00:04:43,460
is high.

113
00:04:44,640 --> 00:04:47,640
In Postgres, of course, you're
just appending to the heap with

114
00:04:47,640 --> 00:04:49,340
all of its drawbacks and benefits.

115
00:04:49,340 --> 00:04:52,480
And so it doesn't matter as much
other than on the indexes, right?

116
00:04:52,480 --> 00:04:55,800
That's to my knowledge, but on the indexes,
it will matter a lot because

117
00:04:55,800 --> 00:04:58,080
on the indexes, of course, it is
sorted by that.

118
00:04:58,080 --> 00:05:01,120
And if you have some temporal locality
then it's not gonna matter

119
00:05:01,120 --> 00:05:01,780
as much.

120
00:05:01,880 --> 00:05:03,120
So that's my understanding.

121
00:05:03,120 --> 00:05:04,900
So this matters a lot in MySQL.

122
00:05:05,240 --> 00:05:09,680
Now, that article I think was after
I left and Shopify doesn't

123
00:05:09,680 --> 00:05:12,100
use UUIDs as primary keys for anything.

124
00:05:12,100 --> 00:05:13,900
So I don't really know where this
mattered.

125
00:05:13,900 --> 00:05:17,220
It must be something tangential
because MySQL really just does

126
00:05:17,220 --> 00:05:18,060
auto increment.

127
00:05:18,580 --> 00:05:22,540
And every shard does an auto-increment
with basically

128
00:05:22,540 --> 00:05:25,440
a 32K auto-increment step.

129
00:05:25,440 --> 00:05:28,620
And then every shard has a plus
offset into that to allow it

130
00:05:28,620 --> 00:05:30,220
to grow to 32K shards.

131
00:05:30,400 --> 00:05:32,600
Given how much of a pain in the
ass it would be to change that,

132
00:05:32,600 --> 00:05:34,020
that's probably still the case.

133
00:05:34,140 --> 00:05:36,080
But I always really liked that
scheme.
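
The interleaved auto-increment scheme described here can be sketched roughly like this; the helper names are illustrative, not Shopify's actual code:

```python
# Sketch of the interleaved auto-increment scheme: each shard
# advances its IDs by a fixed 32K step and adds its own offset,
# so IDs never collide across up to 32,768 shards.

STEP = 32768  # maximum number of shards the scheme can grow to

def first_id(shard_offset: int) -> int:
    """First primary key generated on a given shard."""
    return STEP + shard_offset

def next_id(previous_id: int) -> int:
    """Advance a shard's auto-increment by one step."""
    return previous_id + STEP

def shard_of(record_id: int) -> int:
    """Recover which shard an ID was generated on."""
    return record_id % STEP

# Shard 0 emits 32768, 65536, ...; shard 1 emits 32769, 65537, ...
ids_shard0 = [first_id(0), next_id(first_id(0))]
ids_shard1 = [first_id(1), next_id(first_id(1))]
```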

134
00:05:36,420 --> 00:05:39,860
So some tables over time at Shopify
ended up having a primary

135
00:05:39,860 --> 00:05:43,520
key on the shop ID comma the ID
because that would give locality

136
00:05:43,520 --> 00:05:44,060
for a shop.

137
00:05:44,060 --> 00:05:46,000
Because otherwise you have a lot
of read amplification if you're

138
00:05:46,000 --> 00:05:49,460
trying to dump out a bunch of products
for a shop, because the

139
00:05:49,460 --> 00:05:54,440
chance that there's gonna be multiple
products for a shop, like

140
00:05:54,840 --> 00:05:57,540
in a leaf, unless you do that,
it's just a lot lower.

141
00:05:57,740 --> 00:05:59,600
So that ended up working really
well.

142
00:05:59,700 --> 00:06:03,520
And this is a pain to do in Postgres,
because if you want to

143
00:06:03,520 --> 00:06:08,040
rewrite the primary key or the
heap by an ID, you have to rewrite

144
00:06:08,040 --> 00:06:09,160
the entire thing.

145
00:06:09,480 --> 00:06:12,740
That was 1 of my surprises, having
worked a bit more with Postgres

146
00:06:12,880 --> 00:06:13,880
in later years.

147
00:06:15,060 --> 00:06:15,780
Nikolay: Yeah, yeah, yeah.

148
00:06:15,780 --> 00:06:19,840
And I agree with you and I agree
that in Postgres it also matters,

149
00:06:19,840 --> 00:06:24,140
but only for the B-tree itself, the primary
key B-tree, if it's auto-increment,

150
00:06:24,140 --> 00:06:28,440
or in Postgres it's called bigserial,
serial for example, or

151
00:06:28,440 --> 00:06:28,940
auto-generated.

152
00:06:30,040 --> 00:06:33,400
Right now there's another method,
but behind the scenes it's

153
00:06:33,400 --> 00:06:35,140
also like sequence and inserts.

154
00:06:35,460 --> 00:06:37,260
Oh no, in this case it's not sequence.

155
00:06:37,420 --> 00:06:41,780
There should be a function which
generates UUID version 4 and

156
00:06:42,280 --> 00:06:47,720
if it's random, like version 4
is random, version 7 is closer

157
00:06:47,720 --> 00:06:52,660
to regular numbers, basically,
because it's monotonically growing,

158
00:06:52,660 --> 00:06:53,160
right?

159
00:06:54,020 --> 00:06:56,260
Lexicographically ordered, right?

160
00:06:56,260 --> 00:06:59,880
So in this case, you insert only
on the right side of the B-tree, and

161
00:06:59,880 --> 00:07:03,300
dirty pages, if you think about
how the checkpointer is working.

162
00:07:03,740 --> 00:07:07,200
Also, in Postgres there is extra
overhead after each checkpoint:

163
00:07:07,200 --> 00:07:10,440
full-page writes, which involve
indexes as well.

164
00:07:10,520 --> 00:07:15,560
So if you touch random pages all
the time, disk I/O overhead and

165
00:07:15,720 --> 00:07:19,200
replication and backups, actually
everything receives additional

166
00:07:19,200 --> 00:07:19,700
overhead.

167
00:07:19,740 --> 00:07:23,260
While in version 7 we write on
the right side all the time.

168
00:07:23,520 --> 00:07:24,560
It's much better.

169
00:07:24,860 --> 00:07:27,220
But heap, yes, heap is different.

170
00:07:27,500 --> 00:07:32,080
So I agree with this and anyway,
like I just wanted to say we

171
00:07:32,080 --> 00:07:36,260
use the MySQL article because it's
written very well.

172
00:07:36,740 --> 00:07:40,060
And in Postgres we didn't have
version 7 for quite some time.

173
00:07:40,760 --> 00:07:45,980
Last Thursday version 18 release
candidate was out, which will

174
00:07:45,980 --> 00:07:49,900
include full implementation of
UUID v7, which was live-coded

175
00:07:50,660 --> 00:07:55,940
on Postgres TV with a couple of
friends, just in Cursor, I think.

176
00:07:57,180 --> 00:07:59,760
Oh no, it was not in Cursor, it
was before that, but anyway,

177
00:07:59,760 --> 00:08:03,860
it was just created right online.

178
00:08:05,500 --> 00:08:09,560
We did it, and it took
a couple of years to reach maturity

179
00:08:09,660 --> 00:08:14,260
because Postgres always waits until
RFCs are finalized and so

180
00:08:14,260 --> 00:08:14,760
on.

181
00:08:15,060 --> 00:08:21,540
Anyway, soon version 18 will be
out and UUID version 7 is inside.

182
00:08:21,740 --> 00:08:25,280
But I think everyone is already
using it on the client side.

183
00:08:25,920 --> 00:08:26,660
Okay, great.
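
The contrast drawn above (v4 scatters inserts across the B-tree, v7 keeps them on the right edge) can be sketched without the Postgres function; this hand-rolls a v7-style value following the RFC 9562 layout, as an illustration only:

```python
# Minimal sketch of why UUIDv7 gives right-edge B-tree inserts:
# the top 48 bits are a millisecond timestamp, so values generated
# later sort after earlier ones, while v4 is fully random.
# Simplified layout per RFC 9562; illustrative only.
import os
import time
import uuid

def uuidv7() -> uuid.UUID:
    ts_ms = time.time_ns() // 1_000_000            # 48-bit ms timestamp
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (ts_ms << 80) | rand
    # Stamp the version (7) and RFC variant bits onto the value.
    value &= ~(0xF << 76) & ((1 << 128) - 1)
    value |= 0x7 << 76                             # version 7 nibble
    value &= ~(0x3 << 62) & ((1 << 128) - 1)
    value |= 0x2 << 62                             # variant bits 10
    return uuid.UUID(int=value)

ids = []
for _ in range(3):
    ids.append(uuidv7())
    time.sleep(0.002)  # let the millisecond timestamp advance

# Later-generated v7 IDs sort after earlier ones, which is what
# keeps inserts on the right side of the primary-key B-tree.
assert ids == sorted(ids)
```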

184
00:08:26,660 --> 00:08:32,060
So you had this great career and
then decided to create another,

185
00:08:32,800 --> 00:08:36,480
is it, can we call it a database
system, database management

186
00:08:36,480 --> 00:08:36,980
system?

187
00:08:37,440 --> 00:08:39,520
Simon: It's certainly a full-blown
database.

188
00:08:40,440 --> 00:08:43,600
You know, underneath turbopuffer
is an LSM.

189
00:08:43,780 --> 00:08:46,580
An LSM works really well for object
storage.

190
00:08:47,320 --> 00:08:51,640
And, you know, every successful
database ends up implementing

191
00:08:51,720 --> 00:08:55,580
every query eventually, right,
in the limit, and turbopuffer

192
00:08:55,640 --> 00:08:58,780
will end up doing the same thing,
but every good database also

193
00:08:58,780 --> 00:09:01,040
starts with some specialization,
right?

194
00:09:01,240 --> 00:09:04,140
And our specialization has been
on search workloads.

195
00:09:04,460 --> 00:09:08,440
I would say that it's by no means
a replacement for Postgres.

196
00:09:08,640 --> 00:09:11,100
There always comes a time when
it starts to make sense to

197
00:09:11,100 --> 00:09:14,960
move parts of data into more specialized
algorithms, more specialized

198
00:09:14,960 --> 00:09:17,020
data structures, and more specialized
storage.

199
00:09:17,600 --> 00:09:21,560
In general, my hypothesis on when it's time to create a database

200
00:09:21,560 --> 00:09:25,020
is that you need 2 things in the air in order to create a generational

201
00:09:25,040 --> 00:09:26,020
database company.

202
00:09:26,200 --> 00:09:28,480
The first thing that you need is that you need a new storage

203
00:09:28,480 --> 00:09:31,200
architecture, because if you don't have a new storage architecture,

204
00:09:31,200 --> 00:09:34,640
some new way to store the data, ideally both data structure wise,

205
00:09:34,640 --> 00:09:37,360
and also the actual medium that you're persisting on.

206
00:09:37,500 --> 00:09:40,840
There's no reason why in the limit the other databases won't

207
00:09:40,840 --> 00:09:42,160
do the workload better.

208
00:09:42,160 --> 00:09:43,780
They already have the existing momentum.

209
00:09:43,780 --> 00:09:44,940
They already have the workloads.

210
00:09:45,180 --> 00:09:48,360
In the Postgres case, of course, you know, it's the classic relational

211
00:09:48,400 --> 00:09:52,660
architecture where you replicate every byte onto 3 disks.

212
00:09:52,920 --> 00:09:54,840
And this works phenomenally well, right?

213
00:09:54,840 --> 00:09:57,740
We've had this in production for probably more than 40 years.

214
00:09:57,920 --> 00:09:59,020
And it works great.

215
00:09:59,060 --> 00:10:01,620
It has high performance, it has a very predictable performance

216
00:10:01,640 --> 00:10:04,600
profile and it works really, really well with the page cache

217
00:10:04,600 --> 00:10:07,060
or the buffer pool, whatever database you're using.

218
00:10:07,300 --> 00:10:10,820
The problem with this model is that if the data is not very valuable,

219
00:10:11,260 --> 00:10:12,680
this model is expensive.

220
00:10:13,260 --> 00:10:17,600
Every gigabyte of disk, a network-bound disk, is about 10 cents

221
00:10:17,600 --> 00:10:18,280
per gigabyte.

222
00:10:18,620 --> 00:10:23,100
And unless you are a really risky DBA, you're gonna run those disks

223
00:10:23,100 --> 00:10:26,700
at 50% utilization on all the replicas and on the primary.

224
00:10:27,020 --> 00:10:30,280
So you're paying for this disk kind of 3 times, which ends up

225
00:10:30,280 --> 00:10:32,800
with a whole all-in cost of 60 cents per gigabyte.

226
00:10:32,800 --> 00:10:35,320
And that's not even accounting for all the CPUs that you also

227
00:10:35,320 --> 00:10:38,080
need on the replicas because you need the replicas to process

228
00:10:38,080 --> 00:10:39,780
the writes as fast as the primary.

229
00:10:39,840 --> 00:10:42,220
So you're kind of paying for the same thing 3 times.

230
00:10:42,260 --> 00:10:47,120
So the all in sort of per terabyte cost, when you also take into

231
00:10:47,120 --> 00:10:49,640
consideration that the system can be a little bit more memory

232
00:10:49,640 --> 00:10:54,220
hungry, is probably around 60 cents to $2 per gigabyte

233
00:10:54,960 --> 00:10:55,700
Nikolay: per month.

234
00:10:56,380 --> 00:10:57,160
Simon: Per month.

235
00:10:57,520 --> 00:10:58,000
Yeah.

236
00:10:58,000 --> 00:11:01,060
Per month USD on object storage.

237
00:11:01,060 --> 00:11:03,020
The base cost is 2 cents per gigabyte.

238
00:11:03,140 --> 00:11:03,640
Right.
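
The arithmetic in this exchange can be replayed directly; the prices (10 cents per gigabyte-month for a network disk, 50% utilization, 3 copies, 2 cents for object storage) are the rough figures from the conversation, not vendor quotes:

```python
# Replaying the cost arithmetic from the conversation: a replicated
# triple-disk setup vs. object storage. All prices are the rough
# per-gigabyte-per-month figures quoted above.

disk_price = 0.10    # $/GB/month for a network-attached disk
utilization = 0.50   # run disks half full to stay safe
replicas = 3         # primary + 2 replicas, each a full copy

# Effective cost per usable gigabyte across the cluster:
replicated_cost = disk_price / utilization * replicas
assert abs(replicated_cost - 0.60) < 1e-9   # the "60 cents" figure

object_storage = 0.02   # $/GB/month for S3-class storage
# Only one authoritative copy is needed when object storage is the
# source of truth, so the baseline is ~30x cheaper per gigabyte.
ratio = replicated_cost / object_storage
```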

239
00:11:04,280 --> 00:11:07,920
And when we need that data in RAM or on disk, we only have to

240
00:11:07,920 --> 00:11:08,740
pay for 1.

241
00:11:08,740 --> 00:11:11,240
We only have to pay for some percentage of the data to be in

242
00:11:11,240 --> 00:11:12,280
cache at all times.

243
00:11:12,360 --> 00:11:13,680
We mentioned Cursor earlier.

244
00:11:13,780 --> 00:11:15,480
Cursor is a turbopuffer customer.

245
00:11:15,560 --> 00:11:20,580
They don't need every single code base on SSDs or in memory at

246
00:11:20,580 --> 00:11:21,360
all times.

247
00:11:21,420 --> 00:11:23,940
They need some percentage in memory, the ones that are queried

248
00:11:23,940 --> 00:11:26,520
a lot right now, and some percentage on disk, the ones that are

249
00:11:26,520 --> 00:11:29,280
gonna be queried again in a few minutes or maybe in a few hours.

250
00:11:29,340 --> 00:11:32,660
And so you end up paying a lot less because we only have to keep

251
00:11:32,660 --> 00:11:36,420
1 copy of a subset of the data rather than 3 copies of all of

252
00:11:36,420 --> 00:11:37,000
the data.

253
00:11:37,240 --> 00:11:39,780
Now that comes with a fundamental set of trade-offs, right?

254
00:11:39,780 --> 00:11:41,140
We want to be very public about that.

255
00:11:41,140 --> 00:11:42,840
You can't use turbopuffer for everything.

256
00:11:43,260 --> 00:11:46,220
If you want to do a write to turbopuffer, we commit that to S3,

257
00:11:46,240 --> 00:11:46,560
right?

258
00:11:46,560 --> 00:11:49,640
So by the time it's committed to turbopuffer, the guarantee

259
00:11:49,640 --> 00:11:52,840
is actually stronger than most relational systems because we've

260
00:11:52,840 --> 00:11:55,940
committed it into S3, which takes a couple hundred milliseconds,

261
00:11:55,960 --> 00:11:58,060
but the durability guarantee is very strong.

262
00:11:58,180 --> 00:12:01,360
But if you're building a system like Shopify, well, you can't

263
00:12:01,360 --> 00:12:03,840
live with a commit time in hundreds of milliseconds.

264
00:12:03,840 --> 00:12:04,940
It's just not acceptable.

265
00:12:05,280 --> 00:12:08,260
So that's a trade-off that means that this is not a system that

266
00:12:08,260 --> 00:12:11,280
will ever replace a relational database store.

267
00:12:11,280 --> 00:12:14,440
The other downside is that because not all the data is on a disk

268
00:12:14,440 --> 00:12:17,260
or in memory at all times, it means that you can have tail latency.

269
00:12:17,320 --> 00:12:19,840
And that can be really catastrophic in very large systems that

270
00:12:19,840 --> 00:12:21,580
are doing millions of queries per second.

271
00:12:21,580 --> 00:12:24,900
If you can't rely on a very predictable query profile, you can

272
00:12:24,900 --> 00:12:27,180
have massive outages to hydrate the caches.

273
00:12:27,980 --> 00:12:31,020
I've seen these outages on disks, I've seen them.

274
00:12:31,320 --> 00:12:34,460
And just even the workload changing slightly can mess with the

275
00:12:34,460 --> 00:12:36,580
buffer pool in a way where you have a massive outage.

276
00:12:36,580 --> 00:12:39,520
So these 2 things may sound like small trade-offs, but they're

277
00:12:39,520 --> 00:12:41,920
massive trade-offs for very large production systems.

278
00:12:42,180 --> 00:12:45,640
But it might mean that if you have let's say a billion vectors

279
00:12:45,640 --> 00:12:47,460
and you're trying to store them into Postgres.

280
00:12:47,560 --> 00:12:49,080
The economics just don't make sense.

281
00:12:49,080 --> 00:12:51,580
You're paying thousands, if not tens of thousands of

282
00:12:51,580 --> 00:12:54,680
dollars in hardware costs for a workload that might cost tens or

283
00:12:54,680 --> 00:12:56,740
hundreds of dollars on turbopuffer.

284
00:12:56,740 --> 00:12:58,040
Nikolay: How much is it really?

285
00:12:58,040 --> 00:12:59,320
1 billion vectors.

286
00:12:59,440 --> 00:13:03,160
If 1 vector is what's like 700

287
00:13:03,920 --> 00:13:05,800
Simon: At 768 dimensions.

288
00:13:07,220 --> 00:13:07,720
Yeah.

289
00:13:08,260 --> 00:13:11,020
It's, It's 3 terabytes.

290
00:13:12,180 --> 00:13:14,540
Nikolay: 3 terabytes to store 1 billion vectors.

291
00:13:14,540 --> 00:13:15,040
Yeah.
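
The "3 terabytes" figure checks out; a quick sketch using the numbers from this exchange (768 float32 dimensions, plus the $5/GB/month RAM price Simon quotes a bit later):

```python
# 1 billion vectors of 768 float32 dimensions, before any index
# overhead or quantization: the raw data alone is ~3 TB.

vectors = 1_000_000_000
dims = 768
bytes_per_dim = 4            # float32

raw_bytes = vectors * dims * bytes_per_dim
raw_tb = raw_bytes / 1e12    # terabytes of raw vector data

# RAM pricing from the conversation: $5/GB/month, times 3 replicas,
# assuming you somehow squeeze the data down to 1 TB in memory.
ram_cost = 1000 * 5 * 3      # GB * $/GB/month * replicas
```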

292
00:13:15,040 --> 00:13:17,780
And also we don't have a good index for 1 billion scale.

293
00:13:18,960 --> 00:13:25,700
Yeah, I mean, in Postgres, pgvector, HNSW won't work with 1 billion.

294
00:13:26,120 --> 00:13:29,620
Simon: I mean, we could just run the math a couple of different

295
00:13:29,620 --> 00:13:30,060
ways, right?

296
00:13:30,060 --> 00:13:33,080
I'm not saying this is how it works in pgvector, I'm less familiar

297
00:13:33,080 --> 00:13:33,980
with it now.

298
00:13:34,000 --> 00:13:38,760
But even if you have 3 terabytes of data raw, you're probably

299
00:13:38,760 --> 00:13:40,880
going to need to store more than that.

300
00:13:41,060 --> 00:13:44,440
You might be able to do some tricks to make the vector smaller,

301
00:13:44,440 --> 00:13:47,360
so you only have to store maybe, I don't know, a terabyte or

302
00:13:47,360 --> 00:13:48,940
something along those lines, right?

303
00:13:48,940 --> 00:13:53,500
But remember that a gigabyte of DRAM is $5 per month.

304
00:13:53,800 --> 00:13:56,160
And you need that 3 times on all your replicas.

305
00:13:56,160 --> 00:13:59,320
So you're paying $15 per gigabyte per month.

306
00:14:00,060 --> 00:14:03,600
So if you're doing that, if you have to store that 3 times, you

307
00:14:03,600 --> 00:14:06,260
put everything in memory and you're somehow able to get it down

308
00:14:06,260 --> 00:14:09,960
to a terabyte, then you're talking about $15,000 per month,

309
00:14:09,960 --> 00:14:14,440
right, across the 3 replicas, just for the RAM alone.

310
00:14:14,680 --> 00:14:15,600
Nikolay: Yeah, I agree with you.

311
00:14:15,600 --> 00:14:17,420
HNSW, it's like memory.

312
00:14:17,560 --> 00:14:21,220
Memory is the key, and creation of index takes a lot of time.

313
00:14:21,220 --> 00:14:25,940
And for billion, I already have issues with a few million vectors

314
00:14:25,960 --> 00:14:26,460
scale.

315
00:14:27,040 --> 00:14:30,660
I know Timescale, which has now been renamed to Tiger Data, they developed

316
00:14:31,320 --> 00:14:35,740
another index based on DiskANN from Microsoft, I think, which

317
00:14:35,740 --> 00:14:38,660
is more like for disk, right?

318
00:14:39,060 --> 00:14:44,640
But I agree with you, like for this scale, it's not convenient.

319
00:14:44,640 --> 00:14:48,160
But also, I think it's not only about vectors.

320
00:14:48,160 --> 00:14:52,200
This argument that we need to save on storage costs, and it's

321
00:14:52,200 --> 00:14:56,000
insane to pay for storage and memory as well when we have replicas

322
00:14:56,000 --> 00:14:56,820
and we scale.

323
00:14:57,560 --> 00:15:00,140
In Postgres, if it's a physical replica, it's everything.

324
00:15:00,220 --> 00:15:03,040
So you replicate everything, all indexes, everything.

325
00:15:03,040 --> 00:15:04,680
You cannot replicate even...

326
00:15:04,860 --> 00:15:07,820
There is no ability to replicate only 1 logical database.

327
00:15:07,820 --> 00:15:10,580
You need to get the whole cluster.

328
00:15:11,000 --> 00:15:16,260
And it means that you multiply costs for storage and for memory.

329
00:15:16,940 --> 00:15:20,360
And it would be so great to have some tiered storage maybe with

330
00:15:20,360 --> 00:15:25,160
partitioning, as automated as possible, and offload all

331
00:15:25,160 --> 00:15:29,360
data to S3, basically as you do.

332
00:15:29,800 --> 00:15:34,200
S3 is great for this, and I remember I actually explored

333
00:15:34,200 --> 00:15:35,460
turbopuffer through Cursor.

334
00:15:35,460 --> 00:15:37,500
It was great, like, to see the documentation.

335
00:15:37,700 --> 00:15:40,620
I knew Cursor was using Postgres.

336
00:15:40,640 --> 00:15:43,940
It was a few months ago, but then I heard they considered moving

337
00:15:43,940 --> 00:15:44,640
to PlanetScale.

338
00:15:45,140 --> 00:15:48,780
That was before PlanetScale announced support of Postgres, so

339
00:15:48,780 --> 00:15:51,160
I was thinking, are they switching to MySQL?

340
00:15:51,480 --> 00:15:53,820
And then I saw vectors are stored in turbopuffer.

341
00:15:54,020 --> 00:15:54,520
Great.

342
00:15:54,760 --> 00:15:58,800
Then I learned that several of our clients who get consulting

343
00:15:58,840 --> 00:16:02,640
from us and use Postgres, they also store vectors in turbopuffer.

344
00:16:03,080 --> 00:16:06,940
It was, I think, Photoroom, a couple of more companies.

345
00:16:06,980 --> 00:16:10,680
It was a surprise for me, and I started to think, oh, that's

346
00:16:10,680 --> 00:16:11,180
interesting.

347
00:16:12,720 --> 00:16:14,900
And then I checked your talks.

348
00:16:15,560 --> 00:16:19,460
I think you also mentioned there that if we...

349
00:16:21,980 --> 00:16:27,600
So there is this approach with
the SPFresh algorithm, right?

350
00:16:27,660 --> 00:16:32,040
It's not HNSW, it's a different
type of index, but also some

351
00:16:32,040 --> 00:16:36,500
additional interesting ideas about
economics you mentioned, right?

352
00:16:36,500 --> 00:16:38,040
Not only these sense.

353
00:16:38,400 --> 00:16:40,220
Can you elaborate a little bit?

354
00:16:40,240 --> 00:16:43,020
Simon: I think it might be helpful
to just talk at a very high

355
00:16:43,020 --> 00:16:46,660
level about the different algorithms
to do vector indexing.

356
00:16:47,160 --> 00:16:50,060
I'll try to simplify it as much
as possible, and we can dig into

357
00:16:50,060 --> 00:16:51,260
it further if you want.

358
00:16:52,200 --> 00:16:56,380
Fundamentally, the simplest way
to do vector search is that you

359
00:16:56,380 --> 00:16:59,440
just store all of the vectors in
a flat array on disk, right?

360
00:16:59,440 --> 00:17:02,440
And then on every search, you just
compare the query vector to

361
00:17:02,440 --> 00:17:03,220
all of those.

362
00:17:03,220 --> 00:17:05,720
The problem with that is that you
very quickly run up against

363
00:17:05,720 --> 00:17:06,780
bandwidth limits, right?

364
00:17:06,780 --> 00:17:09,380
If you have a gigabyte of vectors
and you're searching that at

365
00:17:09,380 --> 00:17:11,980
maybe 10 gigabytes per second,
if you can exhaust the memory

366
00:17:11,980 --> 00:17:15,180
bandwidth, which is unlikely in
a big production system, you're

367
00:17:15,180 --> 00:17:18,580
only doing, you know, maybe 5 queries
per second and the query

368
00:17:18,580 --> 00:17:20,040
latency in the hundreds of milliseconds.

369
00:17:20,180 --> 00:17:23,880
So if you have very, very few queries
and you don't care about

370
00:17:23,880 --> 00:17:26,180
latency, this can be feasible on
small scale.

371
00:17:26,200 --> 00:17:28,060
And lots of people are doing that
in production.
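
The flat, exact scan Simon describes can be sketched in a few lines; the toy vectors and function names below are illustrative only:

```python
# A minimal exact ("flat") nearest-neighbor search: store every
# vector in a plain array and compare the query against all of
# them. Perfect recall, but the cost grows linearly with the data,
# which is why it stops scaling beyond small collections.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def flat_search(query, vectors, k=1):
    """Scan every stored vector; return indexes of the k closest."""
    scored = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return scored[:k]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dims.
vectors = [
    [1.0, 0.0, 0.0],   # 0: "banana"
    [0.9, 0.1, 0.0],   # 1: "plantain", closest to banana
    [0.0, 1.0, 0.0],   # 2: "car"
]
nearest = flat_search([1.0, 0.05, 0.0], vectors, k=2)
```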

372
00:17:28,260 --> 00:17:32,220
But if you want to search a million
vectors in less than a couple

373
00:17:32,220 --> 00:17:34,740
hundred milliseconds, or maybe even
10 milliseconds, as part of

374
00:17:34,740 --> 00:17:37,340
a bigger pipeline, you need some
kind of index.

375
00:17:37,640 --> 00:17:40,840
The problem with indexing vectors
is that there is no known way

376
00:17:40,840 --> 00:17:42,920
to do it in a perfect way, right?

377
00:17:42,920 --> 00:17:47,180
If I search for a query vector
about a fruit or whatever, I know

378
00:17:47,180 --> 00:17:49,460
that if I'm searching for banana,
I get the closest fruit.

379
00:17:49,460 --> 00:17:51,100
Maybe, I don't know, maybe there's
a plantain.

380
00:17:51,100 --> 00:17:51,880
I don't know.

381
00:17:52,200 --> 00:17:53,300
Right, in the cluster.

382
00:17:53,520 --> 00:17:57,040
But you have to build an approximate
index in order to make this

383
00:17:57,040 --> 00:17:57,500
faster.

384
00:17:57,500 --> 00:17:58,780
Nikolay: Because of too many dimensions.

385
00:17:58,940 --> 00:18:02,300
For a small number of dimensions,
there are approaches, and we've

386
00:18:02,380 --> 00:18:06,140
had them for years, but for hundreds
or thousands of dimensions,

387
00:18:06,160 --> 00:18:06,660
yeah.

388
00:18:06,940 --> 00:18:10,200
Simon: That's right, yeah, there are
KD trees and so on for the

389
00:18:10,200 --> 00:18:13,220
smaller-dimensional spaces,
which we can use for geo coordinates

390
00:18:13,260 --> 00:18:15,660
and things like that and simpler
geometry.

391
00:18:16,340 --> 00:18:19,260
So for very high dimensional spaces
these things fall apart.

392
00:18:19,400 --> 00:18:21,020
Curse of dimensionality it's called.

393
00:18:21,020 --> 00:18:23,860
The fact that they're very large is
also important about the vectors.

394
00:18:23,860 --> 00:18:26,580
If you have a kilobyte of text
it can easily turn into tens and

395
00:18:26,580 --> 00:18:29,760
tens of kilobytes of vectors which
is why separating into cheaper

396
00:18:29,760 --> 00:18:31,160
storage makes a lot of sense.

397
00:18:31,160 --> 00:18:33,960
So there's 2 fundamental ways that
you can index the data.

398
00:18:33,960 --> 00:18:37,560
There's the graph-based approaches,
HNSW and DiskANN were the

399
00:18:37,560 --> 00:18:38,700
2 you mentioned before.

400
00:18:38,740 --> 00:18:40,540
And there's the clustering-based
approach.

401
00:18:40,840 --> 00:18:43,780
The graph-based approach is phenomenal
if you can store all of

402
00:18:43,780 --> 00:18:48,680
the data in memory and you have
very high QPS and very low latency

403
00:18:48,700 --> 00:18:49,200
requirements.

404
00:18:49,440 --> 00:18:52,020
So if you have, let's say a hundred
million vectors and you're

405
00:18:52,020 --> 00:18:54,940
searching that at, you know, a
hundred thousand queries per second,

406
00:18:54,960 --> 00:18:58,480
and you need very low latency,
you're not going to beat HNSW.

407
00:18:58,780 --> 00:19:02,060
It's going to create a very, very
good graph to navigate

408
00:19:02,060 --> 00:19:02,700
it with.

409
00:19:02,700 --> 00:19:04,960
The problem is that it's very expensive
and the other problem

410
00:19:04,960 --> 00:19:07,480
is that almost no workloads in
the real world actually look like

411
00:19:07,480 --> 00:19:07,980
this.

412
00:19:08,080 --> 00:19:12,600
So HNSW got very popular because
it's fairly simple to implement

413
00:19:12,600 --> 00:19:14,880
correctly and it's very simple
to maintain.

414
00:19:15,140 --> 00:19:19,300
So when you create the graph which
is essentially just you know

415
00:19:19,300 --> 00:19:21,880
points that are close in vector
space or close in the graph,

416
00:19:21,880 --> 00:19:23,680
it's very simple to incrementally
maintain.

417
00:19:23,680 --> 00:19:26,320
You put 1 thing in, you search
the graph, and then you add it.

418
00:19:26,320 --> 00:19:27,440
There's very simple rules.

419
00:19:27,440 --> 00:19:30,720
You can implement something like
HNSW in tens of lines of code

420
00:19:30,720 --> 00:19:33,340
if you did a very, very simple
implementation of it.

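As a rough illustration of that "tens of lines" claim, here is the greedy-navigation core that graph indexes like HNSW are built around. This is a single-layer toy with hypothetical data, not a real HNSW implementation:

```python
import math

def greedy_search(graph, vectors, query, entry=0):
    # From an entry point, repeatedly hop to the neighbor closest to
    # the query; stop when no neighbor improves. Each hop is one
    # random read: ~100 ns in RAM, but a full round trip on disk or
    # S3, which is why graphs shine in memory and suffer on object
    # storage.
    current, hops = entry, 0
    while True:
        best = min(graph[current],
                   key=lambda n: math.dist(vectors[n], query),
                   default=current)
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current, hops
        current, hops = best, hops + 1

# 4 points on a line, chained together as a tiny proximity graph.
vectors = [[0.0], [1.0], [2.0], [3.0]]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(graph, vectors, [3.0]))  # reaches node 3 after 3 hops
```

On S3, each of those hops would be a separate round trip, which is the latency problem described below.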
421
00:19:33,340 --> 00:19:36,380
The problem with HNSW is that every
time you do a write, you

422
00:19:36,380 --> 00:19:38,000
have to update a lot of data, right?

423
00:19:38,000 --> 00:19:40,760
In database land, we call this
write amplification, where every

424
00:19:40,760 --> 00:19:43,660
byte or every page you update,
you have to update a lot of others,

425
00:19:43,660 --> 00:19:46,640
and the reason for that is that
you add something, you add a

426
00:19:46,640 --> 00:19:48,760
node in the graph, and then you
have to update all these other

427
00:19:48,760 --> 00:19:51,000
things to do connections to that
node in the graph.

428
00:19:51,280 --> 00:19:55,040
So this works great in memory,
because memory is very fast at

429
00:19:55,040 --> 00:19:57,180
updating and very fast at random
writes.

430
00:19:57,180 --> 00:19:59,440
But the problem is also on the
read path.

431
00:19:59,440 --> 00:20:01,560
So memory is very fast at doing
random reads.

432
00:20:01,560 --> 00:20:04,640
You can do a random read at 100
nanoseconds, but a random read

433
00:20:04,640 --> 00:20:08,400
on S3 or on a disk is much slower,
from hundreds of microseconds

434
00:20:08,860 --> 00:20:10,820
to the hundreds of milliseconds
on S3.

435
00:20:11,120 --> 00:20:14,320
And on a graph, right, you don't
really, there's no speculation

436
00:20:14,380 --> 00:20:15,060
that helps, right?

437
00:20:15,060 --> 00:20:17,640
If you start at the middle of the
graph and then greedily navigate

438
00:20:17,640 --> 00:20:21,360
the graph from the query vector
to the closest matching vectors,

439
00:20:21,600 --> 00:20:24,240
well, every single time you do
that, there's a round trip.

440
00:20:24,240 --> 00:20:26,980
And on S3, that's a round trip
that takes hundreds of milliseconds.

441
00:20:26,980 --> 00:20:29,620
So you're sort of navigating from
the root, it's like 200 milliseconds.

442
00:20:29,620 --> 00:20:31,300
You go out 1, 200 milliseconds.

443
00:20:31,420 --> 00:20:33,140
You go out another 1, 200 milliseconds.

444
00:20:33,680 --> 00:20:37,940
And in general, for HNSW on a million vectors,
this might be in the tens

445
00:20:37,940 --> 00:20:38,860
of round trips.

446
00:20:38,860 --> 00:20:40,440
So this gets very slow, right?

447
00:20:40,440 --> 00:20:42,740
This is in the seconds to do this
on S3.

448
00:20:42,740 --> 00:20:45,980
And even on a disk, this very quickly
adds up to tens of milliseconds.

449
00:20:46,400 --> 00:20:48,380
That's the fundamental problem with graphs.

450
00:20:48,480 --> 00:20:52,400
Now, DiskANN is essentially using a lot of the same ideas

451
00:20:52,540 --> 00:20:56,780
as other graph-based indexes, but instead of trying to have the

452
00:20:56,780 --> 00:21:00,460
tens of round trips that HNSW has, that's very good for memory,

453
00:21:00,480 --> 00:21:02,900
DiskANN basically tries to shrink the graph so that there

454
00:21:02,900 --> 00:21:03,860
are fewer jumps, right?

455
00:21:03,860 --> 00:21:09,000
So instead of 200 milliseconds, 30 times, it tries to get it

456
00:21:09,000 --> 00:21:13,740
to maybe 6 or 7 times, or 10 times, by shrinking the graph as

457
00:21:13,740 --> 00:21:14,640
much as possible.

458
00:21:14,680 --> 00:21:16,780
That's essentially the insight in DiskANN.

459
00:21:17,120 --> 00:21:20,920
The problem with DiskANN is that after you have added more than

460
00:21:20,920 --> 00:21:24,800
10 or so, 10 to 20% of the size of the data, you have to rebuild

461
00:21:24,800 --> 00:21:27,600
the entire graph, which is an incredibly expensive operation.

462
00:21:28,260 --> 00:21:31,080
That is absolutely terrifying to me, as someone who's been on

463
00:21:31,080 --> 00:21:34,200
call for large databases to just have, like you could max out

464
00:21:34,200 --> 00:21:37,720
like 128 cores, rebuilding this graph in prod, and it could take

465
00:21:37,720 --> 00:21:38,560
you down to 3 a.m.

466
00:21:38,560 --> 00:21:40,340
Because you don't know when you hit that threshold.

467
00:21:40,560 --> 00:21:43,860
And if you don't do it, then the approximations start getting

468
00:21:43,860 --> 00:21:45,860
bad and you start getting bad search results.

469
00:21:46,860 --> 00:21:49,500
The nice thing about the graphs is just they have very low latency

470
00:21:49,500 --> 00:21:51,340
but they're just very expensive to maintain.

471
00:21:51,740 --> 00:21:54,640
Now the first way, then there's the other type of index which

472
00:21:54,640 --> 00:21:55,880
are the clustered indexes.

473
00:21:56,260 --> 00:21:59,820
Clustered indexes are very simple to picture in your head, right?

474
00:21:59,820 --> 00:22:03,160
Let's say

475
00:22:03,160 --> 00:22:05,760
Spotify and you clustered them in a coordinate system and you

476
00:22:05,760 --> 00:22:07,740
can visualize this in 2 dimensions.

477
00:22:08,720 --> 00:22:11,600
If you then plotted all the songs, the songs that are

478
00:22:11,600 --> 00:22:14,880
adjacent are, of course, also adjacent in vector space, and genres

479
00:22:15,380 --> 00:22:16,220
will emerge, right?

480
00:22:16,220 --> 00:22:18,580
There'll be a rap cluster, there'll be a rock cluster.

481
00:22:18,700 --> 00:22:21,580
You zoom in and you get, you know, like little sub clusters,

482
00:22:21,580 --> 00:22:24,160
I don't know, death metal, black metal, I don't know what all

483
00:22:24,160 --> 00:22:26,100
the different rock genres are, right?

484
00:22:26,320 --> 00:22:28,160
Somewhere in these clusters.

485
00:22:28,260 --> 00:22:30,920
And you can generate great recommendations based on that because

486
00:22:30,920 --> 00:22:33,280
you can look at, okay, what did Michael listen to and what are

487
00:22:33,280 --> 00:22:35,860
some songs that are close by that he hasn't listened to and same

488
00:22:35,860 --> 00:22:36,540
for Nik.

489
00:22:36,820 --> 00:22:40,200
So it's very simple: you create a clustering algorithm

490
00:22:40,200 --> 00:22:43,020
that basically just tries to divide the data set into clusters

491
00:22:43,040 --> 00:22:45,840
and when you do a query, instead of looking at all the vectors,

492
00:22:45,840 --> 00:22:49,400
you look at, well, clearly the user is asking about a rock song,

493
00:22:49,400 --> 00:22:51,100
so we're only going to look in the rock cluster.

494
00:22:51,100 --> 00:22:53,560
And that way you divide down the number of vectors that you have

495
00:22:53,560 --> 00:22:54,220
to seek.

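A minimal sketch of the clustered (IVF-style) idea just described, with hypothetical toy data; real systems pick centroids with k-means and tune how many clusters to probe:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def build_clusters(vectors, centroids):
    # Assign every vector to its nearest centroid (the "rap"
    # cluster, the "rock" cluster, and so on).
    buckets = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        buckets[nearest].append(v)
    return buckets

def search(query, centroids, buckets, nprobe=1, k=2):
    # Round trip 1: look at the (small) centroid list.
    # Round trip 2: fetch only the nprobe nearest clusters, then
    # scan just those exhaustively, instead of the whole dataset.
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [v for i in order[:nprobe] for v in buckets[i]]
    return sorted(candidates, key=lambda v: dist(query, v))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0, 1], [1, 0], [10, 11], [11, 10], [9, 10]]
buckets = build_clusters(vectors, centroids)
print(search([10.5, 10.0], centroids, buckets))  # nearest hits from one cluster
```

Two fixed round trips regardless of dataset size is exactly what makes this shape friendly to S3's latency profile.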
496
00:22:54,480 --> 00:22:57,720
Now the problem with this is that if you have everything in memory,

497
00:22:57,720 --> 00:23:00,200
it's not necessarily as optimal because you might have to look

498
00:23:00,200 --> 00:23:02,360
at more data than you do in a graph-based approach.

499
00:23:02,360 --> 00:23:06,880
And because RAM has such good random
read latency, the penalty

500
00:23:06,880 --> 00:23:09,440
is not necessarily worth it if
everything is in memory at all

501
00:23:09,440 --> 00:23:09,940
times.

502
00:23:10,200 --> 00:23:14,060
But this is great for disks, and
it's great for S3, because I

503
00:23:14,060 --> 00:23:17,560
can go to S3 and in 200 milliseconds
get gigabytes of data back,

504
00:23:17,560 --> 00:23:17,960
right?

505
00:23:17,960 --> 00:23:20,940
It doesn't matter if I'm getting
like, you know, a megabyte or

506
00:23:20,940 --> 00:23:23,720
a gigabyte, I can often get that
in the same round trip time

507
00:23:23,720 --> 00:23:25,120
if I exhaust the network.

508
00:23:25,320 --> 00:23:28,620
So if you then go into S3, basically,
you have to download all

509
00:23:28,620 --> 00:23:29,280
the clusters.

510
00:23:29,340 --> 00:23:32,560
So like, let's say clusters of
JSON to really simplify this.

511
00:23:32,560 --> 00:23:36,420
And then you just look at the closest
n clusters to your query

512
00:23:36,420 --> 00:23:36,900
vector.

513
00:23:36,900 --> 00:23:39,940
And then you download, you know,
cluster1.json, cluster2.json,

514
00:23:40,580 --> 00:23:42,560
whichever ones are closest in 2
round trips.

515
00:23:42,560 --> 00:23:44,960
And now instead of on the graph-based
ones where you're doing

516
00:23:44,960 --> 00:23:47,720
200 milliseconds, 200 milliseconds,
200 milliseconds to navigate

517
00:23:47,720 --> 00:23:50,420
the graph, you just have to do
200 milliseconds to get the clusters

518
00:23:50,420 --> 00:23:53,320
and then 200 milliseconds to get
all the clusters that were adjacent.

519
00:23:53,560 --> 00:23:56,580
The nice thing about these clustered
indexes is that with algorithms

520
00:23:56,580 --> 00:23:59,740
like SPFresh and lots of modifications
to them, we can incrementally

521
00:23:59,820 --> 00:24:02,420
maintain these clusters because
you can imagine that when you

522
00:24:02,420 --> 00:24:04,840
add a vector you just have to add
it to the cluster and it's

523
00:24:04,840 --> 00:24:05,440
just 1 write.

524
00:24:05,440 --> 00:24:07,180
The write amplification is very
low.

525
00:24:07,440 --> 00:24:10,080
Once in a while that cluster will
grow beyond a size.

526
00:24:10,080 --> 00:24:12,440
Let's say it's a thousand elements
long and you have to split

527
00:24:12,440 --> 00:24:14,940
the cluster and then you have to
do some modifications.

528
00:24:15,040 --> 00:24:16,720
That's essentially what SPFresh
is.

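The split mechanic can be sketched roughly like this. It is a toy illustration of the idea only; actual SPFresh also merges, rebalances, and reassigns vectors near cluster boundaries:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def centroid(cluster):
    return [sum(c) / len(cluster) for c in zip(*cluster)]

def insert(clusters, vector, max_size=4):
    # One write: append to the nearest cluster, so write
    # amplification stays low. Only when a cluster outgrows
    # max_size do we split it locally; there is never a threshold
    # that forces a rebuild of the whole index.
    cents = [centroid(c) for c in clusters]
    i = min(range(len(clusters)), key=lambda j: dist(vector, cents[j]))
    clusters[i].append(vector)
    if len(clusters[i]) > max_size:
        big = clusters.pop(i)
        # Seed the two new clusters from the most distant points.
        a = max(big, key=lambda v: dist(v, centroid(big)))
        b = max(big, key=lambda v: dist(v, a))
        left, right = [], []
        for v in big:
            (left if dist(v, a) <= dist(v, b) else right).append(v)
        clusters.extend([left, right])
    return clusters

clusters = [[[0.0], [1.0]]]
for v in [[2.0], [3.0], [4.0]]:
    insert(clusters, v, max_size=4)
print(len(clusters))  # the oversized cluster has split into 2
```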
529
00:24:16,720 --> 00:24:18,980
And then there's a little bit higher
write amplification.

530
00:24:19,200 --> 00:24:22,200
But it's stable in the way that
you never reach this threshold

531
00:24:22,200 --> 00:24:25,020
where, okay, I've added 20% of
the dataset, I have to rebuild

532
00:24:25,020 --> 00:24:26,960
the entire thing as you do in DiskANN.

533
00:24:27,160 --> 00:24:29,700
HNSW doesn't have to do that, which
is why it's very nice,

534
00:24:29,700 --> 00:24:31,860
but it's just still slow.

535
00:24:32,120 --> 00:24:36,360
So SPFresh, we think, strikes a
really, really nice set of trade-offs

536
00:24:36,360 --> 00:24:39,720
where it's gonna be a little bit
slower, but slower in terms

537
00:24:39,720 --> 00:24:42,040
of, okay, instead of a search returning
in a millisecond, it

538
00:24:42,040 --> 00:24:45,020
might take 5 milliseconds, and
just no one cares in production

539
00:24:45,020 --> 00:24:45,600
for search.

540
00:24:45,720 --> 00:24:49,300
This matters for a point lookup
into a relational database, but

541
00:24:49,300 --> 00:24:51,540
for search it's a perfect set of
tradeoffs.

542
00:24:53,920 --> 00:24:55,540
Michael: Question on the cluster
splitting.

543
00:24:56,260 --> 00:24:59,440
Does that mean we don't ever need
to rebuild the whole index?

544
00:24:59,440 --> 00:25:02,960
Because I think that was a limitation
of the first cluster-based...

545
00:25:04,080 --> 00:25:07,420
IVF flat, I think, we started with
in pgvector, and that didn't

546
00:25:07,420 --> 00:25:11,000
have the splitting as far as I'm
aware, and therefore we had

547
00:25:11,000 --> 00:25:14,540
to rebuild the whole index from
time to time if the data changed

548
00:25:14,540 --> 00:25:15,040
significantly.

549
00:25:15,660 --> 00:25:16,220
Simon: That's right.

550
00:25:16,220 --> 00:25:17,360
And there's also MERGE,

551
00:25:17,360 --> 00:25:17,720
right?

552
00:25:17,720 --> 00:25:20,580
Nikolay: There's also not only split, there
is also MERGE in SPFresh,

553
00:25:20,680 --> 00:25:21,520
as I remember.

554
00:25:21,740 --> 00:25:23,040
Simon: Yes, there's MERGE as
well.

555
00:25:23,040 --> 00:25:23,700
Michael: Makes sense.

556
00:25:24,140 --> 00:25:26,980
Simon: And because you
might do deletes, which are

557
00:25:26,980 --> 00:25:27,940
also a pain.

558
00:25:28,660 --> 00:25:31,660
To my knowledge, pgvector does
not implement SPFresh.

559
00:25:31,960 --> 00:25:36,240
It is very difficult to
implement correctly.

560
00:25:36,760 --> 00:25:40,280
Nikolay: But also funny, I did
a little bit of research in May,

561
00:25:40,280 --> 00:25:43,220
I think, when I discovered turbopuffer
and started reading about

562
00:25:43,220 --> 00:25:43,720
this.

563
00:25:44,140 --> 00:25:48,300
I saw the original SPFresh
implementation was actually

564
00:25:49,300 --> 00:25:49,800
on a forked Postgres.

565
00:25:50,540 --> 00:25:55,460
I saw some, you need to dig into
it, like in some Microsoft repositories

566
00:25:55,860 --> 00:26:00,400
also, I think some Chinese engineers
were involved, something

567
00:26:00,400 --> 00:26:03,680
like that. Like, I saw some repository
which was basically a forked

568
00:26:03,680 --> 00:26:07,540
Postgres, and the original SPFresh
implementation was on top of that.

569
00:26:08,160 --> 00:26:11,260
Maybe I'm hugely mistaken, but
I saw something like this.

570
00:26:11,460 --> 00:26:12,380
But it's hard.

571
00:26:12,380 --> 00:26:12,940
I agree.

572
00:26:13,680 --> 00:26:17,200
And I also recalled what I wanted
to ask.

573
00:26:17,200 --> 00:26:21,100
I was lost in my previous question
because there were too many things.

574
00:26:21,280 --> 00:26:26,000
I recalled in your talks, you discussed
that S3 is ready to store

575
00:26:26,000 --> 00:26:26,500
data.

576
00:26:26,660 --> 00:26:30,260
S3 is ready because over the last
few years, they added important

577
00:26:30,280 --> 00:26:30,720
features.

578
00:26:30,720 --> 00:26:34,240
Can you recall what you talked about
in those talks?

579
00:26:34,740 --> 00:26:37,380
Simon: Yeah, I mentioned that
there were 2 things that you

580
00:26:37,380 --> 00:26:38,980
needed to build a new database.

581
00:26:39,840 --> 00:26:42,880
And there's a reason why a database
like turbopuffer hasn't already

582
00:26:42,880 --> 00:26:43,280
been built.

583
00:26:43,280 --> 00:26:44,440
It's a new storage architecture.

584
00:26:44,440 --> 00:26:46,040
It's only really possible now.

585
00:26:46,380 --> 00:26:49,540
The 3 things that have enabled
a database like turbopuffer to

586
00:26:49,540 --> 00:26:51,880
exist with this sort of pufferfish
architecture, right, where

587
00:26:51,880 --> 00:26:54,620
it's, you know, when the pufferfish
is deflated, it's in S3,

588
00:26:54,620 --> 00:26:57,840
and when it's, you know, somewhere
in between, it's in SSD, and

589
00:26:57,840 --> 00:27:00,800
then it's in memory when it's all
the way inflated.

590
00:27:00,860 --> 00:27:02,800
The reason that's possible is because
of 3 things.

591
00:27:02,800 --> 00:27:04,940
The first 1 is NVMe SSDs.

592
00:27:05,320 --> 00:27:07,760
NVMe SSDs have a new set of trade-offs.

593
00:27:08,900 --> 00:27:11,620
They act completely differently
than other disks, right?

594
00:27:11,980 --> 00:27:13,580
SSDs are just sort of like...

595
00:27:13,860 --> 00:27:19,620
The old SSDs were very fast, but
NVMe SSDs have just phenomenal

596
00:27:19,760 --> 00:27:24,140
performance potential where basically
on an NVMe SSD, the cost

597
00:27:24,140 --> 00:27:28,700
per gigabyte is a hundred times
lower than memory, but the performance,

598
00:27:28,740 --> 00:27:31,460
if you use it correctly, is only
about 5 times worse.

599
00:27:32,720 --> 00:27:36,020
And using an NVMe
SSD correctly really means that you

600
00:27:36,020 --> 00:27:38,940
have to put a lot of concurrency
on the disk, but again, similar

601
00:27:38,940 --> 00:27:43,240
to S3, every single round trip
takes into hundreds of microseconds,

602
00:27:43,320 --> 00:27:45,280
but you can drive a lot of bandwidth.

603
00:27:45,660 --> 00:27:47,900
Old storage engines have not been
designed for that.

604
00:27:47,900 --> 00:27:50,440
You have to design for that from
the day that you write the

605
00:27:50,440 --> 00:27:51,180
first line of code.

606
00:27:51,180 --> 00:27:53,140
Otherwise, it takes a very long
time to retrofit.

607
00:27:53,620 --> 00:27:56,940
It happens to be that that exact
usage pattern is also what's

608
00:27:56,940 --> 00:27:58,040
very good for S3.

609
00:27:58,520 --> 00:28:02,540
NVMe SSDs were not available in
the cloud until 2017, 2018.

610
00:28:02,700 --> 00:28:06,260
So, in database terms,
this is relatively new.

611
00:28:06,660 --> 00:28:09,940
The second thing that needed to happen
is that S3 was not consistent

612
00:28:10,080 --> 00:28:11,820
until December of 2020.

613
00:28:13,040 --> 00:28:16,480
I think this is the most counterintuitive
to most because most

614
00:28:16,480 --> 00:28:19,260
of us just think that it always
has been, but it hasn't.

615
00:28:19,280 --> 00:28:22,600
What that means is that if
you put an object on S3 and

616
00:28:22,600 --> 00:28:25,480
then read it immediately after,
you were, as of December of

617
00:28:25,480 --> 00:28:28,760
2020, guaranteed
read-after-write consistency.

618
00:28:29,440 --> 00:28:32,540
The third thing that needed to happen,
and this is very informed

619
00:28:32,560 --> 00:28:35,800
by the fact that I was on call
for Shopify for so long.

620
00:28:35,800 --> 00:28:39,140
When you're on call for a long
time, you gravitate towards very

621
00:28:39,140 --> 00:28:42,520
simple systems and ideally systems
that are constantly tested

622
00:28:42,520 --> 00:28:43,380
on their resiliency.

623
00:28:43,580 --> 00:28:46,020
So you don't get paged when something
abnormal happens.

624
00:28:46,020 --> 00:28:46,500
Nikolay: Yeah.

625
00:28:46,500 --> 00:28:48,820
Simon: And for us, what was very
important for us to be on

626
00:28:48,820 --> 00:28:52,260
call for a database for Justin
and me was that it only had 1 dependency.

627
00:28:52,440 --> 00:28:55,320
And that dependency could only
be 1 of the most reliable systems

628
00:28:55,320 --> 00:28:58,500
on earth, which is S3, Google Cloud
Storage, right?

629
00:28:58,500 --> 00:29:00,920
And the other derivatives like
Azure Blob Storage and so on.

630
00:29:00,920 --> 00:29:03,060
They're very, very, very reliable
systems.

631
00:29:03,520 --> 00:29:06,300
But you could not build the metadata
layer on top, right?

632
00:29:06,300 --> 00:29:08,600
So Snowflake and Databricks and
others that have built on top

633
00:29:08,600 --> 00:29:12,100
of this in the last generation
of new databases needed another

634
00:29:12,100 --> 00:29:16,220
metadata layer, some consensus
layer like FoundationDB or their

635
00:29:16,220 --> 00:29:21,020
own Paxos or Raft protocol to essentially
enforce the read-after-write

636
00:29:21,020 --> 00:29:24,220
consistency, but also
to do various metadata operations

637
00:29:24,320 --> 00:29:24,820
atomically.

638
00:29:25,440 --> 00:29:29,860
But in late 2024, S3 finally announced,
at re:Invent,

639
00:29:29,860 --> 00:29:30,360
compare-and-swap.

640
00:29:30,360 --> 00:29:33,580
And what compare-and-swap allows
you to do is to put a file,

641
00:29:33,580 --> 00:29:37,260
let's say metadata.json, you download
the file, you do some modifications

642
00:29:37,360 --> 00:29:41,540
to it, and then you upload it again
with a version number, and

643
00:29:41,540 --> 00:29:44,100
you only upload it if the version
number is the same.

644
00:29:44,100 --> 00:29:46,960
Basically guaranteeing that you
did an atomic and nothing has

645
00:29:46,960 --> 00:29:48,140
changed in the interim.

646
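The compare-and-swap loop Simon describes can be sketched like so. This is an illustrative shape, not turbopuffer's code, and it assumes an S3-style client whose put accepts an IfMatch ETag, as S3's conditional writes do:

```python
import json

class PreconditionFailed(Exception):
    # Raised when the object changed under us (HTTP 412 on real S3).
    pass

def cas_update(s3, bucket, key, update_fn, retries=5):
    # Optimistic metadata update: download, modify, and re-upload
    # only if the object is unchanged since we read it. `s3` is
    # anything duck-typed like an S3 client; names are illustrative.
    for _ in range(retries):
        obj = s3.get_object(Bucket=bucket, Key=key)
        meta = json.loads(obj["Body"])
        new = update_fn(meta)
        try:
            s3.put_object(Bucket=bucket, Key=key,
                          Body=json.dumps(new), IfMatch=obj["ETag"])
            return new
        except PreconditionFailed:
            continue  # lost the race: re-read and retry
    raise RuntimeError("CAS retries exhausted")
```

Every write costs a round trip to S3, which is the performance hit mentioned above, but it lets the object store itself act as the consensus layer.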
00:29:48,480 --> 00:29:50,860
Very important when you're building
distributed systems, right,

647
00:29:50,860 --> 00:29:53,800
you can really implement anything
on top of that, as long as

648
00:29:53,800 --> 00:29:56,200
you're willing to take the performance
hit of going back and

649
00:29:56,200 --> 00:29:57,040
forth to S3.

650
00:29:57,040 --> 00:29:59,860
And of course, they have a whole metadata, Paxos, whatever thing

651
00:29:59,860 --> 00:30:00,840
to implement that.

652
00:30:00,840 --> 00:30:03,400
In GCS, it's Spanner, but I don't have to worry about that.

653
00:30:03,400 --> 00:30:06,300
That's for them to formally verify and whatever they need to

654
00:30:06,300 --> 00:30:07,580
do to uphold those constraints.

655
00:30:07,860 --> 00:30:10,460
Those were the 3 things that needed to happen.

656
00:30:10,460 --> 00:30:13,520
And that is requirement number 1, to build a new database.

657
00:30:13,900 --> 00:30:17,220
And so that's what was in the air for turbopuffer to grab.

658
00:30:17,440 --> 00:30:19,920
The second thing that you need for a new database is that you

659
00:30:19,920 --> 00:30:23,660
need a new workload that's begging for that particular storage

660
00:30:23,720 --> 00:30:24,220
architecture.

661
00:30:24,620 --> 00:30:28,580
So for Snowflake and Databricks, we saw that in, well, we want

662
00:30:28,580 --> 00:30:32,080
big scale analytics and it doesn't make sense also for the dollar

663
00:30:32,080 --> 00:30:35,220
per gigabyte that we have to pay in operational databases and

664
00:30:35,220 --> 00:30:36,600
adding indexes on everything.

665
00:30:36,760 --> 00:30:41,640
So there was a new OLAP workload, and there was at least an acceleration

666
00:30:41,720 --> 00:30:44,440
of the OLAP workload with all the successful web applications,

667
00:30:45,040 --> 00:30:47,880
and then the new storage architecture on top of S3, but they

668
00:30:47,880 --> 00:30:50,980
had to do some more work because these APIs that I just mentioned

669
00:30:50,980 --> 00:30:51,800
weren't available.

670
00:30:51,980 --> 00:30:55,620
And a new workload now is connecting lots of data to AI.

671
00:30:55,840 --> 00:30:58,460
And this storage architecture is a very good fit for it.

672
00:30:59,340 --> 00:31:00,920
Nikolay: Yeah, so that's great.

673
00:31:00,920 --> 00:31:05,020
Thank you for elaborating on these 3 changes at S3.

674
00:31:05,020 --> 00:31:08,760
But they also made another change recently and they announced

675
00:31:08,800 --> 00:31:10,280
S3 vectors, right?

676
00:31:11,120 --> 00:31:14,400
What do you think about this comparing to your system?

677
00:31:15,580 --> 00:31:19,840
Simon: Yeah, I think that S3 Vectors, if you are

678
00:31:19,840 --> 00:31:22,200
writing the vectors once and you don't query them that much,

679
00:31:22,200 --> 00:31:24,640
and you don't care that much about the query latency, it can

680
00:31:24,640 --> 00:31:27,180
be a useful product in the same way that you might be

681
00:31:27,180 --> 00:31:28,400
using S3 today.

682
00:31:28,820 --> 00:31:32,420
But S3 Vectors doesn't do full-text search, right?

683
00:31:32,420 --> 00:31:35,880
It doesn't do lots of these features that you need for a serious

684
00:31:36,200 --> 00:31:36,660
system.

685
00:31:36,660 --> 00:31:40,100
And even S3 Vectors recommends that you load into OpenSearch.

686
00:31:40,460 --> 00:31:42,900
And so for archival vectors, this can make sense.

687
00:31:42,900 --> 00:31:45,660
So they're still operational, but there's lots of limitations

688
00:31:45,660 --> 00:31:47,960
that would make it very difficult for it to go into production

689
00:31:47,960 --> 00:31:48,520
systems, right?

690
00:31:48,520 --> 00:31:51,660
If you do a query to S3 vectors, it takes hundreds and hundreds

691
00:31:51,660 --> 00:31:53,680
of milliseconds to get the result back.

692
00:31:53,680 --> 00:31:55,840
Whereas the turbopuffer, you can get the result back in less

693
00:31:55,840 --> 00:31:56,700
than 10 milliseconds.

694
00:31:57,180 --> 00:31:58,480
Nikolay: Thanks to cache, right?

695
00:31:58,480 --> 00:31:59,400
Simon: Thanks to cache.

696
00:31:59,440 --> 00:31:59,940
Nikolay: Yeah.

697
00:32:00,060 --> 00:32:01,980
Yeah, that totally makes sense.

698
00:32:02,120 --> 00:32:05,460
I still have more skeptical questions, if you don't

699
00:32:05,460 --> 00:32:05,860
mind.

700
00:32:05,860 --> 00:32:06,680
Simon: Let's go.

701
00:32:06,960 --> 00:32:11,200
Nikolay: Yeah, 1 of them is: your
cache layer is on local

702
00:32:11,200 --> 00:32:12,420
NVMes, right?

703
00:32:12,540 --> 00:32:12,840
Yeah.

704
00:32:12,840 --> 00:32:13,940
But why?

705
00:32:13,940 --> 00:32:14,340
Why?

706
00:32:14,340 --> 00:32:16,060
Like, we could store it there.

707
00:32:17,020 --> 00:32:20,860
PlanetScale recently came to the Postgres
ecosystem and they said

708
00:32:20,860 --> 00:32:26,080
let's stop having fears about using
local NVMes and yes, like

709
00:32:26,080 --> 00:32:30,620
ephemeral storage and so on and
there is a reliable failover

710
00:32:32,140 --> 00:32:33,000
and so on.

711
00:32:33,060 --> 00:32:34,280
Let's do it super fast.

712
00:32:34,280 --> 00:32:35,740
Yes, I agree, super fast.

713
00:32:35,740 --> 00:32:41,820
And 4TB for a billion or 3TB for
a billion vectors probably won't

714
00:32:41,820 --> 00:32:46,080
cost too much because this price
is usually in AWS, for example,

715
00:32:46,320 --> 00:32:49,780
in the instances which have local
NVMe, it's basically included.

716
00:32:50,740 --> 00:32:54,720
And the limit is many, many dozens
of terabytes of local NVMe

717
00:32:54,720 --> 00:32:56,920
storage these days on larger instances.

718
00:32:56,920 --> 00:33:01,900
So we are fine to store not only
vectors, but everything else.

719
00:33:02,060 --> 00:33:06,960
So my question is like back from
S3 to MySQL, for example. MySQL

720
00:33:06,960 --> 00:33:10,300
has supported storage engines for
many years.

721
00:33:10,740 --> 00:33:15,340
Have you considered building a storage
engine for MySQL,

722
00:33:15,340 --> 00:33:18,080
for example, and using local
NVMe?

723
00:33:19,700 --> 00:33:22,060
Simon: You can't outrun the economics,
right?

724
00:33:22,200 --> 00:33:24,360
The economics are still the same.

725
00:33:24,480 --> 00:33:28,080
You have to replicate, whether
it's local NVMe or not, which

726
00:33:28,080 --> 00:33:31,020
is not necessarily cheaper, maybe
only marginally, than

727
00:33:31,020 --> 00:33:32,320
a network volume.

728
00:33:32,320 --> 00:33:34,420
You still have to replicate it
3 times, right?

729
00:33:34,540 --> 00:33:37,080
You still have to put it on 3 machines.

730
00:33:37,080 --> 00:33:39,480
You still need all the cores and
memory and so on that you would

731
00:33:39,480 --> 00:33:44,280
need on the primary to keep up,
unless you start to have a heterogeneous

732
00:33:44,700 --> 00:33:47,360
replica stack, which for a variety
of reasons would be a really

733
00:33:47,360 --> 00:33:48,060
bad idea.

734
00:33:48,080 --> 00:33:50,540
So you're still paying
for all of that.

735
00:33:50,540 --> 00:33:53,660
Now, up to a certain point, that
makes a lot of sense, right?

736
00:33:53,740 --> 00:33:56,820
If I have a customer who gets on
a call with our sales team and

737
00:33:56,820 --> 00:34:00,300
they have a couple million vectors
in pgvector there is no reason

738
00:34:00,300 --> 00:34:01,120
to move off of it.

739
00:34:01,120 --> 00:34:02,020
That is perfect.

740
00:34:02,120 --> 00:34:05,580
You should not be taking on the
complexity of ETLs and so on.

741
00:34:05,740 --> 00:34:10,320
But if you have like tens of terabytes
of vector data, it is

742
00:34:10,320 --> 00:34:12,180
not economical for a lot of businesses.

743
00:34:12,260 --> 00:34:14,060
Now for some businesses it is,
right?

744
00:34:14,060 --> 00:34:17,180
But the art of a business is to
earn a return on the underlying

745
00:34:17,200 --> 00:34:17,700
costs.

746
00:34:17,840 --> 00:34:20,740
And for some businesses, it's very,
very challenging to earn

747
00:34:20,740 --> 00:34:24,920
a return on storing this on 3 replicas,
this vector data, it's

748
00:34:24,920 --> 00:34:26,920
generally not valuable enough to
the business.

749
00:34:27,400 --> 00:34:30,940
So turbopuffer doesn't make sense
as a storage engine in

750
00:34:30,940 --> 00:34:33,220
MySQL or in Postgres.

751
00:34:33,400 --> 00:34:36,340
It's just fundamentally not compatible
with the way that we do

752
00:34:36,340 --> 00:34:38,400
things outside of replica chains,
right?

753
00:34:38,420 --> 00:34:41,320
You could maybe come up with a
storage engine where you page

754
00:34:41,320 --> 00:34:45,380
everything into S3 and all of that,
but you're now trying to

755
00:34:45,380 --> 00:34:48,560
build a new database inside of
an extremely old database, right?

756
00:34:48,560 --> 00:34:53,320
The storage layer, especially
in Postgres, more so than

757
00:34:53,320 --> 00:34:56,720
in MySQL I think, is very,
very woven into how

758
00:34:56,720 --> 00:34:58,520
the query planner works and
all of that.

759
00:34:58,520 --> 00:35:01,680
So at that point, you're rebuilding
the query planner to be very

760
00:35:01,680 --> 00:35:04,580
round trip sensitive, you're rebuilding
a storage layer.

761
00:35:04,640 --> 00:35:05,920
It doesn't make sense anymore.

762
00:35:05,920 --> 00:35:07,400
It's a completely different database.

763
00:35:07,800 --> 00:35:08,300
Nikolay: Okay.

764
00:35:08,680 --> 00:35:10,440
Another skeptical question.

765
00:35:10,680 --> 00:35:15,140
I go to the turbopuffer website and,
by the way, great design, very

766
00:35:15,140 --> 00:35:20,280
popular these days, with a monospace
font and this type of

767
00:35:20,280 --> 00:35:24,520
graphics, and you advertise
1-billion scale: 1 billion

768
00:35:24,520 --> 00:35:26,180
vectors or 1 billion documents.

769
00:35:26,320 --> 00:35:32,080
But there, if I check 1 billion,
the concept of namespaces pops

770
00:35:32,080 --> 00:35:32,580
up.

771
00:35:32,860 --> 00:35:37,280
And namespaces: if I choose a hundred

772
00:35:37,280 --> 00:35:38,160
million, it's okay.

773
00:35:38,160 --> 00:35:43,520
I can have 1 namespace, but if
it's 1 billion, there is a warning

774
00:35:43,540 --> 00:35:46,820
that probably you should split
it to 10 namespaces.

775
00:35:47,540 --> 00:35:51,240
And this means 10 different indexes,
right?

776
00:35:51,820 --> 00:35:52,620
Simon: That's right.

777
00:35:52,960 --> 00:35:53,460
Nikolay: Yeah.

778
00:35:54,020 --> 00:35:57,080
So it's not actually 1-billion scale.

779
00:35:57,800 --> 00:36:01,320
Or like something is off here in
my mind.

780
00:36:01,420 --> 00:36:06,040
Like 1 billion should be
a single index.

781
00:36:06,260 --> 00:36:11,480
If we talk about 1 billion but
divided into 10 collections and

782
00:36:11,480 --> 00:36:15,360
indexes, it's already
a different story.

783
00:36:16,640 --> 00:36:19,940
And so can you explain this to
me?

784
00:36:20,240 --> 00:36:24,240
I saw this in the beginning, and
I'm happy that in this position I

785
00:36:24,240 --> 00:36:27,480
can ask the founder himself about
it.

786
00:36:27,660 --> 00:36:31,280
Simon: Yeah, we try to be extremely
transparent in our limits,

787
00:36:31,280 --> 00:36:31,560
right?

788
00:36:31,560 --> 00:36:34,460
And I think your mental model is
correct.

789
00:36:34,840 --> 00:36:37,760
Before this, we just had a limit
that said that we could do in

790
00:36:37,760 --> 00:36:41,620
a single namespace around 250 million
vectors, right?

791
00:36:41,780 --> 00:36:45,480
But I mean, even that is a simplification
because how big are

792
00:36:45,480 --> 00:36:46,000
the vectors?

793
00:36:46,000 --> 00:36:49,920
If they're 128 dimensions, they
take a lot less space, which is

794
00:36:49,920 --> 00:36:52,540
ultimately what matters here, which
is why we also put the gigabytes.

795
00:36:52,680 --> 00:36:56,660
So in the past, when we had
just 250 million vectors on there

796
00:36:56,660 --> 00:37:00,040
as a limit, people came to us,
or we knew that people

797
00:37:00,040 --> 00:37:03,120
weren't testing with us, because
they wanted to sort of search

798
00:37:03,120 --> 00:37:04,060
a billion at once.

799
00:37:04,060 --> 00:37:07,380
And they didn't realize that you
could just do id % N and

800
00:37:07,380 --> 00:37:09,960
then you could do a billion and
it would be very economical for

801
00:37:09,960 --> 00:37:10,200
them.

802
00:37:10,200 --> 00:37:13,500
So we sort of had to put in the
docs like, yes, you can do a

803
00:37:13,500 --> 00:37:16,020
billion at once, but you have to
shard it.

804
00:37:16,560 --> 00:37:19,340
Now I would love to handle that
sharding for you, right?

805
00:37:19,340 --> 00:37:22,300
I mean that's what Elasticsearch
does and it's what a lot of

806
00:37:22,300 --> 00:37:25,520
databases do because the only way
to scale any database is sharding,

807
00:37:25,520 --> 00:37:25,920
right?

808
00:37:25,920 --> 00:37:26,920
You don't get around it.

809
00:37:26,920 --> 00:37:29,120
The question is where the complexity
lives, right?

810
00:37:29,180 --> 00:37:33,080
Does the complexity live inside
of the database to handle it

811
00:37:33,080 --> 00:37:33,880
for you?

812
00:37:34,000 --> 00:37:37,040
Some of the most sophisticated
sharding you will find lives inside

813
00:37:37,040 --> 00:37:40,060
of Cockroach and Spanner and these
kinds of systems.

814
00:37:40,440 --> 00:37:43,640
And the simplest type of sharding
is what we're exposing, where

815
00:37:43,640 --> 00:37:47,360
every single shard is just a directory
on S3, and you can put

816
00:37:47,360 --> 00:37:49,760
as many of them as you want and
you can query as many of them

817
00:37:49,760 --> 00:37:50,460
as you want.

818
00:37:50,460 --> 00:37:52,820
Now over time, of course, we need
to create an orchestration

819
00:37:52,860 --> 00:37:55,800
layer on top of that so that a
logical namespace to the user

820
00:37:55,800 --> 00:37:57,540
is actually multiple namespaces
underneath.

821
00:37:57,840 --> 00:38:01,560
But we're challenging ourselves
to make every individual namespace

822
00:38:01,560 --> 00:38:03,300
as large as it possibly can be.

823
00:38:03,420 --> 00:38:06,820
When I ran an Elasticsearch cluster,
or was involved in scaling

824
00:38:06,820 --> 00:38:10,600
Elasticsearch clusters, every shard
was around 50 gigabytes.

825
00:38:10,600 --> 00:38:13,240
That was roughly what was recommended
for Elasticsearch.

826
00:38:14,060 --> 00:38:15,260
Nikolay: Quite small, right?

827
00:38:15,360 --> 00:38:17,020
Simon: It's quite small, right?

828
00:38:17,020 --> 00:38:19,400
Like on Postgres it's a lot larger
than that.

829
00:38:19,540 --> 00:38:21,140
It's basically the size of the
machine.

830
00:38:21,420 --> 00:38:25,640
But the problem with a small shard
size is that for something

831
00:38:25,640 --> 00:38:29,600
like a B-tree, right, it's obviously
like it's log n, right?

832
00:38:29,600 --> 00:38:31,340
The number of searches you have
to do.

833
00:38:31,640 --> 00:38:36,300
And if you have log n and n is
very high, well, that's a great

834
00:38:36,300 --> 00:38:38,300
number of operations that you have
to do.

835
00:38:38,300 --> 00:38:40,600
But for every shard, there's sort
of like an m log n.

836
00:38:40,600 --> 00:38:44,760
And now m is very high if the shard
size is small and n is small.

837
00:38:44,760 --> 00:38:46,220
So you're doing a lot more operations.

838
00:38:46,440 --> 00:38:49,120
So we want the shards to be as
large as possible, because of

839
00:38:49,120 --> 00:38:53,400
course you can get to a billion
by just doing, you know, a thousand,

840
00:38:53,400 --> 00:38:56,260
a million shards, but that's
incredibly computationally

841
00:38:56,480 --> 00:38:56,980
inefficient.

842
00:38:57,600 --> 00:39:01,080
So we have shards now for some
users that we're testing that

843
00:39:01,080 --> 00:39:03,380
do almost a billion documents at
once, right?

844
00:39:03,380 --> 00:39:05,740
But it requires some careful tuning
on our end.

845
00:39:05,740 --> 00:39:08,540
So we wanna push that number as
high as possible.

846
00:39:08,740 --> 00:39:11,600
The other thing with namespaces
is that turbopuffer is a multi-tenant

847
00:39:11,600 --> 00:39:14,280
system and we have to bin pack
these namespaces.

848
00:39:14,480 --> 00:39:18,260
So if a namespace is 5 terabytes
large, it's much harder for

849
00:39:18,260 --> 00:39:21,040
us to bin pack on node sizes that make sense.

850
00:39:21,040 --> 00:39:23,480
So we have to walk some threshold there where we try to make

851
00:39:23,480 --> 00:39:25,800
the shards as small as possible from the logical data.

852
00:39:25,800 --> 00:39:29,340
We have to bin pack and then we have to index without slowing

853
00:39:29,340 --> 00:39:32,640
the indexing down because the larger an ANN index gets, the harder

854
00:39:32,640 --> 00:39:34,060
it is to maintain the index.

855
00:39:34,060 --> 00:39:35,820
So those are the constraints we walk.

856
00:39:35,860 --> 00:39:38,460
So over time, we update these limits continuously.

857
00:39:38,560 --> 00:39:40,220
You will see that shard size increase.

858
00:39:40,320 --> 00:39:43,520
But you're not going to find anyone who's doing 100 billion vectors

859
00:39:43,520 --> 00:39:47,880
in a single index on a single machine, AKA M is 1 and N is a

860
00:39:47,880 --> 00:39:50,660
hundred billion, but we want to get as high as possible because

861
00:39:50,660 --> 00:39:52,320
that is the most computationally efficient.

862
00:39:52,360 --> 00:39:54,360
That's how we get our users the best prices.

863
00:39:54,380 --> 00:39:56,500
And we see the most ambitious use cases.

864
00:39:57,260 --> 00:40:00,920
Nikolay: And shards, are these actual shards or partitions?

865
00:40:01,100 --> 00:40:04,340
So it's like shards, meaning that the different compute nodes

866
00:40:04,860 --> 00:40:08,800
behind the scenes are used. If it's 10 namespaces, is it 10 shards,

867
00:40:08,800 --> 00:40:10,820
10 different compute nodes?

868
00:40:11,380 --> 00:40:14,940
Simon: So a namespace for us and this is also why we've chosen

869
00:40:14,940 --> 00:40:19,060
a different name for it, right? A namespace to us is a directory

870
00:40:19,060 --> 00:40:23,320
on S3, and which compute node that goes to is essentially just

871
00:40:23,320 --> 00:40:28,500
a consistent hash of the name of the prefix and the user ID,

872
00:40:28,500 --> 00:40:29,000
right?

873
00:40:29,180 --> 00:40:32,520
So compute node 1 could hash to maybe, you know, a hundred thousand

874
00:40:32,520 --> 00:40:33,260
different namespaces.

875
00:40:34,220 --> 00:40:37,120
And they share the compute and then we scale the compute right

876
00:40:37,120 --> 00:40:39,340
with that, and that's why the bin packing is

877
00:40:39,340 --> 00:40:39,840
Nikolay: flexible.

878
00:40:39,960 --> 00:40:40,640
I see.

879
00:40:40,640 --> 00:40:40,900
Simon: Yes.

880
00:40:40,900 --> 00:40:43,780
And that's also part of why we can get really good economics.

881
00:40:44,380 --> 00:40:44,540
Nikolay: Yeah.

882
00:40:44,540 --> 00:40:49,160
I wanted to shout out again: the front page is beautiful.

883
00:40:49,180 --> 00:40:54,800
It explains the basics: architecture and pricing right here,

884
00:40:54,800 --> 00:40:56,900
latencies right here and case studies.

885
00:41:00,480 --> 00:41:06,200
This is like how it should be, right?

886
00:41:06,200 --> 00:41:06,700
For engineers, you know, so you quickly see all the numbers and

887
00:41:06,700 --> 00:41:07,440
understand.

888
00:41:07,440 --> 00:41:07,440
That's great.

889
00:41:07,760 --> 00:41:08,800
Thank you for transparency.

890
00:41:08,940 --> 00:41:09,720
That's great.

891
00:41:09,920 --> 00:41:12,620
Simon: I think it's, it just comes from the fact that I was

892
00:41:12,620 --> 00:41:15,420
on that, you know, your side of the table, my whole career, right?

893
00:41:15,420 --> 00:41:18,720
I've been buying databases, evaluating databases, and sometimes

894
00:41:18,740 --> 00:41:22,480
I swear I end up on a database website and I don't know if they're

895
00:41:22,480 --> 00:41:24,320
marketing a sneaker or a database.

896
00:41:24,720 --> 00:41:28,840
Nikolay: Yeah, go figure out storage or compute price in AWS

897
00:41:28,940 --> 00:41:29,780
or GCP.

898
00:41:30,060 --> 00:41:32,580
It's like always a task.

899
00:41:32,800 --> 00:41:35,280
Simon: So we really wanted to
just put our best foot forward:

900
00:41:35,280 --> 00:41:36,600
the diagram is right there.

901
00:41:36,600 --> 00:41:36,820
Okay.

902
00:41:36,820 --> 00:41:38,040
This is my mental model.

903
00:41:38,040 --> 00:41:39,220
This is what it does.

904
00:41:39,280 --> 00:41:40,440
This is what it costs.

905
00:41:40,460 --> 00:41:41,880
This is how fast it is.

906
00:41:41,880 --> 00:41:42,880
These are the limits.

907
00:41:43,020 --> 00:41:45,020
These are the customers and how
they use it.

908
00:41:45,020 --> 00:41:46,100
You go to the documentation.

909
00:41:46,240 --> 00:41:47,220
We talk about guarantees.

910
00:41:47,220 --> 00:41:48,260
We talk about trade-offs.

911
00:41:48,900 --> 00:41:52,020
The kinds of things that I always
look for immediately to slot

912
00:41:52,020 --> 00:41:53,040
it into my model.

913
00:41:53,300 --> 00:41:56,120
Because I don't want you to use
our database at all costs.

914
00:41:56,120 --> 00:41:58,140
I want you to use it if it makes
sense for you.

915
00:41:58,140 --> 00:42:00,220
I want it to be a competitive advantage
to you.

916
00:42:00,220 --> 00:42:03,620
I want to save you like millions
of dollars a year and in order

917
00:42:03,620 --> 00:42:06,140
for that to make sense, you have
to just put all the bullshit

918
00:42:06,140 --> 00:42:08,200
aside and make it very clear what
you're good at and what you're

919
00:42:08,200 --> 00:42:08,940
not good at

920
00:42:08,940 --> 00:42:11,960
Nikolay: What about, for example...
I know there is vector

921
00:42:12,040 --> 00:42:15,100
search and full-text search support.
But what about

922
00:42:15,100 --> 00:42:18,520
additional filtering, when we have
some dimensions we want

923
00:42:18,520 --> 00:42:20,220
to filter on?

924
00:42:20,220 --> 00:42:21,020
Is it possible?

925
00:42:21,580 --> 00:42:22,040
Yes.

926
00:42:22,040 --> 00:42:27,760
For example, some categories or
like time ranges or something,

927
00:42:27,800 --> 00:42:29,140
it's all possible, right?

928
00:42:29,140 --> 00:42:30,900
In both types of search in turbopuffer.

929
00:42:31,120 --> 00:42:34,200
Simon: I'll go back to my line
before of every successful database

930
00:42:34,200 --> 00:42:36,200
eventually supports every query,
right?

931
00:42:36,200 --> 00:42:40,060
And we're, I don't know, like
maybe 5% of the way there.

932
00:42:40,440 --> 00:42:43,200
So we support filters and filtering
on vector search is actually

933
00:42:43,200 --> 00:42:44,140
very difficult, right?

934
00:42:44,140 --> 00:42:47,540
Because if you do something like,
let's say I'm searching for

935
00:42:47,540 --> 00:42:51,480
banana and it's an e-commerce site
and I have to filter for ships

936
00:42:51,480 --> 00:42:52,160
to Canada.

937
00:42:52,800 --> 00:42:55,440
Well that might cut off a fruit
cluster which is actually where

938
00:42:55,440 --> 00:42:58,780
I want it to be and then like you
know I get to the banana on

939
00:42:58,780 --> 00:43:01,060
a t-shirt cluster but it's really
far away.

940
00:43:01,160 --> 00:43:04,700
So you have to do very sophisticated
filtering to get good results.

941
00:43:05,240 --> 00:43:08,760
I don't know what pgvector does,
but I know that our query planner

942
00:43:08,760 --> 00:43:09,640
is recall aware.

943
00:43:09,640 --> 00:43:12,880
It does a lot of statistics and
distribution of where the vectors

944
00:43:12,880 --> 00:43:14,940
are to ensure that we have high
recall.

945
00:43:15,040 --> 00:43:18,160
And for a percentage of queries
in production, we check against

946
00:43:18,160 --> 00:43:21,540
an exhaustive search what the recall
is, the accuracy of the index.

947
00:43:21,580 --> 00:43:24,920
I don't know if pgvector does this,
but I know it's been a monumental

948
00:43:24,920 --> 00:43:25,760
effort for us to

949
00:43:25,760 --> 00:43:25,860
Nikolay: do it.

950
00:43:25,860 --> 00:43:26,720
It's a big headache there.

951
00:43:26,720 --> 00:43:27,720
It's a big headache.

952
00:43:27,720 --> 00:43:30,740
Actually, it's a good point that
additional filtering can change

953
00:43:30,740 --> 00:43:32,140
semantics of the search.

954
00:43:32,140 --> 00:43:36,240
If it's searching for bananas,
size equals nano, it's very different.

955
00:43:37,460 --> 00:43:40,580
Michael: 1 thing they've implemented
relatively recently is iterative

956
00:43:40,640 --> 00:43:43,940
index scans. So I think they had
a problem where, for example, you'd

957
00:43:43,940 --> 00:43:47,620
do a similarity
search with a limit and a filter,

958
00:43:48,480 --> 00:43:52,080
and you'd ask for a hundred results
and you'd get back fewer,

959
00:43:52,420 --> 00:43:55,560
even though there were more. They
had a problem, so they go

960
00:43:55,560 --> 00:44:00,100
back to the index and request more.
So yes, it's a workaround,

961
00:44:00,100 --> 00:44:03,460
I'd say, more than a solution, but
it's pretty effective for a

962
00:44:03,460 --> 00:44:04,460
lot of these cases.

963
00:44:04,660 --> 00:44:07,240
Simon: I think the problem there
is that I would be very skeptical

964
00:44:07,240 --> 00:44:09,280
of the recall in a case like that,
right?

965
00:44:09,280 --> 00:44:12,860
Because, okay, so you just like
you got 100, but you probably

966
00:44:12,860 --> 00:44:15,920
should have looked at a lot more
data to know what the true top

967
00:44:15,920 --> 00:44:16,700
K was.

968
00:44:16,960 --> 00:44:20,440
So, you know, that solution was
not acceptable to us.

969
00:44:20,440 --> 00:44:23,660
Like we had to go much, much further
to ensure high recall.

970
00:44:23,680 --> 00:44:26,280
And I think the problem with a
lot of these solutions is that

971
00:44:26,280 --> 00:44:28,280
people are not evaluating their
recall because it's actually

972
00:44:28,280 --> 00:44:29,180
very difficult to do.

973
00:44:29,180 --> 00:44:30,900
And it's very computationally expensive.

974
00:44:30,900 --> 00:44:34,640
You're not gonna have your production
Postgres run an exhaustive

975
00:44:34,640 --> 00:44:36,100
search on millions of vectors.

976
00:44:36,180 --> 00:44:38,860
It's too dangerous to do that while
you're doing your transactional

977
00:44:39,020 --> 00:44:39,520
workloads.

978
00:44:39,720 --> 00:44:42,040
But again, if you're hitting these
kinds of problems, then it

979
00:44:42,040 --> 00:44:44,800
may be time to consider maybe for
a subset of your workloads,

980
00:44:44,800 --> 00:44:47,960
whether something more specialized
makes sense, whether the trade-offs

981
00:44:47,960 --> 00:44:48,900
make sense to you.

982
00:44:48,900 --> 00:44:51,560
But I think with the pre-filtering
and post-filtering, it can

983
00:44:51,560 --> 00:44:53,940
be, it's very challenging to create
a query planner that can

984
00:44:53,940 --> 00:44:54,640
do that well.

985
00:44:54,640 --> 00:44:56,200
So we support all the queries,
right?

986
00:44:56,200 --> 00:44:58,080
You can do, and you can do range
queries.

987
00:44:58,080 --> 00:44:59,520
You can do exact queries.

988
00:44:59,540 --> 00:45:00,760
You can operate with arrays.

989
00:45:00,760 --> 00:45:04,540
You can do all types of queries,
set intersections that people

990
00:45:04,540 --> 00:45:05,660
use for permissions.

991
00:45:06,820 --> 00:45:09,720
We can do full-text search queries,
we can do GROUP BYs, we

992
00:45:09,720 --> 00:45:11,060
can do simple aggregates.

993
00:45:11,240 --> 00:45:13,940
So we can do more and more queries
and we're constantly expanding

994
00:45:13,940 --> 00:45:17,140
that with what our users demand
from the system that we've built.

995
00:45:17,220 --> 00:45:18,920
Nikolay: 1 more annoying question.

996
00:45:18,960 --> 00:45:20,280
I know it's not open source.

997
00:45:20,280 --> 00:45:22,060
There is no free version at all.

998
00:45:22,740 --> 00:45:26,480
I'm pretty sure it was a decision
not to go with an open-core

999
00:45:26,480 --> 00:45:29,400
or open-source model at all. Can
you explain why?

1000
00:45:29,640 --> 00:45:32,820
Simon: Yeah, I think there's
never been any particular

1001
00:45:32,880 --> 00:45:34,860
resistance to open source.

1002
00:45:34,860 --> 00:45:36,740
I mean, I love open source.

1003
00:45:36,820 --> 00:45:40,200
The reason we're not open source
is because open source is, if

1004
00:45:40,200 --> 00:45:42,280
you want to do it well, it's a
lot of effort.

1005
00:45:42,780 --> 00:45:45,060
And it's also a lot of effort to
build a company.

1006
00:45:45,060 --> 00:45:47,380
It's a lot of effort to find product
market fit.

1007
00:45:47,680 --> 00:45:50,720
And we decided to pour all of our
energy into that.

1008
00:45:51,680 --> 00:45:53,980
And it's a similar argument for the
minimum.

1009
00:45:53,980 --> 00:45:57,840
It's really the only thing
that a startup has over, you

1010
00:45:57,840 --> 00:45:59,980
know, the big incumbents is focus.

1011
00:46:00,140 --> 00:46:02,900
And so our focus has been on the
customers that are willing to

1012
00:46:02,900 --> 00:46:03,400
pay.

1013
00:46:03,820 --> 00:46:05,660
And 1 day I would love to give
a free tier.

1014
00:46:05,660 --> 00:46:08,300
I would love for people's blogs
to run on turbopuffer.

1015
00:46:08,360 --> 00:46:12,280
But for now, I'm afraid that we
have to prioritize the customers

1016
00:46:12,740 --> 00:46:15,040
that are willing to put some money
behind it.

1017
00:46:15,040 --> 00:46:18,040
It's not because we have provisioned
hardware or anything like

1018
00:46:18,040 --> 00:46:18,420
that.

1019
00:46:18,420 --> 00:46:21,760
It's really just that we need to
know that you're willing to

1020
00:46:21,760 --> 00:46:24,520
put some money behind it so we
can give you amazing support,

1021
00:46:24,520 --> 00:46:24,720
right?

1022
00:46:24,720 --> 00:46:27,260
A lot of the times there'll be
engineers working on the database

1023
00:46:27,260 --> 00:46:30,040
who are supporting your tickets
and we can't do that with a free

1024
00:46:30,040 --> 00:46:30,540
tier.

1025
00:46:32,200 --> 00:46:32,500
Nikolay: Yeah.

1026
00:46:32,500 --> 00:46:33,280
Makes sense.

1027
00:46:34,020 --> 00:46:34,280
Yeah.

1028
00:46:34,280 --> 00:46:37,700
And Michael and I have different
points of view on this area.

1029
00:46:37,900 --> 00:46:39,220
Michael: I think that was a good
answer.

1030
00:46:39,240 --> 00:46:40,620
I'm on your side, Simon.

1031
00:46:41,380 --> 00:46:44,940
I had a question on the full text
search and semantic search.

1032
00:46:45,040 --> 00:46:49,200
We had a discussion a while back
about hybrid, I think it's generally

1033
00:46:49,200 --> 00:46:50,280
called hybrid search.

1034
00:46:50,280 --> 00:46:52,960
Are you seeing much of that where
people are wanting to kind

1035
00:46:52,960 --> 00:46:56,600
of mix the results and order between
them?

1036
00:46:57,180 --> 00:46:58,240
Simon: Yeah, we do.

1037
00:47:00,280 --> 00:47:03,220
And what we see is that the embedding
models are pretty phenomenal

1038
00:47:03,520 --> 00:47:07,760
at finding good results for subjects
that they know about, right?

1039
00:47:07,800 --> 00:47:09,300
And terms that they know about.

1040
00:47:09,780 --> 00:47:13,020
You know, you run a project
called pgMustard, right?

1041
00:47:13,040 --> 00:47:15,060
And maybe the embedding model doesn't
know.

1042
00:47:15,060 --> 00:47:17,160
I think it's popular enough that
it will know what it is, but

1043
00:47:17,160 --> 00:47:18,400
let's say it didn't know.

1044
00:47:18,520 --> 00:47:21,420
And then it puts it in the cluster
close to ketchup.

1045
00:47:21,820 --> 00:47:23,440
And the results are just horrible.

1046
00:47:23,720 --> 00:47:26,160
And the text, the full text search
tends to be really good at

1047
00:47:26,160 --> 00:47:26,480
this.

1048
00:47:26,480 --> 00:47:29,140
Like recall for things that the
embedding model can't know about.

1049
00:47:29,140 --> 00:47:32,840
If you're searching for an LG,
you know, these like TV SKUs,

1050
00:47:32,840 --> 00:47:34,800
it's like, you know, it's indecipherable.

1051
00:47:35,860 --> 00:47:38,040
Actually, the embedding models
are quite good at these

1052
00:47:38,040 --> 00:47:40,020
because they appear enough on the
web.

1053
00:47:40,160 --> 00:47:42,820
But imagine that it wasn't these
kinds of searches it's good

1054
00:47:42,820 --> 00:47:43,200
for.

1055
00:47:43,200 --> 00:47:45,040
The other thing is
search-as-you-type.

1056
00:47:45,060 --> 00:47:49,160
So if I'm searching for "si", the
embedding model might find that,

1057
00:47:49,160 --> 00:47:52,380
well, he's searching for something
agreeable in Spanish, right?

1058
00:47:53,200 --> 00:47:56,180
But really what I wanted was the
document that started with Simon.

1059
00:47:56,180 --> 00:47:58,620
So sometimes you just need to complement
these two.

1060
00:47:58,620 --> 00:48:01,480
And I think that that's why we've
doubled down on the full-text

1061
00:48:01,480 --> 00:48:04,120
search implementation in turbopuffer
to do this hybrid.

1062
00:48:04,120 --> 00:48:06,920
But people can get a lot of mileage
out of embedding models alone.

1063
00:48:07,380 --> 00:48:07,880
Michael: Wonderful.

1064
00:48:08,520 --> 00:48:10,620
Nikolay: Yeah, thank you so much.

1065
00:48:11,000 --> 00:48:12,540
Yeah, it was super interesting.

1066
00:48:12,780 --> 00:48:14,420
Good luck with your system.

1067
00:48:15,360 --> 00:48:19,340
And yeah, I think it's an interesting
idea, I guess, for Postgres

1068
00:48:19,340 --> 00:48:24,340
users who need, as you said, to
store more and have all the characteristics

1069
00:48:25,080 --> 00:48:26,260
turbopuffer has.

1070
00:48:26,920 --> 00:48:32,940
Maybe it's worth considering, right,
to keep all OLTP workloads

1071
00:48:33,040 --> 00:48:38,040
and data in regular Postgres while
moving vectors to turbopuffer,

1072
00:48:38,080 --> 00:48:38,580
right?

1073
00:48:39,000 --> 00:48:43,580
Simon: I mean, it's just similar
to, people have taken workloads

1074
00:48:43,580 --> 00:48:44,400
out of Postgres.

1075
00:48:44,540 --> 00:48:46,160
It's like updating a posting list.

1076
00:48:46,160 --> 00:48:49,220
It's a very expensive operation
in a transactional store.

1077
00:48:49,300 --> 00:48:52,380
And it's similar with an index: updating
it with the same kinds

1078
00:48:52,380 --> 00:48:55,060
of ACID semantics that Postgres
upholds is very expensive.

1079
00:48:55,160 --> 00:48:58,020
And I've ripped full-text search
out of Postgres many times because

1080
00:48:58,020 --> 00:49:00,040
it's very, very expensive to do.

1081
00:49:00,060 --> 00:49:03,040
So we do that because we don't
want to shard Postgres, because

1082
00:49:03,040 --> 00:49:04,160
it's like a lot of work.

1083
00:49:04,160 --> 00:49:06,960
And so we start by moving some
of these workloads out first and

1084
00:49:06,960 --> 00:49:08,900
search is 1 of the early ones to
go.

1085
00:49:09,960 --> 00:49:14,640
Nikolay: So your point is that
it probably postpones the moment

1086
00:49:14,640 --> 00:49:15,760
when you need to shard.

1087
00:49:16,080 --> 00:49:20,040
Simon: Yeah, same reason as with
memcached and Redis, right?

1088
00:49:20,660 --> 00:49:23,100
You separate out these workloads
as soon as possible to avoid

1089
00:49:23,100 --> 00:49:23,600
sharding.

1090
00:49:24,920 --> 00:49:25,640
Kick the can.

1091
00:49:25,640 --> 00:49:26,660
Nikolay: Interesting idea.

1092
00:49:28,080 --> 00:49:30,060
Okay, again, thank you so much
for coming.

1093
00:49:30,060 --> 00:49:31,560
It was a super interesting discussion.

1094
00:49:31,560 --> 00:49:32,620
I enjoyed it a lot.

1095
00:49:32,700 --> 00:49:33,420
Simon: Thank you so much.

1096
00:49:33,420 --> 00:49:34,480
Michael: Yeah, really nice to meet
you.

1097
00:49:34,480 --> 00:49:35,460
Thanks for joining us.