1
00:00:00,140 --> 00:00:02,360
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:02,620 --> 00:00:08,160
As usual, I'm Nik, PostgresAI,
and as usual, my co-host is

3
00:00:08,160 --> 00:00:09,060
Michael, pgMustard.

4
00:00:09,440 --> 00:00:11,060
Hi, Michael, how are you?

5
00:00:11,580 --> 00:00:13,480
Michael: Hello, Nik, I'm good,
how are you?

6
00:00:14,200 --> 00:00:15,000
Nikolay: Very good.

7
00:00:15,460 --> 00:00:22,160
So, the topic you chose is to talk
about beyond Postgres, when

8
00:00:22,160 --> 00:00:25,220
we should avoid using Postgres,
right?

9
00:00:26,460 --> 00:00:29,880
Michael: Yeah, you put a shout
out on a few social networks asking

10
00:00:29,880 --> 00:00:34,020
people what kind of questions they'd
like us to answer and we

11
00:00:34,020 --> 00:00:37,860
had lots of good suggestions as
we've had for many years now

12
00:00:38,080 --> 00:00:40,860
and 1 of them was particularly
good and I thought was worth a

13
00:00:40,860 --> 00:00:44,480
whole episode which was yeah when
not to use Postgres I think

14
00:00:44,480 --> 00:00:50,240
there's a growing trend of or a
few popular blog posts of people

15
00:00:50,860 --> 00:00:54,480
saying you should consider Postgres
for most workloads these

16
00:00:54,480 --> 00:00:57,740
days and I think it is still an
interesting topic to discuss

17
00:00:57,820 --> 00:01:01,000
are there cases where it doesn't
make sense if so what are those

18
00:01:01,000 --> 00:01:04,340
and when does it
make sense not to use Postgres?

19
00:01:04,760 --> 00:01:07,280
I was interested in your take on
some of these as well.

20
00:01:08,300 --> 00:01:11,820
Nikolay: Yeah, well, classic example
is analytics.

21
00:01:13,820 --> 00:01:17,220
Michael: Do you want to list a
few and then discuss them in a

22
00:01:17,220 --> 00:01:18,040
bit more detail.

23
00:01:18,540 --> 00:01:21,360
Nikolay: Yeah, let's create a frame
of this episode.

24
00:01:22,220 --> 00:01:25,060
So analytics, embedded databases.

25
00:01:26,740 --> 00:01:30,340
Michael: Yeah, so like, and I think
analytics, yeah, so analytics

26
00:01:30,340 --> 00:01:37,020
embedded, I think storing large
objects, there are some cases

27
00:01:37,020 --> 00:01:37,160
where

28
00:01:37,160 --> 00:01:37,640
Nikolay: it makes sense.

29
00:01:37,640 --> 00:01:38,460
Pictures in a database.

30
00:01:40,080 --> 00:01:42,980
Michael: Exactly, especially larger
ones like videos, like very

31
00:01:42,980 --> 00:01:44,240
large objects, 100%.

32
00:01:45,060 --> 00:01:49,600
Nikolay: And just let's agree in
every area we can discuss pros

33
00:01:49,600 --> 00:01:53,080
and cons of using Postgres because
some people definitely have

34
00:01:53,080 --> 00:01:56,080
an opinion that it's not an excuse
to avoid Postgres.

35
00:01:56,380 --> 00:02:02,880
Let me add then, with this in mind,
let me add then a topic like

36
00:02:02,940 --> 00:02:05,020
ML datasets and pipelines.

37
00:02:06,300 --> 00:02:06,800
Right?

38
00:02:07,540 --> 00:02:08,040
Yeah.

39
00:02:08,100 --> 00:02:10,560
Machine learning and big data.

40
00:02:11,980 --> 00:02:14,400
Michael: I think anything where
there's specialized databases

41
00:02:14,680 --> 00:02:18,640
like search, vector databases.

42
00:02:19,180 --> 00:02:20,200
Nikolay: Vectors, exactly.

43
00:02:20,340 --> 00:02:21,860
Let's talk about vectors separately.

44
00:02:22,060 --> 00:02:23,120
It's worth it.

45
00:02:23,460 --> 00:02:26,040
Michael: And then 1 more, which
I think, well, actually, I had

46
00:02:26,040 --> 00:02:30,720
2 more on my list. 1 was potentially
controversial: I wondered

47
00:02:30,720 --> 00:02:39,640
if there's a case: if you're at
extreme OLTP, write-heavy, very

48
00:02:39,640 --> 00:02:43,940
very write heavy and you, like,
let's say you've got institutional

49
00:02:45,660 --> 00:02:49,840
experience with Vitess I think
sticking with that for the moment

50
00:02:49,840 --> 00:02:52,740
makes a lot of sense. So, for when
not to use Postgres, I

51
00:02:52,740 --> 00:02:56,120
wondered if like a new project
came along starting with Vitess

52
00:02:56,120 --> 00:03:01,120
while we get these sharded Postgres
like OLTP sharding Postgres

53
00:03:01,740 --> 00:03:05,220
solutions up and running I think
at the moment maybe it still

54
00:03:05,220 --> 00:03:07,260
makes sense to not use Postgres
there.

55
00:03:07,260 --> 00:03:09,940
Nikolay: Let's add then time series.

56
00:03:10,680 --> 00:03:11,180
Yeah.

57
00:03:11,520 --> 00:03:17,220
Let's discuss this area, time series,
and also data that can

58
00:03:17,220 --> 00:03:19,400
be compressed really well.

59
00:03:21,160 --> 00:03:23,040
This topic is close to analytics.

60
00:03:23,560 --> 00:03:24,060
True,

61
00:03:24,400 --> 00:03:24,900
Michael: yeah.

62
00:03:25,080 --> 00:03:28,680
And then I have 1 more, but I guess
it's kind of what I said

63
00:03:28,680 --> 00:03:29,440
just now.

64
00:03:29,540 --> 00:03:33,860
I think If you or your organization
have tons of experience with

65
00:03:33,860 --> 00:03:37,940
another database, the argument
for using Postgres for your next

66
00:03:37,940 --> 00:03:38,940
project is weaker.

67
00:03:38,940 --> 00:03:41,140
I'm not sure it's like when not
to use Postgres.

68
00:03:41,140 --> 00:03:43,364
I think I could think of lots of
counter examples.

69
00:03:43,364 --> 00:03:43,652
Well, this

70
00:03:43,652 --> 00:03:46,040
Nikolay: is an orthogonal discussion.

71
00:03:46,120 --> 00:03:50,280
You can say if we already have
a huge contract with Oracle, we

72
00:03:50,540 --> 00:03:55,940
already signed for next 5 years,
it's not wise to start using

73
00:03:55,940 --> 00:03:59,740
Postgres and those money will be
spent for nothing, right?

74
00:03:59,860 --> 00:04:03,160
There are many such reasons, right?

75
00:04:03,600 --> 00:04:05,460
Michael: Should we stick to technical
ones then?

76
00:04:05,460 --> 00:04:09,260
Nikolay: Yeah, like area, like
types of usage, area.

77
00:04:10,120 --> 00:04:12,060
Queue-like workloads, I would add.

78
00:04:12,700 --> 00:04:13,480
Michael: Yeah, interesting.

79
00:04:14,440 --> 00:04:15,340
Nikolay: Yeah, yeah, yeah.

80
00:04:16,060 --> 00:04:18,360
Last 1 is like kind of Kafka territory.

81
00:04:18,940 --> 00:04:20,560
Or there are others of course.

82
00:04:21,420 --> 00:04:21,920
Michael: Yeah.

83
00:04:23,300 --> 00:04:25,240
All right, let's, should we start
with analytics then?

84
00:04:25,240 --> 00:04:28,180
I feel like that's, I know we did
a whole episode, kind of a

85
00:04:28,180 --> 00:04:29,440
whole episode on this.

86
00:04:30,320 --> 00:04:34,660
Nikolay: Yeah, so, so row store
is not good for analytics and

87
00:04:34,660 --> 00:04:39,020
SELECT COUNT will always be slow,
you need denormalization or

88
00:04:39,020 --> 00:04:45,480
estimates, estimates will be slow,
well not slow, too rough and

89
00:04:45,480 --> 00:04:46,660
too wrong sometimes.
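[Editor's note: one common workaround for cases where a rough number is acceptable is to read the planner's estimate instead of counting. A minimal sketch; the table name is a placeholder:]

```sql
-- Fast approximate row count from planner statistics; accuracy
-- depends on how recently autovacuum/ANALYZE refreshed reltuples.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'my_table';  -- placeholder table name
```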

90
00:04:47,980 --> 00:04:49,320
Yeah, it sucks.

91
00:04:50,740 --> 00:04:53,460
Michael: I think there's a scale,
I think we've talked about

92
00:04:53,460 --> 00:04:56,940
this before, but there's a scale
up to which you'll be fine on

93
00:04:56,940 --> 00:04:57,440
Postgres.

94
00:04:58,080 --> 00:05:00,060
You could achieve better performance
elsewhere.

95
00:05:00,060 --> 00:05:04,000
But if you have hybrid, a lot of
systems now are hybrid, right?

96
00:05:04,000 --> 00:05:07,640
Like they have to be transactional,
but they have to provide

97
00:05:07,640 --> 00:05:10,840
some analytics dashboard for users
or something like it, but

98
00:05:10,840 --> 00:05:12,540
they still want real-time data.

99
00:05:12,540 --> 00:05:16,500
They still want transactional processing,
maybe 90 to 95, maybe

100
00:05:16,500 --> 00:05:19,640
even 99% of the workload is transactional,
and only like there's

101
00:05:19,640 --> 00:05:21,860
a few analytical queries from time to time.

102
00:05:22,360 --> 00:05:24,980
I still think those make a ton of sense on Postgres.

103
00:05:25,940 --> 00:05:27,600
Nikolay: Counts everyone needs, right?

104
00:05:27,880 --> 00:05:32,980
Pure OLTP applications, they need to show counts or understand

105
00:05:33,000 --> 00:05:35,580
pagination and like still have counts.

106
00:05:36,580 --> 00:05:40,200
If you think about social media, you will need to show how many

107
00:05:40,200 --> 00:05:42,420
likes, comments, and so on, reposts.

108
00:05:43,940 --> 00:05:51,500
So, and like to implement it purely in Postgres, like a good

109
00:05:51,500 --> 00:05:53,860
way of denormalization would be needed.

110
00:05:54,900 --> 00:06:00,480
And also, there are so many rakes lying around this. You can

111
00:06:00,480 --> 00:06:03,280
easily step on them and hit a hotspot.

112
00:06:04,180 --> 00:06:08,040
Like, you know, this classic hotspot when an accounting system,

113
00:06:08,160 --> 00:06:12,520
like, is tracking balances, and let's say we have a whole balance

114
00:06:12,520 --> 00:06:15,600
and all transactions update a single row.

115
00:06:16,060 --> 00:06:16,380
Yeah.

116
00:06:16,380 --> 00:06:19,900
So this is where it can be an issue.
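[Editor's note: one common mitigation for this kind of single-row hotspot, sketched here with made-up table and column names, is to split the hot counter across N slots so concurrent writers rarely contend on the same row:]

```sql
-- Hypothetical schema: spread one logical counter over 16 slots.
CREATE TABLE like_counts (
  post_id bigint NOT NULL,
  slot    int    NOT NULL,          -- 0..15
  cnt     bigint NOT NULL DEFAULT 0,
  PRIMARY KEY (post_id, slot)
);

-- Each increment hits a random slot instead of one global row:
INSERT INTO like_counts (post_id, slot, cnt)
VALUES (42, floor(random() * 16)::int, 1)
ON CONFLICT (post_id, slot)
DO UPDATE SET cnt = like_counts.cnt + 1;

-- Reads sum the slots to get the total:
SELECT sum(cnt) FROM like_counts WHERE post_id = 42;
```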

117
00:06:19,920 --> 00:06:23,660
And yeah, but at the same time, there are many attempts to do

118
00:06:23,660 --> 00:06:27,380
better and some attempts led to companies being acquired, right?

119
00:06:29,020 --> 00:06:29,940
I mean, Crunchy.

120
00:06:31,260 --> 00:06:36,720
There are many new ways of aiming to solve this problem, better

121
00:06:36,780 --> 00:06:37,900
analytics for Postgres.

122
00:06:39,280 --> 00:06:43,340
I see 2 big trends right now.

123
00:06:43,340 --> 00:06:50,280
1 trend is how Sid, founder of GitLab, recently had a post saying

124
00:06:50,280 --> 00:06:57,320
that Postgres and ClickHouse is a great couple of database systems.

125
00:06:57,440 --> 00:07:00,220
I don't remember the exact title of that post, but the idea is

126
00:07:00,220 --> 00:07:01,160
that it's great.

127
00:07:01,560 --> 00:07:03,480
They go together very well.

128
00:07:03,900 --> 00:07:09,040
We also had Sai, founder of PeerDB, which was acquired by ClickHouse.

129
00:07:09,840 --> 00:07:15,440
And I met with him last week, and we talked about, Again, the

130
00:07:15,440 --> 00:07:19,740
same idea that ClickHouse and Postgres are great together.

131
00:07:21,760 --> 00:07:22,940
This is 1 direction.

132
00:07:23,160 --> 00:07:27,240
Another direction is saying ClickHouse is very different.

133
00:07:28,860 --> 00:07:30,180
And not even maintaining.

134
00:07:30,180 --> 00:07:33,900
Maintaining is absolutely different, but also using it requires

135
00:07:33,900 --> 00:07:38,400
a different mindset and skills so it's better to choose, for

136
00:07:38,400 --> 00:07:39,360
example, DuckDB.

137
00:07:42,720 --> 00:07:45,700
Do everything inside Postgres but in a smarter way.

138
00:07:45,700 --> 00:07:51,000
This is what multiple companies worked on recently and 1 of them

139
00:07:51,000 --> 00:07:53,440
Crunchy was acquired by Snowflake.

140
00:07:54,560 --> 00:07:57,360
Michael: And we had it, we did an episode on pg_duckdb as well,

141
00:07:57,360 --> 00:07:59,340
so like a slightly different approach on that.

142
00:07:59,340 --> 00:08:02,720
But yeah, the Crunchy one's interesting because all the queries

143
00:08:02,720 --> 00:08:07,280
go through Postgres, but a lot of the data's stored in like Iceberg

144
00:08:07,540 --> 00:08:09,300
or like some other file format.

145
00:08:10,080 --> 00:08:10,880
Yeah, exactly.

146
00:08:10,920 --> 00:08:12,510
It's on the columnar side.

147
00:08:12,510 --> 00:08:12,980
Yeah.

148
00:08:12,980 --> 00:08:15,060
So yeah, this definitely feels like 1 of those ones.

149
00:08:15,060 --> 00:08:18,560
There's also a third, by the way, there's a third option, which

150
00:08:18,560 --> 00:08:22,860
is these massively parallel databases.

151
00:08:24,140 --> 00:08:28,000
Like, well, you spoke to Sai last week, I was at a

152
00:08:28,000 --> 00:08:34,040
UK event this week, and there was a presentation from the successors

153
00:08:34,080 --> 00:08:37,360
of the Greenplum project, the kind of the open source successors,

154
00:08:37,420 --> 00:08:38,460
which is called Cloudberry.

155
00:08:38,560 --> 00:08:40,140
It looks like really interesting work.

156
00:08:40,140 --> 00:08:45,640
But that's another way of doing some analytics from within Postgres,

157
00:08:45,640 --> 00:08:46,320
kind of.

158
00:08:46,500 --> 00:08:46,800
Nikolay: Yeah.

159
00:08:46,800 --> 00:08:50,660
And from previous experience, from the past I remember cases

160
00:08:50,660 --> 00:08:54,400
when Postgres and Greenplum were combined in 1 project, and it

161
00:08:54,400 --> 00:08:57,660
was great, and it was some bank, quite a big 1 even.

162
00:08:58,680 --> 00:09:02,940
And yeah, but somehow I stopped looking at Greenplum for quite

163
00:09:02,940 --> 00:09:04,460
long already, I don't know.

164
00:09:05,100 --> 00:09:06,880
There are also, of course, commercial databases.

165
00:09:06,940 --> 00:09:10,160
I remember Vertica, there is Snowflake, super popular, it's a

166
00:09:10,160 --> 00:09:11,700
major player in this area.

167
00:09:11,980 --> 00:09:14,620
By the way, I would distinguish 2 areas of analytics.

168
00:09:15,180 --> 00:09:18,880
1 is internal needs.

169
00:09:19,400 --> 00:09:22,580
For a company, we need to understand how business is doing a

170
00:09:22,580 --> 00:09:23,400
lot of stuff.

171
00:09:23,400 --> 00:09:24,640
We need a lot of reports.

172
00:09:26,660 --> 00:09:31,320
And another need is we need to show our users some counts.

173
00:09:31,720 --> 00:09:33,340
Like I said, social media.

174
00:09:33,340 --> 00:09:35,240
So 2 big areas, I think, also.

175
00:09:35,840 --> 00:09:36,840
Michael: Yeah, good point.

176
00:09:37,200 --> 00:09:39,180
Nikolay: In the first case, users are internal.

177
00:09:39,560 --> 00:09:41,600
In the second case, users are external.

178
00:09:41,880 --> 00:09:46,160
I'm pretty sure there are a lot of mixed cases, additional cases.

179
00:09:46,220 --> 00:09:48,560
But I personally like these 2 directions.

180
00:09:48,560 --> 00:09:53,500
Of course there are others; there is also Redshift on

181
00:09:53,500 --> 00:09:54,000
AWS.

182
00:09:55,520 --> 00:09:57,940
Michael: Also originally based on Postgres, yeah.

183
00:09:58,280 --> 00:10:01,320
Nikolay: So there are many options here.

184
00:10:01,980 --> 00:10:04,980
Michael: So yeah, is the short version at sufficient scale?

185
00:10:05,580 --> 00:10:08,680
It probably doesn't make sense to be using Postgres at this point

186
00:10:08,680 --> 00:10:11,820
for analytics, but like that that level is quite high.

187
00:10:13,580 --> 00:10:18,220
Nikolay: Yeah, but also I see cases when companies go to Snowflake

188
00:10:18,240 --> 00:10:20,400
then try to escape it.

189
00:10:21,220 --> 00:10:22,120
Michael: Come back, yeah.

190
00:10:22,120 --> 00:10:22,620
Nikolay: Okay.

191
00:10:23,160 --> 00:10:28,180
So going to Snowflake, it's like going to Oracle, in my opinion.

192
00:10:28,680 --> 00:10:30,300
Michael: You mean like in terms of financial?

193
00:10:30,960 --> 00:10:32,360
Nikolay: In terms of vendor lock-in

194
00:10:32,400 --> 00:10:33,780
Michael: and so on.

195
00:10:34,140 --> 00:10:37,400
Nikolay: Because it's just purely commercial offering.

196
00:10:39,000 --> 00:10:44,160
There are of course many tempting things there, features, performance,

197
00:10:46,160 --> 00:10:46,560
integrations.

198
00:10:46,560 --> 00:10:48,240
Michael: Nice product to use as well, yeah.

199
00:10:48,240 --> 00:10:49,000
Developer friendly.

200
00:10:49,000 --> 00:10:51,100
Nikolay: Yeah, well, users love it.

201
00:10:51,180 --> 00:10:51,800
I agree.

202
00:10:52,300 --> 00:10:58,000
But if we try to remain in more
open source and vendor lock-in

203
00:10:58,000 --> 00:11:01,520
less, then it's like it should
be excluded.

204
00:11:02,080 --> 00:11:02,860
Even ClickHouse.

205
00:11:04,020 --> 00:11:05,720
ClickHouse is open source itself.

206
00:11:08,000 --> 00:11:08,400
Michael: Yeah.

207
00:11:08,400 --> 00:11:08,900
Right.

208
00:11:09,520 --> 00:11:11,980
Well, you mentioned time series
being quite close to this.

209
00:11:11,980 --> 00:11:13,700
I feel like we should jump to that
next.

210
00:11:13,700 --> 00:11:14,560
What do you reckon?

211
00:11:14,960 --> 00:11:19,460
Nikolay: Well, TimescaleDB is
great, but it's also kind of a lock-in

212
00:11:19,600 --> 00:11:21,000
because it's not open.

213
00:11:22,060 --> 00:11:22,540
Yeah.

214
00:11:22,540 --> 00:11:22,960
Michael: Yeah.

215
00:11:22,960 --> 00:11:26,500
So because of their license, other
cloud providers can't provide

216
00:11:26,500 --> 00:11:28,220
Timescale as a service easily.

217
00:11:28,820 --> 00:11:31,860
Or at least not the version with
lots of nice features.

218
00:11:32,600 --> 00:11:35,560
Nikolay: Yeah, in Timescale Cloud,
I had a recent case where

219
00:11:35,560 --> 00:11:38,600
we saw limitations again, very
badly.

220
00:11:39,140 --> 00:11:41,020
Like create database doesn't work.

221
00:11:41,680 --> 00:11:45,520
And moreover, lack of observability
tooling.

222
00:11:45,820 --> 00:11:50,200
Like again, like I keep promoting
on this podcast, if guys who

223
00:11:50,200 --> 00:11:55,460
build platforms listen to us, you
must add pg_wait_sampling.

224
00:11:56,760 --> 00:11:58,500
Unless you are RDS, okay.

225
00:11:58,660 --> 00:12:01,200
But even in case of RDS, we talked
about this.

226
00:12:02,160 --> 00:12:06,600
It's great to have it in SQL context
and be able to combine wait

227
00:12:06,600 --> 00:12:10,700
event analysis with regular pg_stat_statements
analysis.
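[Editor's note: the combined analysis described here might look roughly like this, assuming the pg_wait_sampling extension is installed and its profile view exposes queryid, as recent versions do:]

```sql
-- Top wait events per statement: join sampled wait-event counts
-- from pg_wait_sampling with pg_stat_statements by queryid.
SELECT left(s.query, 60) AS query,
       p.event_type,
       p.event,
       sum(p.count)      AS samples
FROM pg_wait_sampling_profile p
JOIN pg_stat_statements s USING (queryid)
GROUP BY 1, 2, 3
ORDER BY samples DESC
LIMIT 10;
```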

228
00:12:10,760 --> 00:12:14,840
And pg_stat_kcache, additional very
good observability point.

229
00:12:15,040 --> 00:12:20,080
Because I had a case when guys
just compared everything, saw

230
00:12:20,080 --> 00:12:24,020
worse performance, worked closely
with Timescale.

231
00:12:24,340 --> 00:12:27,540
But in case of RDS, you see Performance
Insights, understand

232
00:12:29,020 --> 00:12:30,340
where we wait, right?

233
00:12:31,000 --> 00:12:37,460
In case of Timescale, only a rare
collection of samples from

234
00:12:37,460 --> 00:12:39,560
pg_stat_activity is possible.

235
00:12:39,820 --> 00:12:44,180
It's sometimes good enough, but
it's quite a rough tool to analyze

236
00:12:44,180 --> 00:12:44,680
performance.

237
00:12:46,020 --> 00:12:48,540
So yes, such things are lacking.

238
00:12:50,660 --> 00:12:54,520
And unfortunately, more and more
I come to the conclusion that

239
00:12:55,120 --> 00:13:00,060
when I recommend TimescaleDB to
customers, it contradicts with

240
00:13:00,060 --> 00:13:02,620
the idea they want to stay on managed
service.

241
00:13:03,480 --> 00:13:03,980
Michael: Yeah.

242
00:13:04,400 --> 00:13:04,900
Yeah.

243
00:13:05,500 --> 00:13:07,320
Because they're down to a single
choice.

244
00:13:07,380 --> 00:13:07,716
Yeah.

245
00:13:07,716 --> 00:13:08,052
Nikolay: Yeah.

246
00:13:08,052 --> 00:13:14,020
That being said, the Timescale
Cloud even offered me some bounty

247
00:13:14,020 --> 00:13:16,160
if I convince someone to go to
them.

248
00:13:16,160 --> 00:13:20,080
And this is great, like I love
loyalty programs, but I need to

249
00:13:20,080 --> 00:13:23,980
be fair, some pieces, big pieces
are missing, unfortunately.

250
00:13:24,840 --> 00:13:25,340
Michael: Yeah.

251
00:13:26,240 --> 00:13:31,040
And again, Postgres, even without
Timescale, can be used for

252
00:13:31,040 --> 00:13:33,560
good time series workloads up to
a certain point.

253
00:13:33,560 --> 00:13:38,320
We're just talking about at very
high scale, where all the features

254
00:13:38,320 --> 00:13:43,220
like compression, like continuous
aggregates, like automatic

255
00:13:43,240 --> 00:13:43,740
partition.

256
00:13:43,940 --> 00:13:44,760
Nikolay: Straight to the point.

257
00:13:44,760 --> 00:13:49,660
By the way,
for time series, ClickHouse is

258
00:13:49,660 --> 00:13:53,420
also still a good option and there
is also VictoriaMetrics,

259
00:13:53,420 --> 00:13:53,920
right?

260
00:13:54,340 --> 00:13:54,440
Michael: Yeah.

261
00:13:54,440 --> 00:13:58,680
Well, and I learned just yesterday
about even Cloudberry have

262
00:13:58,680 --> 00:14:00,460
incrementally updated materialized
views.

263
00:14:00,460 --> 00:14:01,840
I need to look into it.

264
00:14:01,840 --> 00:14:03,260
But that's quite cool.

265
00:14:03,260 --> 00:14:06,660
And if you're like, maybe that
would be

266
00:14:06,660 --> 00:14:08,240
Nikolay: What do you think about this?

267
00:14:09,240 --> 00:14:12,340
Wouldn't it be great to have in
Postgres something like update

268
00:14:12,340 --> 00:14:15,960
materialized view where you just
define the scope.

269
00:14:16,980 --> 00:14:18,140
And also concurrently.

270
00:14:19,340 --> 00:14:20,740
Michael: We should do a whole episode.

271
00:14:20,740 --> 00:14:23,320
I think there are several projects
that have started to look

272
00:14:23,320 --> 00:14:25,240
into incrementally updating materialized
views.

273
00:14:25,240 --> 00:14:27,720
And I think they're more complicated
than I ever realized.

274
00:14:27,720 --> 00:14:30,100
Like, they're just, it's like 1
of those topics, the more you

275
00:14:30,100 --> 00:14:32,540
learn about it, the harder you
realize it is.

276
00:14:32,680 --> 00:14:36,680
Nikolay: Right now I'm in a position
where most, not everyone,

277
00:14:36,680 --> 00:14:40,360
but most of our customers are on
managed Postgres, so it's really

278
00:14:40,360 --> 00:14:43,900
hard for me to look at extensions
which are not available on

279
00:14:43,900 --> 00:14:46,160
RDS, CloudSQL and others.

280
00:14:46,720 --> 00:14:48,940
Michael: I understand, I'm just
thinking, like, I think it's

281
00:14:48,940 --> 00:14:51,820
worth learning from the extensions
as to what would be needed

282
00:14:51,820 --> 00:14:54,880
in core, like, how did, what did
they try, what was difficult

283
00:14:54,880 --> 00:14:57,740
about that, what, and it's not
just extensions, right, there

284
00:14:57,740 --> 00:15:00,820
are, there are whole companies
that have been built on the premise

285
00:15:00,840 --> 00:15:03,380
of, is it called, is it Materialize?

286
00:15:03,480 --> 00:15:03,920
Or like what

287
00:15:03,920 --> 00:15:04,840
Nikolay: was the- Yeah, yeah, yeah.

288
00:15:04,840 --> 00:15:07,960
I haven't heard for a few years
from them.

289
00:15:08,040 --> 00:15:10,420
What's, like, I'm curious what's
happening there.

290
00:15:10,460 --> 00:15:12,980
Lack of autonomous transactions
will be an issue,

291
00:15:13,320 --> 00:15:13,820
Michael: right?

292
00:15:14,480 --> 00:15:14,980
Yeah.

293
00:15:15,420 --> 00:15:20,260
Nikolay: Or a queue-like, queue-like tool
inside Postgres.

294
00:15:20,600 --> 00:15:25,180
So asynchronously, updates would be
propagated through a queue.

295
00:15:26,180 --> 00:15:31,360
If everyone had PgQ like CloudSQL has, from Skype, developed

296
00:15:32,300 --> 00:15:33,460
20 years ago.

297
00:15:34,020 --> 00:15:37,880
In this case, implementing incrementally
asynchronously updated

298
00:15:38,100 --> 00:15:39,340
materialized views would be easier.

299
00:15:39,340 --> 00:15:41,460
Michael: Well, yeah, async and
sync is...

300
00:15:41,840 --> 00:15:43,240
Anyway, this is an

301
00:15:44,060 --> 00:15:45,040
Nikolay: interesting topic.

302
00:15:45,060 --> 00:15:45,480
Yeah, yeah, yeah.

303
00:15:45,480 --> 00:15:48,540
And we already basically we just
touched workloads, queue-like

304
00:15:48,540 --> 00:15:49,040
workloads.

305
00:15:49,200 --> 00:15:50,280
It's still hard.

306
00:15:51,280 --> 00:15:52,580
Bloat is an issue, right?

307
00:15:52,580 --> 00:15:53,960
We discussed it, I think.

308
00:15:54,900 --> 00:15:56,920
Michael: I think, well, we've discussed
there are solutions,

309
00:15:57,080 --> 00:15:57,280
right?

310
00:15:57,280 --> 00:16:00,080
I actually think queues is 1 of
the ones that I was going to

311
00:16:00,080 --> 00:16:01,240
fight you hardest on.

312
00:16:01,240 --> 00:16:04,800
Like, I think there are ways to
do it badly within Postgres.

313
00:16:05,240 --> 00:16:08,680
And again, at extreme scale, I
think it wouldn't be smart to

314
00:16:08,680 --> 00:16:09,720
put it in, especially.

315
00:16:09,720 --> 00:16:10,700
Nikolay: I don't agree.

316
00:16:10,880 --> 00:16:12,420
Skype had extreme scale.

317
00:16:13,500 --> 00:16:14,200
Michael: Yeah, well, yeah.

318
00:16:14,200 --> 00:16:14,700
Okay.

319
00:16:15,040 --> 00:16:17,540
Nikolay: 20 years ago, 1 billion
users was a target.

320
00:16:18,840 --> 00:16:19,540
Michael: Good point.

321
00:16:19,540 --> 00:16:23,700
So maybe actually of all the ones
that we added to potentially

322
00:16:23,720 --> 00:16:26,540
be on the list, that would be 1
where I think if you manage it

323
00:16:26,540 --> 00:16:32,960
well, like PgQ at Skype did with
partitions, actually I think

324
00:16:33,860 --> 00:16:37,000
that's not an excuse to not use
Postgres if that's the title.

325
00:16:37,480 --> 00:16:39,560
Nikolay: Yeah, benefits are serious.

326
00:16:39,860 --> 00:16:44,620
Problem is like, when I recommend
keeping queue-like workloads inside

327
00:16:44,620 --> 00:16:50,360
Postgres, I just say like understanding
whole complexity and

328
00:16:50,920 --> 00:16:54,960
hidden issues that you just need
a high level of understanding

329
00:16:54,960 --> 00:16:57,020
of what's happening to have it.

330
00:16:57,660 --> 00:16:59,440
But if you have it, it will be
great.

331
00:16:59,440 --> 00:17:00,060
It will be great.

332
00:17:00,060 --> 00:17:02,540
It's just not a small project,
unfortunately, usually.

333
00:17:02,540 --> 00:17:03,840
Michael: Yeah, good point.

334
00:17:04,280 --> 00:17:08,240
Nikolay: And this recent case with
Notify as well, because it's

335
00:17:08,240 --> 00:17:10,540
also sometimes used in such workloads.

336
00:17:11,460 --> 00:17:18,440
Yeah, as a reminder, NOTIFY takes an exclusive
lock on the database, serializing

337
00:17:18,520 --> 00:17:22,100
all NOTIFYs, which makes it basically
not scalable.
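[Editor's note: for queue-like workloads that would otherwise lean on LISTEN/NOTIFY, a common alternative is polling a jobs table with FOR UPDATE SKIP LOCKED, which avoids that serialization point. A sketch with hypothetical table and column names:]

```sql
-- Hypothetical jobs table for a pull-based queue.
CREATE TABLE jobs (
  id        bigserial PRIMARY KEY,
  payload   jsonb   NOT NULL,
  processed boolean NOT NULL DEFAULT false
);

-- Each worker claims one unprocessed job; SKIP LOCKED lets
-- concurrent workers pass over rows already being processed.
WITH next_job AS (
  SELECT id FROM jobs
  WHERE NOT processed
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE jobs SET processed = true
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;
```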

338
00:17:23,100 --> 00:17:25,460
Yeah, yeah, all right.

339
00:17:25,760 --> 00:17:27,240
Anyway, what's the answer?

340
00:17:31,420 --> 00:17:34,480
Like, if you needed to create a
project and you would need to

341
00:17:34,480 --> 00:17:38,100
think about analytics, like, okay,
we will have

342
00:17:38,100 --> 00:17:42,720
terabytes of data very soon, fast
growing. What do we choose for

343
00:17:42,720 --> 00:17:43,200
analytics?

344
00:17:43,200 --> 00:17:44,880
What do we choose for

345
00:17:45,300 --> 00:17:49,440
queue-like workloads, for time series?
What are the choices you

346
00:17:49,440 --> 00:17:50,340
would make?

347
00:17:51,340 --> 00:17:53,940
Michael: Think it does depend a
lot that you already said with

348
00:17:53,940 --> 00:17:54,440
analytics.

349
00:17:55,120 --> 00:17:57,880
Are we talking about a bank that
is doing

350
00:17:58,460 --> 00:18:02,820
nightly loads of data and only
cares about internal reporting,

351
00:18:03,040 --> 00:18:06,900
or are we talking about a user-facing
web app that has to do, 

352
00:18:06,900 --> 00:18:11,540
or like a social media app that
has to do counts and various 

353
00:18:11,540 --> 00:18:15,560
like aggregations that are user-facing? 

354
00:18:17,540 --> 00:18:18,880
Nikolay: Well you need both. 

355
00:18:18,900 --> 00:18:21,840
You need to think about both and
what architecture choice would 

356
00:18:21,840 --> 00:18:23,200
you make in the beginning. 

357
00:18:24,320 --> 00:18:24,720
Yeah. 

358
00:18:24,720 --> 00:18:28,980
For target of terabytes of data
in like in 1 year for example. 

359
00:18:29,440 --> 00:18:30,720
10 terabytes in 1 year. 

360
00:18:30,720 --> 00:18:31,740
What would you do? 

361
00:18:32,080 --> 00:18:34,040
Would you choose to stay in Postgres? 

362
00:18:35,140 --> 00:18:36,560
Michael: I'm a big fan of simplicity. 

363
00:18:36,820 --> 00:18:39,860
I think I would stick with Postgres
for as long, like until it 

364
00:18:39,860 --> 00:18:40,580
was painful. 

365
00:18:41,180 --> 00:18:43,220
Nikolay: Okay, re-engineered then. 

366
00:18:43,660 --> 00:18:47,160
Michael: Yeah, and I know that
would be painful, But that would 

367
00:18:47,160 --> 00:18:48,660
be my preferred option. 

368
00:18:49,640 --> 00:18:52,760
I think then I am tempted, I think
it's quite new at the moment, 

369
00:18:52,760 --> 00:18:58,480
but I am tempted by the work that
Crunchy started on moving data

370
00:18:58,480 --> 00:19:01,260
out to Iceberg format and still
querying it from Postgres. 

371
00:19:01,560 --> 00:19:02,820
Like I like the, 

372
00:19:03,340 --> 00:19:03,460
Nikolay: I 

373
00:19:03,460 --> 00:19:04,840
Michael: like that I can keep 

374
00:19:04,840 --> 00:19:06,540
Nikolay: queries going from the
same place. 

375
00:19:06,850 --> 00:19:08,800
But it's possible on Cloud SQL
or RDS, right? 

376
00:19:08,800 --> 00:19:09,720
Michael: Not yet, right? 

377
00:19:09,720 --> 00:19:11,000
But I think it's quite early. 

378
00:19:11,000 --> 00:19:14,060
Like if I was starting a project
today, I would hope that they 

379
00:19:14,060 --> 00:19:15,060
caught up by then. 

380
00:19:15,060 --> 00:19:19,500
And if not, then a ClickHouse,
like a whatever PeerDB is called 

381
00:19:19,500 --> 00:19:22,680
now within ClickHouse, like having
that go out to an analytic 

382
00:19:22,680 --> 00:19:25,220
system like ClickHouse makes a
load of sense to me. 

383
00:19:25,520 --> 00:19:26,500
What do you think? 

384
00:19:27,180 --> 00:19:30,980
Nikolay: I would choose self-managed
Postgres 100%. 

385
00:19:31,720 --> 00:19:34,460
And I would use TimescaleDB full-fledged. 

386
00:19:34,940 --> 00:19:36,240
Michael: Got it, OK, yeah. 

387
00:19:36,600 --> 00:19:40,820
Nikolay: And then I would consider
DuckDB path as well, additionally, 

388
00:19:41,300 --> 00:19:42,240
at some point. 

389
00:19:43,140 --> 00:19:46,720
And queue workloads, queue-like
workloads, I would engineer perfectly 

390
00:19:47,220 --> 00:19:49,580
and squeeze as much as Postgres can. 

391
00:19:49,960 --> 00:19:50,460
Yeah. 

392
00:19:50,500 --> 00:19:51,880
And what else we touched? 

393
00:19:52,800 --> 00:19:55,240
Michael: So again, all of these
are actually sticking with Postgres 

394
00:19:55,240 --> 00:19:58,620
and then just at some point in
the future, you're going to have 

395
00:19:58,620 --> 00:20:00,400
to think about sharding if you 

396
00:20:00,400 --> 00:20:02,960
Nikolay: can't in the future. 

397
00:20:03,140 --> 00:20:05,680
Only one reason would make me do
it. 

398
00:20:06,580 --> 00:20:10,440
So I wouldn't go, in these 3 areas
we just discussed, I wouldn't 

399
00:20:10,440 --> 00:20:11,700
go away from Postgres. 

400
00:20:12,520 --> 00:20:16,420
Although I understand very well
customers who have and why they

401
00:20:16,420 --> 00:20:18,660
say we need ClickHouse or something.

402
00:20:19,180 --> 00:20:23,200
In my company, I would add ClickHouse
only if there is a strong

403
00:20:24,140 --> 00:20:26,180
team which needs it.

404
00:20:26,200 --> 00:20:29,760
And this is their choice and I
delegate and then I don't, I'm

405
00:20:29,760 --> 00:20:31,080
not involved in this decision.

406
00:20:31,080 --> 00:20:34,760
But while this choice is still
mine I would stick to Postgres

407
00:20:34,760 --> 00:20:36,480
and just make it work better.

408
00:20:37,080 --> 00:20:38,900
And it can scale until IPO.

409
00:20:42,040 --> 00:20:45,840
I saw it several times with several
companies.

410
00:20:47,640 --> 00:20:49,340
Michael: Cool, makes sense.

411
00:20:50,800 --> 00:20:54,760
So I wonder if this next one's
going to be the first one where

412
00:20:54,760 --> 00:21:00,060
we both wouldn't use Postgres,
which is storing large

413
00:21:00,060 --> 00:21:02,340
objects, like large files.

414
00:21:03,060 --> 00:21:08,360
Nikolay: Well definitely, yeah,
last time I tried to store a

415
00:21:08,360 --> 00:21:14,020
picture inside Postgres was probably
2006 or 7, when I was just

416
00:21:14,020 --> 00:21:19,440
exploring, you know, like oh this
is working, okay. But no, I

417
00:21:19,440 --> 00:21:23,840
don't even know what will
happen, you know, like this,

418
00:21:23,840 --> 00:21:28,440
like this piece of Postgres I touch
super rarely, you know.

419
00:21:28,440 --> 00:21:28,940
Michael: Yeah.

420
00:21:29,380 --> 00:21:34,800
I think the one exception is text-based
stuff, stuff that you might

421
00:21:34,800 --> 00:21:38,360
want to query it, but even then
you probably

422
00:21:38,360 --> 00:21:38,740
Nikolay: want to

423
00:21:38,740 --> 00:21:39,440
Michael: be doing...

424
00:21:39,440 --> 00:21:39,940
PDFs.

425
00:21:40,680 --> 00:21:41,380
Nikolay: PDFs, yeah.

426
00:21:41,380 --> 00:21:45,020
Michael: But you know, it's like
some representation of the PDF,

427
00:21:45,020 --> 00:21:48,440
not the actual PDF, but the text
extracted from it.

428
00:21:48,720 --> 00:21:49,420
Oh, yes.

429
00:21:49,700 --> 00:21:50,107
Nikolay: It's going to be...

430
00:21:50,107 --> 00:21:53,920
Or it can be Markdown, and then
we have a Pandoc or something

431
00:21:53,920 --> 00:21:56,940
which converts both to HTML and
PDF.

432
00:21:56,940 --> 00:21:58,700
This is what we do with our checkups.

433
00:21:59,160 --> 00:22:00,460
Originally it's in markdown.

434
00:22:01,340 --> 00:22:05,720
Michael: Yeah, and by the way,
and another possible exception

435
00:22:05,740 --> 00:22:08,600
I think it's almost not worth discussing
is if you only need

436
00:22:08,600 --> 00:22:14,220
to store 5 pictures like maybe
you know, but I just don't, I don't

437
00:22:14,220 --> 00:22:15,740
see many cases like that.

438
00:22:16,420 --> 00:22:18,000
Nikolay: Yeah yeah yeah.

439
00:22:18,000 --> 00:22:18,640
Still cool.

440
00:22:18,640 --> 00:22:19,220
Michael: All right.

441
00:22:19,480 --> 00:22:22,900
Nikolay: Yeah We recently implemented
attachments in our own

442
00:22:23,140 --> 00:22:28,940
system for pictures and various
like archives of logs or something

443
00:22:28,940 --> 00:22:32,780
which customers upload or we upload
PDFs as well sometimes.

444
00:22:33,520 --> 00:22:37,620
Of course we store them in GCS
in a secure manner.

445
00:22:38,560 --> 00:22:40,740
Not in Postgres 100% now.

446
00:22:40,920 --> 00:22:41,380
Yeah, yeah.

447
00:22:41,380 --> 00:22:43,580
I even don't think they're, yeah.

448
00:22:43,840 --> 00:22:48,420
It's just an exercise for those who
probably don't have other tasks.

449
00:22:50,420 --> 00:22:51,880
Michael: Yeah, all right.

450
00:22:53,240 --> 00:22:58,220
Another one that I think is maybe
in between these 2, like advanced

451
00:22:58,260 --> 00:23:01,640
search, I still think there are
cases, like there are search

452
00:23:01,640 --> 00:23:04,900
use cases where people that I respect
and trust would still choose

453
00:23:04,900 --> 00:23:06,940
ElasticSearch over Postgres even
though they know that.

454
00:23:06,940 --> 00:23:08,400
Nikolay: By the way, sorry for
interrupting.

455
00:23:08,840 --> 00:23:12,320
I just realized you talked about
blobs.

456
00:23:13,820 --> 00:23:15,320
Yeah, but we also can.

457
00:23:15,560 --> 00:23:19,380
So, and this can be an issue with
upgrades, I've heard, right?

458
00:23:19,440 --> 00:23:20,280
Major upgrades.

459
00:23:20,860 --> 00:23:23,400
I don't see them recently at all.

460
00:23:24,060 --> 00:23:27,980
So I mean, you can try to store
it in Postgres, but then you

461
00:23:27,980 --> 00:23:29,940
have some operational issues, additionally.

462
00:23:30,300 --> 00:23:34,780
Because not only do I not see them
often, people who design some

463
00:23:34,780 --> 00:23:37,400
procedures and tools, they also
don't see them often.

464
00:23:38,000 --> 00:23:43,140
And it's kind of exotic to keep,
to have blobs.

465
00:23:43,140 --> 00:23:46,100
Michael: Yeah, so when you say
major upgrades, are we talking

466
00:23:46,100 --> 00:23:50,220
about like the speed of the initial
sync?

467
00:23:50,220 --> 00:23:52,280
Or are we talking about pg_dump?

468
00:23:52,280 --> 00:23:55,180
If we go in the dump restore route,
then actually it's all having

469
00:23:55,180 --> 00:23:56,340
to get dumped out.

470
00:23:57,440 --> 00:23:58,140
Nikolay: Yeah, yeah.

471
00:23:58,140 --> 00:24:01,400
I just remember some notes about
it I saw maybe on RDS.

472
00:24:03,520 --> 00:24:05,520
It's specifically about large objects.

473
00:24:05,540 --> 00:24:06,960
Maybe I'm wrong, actually.

474
00:24:07,200 --> 00:24:10,040
Michael: I just remembered I'm
actually using, not for super

475
00:24:10,040 --> 00:24:12,380
large objects, but several kilobyte
files.

476
00:24:12,700 --> 00:24:18,780
Like, we store query plans in JSON
format or text format.

477
00:24:18,900 --> 00:24:22,600
Text format ones don't tend to
get that massive, but JSON format

478
00:24:22,600 --> 00:24:26,780
ones can be hundreds of, well,
we've actually seen a couple that

479
00:24:26,780 --> 00:24:27,980
were tens of megabytes.

480
00:24:27,980 --> 00:24:29,960
I think one or two that are in the
hundreds of.

481
00:24:29,960 --> 00:24:30,460
But.

482
00:24:32,740 --> 00:24:35,820
Nikolay: We need to amend this
part of the episode.

483
00:24:36,540 --> 00:24:40,360
You're talking about varlena types.

484
00:24:41,460 --> 00:24:41,960
Yeah.

485
00:24:42,200 --> 00:24:46,560
There is a special thing, large
object facility, special chapter.

486
00:24:46,560 --> 00:24:48,680
Michael: Yes, sorry, that's a different,
yeah.

487
00:24:49,000 --> 00:24:53,980
Nikolay: Yeah, with varlena everyone
uses JSONs, large texts, everyone.

488
00:24:54,440 --> 00:24:55,820
Even byteas.

489
00:24:56,000 --> 00:24:59,220
Michael: Okay, you mean there's
a specific issue with the thing

490
00:24:59,220 --> 00:25:00,060
called large objects?

491
00:25:00,060 --> 00:25:02,720
Nikolay: I cannot say I don't touch
large JSONs.

492
00:25:02,720 --> 00:25:03,940
Of course, I touch them a lot.

493
00:25:03,940 --> 00:25:04,960
We have them a lot.

494
00:25:04,960 --> 00:25:08,860
And yeah, we talk about how they
TOAST in Postgres.
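
A quick sketch of what TOAST means in practice; a hedged illustration, with table and column names (`docs`, `plan_json`) invented for the example:

```sql
-- Any varlena value (text, jsonb, bytea, ...) wider than roughly 2 kB is
-- compressed and/or moved out of line into the table's hidden TOAST relation:
SELECT reltoastrelid::regclass   -- name of the TOAST table backing "docs"
FROM pg_class
WHERE relname = 'docs';

-- Storage strategy is tunable per column, e.g. keep out of line but uncompressed:
ALTER TABLE docs ALTER COLUMN plan_json SET STORAGE EXTERNAL;
```
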

495
00:25:08,860 --> 00:25:09,440
Michael: Yeah, yeah.

496
00:25:09,620 --> 00:25:13,940
Nikolay: Large JSONs, large XMLs
sometimes, right?

497
00:25:14,020 --> 00:25:16,100
Texts, of course, large texts,
everything.

498
00:25:16,120 --> 00:25:19,920
For example, our RAG system for
AI assistant has a really large

499
00:25:19,920 --> 00:25:24,860
text, chunks of source code, many
of these discussions, kilobytes.

500
00:25:25,080 --> 00:25:26,780
Michael: And you put all of those
in Postgres?

501
00:25:26,980 --> 00:25:30,440
Nikolay: Of course, because we
need to parse them and also full

502
00:25:30,440 --> 00:25:33,340
text search and vectors, that's
everything in Postgres right

503
00:25:33,340 --> 00:25:33,840
now.

504
00:25:34,000 --> 00:25:38,360
Michael: Well, you could store
the vectors in Postgres without

505
00:25:38,360 --> 00:25:40,760
storing the text in Postgres, but
the full text search makes

506
00:25:40,760 --> 00:25:41,680
a lot of sense.

507
00:25:42,340 --> 00:25:45,040
Nikolay: Yeah, I understand you,
but we do everything in Postgres,

508
00:25:45,280 --> 00:25:46,040
even vectorization.

509
00:25:46,420 --> 00:25:47,500
Yeah, okay, cool.

510
00:25:47,500 --> 00:25:52,400
It doesn't scale well if you need
to deal with billions of vectors,

511
00:25:52,440 --> 00:25:53,960
but millions is fine.

512
00:25:54,280 --> 00:25:55,440
Michael: Yeah, makes sense.

513
00:25:55,440 --> 00:25:58,780
Nikolay: So what I was talking
about is like, it's like a

514
00:25:58,780 --> 00:26:01,160
lo_create function, these things.

515
00:26:01,860 --> 00:26:04,320
Michael: Yeah, I've not used that
and you're saying you don't

516
00:26:04,320 --> 00:26:05,040
see it.

517
00:26:05,660 --> 00:26:07,440
Nikolay: Yeah, this is what I don't
see.

518
00:26:08,480 --> 00:26:11,820
Michael: Hopefully everyone got
the memo and no one's using it.

519
00:26:13,180 --> 00:26:13,680
Yeah,

520
00:26:14,180 --> 00:26:15,420
Nikolay: lo_from_bytea...

521
00:26:15,740 --> 00:26:17,520
So yeah, I don't use those.

522
00:26:17,520 --> 00:26:21,620
And again, last time I touched
that, it was so long ago.

523
00:26:23,560 --> 00:26:24,520
Actual blobs.

524
00:26:25,680 --> 00:26:26,820
lo_

525
00:26:27,560 --> 00:26:29,940
lo_put, lo_get, these functions.
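
For listeners who haven't seen it, the large object facility being discussed is a server-side API separate from ordinary bytea/TOAST storage; a minimal sketch, where the OID 16385 is purely illustrative:

```sql
-- Large object facility: blob data lives in pg_largeobject,
-- application rows just hold an OID reference
SELECT lo_create(0);                    -- allocate a new large object, returns its OID
SELECT lo_put(16385, 0, '\xdeadbeef');  -- write bytes at offset 0
SELECT lo_get(16385);                   -- read the whole object back
SELECT lo_unlink(16385);                -- delete it; forgetting this leaves orphans
```

pg_dump handles these separately from table data, which is one reason they show up as a special case during migrations and major upgrades.
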

526
00:26:30,800 --> 00:26:33,720
I have no idea, and I suspect something
will be broken if you

527
00:26:33,720 --> 00:26:34,600
start using them.

528
00:26:34,600 --> 00:26:36,420
Some operations like upgrade maybe.

529
00:26:36,680 --> 00:26:39,560
Not broken, but you will need to
take care of them.

530
00:26:42,180 --> 00:26:45,840
Like some side effects, like tablespaces
can have, you know.

531
00:26:46,880 --> 00:26:48,640
Also not used often recently.

532
00:26:50,360 --> 00:26:54,520
In the cloud context we don't often use
tablespaces.

533
00:26:54,520 --> 00:26:58,220
But tablespaces might be a headache
when you do some migrations,

534
00:26:58,940 --> 00:27:03,060
move your database from place to
place or upgrade and so on.

535
00:27:03,420 --> 00:27:04,240
Michael: Yeah, yeah.

536
00:27:05,060 --> 00:27:05,560
Nikolay: Okay.

537
00:27:06,500 --> 00:27:07,440
Yeah, good one.

538
00:27:08,560 --> 00:27:09,340
What else?

539
00:27:11,240 --> 00:27:13,860
Michael: We haven't talked about
embedded databases yet on the

540
00:27:13,860 --> 00:27:15,420
kind of tiny scale of things.

541
00:27:15,940 --> 00:27:18,840
Nikolay: Yeah, I'm not an expert
in embedded databases.

542
00:27:18,940 --> 00:27:21,060
I thought SQLite is good.

543
00:27:21,540 --> 00:27:24,180
Michael: Yeah in this category.

544
00:27:24,280 --> 00:27:29,280
We do now have PGlite. Looks like
a very interesting project

545
00:27:29,760 --> 00:27:34,240
But I think at the moment unless
I was doing some syncing between

546
00:27:34,240 --> 00:27:38,620
Postgres, unless I had a really
good reason, I'd probably still

547
00:27:38,620 --> 00:27:44,400
default to SQLite, however
the community pronounces

548
00:27:44,480 --> 00:27:44,980
it.

549
00:27:46,680 --> 00:27:51,020
But actually I was going to include
in this topic, like even

550
00:27:51,020 --> 00:27:52,940
browser local storage for example.

551
00:27:53,420 --> 00:27:58,640
If you're wanting to do stuff client
side in like a browser app

552
00:27:58,640 --> 00:28:03,420
or web app, it still makes sense
to use the local storage there,

553
00:28:03,420 --> 00:28:03,680
right?

554
00:28:03,680 --> 00:28:05,100
IndexedDB or whatever.

555
00:28:05,460 --> 00:28:10,180
So there are like a few embedded
cases where I don't think it

556
00:28:10,180 --> 00:28:15,300
makes sense to use Postgres or
if you're gonna try maybe PGlite.

557
00:28:15,920 --> 00:28:21,600
Nikolay: And can you remind me,
PGlite, what does it do?

558
00:28:21,980 --> 00:28:23,900
Is it related somehow to WebAssembly?

559
00:28:26,760 --> 00:28:27,940
I think yes, right?

560
00:28:28,580 --> 00:28:30,040
Michael: I think it must be, right?

561
00:28:30,040 --> 00:28:32,700
But I don't know enough about it.

562
00:28:33,240 --> 00:28:35,380
Nikolay: Yeah, I'm checking: it's
a complete WASM build of

563
00:28:35,380 --> 00:28:38,040
Postgres that's under 3 megabytes
gzipped.

564
00:28:38,040 --> 00:28:38,800
That's impressive.

565
00:28:39,960 --> 00:28:41,180
Michael: Yeah, it's a cool project.

566
00:28:41,420 --> 00:28:44,280
But I guess there's an argument
to say it's not actually Postgres.

567
00:28:44,540 --> 00:28:48,740
Like it talks and behaves like
Postgres, but it's kind of its

568
00:28:48,740 --> 00:28:49,440
own thing.

569
00:28:49,540 --> 00:28:52,540
Nikolay: Well, we can talk about
many Postgres variants like

570
00:28:52,540 --> 00:28:54,820
this, including Aurora and so on.

571
00:28:55,680 --> 00:28:58,940
Some of them more Postgres, some
of them less.

572
00:28:59,640 --> 00:29:00,140
Michael: Yeah.

573
00:29:01,400 --> 00:29:07,080
But if the topic is like when not
to use Postgres, yeah I guess

574
00:29:07,080 --> 00:29:07,420
Aurora.

575
00:29:07,420 --> 00:29:09,940
I don't know if I'd count Aurora
as that or not.

576
00:29:13,120 --> 00:29:15,120
Nikolay: I'm not sure I understand
what you mean.

577
00:29:15,480 --> 00:29:20,580
Michael: If the only solution was
use Aurora, let's say that

578
00:29:20,580 --> 00:29:24,640
was the, there was like a, one of
these cases, it turns out Aurora

579
00:29:24,640 --> 00:29:27,940
was the absolute best for, and
way better than native Postgres.

580
00:29:28,940 --> 00:29:31,520
I think I would count that as when
not to use Postgres.

581
00:29:32,440 --> 00:29:35,840
Because it's kind of Postgres compatible
like or Cockroach like

582
00:29:35,840 --> 00:29:37,440
or any of these kind of compatible
or

583
00:29:37,440 --> 00:29:38,840
Nikolay: you know

584
00:29:38,840 --> 00:29:43,260
Michael: Yeah, I'd say, yeah, so
you're right, it's

585
00:29:43,260 --> 00:29:47,000
kind of a scale. It's hard to
draw a line where

586
00:29:47,000 --> 00:29:48,040
it is and isn't

587
00:29:48,140 --> 00:29:50,200
Nikolay: It's a spectrum, Postgres,
yeah.

588
00:29:50,820 --> 00:29:55,320
Michael: Yeah, yeah, cool, okay, but
yeah, it feels like that's an

589
00:29:55,320 --> 00:29:56,420
easy one, right?

590
00:29:58,260 --> 00:30:01,340
If you've got little devices or
little, you know, little sensors.

591
00:30:01,560 --> 00:30:03,980
Nikolay: Yeah, default choice is
SQLite already.

592
00:30:04,740 --> 00:30:09,060
And I like the idea of PGlite,
and I know Supabase used it for

593
00:30:09,060 --> 00:30:11,680
database.build project, which I
like a lot.

594
00:30:11,720 --> 00:30:15,920
Merging this with AI,
right in the browser, you can

595
00:30:15,920 --> 00:30:23,360
create pet projects and maybe like
explore and it's a very creative

596
00:30:23,600 --> 00:30:28,700
tool to think about, to bootstrap
some new project, think how

597
00:30:28,700 --> 00:30:33,940
it could look like. It has the ER
diagram and you can iterate

598
00:30:33,940 --> 00:30:34,600
with AI.

599
00:30:34,600 --> 00:30:35,270
It's great.

600
00:30:35,270 --> 00:30:37,700
And there, PGlite works really
well.

601
00:30:38,420 --> 00:30:45,060
And I'm sure they already created
this ability to deploy, to

602
00:30:45,060 --> 00:30:49,040
sync what you build to real Postgres
in Supabase somewhere,

603
00:30:49,040 --> 00:30:49,540
right?

604
00:30:49,740 --> 00:30:54,460
Michael: Well, I think that was
the main aim of the company behind

605
00:30:54,680 --> 00:30:58,480
PGlite, called like ElectricSQL
or something.

606
00:30:59,240 --> 00:30:59,740
Nikolay: Replication.

607
00:31:00,780 --> 00:31:01,620
Michael: Yeah, exactly.

608
00:31:02,260 --> 00:31:06,340
So it's the whole premise was local,
if you heard of local first

609
00:31:06,340 --> 00:31:11,860
development, so the idea like apps
like Linear, the task management

610
00:31:11,960 --> 00:31:15,320
tool, they they're like lightning
fast because they do everything

611
00:31:15,320 --> 00:31:17,160
locally and then sync.

612
00:31:17,160 --> 00:31:18,660
Nikolay: Very thick client.

613
00:31:18,900 --> 00:31:19,440
Very thick client.

614
00:31:19,440 --> 00:31:19,940
Yes.

615
00:31:20,380 --> 00:31:25,920
Basically like Git, like Git clone,
when you type Git clone and

616
00:31:25,920 --> 00:31:28,040
execute it, it's basically a whole
repository.

617
00:31:28,500 --> 00:31:31,400
It can live on your machine.

618
00:31:31,560 --> 00:31:32,060
Yep.

619
00:31:32,500 --> 00:31:34,060
Distributed fashion, right?

620
00:31:34,060 --> 00:31:35,580
Michael: Well, and it has to handle
merges.

621
00:31:35,900 --> 00:31:36,400
Nikolay: Yeah.

622
00:31:36,580 --> 00:31:37,700
You can use there.

623
00:31:38,100 --> 00:31:38,740
It's great.

624
00:31:38,740 --> 00:31:39,300
It's a conflict.

625
00:31:39,840 --> 00:31:44,340
I'm curious if we could explore
better branching in that area

626
00:31:44,340 --> 00:31:48,360
because we're already very close to
implementing a synchronization

627
00:31:48,580 --> 00:31:50,100
between 2 DBLab engines.

628
00:31:50,200 --> 00:31:52,040
Yeah, but it's a different story.

629
00:31:52,660 --> 00:31:53,160
Yeah.

630
00:31:53,680 --> 00:31:54,640
I like the idea.

631
00:31:55,080 --> 00:31:57,340
So it might be a foundation for...

632
00:31:57,940 --> 00:32:01,260
PGlite, I mean, might be a foundation
for more apps which will

633
00:32:01,260 --> 00:32:05,880
live in browser but then be synchronized
with real Postgres.

634
00:32:07,440 --> 00:32:10,180
Michael: Yeah or even desktop apps
like it doesn't have to be

635
00:32:10,180 --> 00:32:10,680
browser-based.

636
00:32:11,920 --> 00:32:15,600
Nikolay: Well I guess the desktop
apps built on top of using

637
00:32:15,600 --> 00:32:16,560
electron right?

638
00:32:16,560 --> 00:32:18,740
Michael: Oh no yeah good point
good point.

639
00:32:20,140 --> 00:32:23,240
Nikolay: And then if you don't
have internet connection you still

640
00:32:23,240 --> 00:32:24,000
can work?

641
00:32:24,520 --> 00:32:24,820
Yeah.

642
00:32:24,820 --> 00:32:26,020
Like offline mode?

643
00:32:26,400 --> 00:32:27,100
That's great.

644
00:32:28,080 --> 00:32:29,380
I like the idea actually.

645
00:32:29,380 --> 00:32:30,891
I like the idea.

646
00:32:30,891 --> 00:32:32,780
So you have Postgres mirror.

647
00:32:32,780 --> 00:32:39,100
Postgres, well, it reminds me of
multi-master replication.

648
00:32:41,040 --> 00:32:41,980
This is complexity.

649
00:32:42,260 --> 00:32:45,900
Michael: All the same problems,
like with merging and conflicts.

650
00:32:47,900 --> 00:32:51,360
Nikolay: But at the same time,
recent Postgres has this ability

651
00:32:51,380 --> 00:32:56,420
to create subscription and avoid
loops of replication.

652
00:32:56,920 --> 00:32:57,680
Michael: Yeah, true.

653
00:32:57,740 --> 00:33:01,080
Nikolay: So origin something you
can say, I want to replicate

654
00:33:01,300 --> 00:33:04,860
only the data which doesn't have
origin, which means it was born

655
00:33:04,860 --> 00:33:05,360
here.

656
00:33:06,340 --> 00:33:09,300
Local origin basically, but means
no origin there somehow.

657
00:33:09,480 --> 00:33:13,580
Terminology is strange a little
bit, as usual in Postgres, right?

658
00:33:14,040 --> 00:33:18,140
But it's a great ability to break
the loops, infinite loops.
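
The feature being described is the origin option of logical replication subscriptions (Postgres 16 and later); the connection string and publication name below are illustrative:

```sql
-- On node A: only pull rows that originated on node B itself,
-- skipping rows node B had merely replicated from somewhere else,
-- which is what breaks infinite replication loops
CREATE SUBSCRIPTION sub_from_b
    CONNECTION 'host=node-b dbname=app'
    PUBLICATION pub_app
    WITH (origin = none);
-- The default, origin = any, replicates regardless of where a row was born.
```
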

659
00:33:19,020 --> 00:33:19,420
Yeah, it's

660
00:33:19,420 --> 00:33:19,540
Michael: interesting.

661
00:33:19,540 --> 00:33:22,400
Yeah, it fixes one of the problems,
but it doesn't...

662
00:33:23,000 --> 00:33:24,520
Nikolay: Fix all of them, no fix.

663
00:33:24,680 --> 00:33:25,820
Michael: Yeah, exactly.

664
00:33:26,320 --> 00:33:28,880
And like, last write win type things,
yeah.

665
00:33:29,380 --> 00:33:34,140
Nikolay: But if you use, if you
need to have very good, like,

666
00:33:34,940 --> 00:33:37,740
server-side application and so
on, you choose Postgres, but then

667
00:33:37,740 --> 00:33:43,940
you have these very thick clients,
and you need to choose a database

668
00:33:43,940 --> 00:33:47,360
for them, and you choose SQLite,
then you need to synchronize

669
00:33:47,360 --> 00:33:48,280
between them somehow.

670
00:33:48,280 --> 00:33:49,540
It's maybe even worse.

671
00:33:49,940 --> 00:33:50,580
Michael: Even harder.

672
00:33:50,580 --> 00:33:51,220
I think that's

673
00:33:51,220 --> 00:33:51,600
Nikolay: the point.

674
00:33:51,600 --> 00:33:53,940
Different data types and models.

675
00:33:54,060 --> 00:33:54,560
Yeah.

676
00:33:55,440 --> 00:33:55,940
Michael: Yeah.

677
00:33:57,180 --> 00:33:57,680
Cool.

678
00:33:57,940 --> 00:34:01,720
What about the specialized workloads
like vectors?

679
00:34:01,720 --> 00:34:03,340
And I was going to bring up search
as well.

680
00:34:03,340 --> 00:34:04,900
I think search is slightly easier.

681
00:34:05,980 --> 00:34:06,960
Nikolay: Let's bring search.

682
00:34:07,540 --> 00:34:10,440
Michael: I don't actually have,
I haven't written an app or been

683
00:34:10,440 --> 00:34:14,180
involved in an application that
is heavily reliant on, like,

684
00:34:14,180 --> 00:34:15,940
very advanced search features.

685
00:34:16,640 --> 00:34:22,760
But the people I speak to that
have swear by how good ElasticSearch

686
00:34:23,300 --> 00:34:23,800
is.

687
00:34:25,840 --> 00:34:28,480
Nikolay: This is also what I see.

688
00:34:28,940 --> 00:34:32,580
I touch Elastic only usually working
with some logs, application

689
00:34:32,680 --> 00:34:37,660
logs, Postgres logs, through Kibana,
so ELK, or how is it called,

690
00:34:37,660 --> 00:34:38,660
this stack.

691
00:34:38,940 --> 00:34:43,040
But I also see many customers use
Elastic and like it, and shift

692
00:34:43,260 --> 00:34:45,260
from full-text search to Postgres.

693
00:34:46,160 --> 00:34:51,560
There, okay, it's their choice, right?
And I know the limitations of Postgres

694
00:34:51,560 --> 00:34:58,780
native full-text search. Yeah, also, I
don't understand ParadeDB, and

695
00:34:58,780 --> 00:35:04,260
I haven't seen benchmarks. The benchmark
I saw was only in the

696
00:35:04,260 --> 00:35:09,260
beginning, when they didn't create
an index on the tsvector column.
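
For reference, the baseline such a benchmark would be compared against is Postgres native full-text search with a GIN index on a tsvector column, roughly like this; table and column names (`docs`, `body`) are illustrative:

```sql
-- Maintain the tsvector automatically (Postgres 12+) and index it with GIN
ALTER TABLE docs ADD COLUMN search tsvector
    GENERATED ALWAYS AS (to_tsvector('english', coalesce(body, ''))) STORED;
CREATE INDEX ON docs USING gin (search);

-- Query it with web-search-style syntax
SELECT id
FROM docs
WHERE search @@ websearch_to_tsquery('english', 'postgres full text');
```
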

697
00:35:09,320 --> 00:35:11,740
Michael: They made a really interesting
hire recently.

698
00:35:11,920 --> 00:35:13,820
I saw, do you remember ZomboDB?

699
00:35:14,060 --> 00:35:15,060
Nikolay: Hires, yes.

700
00:35:15,060 --> 00:35:15,820
Michael: Do you remember?

701
00:35:16,160 --> 00:35:16,880
Several hires,

702
00:35:16,880 --> 00:35:18,120
Nikolay: Because they raise money,
yeah.

703
00:35:18,120 --> 00:35:19,220
But where are benchmarks?

704
00:35:19,940 --> 00:35:22,860
So I don't understand what's there
because I don't see benchmarks.

705
00:35:22,960 --> 00:35:27,440
I tried recently because their
CEO, founder, Philip, approached

706
00:35:27,440 --> 00:35:27,940
me.

707
00:35:29,240 --> 00:35:29,740
Nice.

708
00:35:30,160 --> 00:35:37,200
Maybe asking, I guess, to spread
the word, but I cannot spread

709
00:35:37,200 --> 00:35:38,700
the word if I don't see numbers.

710
00:35:39,400 --> 00:35:42,180
If it's a performance company,
show numbers.

711
00:35:43,200 --> 00:35:45,120
I might be missing something because...

712
00:35:47,860 --> 00:35:48,420
Yeah, so.

713
00:35:48,420 --> 00:35:52,500
Michael: The other product in this
space for Postgres was an

714
00:35:52,500 --> 00:36:00,560
extension called ZomboDB, which
synchronized, which kept an ElasticSearch

715
00:36:01,080 --> 00:36:05,320
index maintained, but the data
coming from Postgres originally.

716
00:36:05,600 --> 00:36:09,380
So I thought that was a really
fascinating way of having both.

717
00:36:09,380 --> 00:36:12,620
A bit like when we talked about
analytics, like having the interface

718
00:36:12,620 --> 00:36:16,920
be from Postgres, but the actual
query being run on something

719
00:36:16,920 --> 00:36:17,860
that isn't Postgres.

720
00:36:18,480 --> 00:36:22,920
So that was fascinating and it
was the founder of that ZomboDB

721
00:36:23,200 --> 00:36:24,740
that recently joined Parade.

722
00:36:25,080 --> 00:36:26,620
So that seems interesting.

723
00:36:28,960 --> 00:36:30,380
Nikolay: This whole story seems
interesting.

724
00:36:30,380 --> 00:36:33,260
I don't understand it because I
cannot find numbers at the same

725
00:36:33,260 --> 00:36:37,800
time I see everyone mentions them,
a lot of blog posts, a lot

726
00:36:37,800 --> 00:36:44,760
of GitHub stars, a lot of like
a lot of noise, but where are

727
00:36:44,760 --> 00:36:48,580
the numbers and benchmarks? They
removed them after the initial ones.

728
00:36:49,540 --> 00:36:52,920
Michael: Yeah, so if you were to
try and stay within Postgres,

729
00:36:52,920 --> 00:36:56,020
they seem like the obvious thing
to try.

730
00:36:56,360 --> 00:37:00,080
But I still see people choosing
ElasticSearch, and I'm not sure

731
00:37:00,080 --> 00:37:00,580
why.

732
00:37:00,720 --> 00:37:01,400
Nikolay: Yeah, yeah.

733
00:37:01,420 --> 00:37:02,340
Yeah, yeah.

734
00:37:02,980 --> 00:37:07,540
Please, if someone listening to
us can share benchmarks showing

735
00:37:07,540 --> 00:37:13,260
how ParadeDB is behaving under
load, some number of rows, some

736
00:37:13,260 --> 00:37:18,580
number of queries, like latencies,
buffers used ideally, right?

737
00:37:18,780 --> 00:37:23,320
I would appreciate it because I'm
still, I'm just, I'm stuck

738
00:37:23,320 --> 00:37:25,080
in question what is this?

739
00:37:26,280 --> 00:37:26,780
Michael: Cool.

740
00:37:27,340 --> 00:37:28,280
What about vectors?

741
00:37:28,520 --> 00:37:31,800
Nikolay: Vectors I have a picture
for you I saw yesterday near

742
00:37:31,800 --> 00:37:32,540
my house.

743
00:37:32,720 --> 00:37:37,260
These guys definitely, yeah, those
who can see YouTube, please

744
00:37:37,540 --> 00:37:38,580
check this out.

745
00:37:38,760 --> 00:37:42,180
So these guys are definitely experts
in vector storage.

746
00:37:42,840 --> 00:37:46,880
Michael: This is Nikolay's joke,
it's like a removal company

747
00:37:46,880 --> 00:37:49,240
that are called Vector Moving and
Storage.

748
00:37:50,280 --> 00:37:53,160
Nikolay: I saw storage as well,
I thought it's funny.

749
00:37:55,840 --> 00:37:58,580
So we have turbopuffer, right?

750
00:38:00,240 --> 00:38:02,580
Michael: Well, again, not Postgres,
right?

751
00:38:02,720 --> 00:38:05,340
Nikolay: Not Postgres at all, and
not open source at all.

752
00:38:05,380 --> 00:38:06,060
And not free

753
00:38:06,060 --> 00:38:06,500
Michael: at all,

754
00:38:06,500 --> 00:38:07,380
Nikolay: even though freemium.

755
00:38:08,480 --> 00:38:14,060
And data is in S3, and this new
type of indexes is being used

756
00:38:14,060 --> 00:38:14,560
there.

757
00:38:14,760 --> 00:38:16,220
I already forgot the name.

758
00:38:16,800 --> 00:38:19,620
But yeah, so it's not HNSW.

759
00:38:21,980 --> 00:38:23,160
Michael: Oh, interesting.

760
00:38:23,320 --> 00:38:24,100
Nikolay: Yeah, yeah, yeah.

761
00:38:24,100 --> 00:38:24,940
Michael: I didn't know.

762
00:38:25,080 --> 00:38:29,080
Nikolay: So HNSW doesn't scale
beyond a few million rows.

763
00:38:29,860 --> 00:38:38,860
DiskANN, and we had the Timescale,
now TigerData, guys who developed

764
00:38:38,860 --> 00:38:40,460
an advanced version of that.

765
00:38:40,840 --> 00:38:44,860
My perception, I don't see what
scales to billion rows at all.

766
00:38:44,860 --> 00:38:48,900
And turbopuffer says they scale
to billion rows, but as I understand,

767
00:38:48,960 --> 00:38:50,780
it's a multi-tenant approach.

768
00:38:51,220 --> 00:38:55,580
So every, like, it's not one set
of a billion vectors.

769
00:38:55,760 --> 00:38:59,360
I also don't understand that, but
like I see some development,

770
00:38:59,380 --> 00:39:02,940
PlanetScale for MySQL, they
implemented the same index.

771
00:39:03,580 --> 00:39:07,060
And this development started, I
think, at Microsoft and maybe

772
00:39:07,060 --> 00:39:08,140
in China, actually.

773
00:39:08,760 --> 00:39:11,280
Microsoft in China, this is what
I saw.

774
00:39:11,280 --> 00:39:13,820
And interestingly, they chose
Postgres for prototyping.

775
00:39:14,340 --> 00:39:17,420
So this area is worth additional
research.

776
00:39:18,080 --> 00:39:22,060
I started it and didn't have time,
unfortunately, but it's a

777
00:39:22,060 --> 00:39:25,900
very interesting direction, what's
happening with vector search.

778
00:39:26,500 --> 00:39:28,800
Because I think Postgres is losing
right now.

779
00:39:30,120 --> 00:39:31,960
Michael: Well you say losing, I
think...

780
00:39:32,220 --> 00:39:33,480
Nikolay: It's losing, 100%.

781
00:39:34,640 --> 00:39:35,880
Michael: Well, bear with me.

782
00:39:36,660 --> 00:39:41,180
I think there are a lot of use
cases that don't need the scale

783
00:39:41,180 --> 00:39:42,260
that you're talking about.

784
00:39:42,260 --> 00:39:47,140
And a lot of those are fine on
Postgres with pgvector.

785
00:39:47,520 --> 00:39:51,000
But you're probably talking about
the ones that then succeed

786
00:39:51,000 --> 00:39:55,780
and do really well and scale, they
hit a limit relatively quickly,

787
00:39:55,920 --> 00:39:57,720
or like within the first couple
of years.

788
00:39:57,720 --> 00:40:03,360
Nikolay: It's really hard to maintain
a huge HNSW index and latency-wise

789
00:40:03,700 --> 00:40:04,620
it's not good.

790
00:40:04,640 --> 00:40:08,620
turbopuffer, I'm not fully sold
on the idea of storing

791
00:40:08,620 --> 00:40:09,840
everything on S3.

792
00:40:09,840 --> 00:40:14,860
Speaking of S3, a few weeks ago
they released S3 vectors.

793
00:40:15,940 --> 00:40:20,100
AWS released S3 vectors, and this
might become mainstream.

794
00:40:21,580 --> 00:40:25,020
So S3 itself right now supports
vector indexes.

795
00:40:26,580 --> 00:40:27,860
Have you heard about this?

796
00:40:28,420 --> 00:40:28,920
Michael: No.

797
00:40:29,440 --> 00:40:31,420
Nikolay: I think this might become mainstream.

798
00:40:32,560 --> 00:40:38,400
But if big change doesn't happen
in Postgres ecosystem, it will

799
00:40:38,400 --> 00:40:41,940
be worse than the case with full-text
search and Elastic.

800
00:40:41,940 --> 00:40:42,880
It will be worse.

801
00:40:44,760 --> 00:40:45,740
What's it called?

802
00:40:46,420 --> 00:40:48,940
Like, I'm an alarmist today, right?

803
00:40:49,940 --> 00:40:52,040
Michael: Well, this is the point
of the episode, right?

804
00:40:52,040 --> 00:40:55,760
It's like, it's almost by design
that we're talking about the

805
00:40:55,760 --> 00:40:56,120
weaknesses.

806
00:40:56,120 --> 00:40:58,860
Nikolay: I was feeling so good
saying like, I would choose Postgres

807
00:40:58,860 --> 00:41:00,400
for this, this and this.

808
00:41:00,560 --> 00:41:01,520
I can rely on it.

809
00:41:01,620 --> 00:41:09,180
But here, since we have 1.1 or
1.2 million vectors in our RAG

810
00:41:09,180 --> 00:41:11,260
system for Postgres knowledge.

812
00:41:12,750 --> 00:41:15,180
Michael: Is that mostly because
of the mailing list?

813
00:41:15,180 --> 00:41:15,680
Yeah.

814
00:41:15,800 --> 00:41:17,340
Nikolay: Mailing list, I think,
70%.

815
00:41:17,460 --> 00:41:21,340
But we also have a lot of pieces
of source code of various versions

816
00:41:21,340 --> 00:41:24,940
and not only Postgres, PgBouncer
and so on and documentation

817
00:41:25,280 --> 00:41:31,120
it's a lot of stuff, also blog posts,
and I don't feel good thinking

818
00:41:31,120 --> 00:41:32,880
about how to add more.

819
00:41:32,880 --> 00:41:34,700
And we are going to add more.

820
00:41:34,800 --> 00:41:37,980
We are going to do 10x at some
point.

821
00:41:38,100 --> 00:41:41,980
Of course, we will check what TigerData
has, but at the same

822
00:41:41,980 --> 00:41:44,520
time, I'm not feeling good about it.

823
00:41:45,120 --> 00:41:45,480
In terms

824
00:41:45,480 --> 00:41:46,740
Michael: of- Is the main issue
latency?

825
00:41:46,740 --> 00:41:47,640
I'm not sure.

826
00:41:48,240 --> 00:41:49,440
Is it query latency?

827
00:41:49,440 --> 00:41:51,040
Or what's the main, okay, yeah.

828
00:41:51,040 --> 00:41:54,260
Nikolay: Latency, index size, index
build time, all these things.

829
00:41:55,080 --> 00:41:55,580
Interesting.

830
00:41:55,600 --> 00:41:59,020
Ability to have an additional filter,
which is something that HNSW

831
00:41:59,020 --> 00:42:00,400
still lacks, right?

832
00:42:00,740 --> 00:42:01,240
Yeah.

833
00:42:01,940 --> 00:42:02,420
Maybe.

834
00:42:02,420 --> 00:42:05,500
Michael: I do remember seeing an
update on the pgvector repo

835
00:42:05,500 --> 00:42:06,760
but I can't remember what.

836
00:42:06,760 --> 00:42:10,020
I feel like they had something
to address this, but I can't

837
00:42:10,020 --> 00:42:10,820
remember what.

838
00:42:11,680 --> 00:42:13,860
Nikolay: I haven't touched this
topic for several months.

839
00:42:13,860 --> 00:42:16,420
I might be already lagging in terms
of updates.

840
00:42:16,720 --> 00:42:19,780
It's a very hot topic, of course,
very young as well, right?

841
00:42:20,420 --> 00:42:23,480
Michael: Yeah, and not something
I'm experiencing again.

842
00:42:23,480 --> 00:42:24,960
It's more something I'm observing.

843
00:42:24,960 --> 00:42:27,080
So you're definitely way ahead
of me on this.

844
00:42:27,600 --> 00:42:28,100
Nikolay: Yeah.

845
00:42:28,360 --> 00:42:32,960
So I just know companies that use it, and
we have several customers who

846
00:42:32,960 --> 00:42:35,460
are on Postgres, but they chose
turbopuffer additionally.

847
00:42:36,660 --> 00:42:38,660
And Linear you mentioned, for example.

848
00:42:38,740 --> 00:42:41,140
Cursor and Linear, they also chose
turbopuffer.

849
00:42:41,840 --> 00:42:43,980
Notion chose turbopuffer to store
vectors.

850
00:42:44,140 --> 00:42:44,760
I'm just checking.

851
00:42:44,760 --> 00:42:45,060
They have

852
00:42:45,060 --> 00:42:47,080
Michael: Some cool customers on this list, yeah.

853
00:42:47,080 --> 00:42:47,580
Nikolay: Yeah.

854
00:42:49,300 --> 00:42:51,860
And there are several more companies which are not mentioned here,

855
00:42:51,860 --> 00:42:54,980
and they are our customers in terms of Postgres consulting.

856
00:42:55,760 --> 00:43:00,040
And they were super surprised to see that something like a massive

857
00:43:00,100 --> 00:43:02,540
migration of vectors is happening.

858
00:43:03,220 --> 00:43:07,080
Some moving company called turbopuffer helped them move their

859
00:43:07,080 --> 00:43:08,180
vectors to S3.

860
00:43:09,060 --> 00:43:14,360
But yeah, it's interesting and they use some different index,

861
00:43:14,580 --> 00:43:16,340
which is, like, a younger idea.

862
00:43:16,780 --> 00:43:24,460
It's based on clustering and centroids of vectors.

863
00:43:24,780 --> 00:43:30,390
So ANN is implemented differently, not graph-like as

864
00:43:30,420 --> 00:43:38,340
in HNSW, but you basically quickly understand which centroids of

865
00:43:38,340 --> 00:43:42,440
clusters are closer to our vector and then work with those clusters.

866
00:43:42,440 --> 00:43:45,480
Quite simple idea actually, but I guess there are many complexities

867
00:43:45,480 --> 00:43:46,820
in implementation there.

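The clustering-and-centroids search described here can be sketched as a toy IVF-style index. This is purely illustrative, not turbopuffer's actual implementation, and all names are made up:

```python
import numpy as np

def build_ivf(vectors, n_clusters=8, n_iters=10, seed=0):
    """Toy IVF index: k-means centroids plus per-cluster posting lists."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
    for _ in range(n_iters):
        # assign each vector to its nearest centroid, then recompute centroids
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # final assignment, consistent with the final centroids
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    clusters = {c: np.where(assign == c)[0] for c in range(n_clusters)}
    return centroids, clusters

def ann_search(query, vectors, centroids, clusters, n_probe=2, k=5):
    """Probe only the n_probe clusters whose centroids are closest to the
    query, then scan just those clusters exactly -- that's the whole trick."""
    order = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    cand = np.concatenate([clusters[c] for c in order])
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[dists.argsort()[:k]]
```

With n_probe equal to n_clusters this degrades to an exact scan; the speedup, and the recall loss, comes from probing only a few clusters, which is also part of what makes combining ANN with additional filters tricky.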
868
00:43:47,200 --> 00:43:47,700
Michael: Yeah.

869
00:43:48,420 --> 00:43:50,040
Well, and it's cheap, right?

870
00:43:50,220 --> 00:43:51,100
Being on S3.

871
00:43:51,500 --> 00:43:52,320
Nikolay: Oh, yes.

872
00:43:52,740 --> 00:43:53,500
And slow.

873
00:43:53,860 --> 00:43:57,180
But turbopuffer, I guess they have an additional layer

874
00:43:57,280 --> 00:44:00,700
to cache on regular disks closer to the database.

875
00:44:00,700 --> 00:44:05,960
So there's a caching layer, of course, yeah, but it's much cheaper,

876
00:44:06,540 --> 00:44:07,040
much.

877
00:44:08,040 --> 00:44:12,800
Actually, this is another area, if you have hundreds of terabytes

878
00:44:12,840 --> 00:44:18,240
of data, tiering of storage in Postgres is still not a fully solved

879
00:44:18,240 --> 00:44:18,740
problem.

880
00:44:20,440 --> 00:44:20,940
Right.

881
00:44:21,420 --> 00:44:23,080
Michael: Well, unless you shard, right?

882
00:44:23,260 --> 00:44:24,440
And this is like

883
00:44:24,440 --> 00:44:25,016
Nikolay: the next generation.

884
00:44:25,016 --> 00:44:27,440
It's too expensive to shard and keep everything on disks, especially

885
00:44:27,440 --> 00:44:31,320
if there is some archive data which you touch very rarely.

886
00:44:31,320 --> 00:44:33,540
I would prefer to have it on S3.

887
00:44:33,920 --> 00:44:37,700
And Timescale Cloud, TigerData, they solved it in their solution.

888
00:44:38,360 --> 00:44:42,840
We also had an attempt to solve it from Tembo, which is not a Postgres

889
00:44:42,840 --> 00:44:43,720
company anymore.

890
00:44:44,200 --> 00:44:44,700
Yeah.

891
00:44:45,060 --> 00:44:46,820
PGT, right, it was called.

892
00:44:46,820 --> 00:44:51,240
But this, I think, should be more and more needed

893
00:44:51,440 --> 00:44:52,280
over time.

894
00:44:52,440 --> 00:44:55,520
Michael: Well, and it's a side effect of, like, the Crunchy

895
00:44:55,520 --> 00:44:58,940
Data approach of putting things in Iceberg, that also solves

896
00:44:58,940 --> 00:44:59,600
the problem, right?

897
00:44:59,600 --> 00:45:02,220
You can archive the data from Postgres at that point.

898
00:45:02,440 --> 00:45:04,640
So it's a similar solution, isn't it?

899
00:45:04,640 --> 00:45:07,920
Nikolay: So I guess these days I would explore S3 vectors at

900
00:45:07,920 --> 00:45:08,640
this point.

901
00:45:08,680 --> 00:45:09,940
If I needed to.

902
00:45:10,160 --> 00:45:11,320
Maybe I will, actually.

903
00:45:11,320 --> 00:45:13,780
Michael: Well, you are going to need to, it sounds like.

904
00:45:13,920 --> 00:45:15,220
Nikolay: Well, yeah, yeah, yeah.

905
00:45:15,220 --> 00:45:19,020
Our PostgreSQL infrastructure is mostly on CloudSQL.

906
00:45:19,400 --> 00:45:21,360
Not CloudSQL, Google Cloud.

907
00:45:21,500 --> 00:45:23,160
Not CloudSQL, no, no, no.

908
00:45:23,260 --> 00:45:24,440
That was wrong.

909
00:45:25,320 --> 00:45:26,620
Google Cloud, so...

910
00:45:27,040 --> 00:45:28,020
Michael: 1 level up.

911
00:45:28,660 --> 00:45:29,200
Nikolay: Yeah, yeah, yeah.

912
00:45:29,200 --> 00:45:31,320
But S3 is AWS, so it's...

913
00:45:32,380 --> 00:45:34,020
But it's really interesting.

914
00:45:35,640 --> 00:45:38,440
Should be cheap, should be interesting to explore.

915
00:45:39,660 --> 00:45:42,840
And it's a big challenge to the Postgres ecosystem.

916
00:45:44,180 --> 00:45:47,760
Or maybe an opportunity if somebody creates a foreign data wrapper

917
00:45:47,760 --> 00:45:48,260
source.

918
00:45:50,140 --> 00:45:51,060
Actually, why not?

919
00:45:51,060 --> 00:45:52,000
It should be.

920
00:45:52,120 --> 00:45:53,600
It's a good project, by the way, right?

921
00:45:53,600 --> 00:45:55,740
So interface...

922
00:45:57,100 --> 00:46:00,060
There is a foreign data wrapper to S3 already, right?

923
00:46:00,620 --> 00:46:01,220
Michael: I think so.

924
00:46:01,220 --> 00:46:02,120
I think super...

925
00:46:02,580 --> 00:46:03,780
I don't know, I'll check.

926
00:46:04,020 --> 00:46:07,940
Nikolay: It should just be extended to have vector functions, in

927
00:46:07,940 --> 00:46:08,580
my opinion.

928
00:46:08,720 --> 00:46:09,840
Okay, enough.

929
00:46:09,920 --> 00:46:12,320
It's like a brainstorm mode already.

930
00:46:12,340 --> 00:46:13,720
Thank you so much.

931
00:46:13,920 --> 00:46:14,380
See you next week.

932
00:46:14,380 --> 00:46:15,567
Michael: Thanks, and catch you next week.

933
00:46:15,567 --> 00:46:15,730
Bye-bye.

934
00:46:15,730 --> 00:46:16,220
Nikolay: See you next week.