1
00:00:00,060 --> 00:00:02,200
Michael: Hello, and welcome to
Postgres.FM, a weekly show about

2
00:00:02,200 --> 00:00:03,140
all things PostgreSQL.

3
00:00:03,400 --> 00:00:05,760
I am Michael, founder of pgMustard,
and I'm joined, as usual,

4
00:00:05,760 --> 00:00:07,160
by Nikolay, founder of Postgres.AI.

5
00:00:07,440 --> 00:00:08,140
Hey, Nikolay.

6
00:00:08,360 --> 00:00:09,140
Nikolay: Hi, Michael.

7
00:00:09,640 --> 00:00:10,460
How are you?

8
00:00:10,760 --> 00:00:12,260
Michael: No, not falling for it.

9
00:00:12,260 --> 00:00:15,160
Today, we are delighted to be joined
by two guests.

10
00:00:15,160 --> 00:00:17,020
Firstly, Joe from Hydra.

11
00:00:17,020 --> 00:00:17,820
Welcome, Joe.

12
00:00:18,220 --> 00:00:18,880
Joe: Thank you.

13
00:00:19,280 --> 00:00:21,520
Michael: And also Jelte from MotherDuck.

14
00:00:21,900 --> 00:00:22,740
Welcome, Jelte.

15
00:00:23,360 --> 00:00:23,860
Jelte: Hi.

16
00:00:23,900 --> 00:00:24,600
Hi all.

17
00:00:24,840 --> 00:00:26,760
Michael: So thanks so much both
for joining us.

18
00:00:26,760 --> 00:00:31,740
We are going to be discussing a
new-ish project called pg_duckdb,

19
00:00:32,400 --> 00:00:35,900
which very interestingly is a cross
organization collaboration.

20
00:00:36,380 --> 00:00:39,520
So Jelte, I saw the recording
of your excellent lightning

21
00:00:39,520 --> 00:00:43,860
talk introducing pg_duckdb at PGConf
EU, nice job.

22
00:00:43,860 --> 00:00:46,940
Perhaps you could start us off
with how you describe pg_duckdb

23
00:00:47,160 --> 00:00:51,480
to a Postgres savvy audience who
may or may not know what DuckDB

24
00:00:51,480 --> 00:00:51,980
is.

25
00:00:52,960 --> 00:00:57,420
Jelte: All right, so it's an extension
for Postgres and it embeds

26
00:00:57,540 --> 00:00:58,040
DuckDB.

27
00:00:58,040 --> 00:01:03,180
And DuckDB is an in-process analytics
database, like SQL database.

28
00:01:03,700 --> 00:01:07,640
So it's a bit like SQLite, but
then more for analytics, like

29
00:01:07,640 --> 00:01:10,240
big queries with lots of joins
and stuff like that.

30
00:01:10,240 --> 00:01:14,480
And lots of joins and less like
SQLite or like Postgres in general

31
00:01:14,480 --> 00:01:18,160
for like smaller datasets, like getting
a hundred rows or something

32
00:01:18,160 --> 00:01:18,820
like that.

33
00:01:18,820 --> 00:01:21,840
It's index scans, stuff like that.

34
00:01:22,120 --> 00:01:25,320
Indexes are not something that's
commonly used in the DuckDB

35
00:01:25,320 --> 00:01:25,520
land.

36
00:01:25,520 --> 00:01:29,060
It's like, oh, we just scan the
whole table, and we compute stuff

37
00:01:29,060 --> 00:01:30,520
very fast on it.

38
00:01:30,700 --> 00:01:34,780
Yeah, and that is then embedded
into Postgres so that you can

39
00:01:34,780 --> 00:01:38,160
use that for some of your more
heavier queries that don't rely

40
00:01:38,160 --> 00:01:38,860
on indexes.

41
00:01:39,720 --> 00:01:42,340
Michael: Nice, and I'm sure we're
going to come back to indexes

42
00:01:42,340 --> 00:01:44,360
in a bit once we dive into the
details.

43
00:01:44,720 --> 00:01:48,480
But Joe, I think the origin story
of this project is super interesting.

44
00:01:48,580 --> 00:01:49,540
I'd be fascinated.

45
00:01:49,740 --> 00:01:52,600
I was reading the MotherDuck blog
post and it mentioned that

46
00:01:52,600 --> 00:01:54,220
Hydra kicked off the effort.

47
00:01:54,440 --> 00:01:57,720
So I'd love for you to share a
little bit more about that if

48
00:01:57,720 --> 00:01:58,300
you can.

49
00:01:59,440 --> 00:02:00,360
Joe: Sure, sure.

50
00:02:00,360 --> 00:02:04,440
So our team had been working on
a crazy weekend project that

51
00:02:04,440 --> 00:02:06,580
we initially were calling pg_quack.

52
00:02:07,420 --> 00:02:11,180
It was called pg_quack because
well, ducks go quack, but also

53
00:02:11,320 --> 00:02:14,640
because we didn't necessarily expect
this to work.

54
00:02:14,640 --> 00:02:17,340
We kind of were writing it off
as a quacky project.

55
00:02:17,800 --> 00:02:20,040
So it was a little bit of an internal
joke.

56
00:02:20,380 --> 00:02:25,900
But the performance of DuckDB was
looking super great when it

57
00:02:25,900 --> 00:02:29,820
came to all kinds of analytics
queries and aggregates like Jelte

58
00:02:29,820 --> 00:02:30,320
mentioned.

59
00:02:30,780 --> 00:02:35,940
And we had this aha moment that
we could actually take pg_quack

60
00:02:35,940 --> 00:02:39,800
pretty seriously and do it for
real in collaboration with folks

61
00:02:39,800 --> 00:02:42,180
that know a whole lot about DuckDB.

62
00:02:42,880 --> 00:02:46,700
You know, we're Postgres guys by
background mostly.

63
00:02:47,220 --> 00:02:53,340
So not to get too far ahead, but
we decided to build pg_duckdb

64
00:02:54,000 --> 00:02:59,820
for real as a production-grade
extension together with the folks

65
00:02:59,820 --> 00:03:03,920
over at DuckDB Labs, the creators
of DuckDB, which is Hannes

66
00:03:04,020 --> 00:03:08,200
and Mark, as well as the folks
at MotherDuck, like Jelte.

67
00:03:08,200 --> 00:03:12,600
So we've been working together
with them on it, oh, for at least

68
00:03:13,140 --> 00:03:15,140
the last 6 or 7 months now.

69
00:03:16,720 --> 00:03:17,660
Michael: Yeah, super cool.

70
00:03:17,660 --> 00:03:20,940
And I saw a few other organizations
mentioned as well.

71
00:03:21,620 --> 00:03:23,080
How do they all fit in?

72
00:03:23,300 --> 00:03:28,060
Joe: Yeah, so this is open source,
it's an MIT licensed extension.

73
00:03:28,780 --> 00:03:33,780
So for folks that want to participate,
or can see pg_duckdb being

74
00:03:33,820 --> 00:03:37,320
really helpful in their use cases,
they can either tell us directly

75
00:03:37,320 --> 00:03:40,840
what they'd like to add or open
a PR or open issues.

76
00:03:40,840 --> 00:03:43,500
I mean, certainly there are a good
number of those as the project

77
00:03:43,500 --> 00:03:44,880
is currently in beta.

78
00:03:45,060 --> 00:03:47,880
So be gentle with it, but also
tell us the things that could

79
00:03:47,880 --> 00:03:49,220
be improved on.

80
00:03:49,440 --> 00:03:53,420
One of the things that was surprising
to me was the immediate reaction

81
00:03:54,020 --> 00:04:00,460
that we saw to how DuckDB could
be useful for accelerating analytics

82
00:04:00,460 --> 00:04:01,340
queries in Postgres.

83
00:04:01,340 --> 00:04:03,840
And there are a lot of organizations
that went,

84
00:04:04,140 --> 00:04:05,720
Oh, yes, okay.

85
00:04:05,720 --> 00:04:07,740
We want to support this, we want
to get involved.

86
00:04:07,740 --> 00:04:10,060
So that's probably some of the
names you were seeing.

87
00:04:11,040 --> 00:04:13,940
Michael: Yeah, I think I saw, was
it Neon and Microsoft? Maybe

88
00:04:13,940 --> 00:04:15,200
some others as well.

89
00:04:15,660 --> 00:04:18,820
Joe: Yeah, there's a whole range
of folks that

90
00:04:18,820 --> 00:04:22,380
are interested in us moving this
forward, and they're providing

91
00:04:22,380 --> 00:04:24,980
either direct or indirect support
for the project.

92
00:04:25,520 --> 00:04:26,020
Michael: Yes.

93
00:04:27,120 --> 00:04:29,440
Jelte, from your side, what are
you...

94
00:04:29,440 --> 00:04:32,680
I'd welcome both of your thoughts
on this, but what are

95
00:04:32,680 --> 00:04:35,280
you seeing early users most excited
about?

96
00:04:35,280 --> 00:04:39,100
Like what features are they most
keen to see added or improved?

97
00:04:40,140 --> 00:04:45,560
Jelte: I think one of the big things
is that you can sort of read

98
00:04:45,660 --> 00:04:49,540
Parquet files or read CSV files
from blob storage, like from

99
00:04:49,540 --> 00:04:55,580
S3 or from Azure or from all the
big clouds basically, and Cloudflare

100
00:04:55,680 --> 00:04:56,340
as well.

101
00:04:56,400 --> 00:05:00,340
And I think that's a big feature that people are very interested

102
00:05:00,340 --> 00:05:00,840
in.
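
(A rough illustration for readers: reading remote files with pg_duckdb looks something like the query below. The bucket, path, and column names are invented, and the read_parquet() function with r['...'] column access follows the pg_duckdb README, so double-check the exact syntax there.)

```sql
-- Hypothetical bucket and column names; read_parquet() is the
-- pg_duckdb function that wraps DuckDB's Parquet reader.
SELECT r['event_type'], count(*)
FROM read_parquet('s3://my-bucket/events/*.parquet') r
GROUP BY r['event_type'];
```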

103
00:05:00,940 --> 00:05:04,840
And I mean, there's so many open issues of people wanting a specific

104
00:05:04,840 --> 00:05:09,000
thing that it's hard to pick one that's the thing people want.

105
00:05:10,080 --> 00:05:16,760
But one that has come up a few times is support for spatial stuff,

106
00:05:17,040 --> 00:05:20,060
like PostGIS things.

107
00:05:20,140 --> 00:05:23,600
And that works slightly differently in DuckDB, so we haven't started

108
00:05:23,600 --> 00:05:27,660
on that; we're first building out a bit more of the base. But

109
00:05:27,660 --> 00:05:30,040
that's definitely one of the things, one of the features, that

110
00:05:30,040 --> 00:05:34,040
comes up: people wanting to do spatial, let's do spatial

111
00:05:34,040 --> 00:05:34,780
data, basically.

112
00:05:35,660 --> 00:05:36,100
Yeah.

113
00:05:36,100 --> 00:05:38,520
But that's not supported right now.

114
00:05:38,520 --> 00:05:39,020
Nikolay: Yeah.

115
00:05:39,780 --> 00:05:44,020
So, one of the biggest features is that data can be on, I mean,

116
00:05:44,020 --> 00:05:46,240
it's not a feature, it's like a requirement here.

117
00:05:46,320 --> 00:05:48,840
Data is on object storage like S3, right?

118
00:05:49,260 --> 00:05:55,020
In Parquet format, file format, for example, and it's already

119
00:05:55,520 --> 00:05:56,820
column store.

120
00:05:58,140 --> 00:06:01,120
But like, I mean, it's unlike Postgres, right?

121
00:06:01,620 --> 00:06:07,620
But this sounds like, I remember some attempts to have access

122
00:06:07,700 --> 00:06:12,880
to Parquet files on object storage as an extension before pg_duckdb,

123
00:06:13,260 --> 00:06:13,760
right?

124
00:06:14,020 --> 00:06:18,960
And it's also like, it could be done in an FDW way, a foreign data

125
00:06:18,960 --> 00:06:19,820
wrapper way.

126
00:06:19,840 --> 00:06:25,940
I think some SQL interface over those files and that's it.

127
00:06:26,500 --> 00:06:29,720
But DuckDB is like bringing something else.

128
00:06:31,320 --> 00:06:35,460
I've never tried myself, but I've heard many opinions about very

129
00:06:35,460 --> 00:06:36,220
good performance.

130
00:06:37,040 --> 00:06:37,540
Right?

131
00:06:38,100 --> 00:06:38,560
Jelte: Yeah, yeah.

132
00:06:38,560 --> 00:06:43,660
So, that's one of the big differences between pg_duckdb and some

133
00:06:43,660 --> 00:06:46,680
of the other foreign data wrapper based ones.

134
00:06:47,220 --> 00:06:51,220
It's that we sort of flip it around a bit, instead of saying like,

135
00:06:51,220 --> 00:06:53,320
oh, this table is stored somewhere else.

136
00:06:53,480 --> 00:06:58,440
We'll read that using DuckDB, or something else.

137
00:06:58,520 --> 00:07:02,500
Like, instead we're saying, okay, we push the whole query,

138
00:07:02,960 --> 00:07:07,100
we put it in DuckDB, and then let DuckDB read the Postgres data.

139
00:07:07,840 --> 00:07:10,920
So the Postgres sort of becomes a foreign data wrapper, kind

140
00:07:10,920 --> 00:07:12,140
of inside DuckDB.

141
00:07:12,900 --> 00:07:13,780
Does that make sense?

142
00:07:13,780 --> 00:07:18,900
So it's more like a query stealing approach, is what Mario from

143
00:07:18,900 --> 00:07:21,290
Hydra, one of the developers there, called it.

144
00:07:21,290 --> 00:07:24,780
It's like, instead of letting the Postgres execution engine do

145
00:07:24,780 --> 00:07:27,520
a lot of things, it's like, we don't want the Postgres execution

146
00:07:27,520 --> 00:07:30,040
engine to do a lot of things because the DuckDB execution engine

147
00:07:30,040 --> 00:07:32,560
is much better for this analytic stuff, like the sorting and

148
00:07:32,560 --> 00:07:34,460
the grouping and the joining, stuff like that.

149
00:07:34,740 --> 00:07:37,700
We'd wanna do as much as possible inside DuckDB.

150
00:07:39,140 --> 00:07:40,220
Nikolay: Yeah, that's interesting.

151
00:07:41,040 --> 00:07:44,300
Joe: Yeah, and also, you know, kind of back to Michael's question

152
00:07:44,300 --> 00:07:48,340
a little bit too. A really common pattern for a lot of application

153
00:07:48,540 --> 00:07:52,900
developers is to have Postgres, but also to have an S3 bucket.

154
00:07:53,300 --> 00:07:56,480
There's a lot of event data that can get generated that really

155
00:07:56,480 --> 00:07:59,400
should not live on disk in Postgres.

156
00:07:59,620 --> 00:08:03,800
We were talking to a company yesterday that has generated, you

157
00:08:03,800 --> 00:08:09,620
know, 2.6 terabytes of event data that they really shouldn't

158
00:08:09,620 --> 00:08:13,080
be storing necessarily on Postgres, but they could offload into

159
00:08:13,080 --> 00:08:14,840
Parquet format in S3.

160
00:08:15,400 --> 00:08:18,900
But that data still needs to be accessible by Postgres.

161
00:08:19,340 --> 00:08:23,600
Because one of the important things is that pg_duckdb can perform

162
00:08:23,720 --> 00:08:28,680
the join between a heap table in Postgres together with the Parquet

163
00:08:28,680 --> 00:08:35,240
files in object storage so that there's not this siloing or need

164
00:08:35,280 --> 00:08:40,660
for ETL processes between Postgres and external analytics databases

165
00:08:40,680 --> 00:08:41,400
and S3.
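
(Sketching the join Joe describes, with invented table, file, and column names; the read_parquet() call follows the pg_duckdb README, and the exact syntax may differ by version.)

```sql
-- Join a regular Postgres heap table against Parquet files on S3,
-- all inside one query, with no ETL step. Names are illustrative.
SELECT o.id, o.status, sum(e['amount']) AS total
FROM orders o
JOIN read_parquet('s3://my-bucket/events/*.parquet') e
  ON e['order_id'] = o.id
GROUP BY o.id, o.status;
```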

166
00:08:41,820 --> 00:08:46,080
So how do you smooth over the differences between columnar format

167
00:08:46,100 --> 00:08:47,340
and row-based format?

168
00:08:47,720 --> 00:08:51,560
And I think pg_duckdb goes a long way to do that because ultimately

169
00:08:51,680 --> 00:08:55,460
when you're developing applications, thinking about table format

170
00:08:55,520 --> 00:08:56,740
is pretty low level.

171
00:08:56,920 --> 00:09:00,660
So if you can avoid that as much as possible, that's probably

172
00:09:00,660 --> 00:09:01,400
just easier.

173
00:09:02,380 --> 00:09:07,320
Nikolay: Yeah, but here I'm confused a little bit, because this

174
00:09:07,380 --> 00:09:10,360
thought is like, let's have tiering of storage.

175
00:09:11,000 --> 00:09:15,260
And, for example, I remember several cases, big companies where

176
00:09:15,300 --> 00:09:19,880
this was needed and the decision was made for scaling purposes

177
00:09:20,020 --> 00:09:21,360
of main clusters.

178
00:09:21,500 --> 00:09:26,740
We need to offload archive data somewhere else, maybe to different

179
00:09:26,740 --> 00:09:27,900
cluster or somewhere else.

180
00:09:27,900 --> 00:09:29,640
Maybe sometimes we just delete it.

181
00:09:29,640 --> 00:09:33,980
It could be, for example, e-commerce and old orders, order history,

182
00:09:34,540 --> 00:09:37,160
exceeding, for example, a couple of years; we don't need that.

183
00:09:37,360 --> 00:09:41,060
But offloading this data to S3, this sounds great and I remember

184
00:09:41,060 --> 00:09:44,440
there's also the extension pg_tier from Tembo, again I haven't tried

185
00:09:44,440 --> 00:09:46,660
it yet, it's on my to-do list.

186
00:09:47,020 --> 00:09:50,640
But offloading that, I'm thinking, if I also receive column storage

187
00:09:50,640 --> 00:09:52,940
for this kind of data, is it right?

188
00:09:53,240 --> 00:09:56,500
Because I still need only one row in these cases, right?

189
00:09:56,980 --> 00:10:01,460
In some cases, I do need like aggregates, like, I don't know,

190
00:10:01,460 --> 00:10:06,600
time series, I would love to upload to S3 in column store format,

191
00:10:06,900 --> 00:10:09,520
but not in these cases when I just need one row.

192
00:10:09,720 --> 00:10:12,800
Joe: Yeah, I mean, I think it's different needs for different

193
00:10:12,800 --> 00:10:13,300
users.

194
00:10:13,500 --> 00:10:17,300
If you are a, you know, application developer just trying to

195
00:10:17,300 --> 00:10:21,360
build something quickly, then if we can tell you, hey, dump a

196
00:10:21,360 --> 00:10:25,360
lot of your event data into Parquet format in S3, or archive

197
00:10:25,360 --> 00:10:29,260
it out of Postgres on some increment every 7 days, every 30 days

198
00:10:29,260 --> 00:10:31,480
or something, then that's easy enough.

199
00:10:31,800 --> 00:10:36,600
Then maybe the thinking about columnar versus row should subside.

200
00:10:37,200 --> 00:10:44,140
But yeah, I mean, data tiering and a data lake are not as different

201
00:10:44,140 --> 00:10:46,860
as, you know, they're really not
that different.

202
00:10:46,860 --> 00:10:49,500
It's just a function of what the
user interacts with.

203
00:10:49,740 --> 00:10:53,080
Nikolay: Right, but my concern
is if I need the row store, like

204
00:10:53,080 --> 00:10:56,280
I need to find all columns for
one row to present it.

205
00:10:56,280 --> 00:10:57,720
For example, again, orders, right?

206
00:10:57,720 --> 00:10:58,800
I need one order.

207
00:10:59,060 --> 00:11:01,940
If we pull it from archive, I expect
not good performance.

208
00:11:01,940 --> 00:11:05,300
But if it's column store, I will
be concerned about performance

209
00:11:05,580 --> 00:11:09,040
to collect all the rows versus
row storage.

210
00:11:09,720 --> 00:11:11,960
Let's formulate the question differently.

211
00:11:12,400 --> 00:11:17,160
Let's talk about the use cases
where this particular pattern

212
00:11:17,160 --> 00:11:21,980
when we need to upload a lot of
data to S3 and benefit from column

213
00:11:21,980 --> 00:11:25,920
store, analytical queries like
aggregates, what kind of data

214
00:11:25,920 --> 00:11:28,760
it could be in existing Postgres
database?

215
00:11:29,380 --> 00:11:32,820
It can be SAS or something, I don't
know like what kind of data

216
00:11:33,420 --> 00:11:36,500
is beneficial for us to offload
to this?

217
00:11:36,940 --> 00:11:39,280
Jelte: A very common case is like
time series data.

218
00:11:39,280 --> 00:11:39,380
I think

219
00:11:39,380 --> 00:11:41,820
that's one of the, that's
by far one of the most common

220
00:11:41,820 --> 00:11:45,560
cases where it's like, okay, at
least sort of the old data we

221
00:11:45,720 --> 00:11:49,160
store in, I mean, you're not going
to look up a single row in

222
00:11:49,160 --> 00:11:50,720
some series of events.

223
00:11:51,900 --> 00:11:55,440
Or if you do, if it's like a week
old, it's not so terrible, because

224
00:11:55,440 --> 00:11:57,360
you don't look that up too often.

225
00:11:57,360 --> 00:11:58,960
So it's like, okay, it's fine.

226
00:11:59,020 --> 00:12:01,720
If that takes a little bit longer,
maybe you're fetching a bit too

227
00:12:01,720 --> 00:12:03,780
much more data than you actually
need.

228
00:12:03,900 --> 00:12:06,840
If that means that you can do these
aggregates much quicker.

229
00:12:06,940 --> 00:12:09,320
I think that's one of the most common
cases, I would say.

230
00:12:09,320 --> 00:12:10,460
Nikolay: Yeah, that makes sense.

231
00:12:10,460 --> 00:12:13,860
And that means that, for example,
Timescale Cloud, they have

232
00:12:13,860 --> 00:12:18,240
this bottomless Timescale, offloading
some data transparently, so

233
00:12:18,240 --> 00:12:23,080
we don't see it, but it goes, some
archive data goes to S3, but

234
00:12:23,080 --> 00:12:24,360
this is not open source.

235
00:12:25,080 --> 00:12:27,040
pg_duckdb is open source, which I like.

236
00:12:27,040 --> 00:12:30,800
You mentioned MIT license, that's
great, but I see some kind

237
00:12:30,800 --> 00:12:33,680
of, not competition, but alternative
here.

238
00:12:33,680 --> 00:12:37,660
I can choose between going to Timescale
Cloud if I need this,

239
00:12:37,660 --> 00:12:40,940
or just use it if I can.

240
00:12:41,780 --> 00:12:45,900
So, I mean, if I can install the extension,
which is not available

241
00:12:46,020 --> 00:12:47,540
in managed services yet.

242
00:12:48,260 --> 00:12:49,940
But yeah, it makes sense to me.

243
00:12:50,280 --> 00:12:55,720
Indeed, especially if data is old,
and I expect it to go to object

244
00:12:55,720 --> 00:12:58,580
storage, then definitely, like
usually, I'm not interested in a

245
00:12:58,580 --> 00:13:00,260
single row, I'm interested in aggregates.

246
00:13:01,160 --> 00:13:01,440
Right?

247
00:13:01,440 --> 00:13:02,980
Yeah, this makes sense for me.

248
00:13:02,980 --> 00:13:03,480
Cool.

249
00:13:04,020 --> 00:13:05,600
Michael: That's a good question
though.

250
00:13:05,600 --> 00:13:09,060
Are there any, I'm sure you're
talking to lots, but are there

251
00:13:09,060 --> 00:13:12,700
any Postgres managed services that
already offer it in a preview

252
00:13:12,700 --> 00:13:13,760
or developer mode?

253
00:13:14,380 --> 00:13:18,080
Jelte: I mean, Hydra is offering
it, but other than that, not

254
00:13:18,080 --> 00:13:18,580
yet.

255
00:13:18,580 --> 00:13:21,180
But yeah, we're definitely in talks
with a bunch of them.

256
00:13:21,760 --> 00:13:24,740
Yeah, it's a project that many
people are interested in.

257
00:13:24,840 --> 00:13:25,820
Michael: Yeah, nice.

258
00:13:26,540 --> 00:13:31,400
Joe: Yeah, so Hydra currently has
a closed beta of a fully managed

259
00:13:31,400 --> 00:13:34,940
Postgres service that has pg_duckdb
extension installed.

260
00:13:35,380 --> 00:13:35,880
Nikolay: Nice.

261
00:13:36,820 --> 00:13:40,560
Let's talk about some numbers,
if you can pull them from the top

262
00:13:40,560 --> 00:13:41,400
of your head.

263
00:13:41,400 --> 00:13:47,600
For example, if I need to aggregate
over many millions of rows which

264
00:13:47,600 --> 00:13:48,840
are sparsely distributed.

265
00:13:49,200 --> 00:13:54,100
We know like in the worst case,
for example, if I touch, I don't

266
00:13:54,100 --> 00:13:59,540
know, 1 million rows, where each
one is stored in a separate buffer

267
00:13:59,540 --> 00:14:01,280
in the worst case, for example,
right?

268
00:14:01,300 --> 00:14:04,460
Very roughly, I need to touch,
and in the worst case, it's not

269
00:14:04,460 --> 00:14:05,280
hits, it's reads.

270
00:14:05,280 --> 00:14:11,540
I need to get from disk 1 million
buffers, I mean, pages, right?

271
00:14:11,820 --> 00:14:14,440
And in the worst case, it can be
absolutely worst.

272
00:14:14,440 --> 00:14:17,920
I can imagine it may be like 1
millisecond for each if latency

273
00:14:17,920 --> 00:14:22,560
is bad and we do it without
parallelization, it's 1,000

274
00:14:22,740 --> 00:14:26,940
seconds, it's terrible, right? And
this is what I can see in Postgres

275
00:14:27,480 --> 00:14:31,880
in some cases. What should I expect
here if I need, for example,

276
00:14:31,880 --> 00:14:36,740
to calculate a sum over rows, like
1 million rows stored in this

277
00:14:36,740 --> 00:14:37,240
format?

278
00:14:37,860 --> 00:14:39,860
What like is it seconds, minutes?

279
00:14:40,520 --> 00:14:42,220
Jelte: You mean if it's sort of
blob storage?

280
00:14:42,620 --> 00:14:44,300
Like how many rows?

281
00:14:44,300 --> 00:14:48,480
Nikolay: 1 million rows and some
simple aggregates like count, sum,

282
00:14:48,480 --> 00:14:49,240
or something.

283
00:14:49,640 --> 00:14:50,640
What do you see?

284
00:14:51,380 --> 00:14:55,320
Jelte: I would say it's generally
within a second, I would say,

285
00:14:55,320 --> 00:14:57,780
or within a few seconds at least.

286
00:14:57,780 --> 00:15:01,420
Nikolay: Including data transfer
from S3, everything, right?

287
00:15:01,520 --> 00:15:02,150
Jelte: Yeah, yes.

288
00:15:02,150 --> 00:15:05,440
So because it's a Parquet format,
you only get the columns that

289
00:15:05,440 --> 00:15:06,600
you actually requested.

290
00:15:06,600 --> 00:15:10,080
So it's already, even if it's a
lot of data, you generally just

291
00:15:10,080 --> 00:15:11,740
need a part of it, because you have a bunch of columns.

292
00:15:12,500 --> 00:15:17,060
So you only grab the data from
S3 that you actually need.

293
00:15:17,480 --> 00:15:21,220
And that is then sort of streamed
through the DuckDB engine to get

294
00:15:21,220 --> 00:15:22,580
these aggregates.

295
00:15:22,820 --> 00:15:24,860
Nikolay: Yeah, that's, that's super
cool.

296
00:15:25,080 --> 00:15:27,840
Jelte: But I mean, it's like with
all these questions, it always

297
00:15:27,840 --> 00:15:30,180
depends a bit on what, on what
you're doing exactly.

298
00:15:30,200 --> 00:15:34,780
But for instance, there are
ClickBench

299
00:15:34,780 --> 00:15:35,280
results.

300
00:15:35,780 --> 00:15:40,460
ClickHouse has a benchmark and
there pg_duckdb is quite...

301
00:15:40,840 --> 00:15:43,940
From the Postgres compatible ones, 
it's...

302
00:15:44,180 --> 00:15:46,200
I don't think it's at the top.

303
00:15:46,240 --> 00:15:49,040
The one with MotherDuck support is 
almost at the top, I think, but

304
00:15:49,040 --> 00:15:53,100
the one with Parquet is a bit 
lower, but still much higher

305
00:15:53,100 --> 00:15:55,140
than tuned Postgres, I think.

306
00:15:56,940 --> 00:16:00,240
I don't know it by heart, but I think
it's at least 10 times faster

307
00:16:00,240 --> 00:16:03,780
than the tuned Postgres version, 
but maybe even more.

308
00:16:04,000 --> 00:16:05,160
Nikolay: Yeah, that's great.

309
00:16:05,160 --> 00:16:09,860
And I can imagine this opens possibilities 
to stop saying don't

310
00:16:09,860 --> 00:16:11,260
do analytics in Postgres.

311
00:16:14,760 --> 00:16:18,460
So this can be a transaction which 
does aggregates in Postgres

312
00:16:18,560 --> 00:16:19,900
if it's within seconds.

313
00:16:20,420 --> 00:16:24,400
We don't have negative consequences 
of long-running transactions

314
00:16:24,480 --> 00:16:25,380
anymore, right?

315
00:16:25,640 --> 00:16:27,180
It's not only about user experience.

316
00:16:27,180 --> 00:16:31,840
It's also about how Postgres performs 
vacuuming, still, and so

317
00:16:31,840 --> 00:16:32,020
on.

318
00:16:32,020 --> 00:16:34,380
This xmin horizon problem and so 
on, right?

319
00:16:35,140 --> 00:16:35,700
Jelte: Yeah, yeah.

320
00:16:36,180 --> 00:16:42,100
Joe: Yeah, I mean, pg_duckdb was 
really kicked off to try and

321
00:16:42,100 --> 00:16:45,840
solve the known limitations of 
analytics in Postgres.

322
00:16:46,420 --> 00:16:51,100
You know, not even at huge scales, 
Postgres can be fairly inefficient

323
00:16:51,220 --> 00:16:53,300
for doing analytical processing.

324
00:16:53,940 --> 00:16:58,040
So, you know, DuckDB, that's what 
it excels at.

325
00:16:58,140 --> 00:17:02,320
We're trying to get the project 
to the point where the user doesn't

326
00:17:02,320 --> 00:17:05,540
necessarily need to know very much 
about DuckDB at all.

327
00:17:05,980 --> 00:17:10,200
Since these are Postgres users, 
the service should look and feel

328
00:17:10,200 --> 00:17:13,040
exactly like Postgres for the most 
part.

329
00:17:13,500 --> 00:17:17,960
And then we make those differences 
clear. If you were to run an

330
00:17:17,960 --> 00:17:22,860
EXPLAIN plan, it will show you 
exactly where and when DuckDB

331
00:17:22,900 --> 00:17:26,320
is taking over for that execution 
if you want to look a little

332
00:17:26,320 --> 00:17:27,040
bit deeper.

333
00:17:28,140 --> 00:17:30,880
Michael: Would that be like a custom 
scan node or how would it

334
00:17:30,880 --> 00:17:31,380
show?

335
00:17:31,960 --> 00:17:33,240
Jelte: Yeah, that's a custom scan 
node.

336
00:17:33,240 --> 00:17:36,020
And basically, it's always like 
it either takes completely over

337
00:17:36,020 --> 00:17:37,720
or it doesn't take over at all.

338
00:17:37,720 --> 00:17:41,360
There's no, I mean, that's sort 
of one of the things we'll probably

339
00:17:41,360 --> 00:17:43,980
want to add at some point, like 
to say like, oh, this part is

340
00:17:43,980 --> 00:17:47,200
actually, needs to be executed 
in Postgres because DuckDB doesn't

341
00:17:47,200 --> 00:17:48,380
know how to do it.

342
00:17:48,480 --> 00:17:51,420
Right now, it's either everything 
or nothing.

343
00:17:51,420 --> 00:17:53,760
If you want to do something that 
DuckDB doesn't support, but

344
00:17:53,760 --> 00:17:56,460
still needs DuckDB execution, then
it just says, look at this query,

345
00:17:56,600 --> 00:17:58,440
it's not supported at the moment.

346
00:17:58,900 --> 00:18:03,000
But yeah, it's one big custom scan
node, basically, that does the

347
00:18:03,000 --> 00:18:04,020
DuckDB execution.
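
(What that can look like in practice, roughly; duckdb.force_execution is a pg_duckdb setting, and the exact custom scan node name in the plan may vary by version.)

```sql
-- Ask pg_duckdb to route the query through DuckDB, then inspect the plan.
SET duckdb.force_execution = true;
EXPLAIN SELECT count(*) FROM orders;
-- Expect one custom scan node wrapping the whole query
-- (e.g. "Custom Scan (DuckDBScan)") instead of a Postgres plan tree.
```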

348
00:18:05,060 --> 00:18:08,340
Nikolay: And do you see situations 
when you need to tune something?

349
00:18:08,680 --> 00:18:12,340
There are some tuning capabilities 
inside DuckDB, right?

350
00:18:12,940 --> 00:18:17,400
But if we see only one node in the 
plan, we cannot do it from there.

351
00:18:18,300 --> 00:18:23,160
What's the workflow to understand how efficient it is?

352
00:18:24,280 --> 00:18:28,040
Jelte: So, I mean, at the moment, the main things you can tune

353
00:18:28,040 --> 00:18:31,560
in pg_duckdb are like just the amount of threads that DuckDB

354
00:18:31,560 --> 00:18:34,260
uses and the amount of memory that it's allowed to use.

355
00:18:34,400 --> 00:18:37,640
Because it's, I mean, because you're running it inside... normally

356
00:18:37,640 --> 00:18:40,580
DuckDB really likes to take all the resources of the system and

357
00:18:40,580 --> 00:18:41,840
go as fast as it can.

358
00:18:41,840 --> 00:18:48,020
So that's, I mean, your laptop tries to fly away and overheats.

359
00:18:48,580 --> 00:18:50,040
That's what it wants to do.

360
00:18:50,740 --> 00:18:54,280
But, I mean, if you were inside Postgres, that's probably not

361
00:18:54,280 --> 00:18:56,540
what you want, because you also want to run some other queries.

362
00:18:56,980 --> 00:18:58,920
So that's why we have some limits.

363
00:18:59,340 --> 00:19:04,080
So I think by default, it will try to use as many threads as

364
00:19:04,080 --> 00:19:06,960
are available, just so that it completes quickly, but at least

365
00:19:06,960 --> 00:19:08,700
the memory it does try to limit.

366
00:19:09,060 --> 00:19:11,620
But you can lower both or increase both.
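
(For illustration only: the setting names below are assumptions from memory, not verified; consult the pg_duckdb documentation for the exact GUCs your version exposes.)

```sql
-- Hypothetical GUC names, capping DuckDB's appetite inside Postgres.
SET duckdb.max_memory = '4GB';  -- limit the memory DuckDB may use
SET duckdb.threads = 4;         -- limit the number of DuckDB threads
```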

367
00:19:12,740 --> 00:19:15,540
Nikolay: And we can run these queries on Postgres standby nodes,

368
00:19:15,540 --> 00:19:15,960
right?

369
00:19:15,960 --> 00:19:16,600
On replicas?

370
00:19:18,060 --> 00:19:19,140
Jelte: Yes, you should be able to.

371
00:19:19,140 --> 00:19:23,120
I don't think I've personally tried yet, but I think that should

372
00:19:23,120 --> 00:19:23,740
be fine.

373
00:19:24,120 --> 00:19:27,780
And if it doesn't work, please open a PR or, like, an issue.

374
00:19:28,820 --> 00:19:32,020
Nikolay: I'm already thinking about a common approach, like I

375
00:19:32,020 --> 00:19:34,960
would say a traditional approach right now is to have OLTP database

376
00:19:34,960 --> 00:19:40,140
Postgres for everything but for analytics we offload data to

377
00:19:40,140 --> 00:19:41,920
some other system.

378
00:19:42,520 --> 00:19:47,000
Or, like with Hydra, it's a different table with column storage

379
00:19:47,140 --> 00:19:52,400
as I remember, it's also additional effort to keep the data

380
00:19:52,840 --> 00:19:53,340
synchronized.

381
00:19:54,240 --> 00:19:59,340
Here if some data, okay, again, maybe also here, maybe it makes

382
00:19:59,340 --> 00:20:03,560
sense in some cases to have original table stored in Postgres

383
00:20:03,560 --> 00:20:08,140
in row storage, but this table, maybe with projections,

384
00:20:08,300 --> 00:20:12,440
some columns only, goes to this additional table which is stored

385
00:20:12,440 --> 00:20:16,820
in this format in S3, Parquet, DuckDB processed, right?

386
00:20:17,420 --> 00:20:19,000
And this is also an interesting question.

387
00:20:19,000 --> 00:20:20,380
Do you see it?

388
00:20:20,380 --> 00:20:22,340
But I wanted to ask a different question.

389
00:20:22,360 --> 00:20:26,920
So if we do this, we can run analytical queries right inside.

390
00:20:26,920 --> 00:20:30,340
And I already see that it makes sense if it just selects to do

391
00:20:30,340 --> 00:20:33,800
it on replicas, not to, like, primaries.

392
00:20:33,800 --> 00:20:37,420
We have only 1 single primary in a single cluster, so it's not

393
00:20:37,420 --> 00:20:43,780
good to take memory for this, and so on, and affect OLTP workload

394
00:20:43,860 --> 00:20:44,360
processing.

395
00:20:44,680 --> 00:20:48,300
In this case, it's great if it's fast, because hot_standby_feedback

396
00:20:48,400 --> 00:20:52,440
is on, like, the xmin horizon is almost not affected, great.

397
00:20:52,900 --> 00:20:57,800
But there, I probably would like to give all memory, all CPU

398
00:20:57,800 --> 00:21:01,980
power for this processing if, maybe, it's a dedicated replica.

399
00:21:02,720 --> 00:21:07,540
Since I'm new to this, sorry if my question is not right, but

400
00:21:07,540 --> 00:21:09,520
what about compression?

401
00:21:09,960 --> 00:21:15,360
So do I have any capabilities to
choose compression method and

402
00:21:15,360 --> 00:21:17,460
ratio and so on in case of Parquet?

403
00:21:17,620 --> 00:21:24,260
Joe: Well, if you were to copy
the heap table to Parquet format

404
00:21:24,560 --> 00:21:29,480
using pg_duckdb, then, okay, you
know, I think you'd see what,

405
00:21:30,400 --> 00:21:32,640
5 times compression.

406
00:21:33,220 --> 00:21:34,120
Nikolay: Depends on data, right?

407
00:21:34,120 --> 00:21:38,160
Joe: At worst, depending on the
data, 5 times, sometimes you

408
00:21:38,160 --> 00:21:39,640
see 10 times compression.

409
00:21:40,440 --> 00:21:43,640
But this can also be the case for
the other columnar formats

410
00:21:43,640 --> 00:21:44,100
as well.

411
00:21:44,100 --> 00:21:50,080
pg_duckdb supports the Iceberg reader
as well as the Delta table

412
00:21:50,080 --> 00:21:50,410
reader.

413
00:21:50,410 --> 00:21:56,080
You know, you could use Delta tables
or Iceberg files in object

414
00:21:56,080 --> 00:21:57,080
storage as well.

415
00:21:57,600 --> 00:22:01,720
All of those will have better compression
than, you know, than

416
00:22:01,720 --> 00:22:03,840
just standard Postgres heap tables.

417
00:22:04,640 --> 00:22:07,220
Nikolay: Yeah, this is what I expected.

418
00:22:07,540 --> 00:22:08,460
And I'm just...

419
00:22:08,800 --> 00:22:12,540
As I understand, there's no control,
you just get it, that's

420
00:22:12,540 --> 00:22:12,940
it.

421
00:22:12,940 --> 00:22:13,940
For now, you can...

422
00:22:14,680 --> 00:22:15,700
Jelte: I'm not entirely sure.

423
00:22:15,700 --> 00:22:17,220
It might be that the copy...

424
00:22:17,220 --> 00:22:21,400
I mean, we sort of expose all the
copy options that DuckDB exposes.

425
00:22:22,360 --> 00:22:28,640
So I think they might expose like
a compression option that you

426
00:22:28,640 --> 00:22:31,820
can say like which compression
type you want to use to write

427
00:22:31,820 --> 00:22:33,000
the Parquet file?

428
00:22:33,080 --> 00:22:36,880
Nikolay: Yeah, because usually
it's a trade-off between CPU utilization

429
00:22:38,420 --> 00:22:39,000
and

430
00:22:39,000 --> 00:22:39,140
the size.

431
00:22:39,380 --> 00:22:40,140
Jelte: Yeah, yeah.

432
00:22:40,140 --> 00:22:43,700
Nikolay: Yeah, this discussion of CPU triggers
this question in my head,

433
00:22:43,700 --> 00:22:49,200
and I know column storage is beating
Postgres completely in terms

434
00:22:49,200 --> 00:22:49,840
of compression.

435
00:22:50,140 --> 00:22:55,580
I remember a funny case when I
said 1 billion rows is 1 terabyte,

436
00:22:56,180 --> 00:23:02,140
but VictoriaMetrics CTO said,
how come it's just 1 gigabyte?

437
00:23:02,960 --> 00:23:05,340
And we could not understand each
other, right?

438
00:23:05,340 --> 00:23:07,280
Because in Postgres it's indeed
1 terabyte.

439
00:23:08,180 --> 00:23:12,320
But yeah, 5x, you said that's the
lowest, that's great.

440
00:23:12,340 --> 00:23:14,600
And I think it's also cost-saving.

441
00:23:15,060 --> 00:23:18,460
And it's not only disk saving in
terms of, if you offload it

442
00:23:18,480 --> 00:23:20,660
to S3, but also memory, right?

443
00:23:20,660 --> 00:23:25,580
If we have 1 primary multiple standby
nodes, we like, in many

444
00:23:25,580 --> 00:23:29,480
cases, the shared buffer pool and
page cache state is mirrored

445
00:23:29,800 --> 00:23:31,240
between standby nodes.

446
00:23:31,300 --> 00:23:34,340
And if this is data, which we don't
need often, it's not good

447
00:23:34,340 --> 00:23:38,000
in terms of cache efficiency,
right? If you offload it, you

448
00:23:38,000 --> 00:23:44,240
need less memory
to process your workload

449
00:23:44,240 --> 00:23:47,620
and this is probably a huge cost
saving as well, right?

450
00:23:48,220 --> 00:23:52,620
Jelte: If you offload the compute
to the standby node, you mean,

451
00:23:52,900 --> 00:23:53,820
Nikolay: or are you offloading?

452
00:23:54,140 --> 00:23:57,360
In general, we take some data,
we offload it, like if it's archived

453
00:23:57,360 --> 00:24:00,560
data, we don't need it often, and
it also has some patterns

454
00:24:00,780 --> 00:24:02,360
where column storage makes sense.

455
00:24:02,360 --> 00:24:04,860
We offload it to S3 using this approach.

456
00:24:06,260 --> 00:24:10,840
And if it's processed by DuckDB,
like, well, it still needs

457
00:24:10,840 --> 00:24:11,580
memory, right?

458
00:24:11,580 --> 00:24:14,860
It still needs memory if we often
execute some aggregates, so

459
00:24:14,860 --> 00:24:18,500
we consume memory on Postgres standby
anyway, right?

460
00:24:19,260 --> 00:24:19,760
Jelte: Yes.

461
00:24:20,080 --> 00:24:23,480
Yeah, but even then the memory
can still be less than it would

462
00:24:23,480 --> 00:24:26,280
be on Postgres tables, because
also in memory sometimes it's

463
00:24:26,280 --> 00:24:26,780
compressed.

464
00:24:27,040 --> 00:24:31,560
Like if there's many values that
are the same, sometimes it will

465
00:24:31,560 --> 00:24:35,100
just store the count of the values
and the actual value.

466
00:24:36,220 --> 00:24:39,400
It's just like repeat N times,
basically.
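The "store the count and the value" trick Jelte describes is run-length encoding. A minimal sketch:

```python
def rle_encode(values):
    """Run-length encode a column: consecutive equal values collapse
    into (value, count) pairs, the trick column stores use so that
    a long run of identical values costs almost no memory."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

column = ["US"] * 5 + ["NL"] * 3 + ["US"] * 2
encoded = rle_encode(column)
print(encoded)  # [('US', 5), ('NL', 3), ('US', 2)]
assert rle_decode(encoded) == column
```

Real engines like DuckDB pick per-column-chunk among several such encodings (RLE, dictionary, bit-packing); this shows only the simplest one.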

467
00:24:41,000 --> 00:24:42,780
So that also makes the compute...

468
00:24:42,900 --> 00:24:46,240
That's 1 of the tricks that these
column storage engines generally

469
00:24:46,240 --> 00:24:50,260
use to improve the compute, and
also the memory usage.

470
00:24:50,380 --> 00:24:53,980
Nikolay: Right, and again, if you
do it in row storage, we pull

471
00:24:53,980 --> 00:24:57,180
all those blocks with all columns
and it's very inefficient in terms

472
00:24:57,180 --> 00:24:57,700
of memory consumption.

473
00:24:57,700 --> 00:24:59,840
Jelte: Yeah, and they all have
the exact same value, basically.

474
00:25:00,060 --> 00:25:01,020
Nikolay: Yeah, yeah, yeah.

475
00:25:01,020 --> 00:25:04,980
So this makes sense completely
to me.

476
00:25:05,900 --> 00:25:11,060
Joe: You know, the common pattern
for so many, so many SaaS companies

477
00:25:11,380 --> 00:25:16,720
is that really only the last 30
days of data or so is being updated

478
00:25:16,780 --> 00:25:17,540
very frequently.

479
00:25:17,900 --> 00:25:21,680
But then after that, the data maybe
shouldn't be deleted, but

480
00:25:21,680 --> 00:25:22,860
it should still be accessible.

481
00:25:23,860 --> 00:25:29,880
And it can live in object storage,
which is 2 cents per gigabyte

482
00:25:30,340 --> 00:25:35,880
per month, as opposed to on disk
in Postgres, where it can be 10 cents,

483
00:25:36,100 --> 00:25:39,100
15 cents per gigabyte per month
for some services.
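Joe's price comparison works out like this, using his rough per-gigabyte figures plus the roughly 5x compression mentioned earlier (actual cloud prices vary):

```python
def monthly_storage_cost(raw_gb, price_per_gb, compression_ratio=1.0):
    """Monthly bill for storing `raw_gb` of data at `price_per_gb`,
    after dividing by an assumed columnar compression ratio."""
    return raw_gb / compression_ratio * price_per_gb

raw_gb = 1000  # 1 TB of historical data
on_postgres = monthly_storage_cost(raw_gb, 0.10)      # ~10 c/GB block storage
on_s3 = monthly_storage_cost(raw_gb, 0.02, 5.0)       # ~2 c/GB, 5x Parquet compression
print(f"Postgres disk: ${on_postgres:.2f}/mo, S3 Parquet: ${on_s3:.2f}/mo")
```

The compounding of cheaper storage and compression is what makes the gap large: here it is 25x, not just the 5x from the per-gigabyte price alone.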

484
00:25:39,760 --> 00:25:44,440
So not only is it cheaper from
a cost perspective of the raw

485
00:25:44,440 --> 00:25:47,840
cost per gigabyte, but also the
compression as you mentioned

486
00:25:48,060 --> 00:25:50,520
just earlier is quite good.

487
00:25:50,820 --> 00:25:55,160
So you can essentially archive
data, still have it accessible

488
00:25:55,160 --> 00:25:56,020
to the application.
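The "last 30 days hot, older data archived" pattern Joe describes can be sketched as a split by age. This is illustrative only: the function name and the `(id, updated_at)` row shape are made up for the example.

```python
from datetime import datetime, timedelta, timezone

def split_hot_cold(rows, now=None, hot_days=30):
    """Split rows into a 'hot' set (updated within the last `hot_days`,
    stays in the Postgres heap) and a 'cold' set (archive candidates
    for Parquet on object storage).  Rows are (id, updated_at) pairs."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=hot_days)
    hot = [r for r in rows if r[1] >= cutoff]
    cold = [r for r in rows if r[1] < cutoff]
    return hot, cold

now = datetime(2024, 12, 1, tzinfo=timezone.utc)
rows = [(1, now - timedelta(days=2)), (2, now - timedelta(days=400))]
hot, cold = split_hot_cold(rows, now=now)
print(len(hot), len(cold))  # 1 1
```

In practice the cold set would be written out with something like pg_duckdb's COPY-to-Parquet path rather than kept in Python lists.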

489
00:25:56,540 --> 00:25:59,480
And pg_duckdb also does support
caching.

490
00:25:59,860 --> 00:26:04,440
So you would be able to cache the
parquet files from S3 to where

491
00:26:04,440 --> 00:26:08,900
Postgres is, so that you can avoid
running over the network to

492
00:26:08,900 --> 00:26:12,340
connecting to S3, because that
would be really 1 of the major

493
00:26:12,340 --> 00:26:17,080
costs, is running over from Postgres
to S3 to retrieve those

494
00:26:17,080 --> 00:26:17,580
files.

495
00:26:17,800 --> 00:26:21,240
Nikolay: Are you talking about
caching in memory or on disk,

496
00:26:21,380 --> 00:26:22,860
like an EBS volume or something?

497
00:26:22,880 --> 00:26:25,280
Jelte: On disk caching.

498
00:26:26,720 --> 00:26:29,720
Basically, it downloads the file
and then it's like, oh, this

499
00:26:29,720 --> 00:26:32,580
1, we were going to use it a lot
of times, it doesn't change.
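The caching idea, pay the network cost once for an immutable remote file and read locally after that, can be sketched like this (a toy model, not pg_duckdb's actual cache implementation):

```python
import hashlib
import os
import tempfile

class FileCache:
    """Toy version of the on-disk caching described above: the first
    read of an immutable remote file (e.g. a Parquet object in S3)
    pays the network cost, later reads hit local disk."""
    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir   # local directory for cached copies
        self.fetch = fetch           # callable: url -> bytes (the "network")
        self.network_reads = 0

    def get(self, url):
        key = hashlib.sha256(url.encode()).hexdigest()
        path = os.path.join(self.cache_dir, key)
        if not os.path.exists(path):        # cache miss: fetch once
            self.network_reads += 1
            with open(path, "wb") as f:
                f.write(self.fetch(url))
        with open(path, "rb") as f:         # every later read is local
            return f.read()

with tempfile.TemporaryDirectory() as d:
    cache = FileCache(d, fetch=lambda url: b"parquet bytes for " + url.encode())
    cache.get("s3://bucket/events.parquet")
    cache.get("s3://bucket/events.parquet")
    print(cache.network_reads)  # 1
```

This only works because the cached objects are immutable, which is exactly the "it doesn't change" condition Jelte mentions; mutable files would need validation against an ETag or version.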

500
00:26:32,980 --> 00:26:34,940
Nikolay: Yeah, that's interesting.

501
00:26:34,940 --> 00:26:39,060
And also, I noticed that, of course,
writes are supported, right?

502
00:26:39,060 --> 00:26:43,480
So we can update this data, right?

503
00:26:43,480 --> 00:26:45,520
Insert, update, and so on.

504
00:26:45,760 --> 00:26:50,680
And in this case, I'm very curious
about the performance of writes

505
00:26:50,680 --> 00:26:51,340
as well.

506
00:26:52,480 --> 00:26:54,920
Is it like, how to compare it?

507
00:26:54,920 --> 00:26:56,420
What's the sense of it?

508
00:26:57,500 --> 00:27:02,280
Jelte: I think writes are, I mean,
you can do, like you can store

509
00:27:02,280 --> 00:27:07,440
like a Parquet file, but I don't
think we currently support updating

510
00:27:07,500 --> 00:27:08,980
values in a Parquet file.

511
00:27:09,060 --> 00:27:11,620
We do support writing via COPY.

512
00:27:11,680 --> 00:27:15,240
You put a query and you put all
the results in 1 big Parquet

513
00:27:15,240 --> 00:27:15,740
file.

514
00:27:16,620 --> 00:27:21,040
Or, yeah, that's the main writing
to blob storage that we support.

515
00:27:21,040 --> 00:27:24,280
What we do support is also like
writing to MotherDoc which is

516
00:27:24,280 --> 00:27:26,420
like a sort of managed DuckDB.

517
00:27:26,820 --> 00:27:28,580
That's I mean that's the company
I work for.

518
00:27:28,580 --> 00:27:31,900
It's like we have like DuckDB in
the cloud basically.

519
00:27:32,520 --> 00:27:37,280
So there we do support inserting
into tables that live there

520
00:27:37,280 --> 00:27:42,660
and updating them, like sort of
more similar to how you would

521
00:27:42,660 --> 00:27:44,440
insert into a Postgres table.

522
00:27:44,440 --> 00:27:45,380
Does that make sense?

523
00:27:45,380 --> 00:27:46,740
Nikolay: Yeah, it makes sense.

524
00:27:47,520 --> 00:27:51,900
And additional question, if I insert
data, can I do it inside

525
00:27:51,900 --> 00:27:56,680
a complex transaction, which probably
inserts data to a regular

526
00:27:56,800 --> 00:27:58,300
table in Postgres?

527
00:27:58,980 --> 00:28:02,760
Jelte: So, right now you cannot
write to both a Postgres table

528
00:28:02,760 --> 00:28:06,880
and a MotherDuck table in the same
transaction, specifically because

529
00:28:08,300 --> 00:28:11,580
when a crash happens, I mean, it
kind of works.

530
00:28:11,580 --> 00:28:14,380
It's just that when a crash happens,
maybe you only wrote to

531
00:28:14,380 --> 00:28:15,400
1 of the places.

532
00:28:15,780 --> 00:28:18,220
So that's something we want to
protect against.

533
00:28:18,280 --> 00:28:20,440
Nikolay: Because 2PC would protect
here.

534
00:28:20,660 --> 00:28:23,680
Jelte: Yes, 2PC would protect here,
but 2PC is not trivial to

535
00:28:23,680 --> 00:28:24,180
implement.
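Two-phase commit, which Nikolay brings up, works roughly like this. A minimal in-memory sketch; real 2PC also requires durable logging of the prepare and commit decisions and a recovery protocol, which is part of why it is both hard to implement and slow:

```python
class Participant:
    """One resource (e.g. Postgres, or MotherDuck) in a two-phase commit."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "idle"

    def prepare(self):
        """Phase 1: vote, and promise to commit later if asked."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        """Phase 2: apply the coordinator's global decision."""
        if self.state == "prepared":
            self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    # Collect every vote first (no short-circuiting), then decide.
    votes = [p.prepare() for p in participants]
    ok = all(votes)
    for p in participants:
        p.finish(commit=ok)
    return ok

pg = Participant("postgres")
md = Participant("motherduck", can_commit=False)
print(two_phase_commit([pg, md]), pg.state, md.state)  # False aborted aborted
```

The point of phase 1 is exactly the crash scenario discussed above: once every participant has durably prepared, a crash can no longer leave one side committed and the other not.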

536
00:28:25,080 --> 00:28:26,320
Nikolay: And it's super slow as
well.

537
00:28:26,320 --> 00:28:28,840
It's not fun to use as well.

538
00:28:29,340 --> 00:28:32,160
Jelte: Yes. So that's definitely
something we're considering

539
00:28:32,160 --> 00:28:34,500
for like a future version.

540
00:28:34,740 --> 00:28:40,740
But for now, it's like, okay, you
can write to MotherDuck and Postgres.

541
00:28:41,180 --> 00:28:43,700
You can write to MotherDuck in a
transaction, you can do multiple

542
00:28:43,700 --> 00:28:47,420
writes, but you cannot write to
Postgres in that same transaction.

543
00:28:47,720 --> 00:28:50,980
Nikolay: Will it just produce an
error or just, you mean, it's

544
00:28:50,980 --> 00:28:51,720
not good?

545
00:28:52,400 --> 00:28:54,380
Jelte: Yeah, no, right now it will
produce an error.

546
00:28:54,380 --> 00:28:58,980
Like it will say, like, oh, it's
a protection against this. Like,

547
00:28:59,440 --> 00:29:03,940
maybe we'll lift, like, we'll have
a setting to say, okay, you

548
00:29:03,940 --> 00:29:07,000
know what you're doing, or you
don't care about slight inconsistencies

549
00:29:07,660 --> 00:29:11,520
in case of crash, because it will
still roll back, like the happy

550
00:29:11,520 --> 00:29:15,400
rollback path, or query fails or
something, it will still roll

551
00:29:15,400 --> 00:29:15,900
back.

552
00:29:17,460 --> 00:29:22,480
That still works, but actual crash
cases where Postgres really

553
00:29:23,080 --> 00:29:26,880
really goes away without sort of
doing its normal cleanup. Yeah,

554
00:29:26,880 --> 00:29:30,020
you have some race conditions where
then you write only to MotherDuck

555
00:29:30,020 --> 00:29:31,520
and not to Postgres.

556
00:29:35,060 --> 00:29:39,840
Michael: 1 core feature we haven't
talked about yet is creating

557
00:29:39,840 --> 00:29:46,400
temporary tables within Postgres
using the DuckDB columnar format.

558
00:29:48,820 --> 00:29:54,140
When would it make sense to do
that versus caching a parquet

559
00:29:54,140 --> 00:29:54,900
file locally?

560
00:29:55,940 --> 00:29:57,600
When are you seeing people use
that?

561
00:29:59,440 --> 00:30:02,220
Jelte: I don't know if I've seen
many people use that.

562
00:30:02,640 --> 00:30:03,540
The main reason...

563
00:30:04,300 --> 00:30:07,280
1 of the main reasons it's implemented
is because we wanted to

564
00:30:07,280 --> 00:30:08,460
add MotherDuck support.

565
00:30:08,680 --> 00:30:11,640
And this was really easy to also...

566
00:30:12,440 --> 00:30:15,140
I mean, it was like, we sort of
get it for free.

567
00:30:16,740 --> 00:30:18,020
So that was...

568
00:30:19,860 --> 00:30:23,560
It wasn't necessarily like, oh,
this is a feature we super need.

569
00:30:23,560 --> 00:30:26,280
But it was like, oh, this is like
a few lines of code extra, and

570
00:30:26,280 --> 00:30:28,660
now we have temporary table support.

571
00:30:28,820 --> 00:30:31,840
And it also makes the testing a
bit easier, because then you

572
00:30:31,840 --> 00:30:32,720
don't have to...

573
00:30:32,980 --> 00:30:36,540
Like, for the MotherDuck testing,
you need an account, and you

574
00:30:36,540 --> 00:30:39,040
need everything, so you can have
this...

575
00:30:39,280 --> 00:30:41,260
These DuckDB tables, you can have
them locally.

576
00:30:41,680 --> 00:30:42,033
Cool.

577
00:30:42,033 --> 00:30:45,180
Yeah, but 1 of the things that
could be useful for us, like for

578
00:30:45,180 --> 00:30:48,520
like ETL kind of jobs, where you
have like a, like you create

579
00:30:48,520 --> 00:30:52,120
some temporary tables, you do 1
query, create

580
00:30:52,120 --> 00:30:55,260
a temporary table to do some local
transforms, you create another

581
00:30:55,260 --> 00:30:59,640
temporary table to do some transforms,
and then you store somewhere

582
00:31:00,060 --> 00:31:00,840
else.

583
00:31:01,140 --> 00:31:03,840
So it's also not completely useless
or something.

584
00:31:04,240 --> 00:31:08,200
It's just that it wasn't our main,
like, oh, we need this.

585
00:31:08,300 --> 00:31:10,960
Yeah, so that's it.

586
00:31:10,960 --> 00:31:12,780
Michael: I think that's what I
find.

587
00:31:12,780 --> 00:31:16,000
Like, I found this project quite
confusing to get, and not just

588
00:31:16,000 --> 00:31:19,960
this 1, like all DuckDB Postgres
related projects, like I find

589
00:31:19,960 --> 00:31:22,800
it hard to get my head around,
and partly because it's like a,

590
00:31:22,800 --> 00:31:28,180
feels like a bag of features, like
firstly I can copy to Parquet

591
00:31:28,180 --> 00:31:32,660
format and I can offload queries,
like I can query through Postgres

592
00:31:32,680 --> 00:31:38,300
to Parquet format, but I can also
execute a query that's completely

593
00:31:38,300 --> 00:31:40,620
local to Postgres using the DuckDB
engine.

594
00:31:40,840 --> 00:31:44,800
That's a totally different feature,
but it's also the same extension.

595
00:31:46,120 --> 00:31:49,120
It took me a while to get my head
around that these are different

596
00:31:49,120 --> 00:31:51,940
use cases, but all provided by
the same extension, if that makes

597
00:31:51,940 --> 00:31:52,660
any sense.

598
00:31:54,140 --> 00:31:55,400
Jelte: Yeah, that's definitely...

599
00:31:56,760 --> 00:31:58,940
I totally understand that it's
a bit confusing.

600
00:31:59,540 --> 00:32:02,280
1 reason, I guess, is because DuckDB
can do so many things.

601
00:32:02,280 --> 00:32:02,780
Yeah.

602
00:32:02,900 --> 00:32:06,100
So it's like, okay, and you kind
of want to be able to combine

603
00:32:06,100 --> 00:32:07,060
all of these things.

604
00:32:07,080 --> 00:32:11,620
So it's like, oh yeah, you want
to join a Parquet file with,

605
00:32:11,980 --> 00:32:14,720
you want to read the Parquet file
and join it with some Postgres

606
00:32:14,720 --> 00:32:19,240
table to, for instance, sort of
hydrate the user IDs with the

607
00:32:19,460 --> 00:32:21,560
recently updated user information.

608
00:32:23,120 --> 00:32:26,040
So that the username is actually
what it is now and not what

609
00:32:26,040 --> 00:32:29,780
it was like 2 months ago when the
record was created in the Parquet

610
00:32:29,780 --> 00:32:30,280
file.

611
00:32:30,360 --> 00:32:34,040
And that sort of means that DuckDB
really needs to be able to

612
00:32:34,040 --> 00:32:35,100
read Postgres tables.

613
00:32:35,140 --> 00:32:39,320
But then, I mean, if you want,
you could also just make it read

614
00:32:39,320 --> 00:32:41,780
Postgres tables also fine.

615
00:32:42,180 --> 00:32:46,220
So it's, I think both, I mean,
both Hydra and MotherDuck, we

616
00:32:46,220 --> 00:32:50,540
had some sort of, we had some idea
about what features we think

617
00:32:50,540 --> 00:32:51,240
are important.

618
00:32:53,800 --> 00:32:56,280
And I mean, you start a bit from
there and like, oh, these are

619
00:32:56,280 --> 00:32:57,740
the use cases we want to support.

620
00:32:57,800 --> 00:33:02,660
And then how can we glue all those
use cases together in a way

621
00:33:03,340 --> 00:33:04,520
that works nice.

622
00:33:04,540 --> 00:33:06,880
And then you also end up with some
other use cases you didn't

623
00:33:06,880 --> 00:33:07,860
even think about.

624
00:33:08,560 --> 00:33:09,060
Joe: Yeah,

625
00:33:10,840 --> 00:33:12,260
it's a really good point.

626
00:33:12,380 --> 00:33:16,980
I've seen Postgres itself described
by many people, even myself,

627
00:33:17,240 --> 00:33:20,460
as a multi-tool type of database.

628
00:33:21,060 --> 00:33:25,020
Adding DuckDB into Postgres
makes it probably even more

629
00:33:25,020 --> 00:33:28,700
of a multi-tool database where
the average user is not going

630
00:33:28,700 --> 00:33:31,980
to use every feature, nor should
they try to.

631
00:33:32,080 --> 00:33:35,380
But if there's something there
that's useful to them that just

632
00:33:35,380 --> 00:33:39,480
kind of comes along with DuckDB,
certainly there's no reason

633
00:33:39,480 --> 00:33:42,420
for us to try and remove it or
disable it.

634
00:33:42,500 --> 00:33:46,560
You kind of get all of these DuckDB
cool features along with

635
00:33:46,560 --> 00:33:50,580
now Postgres, just as a complete
kind of all-in-one multi-tool

636
00:33:51,400 --> 00:33:52,260
type database.

637
00:33:52,540 --> 00:33:56,500
In fact, I've kind of started to
use the term like multi-store

638
00:33:57,040 --> 00:33:57,940
a little bit.

639
00:33:58,520 --> 00:34:01,060
I think SingleStore and Unistore
were taken.

640
00:34:01,320 --> 00:34:06,420
So, you know, but really what's
what's happening is you have

641
00:34:06,420 --> 00:34:09,660
2 different databases kind of working
working together and the

642
00:34:09,660 --> 00:34:14,600
data also can be stored in multiple
formats both in, you know,

643
00:34:14,600 --> 00:34:15,940
heap as well as Parquet.

644
00:34:15,940 --> 00:34:19,220
So, I mean, that could be another
way to think about it.

645
00:34:19,960 --> 00:34:20,460
Michael: Yeah.

646
00:34:20,460 --> 00:34:20,960
Yes.

647
00:34:21,220 --> 00:34:23,480
Because of time, I'm starting to
think about wrapping up.

648
00:34:23,480 --> 00:34:25,600
Do you guys want to talk a little
bit about what you're working

649
00:34:25,600 --> 00:34:28,020
on at the moment or what your plans
are for the future?

650
00:34:28,620 --> 00:34:29,120
Jelte: Sure.

651
00:34:29,140 --> 00:34:33,060
So I think there's a, I mean, we
released like the latest version

652
00:34:33,060 --> 00:34:34,700
like a week ago or something.

653
00:34:35,140 --> 00:34:38,960
And the things we're working on
now, we're working basically

654
00:34:38,960 --> 00:34:40,120
on 2 big things.

655
00:34:40,440 --> 00:34:48,960
1 is redoing the whole way we read
from Postgres, because before

656
00:34:48,960 --> 00:34:53,680
we were sort of doing low-level
reads, sort of like the actual

657
00:34:53,680 --> 00:34:57,980
bytes, sort of re-implementing
the heap reader in a sense.

658
00:34:58,260 --> 00:35:00,820
But that means you don't get index
support automatically, then

659
00:35:00,840 --> 00:35:03,900
you also need to implement the
index reader and stuff like that.

660
00:35:04,440 --> 00:35:09,140
So, so now we're trying to more
sort of build a query again to

661
00:35:09,140 --> 00:35:12,440
do, to do that, like to build a
Postgres query again, and then

662
00:35:12,440 --> 00:35:15,140
give it to the Postgres planner
and then make Postgres figure

663
00:35:15,140 --> 00:35:18,480
out, and that's, that's working
very well because then, then

664
00:35:18,480 --> 00:35:21,380
you get foreign data wrapper support,
so that you can call foreign

665
00:35:21,380 --> 00:35:25,120
data wrappers from Postgres from
within DuckDB again.

666
00:35:27,040 --> 00:35:29,940
So it's, like, turtles all the way
down.

667
00:35:31,980 --> 00:35:33,500
So that's 1 big feature.

668
00:35:33,620 --> 00:35:37,860
And the other is, 1 thing people
have been running into, sort

669
00:35:37,860 --> 00:35:41,100
of we've seen people run into in
the first 2 versions is that

670
00:35:41,100 --> 00:35:45,320
to be able to read from Parquet,
like you use the DuckDB functions

671
00:35:45,320 --> 00:35:50,040
to read Parquet files and read
Iceberg files, Iceberg catalogs,

672
00:35:50,640 --> 00:35:53,100
you need to know what the columns
are.

673
00:35:53,240 --> 00:35:56,260
And right now you need to specify
those columns and the types

674
00:35:56,540 --> 00:35:59,000
in your query, which is a bit of
a hassle.

675
00:35:59,760 --> 00:36:03,480
So I'm working on a way to make
it so that you don't have to

676
00:36:03,480 --> 00:36:04,260
do that anymore.

677
00:36:04,540 --> 00:36:07,280
Because Postgres wants to be very
strict about these things.

678
00:36:07,280 --> 00:36:09,720
And the DuckDB, like, there's 1
of the big differences between

679
00:36:09,720 --> 00:36:10,520
DuckDB and Postgres.

680
00:36:10,520 --> 00:36:12,500
Postgres is very strict about certain
things.

681
00:36:12,500 --> 00:36:16,880
And DuckDB is like, oh, you typed
this, you probably meant to

682
00:36:16,880 --> 00:36:18,000
do this thing.

683
00:36:18,420 --> 00:36:19,340
This kind of makes sense.

684
00:36:19,340 --> 00:36:21,300
I can understand what you're doing,
what you're trying to do

685
00:36:21,300 --> 00:36:21,800
here.

686
00:36:22,080 --> 00:36:23,820
Let me give you the sensible result.

687
00:36:23,940 --> 00:36:26,920
And Postgres is much more like,
oh, this is not exactly how I

688
00:36:26,920 --> 00:36:27,840
expected it.

689
00:36:28,380 --> 00:36:32,260
So, that's sort of where there's
a bit of a problem.

690
00:36:32,780 --> 00:36:35,260
So I'm trying to make the UX there
a lot nicer.

691
00:36:36,600 --> 00:36:39,760
And I have a way now that's working
quite well, but it needs

692
00:36:39,760 --> 00:36:40,820
a bit more polish.

693
00:36:41,460 --> 00:36:46,420
So yeah, probably after Christmas
or after New Year's, I'll also

694
00:36:46,420 --> 00:36:49,060
create a PR to get that easier
to work with.

695
00:36:49,060 --> 00:36:52,740
So you could just do SELECT *
from read_csv or read_

696
00:36:52,740 --> 00:36:54,640
parquet without specifying the
columns.

697
00:36:54,840 --> 00:36:57,600
That's sort of the user experience
we want.

698
00:36:58,180 --> 00:37:01,520
But right now you need to do SELECT *
from read_parquet and

699
00:37:01,520 --> 00:37:04,640
then specify all the columns that
you actually want to read

700
00:37:04,640 --> 00:37:07,540
from the parquet file, which kind
of defeats the purpose of the

701
00:37:07,540 --> 00:37:08,040
star.
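The schema inference Jelte is working on can be illustrated with a toy version for CSV using only the standard library. The type names and the widening rule here are simplified assumptions, not DuckDB's actual sniffer logic:

```python
import csv
import io

def sniff(val):
    """Guess the narrowest type that can hold a single text value."""
    for typ, cast in (("bigint", int), ("double", float)):
        try:
            cast(val)
            return typ
        except ValueError:
            pass
    return "text"

def widen(old, new):
    """Pick the wider of two guessed types (bigint < double < text)."""
    order = [None, "bigint", "double", "text"]
    return max(old, new, key=order.index)

def infer_schema(csv_text, sample_rows=100):
    """Infer column names and coarse types from a CSV sample: the kind
    of inference that would let SELECT * work without the user spelling
    out every column and type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    types = {}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, val in row.items():
            types[col] = widen(types.get(col), sniff(val))
    return types

sample = "id,price,name\n1,9.99,apple\n2,3,banana\n"
print(infer_schema(sample))  # {'id': 'bigint', 'price': 'double', 'name': 'text'}
```

Note the widening in the `price` column: one row looks like an integer, another like a decimal, so the sampled type settles on the wider `double`, which is why sampling enough rows matters.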

702
00:37:09,520 --> 00:37:10,620
Michael: Makes perfect sense.

703
00:37:10,680 --> 00:37:12,280
And how about from your side, Joe?

704
00:37:12,700 --> 00:37:16,860
Joe: Yeah, I think Jelte covered
it well, as far as the way that

705
00:37:16,860 --> 00:37:18,640
pg_duckdb scans Postgres tables.

706
00:37:19,260 --> 00:37:22,840
That should be merged in, I think,
relatively soon, and then

707
00:37:23,140 --> 00:37:26,340
we'll let it bake before the next
you know official release.

708
00:37:26,480 --> 00:37:30,600
As Jelte mentioned, we released
a patch version just a week ago

709
00:37:30,600 --> 00:37:34,360
so we have a pretty good clip at
this point of shipping new releases.

710
00:37:34,360 --> 00:37:38,340
If there's something that folks
want to see, certainly head on

711
00:37:38,340 --> 00:37:42,540
over to the repo and open an issue
or let us know.

712
00:37:42,720 --> 00:37:47,300
And then as far as Hydra goes,
as I mentioned, we have a managed

713
00:37:47,300 --> 00:37:51,900
cloud Postgres service that's in
beta that has integrated pg_duckdb.

714
00:37:52,660 --> 00:37:57,700
So we're looking to roll that
out into public beta, so people

715
00:37:57,700 --> 00:38:01,780
can begin to use pg_duckdb without
having to self-manage it.

716
00:38:01,780 --> 00:38:08,040
Today, you can obviously take pg_duckdb
and put it on EC2 box

717
00:38:08,040 --> 00:38:10,680
with Postgres or whatever you want
to do.

718
00:38:11,040 --> 00:38:13,800
So yeah, looking forward to shipping
that out.

719
00:38:14,240 --> 00:38:15,280
Nikolay: Yeah, that's great.

720
00:38:16,100 --> 00:38:23,300
I just thought, like, 20 years
ago, MySQL was dominating the open

721
00:38:23,300 --> 00:38:27,940
source database field, and they
had several engines.

722
00:38:29,480 --> 00:38:33,080
I remember you needed to pick:
1 was not crashing, and the

723
00:38:33,080 --> 00:38:35,560
other had the full-text search.

724
00:38:35,740 --> 00:38:40,460
I mean, MyISAM and InnoDB,
it was, yeah, a trade-off.

725
00:38:40,640 --> 00:38:47,000
And now like 20 years later, we
talk about various engines, basically,

726
00:38:47,940 --> 00:38:52,180
cloud-friendly, because you
can upload it to object storage.

727
00:38:53,200 --> 00:38:57,100
It's amazing that we can combine
these in Postgres right now,

728
00:38:57,100 --> 00:38:59,700
and it looks like it's only starting,
right?

729
00:39:02,540 --> 00:39:05,140
A lot of stuff can be added, developed,
and so on.

730
00:39:05,140 --> 00:39:07,400
It's only the beginning as I see
it.

731
00:39:08,080 --> 00:39:13,940
Joe: Yeah, it's pretty interesting
to see data lakes come to Postgres.

732
00:39:15,040 --> 00:39:18,900
It furthers your point,
which is many of the layers of

733
00:39:19,200 --> 00:39:21,260
the modern databases are being
pulled apart.

734
00:39:21,500 --> 00:39:26,120
There's multiple engines, there's
object storage, there's compute

735
00:39:26,120 --> 00:39:26,820
with Postgres.

736
00:39:27,440 --> 00:39:30,520
So I think that trend is continuing
to play out.

737
00:39:30,520 --> 00:39:33,960
As far as the average app developer
is concerned, like I was

738
00:39:33,960 --> 00:39:38,200
indicating before, I don't think
they necessarily need to know

739
00:39:38,680 --> 00:39:42,260
or care so much about the differences
between a database versus

740
00:39:42,260 --> 00:39:43,160
a data lake.

741
00:39:43,380 --> 00:39:48,640
So hopefully, the interface being
Postgres helps bridge the gap

742
00:39:48,640 --> 00:39:52,480
of using something really cool
like DuckDB, but together with

743
00:39:52,480 --> 00:39:54,140
what you're used to using in Postgres.

744
00:39:54,860 --> 00:39:56,180
Nikolay: Yeah, that's cool.

745
00:39:56,920 --> 00:39:57,520
Michael: Nice 1.

746
00:39:57,520 --> 00:39:57,800
Yeah.

747
00:39:57,800 --> 00:39:59,880
Well, thank you so much both for
joining.

748
00:39:59,880 --> 00:40:02,820
Thank you, Nikolay, as always,
and catch you soon.

749
00:40:03,200 --> 00:40:03,580
Nikolay: Yeah.

750
00:40:03,580 --> 00:40:04,400
Thank you guys.

751
00:40:05,560 --> 00:40:06,300
Jelte: Thank you.