1
00:00:00,060 --> 00:00:00,720
Nikolay: Hello, hello.

2
00:00:00,720 --> 00:00:03,300
This is Postgres FM, episode number
107.

3
00:00:04,540 --> 00:00:08,040
My name is Nikolay, founder of
Postgres.AI, and as usual, my

4
00:00:08,040 --> 00:00:10,180
co-host is Michael, pgMustard.

5
00:00:10,260 --> 00:00:10,830
Hi, Michael.

6
00:00:11,760 --> 00:00:12,880
Michael: Hello, Nikolay.

7
00:00:14,700 --> 00:00:20,640
Nikolay: So, you chose the second
most boring topic, in my opinion,

8
00:00:20,800 --> 00:00:21,720
after security.

9
00:00:22,540 --> 00:00:23,820
Tell us what it is.

10
00:00:24,400 --> 00:00:26,680
Michael: Yeah, I can blame our
listeners for this one.

11
00:00:26,680 --> 00:00:28,260
We had a… In my opinion…

12
00:00:28,260 --> 00:00:29,740
Nikolay: Blame someone else, right?

13
00:00:30,060 --> 00:00:30,820
Michael: Yeah, exactly.

14
00:00:31,040 --> 00:00:31,540
Always.

15
00:00:31,800 --> 00:00:32,640
That's the first rule.

16
00:00:32,640 --> 00:00:35,140
Nikolay: This is my favorite methodology
in troubleshooting,

17
00:00:35,560 --> 00:00:36,260
of incidents.

18
00:00:37,540 --> 00:00:39,520
Michael: No, blameless culture,
right?

19
00:00:39,520 --> 00:00:42,420
So we had a great listener suggestion
to talk about compression.

20
00:00:42,700 --> 00:00:45,840
And I guess it's kind of surprising
we haven't covered it yet.

21
00:00:45,840 --> 00:00:51,140
We've covered various topics around
this, but they are specifically

22
00:00:51,280 --> 00:00:56,260
in the context of a couple of extensions
that offer compression

23
00:00:56,360 --> 00:00:56,860
options.

24
00:00:57,400 --> 00:00:59,600
I thought it was a good topic to
cover generally.

25
00:00:59,760 --> 00:01:03,320
We can talk about all the different
things where compression

26
00:01:03,320 --> 00:01:04,540
is available in Postgres.

27
00:01:04,900 --> 00:01:07,620
Kind of a broad but shallow topic
maybe this time.

28
00:01:08,400 --> 00:01:12,520
Nikolay: Yeah, we will talk about
physical, like not lower level

29
00:01:12,520 --> 00:01:17,040
of compression, not like modern
ways to compress data when you

30
00:01:17,040 --> 00:01:22,700
tell some LLM, I have a lot of
data in this table, in this column,

31
00:01:22,700 --> 00:01:27,040
it's a huge chunk of text, let's
just summarize them and drop

32
00:01:27,040 --> 00:01:29,840
the detailed texts and so on, right?

33
00:01:30,060 --> 00:01:32,860
It's actually my favorite way to
compress.

34
00:01:33,460 --> 00:01:34,840
Michael: Well, that's a good point,
actually.

35
00:01:34,840 --> 00:01:39,960
I guess I assumed automatically
we were talking about what would

36
00:01:39,960 --> 00:01:41,500
be called lossless compression.

37
00:01:42,180 --> 00:01:43,520
Yeah, so like with images.

38
00:01:43,520 --> 00:01:44,560
Nikolay: This is definitely lossy.

39
00:01:45,240 --> 00:01:48,040
This will lose some details for
sure.

40
00:01:48,620 --> 00:01:52,000
And it will use some details that
don't exist.

41
00:01:54,060 --> 00:01:58,300
Not only is it lossy, it's also
glossy, I guess.

42
00:01:58,680 --> 00:01:59,180
So,

43
00:01:59,440 --> 00:02:01,320
Michael: Does it stand for lossy
lying machines?

44
00:02:01,320 --> 00:02:02,680
Is that what they stand for?

45
00:02:02,680 --> 00:02:03,180
Lossy…

46
00:02:04,300 --> 00:02:05,040
Nikolay: Yeah, yeah.

47
00:02:05,280 --> 00:02:10,880
So we talk about lossless, lower
level, transparent compression

48
00:02:10,900 --> 00:02:18,260
when we just enable something and
data suddenly takes less disk

49
00:02:18,260 --> 00:02:23,860
space, or less space when we send it over the network,
like sending something

50
00:02:23,860 --> 00:02:25,740
in transit, so to speak.

51
00:02:26,200 --> 00:02:30,400
And we are able to uncompress it
without any losses.

52
00:02:31,120 --> 00:02:35,240
But it's all fully automatic and
users just use it, enabling

53
00:02:35,280 --> 00:02:36,660
some features probably.

54
00:02:37,540 --> 00:02:39,440
This is our focus today, right?

55
00:02:40,760 --> 00:02:41,620
Michael: Yeah, exactly.

56
00:02:41,740 --> 00:02:44,940
And I think that's a good transition
into kind of the 2 main

57
00:02:45,060 --> 00:02:45,560
benefits.

58
00:02:46,220 --> 00:02:50,200
And I think they're obviously related,
but I think they are somewhat

59
00:02:50,200 --> 00:02:54,440
different and probably in some
ways trade off against each other

60
00:02:54,680 --> 00:02:55,740
a little bit.

61
00:02:55,760 --> 00:02:59,480
1 is compression for the sake of
storage.

62
00:03:00,040 --> 00:03:05,160
So if we have very repetitive data
or data that compresses really

63
00:03:05,160 --> 00:03:09,660
well, we could maybe spend a lot
less money and less resources

64
00:03:10,520 --> 00:03:12,240
by compressing it.

65
00:03:12,340 --> 00:03:15,180
But so storage is obviously a big
one.

66
00:03:15,180 --> 00:03:18,260
But there's also the performance
side of it.

67
00:03:18,540 --> 00:03:23,000
If it can take up less space, it
might be, depending on where

68
00:03:23,000 --> 00:03:28,040
our bottlenecks are, it might be
that overall the cost or speed

69
00:03:28,920 --> 00:03:33,020
degradation of compressing and
uncompressing or decompressing

70
00:03:33,080 --> 00:03:38,100
the other side is still faster
if we've had to only transport

71
00:03:38,100 --> 00:03:41,420
a lot less data or do calculations
on a lot less data, that kind

72
00:03:41,420 --> 00:03:42,080
of thing.

73
00:03:42,600 --> 00:03:46,840
Nikolay: Let's maybe start with
things we have for many years

74
00:03:46,840 --> 00:03:47,520
in Postgres.

75
00:03:48,460 --> 00:03:51,860
And then discuss some new stuff.

76
00:03:51,860 --> 00:03:56,340
And then discuss and compare what's
beneficial, what's less beneficial.

77
00:03:57,980 --> 00:04:01,880
We have, first of all, compression
at WAL level for full page

78
00:04:01,880 --> 00:04:03,840
writes, full-page images, right?

79
00:04:04,360 --> 00:04:04,860
FPIs.

80
00:04:06,340 --> 00:04:12,540
And full-page images are needed to fix
the problem of the difference between

81
00:04:13,200 --> 00:04:17,300
the size of buffers, the pages
in memory, 8 kilobytes in most

82
00:04:17,300 --> 00:04:22,320
cases, and the block size of the
file system, very often 4 kilobytes,

83
00:04:22,440 --> 00:04:24,040
ext4, for example, right?

84
00:04:24,960 --> 00:04:30,260
And to avoid partial writes in
the case of failures, Postgres,

85
00:04:31,580 --> 00:04:36,660
after each checkpoint, if the buffer
was changed for the first

86
00:04:36,660 --> 00:04:40,380
time before the next checkpoint,
Postgres doesn't write only the

87
00:04:40,380 --> 00:04:43,120
change itself, it writes the whole
buffer.

88
00:04:44,140 --> 00:04:49,940
And by default, WAL compression
is not enabled, which means the whole

89
00:04:50,140 --> 00:04:55,560
8-kilobyte buffer is written as
is to WAL, consumes 8 kilobytes,

90
00:04:55,560 --> 00:04:56,060
right?

91
00:04:56,680 --> 00:05:00,560
But if we enable WAL compression,
this page is compressed.

92
00:05:02,180 --> 00:05:07,400
And in my opinion, in most cases,
if we talk about significant

93
00:05:07,480 --> 00:05:10,820
load, we should consider enabling
WAL compression.

94
00:05:11,940 --> 00:05:15,340
It can be beneficial, especially
if you have short distance between

95
00:05:15,340 --> 00:05:19,100
checkpoints, because in this case
you have more frequent full-page

96
00:05:19,840 --> 00:05:20,720
writes happening.

97
00:05:21,500 --> 00:05:26,000
Of course if you increase distance,
then maybe the same page

98
00:05:26,000 --> 00:05:27,380
is becoming dirty.

99
00:05:27,380 --> 00:05:32,080
It means writes happen inside it
many times, multiple times,

100
00:05:32,080 --> 00:05:35,940
and only for the first change it
will be a full-page write.

101
00:05:35,940 --> 00:05:39,520
Subsequent changes will be written
only as deltas, right?

102
00:05:39,520 --> 00:05:42,080
Only the tuple that was changed.

103
00:05:42,540 --> 00:05:47,220
But if you have quite frequent
checkpoints, then by enabling WAL compression

104
00:05:47,220 --> 00:05:52,080
you can significantly reduce the
size of WAL written.

105
00:05:52,080 --> 00:05:59,880
And this has very good positive
consequences, such as less data

106
00:06:00,140 --> 00:06:01,600
to write to backups.

107
00:06:02,640 --> 00:06:08,680
WAL archiving: archive_command
will archive fewer bytes, fewer

108
00:06:08,680 --> 00:06:11,740
maybe like gigabytes per hour,
for example.

109
00:06:12,180 --> 00:06:15,280
And second, replication as well.

110
00:06:15,920 --> 00:06:24,060
WAL is smaller, so replication
has less to transmit over the network.

111
00:06:24,440 --> 00:06:28,140
Of course, compression needs CPU
cycles and decompression needs

112
00:06:28,140 --> 00:06:29,120
CPU cycles.

113
00:06:29,540 --> 00:06:33,040
And I saw on Twitter, some people
mentioned they had problems

114
00:06:33,060 --> 00:06:35,820
when they enabled WAL compression.

115
00:06:36,680 --> 00:06:41,760
But in the cases I observed, we
always decided to switch it on

116
00:06:43,260 --> 00:06:47,300
and CPU overhead was worth it.

117
00:06:47,300 --> 00:06:52,060
So there is a trade-off here, CPU
versus I/O, and we always chose

118
00:06:52,060 --> 00:06:54,120
in favor of less I/O.

119
00:06:55,180 --> 00:06:59,760
And with synthetic tests, I just
showed you before we started

120
00:06:59,760 --> 00:07:04,560
this recording, I showed you we
had a simple pgbench experiment

121
00:07:04,640 --> 00:07:09,300
with max_wal_size 4GB and checkpoint_timeout
15 minutes.

122
00:07:09,920 --> 00:07:15,300
We saw WAL reduction, I think,
on regular pgbench workloads,

123
00:07:15,940 --> 00:07:18,920
which include writes, of course,
inserts, updates, deletes.

124
00:07:19,120 --> 00:07:26,040
We saw 3 times less WAL created,
generated, or written, it's

125
00:07:26,040 --> 00:07:29,820
the same, right, when we enable
WAL compression, which is a

126
00:07:29,820 --> 00:07:32,060
huge benefit, 3 times less WAL.
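
As a side note, one way to quantify this on your own workload, assuming Postgres 14 or newer where the pg_stat_wal view exists (a sketch):

    -- Cumulative WAL statistics since the last stats reset;
    -- wal_fpi counts full-page images, wal_bytes the total WAL volume.
    SELECT wal_records,
           wal_fpi,
           pg_size_pretty(wal_bytes) AS wal_written
    FROM pg_stat_wal;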

127
00:07:33,060 --> 00:07:36,760
But if you had the checkpoint tuning,
as we discussed in some

128
00:07:36,760 --> 00:07:41,840
of our last, like, previous episodes,
if you had it and distance

129
00:07:41,840 --> 00:07:45,980
between checkpoints is quite large,
especially if you have Patroni

130
00:07:46,800 --> 00:07:50,780
and modern Postgres, which in case
of failover or switchover

131
00:07:50,860 --> 00:07:55,140
doesn't require restart of all
nodes, you can afford a bigger distance

132
00:07:55,640 --> 00:07:58,740
between checkpoints and have a maximum
size set of, I don't know,

133
00:07:58,740 --> 00:07:59,840
like 100 gigabytes.

134
00:08:00,720 --> 00:08:06,300
In this case, it's unlikely the
benefit will be so huge, not

135
00:08:06,300 --> 00:08:06,800
3x.

136
00:08:07,640 --> 00:08:11,740
Of course, it also depends a lot
on the data nature, right?

137
00:08:13,080 --> 00:08:15,980
If it's very repetitive, it's easier
to compress.

138
00:08:16,880 --> 00:08:20,320
Michael: Well, yeah, I mean, that's
pretty true across the board

139
00:08:20,320 --> 00:08:21,680
for compression isn't it?

140
00:08:21,740 --> 00:08:25,200
On the WAL front, one thing I think
we probably didn't talk about

141
00:08:25,200 --> 00:08:30,320
when we talked about WAL and
checkpoint tuning is on the compression

142
00:08:30,340 --> 00:08:34,800
side as of Postgres 15 which I'm
guessing is pretty new so you

143
00:08:34,800 --> 00:08:38,660
might not have this kind of in
the wild experience yet, but we

144
00:08:38,660 --> 00:08:39,960
do have more options now.

145
00:08:39,960 --> 00:08:43,840
So in the past, we could only turn
WAL compression on or off.

146
00:08:43,940 --> 00:08:47,340
And now we have, instead of on,
we have 3 different options.

147
00:08:47,720 --> 00:08:53,480
We have the previous option, which
is the PGLZ algorithm, but

148
00:08:53,480 --> 00:08:57,520
we also have LZ4 and ZStandard
options now.

149
00:08:57,520 --> 00:09:00,860
And for the people that had, that
kind of complained about the

150
00:09:00,860 --> 00:09:06,820
CPU side of things, one option might
be LZ4, which I believe would be

151
00:09:06,820 --> 00:09:07,960
less CPU intensive.

152
00:09:07,960 --> 00:09:09,220
It might not be faster.

153
00:09:09,340 --> 00:09:09,840
Huh?

154
00:09:10,160 --> 00:09:13,520
Nikolay: Faster and more lightweight
in terms of CPU consumption.

155
00:09:14,020 --> 00:09:16,220
But the compression ratio is worse.

156
00:09:17,040 --> 00:09:20,780
Michael: Well, it's not that different
to PGLZ actually.

157
00:09:21,000 --> 00:09:24,260
It's yeah, but compared to some
other modern compression algorithms,

158
00:09:25,360 --> 00:09:26,380
it tends to lose.

159
00:09:26,380 --> 00:09:29,980
But compared to PGLZ, you don't
lose much.

160
00:09:30,040 --> 00:09:32,840
I think it is slightly worse on
average, obviously depends a

161
00:09:32,840 --> 00:09:33,840
lot on the data.

162
00:09:34,220 --> 00:09:37,820
But I think it's a pretty good
option if you have that issue.

163
00:09:37,820 --> 00:09:41,320
And if you're testing, turning
this on, on a modern version of

164
00:09:41,320 --> 00:09:45,480
Postgres, it's worth at least,
depending on your constraints,

165
00:09:45,480 --> 00:09:47,260
trying those different new options.
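
A minimal sketch of trying it, assuming Postgres 15+ for the lz4/zstd values and a server built with the matching libraries:

    SHOW wal_compression;                      -- 'off' by default; 'on' means pglz
    ALTER SYSTEM SET wal_compression = 'lz4';  -- or 'zstd', 'pglz', 'on', 'off'
    SELECT pg_reload_conf();                   -- takes effect without a restart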

166
00:09:48,480 --> 00:09:49,980
Nikolay: Yeah, that's a great point.

167
00:09:50,240 --> 00:09:52,600
And I have two ideas here in mind.

168
00:09:52,600 --> 00:09:59,120
First of all, I remember these
options, zstd and LZ4,

169
00:09:59,620 --> 00:10:02,280
they require compile flags to be
turned on.

170
00:10:02,280 --> 00:10:07,860
I guess, as you rightfully said,
I don't have a lot of...

171
00:10:08,080 --> 00:10:11,720
It's fresh things, so I don't have
production, rich production

172
00:10:11,720 --> 00:10:12,220
experience.

173
00:10:13,440 --> 00:10:18,380
But these flags, I guess they are
turned on in the official apt

174
00:10:18,380 --> 00:10:19,420
packages, right?

175
00:10:19,600 --> 00:10:21,600
If you just install it on fresh
Ubuntu.

176
00:10:21,780 --> 00:10:24,220
I hope it's so, I hope it's so,
but worth checking.

177
00:10:25,120 --> 00:10:26,780
These options should be turned on.

178
00:10:26,940 --> 00:10:31,280
So Postgres should be compiled
with support of these two algorithms.
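
One way to verify this from SQL: the values the server will accept for the setting reflect the compile-time flags, so lz4 and zstd appear only if Postgres was built with --with-lz4 and --with-zstd:

    SELECT enumvals
    FROM pg_settings
    WHERE name = 'wal_compression';
    -- e.g. {pglz,lz4,zstd,on,off} on a build with both libraries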

179
00:10:32,020 --> 00:10:34,940
And second thing, I think it's
worth new research.

180
00:10:35,280 --> 00:10:39,220
Maybe with our bot it should be
quite easy, and we should just

181
00:10:39,220 --> 00:10:43,820
research maybe with different workloads
and maybe with different

182
00:10:43,820 --> 00:10:44,620
max_wal_size.

183
00:10:44,620 --> 00:10:47,860
As I said, it's very important
in this case, the distance between

184
00:10:47,860 --> 00:10:48,360
checkpoints.

185
00:10:48,820 --> 00:10:53,680
And just with regular pgbench
we could check all available

186
00:10:53,860 --> 00:10:58,360
algorithms on fresh Postgres versions,
and maybe draw some charts,

187
00:10:58,360 --> 00:10:58,840
right?

188
00:10:58,840 --> 00:11:00,080
Plot some charts.

189
00:11:00,460 --> 00:11:03,960
So yeah, we just lack hands, but
bot is definitely ready for

190
00:11:03,960 --> 00:11:07,000
that.
So I think it's a great direction.

191
00:11:07,440 --> 00:11:09,700
I wish I had more hands, by the
way.

192
00:11:09,880 --> 00:11:14,140
Like I wanted to say we are looking
for maybe part-time database

193
00:11:14,140 --> 00:11:19,300
engineers, people who want some
fun with this kind of work, like

194
00:11:19,300 --> 00:11:19,800
research.

195
00:11:20,580 --> 00:11:23,940
We usually do it in public, so it's
kind of interesting for community

196
00:11:23,960 --> 00:11:24,620
as well.

197
00:11:25,080 --> 00:11:28,820
So if some people listening to
this podcast want to participate

198
00:11:28,940 --> 00:11:33,600
and work part-time with us at
Postgres.AI, we'd definitely

199
00:11:33,600 --> 00:11:36,160
be interested in discussing.

200
00:11:36,560 --> 00:11:40,420
So maybe we will have more hands
and do this benchmark, for example.

201
00:11:40,680 --> 00:11:42,780
Michael: What's the best way for
them to get in touch with you?

202
00:11:42,780 --> 00:11:45,960
Nikolay: Email or Twitter
or LinkedIn, but email

203
00:11:45,960 --> 00:11:50,040
nik@postgres.ai is always a good way
to contact me.

204
00:11:50,740 --> 00:11:54,180
Michael: On the packaging front,
I don't know, I haven't checked

205
00:11:54,220 --> 00:11:57,340
whether they have compiled Postgres
with those flags, but I was

206
00:11:57,340 --> 00:11:58,380
pleasantly surprised.

207
00:11:59,280 --> 00:12:03,780
I'm on Google Cloud SQL for my
own, for the pgMustard database,

208
00:12:04,280 --> 00:12:06,860
and recently upgraded to Postgres
16, finally.

209
00:12:07,480 --> 00:12:10,520
And was pleasantly surprised that
I was able to turn on WAL

210
00:12:10,520 --> 00:12:14,980
compression with LZ4, and for TOAST,
I was able to switch our default

211
00:12:15,060 --> 00:12:18,340
TOAST compression algorithm to
LZ4 as well, which is really cool.

212
00:12:18,340 --> 00:12:19,380
Nikolay: On 16 you said?

213
00:12:19,940 --> 00:12:24,600
Michael: I'm on 16, but I think
those are available as of version

214
00:12:24,600 --> 00:12:27,940
15 for WAL and version 14 for
TOAST.

215
00:12:28,940 --> 00:12:29,700
Nikolay: That's great.

216
00:12:30,400 --> 00:12:34,200
So what benefits, like, do you
remember some numbers?

217
00:12:35,340 --> 00:12:39,940
Michael: So in my, I didn't do
like extensive benchmarking, but

218
00:12:40,600 --> 00:12:46,160
like the thing we use it for most
is we have saved plans.

219
00:12:46,220 --> 00:12:48,580
So EXPLAIN plans compress really
well.

220
00:12:48,580 --> 00:12:49,620
It's a lot of repetition.

221
00:12:51,580 --> 00:12:52,320
Nikolay: Yeah, many.

222
00:12:52,720 --> 00:12:54,560
Yeah, it makes sense.

223
00:12:54,640 --> 00:12:56,180
So it's in JSON, right?

224
00:12:57,100 --> 00:13:01,760
Michael: Well, both like text plans
compress well too, but JSON

225
00:13:01,780 --> 00:13:03,840
plans compress extremely well.

226
00:13:04,660 --> 00:13:06,880
But obviously JSON plans are bigger
in the first place.

227
00:13:06,880 --> 00:13:07,920
So there's not...

228
00:13:07,920 --> 00:13:08,980
Nikolay: Yeah, that's good.

229
00:13:09,400 --> 00:13:10,920
Michael: So yeah, it compresses
really well.

230
00:13:10,920 --> 00:13:16,160
But the main thinking was, we don't
mind spending a bit more

231
00:13:16,160 --> 00:13:20,840
on storage for those if the speed
of retrieving them will be

232
00:13:20,840 --> 00:13:21,180
quicker.

233
00:13:21,180 --> 00:13:24,140
So people save a plan, they might
share it with somebody on their

234
00:13:24,140 --> 00:13:26,280
team and that person needs to load
it.

235
00:13:26,280 --> 00:13:30,300
Obviously, it's like we're talking
small amounts of time, But

236
00:13:30,300 --> 00:13:33,420
if the storage wasn't that different
and the speed was faster,

237
00:13:33,460 --> 00:13:35,100
I was happy to make the switch.

238
00:13:35,800 --> 00:13:39,880
And yeah, it turned out the storage
was slightly worse on average

239
00:13:39,880 --> 00:13:43,860
for the plans I tested with LZ4,
but the retrieval speed was

240
00:13:43,860 --> 00:13:44,340
faster.

241
00:13:44,340 --> 00:13:46,120
So I bit the bullet and did it.

242
00:13:46,120 --> 00:13:49,160
The cool thing is, the thing I
didn't realize, I thought it would

243
00:13:49,160 --> 00:13:50,460
be a complex migration.

244
00:13:50,900 --> 00:13:52,020
But like, what do you do?

245
00:13:52,020 --> 00:13:55,360
Like, I thought you might have
to change existing plans or existing

246
00:13:55,680 --> 00:13:56,180
data.

247
00:13:56,360 --> 00:13:57,180
But you don't.

248
00:13:57,180 --> 00:14:01,707
If you change the setting, it applies
to new data.
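
A sketch of what that switch looks like (Postgres 14+; the table and column names here are hypothetical):

    -- Change the default for newly created columns:
    ALTER SYSTEM SET default_toast_compression = 'lz4';
    SELECT pg_reload_conf();

    -- Or per column; only newly written values are affected:
    ALTER TABLE plans ALTER COLUMN plan_json SET COMPRESSION lz4;

    -- Check which algorithm actually compressed a stored value:
    SELECT pg_column_compression(plan_json) FROM plans LIMIT 10;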

249
00:14:01,913 --> 00:14:02,620
Nikolay: Right.

250
00:14:02,740 --> 00:14:05,640
Yeah, actually, this is a good
point we didn't mention.

251
00:14:06,040 --> 00:14:10,240
So we're shifting to discussion
of storage, compression, and

252
00:14:10,240 --> 00:14:11,100
host, right?

253
00:14:11,400 --> 00:14:15,480
But it's a good point we forgot
to mention about wall compression.

254
00:14:15,480 --> 00:14:16,780
It does require restart.

255
00:14:17,520 --> 00:14:22,200
So you can switch it on, switch
it off, and all new writes will

256
00:14:22,200 --> 00:14:23,760
happen according to the new setting.

257
00:14:23,760 --> 00:14:28,280
Of course, you need to send the
SIGHUP signal or run SELECT pg_reload_conf(),

258
00:14:30,760 --> 00:14:34,540
so configuration changes are applied
without any restart, which is

259
00:14:34,540 --> 00:14:35,040
great.

260
00:14:35,280 --> 00:14:38,000
And it also means it's easier to
try.

261
00:14:38,500 --> 00:14:41,440
If you have monitoring, if you're
prepared to roll back, it's

262
00:14:41,440 --> 00:14:42,620
easier to try.

263
00:14:42,620 --> 00:14:45,280
And if things go wrong, you can
return.

264
00:14:46,100 --> 00:14:47,820
So TOAST compression is interesting.

265
00:14:49,080 --> 00:14:53,860
And so, again, like, sorry, I'm
not paying attention to details

266
00:14:53,860 --> 00:14:54,640
today somehow.

267
00:14:54,880 --> 00:14:56,820
You chose zstd, right?

268
00:14:57,880 --> 00:14:58,940
Michael: I actually chose LZ4.

269
00:15:01,380 --> 00:15:07,160
I was more interested in the speed
of compression and speed of

270
00:15:07,160 --> 00:15:12,260
retrieval than I was in total
size on disk.

271
00:15:14,440 --> 00:15:19,100
Nikolay: Meanwhile, I asked our
bot to double-check how apt packages,

272
00:15:19,400 --> 00:15:23,120
official PGDG packages are created.

273
00:15:23,120 --> 00:15:27,240
Of course, these 2 options are
there, so if you install Postgres

274
00:15:27,240 --> 00:15:31,720
on Ubuntu, you can try various
compression algorithms.

275
00:15:32,020 --> 00:15:34,200
Well, just to double-check.

276
00:15:34,640 --> 00:15:36,580
Okay, so TOAST, what else?

277
00:15:36,580 --> 00:15:43,220
We don't have a good ability to
compress table data, right?

278
00:15:44,760 --> 00:15:48,380
By default in Postgres we don't
have it, and it's actually not

279
00:15:48,380 --> 00:15:49,940
a simple topic in row store.

280
00:15:50,900 --> 00:15:56,240
In row store, we have, yeah, so tuple
by tuple is stored and it

281
00:15:56,240 --> 00:15:57,680
can go in different pages.

282
00:15:57,700 --> 00:16:02,720
So, like, if we want to compress it, we
can do it transparently by

283
00:16:02,720 --> 00:16:04,540
switching to ZFS, for example.

284
00:16:04,840 --> 00:16:10,200
We saw benefits in terms of disk
space, like 10 to 30 percent,

285
00:16:10,200 --> 00:16:11,820
depending on the data nature.

286
00:16:12,900 --> 00:16:17,460
But ZFS brings new challenges for
administration, for sure.

287
00:16:18,420 --> 00:16:20,780
It's still good to shave off 30%.

288
00:16:21,820 --> 00:16:24,720
And this is what you get by default
if you install DBLab because

289
00:16:24,720 --> 00:16:25,540
it's on ZFS.

290
00:16:25,600 --> 00:16:30,140
So if you have, for example, a terabyte
size database in DBLab,

291
00:16:30,140 --> 00:16:33,820
it will look like 700 to 800 gigabytes
only.

292
00:16:33,820 --> 00:16:35,640
It's much better.

293
00:16:36,360 --> 00:16:39,440
But yeah, in Postgres itself,
there are no good compression

294
00:16:39,440 --> 00:16:41,340
options for heap.

295
00:16:42,440 --> 00:16:46,100
Michael: Except for single large,
like if you have one column that's

296
00:16:46,100 --> 00:16:51,660
large, I think TOAST is very good,
but not for like, smaller

297
00:16:51,660 --> 00:16:52,160
values.

298
00:16:53,120 --> 00:16:56,660
Nikolay: Oh, actually, yeah, big
values are compressed in heap,

299
00:16:56,660 --> 00:16:57,160
right?

300
00:16:57,180 --> 00:17:01,140
Now I remember before going to
TOAST, like Postgres tries

301
00:17:01,240 --> 00:17:06,200
to squeeze, like, to fit
them in size, like 2 kilobytes

302
00:17:06,300 --> 00:17:09,500
and have roughly 4 tuples
per page.

303
00:17:10,320 --> 00:17:13,580
Roughly, that's the kind of
thing I remember.

304
00:17:13,680 --> 00:17:16,120
Michael: Once it's compressed,
if it's under 1 kilobyte or

305
00:17:16,120 --> 00:17:18,840
something like that, they put it
inline on the page.

306
00:17:19,120 --> 00:17:19,860
Nikolay: Or inline.

307
00:17:19,860 --> 00:17:20,360
Michael: Yeah.

308
00:17:20,420 --> 00:17:23,660
It just means that if you have
values that are multiple kilobytes

309
00:17:23,800 --> 00:17:28,740
and compress well, even the default
PGLZ will give you quite

310
00:17:28,740 --> 00:17:32,820
a lot of compression out of the
box, transparently in Postgres.

311
00:17:32,900 --> 00:17:37,800
It's just, if you have lots of
repeating small data, like time

312
00:17:37,800 --> 00:17:43,260
series data, that could compress
as a whole very well, the row

313
00:17:43,260 --> 00:17:43,760
store...

314
00:17:44,720 --> 00:17:48,840
Nikolay: But this compression is
applied to a single tuple, single

315
00:17:48,840 --> 00:17:50,040
row version.

316
00:17:50,140 --> 00:17:54,060
So a single row version, it means
that we only have 1 timestamp,

317
00:17:54,060 --> 00:17:57,080
for example, 1 temperature, and
so on.

318
00:17:57,380 --> 00:18:00,720
Several different columns, maybe
a couple of timestamps, but

319
00:18:00,720 --> 00:18:02,580
different nature of timestamps.

320
00:18:02,600 --> 00:18:05,460
For example, created_at and, I
don't know, like registered_at,

321
00:18:05,460 --> 00:18:06,640
some different timestamp.

322
00:18:07,200 --> 00:18:11,640
And Postgres tries to compress
the tuple and TOAST, like we didn't

323
00:18:11,640 --> 00:18:13,980
cover and there's no goal to cover
it deeply.

324
00:18:13,980 --> 00:18:16,200
We had another episode on it.

325
00:18:16,800 --> 00:18:17,140
Right.

326
00:18:17,140 --> 00:18:18,700
So Postgres tries to compress.

327
00:18:18,700 --> 00:18:23,000
If it doesn't fit, it splits it
into chunks, and chunks are stored

328
00:18:23,000 --> 00:18:27,340
in a separate so-called TOAST table,
which is actually also a regular

329
00:18:27,340 --> 00:18:31,160
table, which is kind of invisible
to the user, but you can inspect

330
00:18:31,160 --> 00:18:32,540
it if you want as well.
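
For example, a sketch of finding a table's TOAST relation and its size (table name hypothetical):

    SELECT reltoastrelid::regclass AS toast_table,
           pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
    FROM pg_class
    WHERE oid = 'my_table'::regclass
      AND reltoastrelid <> 0;  -- only tables that actually have a TOAST table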

331
00:18:32,680 --> 00:18:38,440
Then the compression and reconstruction
of the tuple occur

332
00:18:38,440 --> 00:18:39,400
when it's needed.

333
00:18:39,860 --> 00:18:42,420
But what I'm trying to say is there is
no...

334
00:18:44,180 --> 00:18:49,840
For the heap itself, there are no rich
capabilities to control compression.

335
00:18:51,140 --> 00:18:55,660
And even if we had them, expected
benefits would be not high

336
00:18:55,680 --> 00:19:00,060
compared to analytical column store
databases, where, for example,

337
00:19:00,060 --> 00:19:04,360
we have all temperatures or all
timestamps stored in a separate

338
00:19:04,360 --> 00:19:04,860
file.

339
00:19:06,600 --> 00:19:09,500
Only this column for all rows is
stored here.

340
00:19:10,680 --> 00:19:16,280
They all are temperatures and maybe
these temperatures are coming

341
00:19:16,280 --> 00:19:20,140
from, for example, if they're coming
from a single source, they

342
00:19:20,140 --> 00:19:21,560
don't jump.

343
00:19:23,100 --> 00:19:26,820
I don't remember the word in English,
sorry, but the changes

344
00:19:26,820 --> 00:19:28,300
are not acute.

345
00:19:30,140 --> 00:19:33,420
If they change, for example, if
temperature is increasing, probably

346
00:19:33,420 --> 00:19:34,900
it will be increasing for some
time.

347
00:19:34,900 --> 00:19:38,400
It means that compression could
be done using various algorithms,

348
00:19:38,420 --> 00:19:41,540
for example, applying deltas and
storing only deltas.

349
00:19:43,980 --> 00:19:48,800
We will probably discuss TimescaleDB
soon because this is pointing

350
00:19:49,020 --> 00:19:52,320
in the direction of TimescaleDB
and what it provides.

351
00:19:52,640 --> 00:19:58,040
But so for column store you can
compress like 10x, which is not

352
00:19:58,040 --> 00:19:58,440
possible

353
00:19:58,440 --> 00:19:59,180
Michael: for rows.

354
00:19:59,270 --> 00:20:00,880
Yeah, or more even.

355
00:20:01,120 --> 00:20:05,900
But I think you're right, the
multiples are interesting, right?

356
00:20:05,900 --> 00:20:09,820
But you've missed the fact that,
like, let's say we had a row

357
00:20:09,820 --> 00:20:14,840
that is, let's say our tuples are
like a megabyte, but 99.9%

358
00:20:15,480 --> 00:20:20,280
of that is a single JSON column,
and we have 10 other tiny little

359
00:20:20,280 --> 00:20:21,140
integer columns.

360
00:20:21,140 --> 00:20:25,020
We get way more than 10x compression
ratio just with TOAST, you

361
00:20:25,020 --> 00:20:28,220
know, as if that block is as well.

362
00:20:28,780 --> 00:20:29,280
Yeah.

363
00:20:29,480 --> 00:20:29,880
So, I

364
00:20:29,880 --> 00:20:30,380
Nikolay: don't...

365
00:20:31,160 --> 00:20:35,640
As you know, probably, I, like,
when I was very young, I participated

366
00:20:35,740 --> 00:20:40,040
a little bit in XML function and
data type development in Postgres,

367
00:20:40,040 --> 00:20:44,400
and I remember that time XML compression
was the thing, big thing.

368
00:20:44,680 --> 00:20:50,340
And I even remember hardware accelerators
companies were selling

369
00:20:50,740 --> 00:20:53,400
to compress XML on the fly transparently.

370
00:20:54,180 --> 00:20:54,900
It's crazy.

371
00:20:54,960 --> 00:21:01,620
It's because a lot of structural
pieces of the values can be

372
00:21:01,780 --> 00:21:05,800
compressed well in XML for sure,
but less than JSON, but still

373
00:21:05,800 --> 00:21:12,900
also all those parentheses, braces,
quotes and so on.

374
00:21:13,860 --> 00:21:20,680
Or maybe it's also some, if it's
JSON, not JSON B, a lot of white

375
00:21:20,680 --> 00:21:21,180
spaces.

376
00:21:22,200 --> 00:21:22,700
Right?

377
00:21:22,780 --> 00:21:23,280
Yeah.

378
00:21:23,560 --> 00:21:24,060
Yeah.

379
00:21:24,340 --> 00:21:25,120
So, yeah.

380
00:21:25,460 --> 00:21:31,820
If you have JSON and, yeah, so
in some cases, compression and

381
00:21:31,820 --> 00:21:35,720
TOAST, like tuning it, it's interesting.

382
00:21:35,740 --> 00:21:38,960
And maybe also we need benchmarks,
so maybe for JSON as well.

383
00:21:40,520 --> 00:21:41,040
It's good.

384
00:21:41,040 --> 00:21:41,600
Michael: Yeah, it
Nikolay: would be

385
00:21:41,600 --> 00:21:45,320
Michael: interesting, but what
I mean is more like the compression

386
00:21:45,320 --> 00:21:51,040
ratio is a lot, is varied, obviously
extremely dependent on the

387
00:21:51,040 --> 00:21:51,540
data.

388
00:21:51,860 --> 00:21:52,900
But I think

389
00:21:54,320 --> 00:21:56,640
Nikolay: I'm very curious what
you have, for example, if you

390
00:21:56,640 --> 00:21:59,920
know that some old values are not
compressed and some new values

391
00:21:59,920 --> 00:22:05,340
are compressed, you check how much
storage is occupied by, I

392
00:22:05,340 --> 00:22:08,400
don't know, by like 1000 rows or
so.

393
00:22:09,000 --> 00:22:09,980
It's worth checking.
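
A sketch of such a check, assuming a text column (table and column names hypothetical):

    -- pg_column_size() reports the stored, possibly compressed size;
    -- octet_length() the uncompressed size in bytes.
    SELECT sum(pg_column_size(plan)) AS stored_bytes,
           sum(octet_length(plan))   AS raw_bytes
    FROM (SELECT plan FROM plans ORDER BY id DESC LIMIT 1000) t;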

394
00:22:10,440 --> 00:22:14,600
Michael: Well, you can't, it's
like impossible to store large

395
00:22:14,600 --> 00:22:18,440
values in Postgres without, actually
no, you can turn compression

396
00:22:18,440 --> 00:22:21,620
off, but TOAST, you can turn compression
off with TOAST, but

397
00:22:21,620 --> 00:22:22,700
the default is on.

398
00:22:22,700 --> 00:22:24,560
So I have never tried it without.

399
00:22:25,240 --> 00:22:28,500
So I don't have that to compare,
but just give you an idea.

400
00:22:28,500 --> 00:22:34,040
We didn't use to compress plans
before, so we only store plans in

401
00:22:34,040 --> 00:22:37,260
local storage by default, in-browser
local storage, and that's

402
00:22:37,260 --> 00:22:38,680
an extremely limited resource.

403
00:22:38,680 --> 00:22:41,760
Like in some browsers, I think in Firefox,
it's like 5 or 10 megabytes,

404
00:22:41,760 --> 00:22:42,840
depending on the version.

405
00:22:43,180 --> 00:22:47,040
So that's not very much when you're
talking large query plans.

406
00:22:47,040 --> 00:22:49,640
Like, obviously, it's a lot for
most people's query plans, but

407
00:22:49,640 --> 00:22:52,780
some people will come up with query
plans that are like, dozens

408
00:22:52,780 --> 00:22:53,460
of megabytes.

409
00:22:54,960 --> 00:22:57,860
Nikolay: Varlena data types accept
up to 1 gigabyte, right.

410
00:22:57,860 --> 00:23:02,640
But as I remember, It's better
not to go beyond a couple of hundred

411
00:23:02,640 --> 00:23:05,340
megabytes because the performance
will become terrible.

412
00:23:05,980 --> 00:23:08,160
Michael: Well, just to give you
an idea though, quickly, these

413
00:23:08,160 --> 00:23:12,180
plans were compressing from megabytes
down to kilobytes.

414
00:23:12,340 --> 00:23:17,520
Like it was easily, yeah, it was
more than 95%.

415
00:23:17,860 --> 00:23:20,340
I think in some cases, like, yeah.

416
00:23:20,500 --> 00:23:21,060
Nikolay: That's cool.

417
00:23:21,060 --> 00:23:25,160
But still, it's all compression
of only one tuple and actually

418
00:23:25,160 --> 00:23:26,400
one value, not a tuple.

419
00:23:26,600 --> 00:23:27,840
Michael: One value, yeah.

420
00:23:29,540 --> 00:23:33,640
Nikolay: But it feels like it would
make sense to have some compression

421
00:23:33,640 --> 00:23:35,820
maybe at the page level as well
and so on.

422
00:23:35,820 --> 00:23:40,240
This is what ZFS provides transparently
if you put Postgres on

423
00:23:40,240 --> 00:23:41,260
top of it, right?

424
00:23:41,540 --> 00:23:45,540
So compressing pages, I think,
even if it's still a raw store,

425
00:23:45,540 --> 00:23:47,960
maybe having these abilities...

426
00:23:47,960 --> 00:23:52,840
I remember Peter Geoghegan talked a
lot about new features in MySQL

427
00:23:52,940 --> 00:23:53,980
many years ago.

428
00:23:54,720 --> 00:23:59,360
I think MySQL has more settings
in this area for compression

429
00:23:59,440 --> 00:24:00,120
of storage.

430
00:24:01,680 --> 00:24:06,300
Have you heard or remember something
from it?

431
00:24:06,420 --> 00:24:08,660
We have mostly questions today,
right?

432
00:24:08,660 --> 00:24:10,380
Michael: I didn't know that.

433
00:24:11,280 --> 00:24:15,560
Well, they seem to be a bit further
ahead on the storage engine

434
00:24:15,600 --> 00:24:16,720
side of things, don't they?

435
00:24:16,720 --> 00:24:19,640
Like they've had multiple storage
engines for quite a lot longer

436
00:24:19,640 --> 00:24:20,020
than...

437
00:24:20,020 --> 00:24:21,420
Well, like we don't have...

438
00:24:21,560 --> 00:24:24,960
But yeah, so it wouldn't surprise
me if they were ahead on that

439
00:24:24,960 --> 00:24:25,460
front.

440
00:24:25,840 --> 00:24:28,940
But yeah, you mentioned Timescale,
it feels like we've had a

441
00:24:28,940 --> 00:24:33,620
history of a few extensions that
have offered the option for

442
00:24:34,140 --> 00:24:35,420
columnar storage.

443
00:24:35,900 --> 00:24:38,660
Nikolay: Before going there, let's
just touch a little bit on

444
00:24:38,680 --> 00:24:42,840
options that Postgres could have,
and there are some discussions.

445
00:24:43,980 --> 00:24:46,560
Like, what else could be compressed
inside?

446
00:24:47,080 --> 00:24:48,620
For example, temporary files.

447
00:24:50,460 --> 00:24:51,600
Yes, good idea.

448
00:24:51,660 --> 00:24:54,140
Michael: But we should also probably
talk about things that we

449
00:24:54,140 --> 00:24:58,400
actually haven't mentioned, like
backup files, pgdump files.

450
00:24:58,400 --> 00:24:59,720
Nikolay: I know, this is already...

451
00:24:59,720 --> 00:25:04,640
Yeah, I wanted to do it, but look,
before going to backups or

452
00:25:04,640 --> 00:25:08,940
dumps, backups and dumps, it's
already going outside of Postgres.

453
00:25:08,940 --> 00:25:12,900
But imagine we have Postgres, we
run some queries, we discuss

454
00:25:12,900 --> 00:25:16,340
the WAL, which is the first thing
we should discuss because

455
00:25:16,340 --> 00:25:20,740
data is written there first before
it's written in data files

456
00:25:20,740 --> 00:25:22,100
or pages and so on.

457
00:25:22,440 --> 00:25:26,180
So we discussed WAL, we discussed
the storage itself, TOAST,

458
00:25:26,280 --> 00:25:28,640
this is what we have and that's
it.

459
00:25:29,640 --> 00:25:33,640
Not at page level, but it can be
achieved by a file system, but

460
00:25:33,640 --> 00:25:34,380
that's it.

461
00:25:34,740 --> 00:25:37,500
Indexes, I have no idea.

462
00:25:39,340 --> 00:25:43,860
Maybe deduplication, which was
done in Postgres 13 and 14 by

463
00:25:43,860 --> 00:25:47,860
Peter Geoghegan and Anastasia Lubennikova,
maybe this can be considered

464
00:25:47,860 --> 00:25:49,600
as compression actually, right?

465
00:25:49,940 --> 00:25:51,400
Michael: That's a good shout, yeah.

466
00:25:52,280 --> 00:25:54,860
Nikolay: Native compression, you
know, like pieces of compression

467
00:25:54,860 --> 00:25:57,700
because it occupies less space,
right?

468
00:25:57,700 --> 00:25:58,640
It's kind of...

469
00:25:59,100 --> 00:25:59,740
Why not?

470
00:26:00,080 --> 00:26:02,720
So, it's an optimization, but
still.

471
00:26:02,720 --> 00:26:06,420
And last time we discussed, what
was the topic last week?

472
00:26:07,360 --> 00:26:08,040
Michael: Out of disk.

473
00:26:08,440 --> 00:26:09,660
Nikolay: Out of disk, right.

474
00:26:09,720 --> 00:26:15,260
And remember we mentioned, I named
it exotic and some listener

475
00:26:15,920 --> 00:26:21,000
commented on YouTube. By the way,
thank you to all those who write

476
00:26:21,000 --> 00:26:24,060
something on YouTube or on Twitter
or LinkedIn.

477
00:26:24,140 --> 00:26:26,300
It's very good to receive feedback.

478
00:26:26,600 --> 00:26:30,460
So they mentioned that temporary
files are not that exotic.

479
00:26:31,100 --> 00:26:35,820
And if you run huge, heavy queries
like they had, you can be

480
00:26:35,820 --> 00:26:39,960
out of disk space because temporary
files were huge and consumed

481
00:26:39,960 --> 00:26:40,940
a lot of disk.
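
To see whether temporary files are an issue on your system, a sketch:

    -- Cumulative temp file usage per database since the last stats reset:
    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_written
    FROM pg_stat_database
    WHERE datname IS NOT NULL;

    -- Log every temp file over 100 MB to identify the offending queries:
    ALTER SYSTEM SET log_temp_files = '100MB';
    SELECT pg_reload_conf();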

482
00:26:41,400 --> 00:26:45,420
And the compression of them makes
total sense to me, right?

483
00:26:45,940 --> 00:26:46,860
What do you think?

484
00:26:47,640 --> 00:26:48,900
We don't have it, right?

485
00:26:48,900 --> 00:26:51,340
Michael: I was watching a really
good video that you sent me

486
00:26:51,340 --> 00:26:53,300
a link to by Andrei

487
00:26:54,014 --> 00:26:54,514
Nikolay: Borodin.

488
00:26:55,229 --> 00:26:56,943
Right, right.

489
00:26:57,657 --> 00:26:59,372
Michael: Cybertech 2023.

490
00:27:00,086 --> 00:27:00,586
Yes.

491
00:27:01,300 --> 00:27:06,340
And I watched it last night and
he mentioned that in order to

492
00:27:06,340 --> 00:27:06,640
be...

493
00:27:06,640 --> 00:27:08,340
I think there was a problem with
durability.

494
00:27:10,440 --> 00:27:10,931
If you want to be...

495
00:27:10,931 --> 00:27:11,900
Nikolay: Forget
about problems.

496
00:27:12,540 --> 00:27:13,480
This is like...

497
00:27:13,680 --> 00:27:14,340
I'm just...

498
00:27:14,340 --> 00:27:15,340
It doesn't make sense.

499
00:27:15,340 --> 00:27:15,900
My answer is

500
00:27:15,900 --> 00:27:16,340
Michael: yes, it does.

501
00:27:16,340 --> 00:27:19,240
David Willis started doing the
work and it's more complex

502
00:27:19,240 --> 00:27:20,280
than it might sound.

503
00:27:20,820 --> 00:27:23,720
But yeah, it makes sense, but it's
complicated.

504
00:27:24,520 --> 00:27:27,840
Nikolay: Full disclaimer, I just
chatted with Andrei this morning

505
00:27:28,180 --> 00:27:32,360
and this is what he told me and
I'm just shamelessly using his

506
00:27:32,360 --> 00:27:32,860
ideas.

507
00:27:33,400 --> 00:27:37,160
So it's worth compressing pages,
as I mentioned, and it's worth

508
00:27:37,540 --> 00:27:38,940
compressing temporary files.

509
00:27:39,160 --> 00:27:42,380
Andrei says it's just lack of time
and hands, but it's worth

510
00:27:42,380 --> 00:27:44,160
implementing and proposing a patch.

511
00:27:44,600 --> 00:27:46,840
So, yeah, it's a great idea.

512
00:27:46,840 --> 00:27:50,100
I hope he will find time to work
on this and other people probably

513
00:27:50,100 --> 00:27:53,200
will find time to move Postgres
forward in this area.

514
00:27:53,840 --> 00:27:58,060
And in this case, imagine if temporary
files are compressed.

515
00:27:58,260 --> 00:28:02,840
In this case, I would be more right
saying it's exotic to be

516
00:28:02,840 --> 00:28:06,560
out of disk space when you have
temporary files occupying a lot

517
00:28:06,560 --> 00:28:07,740
of disk space, right?

518
00:28:08,240 --> 00:28:12,240
Michael: I think it makes extra
sense because the type of things

519
00:28:12,240 --> 00:28:18,380
that are generating temporary files
like sorts and hashes, it's

520
00:28:18,660 --> 00:28:22,260
probably a lot of similar values.

521
00:28:22,640 --> 00:28:26,080
I'm guessing there would be a fair
amount of repetition which

522
00:28:27,040 --> 00:28:29,340
would naturally suit compression
as well.

523
00:28:29,340 --> 00:28:31,660
So I'm wondering if there's like
a...

524
00:28:31,880 --> 00:28:34,540
Not only would it just take up
less space, and obviously there's

525
00:28:34,540 --> 00:28:36,720
benefits there, but I'm wondering
if there might be performance

526
00:28:36,760 --> 00:28:37,700
benefits too.

527
00:28:38,260 --> 00:28:39,720
Nikolay: Yeah, that's a good point
actually.

528
00:28:39,720 --> 00:28:43,080
When we talk about saving disk
space, sometimes it's also about

529
00:28:43,080 --> 00:28:43,580
performance.

530
00:28:43,740 --> 00:28:51,000
If disk is slow, it's better to
consume cycles of CPU for compressing,

531
00:28:51,100 --> 00:28:54,400
decompressing, and things become
much faster.

532
00:28:54,780 --> 00:28:55,240
I agree.

533
00:28:55,240 --> 00:28:56,340
This is a good point.

534
00:28:56,660 --> 00:28:58,120
But it's not always so.

535
00:28:58,520 --> 00:28:59,160
It depends.

536
00:29:01,560 --> 00:29:05,420
In general, I think I saw somewhere,
there's a general point

537
00:29:05,420 --> 00:29:07,940
that databases are I/O hungry.

538
00:29:08,100 --> 00:29:12,980
And like, as you very well know,
we talk buffers, buffers all

539
00:29:12,980 --> 00:29:13,260
the time.

540
00:29:13,260 --> 00:29:16,920
It's just like confirming it's
all about I-O usually.

541
00:29:17,020 --> 00:29:22,400
But I also saw CPU 100% and lag
CPU, especially for example,

542
00:29:22,660 --> 00:29:28,640
we are on Graviton or like ARM
and we only have up to 64 vCPUs

543
00:29:29,540 --> 00:29:30,392
and that's it.

544
00:29:30,392 --> 00:29:33,340
These days it's not a lot, 64,
right?

545
00:29:33,340 --> 00:29:35,900
It's like moderate number of…

546
00:29:35,900 --> 00:29:37,840
Michael: It's a lot for some people,
yeah.

547
00:29:38,100 --> 00:29:38,980
Nikolay: Right, right.

548
00:29:39,340 --> 00:29:44,840
But it's not 360, it's not 800
like you showed me on AWS for

549
00:29:44,860 --> 00:29:46,360
Intel scalable Xeon.

550
00:29:46,960 --> 00:29:53,540
So if it's 64, a lot of clients,
in this case CPU becomes a very

551
00:29:53,600 --> 00:29:55,220
valuable resource.

552
00:29:56,000 --> 00:29:56,500
Right?

553
00:29:57,040 --> 00:29:57,620
Michael: Of course.

554
00:29:57,840 --> 00:29:58,140
Nikolay: Right.

555
00:29:58,140 --> 00:30:02,720
So in this case, probably we might
be preferring to spend more

556
00:30:02,780 --> 00:30:03,840
disk IO cycles.

557
00:30:04,640 --> 00:30:08,490
Okay, what else can be compressed
in Postgres?

558
00:30:08,490 --> 00:30:12,100
We talked about ideas which are
not currently implemented.

559
00:30:12,440 --> 00:30:16,000
Temporary files, page level compression,
maybe protocol level,

560
00:30:16,000 --> 00:30:18,300
which has pros and cons.

561
00:30:19,120 --> 00:30:21,140
Maybe let's not go there, right?

562
00:30:21,220 --> 00:30:21,880
What else?

563
00:30:22,800 --> 00:30:24,840
Michael: Anything else in memory
that could be?

564
00:30:26,000 --> 00:30:26,760
Nikolay: Oh, interesting.

565
00:30:28,540 --> 00:30:29,040
Maybe.

566
00:30:29,540 --> 00:30:33,680
I don't know, But I think if we,
for example, shifting maybe

567
00:30:33,680 --> 00:30:38,520
to some additional projects, I
think we will talk about Hydra

568
00:30:38,520 --> 00:30:39,360
and TimescaleDB.

569
00:30:39,960 --> 00:30:44,880
If, for example, we have a storage
engine and we have some tables

570
00:30:45,060 --> 00:30:48,840
defined as column store, this is
what Hydra provides.

571
00:30:49,540 --> 00:30:54,220
In this case, it makes total sense
to compress those tables.

572
00:30:54,620 --> 00:30:58,380
Or, for example, if we mentioned
last time, talking about disk

573
00:30:58,380 --> 00:31:02,220
space, or we have raw store, but
we have partitioning and some

574
00:31:02,220 --> 00:31:06,740
old partitions we consider as having
archived data, and this

575
00:31:06,740 --> 00:31:10,160
data we want probably compress,
and maybe we want to store it

576
00:31:10,160 --> 00:31:15,540
not on the regular disk, but on
object storage, and we discussed

577
00:31:15,620 --> 00:31:17,220
PGT, Tempo developed.

578
00:31:17,780 --> 00:31:21,780
So kind of bottomless Postgres,
but we want all data to be compressed

579
00:31:22,820 --> 00:31:24,640
and rarely used.

580
00:31:24,800 --> 00:31:25,300
Right?

581
00:31:26,140 --> 00:31:30,840
Michael: Yeah, I saw 3 different
approaches from different extensions.

582
00:31:32,520 --> 00:31:36,040
The Hydra one you mentioned, I think
came from Citus Data originally.

583
00:31:36,600 --> 00:31:38,940
cstore_fdw I think was the original.

584
00:31:39,860 --> 00:31:43,400
Nikolay: It's AGPL from Citus inherited
by Hydra.

585
00:31:45,060 --> 00:31:48,840
Michael: Yeah, so I think that
is very much like from the start

586
00:31:48,840 --> 00:31:52,700
of a table's life, you choose whether
it should be row store

587
00:31:52,700 --> 00:31:54,900
oriented or column store oriented.

588
00:31:54,960 --> 00:31:57,680
That's one approach that makes a
lot of sense.

589
00:31:58,300 --> 00:32:00,600
The Timescale approach seems to
be...

590
00:32:00,960 --> 00:32:03,480
Nikolay: And column store, of course,
we want to probably compress

591
00:32:03,480 --> 00:32:07,040
quite a lot because the ratio usually
is good.

592
00:32:07,080 --> 00:32:08,900
Michael: Well, yes.

593
00:32:09,480 --> 00:32:15,060
But I got the impression that the
main aim originally with the

594
00:32:15,060 --> 00:32:18,900
Citus ColumnStore approach and
therefore Hydra, was yes, you

595
00:32:18,900 --> 00:32:22,200
get some compression for the storage
benefits But the main aim

596
00:32:22,200 --> 00:32:25,720
seemed to be so that we can make
analytical queries faster. So

597
00:32:25,720 --> 00:32:29,340
again, it's that performance
angle that seemed to be

598
00:32:29,340 --> 00:32:32,880
the driving force of why we should
want it to be column store

599
00:32:32,880 --> 00:32:36,020
in the first place for performance
of analytical queries that

600
00:32:36,020 --> 00:32:38,900
tend to be aggregates over a single
column.

601
00:32:39,000 --> 00:32:42,360
And if we've got column-oriented
data that's compressed...

602
00:32:42,360 --> 00:32:45,180
Nikolay: It's massively a lot of
I/O.

603
00:32:45,180 --> 00:32:47,220
Of course, we want to reduce this I/O.

604
00:32:47,440 --> 00:32:52,400
And then already deal with it in
CPU, but also the nature of the data

605
00:32:52,400 --> 00:32:57,020
is like a lot of similar looking
values, which suits compression.

606
00:32:57,080 --> 00:33:01,280
Michael: And I think once it's
organized in column, like by columns,

607
00:33:01,280 --> 00:33:04,640
you can also start to store much
easier, like the metadata of

608
00:33:04,640 --> 00:33:08,320
like min max values and do some
like shortcuts, I think on that

609
00:33:08,320 --> 00:33:09,060
data as well.

610
00:33:09,060 --> 00:33:12,740
So I think there's some like cool
tricks as well.

611
00:33:13,660 --> 00:33:18,220
But then I think
there's 2 other approaches that

612
00:33:18,220 --> 00:33:18,660
I've seen.

613
00:33:18,660 --> 00:33:23,580
One is the Timescale approach, which
seems to be on older partitions,

614
00:33:24,240 --> 00:33:25,820
like on older data.

615
00:33:27,340 --> 00:33:29,560
Everything's row store at the beginning.

616
00:33:29,680 --> 00:33:32,980
And then after a certain point,
you set a policy that it gets

617
00:33:32,980 --> 00:33:36,960
converted to column store later
once it's unlikely to change?

618
00:33:36,960 --> 00:33:38,500
Nikolay: OK, I remember it differently.

619
00:33:38,520 --> 00:33:42,000
I remember it's always row store
but compression works in 2 dimensions.

620
00:33:43,340 --> 00:33:48,080
Like for example, I'm not sure
if they converted to column store,

621
00:33:48,080 --> 00:33:52,920
maybe I'm wrong, but what I remember
is still row store but with

622
00:33:52,920 --> 00:33:57,480
understanding of vertical dimension,
so to speak.

623
00:33:58,700 --> 00:34:03,460
For example, storing deltas instead
of raw values and applying

624
00:34:03,460 --> 00:34:03,900
compression.

625
00:34:03,900 --> 00:34:05,520
Michael: Well, I mean, it's still
in Postgres.

626
00:34:05,580 --> 00:34:08,900
It's still in Postgres, so it's
still row store under the hood,

627
00:34:08,900 --> 00:34:12,100
but it's column-oriented, like
it's organized by column.

628
00:34:12,100 --> 00:34:15,480
Like, if you look, the Postgres
docs are really good on this.

629
00:34:15,620 --> 00:34:16,700
I'll share a link.

630
00:34:16,700 --> 00:34:18,760
Sorry, not the Postgres, the Timescale
docs.
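
For flavor, a sketch of that setup in TimescaleDB (hypertable and column names hypothetical):

    -- Enable compression on a hypertable, segmenting by device:
    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'device_id');

    -- Automatically compress chunks older than 7 days:
    SELECT add_compression_policy('metrics', INTERVAL '7 days');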

631
00:34:18,920 --> 00:34:20,820
Nikolay: I lack understanding here,
apparently.

632
00:34:21,140 --> 00:34:22,940
Michael: But there's a third approach
as well.

633
00:34:23,040 --> 00:34:26,520
I think that the Timescale
approach is more optimized

634
00:34:26,580 --> 00:34:27,180
for space.

635
00:34:27,180 --> 00:34:31,360
I think the compression is more
on the, let's make older partitions

636
00:34:31,640 --> 00:34:35,240
take up less space because you
have so much data.

637
00:34:36,660 --> 00:34:37,280
Like 20 times less.

638
00:34:37,280 --> 00:34:40,320
Like 20, yeah, like some of the
numbers they mentioned, like

639
00:34:40,320 --> 00:34:41,400
95% compression.

640
00:34:41,400 --> 00:34:42,380
We have impressive

641
00:34:42,400 --> 00:34:44,360
Nikolay: numbers observed, yeah.

642
00:34:45,060 --> 00:34:46,020
It's really good.

643
00:34:46,020 --> 00:34:48,940
Michael: But yeah, but that's the
idea is with time series data,

644
00:34:48,940 --> 00:34:52,820
you could end up with hundreds
of terabytes like, and they've,

645
00:34:52,820 --> 00:34:53,760
they have themselves.

646
00:34:53,920 --> 00:34:57,360
So I think it's, it's the kind
of time where you could actually

647
00:34:57,360 --> 00:34:58,040
save a lot.

648
00:34:58,040 --> 00:35:00,960
Now, obviously they also benefit
from the performance on analytical

649
00:35:01,020 --> 00:35:04,300
queries and they have some cool
features there, but it feels

650
00:35:04,300 --> 00:35:08,400
like the way they've implemented
it was primarily for those storage

651
00:35:08,400 --> 00:35:08,900
benefits.

652
00:35:09,100 --> 00:35:13,200
And then the third one, I think,
has popped up relatively recently

653
00:35:13,920 --> 00:35:17,280
in the grand scheme of things,
is this idea of, as you mentioned,

654
00:35:17,280 --> 00:35:21,760
the tiering or like the moving
to a file like format like Parquet.

655
00:35:22,280 --> 00:35:27,660
So exporting data out of Postgres 
into a compressed format on

656
00:35:27,660 --> 00:35:31,260
object storage that's normally
column oriented so that you can

657
00:35:31,260 --> 00:35:34,280
get these fast analytical queries
and they take up a lot less

658
00:35:34,280 --> 00:35:35,040
Nikolay: storage space.

659
00:35:35,240 --> 00:35:38,120
There's some big limitations for
data types, I suspect.

660
00:35:39,160 --> 00:35:43,680
Only a limited set of data types
supported for those kinds of

661
00:35:43,920 --> 00:35:44,420
things.

662
00:35:45,940 --> 00:35:47,800
Michael: And how do updates and 
deletes work?

663
00:35:47,800 --> 00:35:49,550
I don't actually know all of the
details.

664
00:35:49,550 --> 00:35:50,440
Nikolay: Not possible, I don't.

665
00:35:51,140 --> 00:35:54,640
Michael: Yeah, so there's definitely
limitations and differences

666
00:35:54,720 --> 00:35:55,620
between these.

667
00:35:55,840 --> 00:35:58,700
A lot of, we're not gonna be able
to describe them all here,

668
00:35:58,700 --> 00:35:59,700
obviously, time-wise.

669
00:36:00,480 --> 00:36:03,480
But I found it really fascinating
that there's these 3 different,

670
00:36:04,760 --> 00:36:06,000
quite different approaches.

671
00:36:06,700 --> 00:36:07,600
What do you think?

672
00:36:07,860 --> 00:36:08,360
Nikolay: Right.

673
00:36:09,440 --> 00:36:12,320
Yeah, well, it's super interesting
to observe progress here.

674
00:36:12,780 --> 00:36:17,640
Probably, again, for
me, this episode is raising more

675
00:36:17,640 --> 00:36:18,840
questions than answers.

676
00:36:19,600 --> 00:36:23,920
I think after this episode, I will
be planning more experiments

677
00:36:25,160 --> 00:36:27,340
to study the benefits.

678
00:36:27,820 --> 00:36:31,420
And I guess we always need to take
into account several metrics,

679
00:36:31,760 --> 00:36:36,240
compression ratio, speed of compression,
speed of decompression,

680
00:36:36,780 --> 00:36:37,280
right?

681
00:36:37,900 --> 00:36:42,280
Maybe the CPU overhead itself,
if it matters for us, how much

682
00:36:42,280 --> 00:36:43,400
CPU we consumed.

683
00:36:44,320 --> 00:36:48,480
So yeah, TimescaleDB and Hydra
are interesting in this area.

684
00:36:48,480 --> 00:36:53,500
And I still remember the big
impression I got

685
00:36:54,120 --> 00:36:58,900
from reading the TimescaleDB details
in their blog post about how

686
00:36:58,900 --> 00:37:01,520
they implement compression.

687
00:37:02,680 --> 00:37:07,320
I think we forgot, because of me,
to talk about compression

688
00:37:07,660 --> 00:37:08,260
of dumps.

689
00:37:08,260 --> 00:37:09,060
Oh yeah.

690
00:37:09,520 --> 00:37:11,540
pg_dump has compression options.

691
00:37:12,180 --> 00:37:19,200
And also compression of WAL files
as a whole, which I think

692
00:37:19,200 --> 00:37:24,240
Postgres doesn't provide, but
both pgBackRest and WAL-G, I

693
00:37:24,240 --> 00:37:28,680
think the most popular backup tools,
they both do it, right?
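
As a rough sketch of how that is wired up with WAL-G (the exact codec list depends on the version, so treat this as an assumption to verify against its docs):

    # postgresql.conf: hand each completed 16 MB WAL segment to WAL-G
    archive_command = 'wal-g wal-push %p'

    # WAL-G environment: codec applied to each segment before upload
    WALG_COMPRESSION_METHOD=lz4   # alternatives include zstd and brotli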

694
00:37:28,680 --> 00:37:33,760
Because if you have a 16-megabyte
file and you compress it, you

695
00:37:33,760 --> 00:37:38,900
get like 3 times less, for example,
as I remember, like 5 megabytes

696
00:37:38,940 --> 00:37:41,940
maybe or so, maybe even less in
some cases.

697
00:37:41,980 --> 00:37:43,480
Again, it depends on the data.

698
00:37:44,060 --> 00:37:47,420
In this case, it's much better
in terms of storage costs and

699
00:37:47,420 --> 00:37:49,400
transfer speed and so on.

700
00:37:51,280 --> 00:37:55,320
But this will consume some CPU
cycles, and usually the archive

701
00:37:55,320 --> 00:37:57,040
command is working on the primary.

702
00:37:57,040 --> 00:37:58,340
This is the key here.

703
00:37:58,340 --> 00:38:04,700
Usually, to avoid the risk of
longer delays, lag in archiving.

704
00:38:05,340 --> 00:38:09,100
Because we need WALs; it's part
of our DR strategy, the disaster

705
00:38:09,140 --> 00:38:13,280
recovery strategy, so we want to
archive WAL as fast as possible.

706
00:38:15,040 --> 00:38:16,860
And this means we do it on the
primary.

707
00:38:17,780 --> 00:38:21,020
And if we archive whole WAL files,
we need some CPU.

708
00:38:23,260 --> 00:38:27,480
And if a lot of WAL is generated,
we will probably need multiple

709
00:38:27,500 --> 00:38:28,000
workers.

710
00:38:28,080 --> 00:38:32,440
And then you already see more than
100% of a single CPU, meaning

711
00:38:32,440 --> 00:38:34,460
multiple cores are busy.

712
00:38:35,580 --> 00:38:47,860
200 percent out of our 360 cores,
vCPUs. We allow 2 cores to

713
00:38:47,860 --> 00:38:50,820
be used to compress and archive
WALs.

714
00:38:51,100 --> 00:38:54,560
It's just like some random numbers
in my head, but definitely

715
00:38:54,620 --> 00:39:00,180
if we talk about compression of
whole WAL files by WAL-G or pgBackRest,

716
00:39:00,300 --> 00:39:04,200
we need to keep in mind the
most valuable CPU resource,

717
00:39:04,200 --> 00:39:05,220
which is on the primary.

718
00:39:05,600 --> 00:39:07,440
We need to think about capacity
here.

719
00:39:07,440 --> 00:39:11,580
And decompressing usually is not
a big deal, especially if we

720
00:39:11,580 --> 00:39:17,140
can fetch WALs using multiple
workers from object storage, like

721
00:39:17,640 --> 00:39:19,900
S3 and decompress them.

722
00:39:19,900 --> 00:39:22,500
Usually it's not a big deal, but
still worth remembering.

723
00:39:22,540 --> 00:39:24,520
So usually we have compression
there.

724
00:39:24,520 --> 00:39:29,140
Unfortunately, Postgres right now
doesn't do it officially, so

725
00:39:29,140 --> 00:39:29,980
this is only...

726
00:39:30,040 --> 00:39:33,460
Here I talk about, again, third-party
tools which are very common,

727
00:39:33,900 --> 00:39:35,880
popular, WAL-G, PgBackRest.

728
00:39:36,340 --> 00:39:38,540
Others as well, I think they also
compress.

729
00:39:39,740 --> 00:39:41,100
And pg_dump.

730
00:39:41,180 --> 00:39:43,120
pg_dump is official, very official.

731
00:39:44,100 --> 00:39:45,080
pg_dump, pg_restore.

732
00:39:46,020 --> 00:39:50,020
They support compression for the custom
or directory formats, or both,

733
00:39:50,020 --> 00:39:50,520
right?

734
00:39:50,980 --> 00:39:51,880
I always mix

735
00:39:51,880 --> 00:39:51,980
Michael: them up.

736
00:39:51,980 --> 00:39:55,520
Yeah, and this is 1 of the
oldest; this has

737
00:39:55,520 --> 00:39:56,820
been in forever, right?

738
00:39:57,040 --> 00:40:01,220
Nikolay: Right, compression has been
there forever, but what about options

739
00:40:01,380 --> 00:40:02,460
in terms of algorithms?

740
00:40:02,460 --> 00:40:03,140
Oh, yeah.

741
00:40:03,580 --> 00:40:07,580
Zstandard and LZ4, I think, are
relatively new.

742
00:40:08,040 --> 00:40:11,460
And it's worth again experimenting,
benchmarking, and studying.
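
For example, something like this; zstd and lz4 support in pg_dump arrived in PostgreSQL 16, and the database name and paths are placeholders:

    # Directory format, 8 parallel workers, zstd at level 5
    pg_dump -Fd -j 8 --compress=zstd:5 -f /backups/mydb.dir mydb

    # Parallel restore from the same directory
    pg_restore -j 8 -d mydb_copy /backups/mydb.dir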

743
00:40:11,540 --> 00:40:13,940
I think there are articles from
Cybertec, right?

744
00:40:14,340 --> 00:40:14,940
Let's check.

745
00:40:14,940 --> 00:40:18,500
Yeah, there are articles comparing
different types of compression

746
00:40:19,900 --> 00:40:20,720
and decompression.

747
00:40:22,360 --> 00:40:29,820
So, ratio: I think you can even
control the level there if you want

748
00:40:29,820 --> 00:40:35,780
to spend more time and CPU capacity
to achieve a little

749
00:40:36,140 --> 00:40:38,440
better compression ratio; it's
possible.

750
00:40:38,680 --> 00:40:41,940
Yeah, I remember some surprising
results from those articles.

751
00:40:41,980 --> 00:40:44,380
I don't remember details, so let's
attach them.

752
00:40:44,380 --> 00:40:46,560
But definitely we want to study
this as well.

753
00:40:46,560 --> 00:40:51,680
Again, mostly questions today,
not answers, not exact recipes.

754
00:40:52,500 --> 00:40:57,680
But it's good. The only thing
I don't like in pg_dump and pg_restore

755
00:40:57,800 --> 00:41:01,820
is that you cannot use parallelization
and compression on the

756
00:41:01,820 --> 00:41:02,320
fly.

757
00:41:02,780 --> 00:41:07,620
What if this is a 1-time
operation and I just want to logically

758
00:41:07,720 --> 00:41:12,040
migrate from 1 database to another
database, and I don't want

759
00:41:12,660 --> 00:41:14,920
to send a lot of bytes over the network.

760
00:41:14,920 --> 00:41:19,020
I want to compress and decompress
and use multiple workers because,

761
00:41:19,020 --> 00:41:21,580
for example, these servers are
not used right now.

762
00:41:22,300 --> 00:41:26,360
So I just need to migrate out of
RDS to self-managed Postgres

763
00:41:26,360 --> 00:41:30,860
because I feel a sufficient level of
confidence and I've already worked

764
00:41:30,860 --> 00:41:34,900
with great guys who know how to
do it and RDS is not needed for

765
00:41:34,900 --> 00:41:36,360
me anymore and so on.

766
00:41:36,380 --> 00:41:40,540
So, in this case, unfortunately,
you need to first

767
00:41:41,200 --> 00:41:45,080
save it to disk and then restore
from there.
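
What is possible today, as a sketch: stream the dump with no intermediate file, at the cost of parallelism, because pg_restore needs a seekable file for -j (connection strings are placeholders):

    pg_dump -Fc -d "$SOURCE_URL" \
      | pg_restore -d "$TARGET_URL" --no-owner --no-privileges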

768
00:41:45,780 --> 00:41:50,700
Yeah, so this is, I think, a big
missing feature of Postgres.

769
00:41:51,140 --> 00:41:57,280
And Dimitri Fontaine created pgcopydb,
I think, but I quickly

770
00:41:57,280 --> 00:42:01,560
checked before our recording, I
didn't see anything related to

771
00:42:01,560 --> 00:42:02,060
compression.

772
00:42:02,420 --> 00:42:04,080
It talks only about parallelization.
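
A sketch of that parallel side, with flags as I understand pgcopydb's docs (treat them as approximate; the URLs are placeholders):

    # Clone with parallel workers for table data and index builds
    pgcopydb clone --source "$SRC_URL" --target "$DST_URL" \
        --table-jobs 8 --index-jobs 8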

773
00:42:04,540 --> 00:42:09,400
Let's have like 16 workers or,
I don't know, 4 workers to speed

774
00:42:09,400 --> 00:42:15,540
up the process, which in my
head is a relatively simple idea.

775
00:42:15,540 --> 00:42:20,640
We just create a repeatable read
transaction, keep it, export a snapshot,

776
00:42:21,200 --> 00:42:24,280
create other transactions, repeatable
read transactions, and

777
00:42:24,280 --> 00:42:27,040
use the same snapshot so all of
them are synchronized and you

778
00:42:27,040 --> 00:42:31,100
can even read 1 huge unpartitioned
table in chunks.

779
00:42:31,800 --> 00:42:37,980
It will be a fully consistent read
using multiple workers.
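
In SQL, that mechanism looks roughly like this; the snapshot identifier and table name are example values:

    -- Session 1: open a transaction and export its snapshot;
    -- keep this transaction open while the workers run
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();  -- returns e.g. '00000003-0000001B-1'

    -- Sessions 2..N: adopt the same snapshot, so every worker sees
    -- exactly the same database state
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
    SELECT * FROM big_table WHERE id BETWEEN 1 AND 1000000;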

780
00:42:39,520 --> 00:42:42,100
But no compression options.

781
00:42:42,980 --> 00:42:44,320
I couldn't find it.

782
00:42:44,640 --> 00:42:49,060
Maybe it's not an easy idea to
implement.

783
00:42:49,060 --> 00:42:50,240
Maybe it's a good idea.

784
00:42:50,580 --> 00:42:54,720
So what I would like to have is
just pg_dump and pg_restore supporting

785
00:42:54,720 --> 00:42:58,260
both parallelization and compression
without the need to store

786
00:42:58,260 --> 00:43:01,440
an intermediate file or directory.

787
00:43:01,780 --> 00:43:02,220
Would be great.

788
00:43:02,220 --> 00:43:02,920
Makes sense.

789
00:43:03,420 --> 00:43:04,980
Yeah, I needed it yesterday.

790
00:43:05,740 --> 00:43:10,200
Copying a huge, well, not huge
at all, like a 10 or so

791
00:43:10,200 --> 00:43:14,980
gigabyte tiny table with big vectors.

792
00:43:15,660 --> 00:43:17,380
I mean, a lot of vectors.

793
00:43:17,780 --> 00:43:23,420
By the way, let's make a full cycle
and in the end mention that

794
00:43:23,900 --> 00:43:29,440
although we joked about lossy compression
using LLMs,

795
00:43:30,060 --> 00:43:34,020
when you have a lot of dimensions,
huge vector values, it means

796
00:43:34,020 --> 00:43:35,340
TOAST is definitely involved.

797
00:43:35,340 --> 00:43:37,360
It's interesting how it's compressed
there.

798
00:43:37,360 --> 00:43:39,060
In this case, just a simple idea.

799
00:43:39,520 --> 00:43:45,420
Sometimes reducing the number of
dimensions is kind of lossy,

800
00:43:45,420 --> 00:43:47,860
but it's also kind of compression,
right?

801
00:43:48,060 --> 00:43:50,080
And OpenAI speaks about it.

802
00:43:50,080 --> 00:43:54,620
It's also an interesting area:
how to do it without losing a lot

803
00:43:54,620 --> 00:43:55,460
of quality.

804
00:43:56,460 --> 00:43:59,760
Michael: On vectors, though, I
think Jonathan Katz made a really

805
00:43:59,760 --> 00:44:05,580
good point on the episode we did
for Postgres FM, that because

806
00:44:05,580 --> 00:44:09,720
the vector data, at
least the embeddings

807
00:44:09,720 --> 00:44:14,620
that come back from models like
OpenAI, it's mostly random numbers,

808
00:44:15,220 --> 00:44:16,720
it doesn't compress well.

809
00:44:16,720 --> 00:44:19,760
So actually, I think, I'm
not sure if he's done it yet or

810
00:44:19,760 --> 00:44:20,900
he's planning to do it.

811
00:44:20,900 --> 00:44:25,940
There's work to look into whether
it'd be

812
00:44:25,940 --> 00:44:31,160
beneficial to turn compression
off for TOAST of vector data because

813
00:44:31,160 --> 00:44:34,160
you're not getting much compression,
there's no point paying

814
00:44:34,160 --> 00:44:37,700
the overhead of compressing
and decompressing each point.

815
00:44:38,100 --> 00:44:41,720
So I thought that was super interesting
as a use case for turning

816
00:44:41,720 --> 00:44:42,380
it off.
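
The knob for that is per-column storage; a minimal sketch, with table and column names purely illustrative:

    -- EXTERNAL keeps large values out-of-line in TOAST but skips
    -- the compression attempt entirely
    ALTER TABLE items ALTER COLUMN embedding SET STORAGE EXTERNAL;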

817
00:44:44,100 --> 00:44:45,040
Nikolay: Yeah, interesting.

818
00:44:45,060 --> 00:44:46,680
I would like to explore it myself.

819
00:44:46,680 --> 00:44:48,160
It's a very interesting area.

820
00:44:48,540 --> 00:44:52,500
So I think we discussed maybe 5
different directions for benchmarking.

821
00:44:52,720 --> 00:44:56,140
I would like to conduct these benchmarks.

822
00:44:57,540 --> 00:44:57,740
Michael: Cool.

823
00:44:57,740 --> 00:45:00,460
Let us know how you get on with
all 5 by next week, right?

824
00:45:02,000 --> 00:45:02,340
Nikolay: Yeah.

825
00:45:02,340 --> 00:45:03,640
Well, yeah.

826
00:45:03,820 --> 00:45:05,200
So thank you.

827
00:45:05,220 --> 00:45:06,140
It was interesting.

828
00:45:08,100 --> 00:45:08,860
Many questions.

829
00:45:10,000 --> 00:45:10,520
Michael: There you go.

830
00:45:10,520 --> 00:45:11,800
Not boring after all.

831
00:45:11,800 --> 00:45:14,484
Thanks so much, Nikolay.
Catch you next week.