1
00:00:00,060 --> 00:00:00,860
Nikolay: Hello, hello.

2
00:00:01,080 --> 00:00:02,460
This is Postgres.FM.

3
00:00:02,900 --> 00:00:05,220
And my name is Nikolay from Postgres.AI.

4
00:00:05,220 --> 00:00:07,940
And as usual, my co-host is Michael, pgMustard.

5
00:00:08,320 --> 00:00:09,100
Hi, Michael.

6
00:00:09,160 --> 00:00:10,260
How are you doing?

7
00:00:11,120 --> 00:00:11,760
Michael: Hello, Nikolay.

8
00:00:11,760 --> 00:00:12,160
I'm good.

9
00:00:12,160 --> 00:00:12,540
Thank you.

10
00:00:12,540 --> 00:00:13,360
How are you?

11
00:00:13,620 --> 00:00:14,680
Nikolay: I'm fantastic.

12
00:00:15,100 --> 00:00:18,580
And looking forward to our discussion today, I think it's really

13
00:00:18,580 --> 00:00:22,700
important because, you know, some people think everything around,

14
00:00:22,940 --> 00:00:25,860
everything we store in a database can probably be considered

15
00:00:26,260 --> 00:00:27,680
as time series.

16
00:00:28,420 --> 00:00:32,440
At least any small piece of data, any datum, right?

17
00:00:32,760 --> 00:00:36,500
Was born with some creation timestamp, for example, right, so

18
00:00:36,500 --> 00:00:37,840
we could consider it.

19
00:00:37,840 --> 00:00:40,300
Let's discuss what time series is.

20
00:00:42,180 --> 00:00:45,720
Michael: Yeah, I have heard, I've definitely heard people claiming

21
00:00:45,720 --> 00:00:49,220
that all data is time series data but they tend to work for time

22
00:00:49,220 --> 00:00:54,060
series databases so I think there's some correlation or some

23
00:00:55,240 --> 00:00:55,600
incentive.

24
00:00:55,600 --> 00:00:58,060
Nikolay: I'm not sure what's the reason, what's the consequence

25
00:00:58,100 --> 00:00:59,660
here, it's not obvious.

26
00:01:00,720 --> 00:01:05,400
Michael: Exactly but yeah for the purposes of this though I like

27
00:01:05,400 --> 00:01:12,180
to think of time series data as much more rigidly things that

28
00:01:12,180 --> 00:01:18,720
are very very useful to log at points in time and then compare

29
00:01:18,740 --> 00:01:23,540
and look back at, you know, things we want to know the differences

30
00:01:23,760 --> 00:01:27,840
of or like the averages of or the mins, the maxes, like that

31
00:01:27,840 --> 00:01:31,200
kind of stuff feels much more useful to me to think of in these

32
00:01:31,200 --> 00:01:35,040
terms, because it brings certain challenges, especially at scale.

33
00:01:35,460 --> 00:01:40,800
So I think if we consider everything time series, this episode

34
00:01:40,800 --> 00:01:41,820
will be very difficult.

35
00:01:43,200 --> 00:01:46,360
Nikolay: Yeah, I agree with you that we can look at it like

36
00:01:46,360 --> 00:01:46,860
this.

37
00:01:47,480 --> 00:01:54,280
Anything can have creation or modification timestamps, but

38
00:01:54,280 --> 00:01:59,060
sometimes timestamps matter a lot, like for analysis and for

39
00:01:59,060 --> 00:01:59,560
operations.

40
00:02:00,180 --> 00:02:01,800
Sometimes it matters less.

41
00:02:01,800 --> 00:02:07,560
For example, if we want to work with some geographical data,

42
00:02:08,100 --> 00:02:13,220
maps and so on, we put cities on the map, for example.

43
00:02:13,860 --> 00:02:17,060
Yeah, they have a creation year, or year of creation.

44
00:02:17,380 --> 00:02:19,540
Sometimes it's not clear actually.

45
00:02:19,540 --> 00:02:19,960
Right.

46
00:02:19,960 --> 00:02:24,740
But for day to day operations, it's not that important.

47
00:02:25,160 --> 00:02:27,340
Maybe population is more important and so on.

48
00:02:27,340 --> 00:02:28,220
It depends.

49
00:02:28,680 --> 00:02:30,560
Maybe position is more important.

50
00:02:30,720 --> 00:02:35,680
So it's hard to say this data is time series, although it has

51
00:02:35,680 --> 00:02:36,800
creation timestamp.

52
00:02:37,460 --> 00:02:40,220
Each part of this has it, right?

53
00:02:40,280 --> 00:02:47,940
So an opposite example is when we log something from something,

54
00:02:47,980 --> 00:02:53,960
from Postgres or from systems or from drivers, I mean, cars,

55
00:02:53,960 --> 00:03:00,180
which are constantly moving on the map or temperature or anything,

56
00:03:00,180 --> 00:03:00,680
right?

57
00:03:01,020 --> 00:03:02,080
This kind of data.

58
00:03:03,780 --> 00:03:08,140
There we log many, many, many times for the same source.

59
00:03:09,240 --> 00:03:14,760
We log something many, many times and we obtain data from the

60
00:03:14,760 --> 00:03:15,260
same source.

61
00:03:15,260 --> 00:03:18,560
For example, there is some sensor or something or some source

62
00:03:18,560 --> 00:03:24,900
of, of metrics and we log these metrics every minute, for example,

63
00:03:25,440 --> 00:03:30,040
in this case, it's definitely, it can be considered time series.

64
00:03:30,060 --> 00:03:36,720
And in many cases, not always, but in many cases, fresh

65
00:03:36,720 --> 00:03:42,380
data we want to be more detailed, like ideally

66
00:03:42,440 --> 00:03:46,120
every second, for example, or at least every minute. And old data

67
00:03:46,120 --> 00:03:51,820
we still want to have, but it can be more rough, right?

68
00:03:51,820 --> 00:03:56,400
Less precise and can be already aggregated and so on.

69
00:03:56,400 --> 00:04:00,080
And sometimes we can even say, this data is not relevant anymore.

70
00:04:00,480 --> 00:04:05,760
If it's some logs, we don't care about logs from operations from

71
00:04:05,760 --> 00:04:07,160
5 years ago, for example.

72
00:04:07,580 --> 00:04:08,600
It also might happen.

73
00:04:08,600 --> 00:04:09,060
It depends.

74
00:04:09,060 --> 00:04:12,840
It depends on the concrete project or product you are dealing with

75
00:04:13,260 --> 00:04:14,840
or building, right?

76
00:04:15,120 --> 00:04:19,740
So but what matters is like we obtain data periodically from

77
00:04:19,740 --> 00:04:24,060
some source, and this is definitely time series to me.

78
00:04:25,240 --> 00:04:26,020
Unlike cities.

79
00:04:27,660 --> 00:04:29,880
Michael: I think for the purpose of this conversation, it's probably

80
00:04:29,880 --> 00:04:33,620
worth distinguishing at what point we start caring, because if

81
00:04:33,620 --> 00:04:37,600
we're logging every second, it only takes a few hundred, maybe

82
00:04:37,600 --> 00:04:42,340
a few thousand devices before that's significant scale. But if we're

83
00:04:42,340 --> 00:04:45,860
even if we're logging every hour, if we've got, let's say, a

84
00:04:45,860 --> 00:04:49,500
million devices logging every hour, that's also a lot of scale.

85
00:04:50,340 --> 00:04:53,200
But a lot of use cases won't have that.

86
00:04:53,560 --> 00:04:56,860
So there are smaller cases where whilst they are time series,

87
00:04:56,980 --> 00:05:00,660
like let's say you've got a little home setup where you are monitoring

88
00:05:00,660 --> 00:05:04,120
the weather outside, maybe in the garden and maybe at the back

89
00:05:04,120 --> 00:05:07,720
of the house, like maybe you've got like 2 or 3 sensors or maybe

90
00:05:07,720 --> 00:05:13,520
like 4 or 5 devices logging every, even every second, say, you

91
00:05:13,520 --> 00:05:16,220
might as well, you don't, probably don't need this episode, right?

92
00:05:16,220 --> 00:05:19,300
You don't need to do any tuning, you don't need to do that much.

93
00:05:20,800 --> 00:05:21,680
Okay, interesting.

94
00:05:21,740 --> 00:05:22,120
You think

95
00:05:22,120 --> 00:05:22,620
Nikolay: maybe.

96
00:05:23,080 --> 00:05:23,800
It depends.

97
00:05:25,240 --> 00:05:29,020
There is a big misconception for new Postgres users.

98
00:05:29,180 --> 00:05:34,020
They think if table size is small, in terms of row count, it

99
00:05:34,020 --> 00:05:35,600
should be very fast, no problem.

100
00:05:35,600 --> 00:05:39,180
But we know how MVCC is organized in Postgres, right?

101
00:05:39,240 --> 00:05:45,040
Every update means a new physical
row, which we call tuple or tuple.

102
00:05:45,040 --> 00:05:49,180
We're still not sure after 2 years,
almost 3 years of discussing.

103
00:05:50,580 --> 00:05:55,120
Anyway, a physical row version is
created on every update, unless

104
00:05:55,120 --> 00:05:57,680
this update is a special case, like
a HOT update.

105
00:05:58,780 --> 00:05:59,340
It depends.

106
00:05:59,340 --> 00:06:02,280
So it's basically delete and insert
all the time.

107
00:06:02,440 --> 00:06:04,060
By default, it's so, right?

108
00:06:04,080 --> 00:06:08,960
It means if you update the same
row many times, with SELECT,

109
00:06:09,520 --> 00:06:13,360
you still see 1 row, but physically
it might be many, many physical

110
00:06:13,360 --> 00:06:17,000
versions, and you have a lot of
dead tuples, and then autovacuum

111
00:06:17,180 --> 00:06:22,360
comes. Autovacuum is a
converter of dead tuples to bloat,

112
00:06:22,360 --> 00:06:26,260
and you have a lot of bloat and
then you have problems with performance

113
00:06:26,360 --> 00:06:28,200
although you think the table is small.
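
The MVCC behavior described here can be sketched with a toy model (a simplified illustration of row versioning, not actual Postgres internals; the `ToyTable` class and its structure are invented for this example):

```python
# Toy model of the MVCC behavior described above: every UPDATE creates a
# new physical row version (a tuple) and the old version becomes dead
# until vacuum. Invented classes; a simplified illustration, not Postgres.

class ToyTable:
    def __init__(self):
        self.tuples = []  # all physical row versions ever written

    def insert(self, row_id, value):
        self.tuples.append({"id": row_id, "value": value, "dead": False})

    def update(self, row_id, value):
        # mark the live version dead, then append a new version
        for t in self.tuples:
            if t["id"] == row_id and not t["dead"]:
                t["dead"] = True
        self.tuples.append({"id": row_id, "value": value, "dead": False})

    def select(self, row_id):
        # SELECT sees only the live version
        return [t for t in self.tuples if t["id"] == row_id and not t["dead"]]

t = ToyTable()
t.insert(1, "queued")
for status in ["running", "failed", "retrying", "passed"]:
    t.update(1, status)

print(len(t.select(1)))  # 1 logical row
print(len(t.tuples))     # 5 physical versions, 4 of them dead
```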

114
00:06:29,440 --> 00:06:31,960
Michael: I completely agree but
I think for this use case you

115
00:06:31,960 --> 00:06:35,380
don't. The time series use case normally
is append-only, right? Normally

116
00:06:35,380 --> 00:06:39,560
you're just inserting new rows,
so I don't think it applies here.

117
00:06:40,640 --> 00:06:44,540
Nikolay: well I see updates in
time series data all the time.

118
00:06:44,540 --> 00:06:45,040
Michael: Some.

119
00:06:46,420 --> 00:06:49,780
Okay, tell me about that, because
I don't see it. I hear mostly about

120
00:06:49,780 --> 00:06:50,500
not like

121
00:06:50,500 --> 00:06:50,900
Nikolay: All right.

122
00:06:50,900 --> 00:06:52,760
Michael: It's append-mostly, let's
say.

123
00:06:52,780 --> 00:06:55,140
Nikolay: Imagine, for example, CI
pipelines.

124
00:06:56,500 --> 00:07:00,160
I won't tell the name of the company,
but some listeners from

125
00:07:00,160 --> 00:07:02,140
that company should recognize it.

126
00:07:02,240 --> 00:07:06,820
And you have a pipeline and some
steps like jobs or something inside

127
00:07:06,820 --> 00:07:07,440
the pipeline.

128
00:07:07,540 --> 00:07:10,320
And when you create this pipeline
and then you update it a few

129
00:07:10,320 --> 00:07:14,860
times when the status has changed. It
resembles a queue-like thing, but

130
00:07:14,860 --> 00:07:17,840
it's not just queue-like; you need
to log everything and present

131
00:07:17,840 --> 00:07:18,340
it.

132
00:07:18,440 --> 00:07:18,760
All right.

133
00:07:18,760 --> 00:07:22,280
So you might create some record
and then update its status.

134
00:07:23,720 --> 00:07:24,220
Yeah.

135
00:07:24,840 --> 00:07:25,340
Right.

136
00:07:26,080 --> 00:07:27,360
Michael: But, but let's say my,

137
00:07:27,360 --> 00:07:28,160
Nikolay: my example,

138
00:07:30,300 --> 00:07:35,020
Michael: Well, maybe by some definitions,
but I'm thinking more

139
00:07:35,020 --> 00:07:39,280
like you mentioned logging or you
mentioned monitoring, maybe

140
00:07:39,280 --> 00:07:42,740
real-time analytics, like these
kinds of things where time moves

141
00:07:42,740 --> 00:07:45,340
on and we just want readings over
time, we want to graph.

142
00:07:45,340 --> 00:07:48,000
Nikolay: Yeah, okay, let's agree
on this, we need to distinguish

143
00:07:48,000 --> 00:07:52,360
append-only time series and time
series with some modifications

144
00:07:53,040 --> 00:07:53,400
yeah

145
00:07:53,400 --> 00:07:57,040
Michael: Maybe, but yeah, really
good point though. If you've got

146
00:07:57,040 --> 00:08:01,020
modifications you might have to
worry about these tips and tricks

147
00:08:01,380 --> 00:08:05,580
quite a bit earlier in terms of
your raw volume.

148
00:08:06,140 --> 00:08:06,840
Good point.

149
00:08:07,200 --> 00:08:08,680
Nikolay: Right, right.

150
00:08:08,680 --> 00:08:13,860
And yeah, so the problem
is not huge scale, but

151
00:08:13,860 --> 00:08:17,300
I think it's a great idea to think
in advance how many records

152
00:08:17,320 --> 00:08:23,000
you expect per some period: per minute,
per hour, per

153
00:08:23,000 --> 00:08:27,100
day, per month, per year,
and then what is your plan

154
00:08:27,100 --> 00:08:31,780
to do with it and understanding
that it's approaching a billion

155
00:08:31,780 --> 00:08:33,880
or a hundred gigabytes or something.

156
00:08:34,740 --> 00:08:35,260
You need to...

157
00:08:35,260 --> 00:08:37,200
Michael: By billion, you mean like
a billion rows?

158
00:08:37,200 --> 00:08:37,700
Nikolay: Billions.

159
00:08:38,420 --> 00:08:41,900
Billions of rows or hundreds of
gigabytes, terabyte.

160
00:08:42,520 --> 00:08:45,460
For me, a billion is a terabyte for
row storage.

161
00:08:46,020 --> 00:08:47,080
Roughly, very roughly.

162
00:08:47,080 --> 00:08:47,980
It depends a lot.

163
00:08:47,980 --> 00:08:51,440
It can be 5X to both sides, right?

164
00:08:51,960 --> 00:08:56,820
But for our row storage Postgres,
it's roughly so.
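
The kind of back-of-envelope estimate described above might look like this (the device count, interval, and the rough 1 KB/row row-store figure are illustrative assumptions; real Postgres row sizes vary a lot in both directions):

```python
# Back-of-envelope capacity planning for time series ingestion. The device
# count, interval, and the ~1 KB/row row-store figure are illustrative
# assumptions only; real row sizes vary a lot in both directions.

devices = 1_000
interval_seconds = 60      # one reading per device per minute
bytes_per_row = 1_000      # rough row-store rule of thumb from above

rows_per_day = devices * (86_400 // interval_seconds)
rows_per_year = rows_per_day * 365
size_per_year_gb = rows_per_year * bytes_per_row / 1e9

print(f"{rows_per_day:,} rows/day")        # 1,440,000 rows/day
print(f"{rows_per_year:,} rows/year")      # 525,600,000 rows/year
print(f"~{size_per_year_gb:.0f} GB/year")  # ~526 GB/year
```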

165
00:08:57,800 --> 00:09:01,960
In fact, for many time series
database systems, which are only

166
00:09:01,960 --> 00:09:06,080
for time series, like VictoriaMetrics,
or you can consider ClickHouse

167
00:09:06,140 --> 00:09:06,840
as well.

168
00:09:07,820 --> 00:09:14,320
1 terabyte, 1 billion rows is
maybe 100 gigabytes, maybe 10

169
00:09:14,320 --> 00:09:16,200
gigabytes even depending on compression.

170
00:09:16,460 --> 00:09:18,200
Their compression is much better.

171
00:09:18,460 --> 00:09:22,280
So, but this is like overall capacity
planning, right?

172
00:09:22,280 --> 00:09:26,640
But we can have performance problems
locally thinking, oh, this

173
00:09:26,640 --> 00:09:27,500
is a small table.

174
00:09:27,500 --> 00:09:29,480
It should be like 1 millisecond
lookup.

175
00:09:29,480 --> 00:09:34,700
But then, if you ignore
MVCC in Postgres, you might

176
00:09:34,700 --> 00:09:39,720
end up having a lot of dead tuples
or bloat, and latency will

177
00:09:39,720 --> 00:09:42,220
degrade a lot as well, and that's
not good.

178
00:09:44,340 --> 00:09:44,840
Michael: Cool.

179
00:09:45,280 --> 00:09:51,380
Well, if we go back, if we go to
the append only scenario, what

180
00:09:51,380 --> 00:09:54,400
are the main problems you tend
to see in that setup?

181
00:09:55,760 --> 00:09:59,380
Nikolay: Well, for me, the main
problem is... let's address the

182
00:09:59,380 --> 00:10:02,160
elephant in the
room.

183
00:10:02,160 --> 00:10:03,580
Let's recognize it.

184
00:10:03,860 --> 00:10:05,780
There is a great product called
TimescaleDB.

185
00:10:06,220 --> 00:10:07,000
Michael: Oh yes, yeah.

186
00:10:07,000 --> 00:10:08,040
Nikolay: Absolutely great.

187
00:10:09,520 --> 00:10:15,120
They have, in my opinion, 2 major
things. One is compression,

188
00:10:16,300 --> 00:10:21,540
unbeatable compression for Postgres,
for time series. And continuous

189
00:10:21,540 --> 00:10:25,140
aggregates, which is also good,
like kind of, it's convenient.

190
00:10:25,400 --> 00:10:28,480
You can build continuous aggregates,
but it's super hard to build

191
00:10:28,520 --> 00:10:29,440
good compression.

192
00:10:30,400 --> 00:10:34,940
And the problem with TimescaleDB
is: it's great, but it's

193
00:10:34,940 --> 00:10:36,940
not part of core Postgres.

194
00:10:37,360 --> 00:10:40,820
And it's an extension which is
available only either on Timescale

195
00:10:40,840 --> 00:10:45,700
Cloud or self-hosted if you are
okay using non-open-source software.

196
00:10:45,720 --> 00:10:46,780
It's source available.

197
00:10:47,120 --> 00:10:48,180
It's very open.

198
00:10:48,480 --> 00:10:51,880
I know they have allowed many,
many companies to use it,

199
00:10:52,720 --> 00:10:56,240
but still, if you're going to
build some platform, you probably

200
00:10:56,240 --> 00:10:58,260
won't be allowed to use it.

201
00:10:58,260 --> 00:10:59,540
And so on with compression.

202
00:10:59,540 --> 00:11:03,920
I mean, I'm not talking about
Apache-licensed TimescaleDB, but I'm

203
00:11:03,920 --> 00:11:08,400
talking about the Timescale-licensed
TimescaleDB, which has compression

204
00:11:08,420 --> 00:11:09,580
and continuous aggregates.

205
00:11:09,720 --> 00:11:11,020
Compression is great.

206
00:11:11,140 --> 00:11:15,820
Again, I recommended reading a
couple of blog posts before, and I

207
00:11:15,820 --> 00:11:18,020
recommend them again, they are amazing.

208
00:11:19,120 --> 00:11:21,200
But it's not core Postgres,

209
00:11:21,900 --> 00:11:22,400
Michael: so...

210
00:11:22,420 --> 00:11:26,460
But I actually think, I think the
list of TimescaleDB features

211
00:11:26,980 --> 00:11:31,020
is a really good list of things
that are tricky with time series

212
00:11:31,020 --> 00:11:34,600
data, yeah, in general. So they've
also done great blog posts

213
00:11:34,600 --> 00:11:39,800
on things like how to copy data
into Postgres, or how to ingest data

214
00:11:39,800 --> 00:11:43,700
as fast as possible. So that's 1
of the trickier things

215
00:11:43,700 --> 00:11:51,260
about extreme volume.

216
00:11:51,260 --> 00:11:54,360
This was actually a tips
and tricks blog post, it

217
00:11:54,360 --> 00:11:59,880
was actually like 13 tips for ingesting,
and all of these types

218
00:11:59,880 --> 00:12:03,440
of blog posts come out with: oh,
you should use the Postgres COPY feature.

219
00:12:03,440 --> 00:12:08,000
but like there are some exceptions,
but basically if you can

220
00:12:08,000 --> 00:12:12,440
get it in, in a way that makes
the most efficient use of

221
00:12:12,440 --> 00:12:17,280
writing data in batches, then you
can get much, much better throughput.
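
The batching idea can be sketched in plain Python (the `batched` helper and the row shape are invented for illustration; the actual database call, e.g. a COPY or multi-row INSERT per batch, is only implied in a comment):

```python
# Sketch of the batching idea: one round trip (or one COPY) per batch of
# rows instead of one statement per row. Pure Python; the database call
# itself is only implied in the comment below.

from itertools import islice

def batched(rows, batch_size):
    """Yield lists of up to batch_size rows each."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch

# hypothetical sensor rows: (id, sensor name, reading)
rows = [(i, f"sensor-{i % 4}", 20.0 + i * 0.01) for i in range(10_000)]

round_trips = 0
for batch in batched(rows, 1_000):
    # here you would issue one COPY or one multi-row INSERT per batch
    round_trips += 1

print(round_trips)  # 10 round trips instead of 10,000
```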

222
00:12:17,440 --> 00:12:22,360
But then there are other tips and
tricks like not having many

223
00:12:22,360 --> 00:12:24,860
indexes on the table that you're
ingesting into.

224
00:12:25,160 --> 00:12:28,760
Maybe none, but probably at least
1.

225
00:12:29,340 --> 00:12:33,320
Nikolay: Also foreign keys can
slow you down a lot, and also

226
00:12:33,320 --> 00:12:34,200
checkpointer behavior.

227
00:12:34,300 --> 00:12:40,580
So you need to, if you expect a
massive write coming, like 1 time,

228
00:12:40,760 --> 00:12:45,040
it's worth thinking about
checkpointer tuning, like

229
00:12:45,040 --> 00:12:48,980
for this window only, accepting
the risk that if we crash,

230
00:12:49,240 --> 00:12:50,660
recovery will take longer.

231
00:12:50,660 --> 00:12:55,640
So you raise checkpoint_timeout
and max_wal_size, and it should

232
00:12:55,640 --> 00:12:59,180
speed up because checkpointer will
put less pressure on disk

233
00:12:59,380 --> 00:13:00,160
and leave more room for the actual writes.

234
00:13:00,160 --> 00:13:01,020
Michael: Yeah, nice.

235
00:13:01,220 --> 00:13:04,340
Well, on that note, and I know
this is something I think we're

236
00:13:04,340 --> 00:13:06,780
probably going to talk about in
the future, but because you can

237
00:13:06,780 --> 00:13:10,960
set it on a, like for a single
one-off thing, you could even

238
00:13:10,960 --> 00:13:14,200
consider in that case turning off
synchronous_commit for those

239
00:13:14,200 --> 00:13:15,780
ingests, like if you want to...

240
00:13:15,780 --> 00:13:18,220
Nikolay: I don't think it will
help with throughput at all.

241
00:13:18,660 --> 00:13:19,160
Really?

242
00:13:19,600 --> 00:13:20,100
Yes.

243
00:13:20,660 --> 00:13:24,360
Well, how many commits per second
are we going to have?

244
00:13:24,520 --> 00:13:26,080
Michael: Yeah, no, good point.

245
00:13:26,820 --> 00:13:30,320
Nikolay: But also, thinking about
checkpointer tuning, I think

246
00:13:30,820 --> 00:13:35,940
it depends, because in an append-only
specific case, probably we

247
00:13:35,940 --> 00:13:40,760
won't have a lot of winning, a
lot of benefits from checkpoint

248
00:13:40,760 --> 00:13:47,480
tuning because checkpoint tuning
for massive writes shines when

249
00:13:47,700 --> 00:13:52,060
we have random inserts to many
pages and we revisit pages.

250
00:13:52,120 --> 00:13:53,000
If in append only,

251
00:13:53,000 --> 00:13:53,420
Michael: the thing

252
00:13:53,420 --> 00:13:57,840
Nikolay: is that we're constantly
writing to the end of data,

253
00:13:58,180 --> 00:13:59,920
of the heap and also the B-tree index.

254
00:13:59,920 --> 00:14:02,440
It's only to the right, to the
right, and it's great.

255
00:14:02,440 --> 00:14:04,020
Michael: Well, that's a good point
then.

256
00:14:04,020 --> 00:14:06,980
Yeah, we need a primary key that
is ordered, right?

257
00:14:06,980 --> 00:14:10,340
Nikolay: Yes, that's why UUID v4
is a terrible idea, right?

258
00:14:10,440 --> 00:14:11,620
Yeah, yeah, yeah.

259
00:14:12,040 --> 00:14:12,540
Right.

260
00:14:12,660 --> 00:14:18,500
So you want UUID v7 or just
bigint or something else which

261
00:14:18,500 --> 00:14:21,040
has similar like growing...

262
00:14:21,100 --> 00:14:22,740
Like ULID type thing.

263
00:14:23,400 --> 00:14:28,320
So B-tree inserts will happen only
to the right, always to the right,

264
00:14:28,320 --> 00:14:29,480
right and so on.

265
00:14:29,480 --> 00:14:32,200
And it will be packed and so on.
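
A rough sketch of why time-ordered keys keep B-tree inserts "to the right" (this hand-rolls a UUIDv7-like timestamp-prefixed key for illustration; it is not the RFC 9562 encoding, and the sequence-counter tiebreaker is an assumption of this example):

```python
# Sketch: a key whose most significant bytes are a timestamp (like UUIDv7
# or bigserial) sorts in generation order, so B-tree inserts land on the
# rightmost page. A hand-rolled illustration, not the RFC 9562 layout.

import itertools
import os
import time
import uuid

_seq = itertools.count()

def time_ordered_key():
    ms = int(time.time() * 1000)            # 48-bit millisecond prefix
    seq = next(_seq) & 0xFFFF               # monotonic tiebreaker
    return ms.to_bytes(6, "big") + seq.to_bytes(2, "big") + os.urandom(8)

keys = [time_ordered_key() for _ in range(100)]
print(keys == sorted(keys))                 # True: generation order == sort order

random_keys = [uuid.uuid4().bytes for _ in range(100)]
print(random_keys == sorted(random_keys))   # almost certainly False
```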

266
00:14:32,680 --> 00:14:37,460
But this also leads us to the question
about partitioning, right?

267
00:14:37,740 --> 00:14:41,260
Because partitioning is the key.

268
00:14:41,380 --> 00:14:45,740
So let me finish my ranting
session a little bit about Timescale

269
00:14:45,840 --> 00:14:52,980
and how sad, how sad it is that
they are not part of fully open

270
00:14:53,140 --> 00:14:57,320
Postgres, because I know this
is their business.

271
00:14:57,640 --> 00:15:01,040
I fully respect that at the same
time.

272
00:15:01,080 --> 00:15:06,340
I just have companies who come
to us asking for help.

273
00:15:06,340 --> 00:15:10,220
And for many of these companies, we
do a health check, we improve health,

274
00:15:10,760 --> 00:15:11,580
we level up.

275
00:15:11,580 --> 00:15:18,120
And then like our typical client
is a startup which grows and

276
00:15:18,120 --> 00:15:22,260
starts having some issues, very
often they are on some managed

277
00:15:22,260 --> 00:15:23,400
Postgres service.

278
00:15:23,940 --> 00:15:27,540
So backups are solved, replication
is solved, and so on.

279
00:15:27,540 --> 00:15:28,980
But then they have some problems.

280
00:15:29,540 --> 00:15:32,820
And it's typical that many of them
consider partitioning.

281
00:15:34,660 --> 00:15:39,500
And then we realize, okay, partitioning
is needed because tables

282
00:15:39,520 --> 00:15:45,040
are already large, and we have a simple
rule: if you exceed

283
00:15:45,240 --> 00:15:47,360
100 gigabytes, you need to do partitioning.

284
00:15:48,080 --> 00:15:51,300
But then we see, oh, this data
is actually time series, right?

285
00:15:52,060 --> 00:15:56,460
And it's inevitable to consider
TimescaleDB, but they, for example,

286
00:15:56,460 --> 00:16:00,920
if they are on RDS or somewhere else
similar, they cannot use it.

287
00:16:00,920 --> 00:16:04,660
All they can use is just to start
doing partitioning themselves,

288
00:16:04,740 --> 00:16:07,240
maybe with Partman, maybe not.

289
00:16:07,660 --> 00:16:09,360
Or migrate to Timescale Cloud.

290
00:16:10,760 --> 00:16:13,480
And we sometimes can help them
to consider.

291
00:16:15,840 --> 00:16:19,320
Michael: Or migrate to self-hosted,
or like, you know, self-managed

292
00:16:19,960 --> 00:16:20,880
with TimescaleDB?

293
00:16:22,200 --> 00:16:23,680
Nikolay: Well, it depends.

294
00:16:24,180 --> 00:16:28,260
For some companies it's a good
path, but if they have a very

295
00:16:28,260 --> 00:16:32,880
lean approach in terms of how many
engineers they want to have,

296
00:16:33,080 --> 00:16:37,680
how many of them they want to be
involved in maintaining backups,

297
00:16:37,680 --> 00:16:38,800
applications, and so on.

298
00:16:38,800 --> 00:16:41,420
Like it's, it's, it's a big question.

299
00:16:41,640 --> 00:16:47,460
Honestly, I expect, I fully
agree with people like 37 Signals,

300
00:16:48,660 --> 00:16:52,860
migrating back from the cloud,
right, even.

301
00:16:53,200 --> 00:16:54,780
Or at least from managed service.

302
00:16:55,280 --> 00:17:00,240
I think it will be happening, and
if a crisis hits us and so on,

303
00:17:00,240 --> 00:17:01,720
it will be happening for sure.

304
00:17:01,720 --> 00:17:06,420
Many people will try to go away
from RDS because honestly I think

305
00:17:06,960 --> 00:17:11,400
these days we have all the pieces
already battle-proven to have

306
00:17:11,400 --> 00:17:15,740
a cluster working with backups and
replication, autofailover, everything

307
00:17:15,740 --> 00:17:16,480
like that.

308
00:17:16,520 --> 00:17:19,540
It may be Kubernetes, maybe not
Kubernetes, doesn't matter actually.

309
00:17:19,640 --> 00:17:24,560
We have many pieces already battle-proven
and we can live without

310
00:17:24,600 --> 00:17:31,260
RDS or any managed service and
spend not a lot of effort to

311
00:17:31,260 --> 00:17:32,260
maintain this.

312
00:17:32,660 --> 00:17:35,100
But still, there is also business
for Timescale.

313
00:17:35,380 --> 00:17:37,420
They want to have paid customers.

314
00:17:37,540 --> 00:17:39,060
I, again, I respect that.

315
00:17:39,060 --> 00:17:42,940
And sometimes we have to consider
moving there, but if you move

316
00:17:42,940 --> 00:17:46,340
there, there are also complications because
Timescale Cloud has limitations,

317
00:17:46,560 --> 00:17:49,320
like just a single logical database,
right?

318
00:17:49,940 --> 00:17:54,920
It's kind of, in some cases it's
not convenient. Like

319
00:17:54,920 --> 00:17:58,100
you have a cluster, but you
cannot run additional create

320
00:17:58,100 --> 00:18:00,800
database and have multiple logical
databases inside.

321
00:18:00,860 --> 00:18:02,820
This is a limitation of Timescale
Cloud.

322
00:18:03,240 --> 00:18:07,900
And anyway, it's like the Middle
Ages in Europe.

323
00:18:07,900 --> 00:18:13,100
We have many, many, I don't know,
like dukes or so, and many

324
00:18:13,580 --> 00:18:14,420
counties, right?

325
00:18:14,420 --> 00:18:14,920
Counties.

326
00:18:15,800 --> 00:18:16,060
Yeah.

327
00:18:16,060 --> 00:18:19,800
It's called counties in your country
and in our country as well.

328
00:18:19,860 --> 00:18:21,620
Like I'm living in San Diego County.

329
00:18:21,620 --> 00:18:25,340
So yeah, this is like
a fragmented space.

330
00:18:25,800 --> 00:18:26,300
Right.

331
00:18:26,600 --> 00:18:29,540
And if you want good TimescaleDB,
you need to

332
00:18:29,540 --> 00:18:33,280
go either there or here, you cannot
do it on any other managed

333
00:18:33,280 --> 00:18:33,780
service.

334
00:18:33,920 --> 00:18:36,900
It's bad, I don't understand
why they cannot reach some

335
00:18:36,900 --> 00:18:40,840
agreement and have some business
agreement.

336
00:18:40,840 --> 00:18:46,020
I mean, AWS and Timescale company
and Google and Azure, I mean,

337
00:18:46,020 --> 00:18:48,700
Microsoft and so on, it would be
good for all.

338
00:18:49,220 --> 00:18:50,440
Right, it's ridiculous.

339
00:18:50,740 --> 00:18:54,480
Because I'm telling everyone, like,
compression is great.

340
00:18:54,620 --> 00:18:57,520
Read these posts, you'll understand;
just make experiments, it's

341
00:18:57,520 --> 00:18:58,600
so great, right?

342
00:18:59,200 --> 00:19:03,060
This is like... without it, my advice is:
okay, maybe not Postgres

343
00:19:03,840 --> 00:19:06,320
for this kind of data, maybe ClickHouse.

344
00:19:06,980 --> 00:19:12,020
ClickHouse was created to handle
a Google Analytics-like

345
00:19:12,380 --> 00:19:12,880
load.

346
00:19:13,780 --> 00:19:17,560
You ingest a lot of logs into it,
a lot, a lot of time series

347
00:19:17,560 --> 00:19:20,580
data and store and so on.

348
00:19:20,640 --> 00:19:23,980
While TimescaleDB shines in compression
because they apply

349
00:19:24,000 --> 00:19:28,380
like 2-dimensional compression;
they also compress column-wise,

350
00:19:28,380 --> 00:19:28,700
right?

351
00:19:28,700 --> 00:19:29,480
That's great.

352
00:19:29,720 --> 00:19:34,220
And if you have
metrics from some sensor coming

353
00:19:34,220 --> 00:19:38,500
to you at every second, for example,
they cannot change quickly

354
00:19:38,560 --> 00:19:39,440
a lot, right?

355
00:19:39,440 --> 00:19:43,060
So they change a little,
like a temperature or position,

356
00:19:43,260 --> 00:19:43,520
right?

357
00:19:43,520 --> 00:19:44,980
Position of some driver.

358
00:19:46,200 --> 00:19:50,260
And it means that we can convert
it to deltas and then additionally

359
00:19:50,320 --> 00:19:54,640
compress,
and TimescaleDB reaches like

360
00:19:54,640 --> 00:19:56,060
30x compression for example.

361
00:19:56,060 --> 00:19:57,280
I remember some case.
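
The delta idea can be illustrated with a toy example (TimescaleDB's real column-wise codecs, delta-of-delta and friends, are far more sophisticated; the random-walk data and plain zlib are assumptions of this sketch, which only shows why slowly changing series compress better as deltas):

```python
# Toy illustration: a slowly changing series stored as deltas compresses
# much better than the raw values. TimescaleDB's actual column-wise codecs
# (delta-of-delta and friends) go further; this just shows the principle.

import random
import zlib

random.seed(42)
readings = [10_000]
for _ in range(9_999):                      # a slow random walk, one reading/second
    readings.append(readings[-1] + random.randint(-2, 2))

raw = b"".join(v.to_bytes(8, "big") for v in readings)
deltas = [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]
delta_bytes = b"".join(d.to_bytes(8, "big", signed=True) for d in deltas)

raw_c = len(zlib.compress(raw, 9))
delta_c = len(zlib.compress(delta_bytes, 9))
print(raw_c > delta_c)  # True: the delta stream compresses far better
```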

362
00:19:57,560 --> 00:20:02,520
It's great. Without it, wow, it's
a lot of data. And it's not only

363
00:20:02,520 --> 00:20:07,160
storage, but it's also memory,
buffer pool, WAL,

364
00:20:07,540 --> 00:20:08,000
Michael: WAL.

365
00:20:08,000 --> 00:20:08,940
OK, yeah.

366
00:20:09,280 --> 00:20:12,800
Let me pull us back, because I
think you're right at a certain

367
00:20:12,800 --> 00:20:13,300
scale.

368
00:20:13,940 --> 00:20:18,420
Even if you just care about pure disk size,
it really, really matters.

369
00:20:18,520 --> 00:20:23,000
But I think there's a lot of use
cases, I'd say most use cases

370
00:20:23,000 --> 00:20:25,440
that I see are smaller.

371
00:20:25,440 --> 00:20:28,080
Like, there are a lot of time series
use cases that are much,

372
00:20:28,080 --> 00:20:30,100
that are, like, in that middle
ground, right?

373
00:20:30,100 --> 00:20:34,440
Like, they're big enough to be
considering partitioning some

374
00:20:34,440 --> 00:20:39,300
certain maintenance optimizations,
some really

375
00:20:39,320 --> 00:20:43,820
considered schema design, but they're
still manageable

376
00:20:43,900 --> 00:20:48,760
within core Postgres
quite healthily. If

377
00:20:48,760 --> 00:20:52,380
Nikolay: you have a project which
you predict won't explode in

378
00:20:52,380 --> 00:20:54,680
terms of growth, it's okay.

379
00:20:55,380 --> 00:20:58,940
Michael: Yeah, or it's like growing
not exponentially, but like

380
00:20:58,940 --> 00:20:59,440
steadily.

381
00:20:59,680 --> 00:21:02,360
Nikolay: If you expect you won't
have more than like 10 terabytes

382
00:21:02,360 --> 00:21:03,740
of data in the next 10 years.

383
00:21:03,740 --> 00:21:04,540
Oh, it's okay.

384
00:21:04,540 --> 00:21:05,320
Maybe 5 years.

385
00:21:05,320 --> 00:21:11,820
It's okay to implement partitioning
yourself and maybe aggregating

386
00:21:11,960 --> 00:21:14,880
old data and repacking it somehow,
for example.

387
00:21:15,040 --> 00:21:17,040
Partitioning is great because it's
many things.

388
00:21:17,040 --> 00:21:21,600
It gives you the ability
to, for example, if you have

389
00:21:21,600 --> 00:21:24,840
a partition which is like one day
from the previous month, okay, it's

390
00:21:24,840 --> 00:21:31,200
time to repack it and store only
aggregates, rougher data,

391
00:21:31,620 --> 00:21:33,900
per hour, for example, not per
second anymore.

392
00:21:34,020 --> 00:21:35,680
It's a huge reduction.
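
A minimal sketch of that repacking step, with hypothetical table and column names:

```sql
-- Hedged sketch: roll up per-second measurements from an old day
-- partition into one row per device per hour.
INSERT INTO metrics_hourly (bucket, device_id, avg_value, min_value, max_value, samples)
SELECT
    date_trunc('hour', ts) AS bucket,
    device_id,
    avg(value), min(value), max(value), count(*)
FROM metrics_2024_05_01          -- the old, second-resolution partition
GROUP BY 1, 2;
```

After the roll-up, the raw per-second partition can be detached and dropped or archived.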

393
00:21:35,900 --> 00:21:38,800
And if you don't have partitioning,
it will be a nightmare to deal

394
00:21:38,800 --> 00:21:43,820
with updates or deletes, and then
MVCC again, right? Autovacuum will

395
00:21:43,820 --> 00:21:47,720
come, and you have empty space, and
then some insert happens,

396
00:21:47,720 --> 00:21:51,300
like, a new insert will go
there, and now this page is out

397
00:21:51,300 --> 00:21:54,900
of the visibility map; it's not
all visible, it's not all frozen,

398
00:21:55,380 --> 00:21:59,680
so autovacuum needs to chew
it again. It's like, if we have

399
00:21:59,680 --> 00:22:05,100
partitioning, we have locality. And,
by the way, I talked

400
00:22:05,100 --> 00:22:10,900
to someone recently, and we
discussed partitioning, custom

401
00:22:10,900 --> 00:22:14,440
partitioning without TimescaleDB, and the discussion was: should

402
00:22:14,440 --> 00:22:19,120
it be partitioned by customer or
project or time-wise.

403
00:22:19,600 --> 00:22:23,360
Of course, time-wise, if we talk
about this kind of data, it

404
00:22:23,360 --> 00:22:26,680
must involve a time component
in the partitioning, because in

405
00:22:26,680 --> 00:22:30,840
this case, you will have a fresh
partition where all inserts are coming

406
00:22:30,840 --> 00:22:32,060
to one partition, right?

407
00:22:32,200 --> 00:22:36,040
And then you have archived data
and you can deal with it separately.

408
00:22:36,420 --> 00:22:40,900
So you can repack a partition,
detach the old partition,

409
00:22:40,900 --> 00:22:43,940
attach new partition, whatever
you want, fully online.
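
The setup above can be sketched with plain declarative partitioning; the table and partition names here are hypothetical:

```sql
-- Hedged sketch: declarative range partitioning by timestamp.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (created_at, id)   -- the partition key must be part of the PK
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_06 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Detach an old partition without blocking concurrent queries
-- (DETACH ... CONCURRENTLY is available since Postgres 14):
ALTER TABLE events DETACH PARTITION events_2024_06 CONCURRENTLY;
```

Once detached, the old partition is a standalone table that can be repacked, archived, or dropped on its own schedule.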

410
00:22:44,440 --> 00:22:49,200
And autovacuum will be fine and
no new inserts will come to

411
00:22:49,200 --> 00:22:50,340
this partition, right?

412
00:22:50,900 --> 00:22:54,360
Unless you support some occasional
updates of old data.

413
00:22:55,080 --> 00:23:00,080
Even then, if those updates are
not massive, it's still good.

414
00:23:00,180 --> 00:23:04,940
You can do that, you can repack
by updating basically.

415
00:23:06,040 --> 00:23:10,600
You have a second-resolution partition
for a whole day from the

416
00:23:10,600 --> 00:23:14,140
previous month, and then you convert
it to, for example, hour

417
00:23:14,140 --> 00:23:14,640
resolution.

418
00:23:15,940 --> 00:23:18,460
3600 times smaller.

419
00:23:18,760 --> 00:23:19,580
It's great.

420
00:23:20,220 --> 00:23:24,720
And actually, this is a great recipe.

421
00:23:24,720 --> 00:23:28,140
This is not compression, but it's
kind of a replacement,

422
00:23:28,180 --> 00:23:29,440
an alternative, right?

423
00:23:29,680 --> 00:23:31,540
And then you have-

424
00:23:31,630 --> 00:23:34,480
Michael: I hear this get called
roll-up tables quite often, which

425
00:23:34,480 --> 00:23:35,420
I really like.

426
00:23:35,740 --> 00:23:38,300
And you could have multiple granularities,
like you could have

427
00:23:38,300 --> 00:23:43,440
hourly, but you could also have
daily, monthly, like depending

428
00:23:43,440 --> 00:23:47,160
on how fast you need those queries
to be or what you're using

429
00:23:47,160 --> 00:23:49,640
them for, what your dashboards
are showing or what customers

430
00:23:49,640 --> 00:23:50,560
need to see.

431
00:23:50,660 --> 00:23:54,520
You can have a few different ones
and each one is going to be an

432
00:23:54,520 --> 00:23:57,380
order or orders of magnitude smaller
than the previous one and

433
00:23:57,380 --> 00:23:59,940
therefore faster to query.

434
00:24:00,640 --> 00:24:01,140
Exactly.

435
00:24:01,700 --> 00:24:01,920
Yeah.

436
00:24:01,920 --> 00:24:05,280
So I think this is a really good
solution and the other thing

437
00:24:05,280 --> 00:24:08,740
about partitioning I'm not sure
you mentioned yet is not just

438
00:24:08,760 --> 00:24:13,180
kind of rolling up, but
also just purely dropping the

439
00:24:13,180 --> 00:24:14,160
Nikolay: yeah yeah yeah that's

440
00:24:14,160 --> 00:24:15,020
Michael: getting rid of that

441
00:24:15,020 --> 00:24:19,140
Nikolay: solution, yeah. Or, well,
there is also an attempt to

442
00:24:19,140 --> 00:24:20,340
have tiered storage.

443
00:24:20,380 --> 00:24:25,280
So maybe old partitions should
go to S3 or GCS or some blob storage

444
00:24:25,280 --> 00:24:28,840
on Azure, whatever it's called, I'm
still trying to remember.

445
00:24:28,860 --> 00:24:33,000
Anyway, like object storage,
where we can choose the tier,

446
00:24:33,000 --> 00:24:35,380
like S3 Glacier, right?

447
00:24:35,380 --> 00:24:36,600
It's cheaper.

448
00:24:36,980 --> 00:24:40,700
It's slower, but people rarely
access this data and so on.

449
00:24:40,760 --> 00:24:42,680
Yeah, there are several options
here.

450
00:24:42,680 --> 00:24:46,240
You need some
strategy for data management,

451
00:24:46,240 --> 00:24:47,640
longer term, right?

452
00:24:47,640 --> 00:24:48,540
For this data.

453
00:24:48,680 --> 00:24:51,660
I see how companies struggle when
they're already hitting some

454
00:24:51,660 --> 00:24:53,760
limits and they need to do something
about it.

455
00:24:53,760 --> 00:24:57,380
But it's a lot of effort to redesign
things.

456
00:24:57,840 --> 00:25:01,760
If you, like, think about it
earlier, maybe it will save

457
00:25:01,760 --> 00:25:02,260
some.

458
00:25:02,680 --> 00:25:04,660
It's harder in the beginning, as
usual.

459
00:25:04,740 --> 00:25:08,420
More like, bigger challenge to
solve.

460
00:25:08,640 --> 00:25:09,280
But then it

461
00:25:09,280 --> 00:25:09,940
Michael: pays off.

462
00:25:10,320 --> 00:25:10,820
Yeah.

463
00:25:10,900 --> 00:25:12,600
I think this is such a difficult
one.

464
00:25:12,600 --> 00:25:14,140
Like, you've got all of the traps.

465
00:25:14,140 --> 00:25:15,660
You've got traps in both directions,
right?

466
00:25:15,660 --> 00:25:19,540
You've got all of the premature
optimization traps, but then

467
00:25:19,540 --> 00:25:19,900
also...

468
00:25:19,900 --> 00:25:20,180
I don't...

469
00:25:20,180 --> 00:25:21,580
Nikolay: Well, yeah.

470
00:25:21,580 --> 00:25:22,460
If it was...

471
00:25:23,260 --> 00:25:25,780
Or just use TimescaleDB, that's
it, right?

472
00:25:26,480 --> 00:25:29,860
Michael: So I do think there is
an argument for it, and not just

473
00:25:29,860 --> 00:25:33,360
because of the scalability, but
I think, like, developer experience-wise,

474
00:25:34,760 --> 00:25:36,000
you don't have to...

475
00:25:36,000 --> 00:25:38,540
Like, even with pg_partman, there's
a bit of setup.

476
00:25:38,540 --> 00:25:41,840
Like, with continuous aggregates,
there's quite a lot of setup

477
00:25:41,840 --> 00:25:42,700
for you if you want to

478
00:25:42,700 --> 00:25:43,360
Nikolay: do that.

479
00:25:43,660 --> 00:25:48,020
But these are available; pg_cron and
pg_partman are available

480
00:25:48,300 --> 00:25:48,520
everywhere.
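
A sketch of that setup, assuming both extensions are installed and using a hypothetical metrics table (argument names as in pg_partman 5.x):

```sql
-- Hedged sketch: time-based partition maintenance with pg_partman,
-- scheduled via pg_cron.
SELECT partman.create_parent(
    p_parent_table := 'public.metrics',
    p_control      := 'ts',
    p_interval     := '1 day'
);

-- Have pg_cron pre-create future partitions (and apply retention
-- settings to old ones) every hour.
SELECT cron.schedule(
    'partman-maintenance',
    '0 * * * *',
    $$CALL partman.run_maintenance_proc()$$
);
```

With this in place, new partitions appear ahead of the data that needs them, without application involvement.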

481
00:25:48,520 --> 00:25:51,380
Michael: I completely agree, I completely
agree. I'm just saying,

482
00:25:51,380 --> 00:25:54,520
from a UX point of view, like developer
experience point of view,

483
00:25:54,680 --> 00:25:57,160
there's also a bunch of functions
that come with TimescaleDB

484
00:25:57,720 --> 00:26:01,860
that are really useful, make certain
things much simpler to write

485
00:26:01,860 --> 00:26:02,780
queries for.

486
00:26:02,780 --> 00:26:07,200
So I do think there's an argument
for that but I do think you

487
00:26:07,200 --> 00:26:11,040
are then trading off freedom like
there are then only so many

488
00:26:11,040 --> 00:26:14,880
managed services you can use and
I think even there was a recent

489
00:26:15,280 --> 00:26:19,140
alternative that popped
up from the team at Tembo,

490
00:26:19,140 --> 00:26:20,580
I don't know if you, I haven't
had a chance

491
00:26:20,580 --> 00:26:21,040
Nikolay: to look at

492
00:26:21,040 --> 00:26:23,500
Michael: it properly yet, but pg_timeseries?

493
00:26:24,020 --> 00:26:25,740
Nikolay: Ah no, this I haven't
tried.

494
00:26:26,560 --> 00:26:30,180
Michael: Yeah it's new and they
left the exact same reason in

495
00:26:30,180 --> 00:26:33,380
the readme, you can see the exact
reason they've done it is because

496
00:26:33,900 --> 00:26:38,060
they could only have used Apache
TimescaleDB and they realized

497
00:26:38,680 --> 00:26:40,560
they needed something a bit more
than that.

498
00:26:40,560 --> 00:26:44,540
Nikolay: And then additional complexities
hit you when you grow

499
00:26:44,540 --> 00:26:47,600
and at some scale schema changes
become a problem.

500
00:26:48,620 --> 00:26:52,340
Right. For partitioned tables they
are much harder, they're

501
00:26:52,340 --> 00:26:54,440
not easy to do at scale anyway.

502
00:26:55,520 --> 00:26:56,880
But with partitioned tables...

503
00:26:56,880 --> 00:27:00,580
Michael: But this is, this was kind of my point, that you're

504
00:27:00,580 --> 00:27:03,260
paying some complexity up front for doing this and it is kind

505
00:27:03,260 --> 00:27:06,360
of a premature optimization to partition a table that is absolutely

506
00:27:06,440 --> 00:27:10,200
tiny and yes it will be a pain to move from a single table to

507
00:27:10,200 --> 00:27:14,560
a partitioned table later but you've also got away with some simplicity

508
00:27:14,700 --> 00:27:16,920
for maybe a year or 2 maybe longer

509
00:27:16,920 --> 00:27:21,640
Nikolay: well, anyway, I think if you have

510
00:27:21,680 --> 00:27:26,700
a time-series situation and you expect some terabytes

511
00:27:26,720 --> 00:27:29,960
of data in a few years, I would implement partitioning right

512
00:27:29,960 --> 00:27:36,160
away anyway, because this would keep me from using some queries

513
00:27:36,340 --> 00:27:38,200
which are not compatible with partitioning.

514
00:27:39,000 --> 00:27:41,860
And these queries are really hard to redesign later.

515
00:27:42,560 --> 00:27:45,560
It's better to have some partitioning right away and partitioning

516
00:27:45,720 --> 00:27:47,920
key should involve the timestamp.

517
00:27:48,900 --> 00:27:49,760
And that's great.

518
00:27:49,760 --> 00:27:53,860
I mean, yes, it's some inconvenience in the beginning, but it's

519
00:27:53,860 --> 00:27:56,600
already like, it's not rocket science at all these days, like

520
00:27:56,600 --> 00:27:58,440
many people have done it and there are blog posts.

521
00:27:58,440 --> 00:28:04,700
I have a how-to recipe on partitioning a table with

522
00:28:04,700 --> 00:28:05,640
UUID v7.

523
00:28:06,040 --> 00:28:11,840
And it has, like, an example with TimescaleDB, but it's

524
00:28:11,840 --> 00:28:16,520
agnostic to the version of Postgres, so it can be done without TimescaleDB, same thing.

525
00:28:16,520 --> 00:28:17,540
And yeah, I think partitioning

526
00:28:18,500 --> 00:28:22,660
is a must if you expect terabytes

527
00:28:22,720 --> 00:28:26,040
of data, and with dozens of terabytes it becomes an absolute

528
00:28:26,040 --> 00:28:27,100
must from the beginning.

529
00:28:27,440 --> 00:28:30,060
And it's a pity that we don't have, like do you think partitioning

530
00:28:30,060 --> 00:28:31,480
could be improved in Postgres?

531
00:28:31,880 --> 00:28:32,800
In which areas?

532
00:28:32,800 --> 00:28:35,400
Michael: Well, to be fair, it has been improving.

533
00:28:35,580 --> 00:28:36,600
Like, there's so...

534
00:28:36,600 --> 00:28:40,240
Every release there's quite a significant improvement.

535
00:28:40,240 --> 00:28:40,440
Well, in

536
00:28:40,440 --> 00:28:43,140
Nikolay: Postgres 10 we got declarative partitioning, right?

537
00:28:43,140 --> 00:28:43,640
Yes.

538
00:28:44,040 --> 00:28:48,280
And then it was improving only, like, evolution-wise, like, polishing

539
00:28:48,280 --> 00:28:48,700
things.

540
00:28:48,700 --> 00:28:49,200
Michael: Okay.

541
00:28:49,400 --> 00:28:53,800
Nikolay: Do you think a big step could be done UX-wise,

542
00:28:53,800 --> 00:28:55,700
to simplify?

543
00:28:56,440 --> 00:29:00,360
Like, for example, I remember we were helping a very famous company

544
00:29:00,820 --> 00:29:02,980
to partition like during a weekend.

545
00:29:03,820 --> 00:29:06,640
Like I think it was more than 1 year ago.

546
00:29:06,640 --> 00:29:08,700
It was an AI company, super popular.

547
00:29:09,160 --> 00:29:13,920
We helped them and it was great, but it was an interesting experience

548
00:29:13,980 --> 00:29:18,820
actually to help like to do it very quickly without downtime

549
00:29:18,820 --> 00:29:19,700
and so on.

550
00:29:20,280 --> 00:29:22,200
Michael: Without downtime is the hard part.

551
00:29:22,200 --> 00:29:24,640
Nikolay: Yeah, but for me it's
easy because I know the, like

552
00:29:24,640 --> 00:29:27,700
for me, not so easy, but I know
how to do it and how to verify

553
00:29:27,700 --> 00:29:29,340
it before deployment and so on.

554
00:29:29,680 --> 00:29:31,620
All my focus was on that part.

555
00:29:31,640 --> 00:29:35,100
What we missed, I'll tell you the
story, it's funny.

556
00:29:35,140 --> 00:29:39,220
I was washing my car and I was
chatting to my friend also from

557
00:29:39,240 --> 00:29:43,180
the Bay Area. Like, I told him,
you know, we helped this company

558
00:29:43,180 --> 00:29:44,980
partition, to implement partitioning.

559
00:29:46,060 --> 00:29:48,780
He's, ah, because he mentioned
he's using the company.

560
00:29:49,660 --> 00:29:51,260
He's using products of the company.

561
00:29:51,260 --> 00:29:51,560
Okay.

562
00:29:51,560 --> 00:29:52,360
He's using product.

563
00:29:52,360 --> 00:29:55,380
I mentioned, okay, we just
helped them implement partitioning.

564
00:29:56,000 --> 00:29:59,720
And he said, Oh, the service is
down right now.

565
00:30:00,100 --> 00:30:01,580
I said, really?

566
00:30:01,780 --> 00:30:04,460
And I'm looking at my watch and
I see 5 p.m.

567
00:30:05,140 --> 00:30:06,760
And I realized, 5 p.m.

568
00:30:06,760 --> 00:30:09,720
Probably this is the time, it's
like 5-0-0, right?

569
00:30:09,720 --> 00:30:12,280
It's the time probably when new
partition must be created.

570
00:30:13,500 --> 00:30:15,180
And this we completely missed.

571
00:30:15,460 --> 00:30:16,740
Creation of new partitions.

572
00:30:16,880 --> 00:30:17,380
Michael: Wow.

573
00:30:17,620 --> 00:30:21,560
Nikolay: It's like we focused on
hard topics and failed in

574
00:30:21,560 --> 00:30:22,440
Michael: simple topics.

575
00:30:23,200 --> 00:30:25,620
Nikolay: You must create partitions
in advance.

576
00:30:26,500 --> 00:30:28,240
And this is on the shoulders of the developer.

577
00:30:28,520 --> 00:30:29,940
That's not all right, right?

578
00:30:30,540 --> 00:30:32,240
We need pg_cron or something.
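
For example, a pg_cron job that pre-creates tomorrow's partition every day, well in advance, might look roughly like this (table name is hypothetical):

```sql
-- Hedged sketch: pre-create tomorrow's daily partition at noon,
-- well before the moment new data needs it.
SELECT cron.schedule('create-next-partition', '0 12 * * *', $$
    DO $body$
    DECLARE
        d date := current_date + 1;
    BEGIN
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS events_%s PARTITION OF events
             FOR VALUES FROM (%L) TO (%L)',
            to_char(d, 'YYYY_MM_DD'), d, d + 1
        );
    END
    $body$;
$$);
```

The IF NOT EXISTS makes the job safe to re-run, so a missed or duplicate invocation does no harm.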

579
00:30:32,640 --> 00:30:33,700
Why is it like...

580
00:30:34,020 --> 00:30:38,040
DX, developer experience, is not
excellent, unfortunately.

581
00:30:38,140 --> 00:30:41,020
I think some things can be improved
in this area.

582
00:30:41,940 --> 00:30:44,440
But of course it could be good
to do it evolutionarily as well.

583
00:30:44,440 --> 00:30:48,580
Like maybe pg_cron should be part
of Postgres, first thing.

584
00:30:48,900 --> 00:30:49,400
Cool.

585
00:30:49,780 --> 00:30:50,640
Yeah, maybe.

586
00:30:51,900 --> 00:30:54,900
At least, at least, contrib module
or something, I don't know.

587
00:30:55,120 --> 00:30:59,340
And then probably partition creation
could be part of declarative

588
00:31:00,300 --> 00:31:04,740
syntax, you define it when you
create your partition schema,

589
00:31:04,780 --> 00:31:08,260
you say, okay, new partitions should
be created hourly or daily

590
00:31:08,260 --> 00:31:08,960
or something.

591
00:31:09,780 --> 00:31:10,120
Yeah.

592
00:31:10,120 --> 00:31:10,620
Right?

593
00:31:10,640 --> 00:31:12,600
Right now it's terrible, everything
is manual.

594
00:31:12,600 --> 00:31:16,100
I mean, it's not terrible, it's
much better than when I implemented

595
00:31:16,480 --> 00:31:21,300
partitioning for my first RDS project
and it was inheritance-based.

596
00:31:23,080 --> 00:31:26,680
I did it with 0 downtime, it was
great, but I spent like a couple

597
00:31:26,680 --> 00:31:27,320
of months.

598
00:31:28,880 --> 00:31:33,960
Yeah, this is time when I started
understanding that clones are

599
00:31:33,960 --> 00:31:36,800
great because you can experiment
and check everything.

600
00:31:39,780 --> 00:31:43,380
Michael: Going back, I think what
I meant when I said no downtime

601
00:31:43,380 --> 00:31:46,580
is the hard part, I mean new features
for partitioning generally

602
00:31:46,640 --> 00:31:49,840
come in and there's so many kind
of like foot guns of what you

603
00:31:49,840 --> 00:31:52,440
can do without a heavy lock and
what needs

604
00:31:52,440 --> 00:31:53,300
Nikolay: Foreign keys.

605
00:31:54,020 --> 00:32:00,060
Michael: like even dropping an
index yeah so there are so many

606
00:32:00,060 --> 00:32:04,740
tricky things that get implemented
first in a way that has heavy

607
00:32:04,740 --> 00:32:09,100
locks and then later in a way that
can be done without, so therefore

608
00:32:09,160 --> 00:32:10,840
in a 0 downtime fashion.

609
00:32:10,840 --> 00:32:15,580
So I do see it as probably it's
going to only improve incrementally,

610
00:32:15,780 --> 00:32:20,040
but I don't see why automatic creation
couldn't be a complete

611
00:32:20,060 --> 00:32:20,560
thing.

612
00:32:21,340 --> 00:32:25,080
It could also help, yeah, remember
we were talking about pg_squeeze

613
00:32:25,240 --> 00:32:28,940
and that's another feature that

614
00:32:28,940 --> 00:32:31,300
could come in, I think as VACUUM FULL CONCURRENTLY.

615
00:32:31,620 --> 00:32:32,220
With glass

616
00:32:32,220 --> 00:32:33,620
Nikolay: option maybe, right?

617
00:32:34,080 --> 00:32:39,720
Michael: Yeah but the other feature
it has is scheduled, let's

618
00:32:39,720 --> 00:32:40,940
say like repacking.

619
00:32:42,160 --> 00:32:42,440
Repacking.

620
00:32:42,440 --> 00:32:43,020
Or whatever, squeezing.

621
00:32:43,180 --> 00:32:44,240
So scheduled squeezing.

622
00:32:45,220 --> 00:32:46,720
Well, it's different.

623
00:32:47,960 --> 00:32:49,920
It would need the same kind of
logic though, right?

624
00:32:49,920 --> 00:32:52,260
It would need the same, I guess
that's trigger based instead

625
00:32:52,260 --> 00:32:55,680
of time based, but it's a similar
kind of background worker type

626
00:32:55,680 --> 00:32:56,180
thing.

627
00:32:57,260 --> 00:32:58,820
Nikolay: Oh, yeah, maybe you're
right.

628
00:32:58,820 --> 00:32:59,160
Actually.

629
00:32:59,160 --> 00:32:59,340
Yeah.

630
00:33:00,060 --> 00:33:01,000
I see evolution here.

631
00:33:01,000 --> 00:33:06,220
First of all, I fully agree, this
whole class of problems that

632
00:33:06,220 --> 00:33:10,160
are gradually solved in every major
release related to locks,

633
00:33:10,840 --> 00:33:13,180
partitioning versus locking, right?

634
00:33:13,940 --> 00:33:17,500
Indexes, foreign keys, detaching,
attaching, we remember every

635
00:33:17,500 --> 00:33:19,860
release for the last 10 years maybe.

636
00:33:20,080 --> 00:33:24,140
When version 10 was released
it was 2017 I guess.

637
00:33:24,480 --> 00:33:28,220
Michael: I like to do the math: 17 is the
most recent, so it's 7 years

638
00:33:28,220 --> 00:33:28,720
ago.

639
00:33:29,060 --> 00:33:34,700
Nikolay: Yes, I'm an old guy, I still
remember that 9.5 was '15, 9.6

640
00:33:34,740 --> 00:33:36,240
was '16, okay.

641
00:33:36,980 --> 00:33:42,820
So I think you're right, and some
steps could be done here, and

642
00:33:42,820 --> 00:33:47,900
experience could be improved, and
I mean in smaller steps.

643
00:33:48,760 --> 00:33:53,300
And yeah, but you're talking about
repacking to mitigate

644
00:33:53,300 --> 00:33:53,800
bloat.

645
00:33:54,240 --> 00:33:58,660
Previously, we talked about repacking
to make a partition like

646
00:33:58,660 --> 00:34:03,980
100 times smaller and having more
or less precise data, a different

647
00:34:03,980 --> 00:34:05,340
level of precision.

648
00:34:05,980 --> 00:34:10,120
I think the latter should
be on a schedule.

649
00:34:11,140 --> 00:34:14,400
And to put this on a schedule right now,
there is no option in Postgres.

650
00:34:14,760 --> 00:34:16,520
You need pg_cron, right?

651
00:34:16,840 --> 00:34:22,180
That's why I think pg_cron could
help unlock some new features

652
00:34:22,180 --> 00:34:24,560
for partitioning if it was in the
core.

653
00:34:25,200 --> 00:34:30,600
Yeah, maybe we have gone sideways
a little bit from time series,

654
00:34:30,660 --> 00:34:35,080
but time series and partitioning
they are very strongly connected

655
00:34:35,080 --> 00:34:36,840
in my head, in my mind.

656
00:34:36,960 --> 00:34:38,560
Michael: Yeah, I completely agree.

657
00:34:38,560 --> 00:34:42,240
I think you mentioned Timescale
has a couple of important features,

658
00:34:42,240 --> 00:34:45,780
but I would add the automatic partitioning
as another thing.

659
00:34:45,780 --> 00:34:48,210
Nikolay: Oh, it's great, Yeah,
yeah.

660
00:34:48,210 --> 00:34:49,440
It's just, it's very declarative.

661
00:34:49,540 --> 00:34:52,440
You just say I want every hour
and then you just forget about

662
00:34:52,440 --> 00:34:52,640
it.

663
00:34:52,640 --> 00:34:54,060
It works really great.
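
A sketch of that declarative setup, with a hypothetical metrics table:

```sql
-- Hedged sketch: with TimescaleDB, chunk creation is fully automatic.
CREATE TABLE metrics (
    ts        timestamptz NOT NULL,
    device_id int         NOT NULL,
    value     double precision
);

-- One call; hourly chunks are then created on the fly as data arrives.
SELECT create_hypertable('metrics', 'ts',
                         chunk_time_interval => INTERVAL '1 hour');
```

From then on there is no partition to forget to create; inserts simply route to (or create) the right chunk.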

664
00:34:54,440 --> 00:34:59,140
By the way, also, with TimescaleDB, we're not

665
00:34:59,140 --> 00:35:01,040
scared to have a lot of partitions.

666
00:35:01,060 --> 00:35:02,460
They're called chunks, right?

667
00:35:02,780 --> 00:35:03,280
Yeah.

668
00:35:03,540 --> 00:35:07,100
And it's interesting that if you
have time series, you want partitioning,

669
00:35:07,240 --> 00:35:09,840
you should explore what will happen
if you have thousands of

670
00:35:09,840 --> 00:35:11,100
partitions, for example.

671
00:35:11,200 --> 00:35:16,880
And this pushes you to use pooling
and establish connections

672
00:35:16,880 --> 00:35:17,660
less often.

673
00:35:18,120 --> 00:35:23,620
Remember, I have a blog post where
a mistake was made, a relcache,

674
00:35:24,180 --> 00:35:25,520
a relcache-related mistake.

675
00:35:25,520 --> 00:35:29,680
So every time you connect to Postgres
again, the first query will be slow

676
00:35:29,680 --> 00:35:31,040
if you have a lot of partitions.

677
00:35:33,040 --> 00:35:35,540
But subsequent queries will be
fast.

678
00:35:36,040 --> 00:35:41,360
So if you have many, many partitions,
you should have connection

679
00:35:41,380 --> 00:35:45,200
pooling and not establish connections
too often.

680
00:35:46,860 --> 00:35:50,320
Or somehow accept that the planner
needs more time for the very

681
00:35:50,320 --> 00:35:51,140
first planning.

682
00:35:52,500 --> 00:35:56,520
Michael: I guess it depends on
what queries, but yeah, it seems

683
00:35:56,520 --> 00:35:56,960
to me like that would be...

684
00:35:56,960 --> 00:35:57,720
It doesn't depend

685
00:35:57,720 --> 00:35:58,360
Nikolay: on queries.

686
00:35:58,580 --> 00:36:01,820
Because it's, like, the relcache: we
need to load the metadata for

687
00:36:02,040 --> 00:36:03,340
all partitions anyway.

688
00:36:03,940 --> 00:36:06,060
Michael: Yeah, but let's say they're
analytical queries.

689
00:36:07,120 --> 00:36:09,640
It won't matter a few milliseconds
of planning.

690
00:36:09,960 --> 00:36:13,300
Nikolay: Analytical queries, time series, without TimescaleDB,

691
00:36:14,120 --> 00:36:15,460
please go out.

692
00:36:15,480 --> 00:36:16,960
It's not a Postgres case.

693
00:36:18,380 --> 00:36:19,920
Well, unless it's a small project.

694
00:36:19,920 --> 00:36:23,760
If it's a small project, fine, but
if it's not small, let's not

695
00:36:23,900 --> 00:36:24,660
fool ourselves.

696
00:36:25,440 --> 00:36:25,760
It's not

697
00:36:25,760 --> 00:36:26,000
Michael: for Postgres.

698
00:36:26,000 --> 00:36:26,780
What about the

699
00:36:27,040 --> 00:36:27,172
Nikolay: roll up tables?

700
00:36:27,172 --> 00:36:29,560
Michael: The roll-up tables are
exactly for that case.

701
00:36:30,340 --> 00:36:35,500
Nikolay: Well yeah if you do a
very strict strategy and very

702
00:36:35,500 --> 00:36:39,520
aggressive roll-up and three-stage
approach and maybe your counts

703
00:36:39,520 --> 00:36:44,140
and sums and means, maxes, aggregates
will be fast, I don't know.

704
00:36:44,140 --> 00:36:44,380
Yeah.

705
00:36:44,380 --> 00:36:46,200
But it's easy to experiment, actually.

706
00:36:46,240 --> 00:36:49,300
It's a one-day project to experiment
and check for your particular

707
00:36:49,360 --> 00:36:52,800
case, I mean, to explore what latencies
will be and so on.

708
00:36:52,800 --> 00:36:56,400
This is definitely worth it if
you plan some new thing to explore

709
00:36:56,600 --> 00:36:57,100
latencies.

710
00:36:58,340 --> 00:36:59,280
Michael: Yes, yes, yes.

711
00:36:59,440 --> 00:37:00,280
Nikolay: Okay, okay.

712
00:37:00,340 --> 00:37:04,640
I actually, I described this roll-up
approach not knowing it's

713
00:37:04,640 --> 00:37:09,480
called roll-up myself. And I'm
still pulling us away from Postgres.

714
00:37:09,480 --> 00:37:10,880
But, well, Postgres is great.

715
00:37:11,400 --> 00:37:15,580
I know when you go dig deeper,
you see some problems, you talk

716
00:37:15,580 --> 00:37:19,000
about them, and then people come
to you and say you are a Postgres

717
00:37:19,000 --> 00:37:19,500
hater.

718
00:37:20,180 --> 00:37:21,720
It's happened to me more than once.

719
00:37:22,300 --> 00:37:27,440
Yes, that's why I've spent like 20
years and still haven't stopped working

720
00:37:27,440 --> 00:37:29,360
with Postgres and thinking and so
on.

721
00:37:29,700 --> 00:37:31,100
Yeah, I'm a Postgres hater.

722
00:37:33,480 --> 00:37:36,420
Michael: Let's do, should we do
a quick fire, kind of a few last

723
00:37:36,420 --> 00:37:39,440
things in terms of considerations
while we're here?

724
00:37:39,480 --> 00:37:39,960
I think,

725
00:37:39,960 --> 00:37:42,240
Nikolay: well, we didn't talk about
updates, unfortunately.

726
00:37:42,440 --> 00:37:46,260
Yes, but I think if there are updates,
definitely the first thing

727
00:37:46,260 --> 00:37:49,840
I would try is to localize them
with partitioning again, partitioning for

728
00:37:50,020 --> 00:37:50,520
localization.

729
00:37:51,540 --> 00:37:57,100
My new fresh idea, not mine, I
actually borrowed it from a good

730
00:37:57,100 --> 00:37:59,760
guy who has been an expert longer than me.

731
00:37:59,760 --> 00:38:05,320
So the idea is that if you have localized
writes, thanks to partitioning,

732
00:38:05,660 --> 00:38:09,100
it also helps backups, incremental
backups, because you don't

733
00:38:09,100 --> 00:38:10,540
touch old pages.

734
00:38:13,360 --> 00:38:17,880
So this makes incremental backup
lighter, smaller, right?

735
00:38:18,580 --> 00:38:19,660
And recovery as well.

736
00:38:19,660 --> 00:38:20,640
It's also good.

737
00:38:20,740 --> 00:38:22,580
It's not about storage only.

738
00:38:22,600 --> 00:38:27,100
It's not only about the buffer
pool and WAL writes, full-page

739
00:38:27,100 --> 00:38:27,740
writes and so on.

740
00:38:27,740 --> 00:38:28,860
It's also about backup.

741
00:38:28,860 --> 00:38:30,920
So everything is connected here.

742
00:38:31,220 --> 00:38:37,680
It's great to have partitioning
for large tables exceeding like

743
00:38:37,680 --> 00:38:38,540
100 gigabytes.

744
00:38:38,940 --> 00:38:43,920
So this is the first thing: compression,
any kind of it, or this roll-up

745
00:38:43,940 --> 00:38:45,840
idea, this is the second one, right?

746
00:38:46,240 --> 00:38:48,160
Yeah, a couple of

747
00:38:48,160 --> 00:38:50,740
Michael: things we haven't talked
a lot about because of the

748
00:38:50,740 --> 00:38:54,660
volume here. I think disk space
can be a big thing. I know compression

749
00:38:55,520 --> 00:38:58,420
really helps there but another
thing that can help is if you

750
00:38:58,420 --> 00:39:01,800
do have any secondary indexes, like,
not the primary key, if

751
00:39:01,800 --> 00:39:08,160
you do want another index, maybe
consider BRIN as well, especially

752
00:39:08,160 --> 00:39:11,720
in the append-only case, but even,
we had a whole episode about

753
00:39:11,720 --> 00:39:17,780
it, but with the new operator classes,
with minmax-multi, it's

754
00:39:17,780 --> 00:39:20,380
really quite powerful and much
much smaller than B-tree.
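
A sketch of such an index, assuming a metrics table with a ts timestamp column (the minmax-multi operator class is available since Postgres 14):

```sql
-- Hedged sketch: a BRIN index on the timestamp of an append-mostly
-- table; typically orders of magnitude smaller than a B-tree.
CREATE INDEX metrics_ts_brin ON metrics
    USING brin (ts timestamptz_minmax_multi_ops)
    WITH (pages_per_range = 32);
```

BRIN stores only per-range summaries, which is why it stays tiny as long as timestamps correlate with physical row order.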

755
00:39:20,380 --> 00:39:20,740
Nikolay: That's a

756
00:39:20,740 --> 00:39:21,420
Michael: great point.

757
00:39:21,420 --> 00:39:23,760
Nikolay: Do you remember the version
when it was added?

758
00:39:24,020 --> 00:39:25,920
16, 17, 15?

759
00:39:25,920 --> 00:39:28,180
Michael: I think it was roughly
when we started, maybe like a

760
00:39:28,180 --> 00:39:30,040
year or 2 before we started the
podcast.

761
00:39:30,040 --> 00:39:30,840
So maybe, yeah.

762
00:39:30,840 --> 00:39:32,580
Nikolay: Anyway, in very fresh
versions.

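The BRIN index with the multi min-max operator class that Michael mentions (the opclass was added in PostgreSQL 14) could look something like this; the table and column names are hypothetical:

```sql
-- Hypothetical example: a BRIN index on an append-only time column.
-- The default minmax opclass works well when ts correlates with the
-- physical row order; minmax_multi (PostgreSQL 14+) also tolerates
-- outliers, and the index stays much smaller than a B-tree.
CREATE INDEX ON metrics USING brin (ts timestamptz_minmax_multi_ops);
```
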
763
00:39:33,240 --> 00:39:36,680
Yeah, and also continuous aggregates
in TimescaleDB is a great

764
00:39:36,680 --> 00:39:36,980
thing.

765
00:39:36,980 --> 00:39:40,920
If you don't have TimescaleDB,
then something should be done,

766
00:39:40,920 --> 00:39:45,260
like probably incremental materialized
views and so on.

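Without TimescaleDB's continuous aggregates, one common way to implement the roll-up idea is an upsert into a summary table, along the lines of the Citus roll-up pattern (all names here are hypothetical):

```sql
-- Hypothetical example: roll recent raw rows up into hourly aggregates.
INSERT INTO metrics_hourly (hour, device_id, n, sum_value)
SELECT date_trunc('hour', ts), device_id, count(*), sum(value)
FROM metrics
WHERE ts >= now() - interval '1 hour'
GROUP BY 1, 2
ON CONFLICT (hour, device_id)
DO UPDATE SET n         = metrics_hourly.n + EXCLUDED.n,
              sum_value = metrics_hourly.sum_value + EXCLUDED.sum_value;
```
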
767
00:39:45,780 --> 00:39:47,480
Michael: Yeah, there's that extension.

768
00:39:47,500 --> 00:39:51,340
But also there's a really great
talk by the team at Citus on

769
00:39:51,340 --> 00:39:51,840
roll-ups.

770
00:39:52,060 --> 00:39:54,940
So I will share that in the show
notes.

771
00:39:55,340 --> 00:39:58,400
There's also, on this topic, one
with a lot more practical

772
00:39:58,440 --> 00:39:58,940
tips.

773
00:39:59,380 --> 00:40:03,660
There's a talk by a fellow Brit
of mine, Chris Ellis.

774
00:40:03,820 --> 00:40:08,040
He's done a talk on, it was about
Internet of Things, so sensor

775
00:40:08,040 --> 00:40:10,440
data, but it's basically just time
series, right?

776
00:40:10,440 --> 00:40:13,720
So I'll share a link to that talk
as well.

777
00:40:14,180 --> 00:40:14,860
Nikolay: 4 points.

778
00:40:14,860 --> 00:40:16,060
What's the last one?

779
00:40:16,500 --> 00:40:17,340
Number 5.

780
00:40:17,520 --> 00:40:19,020
Michael: What about language stuff?

781
00:40:19,020 --> 00:40:22,020
Like, Chris, in his talk, I remember,
talked quite a bit about

782
00:40:22,020 --> 00:40:24,360
window functions and interpolating
gaps.

783
00:40:24,840 --> 00:40:27,880
There were probably some considerations
there; you probably

784
00:40:27,880 --> 00:40:29,360
still want access to SQL.

785
00:40:29,800 --> 00:40:33,900
Postgres is so powerful there that
I think some of the other

786
00:40:34,360 --> 00:40:37,180
specialist databases, it's slightly
harder to do some of that

787
00:40:37,180 --> 00:40:40,520
stuff, or it's a lot
more learning.

788
00:40:40,760 --> 00:40:42,540
Nikolay: My favorite function is
lag.

789
00:40:43,940 --> 00:40:44,440
Yeah.

790
00:40:44,460 --> 00:40:48,140
I deal a lot with lags in
various kinds of replication.

791
00:40:48,320 --> 00:40:51,340
And here we have a window function
called lag, right?

792
00:40:51,820 --> 00:40:52,560
I thought you were going to

793
00:40:52,560 --> 00:40:54,400
Michael: say your internet connection,
but yeah.

794
00:40:54,520 --> 00:40:56,280
Nikolay: Well, this is as well.

795
00:40:56,280 --> 00:40:57,220
Lags everywhere.

796
00:40:57,660 --> 00:41:01,500
Anyway, let's consider this number
5.

797
00:41:01,560 --> 00:41:06,240
Postgres has great SQL, standard
SQL support, window functions,

798
00:41:06,280 --> 00:41:09,820
a very rich set of them, and here
it can help as well.

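The lag() window function Nikolay mentions makes delta computations over a time series straightforward; a sketch, with hypothetical table and column names:

```sql
-- Hypothetical example: per-device difference from the previous reading.
SELECT ts,
       device_id,
       value - lag(value) OVER (PARTITION BY device_id ORDER BY ts)
         AS delta
FROM metrics
ORDER BY device_id, ts;
```
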
799
00:41:10,200 --> 00:41:12,660
Michael: Yeah, last thing, we have
mentioned Timescale a lot,

800
00:41:12,660 --> 00:41:17,080
but there are other extensions
like Citus, Hydra as well, worth

801
00:41:17,080 --> 00:41:19,480
considering for specific use cases.

802
00:41:20,100 --> 00:41:22,540
And pg_timeseries, I'm gonna check
it out.

803
00:41:22,540 --> 00:41:25,520
I'm not sure quite what they've
got in terms of features yet

804
00:41:25,520 --> 00:41:27,420
but it's Postgres licensed.

805
00:41:27,840 --> 00:41:31,840
Nikolay: And pg_duckdb, which is
a new thing. We had a pg_duckdb

806
00:41:32,140 --> 00:41:32,640
episode.

807
00:41:32,980 --> 00:41:36,260
Michael: Yeah but I haven't considered
it for time series data.

808
00:41:36,480 --> 00:41:37,180
Nikolay: Do it!

809
00:41:37,860 --> 00:41:39,760
Yeah, why not?

810
00:41:39,960 --> 00:41:41,820
Well, all these things are fresh.

811
00:41:41,820 --> 00:41:48,840
I just saw how my team member used
DuckDB just locally to read

812
00:41:49,260 --> 00:41:53,580
some CSV logs, snapshots actually
collected from Postgres.

813
00:41:53,760 --> 00:41:58,380
It was great, like it was some
SQL, DuckDB SQL, and super

814
00:41:58,380 --> 00:41:59,520
convenient as well.

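Reading CSV snapshots locally with DuckDB, as described, can be as simple as the following; the file name and columns are hypothetical:

```sql
-- Hypothetical example: DuckDB querying CSV snapshots collected
-- from Postgres, using DuckDB's read_csv_auto table function.
SELECT query, count(*)
FROM read_csv_auto('pg_snapshots.csv')
GROUP BY query
ORDER BY count(*) DESC
LIMIT 10;
```
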
815
00:41:59,720 --> 00:42:03,480
So I think we have a lot of things
and tools which are open source

816
00:42:03,480 --> 00:42:04,980
and that's great.

817
00:42:05,740 --> 00:42:06,640
Michael: One more idea.

818
00:42:07,340 --> 00:42:10,900
Even the topic from last time,
or 2 times ago, was PgDog,

819
00:42:11,040 --> 00:42:11,760
like sharding.

820
00:42:12,040 --> 00:42:16,920
If you're getting to extremely
heavy workloads, you could

821
00:42:16,920 --> 00:42:22,760
have time series
partitioning on every shard,

822
00:42:22,920 --> 00:42:29,820
but then maybe a range of device
IDs for each one and shard based

823
00:42:29,820 --> 00:42:31,300
on some other key.

824
00:42:31,840 --> 00:42:33,760
Nikolay: Right, yeah.

825
00:42:33,940 --> 00:42:35,660
Thank you for this little brainstorm.

826
00:42:36,620 --> 00:42:38,980
It was a pleasure to talk to you
as usual.

827
00:42:39,140 --> 00:42:39,640
Michael: Likewise.