1
00:00:00,060 --> 00:00:02,420
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
00:00:02,420 --> 00:00:03,300
all things PostgreSQL.

3
00:00:03,400 --> 00:00:05,840
I am Michael, founder of pgMustard, and I'm joined as usual by

4
00:00:05,840 --> 00:00:07,360
Nik, founder of PostgresAI.

5
00:00:07,360 --> 00:00:08,040
Hey Nik!

6
00:00:08,840 --> 00:00:10,360
Nikolay: Hi Michael, how are you?

7
00:00:10,840 --> 00:00:12,220
Michael: I am good, how are you?

8
00:00:12,380 --> 00:00:13,120
Nikolay: Very good.

9
00:00:13,420 --> 00:00:13,780
Michael: Great.

10
00:00:13,780 --> 00:00:15,640
And what are we talking about this week?

11
00:00:16,220 --> 00:00:16,720
Nikolay: Disks.

12
00:00:17,460 --> 00:00:23,380
If you imagine the regular database icon, like the picture of

13
00:00:23,440 --> 00:00:28,380
how we usually visualize a database on various diagrams, it consists

14
00:00:28,420 --> 00:00:29,660
of disks, right?

15
00:00:30,080 --> 00:00:33,580
Michael: Yeah, like 3, I'm thinking of a cylinder, sometimes a cylinder,

16
00:00:33,580 --> 00:00:36,220
yeah, with, like, normally 3 layers?

17
00:00:36,220 --> 00:00:40,380
Nikolay: Yeah, or 4. And obviously databases and disks, they

18
00:00:40,380 --> 00:00:42,460
are close to each other, right?

19
00:00:43,320 --> 00:00:47,620
But my first question, why do we keep calling them disks?

20
00:00:50,200 --> 00:00:52,000
Michael: Like an outdated term, you mean?

21
00:00:52,220 --> 00:00:53,040
Nikolay: Yeah, obviously.

22
00:00:54,960 --> 00:00:55,820
I don't know.

23
00:00:56,400 --> 00:00:58,760
Michael: What does the D in SSD stand for?

24
00:00:59,160 --> 00:01:03,360
Nikolay: Yeah, actually, sometimes we say, like, logical volumes,

25
00:01:05,320 --> 00:01:07,960
storage volumes, something like this.

26
00:01:07,960 --> 00:01:11,400
And in cloud context, especially EBS volumes, right?

27
00:01:11,400 --> 00:01:17,640
We talk about them like that, but in all cases it's still

28
00:01:17,640 --> 00:01:23,080
acceptable to say disks. But disks, they don't look like

29
00:01:23,080 --> 00:01:24,140
disks anymore, right?

30
00:01:24,140 --> 00:01:32,420
They are rectangular, microchips instead of rotational devices,

31
00:01:32,440 --> 00:01:32,940
right?

32
00:01:34,340 --> 00:01:35,360
Michael: Yeah, makes sense.

33
00:01:35,380 --> 00:01:37,320
Nikolay: In most cases, not in all cases.

34
00:01:37,360 --> 00:01:43,120
Rotational devices can be still seen in the world, but not often

35
00:01:43,120 --> 00:01:47,620
if we talk about OLTP databases because it's not okay to use

36
00:01:47,620 --> 00:01:50,100
rotational devices if you want good latency.

37
00:01:50,740 --> 00:01:55,220
But yeah, so disks, because databases, they require good disks,

38
00:01:55,380 --> 00:02:00,220
and they depend on them heavily in most cases, not in all.

39
00:02:00,780 --> 00:02:05,040
Sometimes it's fully cached, so we don't care about disks if it's cached,

40
00:02:05,080 --> 00:02:05,580
right?

41
00:02:06,040 --> 00:02:07,940
Michael: Yeah, I was gonna ask you about that.

42
00:02:07,960 --> 00:02:12,100
Because I think even in the fully cached state, if we've got

43
00:02:12,100 --> 00:02:15,120
a lot of writes, for example, we might still want really good

44
00:02:15,120 --> 00:02:15,620
disks.

45
00:02:16,220 --> 00:02:20,440
There's things where we're still writing out to disk and we want

46
00:02:20,440 --> 00:02:23,260
that to be fast, not just reading from.

47
00:02:23,500 --> 00:02:25,580
Nikolay: But we are not writing to disk.

48
00:02:26,900 --> 00:02:30,720
If we move to the Postgres context, we don't write to the disk

49
00:02:30,720 --> 00:02:32,240
except to WAL, right?

50
00:02:32,240 --> 00:02:32,680
WAL.

51
00:02:32,680 --> 00:02:33,180
Yes.

52
00:02:33,580 --> 00:02:34,840
Yeah, and that's it.

53
00:02:35,060 --> 00:02:39,900
Well, yeah, I agree it can be expensive
if a lot of data is written.

54
00:02:40,860 --> 00:02:44,240
So, yeah, you're right.

55
00:02:45,040 --> 00:02:47,420
Because we need to write our tuples.

56
00:02:49,060 --> 00:02:52,120
And if it's a full-page write after
a checkpoint, we need to write

57
00:02:52,120 --> 00:02:54,020
the whole page, an 8-kilobyte page.
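
The full-page-write effect just mentioned can be sketched numerically (a deliberately rough model: it ignores WAL record headers, hint bits, and wal_compression; the byte counts are illustrative):

```python
PAGE_SIZE = 8192  # Postgres default block size, 8 KB

def approx_wal_bytes(pages_touched, tuple_bytes, first_touch_after_checkpoint):
    # After a checkpoint, the first modification of a page logs the
    # whole 8 KB page (a full-page write); later modifications of
    # the same page log only the changed tuple data.
    if first_touch_after_checkpoint:
        return pages_touched * PAGE_SIZE
    return pages_touched * tuple_bytes

# 100 single-tuple (100-byte) updates right after a checkpoint
# versus the same updates later in the checkpoint cycle:
print(approx_wal_bytes(100, 100, True))   # 819200 bytes
print(approx_wal_bytes(100, 100, False))  # 10000 bytes
```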

58
00:02:55,520 --> 00:03:00,980
Yes, and we need to get an fsync
before the commit is finalized.

59
00:03:01,740 --> 00:03:06,720
So definitely it goes to disk.
But data in terms of table and

60
00:03:06,720 --> 00:03:11,760
index, tables and indexes, it's
written only to memory and it's

61
00:03:11,760 --> 00:03:13,340
flushed at checkpoint normally.

62
00:03:13,500 --> 00:03:18,120
It's flushed at checkpoint, written
first to the page

63
00:03:18,120 --> 00:03:22,300
cache, and then the page cache can use
pdflush or something like that to

64
00:03:22,300 --> 00:03:24,220
write it further to disk.

65
00:03:24,960 --> 00:03:29,000
But yeah, in terms of fsync, write
latency is important.

66
00:03:29,760 --> 00:03:31,180
It affects commit time.

67
00:03:31,380 --> 00:03:34,700
By the way, I just had a case,
it's slightly off topic, but I

68
00:03:34,700 --> 00:03:39,520
published a tweet and LinkedIn
post about LISTEN/NOTIFY.

69
00:03:42,100 --> 00:03:46,200
I added them to the list of deprecated
stuff.

70
00:03:46,740 --> 00:03:48,060
Michael: It's not deprecated, right?

71
00:03:48,060 --> 00:03:51,680
But you're saying you recommend
not using it at scale?

72
00:03:52,420 --> 00:03:53,560
Nikolay: Yeah well if...

73
00:03:53,580 --> 00:03:54,880
Michael: Or possibly at all.

74
00:03:54,940 --> 00:03:59,520
Nikolay: Yes, my Postgres vision
deviates from the official vision

75
00:03:59,600 --> 00:04:04,500
in some cases. For example, official
documentation says don't set

76
00:04:05,380 --> 00:04:08,480
statement_timeout globally because
blah, blah, blah.

77
00:04:08,480 --> 00:04:09,960
And I don't agree with this.

78
00:04:09,960 --> 00:04:13,520
In OLTP, it's a good idea to set
it globally to some value and

79
00:04:13,520 --> 00:04:16,060
override locally when needed.
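
As an illustration, the global-default-plus-local-override pattern being described might look like this (the values here are illustrative, not recommendations):

```sql
-- postgresql.conf (or ALTER SYSTEM): a global safety net for OLTP
-- statement_timeout = '30s'

-- Override for one session that legitimately runs longer:
SET statement_timeout = '2h';

-- Or scoped to a single transaction:
BEGIN;
SET LOCAL statement_timeout = '2h';
-- ... long-running statement here ...
COMMIT;
```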

80
00:04:16,620 --> 00:04:20,780
And here, LISTEN/NOTIFY, I just
see like we should just abandon

81
00:04:20,780 --> 00:04:24,960
this completely until it's fully
redesigned, because there is a global

82
00:04:24,960 --> 00:04:29,500
lock. And 1 of our customers, Recall AI,
they published a great post

83
00:04:29,500 --> 00:04:34,200
about this, because they had outages.
And it's related to the topic

84
00:04:34,200 --> 00:04:36,540
we discussed in an interesting
way.

85
00:04:37,060 --> 00:04:40,100
To reproduce it, I used a bigger
machine.

86
00:04:41,140 --> 00:04:46,640
And the issue is with NOTIFY, at
commit time, it takes a global

87
00:04:46,640 --> 00:04:48,900
lock to serialize NOTIFY events.

88
00:04:49,400 --> 00:04:53,500
A global lock, like on the database, an exclusive
lock, insane.

89
00:04:54,480 --> 00:04:56,920
And if commit is fast, everything
is fine.

90
00:04:57,040 --> 00:05:00,780
But if in the same transaction
you write something, the commit to

91
00:05:00,780 --> 00:05:03,080
WAL, it waits a little bit, right?

92
00:05:03,400 --> 00:05:07,400
In this case, contention starts
because of that lock.

93
00:05:08,000 --> 00:05:13,240
So if you have a lot of commits
which are writing something to

94
00:05:13,240 --> 00:05:18,740
WAL, meaning they need an fsync
and they need to wait on disk.

95
00:05:19,940 --> 00:05:22,740
If disk is slow and you use NOTIFY,

96
00:05:23,680 --> 00:05:24,860
This doesn't scale.

97
00:05:25,840 --> 00:05:27,900
Performance will be terrible very
soon.

98
00:05:28,660 --> 00:05:32,080
At some concurrency level, you
will have issues and you will

99
00:05:32,080 --> 00:05:35,720
see commit spans like many milliseconds
and dozens of milliseconds

100
00:05:35,800 --> 00:05:36,920
and then up to seconds.

101
00:05:37,200 --> 00:05:39,060
And eventually system will be down.

102
00:05:40,200 --> 00:05:42,740
Anyway, this is related to slow disks.

103
00:05:42,740 --> 00:05:47,620
You're right, if write latency is bad, we might have issues.
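
A back-of-the-envelope model of the contention just described (a simplification, not Postgres internals: the 2 ms fsync figure is illustrative, and the queueing is assumed perfectly fair):

```python
def max_commit_rate_per_sec(fsync_ms):
    # With NOTIFY, commits serialize on a global lock held while the
    # commit (including the WAL fsync) completes, so throughput caps
    # at one commit per fsync, no matter how many backends run.
    return 1000.0 / fsync_ms

def avg_commit_latency_ms(fsync_ms, concurrent_backends):
    # A backend arriving at the lock waits, on average, behind half
    # of the other backends, then does its own commit.
    return fsync_ms * (1 + (concurrent_backends - 1) / 2)

print(max_commit_rate_per_sec(2))      # 500.0 commits/s ceiling
print(avg_commit_latency_ms(2, 100))   # 101.0 ms: many milliseconds
print(avg_commit_latency_ms(2, 1000))  # 1001.0 ms: up to seconds
```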

104
00:05:48,100 --> 00:05:52,880
Michael: Yeah, but you're right too, that the majority of the

105
00:05:52,880 --> 00:05:57,340
time we care about the quality of our disks, it's when our data

106
00:05:57,340 --> 00:06:01,140
isn't fully in memory and we're worrying about reading things

107
00:06:01,260 --> 00:06:06,420
either from, well, from disk, or even from the operating system.

108
00:06:06,420 --> 00:06:09,720
It's hard to tell from Postgres sometimes where it's coming from.

109
00:06:09,720 --> 00:06:10,400
But we have a

110
00:06:10,400 --> 00:06:10,580
Nikolay: A bit of documentation.

111
00:06:10,580 --> 00:06:14,940
It's impossible to tell in Postgres unless you have pg_stat_kcache

112
00:06:15,060 --> 00:06:15,560
extension.

113
00:06:15,600 --> 00:06:20,540
That's why, since BUFFERS is already on by default in Postgres 18, again, I

114
00:06:20,540 --> 00:06:24,600
advertise to all people who develop some systems with Postgres:

115
00:06:24,780 --> 00:06:27,880
if it's possible, include extensions, pg_wait_sampling, and

116
00:06:27,880 --> 00:06:28,380
pg_stat_kcache.

117
00:06:29,600 --> 00:06:31,300
And pg_stat_kcache can show it.

118
00:06:31,920 --> 00:06:32,420
Michael: Yeah.

119
00:06:32,500 --> 00:06:36,520
I think it's not, I think you're right, it's impossible to be

120
00:06:36,520 --> 00:06:40,940
certain without those, but for example through timings,

121
00:06:40,960 --> 00:06:44,580
through I/O timings, which is another thing that people might want

122
00:06:44,580 --> 00:06:46,800
to consider having on, obviously with a bit of overhead.

123
00:06:46,800 --> 00:06:47,320
Nikolay: track_io_timing,

124
00:06:47,320 --> 00:06:48,240
you mean?

125
00:06:48,240 --> 00:06:48,800
Michael: track_io_timing

126
00:06:48,800 --> 00:06:53,840
gives you an indication, like if you're seeing not too

127
00:06:53,840 --> 00:06:58,580
many reads from either the disk or the operating system and the

128
00:06:58,580 --> 00:06:58,980
I/O

129
00:06:58,980 --> 00:07:03,340
timings are bad, you've got a clue that it's coming from disk.

130
00:07:03,340 --> 00:07:08,760
Nikolay: Yeah, indirectly we can guess where this time was spent.

131
00:07:09,920 --> 00:07:14,060
Yeah, not many, it's a good point because sometimes it's fully

132
00:07:14,060 --> 00:07:14,560
cached.

133
00:07:14,720 --> 00:07:21,180
In page cache, we see reads, and since there are so many of them,

134
00:07:21,500 --> 00:07:24,520
all your timing is spent reading from page cache to the buffer

135
00:07:24,520 --> 00:07:27,180
pool and disk is not involved.

136
00:07:27,700 --> 00:07:32,180
That's if volumes are huge. But if volumes are not huge and still significant

137
00:07:32,280 --> 00:07:35,940
time is spent, very likely it's from disk.
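
The guessing game being described can be sketched as a heuristic (the 0.05 ms threshold is an assumption for illustration, not a Postgres constant):

```python
def avg_read_ms(shared_read_blocks, io_read_time_ms):
    # track_io_timing measures the time to bring pages into the
    # buffer pool; it cannot distinguish the OS page cache from
    # a physical disk read.
    return io_read_time_ms / shared_read_blocks

def likely_source(avg_ms, disk_threshold_ms=0.05):
    # Illustrative cut-off: a few microseconds per 8 KB read smells
    # like page cache; hundreds of microseconds smells like disk.
    return "disk" if avg_ms >= disk_threshold_ms else "page cache"

# Huge volume of cheap reads: total time is large, each read fast.
print(likely_source(avg_read_ms(1_000_000, 2_000)))  # page cache
# Modest volume but significant time: likely real disk I/O.
print(likely_source(avg_read_ms(1_000, 300)))        # disk
```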

138
00:07:36,760 --> 00:07:37,260
Michael: Exactly.

139
00:07:37,660 --> 00:07:38,220
Nikolay: Yeah, yeah.

140
00:07:38,390 --> 00:07:38,890
Michael: Yeah.

141
00:07:39,060 --> 00:07:42,140
It's not novel, like this is something we added to our product

142
00:07:42,140 --> 00:07:42,980
just as a tip.

143
00:07:42,980 --> 00:07:44,480
It doesn't come up that often.

144
00:07:44,540 --> 00:07:44,970
Like it's not...

145
00:07:44,970 --> 00:07:45,980
Nikolay: Enabled track_io_timing?

146
00:07:47,540 --> 00:07:49,700
Michael: We actually, because most people

147
00:07:49,700 --> 00:07:53,480
don't have that on, we actually just use the buffers, like shared

148
00:07:53,480 --> 00:07:57,140
read, and then the timing, the total time of the operation.

149
00:07:57,440 --> 00:07:59,180
Nikolay: It's a pity it's not on.

150
00:08:00,380 --> 00:08:03,960
In big systems, we have it on, like I never saw big problems

151
00:08:03,960 --> 00:08:10,360
on modern, at least Intel and Arm,
Graviton2 on Amazon.

152
00:08:10,960 --> 00:08:12,740
Like I just see it's working well.

153
00:08:12,740 --> 00:08:17,520
There is a utility you can check
your infrastructure and understand

154
00:08:17,580 --> 00:08:22,160
if it's worth enabling, but my
default recommendation is to enable

155
00:08:22,160 --> 00:08:22,660
it.

156
00:08:22,940 --> 00:08:27,160
Of course there might be an observer
effect, but it can be double-checked

157
00:08:27,340 --> 00:08:31,560
if you want to be serious with
this change, but I just see we

158
00:08:31,560 --> 00:08:32,380
enable it.

159
00:08:32,780 --> 00:08:35,720
Michael: Yeah, it's all to do with
the performance of the system

160
00:08:35,720 --> 00:08:36,540
clock checks.

161
00:08:36,760 --> 00:08:42,180
And I think, for example, the setup
I've seen with really bad

162
00:08:42,180 --> 00:08:45,320
performance are, like,
dev systems that are

163
00:08:45,320 --> 00:08:48,540
running Postgres inside Docker
and things like that, that still

164
00:08:48,540 --> 00:08:50,960
have really slow system clock lookups.

165
00:08:51,500 --> 00:08:54,840
But most people aren't doing that
with production Postgres databases.

166
00:08:54,920 --> 00:09:00,560
And I haven't seen even any of
the cloud providers have slow,

167
00:09:00,700 --> 00:09:02,560
I think it's pg_test_timing or something
like

168
00:09:02,560 --> 00:09:02,780
Nikolay: that.

169
00:09:02,780 --> 00:09:04,340
pg_test_timing, I double checked.

170
00:09:04,340 --> 00:09:06,640
Michael: So yeah, you can run it
really easily.

171
00:09:06,820 --> 00:09:07,320
And.

172
00:09:07,540 --> 00:09:09,300
Nikolay: But what if it's managed
Postgres?

173
00:09:10,580 --> 00:09:12,040
You cannot run it there.

174
00:09:12,540 --> 00:09:17,160
In this case, you need to understand
what type of instance is

175
00:09:17,160 --> 00:09:19,180
behind that managed Postgres instance.

176
00:09:19,900 --> 00:09:21,780
Take the same instance in the cloud.

177
00:09:21,780 --> 00:09:25,840
For example, if it's RDS, from
RDS instance name, you can easily

178
00:09:25,840 --> 00:09:28,240
understand what EC2 instance
this is.

179
00:09:28,660 --> 00:09:29,160
Right?

180
00:09:29,160 --> 00:09:29,480
Yeah.

181
00:09:29,480 --> 00:09:31,400
You can install it, it will be...

182
00:09:31,400 --> 00:09:33,780
Well, operating system matters
also, right?

183
00:09:34,780 --> 00:09:37,700
Michael: There are some, yeah,
there are some tricks you can

184
00:09:37,700 --> 00:09:43,280
do, like do things that would call
the system clock a lot, like

185
00:09:43,280 --> 00:09:47,860
nested loop type things, or count
aggregations, things like that,

186
00:09:47,860 --> 00:09:49,220
like trying to get lots and lots
of loops.

187
00:09:49,220 --> 00:09:51,600
Nikolay: Ah, you're talking about
testing at higher level, at

188
00:09:51,600 --> 00:09:52,360
Postgres level.

189
00:09:52,360 --> 00:09:52,860
Yeah.

190
00:09:53,220 --> 00:09:54,440
Oh, that's a good idea.

191
00:09:54,800 --> 00:09:56,340
Yeah, a lot of nested loops.

192
00:09:56,960 --> 00:10:02,080
And you test with this, without
this, completely like running

193
00:10:02,080 --> 00:10:06,160
like 100 times, taking average,
for example, and comparing averages.

194
00:10:06,220 --> 00:10:08,860
And as you can guess, yeah, it's
a good test, by the way.
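
The same idea can be sketched outside Postgres by measuring how expensive a single clock read is (pg_test_timing does the real measurement inside Postgres; this only shows the principle):

```python
import time

def clock_overhead_ns(samples=100_000):
    # Call the clock in a tight loop and derive the average cost
    # of one timing call, similar to what pg_test_timing reports.
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()
    return (time.perf_counter_ns() - start) / samples

# Tens of nanoseconds is healthy; microseconds would suggest that
# track_io_timing overhead could be noticeable on this machine.
print(f"{clock_overhead_ns():.0f} ns per clock read")
```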

195
00:10:08,860 --> 00:10:11,820
Michael: I think the first time
I saw that was Lukas Fittl.

196
00:10:11,880 --> 00:10:14,440
I think he must have done a 5 minutes
of Postgres episode on

197
00:10:14,440 --> 00:10:15,060
this kind of thing.

198
00:10:15,060 --> 00:10:16,220
So I'll link that up.

199
00:10:16,220 --> 00:10:18,800
Nikolay: Yeah, I'm glad we touched
this because again, like our

200
00:10:18,800 --> 00:10:20,700
default recommendation is to have
it enabled.

201
00:10:20,700 --> 00:10:24,240
It's super helpful in pg_stat_statements
analysis and EXPLAIN

202
00:10:24,240 --> 00:10:28,040
ANALYZE plans, and yeah, track_io_timing, if possible, should

203
00:10:28,040 --> 00:10:28,760
be enabled.

204
00:10:29,380 --> 00:10:32,380
And this is related to disks directly,
of course.

205
00:10:32,380 --> 00:10:32,880
Yeah.

206
00:10:33,740 --> 00:10:36,820
Although, strictly speaking, it's
not timing of disks.

207
00:10:36,820 --> 00:10:39,520
It's timing of reading from page
cache to buffer pool.

208
00:10:39,520 --> 00:10:42,740
So it might include pure memory
timing as well.

209
00:10:42,740 --> 00:10:43,360
That's why you-

210
00:10:43,360 --> 00:10:44,120
Michael: But it does.

211
00:10:44,340 --> 00:10:45,040
Nikolay: Yeah, yeah.

212
00:10:45,040 --> 00:10:49,020
That's why your comment about large
or not large volumes, it's

213
00:10:49,200 --> 00:10:49,700
important.

214
00:10:50,660 --> 00:10:54,220
But it's honestly like if you even
like, if you are a backend

215
00:10:54,220 --> 00:10:57,240
engineer, for example, listening
to this episode, I can easily

216
00:10:57,240 --> 00:11:01,700
imagine in 1 month you will forget
about this nuance and will

217
00:11:01,700 --> 00:11:04,520
think about track_io_timing like
only about disks.

218
00:11:05,220 --> 00:11:05,720
Right?

219
00:11:05,760 --> 00:11:09,560
And it's okay because like, it's
really like super narrow topic

220
00:11:09,860 --> 00:11:11,500
to remember, to memorize.

221
00:11:12,540 --> 00:11:13,040
Yeah.

222
00:11:13,660 --> 00:11:16,840
Michael: Well, and I guess this
is moving the topic on a tiny

223
00:11:16,840 --> 00:11:22,180
bit but if you're on a managed
Postgres setup which a lot of

224
00:11:22,940 --> 00:11:27,560
backend engineers working
with Postgres are, you don't have

225
00:11:27,560 --> 00:11:28,780
control over the disks.

226
00:11:28,780 --> 00:11:33,720
You're probably not going to migrate
provider just for quality

227
00:11:33,720 --> 00:11:34,340
of disks.

228
00:11:34,600 --> 00:11:38,400
Maybe you would, but it would have
to be really bad and you'd

229
00:11:38,400 --> 00:11:41,280
have to be in a setup that really
was hammering them.

230
00:11:41,280 --> 00:11:46,280
Maybe super write-heavy workload
or huge data volumes that you

231
00:11:46,280 --> 00:11:50,140
can't afford to have enough memory
for, you know, those kinds

232
00:11:50,140 --> 00:11:53,820
of edge cases where you're really
hammering things.

233
00:11:54,400 --> 00:11:58,540
Nikolay: Well, there are 2 big
areas where things can be bad.

234
00:11:58,700 --> 00:12:00,780
Bad means saturation, right?

235
00:12:01,020 --> 00:12:01,240
Yeah.

236
00:12:01,240 --> 00:12:05,080
We can saturate disk space, so
to speak, run out of disk space, and

237
00:12:05,080 --> 00:12:06,640
we can saturate disk I/O.

238
00:12:07,800 --> 00:12:09,500
Both happen quite often.

239
00:12:11,320 --> 00:12:14,860
And managed Postgres providers
are not all equal, and clouds

240
00:12:14,860 --> 00:12:16,140
are not all equal.

241
00:12:18,820 --> 00:12:22,540
They manage disk capacities quite
differently.

242
00:12:22,600 --> 00:12:29,540
For example, at Google, at GCP,
I know regular PD SSD, quite

243
00:12:29,540 --> 00:12:30,260
old stuff.

244
00:12:30,760 --> 00:12:37,280
They have maximum 1,200 MB/s, separately
for reads, separately

245
00:12:37,280 --> 00:12:40,060
for writes, speaking of throughput.

246
00:12:41,400 --> 00:12:48,180
And they have 100,000 or 120,000
IOPS maximum.
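
Taking the quoted figures at face value, you can work out which limit binds for a given I/O size (a hypothetical calculation based on the numbers mentioned in the conversation, not on official GCP documentation):

```python
def iops_allowed_by_throughput(throughput_mb_s, io_size_kb):
    # How many I/Os per second the throughput cap alone would allow.
    return throughput_mb_s * 1024 / io_size_kb

def binding_limit(throughput_mb_s, iops_cap, io_size_kb):
    by_throughput = iops_allowed_by_throughput(throughput_mb_s, io_size_kb)
    return "IOPS cap" if iops_cap < by_throughput else "throughput cap"

# 8 KB Postgres pages: 1,200 MB/s would allow 153,600 reads/s,
# so a 120,000 IOPS cap is hit first for random page reads.
print(binding_limit(1200, 120_000, 8))    # IOPS cap
# Large 128 KB sequential I/O: only 9,600 I/Os fit in 1,200 MB/s.
print(binding_limit(1200, 120_000, 128))  # throughput cap
```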

247
00:12:49,740 --> 00:12:54,740
And I know from the past discussions
with Google engineers that

248
00:12:54,740 --> 00:12:59,160
actually real capacity is bigger,
but it was not sustainable.

249
00:12:59,340 --> 00:13:04,440
So it was not guaranteed all the
time and they could raise the

250
00:13:04,440 --> 00:13:08,300
bar, but it would not be
guaranteed, so

251
00:13:08,440 --> 00:13:12,460
they decided to choose the guaranteed
bar for us.

252
00:13:12,980 --> 00:13:13,680
Michael: Makes sense.

253
00:13:13,860 --> 00:13:18,500
Nikolay: Yeah, but like basically
we're not using it at its full potential.

254
00:13:21,380 --> 00:13:22,900
We could use more, right?

255
00:13:22,900 --> 00:13:24,060
But we cannot.

256
00:13:24,380 --> 00:13:25,620
So they throttle it.

257
00:13:27,240 --> 00:13:28,240
Michael: Okay, interesting.

258
00:13:28,580 --> 00:13:29,080
Artificially,

259
00:13:29,760 --> 00:13:33,640
Nikolay: to have guaranteed capacity
for this disk I/O.

260
00:13:35,220 --> 00:13:35,720
Michael: Interesting.

261
00:13:36,660 --> 00:13:42,380
I guess the subtlety that I was
missing was not when you're at

262
00:13:42,380 --> 00:13:43,160
the maximum.

263
00:13:43,260 --> 00:13:46,980
So in between tiers, imagine you're
in a much smaller setup.

264
00:13:47,480 --> 00:13:51,500
I see a lot of people just upgrading
to the next level up

265
00:13:51,500 --> 00:13:54,100
within that cloud provider to get
more IOPS.

266
00:13:54,280 --> 00:13:58,580
You know, if you're on Aurora,
just scaling up a little bit instead

267
00:13:58,580 --> 00:14:01,580
of switching from Aurora to
Google Cloud.

268
00:14:02,120 --> 00:14:03,220
But you're right.

269
00:14:03,900 --> 00:14:06,560
When you're at the last, or second
to last level is when people

270
00:14:06,560 --> 00:14:07,600
start to worry, isn't it?

271
00:14:07,600 --> 00:14:11,960
When you're at the last level,
you can't just scale up on that

272
00:14:11,960 --> 00:14:13,000
cloud provider anymore.

273
00:14:13,000 --> 00:14:14,180
So yeah, really good point.

274
00:14:14,180 --> 00:14:19,020
Nikolay: And also, at Google, for
example, let's say we're, like

275
00:14:19,020 --> 00:14:22,320
I know this, these rules, they're
artificial.

276
00:14:22,800 --> 00:14:27,900
So this throttling, what I just
told you, it also can be throttled

277
00:14:27,900 --> 00:14:31,040
additionally if you have not many
CPUs, vCPUs.

278
00:14:32,060 --> 00:14:35,860
So the maximum possible limit
is available only if you have

279
00:14:35,860 --> 00:14:38,740
32, as I remember, vCPUs or more.

280
00:14:39,060 --> 00:14:42,180
If it's less, also it can depend
on family, I think.

281
00:14:42,280 --> 00:14:43,160
Instance family.

282
00:14:43,780 --> 00:14:43,960
Interesting.

283
00:14:43,960 --> 00:14:45,100
So complex rules.

284
00:14:45,720 --> 00:14:53,000
On Amazon, AWS, EBS volumes, okay,
there are gp2, gp3, io1.

285
00:14:54,320 --> 00:14:57,220
You choose between them, also there
is provisioned IOPS.

286
00:14:58,380 --> 00:14:59,700
Really complex, right?

287
00:15:00,180 --> 00:15:02,940
Michael: And you haven't even mentioned
burst IOPS yet.

288
00:15:03,260 --> 00:15:03,820
Nikolay: Yeah, yeah.

289
00:15:03,820 --> 00:15:10,000
So hitting IOPS limits is really
easy, actually.

290
00:15:11,100 --> 00:15:11,660
If you

291
00:15:11,660 --> 00:15:12,780
Michael: insist on smaller.

292
00:15:13,580 --> 00:15:17,040
Yeah, well, the times I see people
hitting it is like they're

293
00:15:17,040 --> 00:15:19,540
doing a massive migration.

294
00:15:19,820 --> 00:15:20,920
Nikolay: No, no, that's not it.

295
00:15:20,920 --> 00:15:21,580
Michael: Big import.

296
00:15:21,580 --> 00:15:24,380
Okay, when you're just, like, just
growing?

297
00:15:24,380 --> 00:15:30,980
Nikolay: Yeah, the project just
grows, and then latency, database

298
00:15:30,980 --> 00:15:32,520
latency becomes worse.

299
00:15:32,540 --> 00:15:33,040
Why?

300
00:15:33,340 --> 00:15:38,860
We check and we see, well, if you
have the experience

301
00:15:39,000 --> 00:15:43,080
of looking at graphs, you can easily identify some plateau.

302
00:15:43,500 --> 00:15:48,520
It's not a full, like, not an ideal plateau, but usually some spikes,

303
00:15:48,520 --> 00:15:51,780
small, but you feel, oh, this is, we are hitting the ceiling

304
00:15:51,780 --> 00:15:52,280
here.

305
00:15:53,560 --> 00:15:54,620
Checking disk I/O.

306
00:15:57,340 --> 00:16:01,780
It's not a cliff, no, it's a wall instead of a cliff.

307
00:16:01,780 --> 00:16:06,720
A cliff, it's when, this is an important distinction, a cliff is when

308
00:16:07,080 --> 00:16:11,140
everything was okay, okay, okay, and then suddenly slightly more

309
00:16:11,140 --> 00:16:15,640
load or something, and you're completely down, or down drastically,

310
00:16:15,820 --> 00:16:16,740
50-plus percent.

311
00:16:17,040 --> 00:16:17,820
Michael: Okay, yeah.

312
00:16:18,340 --> 00:16:21,260
Nikolay: Here we have a wall and everything is okay, okay, and

313
00:16:21,260 --> 00:16:22,980
then slightly not okay.

314
00:16:23,420 --> 00:16:25,060
Okay, okay, slightly not okay.

315
00:16:25,680 --> 00:16:30,940
And then more is coming and we start scheduling processing, right,

316
00:16:30,940 --> 00:16:32,460
accumulating active processes.

317
00:16:34,540 --> 00:16:39,620
So in a performance cliff, if you raise load slowly, there is an acute

318
00:16:39,780 --> 00:16:44,880
drop in the capability to process the workload.

319
00:16:45,480 --> 00:16:49,340
In the case of hitting the ceiling in terms of saturation of disk

320
00:16:49,340 --> 00:16:51,500
I/O or CPU, it's different.

321
00:16:52,340 --> 00:16:58,280
You grow your load slowly and then you see you grow further and

322
00:16:58,480 --> 00:17:00,300
things become worse, worse, worse.

323
00:17:00,300 --> 00:17:01,420
It's not like acute.

324
00:17:02,120 --> 00:17:06,860
It's slightly more, more, more, and things become very bad only

325
00:17:06,860 --> 00:17:09,360
if you grow a lot further, right?

326
00:17:09,400 --> 00:17:10,960
So it's not acute drop.

327
00:17:10,960 --> 00:17:12,400
It's like hitting the wall.

328
00:17:12,400 --> 00:17:13,940
It feels like hitting the wall.

329
00:17:14,060 --> 00:17:21,220
You know, like if you imagine many lines in a store, for example,

330
00:17:21,220 --> 00:17:28,180
we have several cashiers, 8 for example, and then normally lines

331
00:17:28,180 --> 00:17:32,500
should be 1 or 2, 0 or 1 people only.

332
00:17:32,520 --> 00:17:35,040
This is ideal throughput, everything good.

333
00:17:35,540 --> 00:17:37,200
We haven't saturated them.

334
00:17:37,300 --> 00:17:39,620
Once we've saturated them, we see lines are accumulating.

335
00:17:40,680 --> 00:17:46,260
And latency, meaning how much time we spend to process each customer,

336
00:17:46,840 --> 00:17:50,580
it starts to grow, but it doesn't grow acutely, boom, no.

337
00:17:50,860 --> 00:17:54,560
Here we talk about, like, performance cliff is that, for example,

338
00:17:55,080 --> 00:17:59,160
if we talk about cash only, no cards involved, and suddenly,

339
00:17:59,160 --> 00:18:05,820
like, we had remainders of cash for change in all lines, right?

340
00:18:05,860 --> 00:18:11,240
And cashiers suddenly, they can say, okay, you do have change,

341
00:18:11,240 --> 00:18:14,160
I have change, okay, we're processing, and then suddenly we're

342
00:18:14,160 --> 00:18:16,860
out of cash to give change.

343
00:18:17,380 --> 00:18:19,700
This is acute performance cliff.

344
00:18:19,900 --> 00:18:21,920
They say, okay, we cannot work anymore.

345
00:18:22,200 --> 00:18:22,700
Boom.

346
00:18:23,200 --> 00:18:23,480
Right.

347
00:18:23,480 --> 00:18:26,920
We need to wait until someone goes somewhere like this, like

348
00:18:26,920 --> 00:18:28,640
we need 15 minutes of wait.

349
00:18:29,060 --> 00:18:33,400
This is, like, the important distinction between a performance cliff and hitting

350
00:18:33,400 --> 00:18:34,820
the wall or ceiling.
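
The cashier analogy maps onto basic queueing theory; a single-server M/M/1 model shows the "wall" behaviour, with latency degrading smoothly as utilization rises rather than dropping off a cliff (a deliberately simplified model, not a claim about any specific disk):

```python
def mm1_avg_latency(service_ms, utilization):
    # Average time in system for an M/M/1 queue: S / (1 - rho).
    # Latency grows gradually as the server (disk, cashier) nears
    # saturation; collapse only happens at rho = 1.
    if utilization >= 1.0:
        raise ValueError("saturated: the queue grows without bound")
    return service_ms / (1.0 - utilization)

for rho in (0.5, 0.9, 0.99):
    print(rho, mm1_avg_latency(1.0, rho))
# 0.5 -> 2.0 ms, 0.9 -> ~10 ms, 0.99 -> ~100 ms: a wall, not a cliff
```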

351
00:18:34,820 --> 00:18:38,160
Michael: Okay, I haven't heard 
that strict a definition before.

352
00:18:38,160 --> 00:18:40,940
Like it sounds to me like you're 
describing the difference between

353
00:18:41,260 --> 00:18:42,260
a blackout and a brownout.

354
00:18:42,260 --> 00:18:43,480
Have you heard of a brownout?

355
00:18:43,660 --> 00:18:48,680
Like so blackout is kind of like 
your database can't accept

356
00:18:49,060 --> 00:18:52,060
writes anymore or even SELECTs
like no reads like everything

357
00:18:52,120 --> 00:18:58,780
is down. Brownout would be like 
it's still working but people

358
00:18:58,780 --> 00:19:02,080
like, they're seeing spinning loaders,
and maybe it loads after 30

359
00:19:02,080 --> 00:19:04,940
seconds or maybe some people are 
hitting timeout some people

360
00:19:04,940 --> 00:19:09,320
aren't and there's like the like 
the queuing issue in the supermarket

361
00:19:09,320 --> 00:19:12,900
you talked about. Performance is 
severely degraded but it's not

362
00:19:12,900 --> 00:19:16,880
completely offline. Still working 
at least for some people. So

363
00:19:16,880 --> 00:19:19,260
it feels like that's the kind of 
distinction you're talking about.

364
00:19:19,280 --> 00:19:22,000
Nikolay: Yeah, and brown can become 
dark if you keep loading

365
00:19:22,040 --> 00:19:22,700
a lot.

366
00:19:23,740 --> 00:19:29,340
So if saturation happened at some
workload level, but you gave

367
00:19:29,340 --> 00:19:32,800
it 10x, of course it will be a blackout,
but because of context

368
00:19:32,800 --> 00:19:34,740
switching, and then it's different.

369
00:19:35,380 --> 00:19:38,420
But for performance cliff, it happens 
very quickly.

370
00:19:38,460 --> 00:19:40,920
It's a very much more acute situation.

371
00:19:41,040 --> 00:19:43,840
Michael: I think I'm also biased 
by the cases that I've seen

372
00:19:43,840 --> 00:19:48,980
which are more acute because they 
are bulk loads or backfills

373
00:19:49,640 --> 00:19:53,320
where they are running at a much 
much higher rate than they would

374
00:19:53,320 --> 00:19:56,040
normally be, they're consuming 
IOPS at a much higher rate than

375
00:19:56,040 --> 00:19:58,940
they normally would so they hit 
it really fast and it's like

376
00:19:58,940 --> 00:20:02,360
running at the wall extremely fast.
But I guess if you approach

377
00:20:02,360 --> 00:20:05,040
the wall slowly, it's not going 
to hurt quite as much.

378
00:20:05,420 --> 00:20:05,920
Yeah.

379
00:20:06,340 --> 00:20:07,620
Okay, I think I understand.

380
00:20:08,000 --> 00:20:09,180
Nikolay: Yeah, back to disks.

381
00:20:09,660 --> 00:20:15,640
Definitely, we should check disk
I/O usage and saturation risks.

382
00:20:16,640 --> 00:20:16,940
So you

383
00:20:16,940 --> 00:20:19,380
Michael: mean like monitor, monitor
for it, alert when we're

384
00:20:19,380 --> 00:20:20,440
close to our limits?

385
00:20:20,440 --> 00:20:20,940
Yeah.

386
00:20:21,580 --> 00:20:21,980
Nikolay: Yeah.

387
00:20:21,980 --> 00:20:25,580
And also it might be interesting, 
for example, I remember, I

388
00:20:25,580 --> 00:20:30,040
don't know right now, but many 
years ago in RDS, I remember we

389
00:20:30,040 --> 00:20:35,800
like asked, okay, a small system, 
maybe we need 10,000 IOPS. But

390
00:20:35,800 --> 00:20:39,180
we see saturation at 2,500 somehow.

391
00:20:40,080 --> 00:20:41,880
Oh, there is RAID, actually.

392
00:20:41,880 --> 00:20:44,000
We have 4 disks, and that's why.

393
00:20:44,070 --> 00:20:45,140
Okay, okay.

394
00:20:45,210 --> 00:20:47,860
So there are interesting nuances 
there.
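One nuance from that RDS anecdote, sketched in Python (the even-striping assumption here is ours, not from AWS documentation): if provisioned IOPS are actually served by a RAID array of several volumes, a workload hammering one stripe can saturate at a fraction of the advertised total.

```python
def per_disk_iops(provisioned_iops: int, n_disks: int) -> float:
    """With provisioned IOPS striped evenly across a RAID array,
    each member volume gets only a fraction of the total."""
    return provisioned_iops / n_disks

# 10,000 provisioned IOPS over a 4-volume RAID: each leg caps at 2,500,
# matching the saturation level observed in the anecdote above.
print(per_disk_iops(10_000, 4))  # → 2500.0
```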

395
00:20:48,620 --> 00:20:52,560
But also, so understanding your 
limits is super important.

396
00:20:53,420 --> 00:20:58,620
And like, I think clouds could 
do a better job explaining where

397
00:20:58,620 --> 00:20:59,740
the limits are.

398
00:21:00,320 --> 00:21:06,760
Because right now you need to do 
a lot of legwork to figure out

399
00:21:07,120 --> 00:21:09,440
what is your advertised limit.

400
00:21:09,900 --> 00:21:12,900
For example, as I said, at GCP 
you need to understand how many

401
00:21:12,900 --> 00:21:13,400
vCPUs.

402
00:21:14,540 --> 00:21:18,280
Also, disk, I forgot, like 10 terabytes, I think, is when you

403
00:21:18,280 --> 00:21:19,240
achieve the...

404
00:21:19,740 --> 00:21:20,640
Or 1 terabyte.

405
00:21:21,140 --> 00:21:23,540
My memory fools me a little bit.

406
00:21:24,020 --> 00:21:27,600
So you need to take into account many factors to understand,

407
00:21:27,700 --> 00:21:31,080
oh, our theoretical limit is this.

408
00:21:31,120 --> 00:21:34,840
And then ideally you should test it to see that it can be achieved.

409
00:21:35,140 --> 00:21:38,120
Testing is also interesting because, of course, it depends on

410
00:21:38,120 --> 00:21:40,580
block size you're using.

411
00:21:40,680 --> 00:21:44,040
And also it depends on, like, you're testing through page cache

412
00:21:44,040 --> 00:21:45,720
or direct I/O, right?

413
00:21:45,720 --> 00:21:47,580
So directly writing to device.
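The block-size dependence Nikolay mentions can be made concrete with a little arithmetic (a simplification: real volumes enforce separate IOPS and throughput caps, and whichever is hit first wins; the gp3-like numbers below are assumed for illustration).

```python
def max_throughput_mb_s(iops_limit: int, block_size_bytes: int) -> float:
    """IOPS-bound throughput: each I/O moves one block."""
    return iops_limit * block_size_bytes / 1_000_000

def effective_iops(iops_limit: int, throughput_limit_mb_s: float,
                   block_size_bytes: int) -> float:
    """Whichever cap binds first limits you: the IOPS cap, or the
    throughput cap expressed as I/Os of this block size."""
    throughput_bound = throughput_limit_mb_s * 1_000_000 / block_size_bytes
    return min(iops_limit, throughput_bound)

# A hypothetical 16,000 IOPS / 1,000 MB/s volume: with 8 kB Postgres
# pages the IOPS cap binds long before the throughput cap does.
print(effective_iops(16_000, 1_000, 8_192))  # → 16000
print(max_throughput_mb_s(16_000, 8_192))    # → 131.072 (MB/s)
```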

414
00:21:48,540 --> 00:21:53,940
And then you go to the graphs and monitoring and see some disk

415
00:21:53,940 --> 00:21:58,720
I/O in terms of IOPS and throughput, separately reads and writes,

416
00:21:59,440 --> 00:22:03,760
and then you think, okay, let's draw a line here.

417
00:22:03,760 --> 00:22:04,900
This is our limit.

418
00:22:06,260 --> 00:22:09,220
So what I'm saying, they should draw the line.

419
00:22:09,620 --> 00:22:11,040
Clouds should draw the line.

420
00:22:11,040 --> 00:22:16,300
They know all these damned rules, right, which are really complex.

421
00:22:17,360 --> 00:22:18,540
So this should be automated.

422
00:22:18,940 --> 00:22:20,200
This line should be automated.

423
00:22:20,280 --> 00:22:23,600
Okay, with this, this, and this, and this, we give you this.

424
00:22:23,600 --> 00:22:27,640
This is your line in terms of capabilities of your disk.

425
00:22:27,980 --> 00:22:29,940
And here you are, okay, at 50%.

426
00:22:30,160 --> 00:22:31,160
Okay, I know.

427
00:22:31,560 --> 00:22:36,720
Now it's like a whole day of work for someone to understand all

428
00:22:36,720 --> 00:22:40,660
the details, double-check them and then correct mistakes.

429
00:22:41,280 --> 00:22:45,720
Even if you know all the nuances, still you return to this topic

430
00:22:45,720 --> 00:22:47,420
and you go, oh, I forgot this.

431
00:22:48,080 --> 00:22:48,580
Redo.

432
00:22:49,700 --> 00:22:50,140
So, yeah.

433
00:22:50,140 --> 00:22:50,640
Yeah.

434
00:22:50,820 --> 00:22:54,220
Michael: When you mentioned the terabytes thing, is that I was

435
00:22:54,220 --> 00:22:59,860
working with somebody a while back who, they weren't using the

436
00:22:59,860 --> 00:23:02,640
disk space they already had. Like, let's say they had

437
00:23:02,640 --> 00:23:05,640
a 1 terabyte disk, they were only using a couple hundred gigabytes.

438
00:23:06,220 --> 00:23:11,420
But they upgraded, they expanded their disk to a few terabytes so

439
00:23:11,420 --> 00:23:13,360
that they would get more provisioned IOPS.

440
00:23:13,660 --> 00:23:16,720
Because that was the way of... So is that what you're talking about?

441
00:23:16,720 --> 00:23:17,900
You need a certain size?

442
00:23:17,900 --> 00:23:18,080
Yeah.

443
00:23:18,080 --> 00:23:20,640
Nikolay: So the rule for throttling is so multi-factor.

444
00:23:21,040 --> 00:23:23,460
You need to read a lot of docs.

445
00:23:25,240 --> 00:23:32,780
And like with GCP, AWS, I have pages which I read many, many

446
00:23:32,780 --> 00:23:36,660
times per year, carefully trying to remember, oh, this rule I

447
00:23:36,660 --> 00:23:37,540
forgot again.

448
00:23:37,920 --> 00:23:39,740
Why isn't this automated?

449
00:23:40,120 --> 00:23:44,240
Someone can say, OK, these limits depend on block sizes.

450
00:23:44,440 --> 00:23:48,700
OK, But if it's RDS, block size is already chosen.

451
00:23:49,280 --> 00:23:51,180
Postgres uses 8 kilobytes.

452
00:23:51,700 --> 00:23:54,300
If it's ext4, it's 4 kilobytes
there.

453
00:23:54,640 --> 00:23:56,020
Everything is already defined.

454
00:23:56,180 --> 00:24:01,200
So we can talk about limits for
throughput quite well, right?

455
00:24:02,140 --> 00:24:05,920
So yeah, this is, I think, lack
of automation here.

456
00:24:06,500 --> 00:24:11,380
Michael: Also, you mentioned 
the number of vCPUs, like I guess

457
00:24:11,380 --> 00:24:13,740
that is that they have all the 
settings, why... they have, they

458
00:24:13,740 --> 00:24:16,400
Nikolay: have all the knowledge
and yeah they define these rules

459
00:24:16,940 --> 00:24:17,440
Michael: yeah

460
00:24:17,700 --> 00:24:21,900
Nikolay: so give me this like usage
level and understanding how

461
00:24:21,900 --> 00:24:23,980
far from saturation I am.

462
00:24:24,180 --> 00:24:25,400
Because it's so important.

463
00:24:26,740 --> 00:24:30,580
No, in reality, we wait until that
plateau I mentioned, and then

464
00:24:30,580 --> 00:24:34,860
only then we go and do something
about it and raise the bar.

465
00:24:35,380 --> 00:24:39,300
There should be alerts even. You,
like... your database is spending

466
00:24:39,300 --> 00:24:43,680
at 80-plus percent of your capacity 
on this guy, so you are prepared

467
00:24:43,680 --> 00:24:46,720
to upgrade, you know, add more.
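The alert Nikolay is asking for reduces to a one-line utilization check once the provider has drawn the line (the 80% threshold and the names here are illustrative, not from any specific monitoring product):

```python
def disk_io_alert(current_iops: float, iops_limit: float,
                  threshold: float = 0.8) -> bool:
    """Fire when disk I/O utilization crosses the threshold,
    so capacity can be raised before the saturation plateau."""
    return current_iops / iops_limit >= threshold

print(disk_io_alert(8_500, 10_000))  # → True: time to plan an upgrade
print(disk_io_alert(4_000, 10_000))  # → False
```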

468
00:24:46,800 --> 00:24:49,600
Michael: yeah well I was gonna
say sometimes there are perverse

469
00:24:49,600 --> 00:24:52,480
incentives here where they're not
incentivized to help you improve

470
00:24:52,480 --> 00:24:54,180
your performance so that you upgrade.

471
00:24:54,180 --> 00:24:57,940
But in this case, it should be
the incentives should be aligned.

472
00:25:01,560 --> 00:25:01,560
Nikolay: Yeah, at the same time,
these complaints we are currently

473
00:25:13,080 --> 00:25:18,220
expressing, they all are like,
reminding me complaints of a guy

474
00:25:18,220 --> 00:25:21,600
who is sitting on airplane and
saying that there is no leg room

475
00:25:21,600 --> 00:25:22,120
and so on.

476
00:25:22,120 --> 00:25:30,260
You're sitting in the air and flying
30,000 feet above ground.

477
00:25:30,480 --> 00:25:32,040
And it's magic, right?

478
00:25:32,120 --> 00:25:37,400
So these EBS volumes, PD-SSD, 
like other newer disks on GCP

479
00:25:37,600 --> 00:25:44,020
or NVMes, they are great. I 
mean, snapshots, elasticity of

480
00:25:44,020 --> 00:25:48,640
everything, it's great, right? We 
just, yeah, we just want even more.

481
00:25:49,240 --> 00:25:52,360
Michael: It's good that 
you're being positive about

482
00:25:52,360 --> 00:25:52,580
them.

483
00:25:52,580 --> 00:25:57,320
But I feel like I hear quite a
lot of people saying that 1 of

484
00:25:57,320 --> 00:26:02,480
the cases still for self-hosting 
is that you can...

485
00:26:04,200 --> 00:26:08,680
So actually, I think a lot of a
lot of the time with the cloud

486
00:26:08,680 --> 00:26:13,100
you're paying for hardware that
might be a bit on the older side

487
00:26:13,100 --> 00:26:14,780
and you have no control over that.

488
00:26:14,780 --> 00:26:18,260
So it's, yeah, I'm interested in
your take on that as somebody

489
00:26:18,260 --> 00:26:23,740
who's historically been pro-self-managing
or some hybrid version.

490
00:26:25,680 --> 00:26:29,440
Nikolay: So I love clones and snapshots,
that's why... EBS volumes

491
00:26:29,440 --> 00:26:33,560
and what RDS has, even if there's 
lazy load involved, and

492
00:26:33,560 --> 00:26:37,540
when we restore from snapshot,
it's actually getting data from

493
00:26:37,540 --> 00:26:38,040
S3.

494
00:26:38,440 --> 00:26:42,280
It still feels like magic and great, 
and like, it's very good

495
00:26:42,280 --> 00:26:44,740
for reproducing incidents and so
on.

496
00:26:45,160 --> 00:26:47,900
And snapshots are cheap because
they are stored in S3.

497
00:26:48,740 --> 00:26:52,400
At GCP it's the same, although
there is lazy load there as well,

498
00:26:52,660 --> 00:26:55,120
although their documentation still
doesn't admit it.

499
00:26:55,120 --> 00:26:59,740
But just looking at the price,
we understand it's the snapshots

500
00:27:00,060 --> 00:27:06,440
of Google Cloud disks stored in
GCS, so S3 analog.

501
00:27:07,360 --> 00:27:08,260
It's great.

502
00:27:08,840 --> 00:27:17,420
But also, if you think about a
cluster of 3 nodes, or 4, 5, 6, up

503
00:27:17,420 --> 00:27:18,760
to 10 nodes and more.

504
00:27:18,760 --> 00:27:20,040
Some people have more.

505
00:27:21,300 --> 00:27:26,580
Database is basically copied to
all replicas and on replicas

506
00:27:26,580 --> 00:27:30,820
it's stored on disk and disk becomes
more and more expensive over

507
00:27:30,820 --> 00:27:31,320
time.

508
00:27:32,320 --> 00:27:34,100
So it can be significant.

509
00:27:35,280 --> 00:27:37,380
It can be even more than compute
sometimes.

510
00:27:38,680 --> 00:27:39,520
That's the point.

511
00:27:39,520 --> 00:27:44,700
Like if we have a large database,
but working set is not that

512
00:27:44,700 --> 00:27:50,640
large, we can have much smaller
memory, that's much smaller,

513
00:27:50,640 --> 00:27:52,500
like not a big compute instance.

514
00:27:53,100 --> 00:27:57,180
We had these cases, for example,
a lot of time series data.

515
00:27:57,440 --> 00:28:00,360
And we have much bigger disk than
you could expect.

516
00:28:00,860 --> 00:28:03,840
And then all replicas need to have
the same disk.

517
00:28:04,340 --> 00:28:07,400
And this disk, if it's EBS volume,
it becomes expensive.

518
00:28:07,640 --> 00:28:11,440
Very expensive and contributes
to costs so much.

519
00:28:11,840 --> 00:28:16,320
So then you think, why not to use
local disks?

520
00:28:16,320 --> 00:28:18,740
Well, we used local disks for benchmarks.

521
00:28:19,080 --> 00:28:23,660
It was an i3 instance, like years 
ago, 7 years ago maybe, when we started

522
00:28:23,960 --> 00:28:27,760
liking them, because it's always 
included in the price, right?

523
00:28:29,280 --> 00:28:30,400
Of EC2 instance.

524
00:28:30,940 --> 00:28:31,960
And it's super fast.

525
00:28:31,960 --> 00:28:36,160
It's like basically 1 order of
magnitude faster in terms of IOPS.

526
00:28:36,340 --> 00:28:38,760
They can give you a million IOPS these
days already.

527
00:28:40,520 --> 00:28:44,080
And throughput, 3 gigabyte per
second.

528
00:28:45,320 --> 00:28:49,180
Michael: Well, and the resiliency,
like if you've already got

529
00:28:49,180 --> 00:28:53,400
replicas provisioned for failovers,
you don't need the resiliency

530
00:28:53,680 --> 00:28:54,780
that the cloud...

531
00:28:55,120 --> 00:28:56,540
Nikolay: The point is they are
ephemeral.

532
00:28:56,640 --> 00:28:59,880
So if restart happens, you might
lose this data.

533
00:29:00,400 --> 00:29:02,500
But if restart happens, we have
replicas.

534
00:29:03,160 --> 00:29:04,440
Michael: Yes, that's what I mean.

535
00:29:05,860 --> 00:29:08,360
So that doesn't actually matter.

536
00:29:08,500 --> 00:29:11,320
In fact, this reminds me a lot
of the PlanetScale stuff that's

537
00:29:11,320 --> 00:29:11,820
been...

538
00:29:12,180 --> 00:29:15,140
The PlanetScale Postgres, I think
they call it Metal.

539
00:29:15,420 --> 00:29:19,000
They've got 2 products, but the
metal 1 has the local disks and

540
00:29:19,000 --> 00:29:20,280
this is a lot of the

541
00:29:20,660 --> 00:29:24,280
Nikolay: yeah but you can have
local ephemeral NVMes only

542
00:29:24,280 --> 00:29:26,540
on virtual machines of course smaller
size

543
00:29:27,100 --> 00:29:31,840
Michael: metal yeah sorry all I
meant was they're doing a lot

544
00:29:31,840 --> 00:29:34,840
of their publicity, a lot of their
blog posts and things are

545
00:29:34,840 --> 00:29:36,140
relevant to this discussion.

546
00:29:36,420 --> 00:29:39,600
You don't have to use their services
and also you could do it

547
00:29:39,600 --> 00:29:40,820
on a much smaller scale.

548
00:29:41,120 --> 00:29:41,440
Nikolay: Yeah.

549
00:29:41,440 --> 00:29:46,480
And it's so big cost saving and it brings so much more disk I/O

550
00:29:46,480 --> 00:29:46,980
capacity.

551
00:29:47,440 --> 00:29:47,940
Amazing.

552
00:29:48,520 --> 00:29:49,300
But there is a-

553
00:29:49,300 --> 00:29:50,940
Michael: And latency reduction, right?

554
00:29:50,940 --> 00:29:53,500
Like because the systems are just closer together.

555
00:29:54,140 --> 00:29:54,800
Nikolay: Yeah, yeah, yeah.

556
00:29:54,800 --> 00:29:59,540
So it's much like, it can handle workloads much better in terms

557
00:29:59,540 --> 00:30:00,560
of OLTP workloads.

558
00:30:01,020 --> 00:30:06,140
There are 2 caveats: the ephemeral property, and also limits in terms

559
00:30:06,140 --> 00:30:08,800
of space. We didn't touch the disk space topic yet.

560
00:30:09,440 --> 00:30:10,660
Michael: Yeah, yeah, yeah.

561
00:30:10,840 --> 00:30:12,740
We have a whole separate episode on that.

562
00:30:12,740 --> 00:30:14,740
But yeah, we should still touch on that.

563
00:30:14,940 --> 00:30:15,400
Nikolay: Right.

564
00:30:15,400 --> 00:30:20,860
And on AWS, I like local disks much more because they are usually

565
00:30:20,860 --> 00:30:22,280
bigger and so on.

566
00:30:22,440 --> 00:30:28,340
Like they are bigger, each disk is bigger and the summarized

567
00:30:28,680 --> 00:30:31,620
aggregated disk volume is also bigger.

568
00:30:31,920 --> 00:30:36,740
On GCP, I think, first of all, somehow local disks are still,

569
00:30:36,740 --> 00:30:41,480
I think, 375 gigabytes only, looks like old, but you can stack

570
00:30:41,480 --> 00:30:48,340
a lot of them, and I think up to 72, or how many?

571
00:30:48,340 --> 00:30:48,840
Michael: Terabytes.

572
00:30:48,900 --> 00:30:50,640
Nikolay: Terabytes, yeah, quite a lot.

573
00:30:50,640 --> 00:30:54,560
But in this case you need to really like maybe go with metal,

574
00:30:55,120 --> 00:30:58,140
like the maximum, take a whole machine basically, right?

575
00:30:58,500 --> 00:31:02,280
But it's possible, but this 72 terabytes will be your hard limit,

576
00:31:02,280 --> 00:31:03,060
hard stop.

577
00:31:04,440 --> 00:31:06,420
And it's not that bad.

578
00:31:07,220 --> 00:31:07,920
Like, good.

579
00:31:07,960 --> 00:31:09,440
Michael: Most people will be fine.

580
00:31:09,440 --> 00:31:10,580
Nikolay: Yeah, yeah.

581
00:31:10,600 --> 00:31:11,400
It's, it's okay.

582
00:31:11,400 --> 00:31:12,840
I mean, to have this limit.

583
00:31:13,940 --> 00:31:14,960
But it's a hard limit.

584
00:31:14,960 --> 00:31:17,800
Michael: But you're, yeah, the hard limit is the interesting

585
00:31:17,800 --> 00:31:18,080
thing.

586
00:31:18,080 --> 00:31:20,900
So you're saying, let's say we start on small machines and they

587
00:31:20,900 --> 00:31:24,760
only have a set amount, and we suddenly realize we're at 80 or

588
00:31:24,760 --> 00:31:26,340
90% capacity.

589
00:31:26,760 --> 00:31:30,800
Nikolay: Right, but at the same time, EBS volume has limits 64

590
00:31:31,080 --> 00:31:36,240
terabytes, and PD-SSD on GCP has the same limit, 64 terabytes.

591
00:31:37,040 --> 00:31:42,780
And RDS and Google Cloud, CloudSQL, they also have hard stops

592
00:31:42,780 --> 00:31:44,260
at 64 terabytes.

593
00:31:44,680 --> 00:31:48,980
Aurora has 128, double of that size.

594
00:31:50,460 --> 00:31:51,440
And that's it.

595
00:31:52,540 --> 00:31:54,520
So these are hard stops.

596
00:31:54,520 --> 00:31:58,240
And I think in 2025, I think this is not a lot of data already.

597
00:31:58,680 --> 00:32:01,480
50, 100TB, we had episode about it.

598
00:32:01,480 --> 00:32:04,840
It's already like achievable for
bigger startups.

599
00:32:05,740 --> 00:32:07,900
So RDS, I don't know.

600
00:32:07,900 --> 00:32:09,440
I think they should solve it soon.

601
00:32:09,440 --> 00:32:12,620
And I think CloudSQL, Google Cloud
SQL should solve it soon.

602
00:32:12,620 --> 00:32:16,500
But to my knowledge, they haven't
solved it yet.

603
00:32:16,960 --> 00:32:20,040
So if you approach this, it's hard
stop.

604
00:32:20,660 --> 00:32:25,020
And basically, you need to go to
self-managed maybe, right?

605
00:32:25,360 --> 00:32:28,160
And there you can combine multiple
EBS volumes.

606
00:32:29,060 --> 00:32:31,560
Michael: Most that we've talked
to that do this shard at that

607
00:32:31,560 --> 00:32:32,060
point.

608
00:32:32,220 --> 00:32:33,340
Nikolay: This is a different route.

609
00:32:33,340 --> 00:32:37,600
Yeah, that's why I think PlanetScale, 
like it's easier for them

610
00:32:37,600 --> 00:32:42,280
to choose local disks and deal
with those hard limits in size

611
00:32:42,280 --> 00:32:42,940
as well.

612
00:32:43,520 --> 00:32:50,040
Because if there is a rebalancing,
if it's 0 downtime rebalancing,

613
00:32:50,740 --> 00:32:53,980
you can just make sure no shards
will reach that limit, that's

614
00:32:53,980 --> 00:32:54,480
it.

615
00:32:54,520 --> 00:32:56,300
It's a good way to scale further.

616
00:32:56,640 --> 00:32:58,380
Michael: Yeah, they have that for
MySQL, but they don't have

617
00:32:58,380 --> 00:33:01,020
that for Postgres, so, well not
yet, they're building it.

618
00:33:01,020 --> 00:33:02,240
Nikolay: They announced it, right?

619
00:33:02,600 --> 00:33:05,280
Michael: Yeah, well, they announced
building it, I think lots

620
00:33:05,280 --> 00:33:07,740
of people are announcing building
sharding at the moment.

621
00:33:07,740 --> 00:33:10,940
Nikolay: Well I see Multigres
already has some code.

622
00:33:10,940 --> 00:33:14,640
I even commented in a couple of
places proposing some improvements.

623
00:33:15,720 --> 00:33:18,180
Michael: Yeah well I know they
all have some code, right?

624
00:33:18,180 --> 00:33:19,940
Like PgDog's got some code.

625
00:33:20,220 --> 00:33:23,420
Nikolay: It's not just code. PgDog 
you can test already.

626
00:33:23,420 --> 00:33:23,920
Yeah.

627
00:33:23,940 --> 00:33:27,780
I think Multigres also will have
some at some point.

628
00:33:28,440 --> 00:33:31,920
Michael: All I mean is that you
can shard in other ways, right,

629
00:33:31,920 --> 00:33:33,060
without these solutions.

630
00:33:33,340 --> 00:33:35,900
Like Notion talked about doing
it, Figma have done it.

631
00:33:35,900 --> 00:33:39,180
Nikolay: So-called application
side sharding, as I call it.

632
00:33:39,180 --> 00:33:43,580
Michael: Yeah, but they did it
without leaving RDS in those cases.

633
00:33:44,380 --> 00:33:45,360
So it is interesting.

634
00:33:46,000 --> 00:33:48,600
But I thought you were going to
go in a different direction here

635
00:33:48,600 --> 00:33:51,600
like I thought it was more about
the practicalities of expanding

636
00:33:51,760 --> 00:33:55,240
So let's say you're not at the
dozens of terabytes limit.

637
00:33:55,240 --> 00:33:59,700
Whatever your provider has. Let's
say you're at 1 terabyte and

638
00:33:59,700 --> 00:34:01,980
you just want to expand
to 2 terabytes.

639
00:34:02,900 --> 00:34:05,520
That's often really easy.

640
00:34:05,900 --> 00:34:09,160
You can do it at a few clicks of
a button without any downtime

641
00:34:09,160 --> 00:34:10,260
in a lot of providers.

642
00:34:10,680 --> 00:34:14,080
Whereas if you've got local disks,
is it a bit more complicated?

643
00:34:14,680 --> 00:34:15,480
Nikolay: Yeah, you know what?

644
00:34:15,480 --> 00:34:19,540
I think these days RDS also provides
options with local NVMes.

645
00:34:20,380 --> 00:34:20,860
Michael: Wow, okay.

646
00:34:20,860 --> 00:34:21,220
I heard

647
00:34:21,220 --> 00:34:21,880
Nikolay: about this.

648
00:34:21,900 --> 00:34:25,160
Yeah, the instance I'm double-checking right now, it's instance,

649
00:34:25,160 --> 00:34:26,260
for example, X2idn.

650
00:34:28,580 --> 00:34:32,560
Yeah, and it has, I think it has, yeah, it has local NVMe,

651
00:34:32,980 --> 00:34:38,400
several terabytes, and up to, I think, not many actually, 4 terabytes.

652
00:34:38,600 --> 00:34:39,100
Interesting.

653
00:34:40,180 --> 00:34:44,680
So, there might be a hybrid approach when you have EBS volumes

654
00:34:44,680 --> 00:34:50,160
and you use local NVMe as a caching layer for both reads and

655
00:34:50,160 --> 00:34:50,660
writes.

656
00:34:51,140 --> 00:34:52,320
Michael: But then what would you do?

657
00:34:52,320 --> 00:34:55,580
Would you set up some replicas with larger disks and then fail

658
00:34:55,580 --> 00:34:56,500
over to those?

659
00:34:57,180 --> 00:35:01,280
How are you managing a migration to larger local disks?

660
00:35:02,220 --> 00:35:04,320
Nikolay: When you hit 64 terabytes?

661
00:35:04,460 --> 00:35:07,400
Michael: Well, no, when you, let's say you've started with smaller,

662
00:35:07,440 --> 00:35:09,740
like you started with local disks that are smaller.

663
00:35:09,840 --> 00:35:12,720
Nikolay: Ah, with local disks, yeah, I think you did switch over

664
00:35:12,720 --> 00:35:15,200
approach, of course, yeah, so you need a different instance with

665
00:35:15,200 --> 00:35:17,140
bigger capacity in terms of disk space.

666
00:35:17,160 --> 00:35:22,480
Of course, here again, elasticity and automation of network-attached

667
00:35:22,480 --> 00:35:25,060
disks cloud providers have, it's great.

668
00:35:25,440 --> 00:35:28,160
But let's also criticize it.

669
00:35:29,280 --> 00:35:34,440
So they have like, EBS volume has auto-scaling, but only in 1

670
00:35:34,440 --> 00:35:34,940
direction.

671
00:35:35,920 --> 00:35:41,200
For example, if we re-shard, we need to re-provision and then

672
00:35:41,200 --> 00:35:41,980
switch over.

673
00:35:42,380 --> 00:35:48,480
Or if we saw we didn't have autovacuum tuning in place, or we

674
00:35:48,480 --> 00:35:53,820
screwed up in terms of long-running transactions or abandoned

675
00:35:53,940 --> 00:35:56,820
logical slots, so we accumulated a lot of bloat and say we have

676
00:35:56,820 --> 00:35:57,720
80% of bloat.

677
00:35:57,720 --> 00:36:04,540
Okay, we re-indexed, re-packed, now we sit with a lot of free

678
00:36:04,540 --> 00:36:05,340
disk space.

679
00:36:05,800 --> 00:36:07,540
We don't need it during next year.

680
00:36:07,540 --> 00:36:09,400
Why should we pay for it, right?

681
00:36:10,080 --> 00:36:11,540
And shrinking is not automated.

682
00:36:13,680 --> 00:36:17,640
But of course, you can provision new replica with smaller disk

683
00:36:17,640 --> 00:36:18,780
and then switch over.

684
00:36:19,000 --> 00:36:25,680
And when I think about switch over, you know, I decided to

685
00:36:25,680 --> 00:36:28,940
force myself to have a mindset shift, and my team as well, toward self-driving

686
00:36:28,980 --> 00:36:29,440
Postgres.

687
00:36:29,440 --> 00:36:30,600
We talked about it.

688
00:36:30,600 --> 00:36:34,940
And I think when I think about this particular case we eliminated

689
00:36:34,940 --> 00:36:38,360
a lot of bloat, we want the disk to be smaller, we need to switch over,

690
00:36:38,360 --> 00:36:43,200
but switchover also, it's a maintenance window, yeah, because...

691
00:36:44,200 --> 00:36:45,680
Michael: What's the shift there?

692
00:36:45,680 --> 00:36:47,120
What did you used to think?

693
00:36:49,200 --> 00:36:49,900
Nikolay: Say again?

694
00:36:50,320 --> 00:36:52,500
Michael: What was the mindset change that you had?

695
00:36:53,000 --> 00:36:58,860
Nikolay: So I think operations like adding disk, removing disk

696
00:36:58,860 --> 00:37:03,840
space when not needed, getting rid of bloat and so on, automation

697
00:37:04,080 --> 00:37:05,400
must be much higher.

698
00:37:06,380 --> 00:37:13,320
So it should be like approval from
a DBA or some senior backend

699
00:37:13,320 --> 00:37:17,220
engineer or something, a CTO if
it's a small startup, just approval.

700
00:37:17,440 --> 00:37:19,340
Yeah, we need to shrink disk space.

701
00:37:19,340 --> 00:37:22,160
We don't want to pay for all those
terabytes.

702
00:37:22,540 --> 00:37:24,400
And automation should be very high.

703
00:37:24,860 --> 00:37:30,560
Repacking, and then without downtime,
we have a smaller disk.

704
00:37:30,720 --> 00:37:37,280
But to achieve this right now,
so many moving parts and for example,

705
00:37:37,280 --> 00:37:39,960
to have, you can provision node
with smaller disk.

706
00:37:39,960 --> 00:37:43,060
It can be local, can be EBS volume,
doesn't matter.

707
00:37:43,140 --> 00:37:46,600
But then to switch over
without downtime, you need a

708
00:37:46,740 --> 00:37:50,780
PgBouncer or PgDog layer with 
pause/resume support.

709
00:37:52,820 --> 00:37:54,500
And then orchestrate it properly.

710
00:37:55,440 --> 00:37:58,040
RDS proxy doesn't, for example,
support pause-resume.

711
00:37:58,860 --> 00:38:02,140
So you can't. You must have some
small downtime.

712
00:38:03,340 --> 00:38:05,860
And usually people say, oh, it's
just 30 seconds.

713
00:38:06,140 --> 00:38:08,180
Well, I disagree.

714
00:38:08,500 --> 00:38:09,520
Why should we lose?

715
00:38:09,520 --> 00:38:11,540
This is just some routine operation.

716
00:38:11,820 --> 00:38:15,040
Why should we show some errors
to customers?

717
00:38:16,100 --> 00:38:21,500
Let's raise a bar and have pure
0 downtime, everything.

718
00:38:23,760 --> 00:38:28,120
And auto scaling, it can be auto
scaling, but it can be maybe

719
00:38:28,160 --> 00:38:33,300
like... Auto-scaling is about 
it making the decision itself.

720
00:38:34,440 --> 00:38:35,520
It's too much.

721
00:38:35,540 --> 00:38:37,160
Like let's step back.

722
00:38:37,900 --> 00:38:41,960
I can make decision myself, but
I want full automation, right?

723
00:38:42,660 --> 00:38:44,080
And we don't have it.

724
00:38:44,200 --> 00:38:47,860
We have it for increasing, to increase
disk space, which is good

725
00:38:47,860 --> 00:38:49,140
for EBS volumes.

726
00:38:50,280 --> 00:38:50,820
Which is good.

727
00:38:50,820 --> 00:38:53,940
We don't need to have switchover,
so it will be 0 downtime.

728
00:38:53,940 --> 00:38:55,440
You can say add 1 terabyte.

729
00:38:55,440 --> 00:38:56,820
This is what people do all the
time.

730
00:38:56,820 --> 00:39:00,840
And I think there is a checkbox for 
auto-scaling, so even RDS can

731
00:39:00,840 --> 00:39:03,220
decide to add more disk space itself,
right?

732
00:39:04,000 --> 00:39:04,540
Which is good.

733
00:39:04,540 --> 00:39:07,260
Michael: Yeah, like if you get
within 10%, for example.

734
00:39:07,360 --> 00:39:09,220
But yeah, only up, yeah, as you
said.

735
00:39:09,220 --> 00:39:11,120
Nikolay: Yeah, at least we will
avoid downtime.

736
00:39:11,120 --> 00:39:17,220
I also saw in some places people... 
like, there's a trick to put

737
00:39:18,620 --> 00:39:22,260
some file, like some gigabytes,
filled with zeros.

738
00:39:23,260 --> 00:39:26,600
So if we are out of disk space,
we delete this file.

739
00:39:27,740 --> 00:39:28,360
Oh no.

740
00:39:30,880 --> 00:39:32,720
Yeah, just like something sitting
there.

741
00:39:33,660 --> 00:39:35,940
We can invent some funny name for
this approach.

742
00:39:36,900 --> 00:39:39,020
Yeah, but just an emergency...

743
00:39:40,240 --> 00:39:43,940
It's like reserved connections
within max_connections, 3 connections

744
00:39:43,980 --> 00:39:44,940
reserved for admin.

745
00:39:45,020 --> 00:39:50,320
So reserved disk space you can
quickly delete and buy your some

746
00:39:50,320 --> 00:39:52,400
time to increase disk space.
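That reserved-file trick can be sketched in a few lines (the path is an assumption; `os.posix_fallocate` is Linux-specific, and it matters that real blocks are allocated rather than a sparse file, so deleting the file genuinely frees space):

```python
import os

BALLAST = "/var/lib/postgresql/ballast"  # hypothetical path
SIZE = 5 * 1024**3  # 5 GiB of reserved space

def create_ballast(path: str = BALLAST, size: int = SIZE) -> None:
    """Pre-allocate real blocks (not a sparse file) so deleting the
    file in an emergency genuinely returns disk space."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)  # Linux; not available on macOS
    finally:
        os.close(fd)

def release_ballast(path: str = BALLAST) -> None:
    """Emergency lever: buy time while the volume is being grown."""
    os.remove(path)
```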

747
00:39:52,800 --> 00:39:56,380
Michael: Yeah, on the disk space
thing, the only thing I think

748
00:39:56,380 --> 00:40:00,460
people sometimes get caught out
by is having alerts early enough.

749
00:40:00,460 --> 00:40:03,680
Like you need, sometimes you need
quite a lot of spare disk space

750
00:40:03,680 --> 00:40:06,680
in order to save the space, to 
do a repack for example

751
00:40:06,760 --> 00:40:07,060
Nikolay: yeah

752
00:40:07,060 --> 00:40:10,580
Michael: you need the size of the
table you're repacking at least

753
00:40:10,960 --> 00:40:13,740
free in order to do the operation
so it

754
00:40:13,740 --> 00:40:14,440
Nikolay: makes sense

755
00:40:15,240 --> 00:40:20,440
Michael: yes so that well either
start with your smallest ones

756
00:40:20,440 --> 00:40:23,820
which is not going to make the
most difference or like try and

757
00:40:23,820 --> 00:40:25,380
set that alert quite early.
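Michael's point about headroom can be turned into a simple pre-flight check (the 2x factor is a rough rule of thumb, since pg_repack rebuilds the table side by side; exact requirements vary with indexes and WAL):

```python
def repack_headroom_ok(table_bytes: int, free_bytes: int,
                       factor: float = 2.0) -> bool:
    """pg_repack builds a full copy of the table next to the original,
    so roughly `factor` x the table's size must be free beforehand."""
    return free_bytes >= factor * table_bytes

# A 100 GB table against 150 GB free: not enough headroom at 2x.
print(repack_headroom_ok(100 * 1024**3, 150 * 1024**3))  # → False
# In practice you'd feed in shutil.disk_usage(PGDATA).free.
```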

758
00:40:25,900 --> 00:40:26,400
Yeah.

759
00:40:26,540 --> 00:40:29,280
But yeah, is there anything else
you wanted to make sure we talked

760
00:40:29,280 --> 00:40:29,780
about?

761
00:40:29,880 --> 00:40:32,940
Nikolay: Yeah, well no, I think
it's a good idea to understand

762
00:40:34,160 --> 00:40:35,380
some numbers, right?

763
00:40:35,380 --> 00:40:38,280
So there was also our very old rule 
about latencies.

764
00:40:38,440 --> 00:40:39,660
We didn't talk about latencies.

765
00:40:39,960 --> 00:40:41,680
What latency is normal?

766
00:40:42,100 --> 00:40:46,020
Very old rule was you look at monitoring,
if it's SSD, it can

767
00:40:46,020 --> 00:40:47,000
be an EBS volume.

768
00:40:47,600 --> 00:40:51,820
If it's some, the best volume is
also NVMe, They're usually these

769
00:40:51,820 --> 00:40:55,340
days with most modern instance
families.

770
00:40:56,040 --> 00:41:00,560
And the very rough old
rule was 1 millisecond.

771
00:41:02,500 --> 00:41:06,280
I remember we had
a discussion about a previous

772
00:41:06,280 --> 00:41:11,240
episode where I shared some old
rule and someone disagreed with

773
00:41:11,380 --> 00:41:12,300
this old rule.

774
00:41:12,560 --> 00:41:14,560
Yeah, rules might be already outdated.

775
00:41:14,720 --> 00:41:20,100
So if it's 1 millisecond, these
days maybe we should go lower,

776
00:41:20,380 --> 00:41:21,400
half of a millisecond.

777
00:41:21,820 --> 00:41:24,020
If it's local disk, it should be
even lower.

778
00:41:24,960 --> 00:41:28,520
This is the point up to which
we think it's okay.

779
00:41:28,860 --> 00:41:33,580
If it's more, well, back in those
days we thought up to 5 to

780
00:41:33,580 --> 00:41:35,260
10 milliseconds is okay.

781
00:41:36,020 --> 00:41:37,920
But these days already this is
not okay.

782
00:41:37,920 --> 00:41:41,780
10 milliseconds is definitely slow
these days for SSDs, and

783
00:43:41,780 --> 00:43:44,340
NVMe in particular.

784
00:41:45,420 --> 00:41:47,510
This is the latency at which you
should start worrying.

786
00:41:51,280 --> 00:41:57,940
So basically, in monitoring, we
should control usage, saturation

787
00:41:58,080 --> 00:42:00,160
risks, and latency as well.

788
00:42:00,160 --> 00:42:05,800
This is like the regular USE method
or the 4 golden signals, right?

789
00:42:05,800 --> 00:42:08,340
So we control these things and
also errors.
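
On Linux, the latency signal can be derived from /proc/diskstats; this sketch computes since-boot averages (a real monitoring agent would take deltas between samples, and the device name is just an example):

```python
def avg_io_latency_ms(device: str, lines) -> tuple[float, float]:
    """Average (read_ms, write_ms) per I/O for `device`, from
    /proc/diskstats lines: field 4 = reads completed, field 7 = ms
    spent reading, field 8 = writes completed, field 11 = ms writing."""
    for line in lines:
        f = line.split()
        if len(f) > 10 and f[2] == device:
            reads, read_ms = int(f[3]), int(f[6])
            writes, write_ms = int(f[7]), int(f[10])
            return (read_ms / reads if reads else 0.0,
                    write_ms / writes if writes else 0.0)
    raise ValueError(f"device {device!r} not found")

# Usage (device name is an example):
# with open("/proc/diskstats") as f:
#     read_ms, write_ms = avg_io_latency_ms("nvme0n1", f)
```

Compare the result against the rough thresholds discussed above: around half a millisecond to 1 millisecond is fine, while 10 milliseconds is definitely slow for NVMe.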

790
00:42:09,280 --> 00:42:14,340
And yeah, we check these things
and understand where we are right

791
00:42:14,340 --> 00:42:14,840
now.

792
00:42:15,580 --> 00:42:17,420
Should we start worrying already?

793
00:42:18,340 --> 00:42:21,340
Yeah, simple, it's actually simple.

794
00:42:21,900 --> 00:42:27,300
And my recommendation is also to know
your theoretical limits based

795
00:42:27,300 --> 00:42:29,320
on the docs, as I said, it's not
trivial.

796
00:42:30,440 --> 00:42:35,020
But also recommendation, if you
use some particular setups in

797
00:42:35,020 --> 00:42:41,720
cloud, always test them to understand
actual limits and if they

798
00:42:41,720 --> 00:42:45,920
don't match theoretical advertised
limits, you should understand

799
00:42:45,920 --> 00:42:46,420
why.

800
00:42:46,500 --> 00:42:47,780
And testing is easy.

801
00:42:47,780 --> 00:42:51,800
I usually prefer fio, a simple
program.

802
00:42:52,500 --> 00:42:55,100
I like the snippets GCP provides.

803
00:42:55,200 --> 00:42:59,640
They have snippets: if you just
search for SSD disk GCP performance,

804
00:43:00,040 --> 00:43:01,700
you will see a bunch of snippets.

805
00:43:01,960 --> 00:43:06,480
The only warning: several times
I managed to destroy

806
00:43:06,980 --> 00:43:10,900
PGDATA, because, you know,
some of those

807
00:43:10,900 --> 00:43:16,360
snippets use direct I/O, and I
tested the wrong volume. It was always

808
00:43:16,360 --> 00:43:20,040
not production, but still I made
mistakes.

809
00:43:20,500 --> 00:43:26,600
And if you try to test your disk
capabilities with Direct I/O and

810
00:43:26,600 --> 00:43:32,960
you use volume which is used for
PGDATA, forget about your PGDATA.

811
00:43:33,760 --> 00:43:37,020
And this is a good way, for example,
to have silent corruption

812
00:43:37,020 --> 00:43:40,520
as well, because Postgres even
might work for some time until

813
00:43:40,520 --> 00:43:48,520
you reach a point when it will
touch the areas you had writes

814
00:43:48,520 --> 00:43:49,020
to.
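
A safer pattern is to point fio at a throwaway file rather than at the device or volume holding PGDATA. A sketch of a job file (the file path, size, and queue depth are illustrative, not from the episode):

```ini
; Random 8 KiB reads, roughly matching Postgres page-sized I/O.
; filename is a scratch file -- NEVER the device or volume holding PGDATA.
[randread-test]
filename=/mnt/scratch/fio-testfile
size=10G
rw=randread
bs=8k
ioengine=libaio
iodepth=32
direct=1
runtime=60
time_based=1
```

Run it with fio and compare the reported IOPS and latency against the advertised limits.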

815
00:43:49,240 --> 00:43:53,260
So yeah, those are practical pieces
of advice.

816
00:43:54,960 --> 00:43:58,260
Michael: We've talked in the past
about stress testing with

817
00:43:58,260 --> 00:44:01,800
pgbench. Would that actually be
of benefit in this case?

818
00:44:01,800 --> 00:44:04,820
Could we use it? Because what
we want to do at this point

819
00:44:04,820 --> 00:44:05,580
is kind of a stress test.

820
00:44:05,580 --> 00:44:07,940
Nikolay: Right, but pgbench
tests everything, including Postgres.

821
00:44:09,840 --> 00:44:15,480
In our methodology, let's split
everything into pieces and study

822
00:44:15,480 --> 00:44:18,940
them quite well, separately if possible.

823
00:44:19,740 --> 00:44:23,100
So disk I/O should be understood
separately from Postgres.

824
00:44:23,100 --> 00:44:24,340
We had it many times, by the way.

825
00:44:24,340 --> 00:44:26,340
We started, oh, let's pgbench...

826
00:44:26,940 --> 00:44:28,460
We talk about disk here.

827
00:44:28,520 --> 00:44:31,140
Let's forget about Postgres completely
for now.

828
00:44:31,160 --> 00:44:31,660
Right?

829
00:44:31,780 --> 00:44:32,900
Michael: So try and isolate.

830
00:44:32,900 --> 00:44:33,840
Nikolay: Not completely, actually.

831
00:44:33,840 --> 00:44:36,840
We will usually keep in mind that
pages are 8 kilobytes.

832
00:44:38,600 --> 00:44:39,100
Michael: Yeah.

833
00:44:39,320 --> 00:44:43,060
Well, I was thinking about managed
providers, like, it's a bit tricky,

834
00:44:43,500 --> 00:44:46,840
like, how would you test on RDS
what the IOPS

835
00:44:46,840 --> 00:44:47,080
Nikolay: should be.

836
00:44:47,080 --> 00:44:48,760
That's a tricky question, right?

837
00:44:48,760 --> 00:44:49,800
That's a tricky question.

838
00:44:50,260 --> 00:44:52,840
Michael: I think pgbench would
be a good solution there.

839
00:44:52,840 --> 00:44:56,820
Nikolay: pgbench, yes, but you
can try to guess which instance,

840
00:44:56,820 --> 00:45:00,380
well, instance is easy to guess,
but which disks are there and

841
00:45:00,380 --> 00:45:01,960
like IOPS and so on, right?

842
00:45:01,960 --> 00:45:05,740
And then you can provision the
same EC2 instance and the

843
00:45:06,340 --> 00:45:07,540
disk you guessed.

844
00:45:08,360 --> 00:45:12,340
But again, as I said, one day
I discovered they use RAID.

845
00:45:13,000 --> 00:45:14,560
So there's a stripe there.

846
00:45:15,040 --> 00:45:18,980
And if you want to do the same,
probably you will have a different

847
00:45:19,200 --> 00:45:19,700
setup.

848
00:45:20,020 --> 00:45:21,140
That's an issue.

849
00:45:22,740 --> 00:45:27,380
Also, like, I know Cloud SQL has this for bigger customers.

850
00:45:27,640 --> 00:45:31,280
I don't remember, Enterprise Plus or something, they also have

851
00:45:31,280 --> 00:45:32,860
caching with local NVMes.

852
00:45:33,620 --> 00:45:34,120
Michael: Yes,

853
00:45:34,140 --> 00:45:34,640
Nikolay: yes.

854
00:45:34,640 --> 00:45:40,360
Yeah, it's good, but to reproduce it, it's really tricky to test,

855
00:45:40,380 --> 00:45:40,880
right?

856
00:45:41,120 --> 00:45:48,100
So yeah, I think it's tricky to test disks for RDS.

857
00:45:49,120 --> 00:45:55,880
But yet another reason to think about who controls your database,

858
00:45:56,540 --> 00:46:00,240
and why you cannot connect to your own database using SSH and

859
00:46:00,240 --> 00:46:02,120
see what's happening under the hood.

860
00:46:03,700 --> 00:46:05,320
Michael: Probably a good place to end it.

861
00:46:05,320 --> 00:46:07,360
Nikolay: Yeah, let's do it.

862
00:46:07,360 --> 00:46:09,640
Michael: All right, nice one, Nikolay, thanks so much.

863
00:46:09,640 --> 00:46:09,960
Nikolay: Thank you.

864
00:46:09,960 --> 00:46:10,980
Michael: See you next week.