1
0:0:0,14 --> 0:0:3,08
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
0:0:3,08 --> 0:0:3,98
all things PostgreSQL.

3
0:0:4,16 --> 0:0:6,89
I am Michael, founder of pgMustard, and this is Nik, founder

4
0:0:6,89 --> 0:0:7,74
of Postgres.AI.

5
0:0:7,74 --> 0:0:8,96
Hey Nik, how's it going?

6
0:0:10,12 --> 0:0:10,67
Nikolay: Good, good.

7
0:0:10,67 --> 0:0:11,55
How are you?

8
0:0:11,98 --> 0:0:13,78
Michael: Yeah, keeping well, thank you.

9
0:0:13,86 --> 0:0:15,98
Nikolay: Yeah, so a long time, obviously.

10
0:0:16,0 --> 0:0:21,44
We skipped last week because of me doing some stuff at work and

11
0:0:21,44 --> 0:0:23,56
with family, so couldn't make it.

12
0:0:23,56 --> 0:0:26,7
Thank you for understanding and I'm glad we are back.

13
0:0:27,34 --> 0:0:29,54
Michael: Yeah, and no complaints from listeners, so thank you

14
0:0:29,54 --> 0:0:30,48
everybody for your patience.

15
0:0:30,48 --> 0:0:31,3
Nikolay: Nobody noticed.

16
0:0:31,32 --> 0:0:33,74
They are still mostly catching up, we know.

17
0:0:34,45 --> 0:0:34,95
Yeah.

18
0:0:35,66 --> 0:0:35,86
Yeah.

19
0:0:35,86 --> 0:0:39,5
Somebody shared on LinkedIn that like, oh, I discovered Postgres.FM

20
0:0:39,92 --> 0:0:44,62
and while I was walking the dog, listened to all the episodes.

21
0:0:44,7 --> 0:0:47,6
It took me like almost 2 weeks.

22
0:0:47,78 --> 0:0:54,52
Like, okay, 164 episodes during 10 days, it's challenging,

23
0:0:54,64 --> 0:0:56,98
right? It's like 16 episodes per day.

24
0:0:57,66 --> 0:0:59,52
Michael: Yeah, that's binge-watching.

25
0:1:0,36 --> 0:1:6,18
Nikolay: Yeah. You need maybe 2x or 3x speed and yeah, it's

26
0:1:6,18 --> 0:1:6,68
insane.

27
0:1:7,36 --> 0:1:10,46
So, this topic we have today.

28
0:1:10,76 --> 0:1:11,92
It's not mine originally.

29
0:1:12,04 --> 0:1:13,4
It's from Maxim Boguk.

30
0:1:13,4 --> 0:1:16,08
I probably mentioned him a couple of times.

31
0:1:16,08 --> 0:1:21,48
He always has very interesting ideas and approaches to solving

32
0:1:21,48 --> 0:1:22,24
hard problems.

33
0:1:23,76 --> 0:1:30,2
And, I remember he said, he told me it was maybe half a year

34
0:1:30,2 --> 0:1:31,16
ago, maybe more.

35
0:1:31,16 --> 0:1:37,3
He said, I squeezed more than 10 terabytes per

36
0:1:37,3 --> 0:1:41,54
hour when copying data directly from 1 machine to another.

37
0:1:42,28 --> 0:1:46,62
And for me, the standard was 5 years ago, it was 1 terabyte per

38
0:1:46,62 --> 0:1:47,12
hour.

39
0:1:47,54 --> 0:1:49,46
I think I mentioned it a few times.

40
0:1:51,42 --> 0:1:55,64
We had an episode about how to copy Postgres from 1 machine to

41
0:1:55,64 --> 0:1:56,14
another.

42
0:1:56,74 --> 0:1:59,96
And 1 terabyte per hour, it's like this.

43
0:1:59,96 --> 0:2:0,32
Okay.

44
0:2:0,32 --> 0:2:4,74
Modern hardware, we know there's a throughput for disks, throughput

45
0:2:4,74 --> 0:2:11,02
for network, throughput to disks on the source and on the destination,

46
0:2:11,24 --> 0:2:16,02
both matter here, parallelization, and SSD is good with parallelization.

47
0:2:16,4 --> 0:2:20,02
So basically, the golden standard for me was 1 terabyte per hour.

48
0:2:20,02 --> 0:2:25,32
And if you need to create a WAL-G or pgBackRest backup, it was 1

49
0:2:25,32 --> 0:2:26,78
terabyte per hour at least.

50
0:2:28,18 --> 0:2:32,06
And then we saw like 2, 3 terabytes if you raise parallelization

51
0:2:32,26 --> 0:2:33,34
with modern hardware.

52
0:2:33,66 --> 0:2:39,24
But it was still not about local
NVMe, but about EBS volumes

53
0:2:39,24 --> 0:2:47,18
or PD SSD disks on Google Cloud,
persistent disks on Google Cloud.

54
0:2:47,98 --> 0:2:49,66
And so like traditional disks.

55
0:2:50,74 --> 0:2:52,62
By the way, those are also improved.

56
0:2:52,86 --> 0:2:56,88
EBS volumes are pretty fast these
days and the Google Cloud also

57
0:2:56,88 --> 0:3:2,78
has hyperdisks which are impressive,
but not as impressive as

58
0:3:2,78 --> 0:3:4,78
local NVMe disks, right?

59
0:3:5,46 --> 0:3:9,64
Maxim told me that, of course,
he was using local NVMes.

60
0:3:9,72 --> 0:3:13,68
It was some self-managed Postgres
setup with special machines,

61
0:3:13,68 --> 0:3:23,8
so local disks and no snapshots
like in cloud and so on, and

62
0:3:23,8 --> 0:3:26,04
backups to where?

63
0:3:26,04 --> 0:3:28,88
To different machine or to S3?

64
0:3:29,24 --> 0:3:30,24
It doesn't matter here.

65
0:3:30,24 --> 0:3:35,14
The idea was how fast we can provision
a replica, because sometimes

66
0:3:35,14 --> 0:3:38,22
we need it, for example, 1 replica
is down or we reconfigure

67
0:3:38,22 --> 0:3:39,88
something, we need to build a replica.

68
0:3:40,84 --> 0:3:45,14
And normally if we have cloud disk
snapshots, these days we use

69
0:3:45,14 --> 0:3:46,12
cloud disk snapshots.

70
0:3:46,12 --> 0:3:47,7
By the way, there is great news.

71
0:3:48,34 --> 0:3:54,78
Andrey Borodin, Andrey is a maintainer

72
0:3:54,78 --> 0:4:0,22
of WAL-G, implemented native
support for cloud disk snapshots

73
0:4:0,4 --> 0:4:4,4
in WAL-G as an alternative to
backup-push, backup-fetch.

74
0:4:4,4 --> 0:4:8,98
So instead of full backups or delta backups, you can rely on snapshots that

75
0:4:9,18 --> 0:4:11,44
AWS, GCP, or others provide.
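
For context, a minimal sketch of the classic backup-push/backup-fetch flow that those disk snapshots would substitute for. The bucket name and data directory below are hypothetical; this assumes wal-g is installed and configured for the storage:

```python
# Hedged sketch of the traditional WAL-G flow (not the new snapshot feature).
import os
import subprocess

env = {**os.environ, "WALG_S3_PREFIX": "s3://my-backups/pg"}  # hypothetical bucket
pgdata = "/var/lib/postgresql/18/main"                        # hypothetical PGDATA

# Full backup to object storage:
subprocess.run(["wal-g", "backup-push", pgdata], env=env, check=True)

# On another machine, restore the latest backup into an empty data directory:
subprocess.run(["wal-g", "backup-fetch", pgdata, "LATEST"], env=env, check=True)
```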

76
0:4:11,58 --> 0:4:12,74
Michael: Yeah, I saw this.

77
0:4:13,32 --> 0:4:15,06
Is it a draft PR at the moment?

78
0:4:15,06 --> 0:4:16,86
Or what's the status of this?

79
0:4:16,98 --> 0:4:19,76
Nikolay: It was just created a
few days ago; it's early.

80
0:4:19,76 --> 0:4:21,18
It was vibe-coded mostly.

81
0:4:22,78 --> 0:4:26,74
Andrey is a great hacker, but I
think our hacking Postgres sessions

82
0:4:27,78 --> 0:4:30,46
contributed to this idea.

83
0:4:30,56 --> 0:4:33,98
Let's do more code writing with
LLMs.

84
0:4:35,02 --> 0:4:38,3
But of course, the review will
be thorough.

85
0:4:38,3 --> 0:4:39,64
I think it's already happening.

86
0:4:39,64 --> 0:4:44,06
I saw many comments and the result
will be improved and so on.

87
0:4:44,06 --> 0:4:48,42
So no hallucinations will be allowed.

88
0:4:48,64 --> 0:4:49,14
Right.

89
0:4:49,38 --> 0:4:51,06
So all covered with tests and so
on.

90
0:4:51,06 --> 0:4:56,12
Anyway, there is this idea that
instead of full backups, we can

91
0:4:56,12 --> 0:5:1,76
have snapshots. But at the same time, like, I see 2 big trends.

92
0:5:2,3 --> 0:5:6,68
1 trend is, okay, if we have snapshots
for disks, let's use them

93
0:5:7,06 --> 0:5:12,28
more and more and more and rely
on cloud capabilities, depending

94
0:5:12,28 --> 0:5:13,0
on cloud.

95
0:5:13,92 --> 0:5:16,72
Of course, such backups need testing
and so on.

96
0:5:17,14 --> 0:5:20,9
In parallel, there is an old idea,
we should not be afraid of

97
0:5:20,9 --> 0:5:22,94
using local NVMes.

98
0:5:23,5 --> 0:5:28,18
We know they lose data if the machine
is restarted.

99
0:5:29,08 --> 0:5:30,66
Ephemeral disks, right?

100
0:5:31,62 --> 0:5:32,52
But it's okay.

101
0:5:32,98 --> 0:5:38,2
These days, if a machine is having issues, we usually just re-provision

102
0:5:38,3 --> 0:5:39,04
it anyway.

103
0:5:39,52 --> 0:5:44,48
And switch over or failover doesn't require restart, so it's

104
0:5:44,48 --> 0:5:46,1
very seamless.

105
0:5:46,98 --> 0:5:51,72
It means that we can live with local NVMes, right?

106
0:5:52,04 --> 0:5:56,72
And additional momentum to this idea recently was created by

107
0:5:56,72 --> 0:6:2,14
PlanetScale, which came to the Postgres ecosystem and started showing

108
0:6:2,2 --> 0:6:7,16
very good looking pictures in terms of benchmarks and real latencies

109
0:6:8,04 --> 0:6:9,18
from production systems.

110
0:6:9,18 --> 0:6:13,82
Of course, you cannot beat local NVMe disks with their 1,000,000

111
0:6:13,82 --> 0:6:20,24
to 3,000,000 IOPS and amazing throughput, like more than 10 gigabytes

112
0:6:20,38 --> 0:6:22,36
per second writes and reads.

113
0:6:22,36 --> 0:6:23,22
It's insane.

114
0:6:24,02 --> 0:6:28,58
Because for regular disks we have 1, 2, 3 gigabytes per second only

115
0:6:28,58 --> 0:6:29,48
and that's it.

116
0:6:29,96 --> 0:6:33,74
And IOPS, up to 100,000 IOPS maximum.

117
0:6:34,82 --> 0:6:39,16
Michael: And I think for me, the critical part is if we have

118
0:6:39,16 --> 0:6:45,36
an HA setup with failover in place, then we don't need the durability

119
0:6:46,36 --> 0:6:51,18
of those disks, that we don't need there to be 6 cloud backups,

120
0:6:51,18 --> 0:6:54,72
you know, because we're gonna fail over anyway. So yeah, it's

121
0:6:54,72 --> 0:6:58,52
that insight, and I think they did a good job of making that

122
0:6:58,52 --> 0:7:2,02
very clear: if you've got an HA setup, we can make use of the

123
0:7:2,02 --> 0:7:2,94
NVMe drives.

124
0:7:3,18 --> 0:7:5,58
Nikolay: Yeah, by the way, I recognize I'm mixing throughput

125
0:7:5,58 --> 0:7:6,26
and latency.

126
0:7:6,54 --> 0:7:12,34
Anyway, local NVMes are good in both aspects of performance and

127
0:7:12,34 --> 0:7:16,88
like almost or sometimes full order of magnitude better than

128
0:7:16,88 --> 0:7:19,3
network attached disks and so on.

129
0:7:19,3 --> 0:7:20,64
But you don't have snapshots.

130
0:7:20,82 --> 0:7:24,0
So 2 big downsides, you don't have snapshots, and there is a

131
0:7:24,0 --> 0:7:25,74
hard stop in terms of disk size.

132
0:7:25,74 --> 0:7:31,02
If you plan to grow to 200 terabytes, you definitely need something

133
0:7:31,02 --> 0:7:35,12
like sharding, or anyway, some way to split.

134
0:7:35,38 --> 0:7:40,08
Only if you have it, if you master it, then in this case, local

135
0:7:40,08 --> 0:7:44,48
disks are great, for like long-term planning. But for long-term

136
0:7:44,48 --> 0:7:48,48
planning, everyone should remember also that EBS volumes and

137
0:7:48,9 --> 0:7:53,46
PD-SSD or something, hyperdisks are limited to 64 terabytes,

138
0:7:54,06 --> 0:7:58,74
and for both Cloud SQL and RDS, this is a hard limit as well,

139
0:7:58,74 --> 0:7:59,8
64 terabytes.

140
0:7:59,86 --> 0:8:3,42
This is a cliff or a wall, a very hard one.

141
0:8:3,94 --> 0:8:8,2
And Aurora has 128 terabytes, double the capacity.

142
0:8:8,8 --> 0:8:12,6
But anyway, if you self-manage, you can use LVM, of course, to

143
0:8:12,6 --> 0:8:16,56
combine multiple disks, and I think some companies already do

144
0:8:16,56 --> 0:8:17,06
it.
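
A hedged sketch of that LVM approach, striping several network volumes into one logical volume. Device names are hypothetical; this must run as root and should be adapted to your own layout:

```python
# Hypothetical sketch: combine multiple EBS/PD volumes with LVM striping.
import subprocess

devices = ["/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1", "/dev/nvme4n1"]

subprocess.run(["pvcreate", *devices], check=True)          # mark devices as LVM PVs
subprocess.run(["vgcreate", "pgvg", *devices], check=True)  # one volume group
subprocess.run([
    "lvcreate",
    "-i", str(len(devices)),  # stripe across all devices
    "-I", "256",              # 256 KiB stripe size, a common starting point
    "-l", "100%FREE",
    "-n", "pgdata",
    "pgvg",
], check=True)
subprocess.run(["mkfs.ext4", "/dev/pgvg/pgdata"], check=True)
```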

145
0:8:17,18 --> 0:8:20,82
But in the case of local disks, it's really a hard stop, nothing

146
0:8:20,82 --> 0:8:21,96
to combine, unfortunately.

147
0:8:22,86 --> 0:8:27,14
Anyway, you usually combine multiple disks when you have terabytes

148
0:8:27,16 --> 0:8:27,84
of data.

149
0:8:28,26 --> 0:8:33,4
In my benchmarks, in our benchmarks, we combine multiple disks

150
0:8:33,4 --> 0:8:35,88
on the i4i instances.

151
0:8:36,18 --> 0:8:38,94
So there is a hard stop in terms of capacity.

152
0:8:38,94 --> 0:8:43,98
I think on i4i it's 40 terabytes or something like this.

153
0:8:43,98 --> 0:8:45,14
It's impressive, right?

154
0:8:45,14 --> 0:8:51,3
So if you think you won't grow to that size very soon, it's a

155
0:8:51,3 --> 0:8:55,68
very great alternative to consider if you go self-managed or

156
0:8:55,68 --> 0:9:0,04
some self-managed Kubernetes setup, compared to RDS.

157
0:9:0,74 --> 0:9:3,46
Because it gives you an order of magnitude better performance,

158
0:9:3,52 --> 0:9:4,02
right?

159
0:9:5,02 --> 0:9:5,52
Michael: Yeah.

160
0:9:5,58 --> 0:9:8,86
By the way, when you say our benchmarks, do you mean in the blog

161
0:9:8,86 --> 0:9:10,36
post you did with Maxim?

162
0:9:10,76 --> 0:9:11,2
Nikolay: Yeah, yeah.

163
0:9:11,2 --> 0:9:13,68
So the original idea was by Maxim.

164
0:9:13,94 --> 0:9:17,74
He did the majority of initial work, created the recipe.

165
0:9:18,16 --> 0:9:21,3
I will tell the recipe, like why it existed.

166
0:9:21,58 --> 0:9:25,24
But the idea was we need to copy from 1 machine to another as

167
0:9:25,24 --> 0:9:26,12
fast as possible.

168
0:9:26,12 --> 0:9:29,44
We don't have snapshots because it's local disks.

169
0:9:30,12 --> 0:9:36,24
Either fully like maybe your own data center or it's these instances

170
0:9:36,3 --> 0:9:40,28
with local disks like i4i, i7i.

171
0:9:41,68 --> 0:9:46,32
I remember I explored this idea very long ago, 10 years or so

172
0:9:46,32 --> 0:9:50,52
maybe, yeah, 10 years ago maybe, with i3 instances at that time.

173
0:9:50,68 --> 0:9:54,52
I really liked the cost of the final solution because the disks

174
0:9:54,52 --> 0:9:59,24
are included, instance costs are slightly higher, and performance

175
0:9:59,24 --> 0:9:59,88
is great.

176
0:10:0,04 --> 0:10:3,12
So we did a lot of benchmarks using i3 instances, even spot instances,

177
0:10:3,12 --> 0:10:4,46
I remember it was great.

178
0:10:4,96 --> 0:10:10,28
So you get a very cheap, very powerful machine and can do with Postgres

179
0:10:10,74 --> 0:10:11,5
many things.

180
0:10:11,98 --> 0:10:16,88
So anyway, the goal was how to clone: to provision a replica,

181
0:10:16,88 --> 0:10:20,04
for example, or to provision a clone for experiments.

182
0:10:21,5 --> 0:10:24,9
Michael: Yeah, I wanted to ask about this because in the benchmark,

183
0:10:24,96 --> 0:10:27,0
for example, you turned off checksums.

184
0:10:27,8 --> 0:10:30,84
And it was interesting to me, Because if you're provisioning

185
0:10:30,84 --> 0:10:33,28
a replica, presumably you wouldn't do that.

186
0:10:33,28 --> 0:10:36,3
But if you were doing it for some other reason, maybe you would.

187
0:10:36,3 --> 0:10:39,64
Yeah, so what are the use cases here where this would make sense?

188
0:10:39,96 --> 0:10:43,14
Nikolay: Well, if it's a clone for experiments, we don't need

189
0:10:43,14 --> 0:10:44,36
to be super strict.

190
0:10:44,68 --> 0:10:48,0
And this option, I think, exists mostly for backups to make them

191
0:10:48,0 --> 0:10:49,54
super reliable and so on.

192
0:10:49,8 --> 0:10:54,24
Of course, and also, you know what, this is an extra feature

193
0:10:54,52 --> 0:11:0,44
that pgBackRest has, which is actually a luxury to have.

194
0:11:3,04 --> 0:11:7,94
So yeah, of course, it would be good to see an experiment with checksums

195
0:11:8,0 --> 0:11:11,64
enabled. Of course, results should go down, but for some cases it's

196
0:11:11,64 --> 0:11:14,84
appropriate to disable that check.
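
As a hedged illustration of that option, assuming pgBackRest's checksum-page setting (the stanza name is hypothetical): skipping page-checksum validation for a throwaway clone could look like this, using the boolean negation form of the option:

```python
# Hedged sketch: disable page-checksum validation during a pgBackRest backup.
import subprocess

subprocess.run(
    ["pgbackrest", "--stanza=main", "--no-checksum-page", "backup", "--type=full"],
    check=True,
)
```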

197
0:11:15,4 --> 0:11:17,78
Michael: That's a good point actually. So if we don't have it,

198
0:11:17,78 --> 0:11:21,36
it's not... Is there a risk of introducing corruption, or is it

199
0:11:21,36 --> 0:11:22,48
more that we persist...

200
0:11:22,72 --> 0:11:24,56
Like, if our primary is corrupted,


201
0:11:25,64 --> 0:11:25,76
Nikolay: is

202
0:11:25,76 --> 0:11:26,46
Michael: it both?

203
0:11:26,82 --> 0:11:31,4
Nikolay: Let me ask you this, if
you copy data using pg_basebackup,

204
0:11:32,98 --> 0:11:35,24
Who will check this?

205
0:11:35,32 --> 0:11:36,3
Who will do that?

206
0:11:36,82 --> 0:11:37,86
Michael: Yeah, good point.

207
0:11:38,0 --> 0:11:39,24
Nikolay: Or you use rsync.

208
0:11:39,38 --> 0:11:41,54
Traditional ways are rsync and
pg_basebackup.

209
0:11:41,92 --> 0:11:43,54
This is the most traditional way.

210
0:11:43,62 --> 0:11:45,18
Both are single threaded.

211
0:11:45,54 --> 0:11:46,5
That's the point.

212
0:11:46,74 --> 0:11:50,1
They are single threaded, so, and
there is no checksum.

213
0:11:50,82 --> 0:11:52,08
This is an extra feature.

214
0:11:52,08 --> 0:11:55,66
It's great that DBLab implemented
it.

215
0:11:58,08 --> 0:12:2,56
And it would be good to compare
how it affects this amazing performance.

216
0:12:4,4 --> 0:12:8,94
But anyway, the original goal was
to be able to copy data with

217
0:12:8,94 --> 0:12:11,54
traditional way, not involving
backups this time.

218
0:12:11,68 --> 0:12:15,98
Because usually, basically, like normally, if not snapshots,

219
0:12:16,16 --> 0:12:18,84
we just fetch from backups using
WAL-G or pgBackRest.

220
0:12:19,64 --> 0:12:23,14
This is also a very fast way, and
there you can control also

221
0:12:23,14 --> 0:12:25,42
parallelization, you can provision
quite fast.

222
0:12:25,6 --> 0:12:30,28
And S3 and GCS on Google Cloud,
they are good with parallelization,

223
0:12:30,5 --> 0:12:32,48
like 16 threads, 32 threads.

224
0:12:33,04 --> 0:12:37,8
You can speed up significantly,
you can increase throughput of

225
0:12:38,48 --> 0:12:41,06
backing up or restoring from backup.
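
A small sketch of those parallelism knobs, with example values. WAL-G takes an environment variable, pgBackRest a process-max option; paths and the stanza name are hypothetical:

```python
# Hedged sketch of restore parallelism for WAL-G and pgBackRest.
import os
import subprocess

# WAL-G: fetch a backup with 16 concurrent download threads.
env = {**os.environ, "WALG_DOWNLOAD_CONCURRENCY": "16"}
subprocess.run(["wal-g", "backup-fetch", "/var/lib/postgresql/18/main", "LATEST"],
               env=env, check=True)

# pgBackRest: restore with 32 worker processes.
subprocess.run(["pgbackrest", "--stanza=main", "--process-max=32", "restore"],
               check=True)
```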

226
0:12:41,32 --> 0:12:44,2
But in this case, OK, we don't
have those backups.

227
0:12:44,34 --> 0:12:47,18
We just want to clone from 1 machine
to another, that's it.

228
0:12:47,32 --> 0:12:50,14
And the problem is, pg_basebackup
is still single-threaded.

229
0:12:50,98 --> 0:12:54,52
There are discussions and even
a patch proposed to make it multi-threaded,

230
0:12:54,78 --> 0:12:57,1
but apparently it's not trivial.

231
0:12:57,44 --> 0:13:1,92
And I don't see current work in
progress, so I think it will

232
0:13:1,92 --> 0:13:3,36
stay single-threaded.

233
0:13:4,02 --> 0:13:8,56
So Maxim came up with the idea
that we could use pgBackRest, not

234
0:13:8,56 --> 0:13:13,04
to create backups in S3 and then
restore, because it's basically

235
0:13:13,1 --> 0:13:14,04
copying twice.

236
0:13:15,18 --> 0:13:17,54
If you have them already, good,
just restore.

237
0:13:17,78 --> 0:13:23,7
But if you don't have them somehow
or cannot involve them somehow,

238
0:13:24,06 --> 0:13:26,26
you just have 2 machines you need
to clone.

239
0:13:26,26 --> 0:13:29,66
In this case, he just came up with
the idea: let's use pgBackRest to

240
0:13:29,66 --> 0:13:31,4
copy from 1 machine to another,
that's it.

241
0:13:31,4 --> 0:13:33,3
It's not its primary job, right?

242
0:13:33,64 --> 0:13:34,5
But why not?
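
A rough sketch of how such a machine-to-machine pgBackRest setup could look. This is heavily hedged: host names, the stanza name, and paths are hypothetical, and the blog post has the actual recipe. The idea is that the repository lives on the destination host, which pulls from the source over SSH, so the data moves once, in parallel:

```python
# Hypothetical sketch, run on the destination host; no S3 round-trip.
import subprocess

config = """\
[global]
repo1-path=/var/lib/pgbackrest
process-max=32            # parallel workers; tune to cores and network
compress-type=lz4         # cheap compression; zst is another option

[main]
pg1-host=source.internal  # hypothetical source host, reachable over SSH
pg1-path=/var/lib/postgresql/18/main
"""
with open("/etc/pgbackrest/pgbackrest.conf", "w") as f:
    f.write(config)

for cmd in (["stanza-create"], ["backup", "--type=full"]):
    subprocess.run(["pgbackrest", "--stanza=main", *cmd], check=True)
# A restore on this host (with a local data directory configured for the
# new cluster) then materializes the copy: pgbackrest --stanza=main restore
```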

243
0:13:34,94 --> 0:13:39,52
And he told me, like I said, months
ago that he achieved more

244
0:13:39,52 --> 0:13:41,74
than 10 terabytes per hour.

245
0:13:41,74 --> 0:13:42,24
Yeah.

246
0:13:42,74 --> 0:13:47,22
Which is great, which is like absolutely
great and impressive.

247
0:13:48,18 --> 0:13:52,86
Michael: And multiple times what
you thought was, you know, considered

248
0:13:52,88 --> 0:13:53,38
good.

249
0:13:53,86 --> 0:13:59,14
Nikolay: Yeah, so I already, from
this fact and others, I already

250
0:13:59,14 --> 0:14:3,58
recognized that, like, 1 terabyte
per hour is not a golden

251
0:14:3,58 --> 0:14:4,4
standard anymore.

252
0:14:4,4 --> 0:14:5,1
It's outdated.

253
0:14:5,2 --> 0:14:6,68
We need to raise the bar.

254
0:14:6,68 --> 0:14:7,18
Definitely.

255
0:14:7,88 --> 0:14:14,24
So maybe 2 at least, I don't know, maybe more, 3, 4. How much

256
0:14:14,24 --> 0:14:17,76
should be okay for modern
hardware for large databases these

257
0:14:17,76 --> 0:14:21,18
days? Maybe 5 terabytes per hour
should be considered as like

258
0:14:21,18 --> 0:14:24,62
good. Because, you see, you
build some infrastructure and

259
0:14:24,62 --> 0:14:28,9
you see some numbers and we have
clients, not 1 client, many

260
0:14:28,9 --> 0:14:32,86
clients who come to us and complain,
okay, well, backups take

261
0:14:32,86 --> 0:14:35,78
like this amount of time, like
10 hours.

262
0:14:35,86 --> 0:14:37,72
What's your size of database?

263
0:14:38,4 --> 0:14:42,22
200 gigabytes. 10 hours for 200 gigabytes,
something is absolutely

264
0:14:42,28 --> 0:14:45,96
wrong, you know, maybe like, let's
see where the situation is, where

265
0:14:45,96 --> 0:14:46,8
is the bottleneck.

266
0:14:47,64 --> 0:14:52,06
And it can be disks, in many cases,
but maybe network, maybe

267
0:14:52,06 --> 0:14:53,62
something else, mostly disks.

268
0:14:53,62 --> 0:14:55,16
But anyway, this is not normal.

269
0:14:55,16 --> 0:14:58,26
Sometimes it's software and lack
of parallelization and so on

270
0:14:58,26 --> 0:15:2,22
but it's not okay to live with
these bad numbers these days,

271
0:15:2,22 --> 0:15:2,56
right?

272
0:15:2,56 --> 0:15:3,3
It's already...

273
0:15:4,3 --> 0:15:6,36
We have very powerful hardware
usually.

274
0:15:6,78 --> 0:15:10,84
Michael: Just to ask the stupid
question, what's the negative

275
0:15:10,94 --> 0:15:12,94
impact of it taking 10 hours?

276
0:15:13,94 --> 0:15:17,12
Nikolay: Well, it affects many
things, for example, RPO, RTO,

277
0:15:17,12 --> 0:15:17,62
right?

278
0:15:17,72 --> 0:15:21,26
You create backups, you recover
a long time, it affects RTO.

279
0:15:21,6 --> 0:15:25,48
How much time... RTO, recovery time objective: how much time do you

280
0:15:25,48 --> 0:15:28,04
spend to recover from disaster?

281
0:15:28,9 --> 0:15:33,26
You lost all the nodes, how much
time to get at least something

282
0:15:33,26 --> 0:15:34,9
working, at least 1 node working.

283
0:15:35,14 --> 0:15:38,86
If it's 10 hours for 200 gigabytes,
it's a very bad number.

284
0:15:38,88 --> 0:15:43,5
You need to change things to have
better infrastructure and software

285
0:15:43,5 --> 0:15:44,32
and so on.

286
0:15:44,54 --> 0:15:45,72
And this is 1 thing.

287
0:15:45,72 --> 0:15:48,04
Another thing is when you upgrade,
for example, sometimes you

288
0:15:48,04 --> 0:15:54,32
need to clone, our recipe involves
cloning, a recipe for zero-downtime

289
0:15:54,32 --> 0:15:54,82
upgrades.

290
0:15:55,08 --> 0:16:1,02
And if it takes so many hours,
it's hard to think what will happen

291
0:16:1,02 --> 0:16:2,52
when you have 10 terabytes?

292
0:16:2,72 --> 0:16:4,8
How many days will you need, right?

293
0:16:5,2 --> 0:16:6,86
So it's not okay.

294
0:16:7,36 --> 0:16:9,66
This requires optimization earlier.

295
0:16:10,32 --> 0:16:13,26
Michael: And also for provisioning
replicas, a certain amount

296
0:16:13,26 --> 0:16:16,84
of time taken is always going to
be acceptable, I guess, because

297
0:16:16,84 --> 0:16:18,06
it's not the primary, right?

298
0:16:18,06 --> 0:16:18,62
Like we've maybe

299
0:16:18,62 --> 0:16:18,92
Nikolay: lost...

300
0:16:18,92 --> 0:16:19,62
It depends.

301
0:16:20,56 --> 0:16:24,72
If it's only a primary, you are out of replicas, it's dangerous even

302
0:16:24,72 --> 0:16:30,26
with the most... If I were running with 2 nodes, I would spend

303
0:16:30,48 --> 0:16:33,28
as little time as possible.

304
0:16:33,9 --> 0:16:36,3
Michael: So when you say 2 nodes,
do you mean with 2 replicas?

305
0:16:36,42 --> 0:16:37,76
Oh no, okay, so 1.

306
0:16:37,9 --> 0:16:41,28
Nikolay: Primary and 1 standby,
it's already a degraded state.

307
0:16:41,28 --> 0:16:43,4
We need a third node, right?

308
0:16:43,58 --> 0:16:47,9
To be in, with 3 nodes, you have
how many nines, 12 nines or

309
0:16:47,9 --> 0:16:48,4
so?

310
0:16:49,3 --> 0:16:55,16
If you check the availability of
EC2 instances and just think

311
0:16:55,16 --> 0:17:1,64
about 3 nodes and what's the chance
to be out of anything, this

312
0:17:1,64 --> 0:17:5,32
will be I think 12 nines or something,
as I remember.
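
A back-of-envelope version of that claim, assuming (hypothetically) four nines of availability per node and independent failures:

```python
# Each node independently available 99.99% of the time (assumed figure).
# The cluster is fully down only when all 3 nodes are down at once.
p_node_down = 1 - 0.9999           # 1e-4
p_all_down = p_node_down ** 3      # 1e-12 under the independence assumption
print(f"cluster availability ~ {1 - p_all_down:.12f}")  # -> 12 nines
```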

313
0:17:5,5 --> 0:17:6,86
Michael: That's a lot of nines.

314
0:17:7,08 --> 0:17:8,3
Nikolay: Yeah, yeah, and that's
great.

315
0:17:8,3 --> 0:17:12,0
So it means like almost 0 chances
that you will be down unless

316
0:17:12,28 --> 0:17:16,5
some stupid bug which propagates
to all the nodes and puts Postgres

317
0:17:16,5 --> 0:17:18,36
on its knees on all nodes simultaneously.

318
0:17:19,12 --> 0:17:23,6
Anyway, if it's degraded, and
especially if it's just 1 primary,

319
0:17:23,6 --> 0:17:24,94
it's a very dangerous state.

320
0:17:25,01 --> 0:17:29,72
Like, we have cases in
some clients who run only 1 node

321
0:17:29,72 --> 0:17:33,22
and like, we discuss it with them,
it's not okay, but somehow

322
0:17:33,24 --> 0:17:38,76
these days they just survive
because the cloud became better,

323
0:17:38,76 --> 0:17:41,5
you know, like it doesn't die so
often.

324
0:17:41,68 --> 0:17:43,88
Nodes don't die so often as before.

325
0:17:44,34 --> 0:17:46,8
Michael: I also think there's this
phenomenon where if you're

326
0:17:46,8 --> 0:17:50,92
in US East 1 and you go down, there are
so many other services that

327
0:17:50,92 --> 0:17:54,2
are down at the same time that
you kind of get, not a free pass,

328
0:17:54,2 --> 0:17:56,98
but like they get away with it
a little bit more.

329
0:17:57,1 --> 0:17:58,66
Nikolay: You're in the club right?

330
0:17:58,78 --> 0:17:59,7
Michael: No no no

331
0:18:0,4 --> 0:18:3,98
Nikolay: You are down, others are down, like, we are in the

332
0:18:4,54 --> 0:18:5,28
same club.

333
0:18:5,74 --> 0:18:8,66
Michael: Yeah, I'm not in that
club, but yes, I think a little

334
0:18:8,66 --> 0:18:9,16
bit.

335
0:18:9,92 --> 0:18:13,84
Nikolay: No, I think, yeah, we
have also customers who got in

336
0:18:13,84 --> 0:18:17,08
that trouble as well, and they
seriously think it's not okay,

337
0:18:17,08 --> 0:18:20,74
and they need a multi-region setup.
And I think this drives right

338
0:18:20,74 --> 0:18:24,8
now improvements in many companies
and infrastructure and so

339
0:18:24,8 --> 0:18:25,3
on.

340
0:18:25,38 --> 0:18:26,62
Michael: I'm glad to hear that.

341
0:18:26,82 --> 0:18:27,06
Nikolay: Yeah.

342
0:18:27,06 --> 0:18:31,86
Anyway, running with 1 node is
dangerous, like I would provision

343
0:18:32,2 --> 0:18:33,42
more nodes sooner.

344
0:18:34,08 --> 0:18:36,36
And in some cases you cannot live
with 1 node.

345
0:18:36,36 --> 0:18:40,68
If you already relied on distributing
read-only traffic

346
0:18:40,68 --> 0:18:46,92
among multiple nodes, 1 node won't
be capable of serving this

347
0:18:46,92 --> 0:18:47,42
traffic.

348
0:18:47,78 --> 0:18:50,4
Michael: Yeah, so we've established
speed matters.

349
0:18:50,74 --> 0:18:51,3
Of course.

350
0:18:51,58 --> 0:18:54,1
Sorry, time matters, therefore
speed matters.

351
0:18:54,84 --> 0:18:55,9
How do we go faster?

352
0:18:56,26 --> 0:19:0,06
Nikolay: Yeah, so pg_basebackup,
I expected like, okay, it's single

353
0:19:0,06 --> 0:19:4,0
threaded, so I expect something
like 2, 3, 400 megabytes per

354
0:19:4,0 --> 0:19:4,84
second maximum.

355
0:19:6,34 --> 0:19:9,88
And this is where my expectations
fully matched with what

356
0:19:9,88 --> 0:19:11,82
I saw Maxim shared with me.

357
0:19:12,94 --> 0:19:19,8
But when I started testing I took
2 i4i nodes, it's not the latest;

358
0:19:19,8 --> 0:19:25,54
it's third-generation Intel Scalable Xeon, which is quite outdated;

359
0:19:25,76 --> 0:19:33,14
they have already fifth generation
i7i. So 2 nodes, 32 cores, it's

360
0:19:33,14 --> 0:19:40,12
128 vCPUs, more than a terabyte of memory I think, both, and 75 gigabit

361
0:19:40,12 --> 0:19:41,14
per second network.

362
0:19:42,08 --> 0:19:47,16
And the disks are like, I think
it's 3 million IOPS I remember,

363
0:19:47,16 --> 0:19:51,34
I don't remember how many megs
per second, gigabytes per second.

364
0:19:51,34 --> 0:19:55,36
It's definitely somewhere like
10 or more gigabytes per second

365
0:19:55,76 --> 0:19:57,08
disk throughput maximum.

366
0:19:57,88 --> 0:20:0,94
I think 8 disks already in RAID.

367
0:20:1,16 --> 0:20:2,06
I took Ubuntu.

368
0:20:2,36 --> 0:20:5,7
I installed Postgres 18 and this
was my mistake.

369
0:20:6,42 --> 0:20:10,74
Because this gave me very good
speed for pg_basebackup.

370
0:20:11,34 --> 0:20:11,84
Unexpectedly.

371
0:20:12,56 --> 0:20:14,16
I saw a gigabyte per second.

372
0:20:15,06 --> 0:20:18,04
And I was like, oh, what am I doing
wrong here?

373
0:20:18,04 --> 0:20:19,3
Because it's too fast.

374
0:20:20,08 --> 0:20:24,34
And then I realized it's Postgres
18 improvements, right?

375
0:20:24,8 --> 0:20:25,3
io_uring.

376
0:20:26,28 --> 0:20:29,44
Michael: Well, yeah, I saw you
said that in the blog post.

377
0:20:29,44 --> 0:20:34,36
But did you run it deliberately
with io_uring on?

378
0:20:34,36 --> 0:20:35,2
Nikolay: It's by default.

379
0:20:35,74 --> 0:20:38,48
Michael: Well, no, it's not io_uring
by default.

380
0:20:38,62 --> 0:20:42,34
But it is using something similar,

381
0:20:42,34 --> 0:20:42,9
it's like

382
0:20:42,9 --> 0:20:45,56
Nikolay: that's some pre-fetching
or something but it's definitely

383
0:20:45,56 --> 0:20:46,06
nothing

384
0:20:46,84 --> 0:20:51,06
Michael: Yeah, well, it does have 2 or 3 worker processes

385
0:20:51,14 --> 0:20:54,56
by default, and it is an asynchronous
I/O thing.

386
0:20:54,72 --> 0:20:56,88
Nikolay: No, I didn't change that
setting.

387
0:20:56,88 --> 0:20:59,68
So this is probably an inaccuracy
in my blog post.

388
0:21:0,14 --> 0:21:4,02
Michael: But maybe not bad, because
it might still be AIO.

389
0:21:4,12 --> 0:21:7,4
It might still be asynchronous
I/O, just not io_uring.

390
0:21:7,4 --> 0:21:10,46
Nikolay: So, right, there is the setting io_method, right?

391
0:21:10,46 --> 0:21:14,72
And the default is worker, right?

392
0:21:14,78 --> 0:21:15,28
Yeah.

393
0:21:15,32 --> 0:21:18,22
And it's asynchronous I/O using
worker processes.

394
0:21:18,22 --> 0:21:20,82
Yeah, it's not io_uring, I need
to make correction.
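
For reference, a hedged sketch of the settings being discussed. In Postgres 18, io_method defaults to 'worker' with io_workers = 3; the io_workers value below is just an example, and both settings take effect only after a server restart:

```python
# Hedged sketch: adjust asynchronous I/O settings via ALTER SYSTEM.
import subprocess

for stmt in (
    "ALTER SYSTEM SET io_method = 'worker'",  # or 'io_uring' / 'sync'
    "ALTER SYSTEM SET io_workers = 16",       # example value; default is 3
):
    subprocess.run(["psql", "-c", stmt], check=True)
# Note: both GUCs require a restart, not a reload, to take effect.
```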

395
0:21:20,82 --> 0:21:24,02
Michael: But this is actually interesting,
because I think the

396
0:21:24,02 --> 0:21:30,04
default is only 2 or 3 workers,
which means if you increase that

397
0:21:30,04 --> 0:21:31,84
number, you might see triple, like,

398
0:21:31,84 --> 0:21:32,54
Nikolay: 3 workers, you

399
0:21:32,54 --> 0:21:33,3
Michael: got triple.

400
0:21:33,56 --> 0:21:34,06
Yeah.

401
0:21:34,24 --> 0:21:36,64
So you got roughly triple the

402
0:21:36,74 --> 0:21:39,3
Nikolay: Exactly, it matched my expectation of 300, yeah.

403
0:21:39,72 --> 0:21:43,26
Michael: But if you increase that number, if you increase the

404
0:21:43,26 --> 0:21:45,7
number of workers you might be
able to get even more

405
0:21:45,9 --> 0:21:49,36
Nikolay: Doesn't it mean I need to
spend a few hundred dollars again

406
0:21:49,36 --> 0:21:50,26
for these machines?

407
0:21:50,74 --> 0:21:51,66
I guess so.

408
0:21:53,3 --> 0:21:54,94
Michael: Or an exercise to the
reader.

409
0:21:54,94 --> 0:21:55,82
Nikolay: Here's how it worked.

410
0:21:56,06 --> 0:21:58,68
I started them at 6pm or something.

411
0:21:58,68 --> 0:22:0,0
I worked with them.

412
0:22:0,56 --> 0:22:1,96
Cursor did a lot of work.

413
0:22:1,96 --> 0:22:6,0
Like, I just explained and controlled. I connected tmux, iostat,

414
0:22:6,94 --> 0:22:8,16
iotop, everything.

415
0:22:8,2 --> 0:22:12,38
I see how many threads, everything,
like htop, many things.

416
0:22:12,44 --> 0:22:14,8
So I see that it's doing work as
I do.

417
0:22:14,8 --> 0:22:16,7
Many iterations to polish the approach.

418
0:22:17,04 --> 0:22:20,34
And then I realized, okay, it's
already 9pm, 10pm, but I cannot

419
0:22:20,34 --> 0:22:23,52
drop it because it's already provisioned
and it took time to

420
0:22:23,52 --> 0:22:25,26
create 1 terabyte database.

421
0:22:26,2 --> 0:22:29,44
I had comments from Maxim as well
that 1 terabyte is not enough.

422
0:22:29,44 --> 0:22:30,6
Like, okay, I agree.

423
0:22:30,86 --> 0:22:32,28
I should do 10 terabytes.

424
0:22:32,54 --> 0:22:34,62
So I guess I need to redo this.

425
0:22:34,82 --> 0:22:36,1
I have homework to do.

426
0:22:36,1 --> 0:22:37,54
Michael: You don't need to redo
it.

427
0:22:37,54 --> 0:22:38,72
It's just an interesting-

428
0:22:39,14 --> 0:22:40,98
Nikolay: It's interesting to me
as well.

429
0:22:41,12 --> 0:22:44,34
Because I also want to see Postgres
17, you know?

430
0:22:44,34 --> 0:22:44,82
Yes.

431
0:22:44,82 --> 0:22:46,2
And different settings here.

432
0:22:46,2 --> 0:22:52,04
I think it's interesting how pg_basebackup
can behave with various

433
0:22:52,04 --> 0:22:52,54
settings.

434
0:22:53,0 --> 0:22:57,42
More workers, io_uring as well,
sync as well, right?

435
0:22:57,52 --> 0:22:58,02
Synchronously.

436
0:22:58,44 --> 0:23:1,36
So not asynchronously like here,
but synchronously.

437
0:23:1,74 --> 0:23:2,72
It should go down.

438
0:23:2,72 --> 0:23:3,48
The throughput should go down.

439
0:23:3,48 --> 0:23:3,72
Should go

440
0:23:3,72 --> 0:23:4,44
Michael: back down.

441
0:23:4,66 --> 0:23:5,52
Nikolay: Yeah, yeah, yeah.

442
0:23:5,9 --> 0:23:13,04
So now, obviously, you just make
me do another round of this

443
0:23:13,04 --> 0:23:14,28
experiment, right?

444
0:23:15,02 --> 0:23:21,98
Michael: Well, sorry. But I also think, like, someone else could

445
0:23:21,98 --> 0:23:23,66
do this, right? Someone else could do this.

446
0:23:23,66 --> 0:23:27,52
Nikolay: Yeah, I will have it
in my to-do list, but if someone

447
0:23:27,52 --> 0:23:30,6
who is listening to us is ready
to repeat, maybe they already have

448
0:23:30,6 --> 0:23:31,58
some good machines.

449
0:23:32,32 --> 0:23:34,2
Unfortunately, I don't have credits
anymore.

450
0:23:34,2 --> 0:23:38,26
Like I did; my company had a
lot of credits, but not now.

451
0:23:38,5 --> 0:23:42,94
Maybe someone has credits and can
provision big machines and

452
0:23:43,18 --> 0:23:47,08
just avoid extra spending. Because I think a few more times and it

453
0:23:47,08 --> 0:23:50,42
will be already $1,000 just to
check this setting.

454
0:23:50,86 --> 0:23:55,12
And I'm not interested in checking
on smaller machines and then

455
0:23:55,28 --> 0:23:56,02
like extrapolating.

456
0:23:56,36 --> 0:23:57,22
No, no, no.

457
0:23:57,36 --> 0:23:59,82
It's boring and not serious.

458
0:24:0,3 --> 0:24:1,82
So it should be big machines.

459
0:24:1,82 --> 0:24:7,12
I would also check i7i, because they have 100 gigabit per second.

460
0:24:7,12 --> 0:24:8,5
It's versus 75.

461
0:24:8,94 --> 0:24:14,7
So also extra. I think we can exceed 10 gigabytes per second. And

462
0:24:14,8 --> 0:24:19,74
so, what I've got with pg_basebackup: 1 gigabyte.

463
0:24:20,54 --> 0:24:23,34
I think you are right in terms
of 3 workers.

464
0:24:23,36 --> 0:24:24,48
This is why, right?

465
0:24:24,64 --> 0:24:26,14
Interesting to check more workers.

466
0:24:26,4 --> 0:24:32,62
With pgBackRest, I increased
parallelization and my blog post

467
0:24:32,62 --> 0:24:38,86
has this graph showing
how throughput was growing with

468
0:24:38,86 --> 0:24:41,92
more and more workers and then
saturation happened on the network.

469
0:24:42,18 --> 0:24:47,18
So I achieved 36 terabytes per
hour. And that was also exceeding

470
0:24:47,18 --> 0:24:47,86
my expectations.

471
0:24:47,98 --> 0:24:51,8
I was hoping, like, to be close to 20
maybe, right?

472
0:24:52,12 --> 0:24:53,14
But this machine's,

473
0:24:53,48 --> 0:24:53,86
Michael: yeah.

474
0:24:53,86 --> 0:24:55,08
It sounds fake to me.

475
0:24:55,08 --> 0:24:57,18
It just sounds like not believable.

476
0:24:58,18 --> 0:24:59,74
But it obviously is impressive.

477
0:25:0,18 --> 0:25:4,78
Nikolay: Yeah, so i7i with 100
gigabit per second should give

478
0:25:4,78 --> 0:25:7,54
maybe more than 40, right?

479
0:25:8,2 --> 0:25:10,58
Maybe approaching 50 TB per hour.

480
0:25:11,54 --> 0:25:15,16
This raises the bar and the expectations of what we should have in

481
0:25:15,16 --> 0:25:16,74
our systems these days.

482
0:25:17,64 --> 0:25:20,76
So it should be normal, 5 terabytes
per hour should be normal

483
0:25:20,76 --> 0:25:21,8
now, I think.

484
0:25:22,08 --> 0:25:25,22
Answering my own question some
minutes ago, right?

485
0:25:25,58 --> 0:25:29,34
10 terabytes should be not surprising
already, if you have local

486
0:25:29,34 --> 0:25:29,84
disks.

487
0:25:30,24 --> 0:25:32,28
With EBS volumes, it's different.

488
0:25:32,98 --> 0:25:36,9
Michael: Yeah, for me, I guess
the surprising thing is that we

489
0:25:36,9 --> 0:25:38,66
can saturate the network.

490
0:25:39,14 --> 0:25:42,42
Like, it then becomes about your
network, right?

491
0:25:43,04 --> 0:25:43,68
I think.

492
0:25:44,14 --> 0:25:45,4
And the number of cores.

493
0:25:46,02 --> 0:25:46,52
Nikolay: Yeah.

494
0:25:46,56 --> 0:25:50,86
Yeah, if you saturate the network,
the idea is of course, let's try

495
0:25:50,86 --> 0:25:51,36
compression.

496
0:25:51,82 --> 0:25:52,7
So I did.

497
0:25:52,72 --> 0:25:57,76
And compression improves things,
so it shifts the saturation

498
0:25:57,84 --> 0:26:5,52
point, but it's like quickly saturated
again, and it was not

499
0:26:6,22 --> 0:26:8,24
helping to achieve more.
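
A hedged sketch of that CPU-for-network trade, using pgBackRest's compression options. The values are examples and the stanza name is hypothetical; zstd at a low level is a common choice when the network, not CPU, is the bottleneck:

```python
# Hedged sketch: trade CPU for network bandwidth with compression.
import subprocess

subprocess.run([
    "pgbackrest", "--stanza=main",
    "--process-max=32",
    "--compress-type=zst",   # also: lz4, gz, bz2, none
    "--compress-level=1",    # low level: modest ratio, high speed
    "backup", "--type=full",
], check=True)
```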

500
0:26:8,76 --> 0:26:14,44
Maybe I saturated the disks there
actually.

501
0:26:14,44 --> 0:26:15,98
So if you look at the picture...

502
0:26:16,16 --> 0:26:18,14
Michael: I'll pull up your blog
post as well.

503
0:26:18,24 --> 0:26:26,02
Yeah, because you hit a peak around
32 parallel processes.

504
0:26:26,8 --> 0:26:28,76
Nikolay: Ah, also SSH overhead.

505
0:26:29,2 --> 0:26:33,88
I've got good comments on Twitter
that it would be good to check

506
0:26:33,88 --> 0:26:34,38
TLS.

507
0:26:34,82 --> 0:26:39,82
It would require effort to configure,
but it should also reduce

508
0:26:39,84 --> 0:26:43,6
some overhead and throughput should
be improved.

509
0:26:44,34 --> 0:26:44,8201
Wow.

510
0:26:44,8201 --> 0:26:51,04
So I guess 36 terabytes per hour
is not the current limit for

511
0:26:51,04 --> 0:26:55,74
modern hardware in AWS.

512
0:26:56,8 --> 0:26:58,78
We can squeeze more, right?

513
0:26:58,78 --> 0:27:3,48
So we can probably squeeze up to
50 terabytes per hour.

514
0:27:4,12 --> 0:27:8,82
There are several good ideas, and
also squeeze from pg_basebackup.

515
0:27:10,68 --> 0:27:13,18
So this is a competition, right?

516
0:27:13,18 --> 0:27:18,58
So my intention was to show like
pg_basebackup is bad, single-threaded,

517
0:27:19,12 --> 0:27:20,9
forget about it, here's the recipe.

518
0:27:21,46 --> 0:27:25,52
Now with Postgres 18, it's not
that bad.

519
0:27:25,52 --> 0:27:28,86
It can be tuned additionally, as
you say, probably, right?

520
0:27:28,86 --> 0:27:31,5
And we can have very good speed
with it as well.

521
0:27:31,64 --> 0:27:33,78
So it's interesting.

522
0:27:34,0 --> 0:27:38,38
And 1 gigabyte per second is more
than 3 terabytes per hour.

523
0:27:38,88 --> 0:27:39,58
Michael: Yeah, wow.

524
0:27:42,04 --> 0:27:45,72
Nikolay: Yeah, 3,600 gigabytes
per hour, right?

525
0:27:45,72 --> 0:27:47,14
It's more than 3 terabytes.

526
0:27:47,8 --> 0:27:50,04
So it's already like good enough.
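
The arithmetic behind these numbers, for reference:

```python
# TB/hour = GB/s * 3600 seconds / 1000 GB-per-TB.
def tb_per_hour(gb_per_s: float) -> float:
    return gb_per_s * 3600 / 1000

print(tb_per_hour(1))   # 3.6  -> ~1 GB/s from pg_basebackup
print(tb_per_hour(10))  # 36.0 -> the pgBackRest result reported above
print(tb_per_hour(2))   # 7.2  -> roughly what ~2 GB/s EBS volumes could allow
```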

527
0:27:50,16 --> 0:27:51,7
And these are default settings.

528
0:27:52,76 --> 0:27:55,62
So the answer: should you use the pgBackRest recipe?

529
0:27:55,76 --> 0:27:56,68
Well, it depends.

530
0:27:58,34 --> 0:28:2,0
With Postgres 18... With older Postgres,
I definitely think you

531
0:28:2,0 --> 0:28:4,9
will be limited by 3, 400 megs
per second.

532
0:28:4,9 --> 0:28:5,58
That's it.

533
0:28:5,64 --> 0:28:10,74
Because of the single-threaded nature of pg_basebackup.

534
0:28:11,4 --> 0:28:16,2
And I like this a lot compared
to the tricks we did in the past

535
0:28:16,2 --> 0:28:16,98
with rsync.

536
0:28:17,4 --> 0:28:21,32
rsync is also single-threaded and
I really don't like that.

537
0:28:21,76 --> 0:28:22,54
Why is it so?

538
0:28:22,54 --> 0:28:23,5
It should be implemented.

539
0:28:24,0 --> 0:28:29,44
People use GNU parallel and so
on, but it feels so cumbersome,

540
0:28:29,62 --> 0:28:30,12
right?

541
0:28:30,18 --> 0:28:34,94
So it's not a lightweight approach
to write some things and then

542
0:28:34,94 --> 0:28:37,0
control the threads and so on.

543
0:28:37,82 --> 0:28:40,06
With rsync, it's definitely single-threaded.
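
A hedged sketch of that workaround: one rsync per top-level entry of the data directory, run concurrently. Paths and host are hypothetical, and a real run needs the backup-mode orchestration discussed just below:

```python
# Hedged sketch of "GNU parallel + rsync" done with a Python thread pool.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SRC = Path("/var/lib/postgresql/18/main")        # hypothetical source PGDATA
DST = "replica.internal:/var/lib/postgresql/18/main"

def copy(entry: Path) -> None:
    # --whole-file skips delta computation, which helps on fast networks.
    subprocess.run(["rsync", "-a", "--whole-file", str(entry), DST], check=True)

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(copy, SRC.iterdir()))
```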

544
0:28:40,9 --> 0:28:45,42
pg_basebackup in Postgres 18 basically
shines compared to rsync.

545
0:28:46,26 --> 0:28:50,14
And also it orchestrates all the stuff additionally, like

546
0:28:50,14 --> 0:28:54,56
connecting to the primary, telling it, I'm copying you,

547
0:28:54,56 --> 0:28:58,86
right? pg_start_backup, pg_stop_backup, basically, right? So you

548
0:28:58,86 --> 0:29:3,18
don't need to remember all those
things, and it's also standard,

549
0:29:3,34 --> 0:29:3,84
officially.

550
0:29:6,04 --> 0:29:9,78
Michael: Although, I would say, because of researching

551
0:29:9,8 --> 0:29:14,72
for this, I was looking at the
pgBackRest docs and GitHub repo

552
0:29:15,1 --> 0:29:19,58
just earlier today, and it's incredibly well maintained. Like, I

553
0:29:19,58 --> 0:29:20,8
was surprised to see

554
0:29:20,8 --> 0:29:23,5
Nikolay: Yeah, it's mostly David Steele; yeah, it's great.

555
0:29:23,68 --> 0:29:27,38
Michael: Yeah, kudos to David. But
also just in terms of viewing

556
0:29:27,38 --> 0:29:30,1
a project for the first time in
a while... I hadn't

557
0:29:30,1 --> 0:29:32,14
like looked into it in detail for
a while.

558
0:29:32,32 --> 0:29:35,58
And I think I was just a bit surprised
when I looked at the open

559
0:29:35,58 --> 0:29:35,86
issues.

560
0:29:35,86 --> 0:29:38,8
I was like, there's only 54 open
issues?

561
0:29:39,84 --> 0:29:44,16
Like that's, for the size of the
project, for the age of the

562
0:29:44,16 --> 0:29:49,02
project, that for me seems like...
and by the way, I'm not taking

563
0:29:49,02 --> 0:29:51,26
that number on its own, looking
at what they were, like some

564
0:29:51,26 --> 0:29:53,56
of them are like feature ideas
from a few years ago that are

565
0:29:53,56 --> 0:29:55,62
just left open because they're
still good ideas.

566
0:29:55,84 --> 0:29:57,18
Some of them are new things.

567
0:29:57,38 --> 0:30:0,92
Nikolay: That's good because there's
an approach where people just

568
0:30:0,92 --> 0:30:4,44
close if it's like, okay, no comments
for a month, let's close.

569
0:30:4,44 --> 0:30:9,22
And I absolutely hate it because
if you reported a bug which

570
0:30:9,22 --> 0:30:13,7
is not fixed, CloudNativePG, hello,
like they just closed because,

571
0:30:13,78 --> 0:30:14,28
right?

572
0:30:14,8 --> 0:30:18,42
Because it's a little bit of a
discussion, but the bug is still there

573
0:30:18,42 --> 0:30:20,4
and everyone around agrees it's
still there.

574
0:30:20,4 --> 0:30:21,56
Why do you close my issue?

575
0:30:21,56 --> 0:30:22,58
It's a bug report.

576
0:30:23,48 --> 0:30:25,62
So next time I won't write anything,
right?

577
0:30:25,94 --> 0:30:27,88
And I told it everywhere.

578
0:30:27,88 --> 0:30:31,7
And in CloudNativePG, I'm not
going to contribute anyhow else.

579
0:30:31,92 --> 0:30:35,92
So this is like, let's just keep
the issue set lean, right?

580
0:30:35,92 --> 0:30:36,88
It's not okay.

581
0:30:37,26 --> 0:30:40,68
If it's a serious problem or idea,
it should be there.

582
0:30:40,68 --> 0:30:43,94
So the fact that there are all
the issues there is great.

583
0:30:44,34 --> 0:30:45,22
Michael: Yeah, yes.

584
0:30:45,28 --> 0:30:48,68
So all I meant is, you kind of
get these... Sometimes you get red

585
0:30:48,68 --> 0:30:51,26
flags when you look at a new project
or, like, if you see

586
0:30:51,26 --> 0:30:54,44
a project and it's got 20,000 open issues, for me that is a red

587
0:30:54,44 --> 0:30:57,98
flag. Like, not that there's lots of issues, but that

588
0:30:57,98 --> 0:31:1,22
no one's maintaining it, no one's closing
the ones that are duplicates

589
0:31:1,32 --> 0:31:2,3
or things.

590
0:31:2,44 --> 0:31:5,14
So it was more just like, I wasn't
getting red flags and then

591
0:31:5,14 --> 0:31:8,94
there were loads of green flags
looking at, like, recent PRs,

592
0:31:9,24 --> 0:31:11,18
and it just, it seemed like a route.

593
0:31:11,18 --> 0:31:17,38
So I know it's not in core, but
it does still feel like a very

594
0:31:17,38 --> 0:31:20,4
tried and tested product that's
really well maintained.

595
0:31:20,5 --> 0:31:22,7
And I know a lot of people are
using it.

596
0:31:23,36 --> 0:31:27,0
So it feels pretty much as close
to core as like a...

597
0:31:27,28 --> 0:31:30,9
Nikolay: What if you see hundreds
or thousands of issues open,

598
0:31:30,9 --> 0:31:34,04
but many of them are recent and
there is activity.

599
0:31:35,0 --> 0:31:38,5
Is it not well maintained, or a mess?

600
0:31:38,56 --> 0:31:39,44
Michael: Spam issue?

601
0:31:39,44 --> 0:31:41,78
Like, how can you create hundreds or thousands of issues?

602
0:31:41,82 --> 0:31:44,86
I just haven't seen a repo like that, so maybe I have to, like,

603
0:31:44,86 --> 0:31:47,18
reassess. But yeah.

604
0:31:49,54 --> 0:31:51,36
Have you seen a repo like that?

605
0:31:52,0 --> 0:31:53,6
Nikolay: I have such projects.

606
0:31:54,34 --> 0:31:54,84
Oh, okay.

607
0:31:54,84 --> 0:31:57,36
So it's just, yeah, there are some old issues that should be

608
0:31:57,36 --> 0:31:58,76
closed, you know, yeah.

609
0:31:58,86 --> 0:31:59,36
Yes.

610
0:31:59,45 --> 0:32:3,82
Well, if you check... well, so I agree this is a great example

611
0:32:3,84 --> 0:32:5,82
of a well-maintained open-source project.

612
0:32:5,92 --> 0:32:6,76
Absolutely great.

613
0:32:6,76 --> 0:32:7,4
I agree.

614
0:32:8,1 --> 0:32:12,32
I think if you check some internal repositories in companies,

615
0:32:13,48 --> 0:32:15,28
sometimes it's a mess as well.

616
0:32:17,2 --> 0:32:18,5
Michael: And sometimes it's beautiful.

617
0:32:19,64 --> 0:32:21,4
I think it depends a lot on the maintainer

618
0:32:21,66 --> 0:32:25,24
Nikolay: What can you tell about Postgres itself? Yeah.

619
0:32:25,24 --> 0:32:26,9
Michael: Well, we don't have an issue tracker.

620
0:32:26,92 --> 0:32:31,02
Nikolay: Yeah, no issues, no, no issues, right? Well, like...

621
0:32:31,26 --> 0:32:33,22
Michael: We should get that, that would be a good t-shirt.

622
0:32:33,48 --> 0:32:34,06
No issues.

623
0:32:34,06 --> 0:32:39,02
No issues. All right, let's get back to the topic.

624
0:32:39,02 --> 0:32:42,68
Nikolay: Back to the topic. So David Steele reached out to me, commenting,

625
0:32:42,9 --> 0:32:48,54
and it's great. So, some inaccuracies in my blog post to fix.

626
0:32:49,28 --> 0:32:55,48
And also, obviously, it inspired him, as I understand this

627
0:32:55,48 --> 0:33:1,66
idea existed already for quite some time, to have the feature to basically

628
0:33:1,94 --> 0:33:5,28
issue renice, to reprioritize some processes, right?

629
0:33:5,64 --> 0:33:10,02
So now there is a pull request to be able to change priority

630
0:33:10,52 --> 0:33:12,72
of pgBackRest processes, which is great.

631
0:33:12,72 --> 0:33:14,1
Michael: Yeah, so this is super cool.

632
0:33:14,1 --> 0:33:18,02
So this is the idea that because you're running this on production,

633
0:33:18,52 --> 0:33:22,08
you might not want to, you know, let's say you're quite high

634
0:33:22,08 --> 0:33:24,34
CPU on your primary.

635
0:33:24,34 --> 0:33:24,84
Nikolay: Compression.

636
0:33:25,84 --> 0:33:26,0
Yeah.

637
0:33:26,0 --> 0:33:29,64
You use compression, so you need CPU for pgBackRest, for example.

638
0:33:30,42 --> 0:33:30,78
Michael: Yeah.

639
0:33:30,78 --> 0:33:33,77
But you don't want to affect your current traffic, right?

640
0:33:33,77 --> 0:33:34,07
It reminds

641
0:33:34,07 --> 0:33:39,0
Nikolay: me of old days, like 2003 or 2004, when we also...

642
0:33:39,38 --> 0:33:41,54
It's so strange, like, it was...

643
0:33:42,38 --> 0:33:46,2
We couldn't think about terabytes those days, 20 plus years ago.

644
0:33:46,3 --> 0:33:47,2
It was too much.

645
0:33:47,2 --> 0:33:50,14
Terabyte is a mind-blowing number.

646
0:33:50,58 --> 0:33:55,94
But we used, I remember, renice, it is renice, ionice, something,

647
0:33:56,32 --> 0:33:57,72
I barely remember from those days.

648
0:33:57,72 --> 0:33:58,94
Michael: Just NICE, yeah?

649
0:33:58,94 --> 0:34:0,54
Nikolay: Yeah, NICE and so on.
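
A hedged sketch of that idea: wrapping a pgBackRest run in nice and ionice so it yields CPU and I/O priority to production traffic. The stanza name is hypothetical; the pull request mentioned above would make this a native pgBackRest option instead:

```python
# Hedged sketch: start a backup at lower CPU and I/O scheduling priority.
import subprocess

subprocess.run([
    "nice", "-n", "10",               # lower CPU scheduling priority
    "ionice", "-c", "2", "-n", "7",   # best-effort I/O class, lowest priority
    "pgbackrest", "--stanza=main", "backup", "--type=full",
], check=True)
```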

650
0:34:1,8 --> 0:34:4,9
It's strange that this word is used to change priority, right?

651
0:34:4,9 --> 0:34:5,78
I don't know.

652
0:34:5,96 --> 0:34:7,28
Do you know details here?

653
0:34:7,9 --> 0:34:9,02
Yeah, no, I don't.

654
0:34:9,32 --> 0:34:9,52
Yeah.

655
0:34:9,52 --> 0:34:9,72
Yeah.

656
0:34:9,72 --> 0:34:13,36
So I remember from those days we used it, but also I remember

657
0:34:13,58 --> 0:34:15,76
I was struggling to see good effects.

658
0:34:15,76 --> 0:34:21,24
So now if this pops up again, I would check, test it properly

659
0:34:21,28 --> 0:34:26,92
again, how exactly it helps, you know. But this is in the to-do.

660
0:34:27,84 --> 0:34:32,22
Michael: And really cool that these benchmarks helped

661
0:34:32,22 --> 0:34:33,26
prioritize that work.

662
0:34:33,26 --> 0:34:33,98
So cool.

663
0:34:34,06 --> 0:34:37,08
Nikolay: It inspires me to do more benchmarks as well.

664
0:34:38,3 --> 0:34:38,8
Michael: Always.

665
0:34:39,0 --> 0:34:41,92
I get stuck in that loop sometimes, not just with benchmarks,

666
0:34:42,04 --> 0:34:44,54
but you know, you start to implement a feature, then

667
0:34:44,54 --> 0:34:46,56
you think of a better way to implement it, then you think of

668
0:34:46,56 --> 0:34:48,84
a better way, and you think of a better way, and you get stuck

669
0:34:48,84 --> 0:34:51,6
in that loop of constant, yeah, or blog posts.

670
0:34:51,6 --> 0:34:51,9
Nikolay: Yeah.

671
0:34:51,9 --> 0:34:53,9
This post is definitely not the final.

672
0:34:54,32 --> 0:34:56,94
So somebody should achieve 50 terabytes per hour.

673
0:34:56,94 --> 0:34:57,94
This is a challenge.

674
0:34:58,66 --> 0:34:59,48
Yeah, I like it.

675
0:34:59,48 --> 0:35:0,94
Yeah, Who can do it?

676
0:35:1,32 --> 0:35:5,04
Who will be the first in the Postgres community reporting 50TB

677
0:35:5,38 --> 0:35:6,06
per hour?

678
0:35:6,48 --> 0:35:7,62
Michael: Yeah, let us know.

679
0:35:8,48 --> 0:35:14,44
Although, I did see there was an issue that David linked us to,

680
0:35:14,68 --> 0:35:17,76
where somebody maxed out a 100-core machine,

681
0:35:17,76 --> 0:35:23,98
and it convinced him to increase the limit; he had

682
0:35:23,98 --> 0:35:26,66
process-max capped at 99 before then.

683
0:35:27,08 --> 0:35:31,5
So I thought that was quite funny that, you know, there was a,

684
0:35:31,5 --> 0:35:34,66
the limiting factor was this hard-coded limit, yeah.

685
0:35:34,82 --> 0:35:38,94
Nikolay: The maximum number of vCPUs on AWS, I think,

686
0:35:39,72 --> 0:35:46,0
has it exceeded 1,000 already, or no? I remember 700 vCPUs with

687
0:35:46,0 --> 0:35:53,94
also fifth-generation Intel Scalable, so it's already like, just,

688
0:35:54,16 --> 0:35:55,26
we have a lot.

689
0:35:55,76 --> 0:36:0,64
Michael: Bear in mind your test maxed out at 32 processes, right?

690
0:36:0,82 --> 0:36:1,96
Right, but this one...

691
0:36:2,02 --> 0:36:3,1
Nikolay: Bumped into network.

692
0:36:4,12 --> 0:36:5,72
Michael: Yes, okay.

693
0:36:6,02 --> 0:36:10,9
Nikolay: And then when I increased compression it shifted, right?

694
0:36:10,9 --> 0:36:14,18
So it shifted to 64 cores.

695
0:36:15,46 --> 0:36:19,24
Michael: So I'm wondering what do you think the person in the

696
0:36:19,24 --> 0:36:20,78
open issue back in 2019,

697
0:36:21,34 --> 0:36:24,64
maxed out their hundred cores, like, what do you reckon their

698
0:36:25,12 --> 0:36:26,14
throughput was?

699
0:36:26,68 --> 0:36:30,82
Nikolay: Lower. It's interesting, but maybe, you know, like,

700
0:36:31,4 --> 0:36:37,28
we need compression not only to battle the network throughput limit,

701
0:36:37,36 --> 0:36:42,16
but also, if it's an actual backup, sometimes we want

702
0:36:42,16 --> 0:36:46,6
it to take as little space as possible in S3.

703
0:36:46,98 --> 0:36:47,26
Right.

704
0:36:47,26 --> 0:36:48,54
Just to pay less, maybe.

705
0:36:48,64 --> 0:36:48,94
Right.

706
0:36:48,94 --> 0:36:55,22
In this case, we can have aggressive
compression and CPU consumption

707
0:36:55,38 --> 0:36:55,88
high.

708
0:36:56,46 --> 0:37:1,82
In this case, it can be 128 cores,
for example, third generation

709
0:37:1,92 --> 0:37:9,52
of Intel Scalable Xeon has 128
cores maximum, I think, at least

710
0:37:9,52 --> 0:37:10,26
in clouds.

711
0:37:11,0 --> 0:37:17,58
It's N2 on GCP and I4I, as I used
on AWS.

712
0:37:19,16 --> 0:37:28,16
But I wonder a lot, how come AWS
created 700 or 800 vCPU machines?

713
0:37:28,32 --> 0:37:29,32
It's like, wow.

714
0:37:30,14 --> 0:37:32,66
So like 10 plus terabytes of memory.

715
0:37:33,52 --> 0:37:34,02
Wow.

716
0:37:34,44 --> 0:37:34,94
Yeah.

717
0:37:34,94 --> 0:37:35,44
Michael: Yeah.

718
0:37:36,1 --> 0:37:39,36
By the way, when you kept saying the Google disks thing earlier,

719
0:37:39,36 --> 0:37:42,1
I think, every
time you said that, all I

720
0:37:42,1 --> 0:37:43,12
could hear was PTSD.

721
0:37:43,44 --> 0:37:43,94
PTSD.

722
0:37:45,4 --> 0:37:45,85
Nikolay: PD, SSD.

723
0:37:45,85 --> 0:37:46,71
Persistent disk, SSD.

724
0:37:46,71 --> 0:37:47,34
Yeah, yeah.

725
0:37:47,34 --> 0:37:47,84
I

726
0:37:50,48 --> 0:37:51,14
Michael: heard PTSD.

727
0:37:51,28 --> 0:37:54,18
And then when you're saying eye
for eye, all I can think of is

728
0:37:54,72 --> 0:37:57,28
the saying, an eye for an eye makes the whole world blind.

729
0:37:57,7 --> 0:37:58,34
So now...

730
0:37:58,58 --> 0:37:59,52
Nikolay: Eye for eye.

731
0:38:0,02 --> 0:38:0,52
Michael: Yeah.

732
0:38:0,84 --> 0:38:5,64
Nikolay: Oh, by the way, 1 interesting
thing, like EBS volumes

733
0:38:5,66 --> 0:38:8,76
these days also are built on top
of NVMes.

734
0:38:9,24 --> 0:38:12,44
I don't know what it's called, Nitro architecture.

735
0:38:12,62 --> 0:38:13,94
It's always the new name.

736
0:38:13,94 --> 0:38:18,3
I don't follow the, all the terms,
but I remember 2 gigabytes

737
0:38:18,38 --> 0:38:19,96
per second and even more.

738
0:38:20,24 --> 0:38:20,54
Right.

739
0:38:20,54 --> 0:38:25,38
So I think we can squeeze 5 to
6 7 terabytes per hour on modern

740
0:38:25,38 --> 0:38:26,58
EBS volumes as well

741
0:38:26,58 --> 0:38:27,94
Michael: Even on EBS? Wow.

742
0:38:27,94 --> 0:38:34,0
Nikolay: Yes, yes, yes. So forget about 1 terabyte per hour.

743
0:38:34,24 --> 0:38:37,18
Michael: It's old, you can do better.

744
0:38:37,34 --> 0:38:39,8
Nikolay: Yeah, definitely better.
if you have a large database,

745
0:38:39,8 --> 0:38:40,46
do better.

746
0:38:41,04 --> 0:38:41,54
Michael: Nice.

747
0:38:41,58 --> 0:38:42,04
All right.

748
0:38:42,04 --> 0:38:42,54
Good.

749
0:38:42,6 --> 0:38:43,6
Thanks so much, Nikolay.

750
0:38:43,6 --> 0:38:44,26
I enjoyed that.

751
0:38:44,26 --> 0:38:44,80
Take care.

752
0:38:44,80 --> 0:38:45,77
Nikolay: Bye bye.