1
00:00:00,038 --> 00:00:03,817
Michael: Hello, and welcome to Postgres FM, a
weekly show about all things PostgreSQL.

2
00:00:04,178 --> 00:00:08,858
I am Michael, founder of pgMustard, and this
is my co-host, Nikolay, founder of Postgres.AI.

3
00:00:09,218 --> 00:00:09,518
Hey, Nick.

4
00:00:09,523 --> 00:00:09,788
Nikolay: Hi.

5
00:00:10,148 --> 00:00:11,228
What are we gonna be talking about today?

6
00:00:11,738 --> 00:00:11,888
Nikolay: Yeah.

7
00:00:11,888 --> 00:00:12,938
Hello, Michael.

8
00:00:13,005 --> 00:00:19,440
As we decided, let's talk about index maintenance:
first of all, bloat removal, but maybe not only that, right?

9
00:00:19,954 --> 00:00:26,595
Michael: Yeah, we've alluded to this in a previous episode
around vacuum, I believe, or about bloat specifically.

10
00:00:26,725 --> 00:00:29,320
So yeah, excited to dive into this with you.

11
00:00:29,560 --> 00:00:36,081
So should we start with how this occurs, perhaps,
or a quick recap on whether this is always a problem?

12
00:00:36,081 --> 00:00:40,341
If I have a Postgres database, is it very
likely I'm suffering from this at the moment?

13
00:00:40,341 --> 00:00:42,396
Or is there a chance that it's fine?

14
00:00:42,652 --> 00:00:43,736
Nikolay: Yeah, by the way, you are right.

15
00:00:43,736 --> 00:00:48,386
Our episode was called vacuum, not bloat, but they're so close to each other.

16
00:00:48,391 --> 00:00:48,656
Right.

17
00:00:48,656 --> 00:00:58,491
Because usually we talk about bloat and a lack of vacuuming, or some
inefficient vacuuming. So this is a great question. We were

18
00:00:58,491 --> 00:01:02,459
just asked: what if we tuned our autovacuum to be quite aggressive?

19
00:01:02,909 --> 00:01:04,169
Everything looks fine.

20
00:01:04,679 --> 00:01:09,329
The question is still: should we have index maintenance from time to time?

21
00:01:09,359 --> 00:01:10,469
Is it inevitable?

22
00:01:10,829 --> 00:01:18,122
And in my opinion, from my practice, the answer is
yes, due to many reasons, and autovacuum won't solve

23
00:01:18,122 --> 00:01:21,302
everything; some bloat will still accumulate.

24
00:01:21,302 --> 00:01:23,372
Even if you have very aggressive autovacuum.

25
00:01:23,753 --> 00:01:29,183
And observing other database systems,
for example Microsoft SQL Server:

26
00:01:29,543 --> 00:01:34,643
they also have index maintenance as a routine task for DBAs.

27
00:01:34,686 --> 00:01:43,590
In my opinion, in heavily loaded, growing, large systems,
you should still recreate indexes from time to time. Aggressive auto

28
00:01:43,590 --> 00:01:51,297
vacuuming will only reduce the frequency of this need, so it
comes less often, but still you need to recreate them.

29
00:01:51,911 --> 00:01:54,830
Michael: I think that's a really good point on heavily loaded systems.

30
00:01:54,830 --> 00:02:00,603
I think that probably the only caveat I would put is
if you've got a relatively light load on your Postgres

31
00:02:00,603 --> 00:02:03,576
database, this might be something you don't come across.

32
00:02:03,606 --> 00:02:10,068
Even if you haven't tuned autovacuum, it
will be tidying things up as it goes along, freeing up

33
00:02:10,243 --> 00:02:13,263
index pages, especially on later versions of Postgres.

34
00:02:13,291 --> 00:02:18,652
There's some additional logic to make that
even less likely to bloat. But yeah, I think it's

35
00:02:18,652 --> 00:02:21,022
even worse within indexes than for tables, right?

36
00:02:21,052 --> 00:02:24,382
Like, vacuum is able to free up space in tables much...

37
00:02:24,462 --> 00:02:32,138
Well, refer to a previous episode for more details, but I think
in tables you can free up space and it's much more easily reused.

38
00:02:32,298 --> 00:02:39,088
Whereas in a B-tree index, if you get page splits, vacuum can
free up the space in those pages again, but it can't un...

39
00:02:39,243 --> 00:02:41,133
Nikolay: It doesn't rebalance between pages, right.

40
00:02:41,673 --> 00:02:44,023
It doesn't rebalance that. And...

41
00:02:44,023 --> 00:02:45,135
I agree with you.

42
00:02:45,135 --> 00:02:54,919
Some systems might not need automatic index recreation, but I'm sure
everyone needs monitoring and analysis of bloat on a regular basis.

43
00:02:54,979 --> 00:03:02,500
So this is a must for everyone, in my opinion, and the question
is how to analyze bloat, because it's not a trivial task.

44
00:03:02,500 --> 00:03:08,632
All the scripts we have for fast, lightweight
bloat analysis, they are all wrong.

45
00:03:08,886 --> 00:03:11,585
I mean, they can have some errors.

46
00:03:12,085 --> 00:03:13,465
They are not precise.

47
00:03:13,732 --> 00:03:21,250
For example, create a table with three columns: smallint,
timestamptz (don't use timestamp without time

48
00:03:21,250 --> 00:03:25,154
zone), and smallint again; fill it with a few million rows.

49
00:03:25,701 --> 00:03:33,726
Create an index and use your script to estimate bloat in the table and
in the index; actually, it's two different scripts, right?

50
00:03:33,726 --> 00:03:41,043
But still, at least for the table, I'm sure you will see
terrible bloat, but we know there is no bloat there yet:

51
00:03:41,048 --> 00:03:44,673
just freshly inserted rows, no deletes or updates, so no bloat.

52
00:03:45,273 --> 00:03:47,253
You'll see something like 30%.

53
00:03:47,860 --> 00:03:48,520
Estimated.

54
00:03:48,880 --> 00:03:56,386
So we need to keep in mind that our scripts
have errors, sometimes quite significant ones, because they

55
00:03:56,406 --> 00:04:00,514
don't take into account alignment padding. The experiment

56
00:04:00,534 --> 00:04:09,370
I just described was designed on purpose to have padding-byte
gaps between column values inside each page.

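A minimal sketch of the experiment just described; the table name, column names, and row count are illustrative:

```sql
-- Sketch of the alignment-padding experiment (hypothetical names).
-- timestamptz needs 8-byte alignment, so after the 2-byte smallint
-- Postgres inserts 6 padding bytes, and the row end is padded too.
CREATE TABLE padding_demo (
  id1 smallint,
  ts  timestamptz,
  id2 smallint
);

INSERT INTO padding_demo
SELECT (i % 100)::smallint, now(), (i % 100)::smallint
FROM generate_series(1, 3000000) AS i;

CREATE INDEX ON padding_demo (ts);

-- A typical estimation script now reports roughly 30% "bloat" on the
-- freshly filled table, although nothing was ever updated or deleted.
```
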
57
00:04:09,540 --> 00:04:11,160
I'm not sure, by the way, about the index.

58
00:04:11,190 --> 00:04:13,710
It should also have some bloat.

59
00:04:13,710 --> 00:04:16,800
Maybe not; maybe I'm wrong here.

60
00:04:17,070 --> 00:04:19,560
And maybe it's only about the heap.

61
00:04:20,025 --> 00:04:22,695
Michael: Maybe multi-column indexes, but I haven't checked.

62
00:04:23,128 --> 00:04:23,398
Nikolay: right?

63
00:04:23,608 --> 00:04:27,206
Anyway, estimation scripts are great because they are light.

64
00:04:27,206 --> 00:04:32,887
But I always correct everyone who says "bloat is this much": I say
"estimated bloat is this much", because it's not fully accurate.

65
00:04:33,097 --> 00:04:37,390
The real number can be obtained by using pgstattuple.

66
00:04:37,830 --> 00:04:42,688
It's just an extension. By the way, I had no luck using it;

67
00:04:43,238 --> 00:04:45,358
I had problems with it in the past.

68
00:04:45,388 --> 00:04:46,898
So I don't use it myself.

69
00:04:47,009 --> 00:04:54,255
In my approach, since we work a lot with clones of production environments,
I always say: let's just run VACUUM FULL on the clone, because why not?

70
00:04:54,621 --> 00:04:56,211
And compare numbers before and after.

71
00:04:56,211 --> 00:04:56,601
And this is

72
00:04:57,168 --> 00:04:59,371
a reliable number for bloat.

73
00:04:59,731 --> 00:05:04,480
Like, this is a real, exact number,
because VACUUM FULL can show it to us.

74
00:05:04,570 --> 00:05:06,760
So the clones are cool here as well.

75
00:05:06,894 --> 00:05:08,634
But you need to wait a little bit of course.

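For reference, a sketch of both approaches mentioned here; the relation names are illustrative:

```sql
-- Exact numbers via the pgstattuple extension (scans the whole relation):
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_table');       -- dead_tuple_percent, free_percent, ...
SELECT * FROM pgstatindex('my_table_pkey');  -- avg_leaf_density, leaf_fragmentation, ...

-- Or, brute force on a disposable clone (never on production):
SELECT pg_total_relation_size('my_table');   -- before
VACUUM FULL my_table;
SELECT pg_total_relation_size('my_table');   -- after; the delta is the real bloat
```
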
76
00:05:08,689 --> 00:05:14,407
Michael: Yeah, I guess this leads us to quite an interesting
part of the topic, which is when should you worry about this?

77
00:05:14,407 --> 00:05:18,702
And I might even argue that 30% bloat is probably not that bad, you know?

78
00:05:18,757 --> 00:05:19,957
Nikolay: No, not that bad.

79
00:05:20,117 --> 00:05:30,527
But if it shows 60%, and 30 of those 60 points are estimation error,
that skews things. The fact that it's an estimate...

80
00:05:30,527 --> 00:05:31,967
It affects our decisions anyway.

81
00:05:32,397 --> 00:05:37,437
Michael: For sure, but just to give people peace of mind,
you know, when we are talking about badly bloated indexes,

82
00:05:37,437 --> 00:05:40,857
they could easily be triple the size of a reindexed one.

83
00:05:40,887 --> 00:05:42,417
Nikolay: Oh, this is great.

84
00:05:42,417 --> 00:05:43,722
By the way, this is an exercise

85
00:05:43,722 --> 00:05:47,652
I do usually: people see 90% bloat and they ask, is it bad?

86
00:05:47,652 --> 00:05:53,622
And I say: 90% bloat means that your index size is
10 times bigger than it would be without bloat.

87
00:05:53,832 --> 00:05:56,202
99% means a hundred times.

88
00:05:56,704 --> 00:05:57,604
a hundred times bigger.

89
00:05:57,604 --> 00:06:00,064
It's already quite noticeable.

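To spell out the arithmetic: with a bloat fraction b, actual size is about ideal size / (1 - b), so b = 0.90 gives 1 / 0.10 = 10 times bigger, and b = 0.99 gives 1 / 0.01 = 100 times bigger.
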
90
00:06:00,124 --> 00:06:06,033
And by the way, I wanted to mention those so-called lightweight estimate scripts.

91
00:06:06,063 --> 00:06:08,253
They sometimes are not light at all.

92
00:06:08,253 --> 00:06:16,775
And we have many cases when they fail to finish within the statement
timeout, like 15 or 30 seconds, because there are too many indexes.

93
00:06:17,265 --> 00:06:19,260
And analysis takes time as well.

94
00:06:19,260 --> 00:06:23,726
So this is not something you should put into monitoring to run every minute.

95
00:06:23,726 --> 00:06:25,866
Probably you don't need it every minute.

96
00:06:25,866 --> 00:06:29,699
You need it once per day, maybe, because it doesn't change very fast.

97
00:06:30,209 --> 00:06:36,286
But back to the question: we have 99%
bloat, meaning our index is a hundred times

98
00:06:36,291 --> 00:06:37,709
bigger. The question is: is it bad?

99
00:06:38,318 --> 00:06:38,888
Michael: Or why?

100
00:06:38,948 --> 00:06:39,368
Why

101
00:06:39,469 --> 00:06:41,329
Nikolay: we have this space, for example.

102
00:06:41,539 --> 00:06:42,809
Yeah, oh yes.

103
00:06:42,809 --> 00:06:44,536
It's bad, but why is it bad?

104
00:06:44,716 --> 00:06:46,036
How is it bad?

105
00:06:46,341 --> 00:06:51,745
Michael: Yeah, this is a fun, kind of very specific
thing that we came across working with query plans.

106
00:06:51,775 --> 00:06:59,485
And it's funny, because this feels like a macro analysis problem,
which we discussed last week, you know, system level:

107
00:06:59,485 --> 00:07:03,535
what's going on, but you can spot it sometimes from a single query plan.

108
00:07:03,540 --> 00:07:10,189
So if you notice your queries are slower, or they could be
degrading over time: the same query is maybe

109
00:07:10,189 --> 00:07:14,089
doing an index scan, but that is getting slower over time.

110
00:07:14,239 --> 00:07:23,136
And you look at buffers (again, a previous episode), and you can sometimes
see that those buffer numbers are way higher than they need to be for

111
00:07:23,136 --> 00:07:26,903
the amount of data involved or gradually increasing each time you run it.

112
00:07:27,203 --> 00:07:32,363
So it's not guaranteed that that's a sign of
bloat, but there's a really good chance that it is.

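A sketch of what that looks like in practice; the query and table are hypothetical:

```sql
-- If this index scan reads far more buffers than the handful of matching
-- rows justifies, and the number creeps up run after run, a bloated
-- index is a likely suspect.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders              -- hypothetical table
WHERE customer_id = 42;  -- assumes an index on customer_id
```
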
113
00:07:32,694 --> 00:07:38,004
Nikolay: This is great. By the way, you apply this
classification of micro and macro, and I

114
00:07:38,004 --> 00:07:40,568
hadn't even thought about it, but it exactly

115
00:07:40,724 --> 00:07:42,074
aligns with my thoughts.

116
00:07:42,494 --> 00:07:46,844
So we have macro effects and micro effects; let's start with micro effects.

117
00:07:47,234 --> 00:07:55,423
Sometimes particular queries with particular parameters
might behave much worse in the case of a bloated index.

118
00:07:55,423 --> 00:08:01,940
Because, for example, instead of dealing with a few
buffers, we need to deal with entries which are sparsely

119
00:08:01,940 --> 00:08:04,820
stored, and we need to involve many more buffers.

120
00:08:05,030 --> 00:08:10,684
So we can see degradation of sometimes several
orders of magnitude in extreme cases.

121
00:08:10,708 --> 00:08:16,321
But it's tricky to find. For example, you
checked your query for a few parameter sets.

122
00:08:16,670 --> 00:08:20,629
You see it's not bad, comparing bloated versus un-bloated, right?

123
00:08:20,992 --> 00:08:26,002
But you don't look at other parameters,
and for other parameters it may be worse.

124
00:08:26,002 --> 00:08:33,652
So it's tricky to automatically check how bad it
is, because actually B-tree height grows quite slowly.

125
00:08:34,226 --> 00:08:38,038
It's like a logarithm with a very high base, so it grows very slowly.

126
00:08:38,308 --> 00:08:41,334
And if we go from, I don't know, like

127
00:08:41,983 --> 00:08:48,949
a thousand buffers to a hundred thousand buffers
for overall index size, or a million buffers already,

128
00:08:49,283 --> 00:09:02,573
we don't see a huge increase in lookup time because of the height,
because it's just, okay, a couple more hops to reach the leaf; who cares.

129
00:09:02,573 --> 00:09:04,733
So a couple more I/Os don't matter.

130
00:09:05,318 --> 00:09:07,058
So B-tree is excellent here.

131
00:09:07,058 --> 00:09:08,468
Like it grows very slowly.

132
00:09:08,738 --> 00:09:15,835
So, searching: let's find one row among a million
rows, or let's find one row among a billion rows.

133
00:09:16,175 --> 00:09:18,232
Well, the difference is not huge, right?

134
00:09:18,232 --> 00:09:19,912
It won't be that noticeable.

135
00:09:19,912 --> 00:09:22,542
Like, it won't be a 1000-times

136
00:09:22,542 --> 00:09:23,022
difference.

137
00:09:23,408 --> 00:09:32,194
A small difference, a few more I/O hops. But if you need to deal
with many, many entries, and bloat means that the distribution

138
00:09:32,199 --> 00:09:35,914
of them is sparse, they're stored sparsely in a bloated

139
00:09:36,257 --> 00:09:39,219
index, then of course the difference will be huge.

140
00:09:39,339 --> 00:09:43,989
Michael: I guess that's covering micro a little
bit, but on the macro side, we've got things like

141
00:09:44,119 --> 00:09:47,349
Nikolay: Macro, in my opinion, is much more interesting.

142
00:09:47,979 --> 00:09:48,639
I feel it.

143
00:09:48,699 --> 00:09:57,501
I feel it like this: if we have 99% bloat, it means we have
so many more pages to store the same data in our index.

144
00:09:57,868 --> 00:10:05,390
And it means that not only is disk space occupied; I worry
about that less. Disk space is interesting, and I will

145
00:10:05,390 --> 00:10:09,129
explain my thoughts in a second. But the most noticeable

146
00:10:09,155 --> 00:10:20,008
negative performance effect from high index bloat, in my opinion, is that we
need to keep more pages in cache, both in the buffer pool and the page cache.

147
00:10:20,446 --> 00:10:24,436
So it means that our cache effectiveness is reduced.

148
00:10:24,436 --> 00:10:27,223
I have seen cases where

149
00:10:27,820 --> 00:10:31,188
a database, or a few databases, have grown very quickly,

150
00:10:31,193 --> 00:10:41,604
and the company using these databases may already be a multibillion-dollar
company, a unicorn, but nobody was ever fighting bloat.

151
00:10:41,604 --> 00:10:50,495
So, for example, up to half of the database is bloat, in both tables
and indexes, and it means that shared buffers work much worse.

152
00:10:50,612 --> 00:10:58,072
Michael: This is a great point. Actually, again, a caveat: this
applies, of course, when your database,

153
00:10:58,072 --> 00:11:05,392
including indexes, including bloated indexes, exceeds the size of
shared buffers; before then it's probably not gonna cause you any issues.

154
00:11:05,632 --> 00:11:09,796
But most of us are probably running databases where we exceed

155
00:11:09,916 --> 00:11:14,575
Nikolay: Or if it exceeds the cache size, the buffer pool size.

156
00:11:14,593 --> 00:11:17,803
But if you eliminate bloat, it fits again.

157
00:11:17,893 --> 00:11:18,133
Right?

158
00:11:18,448 --> 00:11:25,378
Michael: Yeah, but then I would imagine that the
difference you would see would be stark.

159
00:11:25,618 --> 00:11:25,858
Nikolay: Yeah.

160
00:11:25,858 --> 00:11:28,258
This macro effect is quite noticeable.

161
00:11:28,258 --> 00:11:37,776
And also we can talk budgets here, like spending on hardware,
or, in the case of Aurora, where they charge for I/O.

162
00:11:38,413 --> 00:11:40,978
If we need to do much more I/O, we pay more; remove the bloat and we

163
00:11:41,451 --> 00:11:42,161
save here.

164
00:11:42,161 --> 00:11:42,371
Right.

165
00:11:42,742 --> 00:11:45,755
Also, this macro effect is very interesting.

166
00:11:46,175 --> 00:11:54,778
I think maybe it's the most important one, in my opinion, in
terms of performance. But the third one, and this third one, the disk

167
00:11:54,778 --> 00:11:59,057
space occupied, is usually the first thing that comes to mind

168
00:11:59,058 --> 00:12:03,698
when we think about bloat: bloat
means that we occupy much more space.

169
00:12:03,698 --> 00:12:12,251
We pay for this. But not only that: if we also recall
our previous episodes, we need to write more to WAL, right?

170
00:12:12,965 --> 00:12:21,558
Full-page writes, for example; index writes also go to
WAL, so more WAL is generated, and data files are also bigger.

171
00:12:22,068 --> 00:12:22,998
WAL is bigger.

172
00:12:23,355 --> 00:12:26,415
It gives more work to the backup system.

173
00:12:26,881 --> 00:12:34,609
Backups take longer, but also replication, physical at
least, is also slower; more bytes need to be transferred

174
00:12:34,620 --> 00:12:38,198
to standby nodes. Negative effects everywhere.

175
00:12:38,258 --> 00:12:38,588
Right

176
00:12:38,733 --> 00:12:39,223
Michael: Yeah.

177
00:12:39,223 --> 00:12:40,103
That's a really good point.

178
00:12:40,533 --> 00:12:41,023
Yeah.

179
00:12:41,258 --> 00:12:47,036
Nikolay: The checkpointer, actually, also
needs to take care of more dirty pages.

180
00:12:47,554 --> 00:12:47,854
Michael: Yeah.

181
00:12:48,184 --> 00:12:54,414
So the ideal is not to grow your indexes 10x
and then reindex them to shrink them back down.

182
00:12:54,594 --> 00:12:56,614
In an ideal world, we'll stay on top of it.

183
00:12:56,806 --> 00:13:00,460
So, it stays in a much more manageable range.

184
00:13:00,497 --> 00:13:08,362
Firstly, I guess, through autovacuum. But also, as we've discussed,
autovacuum won't shrink it down once it's started to bloat.

185
00:13:08,362 --> 00:13:16,193
So we do need to do these occasional reindexes, ideally REINDEX
CONCURRENTLY, I guess, or, as you were gonna say, pg_repack.

186
00:13:16,593 --> 00:13:16,923
Nikolay: right.

187
00:13:16,953 --> 00:13:23,823
So we have a tool called postgres-checkup, which explains a lot
of details about bloat and provides some recommendations.

188
00:13:24,461 --> 00:13:26,691
I mean, we can say: yeah, we have this tool.

189
00:13:27,411 --> 00:13:37,545
And it tries to explain what to do. But in general, the plan we recommend,
especially for cases like that example, where the company was

190
00:13:37,545 --> 00:13:41,848
super successful and the database has grown, but no proper processes are in place,

191
00:13:42,238 --> 00:13:43,061
is this: we usually

192
00:13:43,061 --> 00:13:48,970
recommend, of course, tuning the autovacuum
settings, a hundred percent, but not always

193
00:13:49,000 --> 00:13:58,971
will this help, like we just discussed; index health may degrade also
if you have long transactions, also if you have large tables (let's

194
00:13:58,971 --> 00:14:08,912
touch on that once again in a minute). Autovacuum won't help you a lot there, but
it's still needed to make it more aggressive, to remove dead tuples faster. Then

195
00:14:09,156 --> 00:14:17,645
run index maintenance once, and then prepare to run it, maybe in
fully automated fashion, during weekends, because index maintenance

196
00:14:17,645 --> 00:14:20,959
means index recreation, and it's definitely stress for disks.

197
00:14:21,109 --> 00:14:22,309
And for WAL as well.

198
00:14:22,309 --> 00:14:24,859
And for replicas also, it's definitely some stress.

199
00:14:25,279 --> 00:14:28,009
So prepare it to run automatically,

200
00:14:28,129 --> 00:14:36,734
every weekend, for example. In GitLab (disclaimer:
GitLab is our client), we did this, and they have

201
00:14:36,734 --> 00:14:44,446
a lot of interesting information, automation, and articles about how
they automated it; they run fully automated index recreation

202
00:14:44,446 --> 00:14:44,656
every weekend.

203
00:14:45,094 --> 00:14:46,289
Michael: I should read those; I haven't.

204
00:14:46,772 --> 00:14:47,102
Nikolay: right.

205
00:14:47,432 --> 00:14:49,592
The question is how to recreate indexes.

206
00:14:49,633 --> 00:14:56,901
It depends on your Postgres version: from Postgres
12, it's possible to run REINDEX CONCURRENTLY.

207
00:14:57,024 --> 00:14:58,214
Earlier than that,

208
00:14:58,438 --> 00:15:01,905
well, the idea is: okay, you can create a new index and drop the old one.

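A sketch of both options; the index, table, and column names are illustrative:

```sql
-- Postgres 12 and newer: rebuild in place without blocking writes
REINDEX INDEX CONCURRENTLY my_index;

-- Before 12: build a fresh copy, then swap it in
CREATE INDEX CONCURRENTLY my_index_new ON my_table (some_column);
DROP INDEX CONCURRENTLY my_index;
ALTER INDEX my_index_new RENAME TO my_index;
```
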
209
00:15:02,690 --> 00:15:09,545
But if this index is participating in some constraints,
like a primary key, it'll be quite a task. But there is

210
00:15:09,545 --> 00:15:14,186
also pg_repack, which can repack indexes, basically recreating them

211
00:15:14,192 --> 00:15:19,938
without touching tables, because removing bloat from a table
is a bigger task than just recreating indexes.

212
00:15:19,938 --> 00:15:22,758
So pg_repack can work with almost any Postgres major version.

213
00:15:23,167 --> 00:15:27,247
I mean, not the very old ones, but, for example, 9.6.

214
00:15:27,519 --> 00:15:28,009
Michael: Yeah.

215
00:15:28,144 --> 00:15:30,724
Nikolay: It has some interesting caveats though.

216
00:15:30,784 --> 00:15:34,530
For example, if you have deferred constraints, do I pronounce it right?

217
00:15:34,535 --> 00:15:35,670
Deferred constraints, right.

218
00:15:36,270 --> 00:15:41,007
So if you have deferred constraints, you might have issues with running pg_repack.

219
00:15:41,113 --> 00:15:45,253
But that's for tables, actually, not for indexes; for indexes it's fine. For tables,

220
00:15:45,253 --> 00:15:46,483
it can have issues.

221
00:15:46,813 --> 00:15:51,043
Miro had problems there, and they wrote an excellent article.

222
00:15:51,043 --> 00:15:57,178
Well, we can provide a link. If you need to fight
bloat in a table and you have deferred constraints, it's a

223
00:15:57,178 --> 00:16:00,766
very good read. But for indexes, there won't be any problem.

224
00:16:01,297 --> 00:16:04,897
So, but the modern approach is just REINDEX CONCURRENTLY.

225
00:16:05,257 --> 00:16:08,587
Unfortunately, REINDEX CONCURRENTLY...

226
00:16:08,748 --> 00:16:10,948
This feature had so many bugs fixed.

227
00:16:11,008 --> 00:16:12,028
All of them are fixed.

228
00:16:12,033 --> 00:16:12,298
Right.

229
00:16:12,718 --> 00:16:18,471
But history shows that many people, including
me, already think there might be more

230
00:16:18,511 --> 00:16:19,771
found in this area.

231
00:16:19,771 --> 00:16:28,081
So I would recommend: if you run REINDEX CONCURRENTLY, it's
worth also having a process of index verification. For example,

232
00:16:28,298 --> 00:16:32,023
using amcheck, you can check for corruption periodically:

233
00:16:32,343 --> 00:16:42,287
weekly, for example. Also after recreation, maybe it's a good idea to
double-check whether there is corruption or not, because of this recent bug

234
00:16:42,347 --> 00:16:47,110
which was discovered in May in Postgres 14; we briefly discussed it.

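A sketch of such a periodic verification; the index name is illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Lightweight: takes only the same lock as a plain SELECT
SELECT bt_index_check('my_index');

-- More thorough: also verifies parent/child relationships, heavier locking
SELECT bt_index_parent_check('my_index');
```
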
235
00:16:47,363 --> 00:16:54,876
So the interesting thing is that if you have huge tables,
like terabyte size, multi-terabyte size, and they are

236
00:16:54,876 --> 00:17:00,157
not partitioned: we create or recreate an index, and during all this time...

237
00:17:00,162 --> 00:17:01,267
It can take hours.

238
00:17:01,514 --> 00:17:10,493
autovacuum cannot delete dead tuples in any table, any index; it's a database-wide
problem. And in Postgres 14, there was an attempt to improve this, so that

239
00:17:11,408 --> 00:17:19,911
the xmin horizon was not held back during index creation or recreation.
But unfortunately a bug was discovered in May, and in June,

240
00:17:20,121 --> 00:17:25,351
Postgres 14.4 had this optimization reverted.

241
00:17:25,651 --> 00:17:29,194
So the rule of thumb: don't allow your tables

242
00:17:29,249 --> 00:17:35,192
to grow to more than a hundred gigabytes, because
otherwise index maintenance becomes much harder.

243
00:17:35,712 --> 00:17:35,922
Michael: Yeah.

244
00:17:37,020 --> 00:17:38,745
Actually, just while we're on that topic:

245
00:17:38,745 --> 00:17:49,035
so I think if you're on a minor version of 14 lower than 14.4
and you use REINDEX CONCURRENTLY, upgrade, with the exception,

246
00:17:49,040 --> 00:17:53,655
I think, of RDS and Aurora, who back-patched it.

247
00:17:53,850 --> 00:17:55,830
Nikolay: Maybe; I didn't know about that.

248
00:17:56,190 --> 00:18:04,020
Mm-hmm. And not only REINDEX CONCURRENTLY; CREATE INDEX
CONCURRENTLY also was a problem, and everyone uses it, because how

249
00:18:04,170 --> 00:18:07,530
can you create an index on a running system? With CREATE INDEX CONCURRENTLY.

250
00:18:07,890 --> 00:18:14,306
So if you are running 14.0 through 14.3, you have an urgent task.

251
00:18:14,786 --> 00:18:18,116
I think everyone knows it, but just worth repeating anyway.

252
00:18:18,911 --> 00:18:21,521
Michael: I've met somebody the other day who didn't unfortunately.

253
00:18:21,521 --> 00:18:24,878
So I think it's worth repeating. But awesome.

254
00:18:24,908 --> 00:18:27,853
You have mentioned a couple of things that we probably should touch on.

255
00:18:28,093 --> 00:18:30,883
So index corruption is another kind of

256
00:18:31,558 --> 00:18:33,126
maintenance that you might need to do.

257
00:18:33,245 --> 00:18:39,985
I know of one thing that's really famous for causing
corrupt indexes, which is operating system upgrades.

258
00:18:40,226 --> 00:18:42,446
Nikolay: Any operating system upgrade is dangerous.

259
00:18:42,596 --> 00:18:51,854
An operating system upgrade is dangerous because a glibc version
upgrade may cause index corruption silently, and it's a problem.

260
00:18:51,974 --> 00:19:01,555
So it's a big problem, unfortunately; there's no easy solution,
and it's quite easy to get into trouble if you don't think about it.

261
00:19:02,410 --> 00:19:02,980
Unfortunately.

262
00:19:03,100 --> 00:19:03,460
So

263
00:19:03,760 --> 00:19:08,477
Michael: Well, I just wanted to say it in
case anybody else wasn't aware of that one;

264
00:19:08,663 --> 00:19:10,763
I certainly wasn't a year or two ago. So,

265
00:19:10,973 --> 00:19:15,293
Nikolay: Yeah, you need to recreate indexes, and
you need to do it inside a maintenance window.

266
00:19:15,683 --> 00:19:16,043
Michael: Yeah.

267
00:19:16,463 --> 00:19:23,993
Nikolay: Also, unfortunately, GIN indexes can be bloated as well, and there
is no good way to estimate it, and they can be corrupted as well.

268
00:19:23,993 --> 00:19:32,813
Also, there is no good, official way to check them. There are patches
for amcheck that still are not applied, but they are already quite advanced,

269
00:19:32,813 --> 00:19:38,955
and I used those patches to check GIN indexes in a couple of places.

270
00:19:39,265 --> 00:19:41,005
So, I mean, it didn't find anything.

271
00:19:41,005 --> 00:19:44,499
So there's always also a question; we had false positives there.

272
00:19:44,504 --> 00:19:46,629
It was fixed, but yeah, so,

273
00:19:46,994 --> 00:19:49,424
Michael: I think their promise is no false negatives, right?

274
00:19:49,448 --> 00:19:49,838
With amcheck.

275
00:19:50,118 --> 00:19:50,328
Nikolay: right.

276
00:19:50,449 --> 00:19:50,699
right.

277
00:19:51,021 --> 00:20:00,425
So if we talk about the bigger picture, imagine you have this case: a
successful company, a very big database, but they have a lot of bloat in

278
00:20:00,492 --> 00:20:00,912
tables

279
00:20:00,912 --> 00:20:11,206
and indexes. Removing table bloat is also interesting, but for indexes
we discussed various problems, including this micro level, where

280
00:20:11,206 --> 00:20:20,844
particular queries are slow. By the way, if you have a queue-like
pattern of working with a table: you insert a row,

281
00:20:21,123 --> 00:20:24,163
probably update it a few times, then delete it because the task was processed;

282
00:20:24,840 --> 00:20:27,664
this is a good way to get high bloat.

283
00:20:27,940 --> 00:20:37,635
And I saw many times that high bloat, at some point, is like fine, but at
some point queries degrade very fast because of this micro-level problem.

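A sketch of the queue-like lifecycle described here; the schema is illustrative:

```sql
CREATE TABLE task_queue (
  id      bigserial PRIMARY KEY,
  status  text NOT NULL,
  payload jsonb
);
CREATE INDEX ON task_queue (status);

INSERT INTO task_queue (status, payload) VALUES ('new', '{}');  -- enqueue
UPDATE task_queue SET status = 'processing' WHERE id = 1;       -- new index entries
UPDATE task_queue SET status = 'done' WHERE id = 1;
DELETE FROM task_queue WHERE id = 1;  -- row gone; dead entries linger until vacuum
```
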
284
00:20:38,151 --> 00:20:39,381
I mean, we have memory.

285
00:20:40,346 --> 00:20:41,876
We have big shared buffers.

286
00:20:41,876 --> 00:20:49,436
The table is quite small, like maybe a few gigabytes only, but we see some
queries degrade because of these micro-level issues we discussed.

287
00:20:49,916 --> 00:20:51,997
So the question is where to start.

288
00:20:52,002 --> 00:20:57,862
If you have many hundreds or thousands of
indexes, where would you start for this initial run?

289
00:20:58,212 --> 00:21:00,102
Would you start from small indexes,

290
00:21:00,693 --> 00:21:10,083
but those which have higher bloat? Like, you put a threshold at 90%
and take care of smaller indexes first? Or large indexes, and then you go

291
00:21:10,088 --> 00:21:16,228
down? Or do you start with the indexes which have the most bloat bytes first?

292
00:21:16,628 --> 00:21:17,648
Where would you start?

293
00:21:18,363 --> 00:21:19,563
Michael: That's a good question.

294
00:21:19,663 --> 00:21:25,213
I saw your tweet as well, asking people
about unused and redundant indexes.

295
00:21:25,213 --> 00:21:32,413
And I know that maybe I'm cheating by having seen that,
but that felt like a really nice approach, especially redundant ones.

296
00:21:32,623 --> 00:21:36,033
I'm not so sure about unused, because I wonder if they're

297
00:21:36,815 --> 00:21:43,035
bloated anyway; it depends on statistics, I guess, a little
bit. But redundant, meaning a redundant index: I guess the

298
00:21:43,429 --> 00:21:47,249
easiest example of that is you've got the exact same index.

299
00:21:47,249 --> 00:21:52,905
So maybe let's take a simple case of a
single-column B-tree index, but we've got two of...

300
00:21:53,034 --> 00:22:02,829
Nikolay: Well, redundant can be, for example: you have a single-column
index and a two-column index; of course, the single-column one is redundant, if

301
00:22:02,829 --> 00:22:06,297
the two-column index has the same column in the first position, not in the second position.

302
00:22:06,887 --> 00:22:07,307
This is

303
00:22:07,307 --> 00:22:07,547
a

304
00:22:07,742 --> 00:22:07,902
Michael: Column order matters.

305
00:22:08,627 --> 00:22:16,280
Nikolay: ...classic redundancy. But the problem will be: what if we try to
eliminate the first index because it's redundant, but the second index is

306
00:22:16,280 --> 00:22:21,770
also unused, and according to a different report, we also decide to drop it?

307
00:22:22,130 --> 00:22:25,160
You drop both indexes, and that's already not a good idea.

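A sketch of that trap; names are illustrative:

```sql
CREATE TABLE t (a int, b int);
CREATE INDEX idx_a   ON t (a);     -- redundant:
CREATE INDEX idx_a_b ON t (a, b);  -- covers every scan idx_a covers, since a leads

-- Safe: drop only idx_a. Dropping both (one as "redundant", the other as
-- "unused" per a separate report) leaves queries filtering on a with no index.
```
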
308
00:22:25,820 --> 00:22:30,914
Michael: Or it's so bloated that Postgres actually avoids using it and goes to...

309
00:22:30,931 --> 00:22:32,516
Nikolay: This is an interesting question.

310
00:22:32,516 --> 00:22:34,000
So we discussed

311
00:22:34,285 --> 00:22:40,595
problems at the micro and macro levels; depending on which one you
consider the biggest problem for your case, I

312
00:22:40,850 --> 00:22:41,880
see two options.

313
00:22:42,330 --> 00:22:49,254
If you think about particular queries that have very degraded
performance because of bloat, you probably should say: let's

314
00:22:49,364 --> 00:22:52,434
reindex indexes with a bloat level of more than 90% first,

315
00:22:53,004 --> 00:23:02,297
even if they are smaller ones and don't contribute a lot
to this macro-level problem, like spamming our buffer pool.

316
00:23:02,893 --> 00:23:12,399
But if you think the macro-level problem is bigger, you probably should start
with the biggest, the indexes which have the most bloat bytes estimated, right?

317
00:23:12,404 --> 00:23:18,048
Order by that and go from top to bottom, even though some of them are unused.

318
00:23:18,453 --> 00:23:24,363
Michael: Can I make a potentially wrong
argument for always starting with the smaller ones?

319
00:23:24,393 --> 00:23:25,683
I'll be interested in your thoughts.

320
00:23:26,193 --> 00:23:35,057
If you've got a macro-level problem, if your database
is on fire, and you're trying to reindex the smaller

321
00:23:35,057 --> 00:23:40,097
indexes: whilst they might be being used, just because
they're smaller doesn't mean they aren't being used more.

322
00:23:40,367 --> 00:23:43,637
So I wonder if you could also look at access patterns.

323
00:23:44,612 --> 00:23:45,362
If you started

324
00:23:45,362 --> 00:23:50,543
with the smaller ones... yeah, the other angle would be:

325
00:23:51,323 --> 00:23:55,073
If I re-index a smaller index, it finishes faster.

326
00:23:55,133 --> 00:24:02,310
And my system reduces its load slightly sooner than if
I reindex a large one and it takes hours; I've got

327
00:24:02,310 --> 00:24:06,439
hours more at the same higher level of load, I guess.

328
00:24:06,439 --> 00:24:12,019
But if I start re-indexing smaller ones and they finish
faster, maybe I can reduce the load a little bit quicker.

329
00:24:12,739 --> 00:24:18,815
Nikolay: Well, in my opinion, if the database is on fire
as a whole, I would start from top to bottom, fighting

330
00:24:18,815 --> 00:24:21,635
with the biggest ones first. I tried to think about it:

331
00:24:21,665 --> 00:24:25,108
Should we look at usage stats for indexes?

332
00:24:25,198 --> 00:24:27,328
I didn't see any big reason for that.

333
00:24:27,328 --> 00:24:33,524
Like, anyway, we didn't discuss the threshold,
but usually a practical threshold is 30 or 40 percent,

334
00:24:34,194 --> 00:24:37,994
somewhere there; if we have bigger bloat than this...

335
00:24:37,994 --> 00:24:38,174
Michael: Yeah.

336
00:24:38,756 --> 00:24:40,076
Nikolay: This is for bloat.

337
00:24:40,076 --> 00:24:40,316
Yes.

338
00:24:40,616 --> 00:24:40,856
It's.

339
00:24:40,856 --> 00:24:44,906
If bloat is below 30% or 20%, for example, we usually don't care.

340
00:24:45,138 --> 00:24:51,369
Indexes, by the way, are bloated 10% by default,
on purpose, because they have fillfactor 90.

341
00:24:51,922 --> 00:24:54,712
Michael: Most bloat estimation queries factor that in.

342
00:24:54,862 --> 00:25:01,623
Nikolay: Yes, they take this into account. But anyway,
I mean, fillfactor 90 means we bloat on

343
00:25:01,623 --> 00:25:05,615
purpose because we want some room for updates inside pages.

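For reference, fillfactor 90 is simply the B-tree default and can be set explicitly; names are illustrative:

```sql
-- Leaf pages are left 10% empty on purpose, to absorb future inserts and
-- updates; good estimation scripts subtract this before reporting bloat.
CREATE INDEX my_idx ON my_table (some_column) WITH (fillfactor = 90);
```
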
344
00:25:06,275 --> 00:25:07,835
If it's 90, and it's

345
00:25:08,474 --> 00:25:10,184
actually 80...

346
00:25:10,244 --> 00:25:10,484
Okay.

347
00:25:10,484 --> 00:25:11,504
It's not that bad.

348
00:25:12,104 --> 00:25:18,155
I mean, that's 20% bloat. But if bloat is already like
half of it, it's maybe already time to take care of it.

349
00:25:18,515 --> 00:25:23,165
And so I would go from top to bottom if we take care of the whole database. But

350
00:25:23,585 --> 00:25:31,952
in some cases the database is fine, but some particular queries, for
example with the queue-like pattern we discussed: we see these particular

351
00:25:31,972 --> 00:25:35,097
queries dealing with this table have very bad performance.

352
00:25:35,097 --> 00:25:41,847
In this case, I would start from the most
bloated indexes regardless of their size.

353
00:25:42,357 --> 00:25:44,127
Maybe I would go from top to bottom.

354
00:25:44,790 --> 00:25:54,391
But for the first run I would skip indexes with bloat estimated at 60 or 70
percent, to help as soon as possible those queries which we know are suffering;

355
00:25:55,111 --> 00:25:58,451
maybe I would actually take particular tables in this case.

356
00:25:58,595 --> 00:25:59,030
Why...

357
00:25:59,720 --> 00:26:04,500
Michael: But I would actually say, I think I
understand now why that actually makes sense as a

358
00:26:04,500 --> 00:26:07,860
strategy because bloat is not independent of usage.

359
00:26:08,100 --> 00:26:11,007
Chances are, if it's heavily used...

360
00:26:11,742 --> 00:26:12,252
Nikolay: good, good.

361
00:26:12,252 --> 00:26:13,752
But imagine some indexes.

362
00:26:13,812 --> 00:26:20,426
Oh, by the way, if an index is unused,
we should apply the extreme solution for bloat:

363
00:26:20,486 --> 00:26:21,396
we just drop the index.

364
00:26:21,769 --> 00:26:21,905
Right.

365
00:26:22,143 --> 00:26:29,371
If we know it's unused everywhere, on standbys too, and we observe
it for quite long; I usually recommend the statistics age to be

366
00:26:29,371 --> 00:26:33,625
more than one month, because we had cases where an index was unused

367
00:26:33,740 --> 00:26:39,985
for several weeks, we dropped it, but at the
beginning of the month it turned out it was needed.

368
00:26:42,325 --> 00:26:44,335
Someone mentioned a case where they had this:

369
00:26:44,515 --> 00:26:51,624
on the first of the next month, analysts are waiting for some
report, and they don't see the report because the index was dropped.

370
00:26:52,074 --> 00:26:56,844
So some indexes are used only once per month and they are very important.

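A sketch of the kind of query behind such reports; remember to check every standby too, and note how long ago the statistics were reset:

```sql
-- Indexes never scanned since statistics were last reset.
SELECT s.schemaname,
       s.relname      AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique  -- unique indexes enforce constraints even if never scanned
ORDER BY pg_relation_size(s.indexrelid) DESC;
```
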
371
00:26:57,024 --> 00:27:01,964
So usage numbers, if they are not zero, I don't know how to use them.

372
00:27:01,964 --> 00:27:03,104
Maybe I'm wrong.

373
00:27:03,164 --> 00:27:12,067
I try to think about how to combine bloat and usage data,
but imagine the case: you have an unused index.

374
00:27:12,617 --> 00:27:15,020
You don't use it for index scans, but

375
00:27:15,093 --> 00:27:18,513
the question is: doesn't it contribute to this macro problem,

376
00:27:18,513 --> 00:27:21,351
spamming our caches?

377
00:27:22,376 --> 00:27:23,246
Michael: Probably not.

378
00:27:23,396 --> 00:27:25,316
It's probably been long since evicted.

379
00:27:25,556 --> 00:27:26,036
Oh, you think

380
00:27:26,051 --> 00:27:26,591
Nikolay: yes.

381
00:27:26,891 --> 00:27:30,821
Any update, unless it's a HOT (heap-only tuple) update,

382
00:27:31,091 --> 00:27:33,821
will need to update this index, and to update the index

383
00:27:33,821 --> 00:27:35,681
we need to load its page into memory.

384
00:27:36,986 --> 00:27:38,936
Michael: And so there's write

385
00:27:38,936 --> 00:27:39,896
overhead as well, but...

386
00:27:40,061 --> 00:27:45,011
Nikolay: We don't use the index, but it still occupies some space in our caches.

387
00:27:46,511 --> 00:27:46,751
yeah,

388
00:27:47,216 --> 00:27:48,086
Michael: interesting.

389
00:27:48,561 --> 00:27:51,651
Nikolay: Maybe it also has some effects at the micro level.

390
00:27:51,741 --> 00:27:52,191
I don't know.

391
00:27:52,196 --> 00:27:52,731
Maybe not.

392
00:27:53,128 --> 00:27:56,301
So there are many, many interesting things here.

393
00:27:56,771 --> 00:28:01,310
But I hope this discussion will help someone to understand it better.

394
00:28:01,310 --> 00:28:08,431
And anyway, just fight bloat in indexes, and prepare
not only to do it once manually, but automated,

395
00:28:08,831 --> 00:28:15,183
using pg_repack or REINDEX CONCURRENTLY; carefully,
because REINDEX CONCURRENTLY might lead to...

396
00:28:15,333 --> 00:28:19,473
Well, right now, with a lot of bugs fixed, people use REINDEX CONCURRENTLY.

397
00:28:19,473 --> 00:28:22,233
Many people use it, many projects, large projects.

398
00:28:22,703 --> 00:28:23,543
So it works.

399
00:28:23,603 --> 00:28:30,977
So I don't want to be blamed, like, "Nikolay told us not
to use REINDEX CONCURRENTLY"; use it, but just keep in mind

400
00:28:30,977 --> 00:28:34,367
that many bugs were fixed, and maybe there are some more bugs to come.

401
00:28:34,737 --> 00:28:39,785
I would just automate it, but also automate corruption analysis using amcheck,

402
00:28:40,375 --> 00:28:41,285
at least weekly.

404
00:28:41,803 --> 00:28:45,121
Michael: Is amcheck available on managed services, or...?

405
00:28:45,121 --> 00:28:49,880
Nikolay: Yeah, it's a standard contrib
module; it's available everywhere.

406
00:28:50,508 --> 00:28:52,248
Michael: Not all the contrib modules are available everywhere,

407
00:28:52,248 --> 00:28:54,597
so I'm glad I asked. But this one, yes.

408
00:28:54,667 --> 00:28:55,117
Awesome.

409
00:28:55,207 --> 00:29:02,156
So you mentioned GIN indexes briefly, and I
don't know enough about maintenance of them,

410
00:29:02,156 --> 00:29:04,556
but I think I read a really good blog post about...

411
00:29:04,916 --> 00:29:05,576
Nikolay: GIN indexes.

412
00:29:05,606 --> 00:29:06,556
There are B-trees inside.

413
00:29:07,099 --> 00:29:13,331
Small ones, for posting lists and for keys, as I
understand; like two types, maybe I'm wrong.

414
00:29:13,361 --> 00:29:19,721
It's already been a few years
since I touched the GIN internals.

415
00:29:19,721 --> 00:29:22,921
So maybe I'm wrong here, but definitely there are trees inside.

416
00:29:23,376 --> 00:29:31,645
I know from the developers that at first some B-trees didn't
exist there and the performance was not good at larger scale,

417
00:29:31,705 --> 00:29:36,705
but there are B-trees inside, and that's why they can also be corrupted

418
00:29:36,754 --> 00:29:42,391
when you switch to a new glibc version and the character ordering rules change;

419
00:29:42,981 --> 00:29:44,102
collation also changes.

420
00:29:44,102 --> 00:29:50,740
So GIN can, of course, also be
corrupted, because GIN is a tree as well.

421
00:29:51,660 --> 00:29:51,840
Michael: Yeah.

422
00:29:51,850 --> 00:29:54,076
Nikolay: So, but amcheck for GIN doesn't exist.

423
00:29:54,556 --> 00:29:56,806
How to estimate bloat?

424
00:29:56,811 --> 00:29:57,706
We also don't know.

425
00:29:58,096 --> 00:30:03,406
My rule: use clones, use VACUUM FULL from time to time, and see the actual...

426
00:30:03,436 --> 00:30:04,046
Michael: actual,

427
00:30:04,230 --> 00:30:07,470
Nikolay: ...reliable number. Like brute force.

428
00:30:08,223 --> 00:30:08,713
Michael: Yeah.

429
00:30:08,773 --> 00:30:09,373
I like it.

430
00:30:09,643 --> 00:30:13,613
It's the first good use case I've heard for a VACUUM FULL on an active system...

431
00:30:13,729 --> 00:30:19,144
Nikolay: Of course, it'll take many hours if you have
many terabytes in size, but maybe you should have...

432
00:30:19,546 --> 00:30:21,196
Partitioning; that's a reminder, right?

433
00:30:21,406 --> 00:30:24,526
Like don't allow your tables to grow over a hundred gigs.

434
00:30:24,979 --> 00:30:28,763
And then, if you have partitioning, you
can run amcheck in parallel, by the way.

435
00:30:28,763 --> 00:30:30,523
It's not a trivial task.

436
00:30:30,746 --> 00:30:36,699
We have some automation scripts, developed for GitLab, I guess.

437
00:30:36,699 --> 00:30:38,094
We can run amcheck in parallel.

438
00:30:38,094 --> 00:30:42,054
We can run VACUUM FULL in parallel on a temporary

439
00:30:42,332 --> 00:30:45,832
clone which has a lot of power, and this automation is good to have.

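There is no built-in parallelism for this, so one sketch is to generate one check statement per B-tree index and fan them out across several client sessions:

```sql
-- Emit one bt_index_check() call per B-tree index; a wrapper script can then
-- distribute these statements over N parallel connections.
SELECT format('SELECT bt_index_check(%L);', c.oid::regclass)
FROM pg_class AS c
JOIN pg_am AS a ON a.oid = c.relam
WHERE c.relkind = 'i'
  AND a.amname = 'btree';
```
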
440
00:30:45,972 --> 00:30:46,482
Michael: Nice.

441
00:30:46,812 --> 00:30:49,572
Is that in recent versions or has that been around for a while?

442
00:30:50,122 --> 00:30:51,262
Vacuum full in parallel.

443
00:30:51,292 --> 00:30:51,892
Nikolay: No, no, no, no.

444
00:30:51,892 --> 00:30:54,922
VACUUM FULL in parallel doesn't exist, and amcheck in parallel doesn't exist.

445
00:30:54,922 --> 00:30:55,912
You need to script it.

446
00:30:56,862 --> 00:30:57,342
Michael: Oh, okay.

447
00:30:57,462 --> 00:30:59,262
I understand now because of the partitions.

448
00:30:59,562 --> 00:31:03,342
Nikolay: By the way, it's interesting: people
run amcheck in a single thread.

449
00:31:03,342 --> 00:31:08,379
And I also did it, until someone from
GitLab (thanks for this question)

450
00:31:08,649 --> 00:31:09,609
asked why we ran it that way.

451
00:31:09,879 --> 00:31:11,469
Like, we have so many cores; why?

452
00:31:11,994 --> 00:31:13,764
It was an excellent question.

453
00:31:13,764 --> 00:31:17,634
Of course it should be in parallel, because
it'll produce results much faster.

454
00:31:17,874 --> 00:31:25,341
And if you are in the cloud and you use a temporary
clone, for example, of course you want to make the job faster,

455
00:31:25,551 --> 00:31:33,283
even if it uses all CPUs, a hundred percent of them, because
you pay for minutes or seconds of usage anyway,

456
00:31:34,348 --> 00:31:43,588
not for a hundred percent or 50% utilization, it doesn't matter, but for
time. And you want to make your job faster, so you run at full speed.

457
00:31:44,071 --> 00:31:44,371
Michael: yeah.

458
00:31:44,761 --> 00:31:46,411
Same in maintenance windows, I guess.

459
00:31:46,782 --> 00:31:47,472
You are against the...

460
00:31:47,641 --> 00:31:48,451
Nikolay: maintenance windows.

461
00:31:48,451 --> 00:31:54,278
Yes, but not the same for regular index maintenance on
production, because there you probably want a single

462
00:31:54,278 --> 00:31:58,291
thread, or a few threads, of REINDEX CONCURRENTLY, maybe.

463
00:31:58,786 --> 00:32:04,986
Because it's still some stress, and you don't want
to saturate everything. By the way, again, I

464
00:32:04,986 --> 00:32:09,376
encourage our listeners to read the articles on the GitLab blog;

465
00:32:09,553 --> 00:32:11,473
they have good materials.

466
00:32:11,563 --> 00:32:12,013
I recommend them.

467
00:32:12,488 --> 00:32:23,210
I remember Peter Geoghegan also learned something interesting from
their experience when working on B-tree deduplication in Postgres 13 and 14.

468
00:32:23,470 --> 00:32:25,409
So it's recommended material.

469
00:32:25,997 --> 00:32:26,417
Michael: Awesome.

470
00:32:26,601 --> 00:32:28,491
I suspect that's all we've got time for today.

471
00:32:28,772 --> 00:32:29,592
Thank you everybody.

472
00:32:29,592 --> 00:32:30,102
Thanks.

473
00:32:30,102 --> 00:32:34,111
everyone who keeps giving us feedback, keeps
sending us suggestions and shares it online.

474
00:32:34,116 --> 00:32:35,237
We really appreciate it.

475
00:32:35,580 --> 00:32:37,200
Looking forward to seeing you next week.

476
00:32:37,530 --> 00:32:37,920
Cheers, Nick.

477
00:32:38,115 --> 00:32:38,325
Nikolay: Yeah.

478
00:32:38,553 --> 00:32:39,303
Thank you, Michael.

479
00:32:39,303 --> 00:32:39,933
See you next week.

480
00:32:39,933 --> 00:32:40,143
Bye.

481
00:32:40,296 --> 00:32:40,626
Michael: take care.

482
00:32:40,631 --> 00:32:40,836
Bye.