1
00:00:00,060 --> 00:00:02,080
Nikolay: Hello, hello, this is Postgres.FM.

2
00:00:02,080 --> 00:00:06,140
My name is Nik, Postgres.AI and as usual, my co-host is Michael,

3
00:00:06,140 --> 00:00:07,000
pgMustard.

4
00:00:07,000 --> 00:00:07,780
Hi, Michael.

5
00:00:08,500 --> 00:00:09,500
Michael: Hello, Nik.

6
00:00:10,080 --> 00:00:10,940
How are you doing?

7
00:00:11,040 --> 00:00:11,760
Nikolay: I'm doing great.

8
00:00:11,760 --> 00:00:12,540
How are you?

9
00:00:13,040 --> 00:00:14,280
Michael: I'm good, thank you.

10
00:00:14,300 --> 00:00:17,120
Nikolay: I tried to say I thought, but actually this is you who

11
00:00:17,120 --> 00:00:20,580
thought it's not enough episodes starting with "multi" word.

12
00:00:22,360 --> 00:00:24,640
So what is the topic today?

13
00:00:24,640 --> 00:00:25,400
Tell us.

14
00:00:26,140 --> 00:00:30,020
Michael: Yeah, I chose multi-column indexes and I know we've

15
00:00:30,020 --> 00:00:31,160
talked about them a bunch.

16
00:00:31,160 --> 00:00:34,780
We've done episodes on things like index-only scans, for example.

17
00:00:35,140 --> 00:00:38,300
But it felt to me looking back at those that we didn't really

18
00:00:38,300 --> 00:00:42,640
get into some of the details around multi-column indexes that

19
00:00:42,780 --> 00:00:44,280
I find particularly interesting.

20
00:00:45,060 --> 00:00:50,340
I think also this is still the area I see people, especially

21
00:00:51,420 --> 00:00:56,260
fairly experienced full stack engineers, backend engineers, struggling

22
00:00:56,260 --> 00:00:59,760
with in terms of performance or not realizing how big a difference

23
00:00:59,760 --> 00:01:03,580
they can make or how, how helpful they can be in certain, especially

24
00:01:03,900 --> 00:01:08,680
a lot of the workloads I see are web application, just like a

25
00:01:08,680 --> 00:01:14,740
lot of reads, like maybe 80, 90% reads and a lot of trying to

26
00:01:15,400 --> 00:01:18,780
optimize performance of those reads, especially with the odd

27
00:01:18,780 --> 00:01:20,580
kind of aggregate type query.

28
00:01:21,040 --> 00:01:25,140
And index-only scans help a bunch, but only once you understand

29
00:01:25,240 --> 00:01:26,200
multi-column indexes.

30
00:01:26,200 --> 00:01:28,380
And I think there's some detail in there that I'd like to discuss

31
00:01:28,380 --> 00:01:28,680
with you.

32
00:01:28,680 --> 00:01:32,060
So yeah, I thought it was about time we actually had one dedicated

33
00:01:32,240 --> 00:01:32,920
to this.

34
00:01:33,400 --> 00:01:33,520
Nikolay: Yeah.

35
00:01:33,520 --> 00:01:33,940
Great.

36
00:01:33,940 --> 00:01:38,800
I guess we will be going slightly beyond just multi-column indexes.

37
00:01:39,340 --> 00:01:39,800
Michael: Right.

38
00:01:39,800 --> 00:01:40,300
Possibly.

39
00:01:40,680 --> 00:01:41,580
What do you mean?

40
00:01:41,580 --> 00:01:41,820
Nikolay: Well,

41
00:01:41,820 --> 00:01:46,360
maybe the question is how we can compare them, for example,

42
00:01:46,360 --> 00:01:50,500
having just indexes on a single column, but many of them.

43
00:01:50,500 --> 00:01:54,560
And sometimes I have a case where people intentionally understanding

44
00:01:54,680 --> 00:02:00,240
consequences chose to cover every column with index, a single

45
00:02:00,240 --> 00:02:01,200
column, I guess.

46
00:02:01,940 --> 00:02:06,160
And yeah, we usually say it's an anti-pattern, right?

47
00:02:06,820 --> 00:02:09,140
Michael: Well, I mean, yeah, I guess it depends.

48
00:02:09,140 --> 00:02:12,620
I've not seen a workload, like with a set of access patterns

49
00:02:12,620 --> 00:02:14,700
where that would make the most
sense.

50
00:02:14,700 --> 00:02:19,540
Personally I tend to see tables
that have some columns that get

51
00:02:19,540 --> 00:02:22,660
queried a lot more than others
and the groups of columns that

52
00:02:22,660 --> 00:02:24,440
get queried very often together.

53
00:02:24,800 --> 00:02:29,820
Lots of equality searches amongst,
for example, account ID, User

54
00:02:29,820 --> 00:02:32,860
ID, but there may be a range search
in terms of date.

55
00:02:32,860 --> 00:02:36,540
And like, that's the kind of Query
that ends up being like so

56
00:02:36,540 --> 00:02:40,120
much more efficient with a multi-column
index than with a series

57
00:02:40,120 --> 00:02:41,340
of single column indexes.

58
00:02:41,440 --> 00:02:46,120
So yeah, the kind of access patterns
I see, I can't imagine living

59
00:02:46,120 --> 00:02:50,020
without them, But I'm sure that,
well, I'm looking forward to

60
00:02:50,020 --> 00:02:52,540
hearing about the case where it
made most sense to...

61
00:02:52,540 --> 00:02:56,780
Nikolay: Well, it was multi-tenant
application where the freedom

62
00:02:56,820 --> 00:02:59,240
of Queries is unpredictable.

63
00:03:00,540 --> 00:03:07,220
So every tenant can choose different
types of filtering and ordering,

64
00:03:07,720 --> 00:03:09,300
and we don't know in advance.

65
00:03:10,400 --> 00:03:12,880
And they are not engineers, those
tenants, right?

66
00:03:12,880 --> 00:03:18,020
So we need to give them some support,
at least somehow.

67
00:03:18,140 --> 00:03:23,260
But index-write amplification,
I guess, is a killer in this use

68
00:03:23,260 --> 00:03:25,820
case if data size grows.

69
00:03:27,440 --> 00:03:27,940
Michael: Yeah.

70
00:03:28,520 --> 00:03:34,080
And given it's a multi-tenant application,
are we talking single

71
00:03:34,080 --> 00:03:35,100
Database per...

72
00:03:35,200 --> 00:03:36,040
Nikolay: Separate tables.

73
00:03:36,660 --> 00:03:38,000
Michael: That makes way more sense.

74
00:03:38,620 --> 00:03:42,180
So that starts to make some sense
then at least.

75
00:03:42,340 --> 00:03:44,700
Nikolay: Yeah, there are smaller
Tables of course.

76
00:03:45,340 --> 00:03:47,340
And there's isolation, complete isolation.

77
00:03:47,500 --> 00:03:53,980
So all blocks and all tuples in
these blocks, buffers or pages,

78
00:03:54,480 --> 00:03:58,980
how you name it, they are belonging
only to specific tenants,

79
00:03:58,980 --> 00:04:02,920
so all problems are localized.

80
00:04:04,280 --> 00:04:12,020
Anyway, if we think columns A and
B and we have 2 indexes on

81
00:04:12,020 --> 00:04:16,980
both columns, when does it make
to consider a 2 column index

82
00:04:16,980 --> 00:04:17,480
instead?

83
00:04:17,760 --> 00:04:19,440
This is a question, I guess.

84
00:04:19,760 --> 00:04:21,360
Michael: Yeah, yeah, I like that
a lot.

85
00:04:21,360 --> 00:04:25,740
And I guess we probably should
go 1 step back and say probably

86
00:04:25,760 --> 00:04:29,700
we're going to be talking about
B-tree indexes for most of this.

87
00:04:29,700 --> 00:04:32,560
Nikolay: How about btree_gin indexes?

88
00:04:32,640 --> 00:04:36,780
Michael: Yeah well there's a few
the docs mention and it's quite

89
00:04:36,780 --> 00:04:39,800
funny The word only is still in
there, I think from the initial

90
00:04:39,800 --> 00:04:40,120
commit.

91
00:04:40,120 --> 00:04:45,380
So it says currently only the B-tree,
GiST, GIN and BRIN index

92
00:04:45,380 --> 00:04:47,620
types support multi-key column
indexes.

93
00:04:48,000 --> 00:04:53,940
And that, the key is important
word there, but basically that

94
00:04:53,940 --> 00:04:58,300
means, key columns not include
columns.

95
00:04:59,040 --> 00:05:01,220
So there is a subtle difference
there.

96
00:05:01,220 --> 00:05:03,180
Nikolay: Well, GIN indexes can't
be multi-column?

97
00:05:04,180 --> 00:05:08,200
Michael: GIN, So GIN can, BRIN
can, GiST can and B-tree can.

98
00:05:08,360 --> 00:05:15,360
So when it says only, I think the
only inbuilt index type that

99
00:05:15,360 --> 00:05:18,640
doesn't support multi-column indexes
is hash.

100
00:05:19,280 --> 00:05:22,360
So I think only is a bit of a dubious
word there.

101
00:05:22,360 --> 00:05:23,160
Nikolay: Is that it?

102
00:05:23,680 --> 00:05:24,440
Is that it?

103
00:05:25,160 --> 00:05:25,940
Michael: That's it?

104
00:05:26,380 --> 00:05:28,840
I can't think of a sixth 1.

105
00:05:29,440 --> 00:05:29,940
Nikolay: Okay.

106
00:05:30,280 --> 00:05:33,340
Michael: Or maybe SP-GiST, but like
I think that's a subcategory

107
00:05:33,440 --> 00:05:34,100
of GiST.

108
00:05:34,740 --> 00:05:37,940
Anyway, basically I think we're
probably gonna be talking about

109
00:05:37,940 --> 00:05:41,720
a B-tree, like the vast majority
of indexes is a B-tree and the

110
00:05:41,720 --> 00:05:44,560
vast majority of multi-column indexes
I see a B-tree.

111
00:05:45,060 --> 00:05:48,440
Have you got any different examples
other than, well yeah, maybe

112
00:05:48,440 --> 00:05:50,120
B-tree, btree_gin?

113
00:05:51,320 --> 00:05:53,000
Nikolay: Yeah, there's a problem
with GIN.

114
00:05:53,000 --> 00:05:59,660
I always like to talk about that
if you have a scalar column,

115
00:05:59,880 --> 00:06:06,260
column of scalar values, like timestamp
or integer, bigint, and

116
00:06:06,260 --> 00:06:10,260
you need to order by it and limit
like 125.

117
00:06:11,280 --> 00:06:15,960
Then it can be inefficient because
GIN doesn't know, like planner

118
00:06:16,100 --> 00:06:20,560
needs to choose to use only B-tree
or only GIN.

119
00:06:21,040 --> 00:06:25,220
And this is where btree_gin can help
to combine them into multi-column

120
00:06:25,280 --> 00:06:26,180
index, right?

121
00:06:26,760 --> 00:06:33,720
So we can talk about hybrid indexes
when 1 layer is GIN and another

122
00:06:33,720 --> 00:06:35,520
is B-tree, for example.

123
00:06:35,920 --> 00:06:39,440
Michael: Yeah, well, and I think
that kind of starts to get to

124
00:06:39,440 --> 00:06:43,260
why we need or want multi-column
indexes in the first place,

125
00:06:43,260 --> 00:06:48,940
because I think a large part of
it is ordering or clustering.

126
00:06:49,840 --> 00:06:53,500
Like you want it, B-tree is a simple
example.

127
00:06:54,440 --> 00:06:58,580
Beautifully, it has a single order
through the entire structure,

128
00:06:58,580 --> 00:06:59,080
right?

129
00:07:00,240 --> 00:07:00,740
Nikolay: Single.

130
00:07:01,640 --> 00:07:02,720
Michael: Yes, exactly.

131
00:07:03,480 --> 00:07:07,360
So the order is really useful for
things like enforcing unique

132
00:07:07,360 --> 00:07:07,860
constraints.

133
00:07:07,900 --> 00:07:10,960
Like things can be, things are
placed in an order and you can

134
00:07:10,960 --> 00:07:14,160
see, is there 1 of this already
in place?

135
00:07:14,440 --> 00:07:19,440
And multi-column just gives you
kind of order by the first of

136
00:07:19,440 --> 00:07:23,360
these columns first and then for
any that are equal order by

137
00:07:23,360 --> 00:07:27,740
the second column next and I don't
think the docs make this super

138
00:07:27,740 --> 00:07:31,720
clear I think they give a good
example but that could be clearer

139
00:07:31,720 --> 00:07:37,040
I think that it's single order
and order by A, then order by

140
00:07:37,040 --> 00:07:42,160
B and then C and then D and however
many you you list and that's

141
00:07:42,160 --> 00:07:43,280
if they're key columns.

142
00:07:43,500 --> 00:07:46,860
Includes is a bit different, that
doesn't affect the ordering

143
00:07:46,960 --> 00:07:48,300
and that has some benefits.

144
00:07:49,620 --> 00:07:52,320
Well, we can discuss whether it
does, I guess, in a bit.

145
00:07:52,440 --> 00:07:54,720
That's 1 of the main things I wanted
to discuss with you.

146
00:07:54,720 --> 00:07:57,380
Actually, I think that's a particularly
interesting part.

147
00:07:58,580 --> 00:08:02,120
But that ordering thing is really,
really important in terms

148
00:08:02,120 --> 00:08:06,400
of then choosing which order to
index things in in the first

149
00:08:06,400 --> 00:08:06,820
place.

150
00:08:06,820 --> 00:08:13,280
So I've never, I've never come up with or come across a guideline

151
00:08:13,940 --> 00:08:16,600
that I'm totally happy with in terms of...

152
00:08:16,720 --> 00:08:17,980
Nikolay: Well, it's straightforward.

153
00:08:18,520 --> 00:08:21,180
We discussed this 20 years ago in some circles.

154
00:08:21,560 --> 00:08:25,940
So, by the way, not only ordering, but also filtering, if your

155
00:08:25,940 --> 00:08:31,980
filters involve range comparison or like between operator, right?

156
00:08:31,980 --> 00:08:36,000
Or in case of GiST, it can be something else.

157
00:08:36,760 --> 00:08:42,080
So anyway, so it can be not only strict ordering, but also just,

158
00:08:42,660 --> 00:08:44,660
is it inside the range or no?

159
00:08:44,720 --> 00:08:47,640
Is it more or less these operators?

160
00:08:48,460 --> 00:08:54,240
But as for the rule, ah, let me this time also step 1 step back

161
00:08:54,240 --> 00:08:58,640
and say super roughly at high level forgetting about parallel

162
00:08:58,640 --> 00:08:59,680
operations and so on.

163
00:08:59,680 --> 00:09:02,520
We have 3 types of access to data.

164
00:09:03,000 --> 00:09:08,940
Sequential scan on 1 side for lots of data, it's the slowest.

165
00:09:09,840 --> 00:09:13,780
index and index-only scan, let's combine them in this case.

166
00:09:14,340 --> 00:09:20,020
On another side, the fastest single index scan, especially single

167
00:09:20,020 --> 00:09:23,980
index-only scan is the fastest if you have a lot of data.

168
00:09:24,120 --> 00:09:26,760
And between them there is a bitmap scan, which is a combination

169
00:09:26,820 --> 00:09:29,660
of bitmap index scan and bitmap heap scan.

170
00:09:30,400 --> 00:09:38,040
And if you have like, This is like a proposal to have an explanation

171
00:09:38,320 --> 00:09:40,580
why we need a multi-column index.

172
00:09:40,840 --> 00:09:45,720
In case of 2 columns and filtering or ordering on them, If you

173
00:09:45,720 --> 00:09:50,640
have 2 separate single column indexes, you likely will end up

174
00:09:50,640 --> 00:09:56,960
using either 1 of them or maybe both of them, but in the form

175
00:09:56,960 --> 00:10:01,700
of bitmap scan, which is not the most efficient.

176
00:10:02,980 --> 00:10:07,880
And with multi-column index can bring you the most efficient

177
00:10:08,040 --> 00:10:11,640
either index scan or even index-only scan, right?

178
00:10:12,660 --> 00:10:17,360
Michael: Yeah, and I think There are exceptions or times where

179
00:10:17,360 --> 00:10:20,100
they're pretty much equal in efficiency.

180
00:10:20,380 --> 00:10:21,040
Nikolay: Of course.

181
00:10:21,340 --> 00:10:24,980
Michael: But it's very, very easy to demonstrate with non overlapping

182
00:10:25,380 --> 00:10:26,180
like datasets.

183
00:10:27,860 --> 00:10:31,180
Like you quite often use the social media thing, right?

184
00:10:31,180 --> 00:10:37,960
If we had an index on user ID and a separate index on created_at

185
00:10:37,960 --> 00:10:42,980
or you know some kind of timestamp looking at all of the users

186
00:10:42,980 --> 00:10:48,120
posts from a certain time range even if we can do the bitmap

187
00:10:48,120 --> 00:10:51,540
index scan on users just to find all of that users posts and

188
00:10:51,540 --> 00:10:55,240
then a bitmap index scan on all of the posts in that time range,

189
00:10:55,240 --> 00:10:58,440
it's very easy to see how we're having to scan a ton more data

190
00:10:58,440 --> 00:11:01,880
in that And then even though we're only looking at, they're doing

191
00:11:01,880 --> 00:11:04,440
like the bitmap index scan, only having to look at the index

192
00:11:04,440 --> 00:11:08,520
entries and the bitmap itself will be relatively small if that

193
00:11:08,520 --> 00:11:09,720
user doesn't have any posts.

194
00:11:09,720 --> 00:11:12,400
And then the eventual heap scan will be relatively small and

195
00:11:12,400 --> 00:11:13,400
relatively efficient.

196
00:11:14,060 --> 00:11:18,780
The actual initial scans to get the right data is huge compared

197
00:11:18,860 --> 00:11:23,400
to having a multi-column index with all of the users posts next

198
00:11:23,400 --> 00:11:25,780
to each other with their created_at timestamps.

199
00:11:26,480 --> 00:11:29,540
So it's, yeah, in terms of, we
often talk about buffer reads,

200
00:11:29,540 --> 00:11:29,800
right?

201
00:11:29,800 --> 00:11:32,580
Nikolay: We could create statistics,
right?

202
00:11:32,900 --> 00:11:37,160
And so the planner would know that
this user was recently active,

203
00:11:37,160 --> 00:11:41,440
so we need to follow the index
on created_at.

204
00:11:41,640 --> 00:11:46,360
And another is not recently active,
so it's better to fetch all

205
00:11:47,780 --> 00:11:51,660
entries for this user and then
order by memory and have top end

206
00:11:51,660 --> 00:11:52,620
sort in memory.

207
00:11:53,500 --> 00:11:57,660
Michael: But both with any sufficiently
active system, both would

208
00:11:57,660 --> 00:12:01,480
lose massively in terms of buffer
reads still to the index-only

209
00:12:01,480 --> 00:12:05,280
scan or even an index scan on a
single multicolumn index.

210
00:12:05,980 --> 00:12:06,220
Nikolay: Yeah.

211
00:12:06,220 --> 00:12:06,720
Yeah.

212
00:12:07,740 --> 00:12:11,000
Anyway, also, also what matters,
random page cost matters here,

213
00:12:11,000 --> 00:12:11,500
right?

214
00:12:11,880 --> 00:12:16,640
Because planner might, in reality,
index scan might still be

215
00:12:16,640 --> 00:12:21,660
better or vice versa, but planner
might think it can choose a

216
00:12:23,000 --> 00:12:27,180
bitmap scan in favor of index scan
because random page cost

217
00:12:27,180 --> 00:12:30,460
is 4, for example, default.

218
00:12:30,780 --> 00:12:32,120
Anyway, let's return...

219
00:12:32,740 --> 00:12:35,860
Michael: While we're on the topic
of outdated things, I actually

220
00:12:35,860 --> 00:12:39,660
think there's a relevant part of
the docs that I have an issue

221
00:12:39,660 --> 00:12:42,900
with that I wanted your opinion
on.

222
00:12:42,900 --> 00:12:43,940
It's in this area.

223
00:12:43,940 --> 00:12:46,300
It says multi-column indexes should
be used sparingly.

224
00:12:46,480 --> 00:12:49,340
In most situations, an index on
a single column is sufficient

225
00:12:49,340 --> 00:12:51,000
and saves space and time.

226
00:12:51,220 --> 00:12:54,280
Indexes with more than 3 columns
are unlikely to be helpful unless

227
00:12:54,280 --> 00:13:01,400
the usage of the table is extremely
stylized. They're still

228
00:13:01,400 --> 00:13:05,900
there in the latest versions and
to be fair to them, they've

229
00:13:05,900 --> 00:13:10,020
existed pretty much since 7.2.

230
00:13:10,920 --> 00:13:11,560
Nikolay: I understand.

231
00:13:12,100 --> 00:13:15,640
In reality, it depends on the application.

232
00:13:15,740 --> 00:13:19,440
In some applications, there are
absolutely critical queries which

233
00:13:19,440 --> 00:13:23,760
must have index scan and sometimes
even index-only scan with

234
00:13:23,760 --> 00:13:28,180
aggressively tuned to vacuum and
visibility maps maintained very

235
00:13:28,180 --> 00:13:30,560
well so index-only scan doesn't
degrade.

236
00:13:31,040 --> 00:13:37,420
So I think this is like, I can
see why this was written, but

237
00:13:37,420 --> 00:13:40,680
I cannot agree with it in all cases
because definitely we have

238
00:13:40,680 --> 00:13:44,380
3 column indexes sometimes because
we choose to have the most

239
00:13:44,380 --> 00:13:47,680
efficient path specifically to
support specific queries, which

240
00:13:47,680 --> 00:13:49,260
are most important for us.

241
00:13:49,340 --> 00:13:52,820
They receive a lot of traffic,
so we cannot allow degradation

242
00:13:53,000 --> 00:13:53,500
here.

243
00:13:54,060 --> 00:13:54,560
Michael: Yeah.

244
00:13:54,960 --> 00:13:56,920
What about 4 or 5 columns?

245
00:13:56,980 --> 00:13:58,080
I guess it's rarer.

246
00:13:58,080 --> 00:14:00,600
I don't see many, but I've seen
a few.

247
00:14:01,100 --> 00:14:05,340
Nikolay: Well let's a little bit
postpone this question because

248
00:14:05,340 --> 00:14:08,680
I know you wanted also to touch
covering indexes, right?

249
00:14:08,680 --> 00:14:11,980
Because maybe this is where we will discuss this, but let me

250
00:14:11,980 --> 00:14:13,800
return to your previous question.

251
00:14:14,340 --> 00:14:14,720
Yeah.

252
00:14:14,720 --> 00:14:15,980
How to choose the order.

253
00:14:16,160 --> 00:14:21,320
And I remember the rule originally was, if it doesn't matter,

254
00:14:21,540 --> 00:14:23,370
so there are 2 rules.

255
00:14:23,680 --> 00:14:25,820
You need to spend time understanding them.

256
00:14:30,020 --> 00:14:32,660
I'm pretty sure listeners who are most experienced, they know

257
00:14:32,660 --> 00:14:37,940
this very well, but I know a lot of unexperienced listeners our

258
00:14:37,940 --> 00:14:39,020
podcast receives.

259
00:14:39,520 --> 00:14:46,820
So 2 rules, 1 rule, and this is maybe most popular in terms of

260
00:14:47,120 --> 00:14:47,620
mentionings.

261
00:14:49,060 --> 00:14:55,740
The column on, of the index on columns A, B makes a separate

262
00:14:56,540 --> 00:15:01,560
index on columns A, B makes an index on column A, a single column

263
00:15:01,560 --> 00:15:05,700
index, like not needed, you can drop it.

264
00:15:05,820 --> 00:15:09,240
Right, because these 2 column index can support queries that

265
00:15:09,240 --> 00:15:11,420
need only index on column A.

266
00:15:12,240 --> 00:15:17,460
Right, and this means that if you have queries dealing with both

267
00:15:17,460 --> 00:15:22,320
A and B, in terms of ordering and filtering, And also you have

268
00:15:22,320 --> 00:15:26,660
queries that deal with only A, you should put A on the first

269
00:15:26,660 --> 00:15:27,160
place.

270
00:15:27,340 --> 00:15:28,940
This is rule number 1.

271
00:15:28,940 --> 00:15:30,520
And there are nuances, of course.

272
00:15:30,910 --> 00:15:36,600
And second rule is if you cannot choose, if you have patterns

273
00:15:36,600 --> 00:15:42,980
that kind of… imagine you have only 1 query which involves both

274
00:15:42,980 --> 00:15:43,480
columns.

275
00:15:43,820 --> 00:15:46,240
How to choose the order?

276
00:15:46,360 --> 00:15:50,500
And the rule says choose that column which will give you the

277
00:15:50,500 --> 00:15:53,500
highest selectivity and put it to the first place.

278
00:15:54,720 --> 00:16:00,260
So the scope of like your scope narrows down faster.

279
00:16:01,060 --> 00:16:01,640
For example.

280
00:16:01,640 --> 00:16:03,340
Michael: Yeah, I like it.

281
00:16:03,340 --> 00:16:06,140
And I think for single query, and I think there's a, another

282
00:16:06,140 --> 00:16:10,680
1 I've heard is your equality conditions in the leftmost.

283
00:16:12,040 --> 00:16:16,000
And then if you have a range condition or an inequality condition,

284
00:16:16,000 --> 00:16:18,940
like a greater than or less than, that needs to come last.

285
00:16:18,940 --> 00:16:22,080
So you want all of your quality conditions first, then a range

286
00:16:22,080 --> 00:16:23,940
condition like created_at.

287
00:16:23,940 --> 00:16:25,460
Nikolay: So with previous 1, imagine

288
00:16:25,520 --> 00:16:29,100
Michael: if- Well, but it's really important for, let's say you

289
00:16:29,100 --> 00:16:33,880
have a query that has like where A equals 1, B equals 2, C equals

290
00:16:33,880 --> 00:16:39,220
3, and then D is between 17 and a thousand and 1.

291
00:16:40,240 --> 00:16:43,900
It's way more efficient to have the AB again, depending a little

292
00:16:43,900 --> 00:16:47,560
bit on selectivity, I'm sure you can come up with examples where

293
00:16:47,560 --> 00:16:52,420
it isn't true, but having a, b and c in some order and then d

294
00:16:52,420 --> 00:16:56,100
last is really important to making sure that it's as like low

295
00:16:56,100 --> 00:16:58,880
buffers as possible in terms of that index scan.

296
00:16:59,100 --> 00:17:02,520
Nikolay: But imagine we have a query, forgetting about order

297
00:17:02,520 --> 00:17:07,860
by limit, We have a query which selects with 2 filters.

298
00:17:07,920 --> 00:17:13,480
1 is some boolean column and another is time range.

299
00:17:15,140 --> 00:17:17,000
So what should go first?

300
00:17:18,100 --> 00:17:21,380
Michael: Well, I, so think about it though, because like, let's

301
00:17:21,380 --> 00:17:23,800
say the 50 50, that's the simplest case, right?

302
00:17:23,800 --> 00:17:27,140
We've got 50% true, 50% false.

303
00:17:27,620 --> 00:17:30,820
Probably if we have a different thing, It's trickier.

304
00:17:30,880 --> 00:17:33,460
Nikolay: It should go, it should be partial index actually.

305
00:17:33,480 --> 00:17:34,260
But yeah.

306
00:17:34,640 --> 00:17:37,580
Michael: Maybe depending if we need to like query the opposite

307
00:17:37,580 --> 00:17:38,540
1 sometimes.

308
00:17:39,720 --> 00:17:42,540
Nikolay: Maybe it should be, maybe it should be 2 partial indexes.

309
00:17:43,660 --> 00:17:46,100
Michael: Or I don't like, I don't think so.

310
00:17:46,280 --> 00:17:49,140
I don't think 2 partial indexes are more efficient than…

311
00:17:50,460 --> 00:17:54,220
Nikolay: But do you agree that filter on date, usually, if we,

312
00:17:54,220 --> 00:17:58,780
for example, we know that we usually select narrow ranges like

313
00:17:58,780 --> 00:18:02,980
day or maximum month, and we have a lot of data for many years.

314
00:18:03,320 --> 00:18:10,220
Do you agree that the filter on timestamp gives you better selectivity

315
00:18:11,840 --> 00:18:12,940
in general case?

316
00:18:15,780 --> 00:18:17,380
Michael: I think it has to, right?

317
00:18:17,860 --> 00:18:18,780
What do you mean?

318
00:18:18,840 --> 00:18:24,600
Nikolay: Well, of course, if a Boolean column has some, like

319
00:18:24,600 --> 00:18:28,940
it has 2 values and most of columns have 1 value, but you're

320
00:18:28,940 --> 00:18:31,040
selecting another 1, it's a different story.

321
00:18:33,340 --> 00:18:34,340
Where is deleted?

322
00:18:34,400 --> 00:18:38,940
And we know we have most rows not soft deleted, right?

323
00:18:38,940 --> 00:18:40,120
Or something like this.

324
00:18:40,280 --> 00:18:44,360
In this case, the Boolean filter can give you better selectivity,

325
00:18:44,540 --> 00:18:46,300
but it's less.

326
00:18:47,320 --> 00:18:50,000
Michael: But I think a scan, let's say, let's think about like

327
00:18:50,000 --> 00:18:55,320
the buffer reads, if you, if it's true and between these dates

328
00:18:55,320 --> 00:18:59,540
or between these timestamps, our index scan can be perfectly

329
00:18:59,540 --> 00:19:04,000
efficient if we are indexed by boolean first then timestamp because

330
00:19:04,000 --> 00:19:08,200
we only have to look at the true values between a certain date

331
00:19:08,200 --> 00:19:08,700
range.

332
00:19:08,860 --> 00:19:13,080
If we mix that, if we do date range then boolean values, we discard

333
00:19:13,080 --> 00:19:14,440
half the values we read.

334
00:19:14,540 --> 00:19:18,480
We read all of the date range and have to discard half the Boolean

335
00:19:18,480 --> 00:19:18,980
values.

336
00:19:19,120 --> 00:19:20,020
Why should I get

337
00:19:20,020 --> 00:19:20,780
Nikolay: all the date?

338
00:19:23,880 --> 00:19:24,720
I don't get it.

339
00:19:24,720 --> 00:19:25,220
Why?

340
00:19:25,840 --> 00:19:27,080
Michael: We should test it.

341
00:19:27,560 --> 00:19:29,720
I reckon there'll be double the buffer reads.

342
00:19:30,040 --> 00:19:34,160
Nikolay: I forgot to mention my main rule, always doubt and test

343
00:19:34,160 --> 00:19:35,580
in reality.

344
00:19:36,820 --> 00:19:39,120
Michael: Based on my understanding, you'll get double the buffer

345
00:19:39,120 --> 00:19:42,600
reads for putting the date first and the Boolean second.

346
00:19:44,280 --> 00:19:48,480
All of the rules of thumb are good, but they're not perfect.

347
00:19:48,480 --> 00:19:50,400
And actually they're slightly contradictory.

348
00:19:51,340 --> 00:19:56,700
And I would say like even the most selective first, think about

349
00:19:57,180 --> 00:20:01,240
the, the shared, multi-tenancy thing.

350
00:20:01,240 --> 00:20:01,600
Yeah.

351
00:20:01,600 --> 00:20:01,900
Yeah.

352
00:20:01,900 --> 00:20:02,320
Yeah.

353
00:20:02,320 --> 00:20:05,920
I like, I prefer account ID before user ID, even though user

354
00:20:05,920 --> 00:20:07,680
ID is definitely more selective.

355
00:20:08,200 --> 00:20:11,840
Because like partly because we're never really, well, in most

356
00:20:11,840 --> 00:20:15,900
cases, data locality is 1 thing.

357
00:20:16,340 --> 00:20:16,620
Yeah.

358
00:20:16,620 --> 00:20:19,800
Data locality is 1 thing, but also we're very rarely serving

359
00:20:19,840 --> 00:20:21,820
a single query with a single index.

360
00:20:21,820 --> 00:20:24,440
Normally we're serving more than 1 query.

361
00:20:26,760 --> 00:20:32,320
But then the rules start to become a little bit more of an art

362
00:20:32,320 --> 00:20:35,420
form and you're trying to think what will they be in future?

363
00:20:35,660 --> 00:20:39,280
What's most likely, what, what might we want to query together?

364
00:20:39,280 --> 00:20:42,360
Like it's quite likely we want to query users in the same account

365
00:20:42,360 --> 00:20:45,440
together at some point, maybe we want to look at all posts by

366
00:20:45,440 --> 00:20:49,400
this account or this team And having that data closer together

367
00:20:49,400 --> 00:20:53,260
in the index has benefits than if we do it by user ID and it's,

368
00:20:53,260 --> 00:20:56,120
let's say it's, or no matter how you're doing user ID, whether

369
00:20:56,120 --> 00:21:00,780
it's, maybe there'll be clustered teams.

370
00:21:00,860 --> 00:21:03,640
Like maybe if you've got like a big int, maybe they'll be clustered

371
00:21:03,640 --> 00:21:05,800
close together, but they, they won't all be.

372
00:21:05,800 --> 00:21:08,440
You might add a team member many years after setting up your

373
00:21:08,440 --> 00:21:09,880
account and that kind of thing.

374
00:21:10,520 --> 00:21:13,500
So I prefer some breaking of those rules.

375
00:21:13,780 --> 00:21:17,120
Nikolay: There is 1 more rule, Which is foundational.

376
00:21:18,980 --> 00:21:22,540
Data locality, by the way, I would like to think about it.

377
00:21:22,540 --> 00:21:26,760
And also partitioning, if we have it, of course we don't want

378
00:21:26,760 --> 00:21:28,400
to scan all partitions, right?

379
00:21:28,940 --> 00:21:32,220
And actually partitioning serves as kind of an index.

380
00:21:32,860 --> 00:21:35,740
Michael: Kind of leading, like partition keys, kind of the leading

381
00:21:35,740 --> 00:21:37,100
column or leading columns.

382
00:21:37,660 --> 00:21:41,460
Nikolay: I remember I was trying to create an index on partitioning

383
00:21:41,520 --> 00:21:42,600
key in each partition.

384
00:21:42,720 --> 00:21:47,260
It was so strange because this partition had only 1 value.

385
00:21:48,820 --> 00:21:51,900
And I was trying to include this as additional column and somebody

386
00:21:51,900 --> 00:21:54,360
said, we have partitioning, why do we do this?

387
00:21:54,380 --> 00:21:58,040
And I realized, oh, partitioning eliminates the need in additional

388
00:21:58,440 --> 00:22:00,700
step in indexes or some indexes.

389
00:22:01,000 --> 00:22:06,500
So what I was trying to say, additional rule is to remember the

390
00:22:06,500 --> 00:22:09,180
order of execution of select clause.

391
00:22:09,660 --> 00:22:11,660
Oh, not select clause, the whole query.

392
00:22:13,100 --> 00:22:17,360
So first, like we should think, for example, simple, let's make

393
00:22:17,360 --> 00:22:18,400
it, keep it simple.

394
00:22:18,580 --> 00:22:20,100
Order by goes last.

395
00:22:21,220 --> 00:22:21,540
Right.

396
00:22:21,540 --> 00:22:27,900
So if we have order by created_at desk limit 25, it means created_at

397
00:22:27,900 --> 00:22:31,480
should be the last step in the multi-column index.

398
00:22:31,980 --> 00:22:32,480
Right?

399
00:22:32,780 --> 00:22:33,280
Michael: Yeah.

400
00:22:33,720 --> 00:22:35,240
I never thought about that.

401
00:22:35,280 --> 00:22:38,300
But yeah, that fits with the range
thing I was talking about.

402
00:22:38,400 --> 00:22:40,440
Nikolay: Yeah, but because of the
order of execution, if you

403
00:22:40,440 --> 00:22:43,940
put created_at on the first place,
it won't be helpful.

404
00:22:45,040 --> 00:22:46,140
It's a simple rule.

405
00:22:48,160 --> 00:22:51,560
Intuition should tell you this,
but maybe not.

406
00:22:53,040 --> 00:22:58,880
But I agree with you that rules
can be confusing a little bit,

407
00:22:59,060 --> 00:23:03,840
and it's better to test and learn
and sometimes doubt and test

408
00:23:03,840 --> 00:23:04,340
again.

409
00:23:04,780 --> 00:23:07,400
And testing, we should pay attention
to buffers.

410
00:23:08,400 --> 00:23:08,900
Michael: Yeah.

411
00:23:09,640 --> 00:23:12,940
This, well, and this is going to
get better in Postgres 18, right?

412
00:23:12,940 --> 00:23:14,520
People are going to get it by default.

413
00:23:15,240 --> 00:23:16,520
Yeah, me neither.

414
00:23:17,040 --> 00:23:19,620
Nikolay: We already have customers
migrating to 17.

415
00:23:21,020 --> 00:23:24,800
So next year I expect buffers will
become default and I hope

416
00:23:24,800 --> 00:23:30,220
you will see a growing number of
data with buffers.

417
00:23:30,220 --> 00:23:34,240
I mean plans with buffers involved
because it's difficult.

418
00:23:34,240 --> 00:23:36,240
Michael: Yeah, we definitely will.

419
00:23:36,420 --> 00:23:36,920
Nikolay: Cool.

420
00:23:37,660 --> 00:23:39,440
What about covering indexes?

421
00:23:40,240 --> 00:23:44,340
Michael: Well, yeah, when I saw
the commit for adding include

422
00:23:45,060 --> 00:23:46,660
to create index.

423
00:23:46,920 --> 00:23:50,820
So create index added this option
to include certain columns

424
00:23:51,900 --> 00:23:56,600
in the leaf pages, so in the tuples
that are stored in the leaf

425
00:23:56,600 --> 00:24:00,040
pages so you can return more data.

426
00:24:00,040 --> 00:24:04,760
Like if a query for example wants,
your order by like maybe wants

427
00:24:04,760 --> 00:24:12,180
to filter by user ID equals this
and name equals this, order

428
00:24:12,180 --> 00:24:16,360
by this day, the name could be
a payload, doesn't have to be

429
00:24:16,360 --> 00:24:21,340
in the index ordering, but it could
be there in the index leaf

430
00:24:21,340 --> 00:24:21,840
pages.

431
00:24:24,240 --> 00:24:25,520
To achieve index-only scans.

432
00:24:25,520 --> 00:24:26,600
Yeah, very, very important.

433
00:24:26,600 --> 00:24:31,620
That's I think the only use case
for this, but it's a very important

434
00:24:31,620 --> 00:24:32,120
1.

435
00:24:32,140 --> 00:24:33,070
Important enough to get added.

436
00:24:33,070 --> 00:24:37,120
Nikolay: The whole purpose is to
avoid heap fetches, but that

437
00:24:37,120 --> 00:24:41,820
also works if under heavy load,
it works only if you tuned to

438
00:24:41,820 --> 00:24:45,320
vacuum properly and avoid blockers
of xmin horizon.

439
00:24:46,860 --> 00:24:47,860
Michael: Yes, yes.

440
00:24:47,900 --> 00:24:52,320
Otherwise your index-only scans
will end up just regular.

441
00:24:52,540 --> 00:24:56,400
Nikolay: Slowly degrade to index
scans basically.

442
00:24:56,640 --> 00:24:57,140
Michael: Yeah.

443
00:24:57,880 --> 00:25:02,580
So yeah, but covering index, Like
the word covering for me, I

444
00:25:02,580 --> 00:25:08,680
think is slightly loaded because
I think a, an index for me is

445
00:25:08,680 --> 00:25:10,180
covering for a query.

446
00:25:10,380 --> 00:25:13,220
If you can serve it with an
index-only scan.

447
00:25:13,680 --> 00:25:17,420
In my opinion, we had covering
indexes before this, before includes,

448
00:25:17,420 --> 00:25:20,480
because if you had a multi-column
index on all of the columns

449
00:25:20,480 --> 00:25:25,140
that you needed, that was a covering
index for that query.

450
00:25:25,140 --> 00:25:29,440
Nikolay: My favorite
topic, criticizing the choice

451
00:25:29,440 --> 00:25:30,420
of terminology.

452
00:25:32,200 --> 00:25:33,360
I agree with you.

453
00:25:34,040 --> 00:25:36,560
Michael: But I'm
only reiterating this because

454
00:25:36,560 --> 00:25:38,460
I think it's important for people
to understand.

455
00:25:38,460 --> 00:25:43,180
Like I think it could be confusing
if you hear that the covering

456
00:25:43,180 --> 00:25:47,680
indexes are only for the word when
you use includes because it

457
00:25:47,680 --> 00:25:48,540
is confusing.

458
00:25:49,540 --> 00:25:53,140
So yeah, it got added in Postgres
11, 2018.

459
00:25:54,520 --> 00:25:59,700
And I'm interested in kind of like
the, the benefits over like,

460
00:25:59,700 --> 00:26:01,860
why not just add it as another
key?

461
00:26:02,320 --> 00:26:04,270
What are the downsides to doing
that?

462
00:26:04,270 --> 00:26:06,760
Nikolay: I spent
some time thinking about it and

463
00:26:06,760 --> 00:26:11,600
I came to conclusion when they
were added in 2018, I spent some

464
00:26:11,600 --> 00:26:19,280
time and I came to conclusion that
the key difference is the

465
00:26:19,280 --> 00:26:21,000
semantics of uniqueness.

466
00:26:27,100 --> 00:26:31,560
If you add additional columns to
your multi-column indexes, you

467
00:26:31,560 --> 00:26:33,460
change uniqueness semantics, right?

468
00:26:34,160 --> 00:26:35,740
Michael: That's a great point,
yeah.

469
00:26:35,740 --> 00:26:38,900
Nikolay: If you want an index on
this can work well, but you

470
00:26:38,900 --> 00:26:40,900
cannot, but this is unique index.

471
00:26:42,780 --> 00:26:45,880
This is where you should have including,
But it's quite a rare

472
00:26:45,880 --> 00:26:46,660
case actually.

473
00:26:47,240 --> 00:26:51,940
Michael: So let's say we want an
index on email, back to last

474
00:26:51,940 --> 00:26:53,560
week's episode or whenever it was.

475
00:26:53,560 --> 00:26:56,040
We want it to be unique on email,
but we want to include the

476
00:26:56,040 --> 00:26:58,820
name as a payload, but we want
to let people have the same name

477
00:26:58,820 --> 00:27:02,900
as each other, even if they, But
wait, because it's unique on

478
00:27:02,900 --> 00:27:07,900
email already, wouldn't it always
be unique on email and name?

479
00:27:07,960 --> 00:27:10,560
Oh no, because then we, we could,
if they had a different name,

480
00:27:10,560 --> 00:27:11,640
they could use the same email.

481
00:27:11,640 --> 00:27:11,820
Yeah.

482
00:27:11,820 --> 00:27:12,520
That's a problem.

483
00:27:12,520 --> 00:27:15,700
Nikolay: So we don't want to have
uniqueness on pairs of values.

484
00:27:15,700 --> 00:27:15,800
Yeah.

485
00:27:15,800 --> 00:27:19,040
For example, If you want uniqueness
only on email and name it

486
00:27:19,040 --> 00:27:20,920
something like this, you said payload.

487
00:27:20,980 --> 00:27:23,720
This is good point of view on it.

488
00:27:23,720 --> 00:27:26,660
You're just putting there as a
passenger.

489
00:27:27,500 --> 00:27:29,120
Michael: Yeah, that makes perfect
sense.

490
00:27:29,180 --> 00:27:29,440
Nikolay: Yeah.

491
00:27:29,440 --> 00:27:34,220
I also think, do you remember,
is there any size difference?

492
00:27:34,820 --> 00:27:37,000
Michael: Well, I would, yeah, I
was going to bring up the exact

493
00:27:37,000 --> 00:27:37,700
same thing.

494
00:27:37,860 --> 00:27:40,940
I wondered if it also got done
for that reason a little bit,

495
00:27:40,940 --> 00:27:45,120
because if we've got the structure
of a B-tree, you've got the,

496
00:27:45,520 --> 00:27:48,580
Once it gets sufficiently large,
you've got levels of kind of,

497
00:27:48,580 --> 00:27:50,560
they're kind of like an org tree,
don't they?

498
00:27:50,560 --> 00:27:52,320
Like layers of management almost.

499
00:27:52,800 --> 00:27:59,380
And the, if you have a multi-column
index with keys, so where

500
00:27:59,380 --> 00:28:03,860
they're part of the ordering, you
need to preserve that information

501
00:28:04,120 --> 00:28:08,840
at the different levels, like for
ordering purposes, but because

502
00:28:09,160 --> 00:28:12,940
includes columns aren't needed
for ordering, they're not relevant,

503
00:28:13,420 --> 00:28:17,140
they only live in the leaf pages
at the bottom level of the like

504
00:28:17,160 --> 00:28:17,660
B-tree.

505
00:28:18,600 --> 00:28:24,480
So size-wise there should be a
difference, but, and I think this

506
00:28:24,480 --> 00:28:29,320
has changed recently, de-duplication
I think changes this, not

507
00:28:29,320 --> 00:28:33,840
for unique indexes, But in cases
where you can have duplicates,

508
00:28:33,940 --> 00:28:38,160
which is quite a lot of indexes,
I actually think the deduplication

509
00:28:38,380 --> 00:28:42,420
will normally beat the benefits
of not having to store it multiple

510
00:28:42,440 --> 00:28:44,200
like storing the data at multiple
levels.

511
00:28:44,200 --> 00:28:46,800
Obviously depending a little bit
on how big this data is, that

512
00:28:46,800 --> 00:28:49,800
you're probably not putting huge
paid loads in this.

513
00:28:50,240 --> 00:28:52,280
But yeah, I think there was a size
benefit.

514
00:28:52,280 --> 00:28:55,900
I think it's less now that we've
got the B2D duplication work.

515
00:28:56,140 --> 00:29:01,060
Nikolay: Yeah, obviously, in the
columns you're including part

516
00:29:01,060 --> 00:29:05,680
of the create index, you cannot use
them in filtering and ordering.

517
00:29:05,680 --> 00:29:09,020
They are like passengers, they
are only beneficial for

518
00:29:09,020 --> 00:29:09,620
index-only scans.

519
00:29:09,620 --> 00:29:13,380
But since they are sitting like
basically in leaves, do you remember

520
00:29:13,380 --> 00:29:13,880
this?

521
00:29:14,640 --> 00:29:17,060
There is a problem with B-tree on text.

522
00:29:17,320 --> 00:29:19,780
There is a limitation due to key
size.

523
00:29:20,600 --> 00:29:20,940
Right.

524
00:29:20,940 --> 00:29:25,200
So you could not like 2000 something
bytes or something like

525
00:29:25,200 --> 00:29:25,680
this.

526
00:29:25,680 --> 00:29:31,460
I guess in this case for for columns
used in including there

527
00:29:31,460 --> 00:29:32,520
is no such limitation.

528
00:29:32,580 --> 00:29:33,080
Right.

529
00:29:33,220 --> 00:29:33,920
I'm not

530
00:29:33,920 --> 00:29:35,740
Michael: there must be a a limit.

531
00:29:35,740 --> 00:29:35,900
Right.

532
00:29:35,900 --> 00:29:38,220
Because it's still going in normal
data pages.

533
00:29:39,140 --> 00:29:39,860
Nikolay: I'm not sure.

534
00:29:39,860 --> 00:29:40,700
I'm not sure.

535
00:29:40,760 --> 00:29:42,600
Let's not jump to...

536
00:29:42,700 --> 00:29:45,600
Michael: Wait, yeah, but I read
this in documentation and I haven't

537
00:29:45,600 --> 00:29:48,220
made note of it, but it is in the
docs, I'll find it.

538
00:29:48,660 --> 00:29:54,740
Nikolay: But I must submit, this
is a rare beast to meet in the

539
00:29:54,740 --> 00:29:55,240
wild.

540
00:29:56,200 --> 00:29:59,440
I don't see you including the word
often.

541
00:29:59,480 --> 00:30:01,360
I see, but not often at all.

542
00:30:01,400 --> 00:30:02,800
Michael: Oh yeah, I was going to
ask.

543
00:30:02,800 --> 00:30:08,300
So you, so generally you see people
with multi-column indexes,

544
00:30:08,300 --> 00:30:09,360
but they're all keys.

545
00:30:09,360 --> 00:30:10,080
Is that fair?

546
00:30:10,080 --> 00:30:10,680
Nikolay: Yeah.

547
00:30:10,680 --> 00:30:10,940
Yeah.

548
00:30:10,940 --> 00:30:14,280
It's, well, this is what Minds
tells you.

549
00:30:15,860 --> 00:30:17,700
Including is still kind of exotic.

550
00:30:17,800 --> 00:30:23,260
It's good to have it, but not so
obvious when exactly you should

551
00:30:23,260 --> 00:30:23,800
use it.

552
00:30:23,800 --> 00:30:26,680
Well, we discussed it, but still
in normal practice you say, ah,

553
00:30:26,680 --> 00:30:27,800
I will just add it.

554
00:30:27,800 --> 00:30:31,280
If it's not unique index, why should
I think about it?

555
00:30:31,280 --> 00:30:36,760
If there are no limitations in
key to be met, if it's just some

556
00:30:36,760 --> 00:30:40,580
timestamp or number, it's easier
to edit.

557
00:30:41,280 --> 00:30:44,120
So I wanted to touch 1 more topic
here.

558
00:30:45,040 --> 00:30:50,040
When we should avoid completely
putting another column to multi-column

559
00:30:50,140 --> 00:30:54,820
index or this type of indexes we
just discussed, which are called

560
00:30:54,820 --> 00:30:56,700
the covering but you don't like
it.

561
00:30:57,840 --> 00:31:02,520
When should we avoid adding such
columns, another column there?

562
00:31:02,580 --> 00:31:05,960
For example, we see, oh, there
is obvious benefit.

563
00:31:06,600 --> 00:31:10,380
Michael: So when you say benefit,
like benefit to a specific

564
00:31:10,940 --> 00:31:13,940
set of read queries, what could
be faster?

565
00:31:14,180 --> 00:31:18,560
Nikolay: Imagine we have some simple
case, like a table, some

566
00:31:19,200 --> 00:31:22,800
filter with 4 thereby, and it's
already working well, but we

567
00:31:22,800 --> 00:31:28,440
see there is also column status,
it has several values, and we

568
00:31:28,440 --> 00:31:32,360
see most of the time we select
70% of everything, like status

569
00:31:32,360 --> 00:31:33,900
equals active or something.

570
00:31:34,740 --> 00:31:39,960
But we think, actually we could
shave off, we see some rows are

571
00:31:39,960 --> 00:31:41,180
filtered out dynamically.

572
00:31:41,480 --> 00:31:44,200
This is what pgMustard always suggests
immediately.

573
00:31:44,380 --> 00:31:49,220
Oh, index works inefficiently because later some

574
00:31:49,220 --> 00:31:51,240
rows you exclude, right?

575
00:31:51,700 --> 00:31:53,240
Rows filtered or something like
that.

576
00:31:53,240 --> 00:31:54,080
What is it called?

577
00:31:55,600 --> 00:31:57,380
Michael: Well, yeah, there's a
couple of ways.

578
00:31:57,500 --> 00:32:00,980
It can be either a filter or an
index recheck.

579
00:32:01,120 --> 00:32:04,840
Some types of index and sometimes,
for example, with a bitmap

580
00:32:04,840 --> 00:32:04,990
scan.

581
00:32:04,990 --> 00:32:07,720
I was looking at this today actually,
because Postgres 18 changes

582
00:32:07,720 --> 00:32:11,320
this a little bit, but exact heap
blocks and lossy heap blocks,

583
00:32:11,320 --> 00:32:13,680
so that can end up with a recheck.

584
00:32:13,860 --> 00:32:17,400
Nikolay: But I'm talking about
rows removed because we have

585
00:32:17,400 --> 00:32:20,280
an additional part of the work,
work,

586
00:32:20,280 --> 00:32:22,040
Michael: filter, yeah, filter

587
00:32:22,360 --> 00:32:24,720
Nikolay: and they are filtered out
dynamically and we think,

588
00:32:24,720 --> 00:32:25,880
oh, it's not efficient.

589
00:32:26,040 --> 00:32:27,540
Let's go fully efficient.

590
00:32:28,080 --> 00:32:32,540
And we, we start adding, for example,
status to like as additional

591
00:32:33,300 --> 00:32:37,540
column in our multi column index,
or it was a single column,

592
00:32:37,540 --> 00:32:39,220
but it becomes 2 columns, right?

593
00:32:39,280 --> 00:32:46,440
Or we decide to use it as a condition
and make index partial.

594
00:32:47,080 --> 00:32:51,200
So we say where status equals active
because we know most of

595
00:32:51,200 --> 00:32:52,800
the time it's active or something
like this.

596
00:32:52,800 --> 00:32:53,600
It doesn't matter.

597
00:32:53,620 --> 00:32:57,760
So index starts using this column
somehow.

598
00:32:59,180 --> 00:33:05,920
Any of 3 ways, another column, including or where?

599
00:33:06,740 --> 00:33:07,240
Keywords.

600
00:33:09,800 --> 00:33:11,440
Michael: I know what you're going to talk about.

601
00:33:12,260 --> 00:33:14,240
Are we going into HOT update territory?

602
00:33:14,760 --> 00:33:15,060
Nikolay: Exactly.

603
00:33:15,060 --> 00:33:15,360
Exactly.

604
00:33:15,360 --> 00:33:17,300
So this is what happened to me.

605
00:33:17,320 --> 00:33:20,880
I was like enthusiastic optimizing some project.

606
00:33:20,900 --> 00:33:24,780
It was, I think, I remember first time it was very long ago.

607
00:33:24,780 --> 00:33:25,620
It was DocSend.

608
00:33:26,280 --> 00:33:29,240
Later it was acquired by Dropbox.

609
00:33:30,120 --> 00:33:31,280
It was in San Francisco.

610
00:33:31,420 --> 00:33:35,640
And I was like, oh, we optimized additionally, like shaved off

611
00:33:35,640 --> 00:33:39,440
like 20-30% of buffer hits and reads and it's great.

612
00:33:39,480 --> 00:33:41,420
It's small optimization, but it's helpful.

613
00:33:41,460 --> 00:33:45,880
And we can pay this price, okay, we need to maintain this index

614
00:33:45,880 --> 00:33:50,400
anyway, it's slightly bigger now, but it's still worth it.

615
00:33:50,660 --> 00:33:53,080
And then later I see degradation of updates.

616
00:33:54,520 --> 00:33:55,680
And you know why, right?

617
00:33:55,680 --> 00:34:00,700
Because we had HOT updates, because basically most of updates,

618
00:34:00,700 --> 00:34:01,880
they updated status.

619
00:34:01,880 --> 00:34:02,380
Yeah.

620
00:34:04,200 --> 00:34:08,820
And now it's participating in the index definition, so HOT updates,

621
00:34:09,060 --> 00:34:12,380
it makes HOT updates not possible at all.

622
00:34:13,000 --> 00:34:15,780
And HOT updates, just a reminder, we know there is a new audience

623
00:34:15,800 --> 00:34:19,220
usually, So heap only tuple updates.

624
00:34:19,400 --> 00:34:25,740
This is update which changes only table pages, pages of table

625
00:34:25,740 --> 00:34:27,740
data and doesn't touch indexes.

626
00:34:28,520 --> 00:34:33,320
Because normally regular update, it needs to touch every single

627
00:34:33,320 --> 00:34:39,200
index, which produces more dirty pages, Checkpointer or budget

628
00:34:39,200 --> 00:34:43,780
writer or even backend, they need to flush it to disk, first

629
00:34:43,780 --> 00:34:45,180
of all, it doesn't matter.

630
00:34:45,360 --> 00:34:49,940
And then it generates more WAL and it slows down updates.

631
00:34:50,220 --> 00:34:53,340
Regular updates are much slower than HOT updates.

632
00:34:54,140 --> 00:34:59,500
So just putting another column to your multi-column index might

633
00:34:59,500 --> 00:35:00,320
affect updates.

634
00:35:00,340 --> 00:35:01,420
This is worth remembering.

635
00:35:02,460 --> 00:35:04,060
Michael: If it wasn't indexed already.

636
00:35:04,260 --> 00:35:07,260
Like if it was indexed already in some other way.

637
00:35:07,380 --> 00:35:08,200
Fair enough.

638
00:35:08,200 --> 00:35:11,960
It's a really good point And I think those index efficiency tips

639
00:35:11,960 --> 00:35:16,280
are the ones, like, I think people are most surprised by, like

640
00:35:16,280 --> 00:35:19,200
people think they look through their query.

641
00:35:19,200 --> 00:35:22,180
They see, I'm talking about like experienced engineers, but not

642
00:35:22,180 --> 00:35:25,520
experienced Postgres folk look through a query plan, see index

643
00:35:25,520 --> 00:35:29,340
scan, index scan, index scan, maybe there's a lot of loops.

644
00:35:29,340 --> 00:35:31,880
Maybe there is a rows removed by filter, but maybe it only says

645
00:35:31,880 --> 00:35:37,640
1, but it's like, 20, 000 loops and they don't realize that that's

646
00:35:37,640 --> 00:35:39,920
the most inefficient part of their query.

647
00:35:39,920 --> 00:35:43,680
Like just, anyway, so I love those, but normally when it's like

648
00:35:43,680 --> 00:35:47,860
index efficiency is we talk about
a percentage, like 0.1%, like

649
00:35:47,860 --> 00:35:49,220
that, or less than 0.1%.

650
00:35:49,820 --> 00:35:50,320
Great.

651
00:35:50,740 --> 00:35:54,220
If it's 70%, we score it much less.

652
00:35:54,220 --> 00:35:54,620
Well,

653
00:35:54,620 --> 00:35:56,760
Nikolay: It's a combination of
factors, this choice.

654
00:35:56,760 --> 00:35:57,260
Yeah.

655
00:35:57,520 --> 00:36:01,020
But 1 particular case which is
very common and when I see it

656
00:36:01,020 --> 00:36:02,700
I immediately recognize the problem.

657
00:36:03,040 --> 00:36:07,740
If I see SELECT blah blah blah
where some filters and then order

658
00:36:07,740 --> 00:36:10,660
by created_at desk limit 100.

659
00:36:11,740 --> 00:36:16,020
Definitely there are multiple indexes
which include updated_at

660
00:36:16,020 --> 00:36:19,940
and there is a trigger which or
just some Ruby or Django doesn't

661
00:36:19,940 --> 00:36:23,940
doesn't matter application which
updates updated_at always.

662
00:36:24,520 --> 00:36:28,380
So this table doesn't see what
updates 100%.

663
00:36:29,340 --> 00:36:34,660
And yeah and how to break that
issue not losing the the mechanics

664
00:36:34,740 --> 00:36:37,260
of updated_at timestamp, I don't
know.

665
00:36:37,640 --> 00:36:40,740
I try to sometimes to avoid indexing
this.

666
00:36:41,920 --> 00:36:48,380
If you present to your users values,
really like you need order

667
00:36:48,380 --> 00:36:51,160
by updated_at desk, limit something.

668
00:36:51,740 --> 00:36:52,900
You need it, right?

669
00:36:53,100 --> 00:36:57,460
I try to say, okay, updated_at
is very close to created_at usually

670
00:36:57,980 --> 00:37:02,880
let's use created_at And then like
somehow like it doesn't work

671
00:37:02,880 --> 00:37:03,380
well.

672
00:37:04,540 --> 00:37:07,080
Michael: So it doesn't, I would
have thought in most cases it

673
00:37:07,080 --> 00:37:08,300
would actually, but yeah.

674
00:37:08,300 --> 00:37:09,400
Nikolay: Depends of course.

675
00:37:09,400 --> 00:37:09,960
It depends.

676
00:37:09,960 --> 00:37:10,440
Yeah.

677
00:37:10,440 --> 00:37:11,520
Sometimes it's good.

678
00:37:11,520 --> 00:37:15,720
Like you basically already have
almost everything and on the

679
00:37:15,720 --> 00:37:16,060
fly.

680
00:37:16,060 --> 00:37:16,400
Yeah.

681
00:37:16,400 --> 00:37:19,400
Sometimes there is correlations,
very good correlation.

682
00:37:20,280 --> 00:37:24,020
But sometimes if it's on partition
table, updates happening all

683
00:37:24,020 --> 00:37:26,260
the time and it's a mess.

684
00:37:27,040 --> 00:37:30,040
And updates might happen a month
later, for example.

685
00:37:30,060 --> 00:37:32,220
This is like, screw this.

686
00:37:32,220 --> 00:37:35,840
Michael: I will, for newer listeners,
I will link up our episodes

687
00:37:35,860 --> 00:37:39,480
on HOT updates and we did 1 on
limit that's really good for this.

688
00:37:39,620 --> 00:37:40,120
Nikolay: Yeah, yeah, yeah.

689
00:37:40,120 --> 00:37:42,080
I just wanted to say this is a
warning.

690
00:37:42,880 --> 00:37:43,140
Michael: Yeah, yeah.

691
00:37:43,140 --> 00:37:46,100
Nikolay: We have multi-column releases,
Everyone needs them,

692
00:37:46,400 --> 00:37:46,900
100%.

693
00:37:47,320 --> 00:37:53,000
But we should be careful if we
need fast updates and fewer WAL

694
00:37:53,000 --> 00:37:54,560
bytes put to WAL.

695
00:37:55,760 --> 00:37:58,640
Because WAL bytes can be an issue.

696
00:37:59,440 --> 00:38:03,840
And they put pressure on both backups
and replication systems.

697
00:38:06,100 --> 00:38:08,860
Michael: Yes, and this is the old
read versus write thing, isn't

698
00:38:08,860 --> 00:38:09,000
it?

699
00:38:09,000 --> 00:38:12,340
And also, like if you've got a
write-heavy application or if

700
00:38:12,340 --> 00:38:15,480
you're not doing many reads at
all, you might not want many indexes

701
00:38:15,600 --> 00:38:15,940
at all.

702
00:38:15,940 --> 00:38:19,700
Like It is a trade-off in general,
not just multi-column, right?

703
00:38:20,280 --> 00:38:23,040
Nikolay: Sometimes, you know, sometimes
here it makes sense to

704
00:38:23,100 --> 00:38:28,860
split and have one-on-one relationship,
2 tables, and in 1 you

705
00:38:28,860 --> 00:38:34,920
update a lot And you give up, but
you have only like 3 or 2 columns

706
00:38:34,920 --> 00:38:35,860
and that's it.

707
00:38:35,980 --> 00:38:36,240
Right.

708
00:38:36,240 --> 00:38:39,240
And another 1 has everything else
and search and so on.

709
00:38:39,240 --> 00:38:42,700
But in this case, you will end
up with the problem that we cannot

710
00:38:42,700 --> 00:38:44,180
create an index on 2 tables.

711
00:38:44,660 --> 00:38:48,840
We lose for some queries, we will
lose in a single index scan.

712
00:38:48,840 --> 00:38:50,040
We will need to join.

713
00:38:50,660 --> 00:38:50,860
Yeah.

714
00:38:50,860 --> 00:38:51,300
So it's

715
00:38:51,300 --> 00:38:53,400
Michael: back to denormalization
topic again.

716
00:38:53,400 --> 00:38:53,720
Yeah.

717
00:38:53,720 --> 00:38:53,940
Nikolay: Yeah.

718
00:38:53,940 --> 00:38:54,440
Yeah.

719
00:38:54,640 --> 00:38:56,620
Michael: I had 1 more on include.

720
00:38:57,260 --> 00:39:02,320
I wondered if it might end up with
less bloat in some workloads.

721
00:39:02,580 --> 00:39:07,880
So when we're updating that column
that's been included as a

722
00:39:07,880 --> 00:39:11,540
payload for example, the tuple
doesn't need to move in the index.

723
00:39:11,540 --> 00:39:15,860
So in the case where we've got
them both as key columns, We're

724
00:39:15,860 --> 00:39:19,260
having to move the tuple to a different
page most likely.

725
00:39:19,320 --> 00:39:25,760
If it's like ordered by email,
then name, for example, if the

726
00:39:25,760 --> 00:39:29,760
name changes, probably the entry
will move page and you could

727
00:39:29,760 --> 00:39:30,760
end up with a split.

728
00:39:30,760 --> 00:39:35,860
So I was just thinking in terms
of updates, I wonder if we'd

729
00:39:35,860 --> 00:39:39,560
end up with less index bloat if
we used include instead of multiple

730
00:39:39,560 --> 00:39:40,060
keys.

731
00:39:40,520 --> 00:39:43,980
It seems like a minor advantage
but I wondered if that could

732
00:39:43,980 --> 00:39:44,840
help as well.

733
00:39:45,160 --> 00:39:49,360
Nikolay: Yeah it's an interesting
point I should think about

734
00:39:49,360 --> 00:39:49,860
it.

735
00:39:50,360 --> 00:39:54,640
Michael: Yeah but yeah it feels
to me like a Include feels like

736
00:39:54,940 --> 00:39:59,020
a scalpel, like a very specific
tool that is helping.

737
00:39:59,060 --> 00:40:02,120
Nikolay: Including, yeah, the word
is including.

738
00:40:02,380 --> 00:40:03,880
Are you sure?

739
00:40:04,180 --> 00:40:06,060
No, I'm not sure.

740
00:40:08,460 --> 00:40:08,960
Include?

741
00:40:09,040 --> 00:40:09,960
Yeah, it's includes.

742
00:40:11,040 --> 00:40:14,200
It was a bug in my mind sorry yeah
include

743
00:40:14,280 --> 00:40:16,800
Michael: it's not even includes
ploys include singular

744
00:40:17,260 --> 00:40:18,980
Nikolay: yes yeah

745
00:40:20,740 --> 00:40:24,340
Michael: yeah but that shows how
common like shows how commonly

746
00:40:24,340 --> 00:40:25,040
you see it.

747
00:40:25,040 --> 00:40:25,540
Nikolay: Yeah.

748
00:40:25,680 --> 00:40:26,440
Super rare.

749
00:40:27,560 --> 00:40:30,540
Michael: Including would probably make more sense, but it's just

750
00:40:30,540 --> 00:40:31,480
include, yeah.

751
00:40:33,300 --> 00:40:34,060
Nikolay: So, okay.

752
00:40:34,440 --> 00:40:35,580
Michael: Good factor naming.

753
00:40:35,660 --> 00:40:36,680
I think we're done.

754
00:40:37,120 --> 00:40:42,200
Nikolay: Yeah, it was a bag of something, some pieces of advice.

755
00:40:42,880 --> 00:40:45,060
Michael: And I think we'll come back to some of the other index

756
00:40:45,060 --> 00:40:46,360
types at some point.

757
00:40:46,620 --> 00:40:50,280
By the way, I found it just as like an interesting tidbit GIN

758
00:40:50,280 --> 00:40:53,980
and BRIN because of their structures doesn't actually matter

759
00:40:53,980 --> 00:40:56,680
about the ordering of the columns which I found fascinating.

760
00:40:57,100 --> 00:40:57,840
Nikolay: Say that again?

761
00:40:58,440 --> 00:41:03,040
Michael: GIN indexes and BRIN indexes, the order of the columns

762
00:41:03,920 --> 00:41:04,720
doesn't matter.

763
00:41:05,660 --> 00:41:05,820
Nikolay: Yeah.

764
00:41:05,820 --> 00:41:06,320
Interesting.

765
00:41:07,000 --> 00:41:07,500
Okay.

766
00:41:08,140 --> 00:41:10,920
Michael: I mean mostly we're worried about B-tree so it's not

767
00:41:10,920 --> 00:41:11,600
too important.

768
00:41:11,640 --> 00:41:13,220
Nikolay: Yeah, 90 plus percent.

769
00:41:14,040 --> 00:41:14,540
Michael: Yeah.

770
00:41:14,840 --> 00:41:16,680
Alright, nice 1 Niccolo.

771
00:41:17,220 --> 00:41:18,000
Nikolay: Thank you so much.

772
00:41:18,000 --> 00:41:19,060
Michael: Catch you next week.

773
00:41:19,180 --> 00:41:19,680
Nikolay: Bye-bye.