1
00:00:00,060 --> 00:00:02,660
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
00:00:02,660 --> 00:00:03,820
all things PostgreSQL.

3
00:00:03,820 --> 00:00:06,740
I am Michael, founder of pgMustard and I'm joined as usual by

4
00:00:06,740 --> 00:00:08,360
Nikolay, founder of Postgres.AI.

5
00:00:08,360 --> 00:00:09,180
Hello Nikolay.

6
00:00:09,900 --> 00:00:11,780
Nikolay: Hi Michael, how are you?

7
00:00:12,860 --> 00:00:16,160
Let's not miss the fact that we have a guest today.

8
00:00:16,340 --> 00:00:17,660
Michael: We didn't miss it.

9
00:00:17,720 --> 00:00:19,820
You throw things off by saying, how are you?

10
00:00:19,820 --> 00:00:23,200
And I'm British, so I have to reply, and how are you?

11
00:00:23,240 --> 00:00:25,340
Nikolay: Well, let's blame the US.

12
00:00:25,520 --> 00:00:28,200
This is where I learned to ask.

13
00:00:28,380 --> 00:00:31,420
So I'm super glad we have Peter today.

14
00:00:31,560 --> 00:00:32,320
Hi, Peter.

15
00:00:32,920 --> 00:00:33,420
Peter: Hi.

16
00:00:33,940 --> 00:00:34,820
Good to be here.

17
00:00:34,820 --> 00:00:36,180
Thanks for having me on the pod.

18
00:00:36,280 --> 00:00:37,760
Michael: We are delighted to have you.

19
00:00:37,760 --> 00:00:39,100
And yeah, it's Peter Geoghegan.

20
00:00:39,180 --> 00:00:42,800
Hopefully everybody knows of Peter, but in case you don't, he's

21
00:00:42,800 --> 00:00:45,700
a major contributor and committer to Postgres for many years.

22
00:00:45,920 --> 00:00:49,060
The last three or so of those were at AWS.

23
00:00:49,740 --> 00:00:52,580
So today we're going to be talking about some of Pete's recent

24
00:00:52,580 --> 00:00:56,240
and upcoming work on Postgres related to skip scan implementation,

25
00:00:56,760 --> 00:00:58,460
but it's far more than that.

26
00:00:58,460 --> 00:01:02,020
And honestly, the more I learn about this work, the more excited

27
00:01:02,020 --> 00:01:02,980
I am about it.

28
00:01:02,980 --> 00:01:06,060
I thought I knew quite a lot, and I've been reading up in the

29
00:01:06,060 --> 00:01:08,980
last couple of days, and it's even better.

30
00:01:09,100 --> 00:01:12,600
I didn't realise, and I'm probably cheating or skipping ahead

31
00:01:12,600 --> 00:01:16,260
far too much, but I want people to understand how big this is.

32
00:01:16,320 --> 00:01:19,340
This could affect our indexing strategies going forwards.

33
00:01:19,340 --> 00:01:21,880
We could be choosing to index Postgres differently.

34
00:01:23,720 --> 00:01:26,700
I guess it's obvious in hindsight, but there are some real major

35
00:01:26,720 --> 00:01:27,220
differences.

36
00:01:27,380 --> 00:01:29,340
So yeah, I'm super excited about this work.

37
00:01:29,340 --> 00:01:33,240
I want to thank you for having started it and continuing with

38
00:01:33,240 --> 00:01:33,580
it.

39
00:01:33,580 --> 00:01:36,100
I'd be interested to hear, though, what attracted you to this

40
00:01:36,100 --> 00:01:38,860
work, and did you realise how powerful it would be when you first

41
00:01:38,860 --> 00:01:39,520
started on it?

42
00:01:39,520 --> 00:01:41,100
Peter: Actually not at first, no.

43
00:01:41,680 --> 00:01:43,980
I last did a similar interview to this.

44
00:01:43,980 --> 00:01:47,140
I think it was a predecessor podcast, technically, not this one.

45
00:01:47,620 --> 00:01:51,140
At that time, I happened to talk about how, for whatever reason,

46
00:01:51,140 --> 00:01:54,380
I kind of like finding as many different ways as possible of

47
00:01:54,380 --> 00:01:55,440
looking at a thing.

48
00:01:55,580 --> 00:01:57,740
At the time, I think I said something like, well, there's a bunch

49
00:01:57,740 --> 00:01:58,640
of different perspectives.

50
00:01:58,660 --> 00:02:02,580
There's the data structure perspective, there's the concurrency

51
00:02:02,620 --> 00:02:04,420
control perspective, and on and
on down the list.

52
00:02:05,660 --> 00:02:09,480
And it was that sort of general
mentality that brought me to

53
00:02:09,480 --> 00:02:12,940
this work, because I realised,
hey, I'm missing something here.

54
00:02:12,980 --> 00:02:16,040
My tendency to sort of collect
these different ways of looking

55
00:02:16,240 --> 00:02:19,900
at the thing, the same familiar
old topic, this tendency I have.

56
00:02:20,140 --> 00:02:24,120
I was for a long time missing this
extra bit, which I think is

57
00:02:24,120 --> 00:02:27,420
embodied by the work I'm doing
now, and its source material,

58
00:02:27,800 --> 00:02:31,020
which is a pretty obscure 1995
paper.

59
00:02:31,160 --> 00:02:33,420
It happens to be called Efficient
Search of Multidimensional

60
00:02:33,720 --> 00:02:34,220
B-Trees.

61
00:02:34,340 --> 00:02:38,140
So you probably haven't heard of it, but
hopefully that's not any indication

62
00:02:38,140 --> 00:02:39,440
of its real-world relevance.

63
00:02:39,720 --> 00:02:42,440
It has been something that I believe
has influenced a bunch of

64
00:02:42,440 --> 00:02:43,080
other systems.

65
00:02:43,080 --> 00:02:46,080
So it's, in that sense, I think,
pretty standard stuff.

66
00:02:46,080 --> 00:02:50,280
But the implications, maybe, as
you were saying, Michael, are

67
00:02:50,280 --> 00:02:51,680
not fully appreciated.

68
00:02:52,280 --> 00:02:54,520
So maybe we'll start with a general
overview for those that don't

69
00:02:54,520 --> 00:02:56,100
know what skip scan is.

70
00:02:56,260 --> 00:03:01,180
It's a feature that certainly Oracle
and in recent versions MySQL

71
00:03:01,920 --> 00:03:03,740
and also SQLite have.

72
00:03:04,060 --> 00:03:05,280
They all call it skip scans.

73
00:03:05,280 --> 00:03:08,620
I think in DB2 it's called jump
scans or something like that.

74
00:03:08,620 --> 00:03:14,380
So the basic idea is to extend
the standard B-tree index access

75
00:03:14,380 --> 00:03:18,080
method to support new types of
querying without changing anything

76
00:03:18,080 --> 00:03:21,680
about the on-disk contents, without
fundamentally altering anything.

77
00:03:22,480 --> 00:03:25,740
This is possible with multi-column
indexes, composite indexes.

78
00:03:26,140 --> 00:03:29,680
The simplest case, and the one that
gets the most attention, is

79
00:03:29,820 --> 00:03:33,300
where you have an omitted prefix
column in your query.

80
00:03:33,600 --> 00:03:40,300
So, for example, you do a SELECT
from a table where b equals,

81
00:03:40,680 --> 00:03:41,640
let's say, 5.

82
00:03:42,080 --> 00:03:46,580
Here we would have a composite
index on (A, B). Of course, A has been

83
00:03:46,820 --> 00:03:48,840
omitted from the query predicate.

84
00:03:48,840 --> 00:03:52,120
So traditionally, in Postgres at
least, you wouldn't be able

85
00:03:52,120 --> 00:03:53,340
to really use the index.

86
00:03:53,360 --> 00:03:56,680
You might do a very inefficient
full index scan where you sort

87
00:03:56,680 --> 00:03:59,740
of have to grovel through the entire
leaf level of the index,

88
00:04:00,140 --> 00:04:02,940
And that works, but obviously that's
not very efficient.

89
00:04:03,340 --> 00:04:06,540
What skip scan will allow you to
do here is, assuming that there

90
00:04:06,540 --> 00:04:10,020
are not terribly many distinct
values in column A, as a rule

91
00:04:10,020 --> 00:04:13,080
of thumb, I would say hundreds
of distinct values total, or

92
00:04:13,080 --> 00:04:17,140
perhaps thousands, is about the
upper limit, then you can effectively

93
00:04:17,840 --> 00:04:22,860
use that composite index as a collection
of logical sub-indexes,

94
00:04:22,960 --> 00:04:24,180
as they're sometimes called.

95
00:04:24,240 --> 00:04:27,600
So, for example, the first value
might be 1 in the column A.

96
00:04:27,600 --> 00:04:32,080
So we'll do A equals 1, B equals
5, A equals 2, B equals 5,

97
00:04:32,080 --> 00:04:37,280
A equals 3, B equals 5, and exhaustively
will do multiple small

98
00:04:37,280 --> 00:04:40,940
index scans that are effectively
pretending to be one continuous

99
00:04:40,940 --> 00:04:41,820
index scan.

100
00:04:42,180 --> 00:04:45,900
And that will be certainly much
more efficient than a sequential

101
00:04:45,900 --> 00:04:47,680
scan in most cases.
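
As a rough illustration of the idea Peter just described, here is a sketch only, with a composite index modeled as a sorted Python list of key tuples rather than real Postgres internals: the scan descends once per distinct leading value instead of walking the whole leaf level.

```python
import bisect

# Model a composite B-tree index on (a, b) as a sorted list of key tuples.
index = sorted((a, b) for a in range(1, 6) for b in range(10))

def skip_scan(index, b_value):
    """Find entries with b == b_value via one small descent per distinct
    leading value of a, instead of scanning every entry in the index."""
    results = []
    a = index[0][0] if index else None
    while a is not None:
        # Descend straight to (a, b_value), as a B-tree search would.
        pos = bisect.bisect_left(index, (a, b_value))
        if pos < len(index) and index[pos] == (a, b_value):
            results.append(index[pos])
        # Skip ahead to the next distinct leading value (integer keys assumed).
        nxt = bisect.bisect_left(index, (a + 1,))
        a = index[nxt][0] if nxt < len(index) else None
    return results

print(skip_scan(index, 5))  # five probes, one per distinct a
```

Each loop iteration is one of the "logical sub-index" scans; with at most hundreds or thousands of distinct leading values, this stays far cheaper than groveling through the whole leaf level.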

102
00:04:48,080 --> 00:04:51,480
Of course, it's true that you'd
be much better off having an index

103
00:04:51,480 --> 00:04:56,040
on B directly, but it's also true
that who says that that's going

104
00:04:56,040 --> 00:04:56,820
to be possible?

105
00:04:57,040 --> 00:05:02,120
You might just not think to do
it, or in some cases, at least,

106
00:05:02,140 --> 00:05:05,660
it might not really be feasible,
which depends on the requirements

107
00:05:06,100 --> 00:05:06,900
for the user.

108
00:05:07,080 --> 00:05:10,840
This is particularly likely with
things like data warehousing

109
00:05:11,520 --> 00:05:17,300
use cases, the fact table, where
a lot of the querying is sort

110
00:05:17,300 --> 00:05:22,200
of ad hoc, and it's not really
feasible to have all the indexes

111
00:05:22,200 --> 00:05:24,100
you might possibly require ahead
of time.

112
00:05:24,380 --> 00:05:28,220
So here, it's making a composite
index, which is more useful

113
00:05:28,380 --> 00:05:30,200
for a wider variety of queries.

114
00:05:30,580 --> 00:05:35,320
If you only require adequate performance
here, sub-second performance

115
00:05:35,600 --> 00:05:36,840
is attainable.

116
00:05:37,420 --> 00:05:40,260
It's not going to give you sub-millisecond
performance, although

117
00:05:40,260 --> 00:05:41,540
that does sometimes happen.

118
00:05:41,720 --> 00:05:45,640
But it's still, for these kinds
of use cases, almost as good.

119
00:05:45,820 --> 00:05:49,240
And yet it allows you to avoid
creating a bunch of quasi-redundant

120
00:05:49,640 --> 00:05:50,140
indexes.

121
00:05:50,740 --> 00:05:54,460
So in that sense, as you were saying,
Michael, it's kind of like...

122
00:05:55,080 --> 00:05:58,020
It at least would require some
of the conventional wisdom to

123
00:05:58,020 --> 00:05:58,700
be updated...

124
00:05:58,940 --> 00:06:02,520
I wouldn't say it's a radical shift,
but there is a sense of

125
00:06:02,520 --> 00:06:06,140
which, OK, in light of this new
capability, we're going to have

126
00:06:06,140 --> 00:06:09,720
to rethink some of the rules of
thumb we have for indexing in

127
00:06:09,720 --> 00:06:10,220
general.

128
00:06:10,440 --> 00:06:12,500
I think that's likely to work out
that way.

129
00:06:12,500 --> 00:06:16,640
I would say I'm excited to see
how people end up using the feature

130
00:06:16,640 --> 00:06:17,300
in practice.

131
00:06:17,520 --> 00:06:18,620
Many things are possible.

132
00:06:19,460 --> 00:06:23,340
So that example I gave is a sort
of very traditional simple example.

133
00:06:23,940 --> 00:06:28,280
But also worth noting that the
techniques that I'm working on

134
00:06:28,280 --> 00:06:31,560
include similar, though slightly
different, cases where, for

135
00:06:31,560 --> 00:06:34,180
example, you have a range.

136
00:06:34,280 --> 00:06:39,820
So suppose the query is SELECT
* from sales, where sales date

137
00:06:39,820 --> 00:06:43,500
between the 1st of August and the
10th of August.

138
00:06:43,940 --> 00:06:49,020
Then it could be that you're not
enumerating every possible date

139
00:06:49,020 --> 00:06:52,260
in the entire table, but just those
dates from that range.

140
00:06:52,280 --> 00:06:54,340
But the same exact idea still.

141
00:06:54,480 --> 00:06:59,220
So for the 1st of August, plus
whatever the actual thing that

142
00:06:59,220 --> 00:07:02,220
appeared in your predicate in the
query is, let's say, again,

143
00:07:02,220 --> 00:07:04,040
it's b equals 5, I don't know.

144
00:07:04,440 --> 00:07:06,300
That for the first of August, for
the second of August, for the

145
00:07:06,300 --> 00:07:07,160
third of August.

146
00:07:07,240 --> 00:07:10,920
So from the query perspective,
you haven't omitted the column.

147
00:07:10,920 --> 00:07:16,080
The column appears in this between
range, but it is effectively

148
00:07:16,100 --> 00:07:16,980
the same idea.
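
The same machinery, sketched under the same toy model (illustrative table and column names, with `datetime` standing in for a date column), handles a range on the leading column by enumerating only the dates inside it:

```python
import bisect
from datetime import date, timedelta

# Composite index on (sale_date, b), again modeled as a sorted list.
index = sorted((date(2024, 8, 1) + timedelta(days=d), b)
               for d in range(31) for b in range(10))

def range_skip_scan(index, lo, hi, b_value):
    """WHERE sale_date BETWEEN lo AND hi AND b = b_value: one probe per
    distinct date inside the range, not per date in the whole table."""
    results = []
    pos = bisect.bisect_left(index, (lo,))  # first entry with sale_date >= lo
    while pos < len(index) and index[pos][0] <= hi:
        d = index[pos][0]
        probe = bisect.bisect_left(index, (d, b_value))
        if probe < len(index) and index[probe] == (d, b_value):
            results.append(index[probe])
        # Skip to the next distinct date present in the index.
        pos = bisect.bisect_left(index, (d + timedelta(days=1),))
    return results

rows = range_skip_scan(index, date(2024, 8, 1), date(2024, 8, 10), 5)
print(len(rows))  # one probe per date in the range
```

The range bound simply narrows which leading values get enumerated; the skipping itself is unchanged.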

149
00:07:17,720 --> 00:07:20,900
It also should be pointed out that
these are all very general

150
00:07:21,160 --> 00:07:21,600
techniques.

151
00:07:21,600 --> 00:07:24,360
They can be combined in all kinds
of different ways.

152
00:07:24,840 --> 00:07:28,440
And that's where it becomes super
interesting.

153
00:07:28,440 --> 00:07:32,060
Like, you have very, very complicated
combinations of these things.

154
00:07:32,220 --> 00:07:36,900
The example query that I used in
my initial email to the hackers

155
00:07:36,900 --> 00:07:40,320
list, trying to sell it to people,
was taken from the original

156
00:07:40,320 --> 00:07:41,180
source paper.

157
00:07:41,980 --> 00:07:44,560
And in that instance, it was, I
believe...

158
00:07:45,020 --> 00:07:49,540
So you have an initial omitted
prefix column, which is on a column

159
00:07:49,540 --> 00:07:50,220
called Apartments.

160
00:07:50,580 --> 00:07:51,840
Then you have a range.

161
00:07:52,660 --> 00:07:56,260
Then you have an in, and then you
have another in.

162
00:07:56,740 --> 00:07:58,580
So it's all kind of working together.

163
00:07:58,580 --> 00:08:00,180
You see it's like this sort of...

164
00:08:00,480 --> 00:08:03,260
And that's where I think it can
make a huge difference when you

165
00:08:03,260 --> 00:08:06,200
combine all these techniques together,
which you will be able

166
00:08:06,200 --> 00:08:06,940
to do freely.

167
00:08:07,540 --> 00:08:12,080
So there, effectively, you have
this query that would conventionally

168
00:08:12,280 --> 00:08:15,300
require a huge and inefficient index
scan, which is just kind of

169
00:08:15,300 --> 00:08:15,960
a non-starter.

170
00:08:16,400 --> 00:08:19,400
What you actually would have to
do is have a bunch of single-column

171
00:08:19,440 --> 00:08:23,900
indexes and use some kind of bitmap
or something like that.

172
00:08:24,140 --> 00:08:28,480
But then you give up the ability
to have the sort order from

173
00:08:28,480 --> 00:08:29,020
the index.

174
00:08:29,020 --> 00:08:31,060
You can't do a limit 5 or something
like that.

175
00:08:31,060 --> 00:08:34,140
You can't use a group aggregate
directly, etc.

176
00:08:34,780 --> 00:08:38,940
Whereas this way, you can pretty
well convert what you would

177
00:08:38,940 --> 00:08:44,040
think of as 1 big query into what
you would think of as maybe

178
00:08:44,040 --> 00:08:49,280
thousands of tiny queries that
each land on 1 leaf page again and

179
00:08:49,280 --> 00:08:49,780
again.

180
00:08:50,080 --> 00:08:52,840
And it's sort of a lot of small
queries pretending to be 1 big

181
00:08:52,840 --> 00:08:53,340
one.

182
00:08:53,560 --> 00:08:57,740
It's not apparent to you looking
at EXPLAIN ANALYZE that that's going

183
00:08:57,740 --> 00:08:58,840
on, but it is.

184
00:08:59,200 --> 00:09:03,820
So effectively, you're rewriting
the query as if it was just

185
00:09:03,820 --> 00:09:08,400
a bunch of tiny queries in a way
that is fairly intuitively obvious.

186
00:09:08,560 --> 00:09:11,500
It wouldn't be very practical to
do this, but you could do it

187
00:09:11,500 --> 00:09:15,220
by procedurally generating a bunch
of these small queries yourself,

188
00:09:15,300 --> 00:09:16,580
in principle, at least.

189
00:09:17,020 --> 00:09:21,160
And that is the one core clever
idea here.

190
00:09:21,740 --> 00:09:24,920
It's surprising how far this one
core idea will get you, because

191
00:09:24,920 --> 00:09:27,680
it is a fairly simple idea in the
end.

192
00:09:27,720 --> 00:09:31,100
But there's a bunch of things that
I haven't even considered

193
00:09:31,100 --> 00:09:33,540
yet that the paper also contemplates.

194
00:09:33,660 --> 00:09:36,840
For example, it has a part about
not-in predicates.

195
00:09:37,420 --> 00:09:40,640
You would think it would be impossible
to use a not-in predicate

196
00:09:40,640 --> 00:09:41,460
for an index.

197
00:09:42,260 --> 00:09:43,720
But again, it's the same idea.

198
00:09:44,380 --> 00:09:48,520
It's like skip-scan, except rather
than enumerating every possible

199
00:09:48,520 --> 00:09:51,140
value in the index, it's enumerating
every possible value in

200
00:09:51,140 --> 00:09:52,380
the index except these ones.

201
00:09:52,660 --> 00:09:55,120
So it's basically the same thing
again and again.
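
The NOT IN variant from the paper is, as Peter says, the same loop again, just suppressing the probe for excluded values (same toy model as before, a sketch rather than the real implementation):

```python
import bisect

# Composite index on (a, b) as a sorted list of key tuples.
index = sorted((a, b) for a in range(1, 8) for b in range(10))

def not_in_skip_scan(index, excluded, b_value):
    """WHERE a NOT IN excluded AND b = b_value: enumerate every distinct
    leading value as before, but skip the probe for excluded ones."""
    results = []
    a = index[0][0] if index else None
    while a is not None:
        if a not in excluded:
            pos = bisect.bisect_left(index, (a, b_value))
            if pos < len(index) and index[pos] == (a, b_value):
                results.append(index[pos])
        nxt = bisect.bisect_left(index, (a + 1,))  # next distinct a
        a = index[nxt][0] if nxt < len(index) else None
    return results

print(not_in_skip_scan(index, {2, 4}, 5))  # a = 2 and a = 4 are skipped
```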

202
00:09:55,120 --> 00:10:00,040
So it just buys you an awful lot,
one core simple idea, which I

203
00:10:00,060 --> 00:10:01,020
find kind of cool.

204
00:10:01,520 --> 00:10:03,620
So that's the big picture.

205
00:10:04,020 --> 00:10:07,200
Yeah, something that I'm expecting
to get into Postgres 18, and

206
00:10:07,200 --> 00:10:10,820
something that builds on work that
started with Postgres 17.

207
00:10:11,600 --> 00:10:13,360
Michael: So yeah, that's a great
explanation.

208
00:10:13,660 --> 00:10:18,220
I think the multi-column thing,
the complex example you mentioned,

209
00:10:18,480 --> 00:10:21,560
is the part that blows my mind
and starts to make me think, you

210
00:10:21,560 --> 00:10:24,900
know all this advice we've given
that says you generally advise

211
00:10:24,900 --> 00:10:29,540
people against huge multi-column
indexes because they often only

212
00:10:29,540 --> 00:10:32,540
target a very, very niche
set of queries.

213
00:10:33,160 --> 00:10:36,720
And you end up accepting that you're
going to do a large index

214
00:10:36,720 --> 00:10:39,680
scan with a bunch of rows being
filtered, which we know is inefficient,

215
00:10:39,720 --> 00:10:41,320
but it's currently the best solution.

216
00:10:41,320 --> 00:10:41,980
You lose

217
00:10:41,980 --> 00:10:43,240
Nikolay: HOT updates also.

218
00:10:43,280 --> 00:10:47,480
I'm going to continue advising
against too many columns in indexes.

219
00:10:48,060 --> 00:10:49,000
Michael: Well, okay.

220
00:10:49,000 --> 00:10:51,560
But I can see the flip side.

221
00:10:52,360 --> 00:10:55,280
Once you've got fewer, once you
only need fewer indexes, you

222
00:10:55,280 --> 00:10:57,980
don't have to support, like let's
say you've got 5 columns that

223
00:10:57,980 --> 00:10:59,180
are important here.

224
00:10:59,660 --> 00:11:03,340
Sometimes I've seen people index
all 5 in 3 different orders

225
00:11:03,680 --> 00:11:06,540
to support different versions of...

226
00:11:07,120 --> 00:11:10,020
And I feel like that kind of thing
will become less important

227
00:11:10,120 --> 00:11:12,160
if we even need it at all anymore.

228
00:11:12,500 --> 00:11:13,820
So that's mind-blowing.

229
00:11:13,860 --> 00:11:16,780
But just to go back, your thought
process is super interesting

230
00:11:16,780 --> 00:11:17,320
here as well.

231
00:11:17,320 --> 00:11:21,920
So did you come across this paper
as part of widening your horizons

232
00:11:22,360 --> 00:11:24,880
and then think of all these implementation
ideas?

233
00:11:24,880 --> 00:11:27,380
Or was it, you thought, this is
an area, this is a blind spot

234
00:11:27,380 --> 00:11:30,140
for me, I should seek out papers
in this area?

235
00:11:30,660 --> 00:11:33,340
Peter: Well, there's surprisingly
little written about the topic,

236
00:11:33,340 --> 00:11:36,980
actually, even though, as I say,
it's implemented in a bunch

237
00:11:36,980 --> 00:11:38,800
of other systems, unlike Postgres.

238
00:11:39,480 --> 00:11:42,440
I also think it's probably true
that they don't exploit it to

239
00:11:42,440 --> 00:11:43,360
its full extent.

240
00:11:44,180 --> 00:11:48,320
The actual paper was written by
people that were working on a

241
00:11:48,320 --> 00:11:53,920
system called NonStop SQL in the
1990s, which hasn't quite completely

242
00:11:53,920 --> 00:11:56,780
died yet, but it certainly has
far fewer users.

243
00:11:56,780 --> 00:11:59,440
It was a very influential system,
though.

244
00:11:59,680 --> 00:12:03,560
So I think that it's likely that
the full generality of it hasn't

245
00:12:03,560 --> 00:12:05,580
been exploited for whatever reason
in other systems.

246
00:12:06,040 --> 00:12:09,060
My hope is that my implementation
will do just that.

247
00:12:09,340 --> 00:12:10,340
It's kind of weird.

248
00:12:10,520 --> 00:12:14,240
The example complicated query I
gave, what I noticed about that

249
00:12:14,240 --> 00:12:17,140
is, most of the time, if I changed
the order of the columns,

250
00:12:17,640 --> 00:12:19,820
not always, but most of the time,
it made hardly any difference.

251
00:12:20,280 --> 00:12:20,780
Wow.

252
00:12:21,580 --> 00:12:22,760
Right, which is sort of like...

253
00:12:23,680 --> 00:12:26,960
That wouldn't always be true, but
just given the particulars

254
00:12:26,960 --> 00:12:30,060
of the query that I'm talking about,
it kind of didn't matter

255
00:12:30,060 --> 00:12:30,560
if I swapped.

256
00:12:30,560 --> 00:12:33,080
It made maybe a small difference
in favor or against...

257
00:12:33,580 --> 00:12:36,760
But that is a bit mind-blowing,
sure, that that could ever be

258
00:12:36,760 --> 00:12:37,260
true.

259
00:12:37,680 --> 00:12:40,960
Again, it's going to be most compelling
in cases where you don't

260
00:12:40,960 --> 00:12:42,900
know what the requirements are
ahead of time.

261
00:12:43,080 --> 00:12:45,260
It's very good for ad hoc indexing.

262
00:12:46,060 --> 00:12:50,080
And you can always beat it by having
a dedicated index if you

263
00:12:50,080 --> 00:12:51,500
want to pay for that.

264
00:12:52,120 --> 00:12:55,940
But that's perhaps impractical
for a lot of users.

265
00:12:56,260 --> 00:12:59,540
And again, I think in OLTP apps,
it is still useful, but it's

266
00:12:59,540 --> 00:13:02,960
more for avoiding the worst case,
for something that you maybe

267
00:13:02,960 --> 00:13:04,820
should have anticipated but didn't
for whatever reason.

268
00:13:04,820 --> 00:13:08,440
It's good for that more so than
something that you would ideally

269
00:13:08,440 --> 00:13:09,440
plan on using.

270
00:13:09,520 --> 00:13:11,440
But it is relevant to both.

271
00:13:13,040 --> 00:13:15,140
As for why, I don't know.

272
00:13:15,140 --> 00:13:19,200
I think it entered my awareness
about 5 or 6 years ago, but I

273
00:13:19,200 --> 00:13:20,700
didn't fully understand it.

274
00:13:20,980 --> 00:13:23,360
And it took me a little while to
figure out the right way of

275
00:13:23,360 --> 00:13:24,140
doing it.

276
00:13:24,660 --> 00:13:30,160
So having explained all that, it
should also be pointed out that

277
00:13:30,160 --> 00:13:34,700
whether or not you actually want
to rewrite your query in this

278
00:13:34,700 --> 00:13:37,540
manner, it all depends, doesn't
it?

279
00:13:37,580 --> 00:13:40,760
It could be, for example, that
the old way of doing it happens

280
00:13:40,760 --> 00:13:45,160
to be the most efficient one, because,
yes, you could, in principle,

281
00:13:45,480 --> 00:13:49,180
rewrite it into 100 different small
queries, but those small

282
00:13:49,180 --> 00:13:51,960
queries would all be hitting the
same few leaf pages anyway,

283
00:13:51,960 --> 00:13:57,180
so you actually would want to do
maybe not 100, maybe 1 or 2

284
00:13:57,180 --> 00:13:59,440
at most small queries there.

285
00:13:59,440 --> 00:14:01,800
How do you know ahead of time which
it is?

286
00:14:01,800 --> 00:14:03,220
Well, you don't, right?

287
00:14:03,620 --> 00:14:04,940
But you don't have to.

288
00:14:05,280 --> 00:14:09,940
So I've had this intuition that
if I solve that problem first,

289
00:14:10,200 --> 00:14:15,020
if I solve it for in-lists, then
after that, I would be able

290
00:14:15,020 --> 00:14:16,280
to move on to this.

291
00:14:16,280 --> 00:14:17,980
That would be a useful foundation.

292
00:14:18,340 --> 00:14:21,400
So in Postgres 17, this has been implemented.

293
00:14:22,460 --> 00:14:26,820
It'll be available, I think, in
3 weeks when we release 17, if

294
00:14:26,880 --> 00:14:27,940
all goes as planned.

295
00:14:28,580 --> 00:14:32,840
So that took the existing capability
for indexing in-lists and

296
00:14:32,840 --> 00:14:39,560
made it so that if you have 50
integers, say, and they're contiguous

297
00:14:39,560 --> 00:14:45,160
integers like 1, 2, 3, 4, 5,
through to 50, then suppose they're

298
00:14:45,160 --> 00:14:47,280
all co-located on the same leaf
page, then you're going to do

299
00:14:47,280 --> 00:14:49,620
1 index scan and read them all
at once.

300
00:14:49,760 --> 00:14:52,040
If, on the other hand, it turns
out that they're spread out all

301
00:14:52,040 --> 00:14:55,200
over the index because maybe it's
low cardinality, or maybe it's

302
00:14:55,200 --> 00:14:58,460
low cardinality just in that part
of the index, there's no rule

303
00:14:58,460 --> 00:14:59,760
that says it has to be 1 or the
other.

304
00:14:59,760 --> 00:15:02,720
It could be at different points
in the key space.

305
00:15:03,340 --> 00:15:05,140
Low cardinality, others could be
high.

306
00:15:05,860 --> 00:15:10,240
So making a dynamic choice is very
useful, I think, because you

307
00:15:10,240 --> 00:15:11,340
don't have to know.

308
00:15:11,680 --> 00:15:14,560
That's what makes it safe to aggressively
apply these optimizations,

309
00:15:14,600 --> 00:15:18,540
because if it doesn't work out,
in any case, the B-tree scan code

310
00:15:18,540 --> 00:15:24,040
is able to combine the would-be
accesses into 1 on the fly.

311
00:15:24,160 --> 00:15:28,740
So you'll get the same behavior
as in previous versions, if

312
00:15:28,740 --> 00:15:30,280
and only if that makes sense.
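
A tiny model of that dynamic merging (the page geometry and numbers here are invented for illustration): probes whose keys land on the same leaf page collapse into a single page read, so a contiguous IN-list behaves like one scan while a scattered one behaves like many.

```python
# Pretend each leaf page holds a fixed slice of the key space.
KEYS_PER_LEAF = 50

def leaf_reads(in_list):
    """Number of distinct leaf pages touched once adjacent probes that
    land on the same page are combined into one access on the fly."""
    return len({key // KEYS_PER_LEAF for key in in_list})

print(leaf_reads(range(50)))            # 50 contiguous keys: 1 leaf read
print(leaf_reads(range(0, 5000, 100)))  # 50 scattered keys: 50 leaf reads
```

The point of the Postgres 17 behavior is that this grouping happens as the scan progresses, so neither the planner nor the user has to predict which case applies.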

313
00:15:30,660 --> 00:15:35,660
So that sort of flexibility, that
dynamic behavior, seemed like

314
00:15:35,660 --> 00:15:36,760
an essential prerequisite.

315
00:15:37,360 --> 00:15:38,940
So it's not actually true.

316
00:15:39,000 --> 00:15:42,720
It's conceptually the case that
you're rewriting the big query

317
00:15:42,720 --> 00:15:46,440
to lots of little ones and then
combining the result, allowing

318
00:15:46,440 --> 00:15:49,760
them to pretend that they are one
big index scan, even though

319
00:15:49,760 --> 00:15:50,460
they're not.

320
00:15:50,660 --> 00:15:54,060
But the actual degree to which
you do that, how much atomizing

321
00:15:54,060 --> 00:15:57,740
actually happens, is determined
on the fly as the scan progresses.

322
00:15:58,180 --> 00:16:02,600
So it's kind of indeterminate
how many actual index scans

323
00:16:02,600 --> 00:16:05,560
you're doing, but that doesn't
matter, if you get what I mean.

324
00:16:05,740 --> 00:16:06,540
Michael: Yeah, it's so cool.

325
00:16:06,540 --> 00:16:09,020
It doesn't matter to the user. It's
like one of those things that as

326
00:16:09,020 --> 00:16:12,280
a user, I don't have to worry about
it, think about it at all.

327
00:16:12,600 --> 00:16:15,300
As somebody who's implementing
it though, it feels like it makes

328
00:16:15,300 --> 00:16:19,740
it much more complex for you to
build.

329
00:16:19,920 --> 00:16:20,880
Is that fair?

330
00:16:21,420 --> 00:16:22,540
Peter: Yes and no.

331
00:16:22,540 --> 00:16:24,280
I think there's two ways of looking
at it.

332
00:16:24,280 --> 00:16:27,160
There's what you just said, which
is, in an immediate sense,

333
00:16:27,160 --> 00:16:27,660
true.

334
00:16:27,720 --> 00:16:28,760
It is more complicated.

335
00:16:29,180 --> 00:16:31,100
But in another sense, it isn't,
because...

336
00:16:32,220 --> 00:16:33,380
Complicated to who?

337
00:16:33,900 --> 00:16:37,620
I mean, the optimizer has to model
all these costs, right?

338
00:16:37,720 --> 00:16:39,860
And it turns out that's hard.

339
00:16:40,120 --> 00:16:41,920
And it turns out that statistics
are...

340
00:16:42,320 --> 00:16:43,380
Well, they're statistics.

341
00:16:44,340 --> 00:16:47,640
And if you're talking about, like,
you have all these secret

342
00:16:47,640 --> 00:16:50,580
hidden correlations, but these
are multiple column indexes, of

343
00:16:50,580 --> 00:16:53,260
course, so we have statistics for
each column, but at least,

344
00:16:53,260 --> 00:16:56,820
unless you go out of your way to create them,
there's no correlated statistics

345
00:16:57,040 --> 00:16:57,680
for columns.

346
00:16:58,140 --> 00:17:01,080
There's a generic assumption that
these are independent, even

347
00:17:01,080 --> 00:17:03,220
though very often they're not.

348
00:17:03,940 --> 00:17:07,740
So having the access path be very
forgiving, so the optimizer

349
00:17:07,900 --> 00:17:10,900
gets things quite wrong, but it
actually doesn't matter, or it

350
00:17:10,900 --> 00:17:14,640
doesn't matter very much at least,
is way easier, at least from

351
00:17:14,640 --> 00:17:16,760
the point of view of the query
optimizer.

352
00:17:17,040 --> 00:17:22,160
And on balance, that probably makes
it much, much easier to actually,

353
00:17:22,180 --> 00:17:24,740
as a practical matter, make it
work.

354
00:17:25,520 --> 00:17:28,440
I think that that will probably
turn out to be really important,

355
00:17:28,440 --> 00:17:30,660
although at this point, that's
pretty speculative.

356
00:17:31,460 --> 00:17:34,020
Michael: I had this on my list
of things that might come up,

357
00:17:34,020 --> 00:17:36,100
but the cost models are super interesting.

358
00:17:36,100 --> 00:17:39,280
It seems to me, and I haven't checked
for sure, but from looking

359
00:17:39,280 --> 00:17:43,200
at some examples, you might not
have changed the cost estimations

360
00:17:43,260 --> 00:17:44,440
at least in 17.

361
00:17:45,480 --> 00:17:48,080
You might consider that because
you're making a bunch of things

362
00:17:48,080 --> 00:17:51,900
more efficient in a bunch of cases,
you might try and reduce

363
00:17:51,900 --> 00:17:55,840
the costs for those to encourage
those scans when they make sense.

364
00:17:55,840 --> 00:17:57,640
But I guess we're already doing...

365
00:17:57,900 --> 00:18:01,140
Anyway, I guess my question is,
have you changed the cost model?

366
00:18:01,160 --> 00:18:02,580
And if not, why not?

367
00:18:02,720 --> 00:18:05,620
Peter: Yes, but I'm not surprised
that you didn't notice it because

368
00:18:05,820 --> 00:18:09,080
the main change that I made to
the cost model was that now it

369
00:18:09,080 --> 00:18:09,780
will saturate.

370
00:18:10,520 --> 00:18:14,440
Because, a very simple observation,
the most expensive possible

371
00:18:14,440 --> 00:18:16,780
index scan ought to be a full index
scan.

372
00:18:17,040 --> 00:18:18,580
You know, very simple.

373
00:18:18,780 --> 00:18:20,140
If you scan every single...

374
00:18:20,140 --> 00:18:25,120
There's a pretty tangible maximum
cost to, in this case, at least

375
00:18:25,120 --> 00:18:27,780
in this world where this stuff
is available, as it will be in

376
00:18:27,780 --> 00:18:32,440
17, where it is physically impossible
to scan the same leaf page

377
00:18:32,440 --> 00:18:37,000
a second time now, no matter how
the constants have been specified

378
00:18:37,000 --> 00:18:41,780
in the query itself because automatically
that just cannot happen.

379
00:18:42,100 --> 00:18:43,340
It's a physical impossibility.

380
00:18:43,940 --> 00:18:46,420
So that's something we know we can take to
the bank.

381
00:18:46,840 --> 00:18:48,420
And so the cost model saturates.

382
00:18:48,840 --> 00:18:51,380
There's an awful lot of uncertainty,
except when it comes to

383
00:18:51,380 --> 00:18:51,960
the worst case.

384
00:18:51,960 --> 00:18:55,520
We know for absolute sure that
in the very worst case we'll

385
00:18:55,520 --> 00:18:57,220
scan every leaf page once.

386
00:18:57,560 --> 00:18:59,340
So that's where you notice it more.

387
00:18:59,380 --> 00:19:04,640
So you can have an absurdly long
in-list, and at a certain point,

388
00:19:04,640 --> 00:19:08,400
it'll just stop adding additional
cost, since it is physically

389
00:19:08,400 --> 00:19:11,760
impossible for the cost to keep
going up even as you continue

390
00:19:11,760 --> 00:19:12,540
to add these.
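
The saturation idea Peter describes can be sketched as a toy model. This is illustrative only, not Postgres's actual costing code; the function name and constants are made up. Each in-list value costs roughly one index descent plus one leaf page read, but the total is capped at the cost of reading every leaf page once, since no leaf page can be scanned twice.

```python
def inlist_scan_cost(n_values, cost_per_descent, total_leaf_pages,
                     cost_per_leaf_page):
    """Toy sketch of a saturating cost model for an IN-list index scan.

    Naive model: each value costs one descent plus one leaf page.
    But the worst possible case is one descent plus a full pass over
    every leaf page, so the estimate can never exceed that.
    """
    naive = n_values * (cost_per_descent + cost_per_leaf_page)
    worst_case = cost_per_descent + total_leaf_pages * cost_per_leaf_page
    return min(naive, worst_case)
```

With a short in-list the cost grows linearly; with an absurdly long one it simply stops adding cost, exactly the behaviour described above.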

391
00:19:12,740 --> 00:19:16,920
So that sort of plays into what
I was saying about having some

392
00:19:16,920 --> 00:19:19,740
general understanding of what the
worst possible case can be

393
00:19:19,740 --> 00:19:22,000
when everything goes wrong, when
the statistics are garbage.

394
00:19:22,360 --> 00:19:28,380
That assurance, in the end, makes
everything easier rather than

395
00:19:28,380 --> 00:19:29,700
making it harder, I feel.

396
00:19:30,500 --> 00:19:31,600
It's also just useful.

397
00:19:32,520 --> 00:19:36,840
Michael: I like that it seems
pessimistic but also safe.

398
00:19:36,980 --> 00:19:39,980
Peter: That's exactly it, sort of.

399
00:19:40,520 --> 00:19:45,140
I think, statistics being kind
of inherently uncertain, it's

400
00:19:45,140 --> 00:19:48,140
amazing that optimizers work as
well as they do, I say.

401
00:19:48,940 --> 00:19:53,800
Like, people don't realise, I feel,
often, when something goes

402
00:19:53,800 --> 00:19:57,180
wrong, well, it was next to a miracle
that everything was working

403
00:19:57,180 --> 00:19:59,280
as well as it was.

404
00:20:00,020 --> 00:20:02,900
And probably things were not technically
working optimally, but

405
00:20:02,900 --> 00:20:06,500
you didn't know that because you
were happy with adequate performance.

406
00:20:07,000 --> 00:20:09,880
You didn't consider that that was
technically 10 times slower

407
00:20:09,880 --> 00:20:13,440
than what could have, in principle,
have been attained with the

408
00:20:13,440 --> 00:20:14,720
same index and such.

409
00:20:14,960 --> 00:20:20,900
So, like, I think that if we're
honest, we have to admit that

410
00:20:20,900 --> 00:20:25,640
there's an awful lot of uncertainty
about the cost models, and

411
00:20:25,640 --> 00:20:27,100
that doesn't necessarily matter.

412
00:20:27,980 --> 00:20:33,340
We're more encouraging the idea,
amongst people that

413
00:20:33,340 --> 00:20:37,400
are on the hackers list, that it's better
to have a kind of...

414
00:20:37,440 --> 00:20:40,580
I wouldn't even call it insurance,
that's too strong a word,

415
00:20:40,640 --> 00:20:41,620
but just have...

416
00:20:42,380 --> 00:20:46,280
Account for variation at runtime
dynamically rather than doing

417
00:20:46,280 --> 00:20:48,480
so statically in the query plan.

418
00:20:48,600 --> 00:20:52,800
That just seems like, to me, something
that has very little downside

419
00:20:52,800 --> 00:20:54,360
and potentially quite a lot of
upside.

420
00:20:54,960 --> 00:20:56,480
It is a more general idea.

421
00:20:56,480 --> 00:20:57,520
It doesn't just apply here.

422
00:20:57,520 --> 00:20:59,020
It applies, for example, with...

423
00:20:59,160 --> 00:21:00,040
I don't know.

424
00:21:00,060 --> 00:21:02,300
You can do some similar things
with hash joins.

425
00:21:02,640 --> 00:21:06,020
It does get more speculative if
you keep going down the line

426
00:21:06,020 --> 00:21:07,320
of things that are like that.

427
00:21:07,480 --> 00:21:11,580
But why would you trust an uncertain
selectivity estimate if

428
00:21:11,580 --> 00:21:12,680
you didn't have to?

429
00:21:12,740 --> 00:21:16,480
There's no reason why you can't
have some kind of way of recovering

430
00:21:16,880 --> 00:21:20,320
from a misestimation at runtime,
at least in certain cases.

431
00:21:21,020 --> 00:21:25,280
I guess that it is related to that
whole way of thinking about

432
00:21:25,280 --> 00:21:25,780
it.

433
00:21:25,920 --> 00:21:29,480
That is the projects I'm working
on are sort of in that way,

434
00:21:29,640 --> 00:21:30,360
I feel.

435
00:21:31,220 --> 00:21:35,080
Nikolay: Do you mind if I break
the structure a little bit as

436
00:21:35,080 --> 00:21:35,580
usual?

437
00:21:36,060 --> 00:21:36,720
Peter: Go ahead.

438
00:21:38,100 --> 00:21:42,280
Nikolay: Honestly, I didn't read
the whole discussion.

439
00:21:42,560 --> 00:21:45,960
So I'm going to ask some newbie
questions.

440
00:21:46,560 --> 00:21:49,700
First of all, I see this potential
of...

441
00:21:50,740 --> 00:21:52,760
There is loose index scan as well,
right?

442
00:21:52,760 --> 00:21:55,740
And this is a very different thing
for distinct values.

443
00:21:55,760 --> 00:22:00,200
And we have a wiki page, and we're
constantly using this technique.
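
The wiki technique Nikolay refers to emulates a loose index scan with a recursive query that repeatedly asks for the next value greater than the one just found. A rough Python analogue of the idea, purely illustrative (the function name and data are made up, and a sorted list stands in for a B-tree):

```python
import bisect

def distinct_leading_values(index):
    """Emulate a loose index scan over a sorted column: rather than
    reading every entry, repeatedly jump to the first entry greater
    than the value just found. The Postgres wiki achieves the same
    effect in SQL with a recursive CTE, one index descent per value."""
    found = []
    i = 0
    while i < len(index):
        found.append(index[i])
        # One "index descent": skip past all duplicates of this value.
        i = bisect.bisect_right(index, index[i], i)
    return found
```

On a column with millions of rows but only a handful of distinct values, this touches the structure once per distinct value, which is why the trick pays off so dramatically.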

444
00:22:00,360 --> 00:22:04,600
And we also mentioned, like you
did, other database systems have

445
00:22:04,600 --> 00:22:07,060
it implemented, Postgres doesn't.

446
00:22:07,660 --> 00:22:11,720
But here we discuss skip scan,
and this is very

447
00:22:11,720 --> 00:22:12,840
different, I see.

448
00:22:12,840 --> 00:22:18,000
But I just googled a little bit
and I found people use skip scan

449
00:22:18,120 --> 00:22:19,740
in that context as well, right?

450
00:22:19,740 --> 00:22:23,220
So if I'm right, some people reaching
this point will still not

451
00:22:23,220 --> 00:22:25,420
fully understand what we'll discuss,
right?

453
00:22:27,180 --> 00:22:27,880
Nikolay: It's possible.

454
00:22:27,880 --> 00:22:28,280
Right?

455
00:22:28,280 --> 00:22:30,040
So we discussed the problem.

456
00:22:30,660 --> 00:22:31,400
There is a problem.

457
00:22:31,400 --> 00:22:37,540
We have an index on columns A, B,
but the filter is only on B; the first

458
00:22:37,540 --> 00:22:42,440
column, A, is not relevant, and
usually it means this index cannot

459
00:22:42,440 --> 00:22:43,100
be applied

460
00:22:43,660 --> 00:22:44,160
Peter: Well.

461
00:22:45,060 --> 00:22:48,080
It's worth pointing out that with
what I'm talking about, it

462
00:22:48,080 --> 00:22:51,380
only applies when you have at least
two index columns, whereas

463
00:22:51,380 --> 00:22:53,700
I don't believe that's true with
Skip Scan.

464
00:22:54,100 --> 00:22:57,040
You can still usefully skip if
there's only one column.

465
00:22:57,360 --> 00:23:00,140
Nikolay: Because there we need
distinct values, it's a different

466
00:23:00,180 --> 00:23:00,600
task.

467
00:23:00,600 --> 00:23:04,820
Here we need all values, but we
have an index, which usually we

468
00:23:04,820 --> 00:23:06,800
say you have the wrong index, right?

469
00:23:07,120 --> 00:23:12,680
But your optimization will make
it good, but only if a column

470
00:23:12,800 --> 00:23:14,340
has not many values, right?

471
00:23:14,340 --> 00:23:17,200
If it has a lot of values, maybe
it won't be good, right?

472
00:23:17,640 --> 00:23:19,940
Peter: It's certainly not going
to be able to help because,

473
00:23:19,940 --> 00:23:24,860
of course, technically you could
say, okay, there's a million

474
00:23:25,120 --> 00:23:28,260
logically distinct subindexes,
but that degenerates to a full

475
00:23:28,260 --> 00:23:30,200
index scan, so it's the same.

476
00:23:30,560 --> 00:23:33,400
By the way, that might still be
the best plan, all things considered.

477
00:23:33,740 --> 00:23:35,180
We can't rule that out either.

478
00:23:35,220 --> 00:23:38,280
Sometimes that is actually the
best available plan.

479
00:23:38,500 --> 00:23:40,760
But overall, yeah, you're probably
not going to want to use it

480
00:23:40,760 --> 00:23:45,120
unless you have hundreds, perhaps
low thousands, or occasionally

481
00:23:45,140 --> 00:23:47,720
maybe tens of thousands, but after
that it's unlikely.
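
To make the mechanics concrete, here is a toy Python model of the idea being discussed; it is a sketch, not Postgres code, and all names are invented. A skip scan over an index on (a, b) with a predicate only on b treats the index as one logically distinct subindex per value of a, descending into each in turn. As Peter says, when the number of distinct leading values grows, this degenerates toward a full index scan.

```python
import bisect

def skip_scan(index, b_target):
    """Find all (a, b) entries with b == b_target in a list sorted on
    (a, b), by binary-searching each leading value's run in turn.
    Returns the matches and the number of 'descents' performed."""
    matches, i, descents = [], 0, 0
    while i < len(index):
        a = index[i][0]
        # Descend into the (a, b_target) run for this leading value.
        lo = bisect.bisect_left(index, (a, b_target), i)
        hi = bisect.bisect_right(index, (a, b_target), lo)
        matches.extend(index[lo:hi])
        # Skip past every remaining entry with this leading value.
        i = bisect.bisect_right(index, (a, float("inf")), hi)
        descents += 1
    return matches, descents
```

With 10 distinct values of a over 10,000 entries, the scan does 10 descents instead of examining every entry; with 10,000 distinct values of a it would do 10,000, which is the degenerate case discussed above.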

482
00:23:48,180 --> 00:23:51,800
Nikolay: Will the planner decide,
will it have some threshold

483
00:23:52,120 --> 00:23:55,920
to decide when skip scan should
be applied when it already doesn't

484
00:23:55,920 --> 00:23:56,700
make sense?

485
00:23:57,740 --> 00:24:00,360
Peter: Not as such, because as
I sort of touched on already,

486
00:24:00,360 --> 00:24:03,400
there is no special skip scan index
node.

487
00:24:03,840 --> 00:24:06,900
I think that's a bad idea because
it's all dynamic, right?

488
00:24:07,060 --> 00:24:08,300
It's an additive thing.

489
00:24:08,400 --> 00:24:10,080
Nikolay: Yeah, now I'm catching
up.

490
00:24:10,080 --> 00:24:12,980
Peter: Sometimes for all the usual
reasons, you might want to

491
00:24:12,980 --> 00:24:16,240
have an index only scan that happens
to use a skip scan, or you

492
00:24:16,240 --> 00:24:19,060
might want a bitmap index scan,
or you might want a parallel

493
00:24:19,200 --> 00:24:21,040
index scan, etc.

494
00:24:22,360 --> 00:24:26,780
It is useful, I think, to not have
a distinct kind of index path.

495
00:24:27,380 --> 00:24:30,320
More choices means more wrong choices
from the point of view

496
00:24:30,320 --> 00:24:31,100
of the optimizer.

497
00:24:31,800 --> 00:24:35,080
So I feel like that's a distinct
advantage of the approach I've

498
00:24:35,080 --> 00:24:35,580
taken.

499
00:24:35,840 --> 00:24:38,480
Of course, I still have to figure
out a way of making it not

500
00:24:38,480 --> 00:24:40,620
matter when that's not the right
thing to do.

501
00:24:41,000 --> 00:24:43,780
And that's not very easy, but I'm
almost...

502
00:24:44,100 --> 00:24:45,900
I think I've got that part figured
out.

503
00:24:46,220 --> 00:24:46,980
We'll see.

504
00:24:47,180 --> 00:24:51,960
Michael: Do you mean to like minimize
regressions at times where...

505
00:24:51,960 --> 00:24:52,700
Peter: Yeah, OK.

506
00:24:53,260 --> 00:24:56,720
Because again, occasionally it's
going to be intrinsically the

507
00:24:56,720 --> 00:25:00,140
best thing is to just scan the
index without trying to skip at

508
00:25:00,140 --> 00:25:00,640
all.

509
00:25:00,680 --> 00:25:06,240
So you don't want to waste a lot
of CPU cycles on a useless enterprise

510
00:25:06,740 --> 00:25:08,560
that is skipping when you shouldn't
be skipping.

511
00:25:08,560 --> 00:25:11,460
You really do want to fall back
on essentially the same behaviour

512
00:25:11,460 --> 00:25:12,880
as previous releases there.

513
00:25:13,080 --> 00:25:17,640
Now, mind you, overall, hopefully
the optimizer just wouldn't

514
00:25:17,640 --> 00:25:21,060
pick that way of doing it, because
it would be wrong.

515
00:25:21,060 --> 00:25:23,940
It wouldn't be the fastest available
plan, given the available

516
00:25:23,940 --> 00:25:24,440
indexes.

517
00:25:24,760 --> 00:25:28,080
So that only applies at all when
you're already doing something

518
00:25:28,080 --> 00:25:31,160
that's basically pretty slow.

519
00:25:31,380 --> 00:25:33,840
But overall, yeah, I think we're
going to have to make sure that

520
00:25:33,840 --> 00:25:38,140
you don't notice it when it isn't
able to do what you would hope

521
00:25:38,140 --> 00:25:38,800
it would.

522
00:25:39,520 --> 00:25:42,240
That is sort of complicated, for
sure.

523
00:25:42,540 --> 00:25:45,060
Michael: Just to make sure I've
understood, if we could do a

524
00:25:45,060 --> 00:25:49,800
scan without skips, if we had the
perfect index for a very efficient,

525
00:25:49,920 --> 00:25:53,140
let's say, index-only scan, that
would have been the index that

526
00:25:53,140 --> 00:25:55,640
would have been chosen in the first
place because its cost estimate

527
00:25:55,640 --> 00:25:56,620
would have been lowest.

528
00:25:56,980 --> 00:26:01,440
So already when we are skipping,
you're doing a scan that involves

529
00:26:01,440 --> 00:26:02,140
some skips.

530
00:26:02,240 --> 00:26:05,400
We haven't had that available,
and therefore, yeah, okay.

531
00:26:05,500 --> 00:26:06,500
That's really cool.

532
00:26:06,760 --> 00:26:09,280
Peter: Yeah, in general, I think
it makes more sense to think

533
00:26:09,280 --> 00:26:12,740
about these sorts of queries as
being on a continuum.

534
00:26:13,260 --> 00:26:18,140
One extreme is where there's, let's
say, no more than 5 or 10 distinct values.

535
00:26:18,580 --> 00:26:21,300
And then the fact that you're doing
5 different descents of the

536
00:26:21,300 --> 00:26:22,740
index is hardly noticeable.

537
00:26:23,100 --> 00:26:26,360
It's maybe less than a millisecond
in the end anyway.

538
00:26:26,820 --> 00:26:29,040
So that would be a very favourable
case for skipping.

539
00:26:29,240 --> 00:26:32,720
And then you have less favourable
cases, I suppose, where cardinality

540
00:26:33,640 --> 00:26:34,400
goes up.

541
00:26:34,540 --> 00:26:37,040
And eventually, you get to the
point where you can't do any skipping

542
00:26:37,040 --> 00:26:40,140
at all, and that's where you don't
want to pay a penalty for

543
00:26:40,640 --> 00:26:42,940
being open to the possibility of
skipping, if you like.

544
00:26:44,060 --> 00:26:48,280
Also, skipping, as I sort of touched
on earlier, there could

545
00:26:48,280 --> 00:26:52,540
be, for example, that you have
an index where 90% of the pages

546
00:26:52,540 --> 00:26:56,140
in the index, in respect of the
leading column, have 3 values total,

547
00:26:56,140 --> 00:26:59,080
but nevertheless, the remaining
10% are all perfectly unique.

548
00:26:59,320 --> 00:27:00,420
So now you don't have...

549
00:27:00,420 --> 00:27:01,800
You can't really average the two.

550
00:27:01,800 --> 00:27:03,060
It's just two different things.

551
00:27:03,060 --> 00:27:05,260
You can't think of it as the average
of the two, really.

552
00:27:06,140 --> 00:27:10,220
So the fact that it's kind of continuous
just makes sense.

553
00:27:10,600 --> 00:27:12,280
You wouldn't want to have it be...

554
00:27:13,660 --> 00:27:17,300
If the planner were to make a static
choice, then how would you

555
00:27:17,300 --> 00:27:20,420
account for that variation where
you legitimately need to change

556
00:27:20,420 --> 00:27:24,980
your strategy for the same index
scan in real time?

557
00:27:25,320 --> 00:27:28,700
It's totally possible that that would
be the best thing to do so I

558
00:27:28,700 --> 00:27:32,280
like not having an opinion until
you really have to have an opinion, at the

559
00:27:32,280 --> 00:27:33,520
last minute.

560
00:27:34,020 --> 00:27:35,700
You do get the best of both worlds.

561
00:27:35,740 --> 00:27:36,600
That's the thinking.

562
00:27:36,760 --> 00:27:37,260
Anyway.

563
00:27:38,360 --> 00:27:39,740
Michael: I think it's super smart.

564
00:27:39,800 --> 00:27:43,480
And those distributions are quite
common.

565
00:27:43,780 --> 00:27:45,400
We see them all the time.

566
00:27:45,720 --> 00:27:49,280
And it's the time where statistics
most obviously falls flat

567
00:27:49,280 --> 00:27:51,440
on its face because of averages.

568
00:27:52,960 --> 00:27:58,220
I know it does some things to most
common values.

569
00:27:58,840 --> 00:28:01,360
But anyway, I completely agree
that makes loads of sense.

570
00:28:01,360 --> 00:28:02,640
Peter: Sorry, you were going to
say?

571
00:28:02,640 --> 00:28:05,440
I was going to give another example
of what you're saying with

572
00:28:05,440 --> 00:28:05,940
statistics.

573
00:28:06,340 --> 00:28:08,320
I mentioned briefly earlier, there's
this...

574
00:28:08,320 --> 00:28:10,940
I think it's called the attribute value independence
assumption.

575
00:28:11,140 --> 00:28:15,340
It's this generic sort of assumption
made by all cost-based optimizers

576
00:28:15,420 --> 00:28:19,740
that there's no correlation between
different columns, which

577
00:28:19,740 --> 00:28:23,800
is demonstrably false all the time,
but somehow mostly doesn't

578
00:28:24,660 --> 00:28:25,620
break that badly.

579
00:28:26,140 --> 00:28:29,820
But for something like this, you've
got to think, there could

580
00:28:29,820 --> 00:28:33,280
be a very strong correlation between
columns, or there could

581
00:28:33,280 --> 00:28:37,300
be a totally different, completely
independent relationship between

582
00:28:37,300 --> 00:28:38,000
the 2.

583
00:28:38,000 --> 00:28:40,140
And it's really hard to model that.

584
00:28:40,400 --> 00:28:45,480
So if you can just not do that
to much of an extent at all and

585
00:28:45,480 --> 00:28:49,200
still get useful query plans, that
seems ideal.

586
00:28:49,640 --> 00:28:52,820
You know, there's just different
types of indexes.

587
00:28:52,820 --> 00:28:58,940
So you have, for example, in the
paper that I talk about, it's

588
00:28:59,280 --> 00:29:01,500
data warehousing, so you explicitly
have different dimensions

589
00:29:01,520 --> 00:29:03,980
of a business that are obviously
quite independent.

590
00:29:04,600 --> 00:29:08,580
But then you could also have something
like, I don't know, supplier

591
00:29:08,600 --> 00:29:11,680
ID, first column, second column,
discount code.

592
00:29:11,940 --> 00:29:14,440
So it might be that different suppliers
use the same discount

593
00:29:14,440 --> 00:29:17,560
code, coincidentally, But even
then, those 2 discount codes are

594
00:29:17,560 --> 00:29:19,240
completely distinct entities.

595
00:29:19,740 --> 00:29:23,860
So it just doesn't make sense to
think of them as being the same

596
00:29:23,860 --> 00:29:24,440
at all.

597
00:29:24,440 --> 00:29:28,700
You wouldn't expect a query to
just supply a predicate that doesn't

598
00:29:28,700 --> 00:29:30,060
have a supplier ID.

599
00:29:30,060 --> 00:29:33,580
You would expect it either to have
a supplier ID or to have both.

600
00:29:33,900 --> 00:29:36,820
You wouldn't expect it to be omitted,
because it just wouldn't

601
00:29:36,820 --> 00:29:37,620
make sense.

602
00:29:37,960 --> 00:29:41,280
So it's impossible or very difficult,
in any case, for the optimizer

603
00:29:41,280 --> 00:29:43,140
to understand those kinds of distinctions,
even though they're

604
00:29:43,140 --> 00:29:45,560
obviously very, very common.

605
00:29:45,780 --> 00:29:47,360
So again, ideally, it wouldn't
have to.

606
00:29:47,360 --> 00:29:51,420
Ideally, it would just be able
to fall back on this sort of idea

607
00:29:51,420 --> 00:29:55,760
of, hey, we'll try to work it out
at runtime to the best of our

608
00:29:55,760 --> 00:29:56,260
ability.

609
00:29:56,920 --> 00:30:00,960
As long as we get the fastest query
plan, everyone's happy.

610
00:30:01,380 --> 00:30:04,680
It doesn't necessarily have to
be that exact in the cost model.

611
00:30:04,680 --> 00:30:06,040
In fact, it usually isn't.

612
00:30:06,940 --> 00:30:10,240
So that's kind of the overarching
philosophy of the thing.

613
00:30:10,240 --> 00:30:10,740
And.

614
00:30:11,380 --> 00:30:15,600
Michael: Yeah, as usual with your
work, it seems so simple

615
00:30:15,600 --> 00:30:19,320
once you understand
it, but I'm super impressed

616
00:30:19,320 --> 00:30:20,900
with the thought process to have
got there.

617
00:30:20,900 --> 00:30:25,560
It feels like yet another feature
where, as users, we don't have

618
00:30:25,560 --> 00:30:28,640
to think about how it's working,
why it's working that much.

619
00:30:28,780 --> 00:30:31,980
Maybe with strategies later in
terms of indexing, we can take

620
00:30:31,980 --> 00:30:32,900
advantage of that.

621
00:30:32,900 --> 00:30:35,500
But we could carry on indexing
exactly the way we are, and we'll

622
00:30:35,500 --> 00:30:37,620
only benefit as a result still.

623
00:30:38,260 --> 00:30:41,020
Peter: Unless I mess up, of course,
but hopefully I won't do that.

624
00:30:41,580 --> 00:30:42,080
Yeah.

625
00:30:42,620 --> 00:30:44,760
I'm not exactly playing to my strengths
when I'm talking about

626
00:30:44,760 --> 00:30:47,920
the optimizer, you understand,
but nevertheless, that is kind

627
00:30:47,920 --> 00:30:49,460
of part of the philosophy.

628
00:30:49,940 --> 00:30:51,700
I'm not really an optimizer person.

629
00:30:52,200 --> 00:30:54,440
I like being successful, that's
probably why.

630
00:30:55,320 --> 00:30:59,160
But here, I think that undeniably,
there is some thought given

631
00:30:59,160 --> 00:31:01,900
to those aspects, because it kind
of has to be.

632
00:31:02,120 --> 00:31:03,420
Michael: What do you mean successful?

633
00:31:04,060 --> 00:31:06,180
Peter: Well, it's just really hard
to do anything in the optimizer.

634
00:31:06,180 --> 00:31:07,860
That's why so few people do it.

635
00:31:08,400 --> 00:31:11,140
And thank goodness someone is willing
to do it.

636
00:31:11,140 --> 00:31:15,860
But it's a very difficult area
to work in, I feel, which does

637
00:31:15,860 --> 00:31:18,880
seem to put people off more than
perhaps it should.

638
00:31:19,020 --> 00:31:21,220
Perhaps that will change at some
point in the future, but you

639
00:31:21,220 --> 00:31:22,860
have to ask somebody else about
that.

640
00:31:22,960 --> 00:31:24,060
Michael: Yes.

641
00:31:24,520 --> 00:31:27,720
That somebody else, I believe,
is going on another podcast, which

642
00:31:27,720 --> 00:31:29,200
I'm excited to…

643
00:31:29,200 --> 00:31:29,980
Peter: Oh, great.

644
00:31:30,660 --> 00:31:32,380
I think I know what you're talking
about.

645
00:31:32,980 --> 00:31:35,980
Michael: Yes, for everybody else,
this is Tom Lane going on the

646
00:31:35,980 --> 00:31:39,160
Talking Postgres podcast, which
is scheduled to be live, I think,

647
00:31:39,160 --> 00:31:39,860
next month.

648
00:31:39,860 --> 00:31:41,340
So I'll share a link to that.

649
00:31:41,580 --> 00:31:42,480
Exciting times.

650
00:31:42,600 --> 00:31:46,220
So we've talked about it briefly
already, But another cool thing

651
00:31:46,220 --> 00:31:51,220
you've done here is split this
work into some useful stuff on

652
00:31:51,220 --> 00:31:53,980
its own for version 17 coming out
soon.

653
00:31:54,340 --> 00:31:57,640
I'll be interested to hear how
and when did you realize you could

654
00:31:57,840 --> 00:32:02,560
split it up into the in-list optimizations
that I think are going

655
00:32:02,560 --> 00:32:06,160
to be very useful for a bunch of
often-generated queries that

656
00:32:06,160 --> 00:32:09,500
people don't have that much control
over, where relatively

657
00:32:09,640 --> 00:32:13,440
long in-lists seem to pop up, at
least from what I've seen from

658
00:32:13,440 --> 00:32:14,840
like ORMs normally.

659
00:32:15,360 --> 00:32:16,820
How did you work that out?

660
00:32:16,920 --> 00:32:18,160
Peter: I'm not sure exactly when.

661
00:32:18,160 --> 00:32:21,060
I think it probably wasn't any
one moment where I realised that

662
00:32:21,060 --> 00:32:25,480
those were… I remember talking
about another patch, actually

663
00:32:25,480 --> 00:32:29,440
one for loose index scan, and just being
very confused about that

664
00:32:29,440 --> 00:32:32,800
whole area myself, being very confused
about what the difference

665
00:32:32,800 --> 00:32:35,460
is, where to draw the boundaries
architecturally.

666
00:32:36,100 --> 00:32:39,560
And I recall asking to nobody in
particular at the time, how

667
00:32:39,560 --> 00:32:42,580
does this relate to in-lists or
what we would call internally,

668
00:32:42,700 --> 00:32:47,960
ScalarArrayOpExprs, that is, = ANY (array)
expressions?

669
00:32:48,580 --> 00:32:50,340
There must be some relationship
here.

670
00:32:50,800 --> 00:32:53,420
At that time, I was aware of the
paper, the source material,

671
00:32:53,860 --> 00:32:55,780
and that strongly suggested as
much.

672
00:32:55,840 --> 00:32:56,820
But I just didn't...

673
00:32:56,920 --> 00:32:59,980
I kind of thought, you can't really
have an opinion about 1 without

674
00:32:59,980 --> 00:33:01,960
having an opinion about the other.

675
00:33:02,080 --> 00:33:04,100
You want to be able to combine
these things arbitrarily.

676
00:33:04,780 --> 00:33:08,680
So that was when the initial awareness
sort of sunk in.

677
00:33:08,680 --> 00:33:11,180
Then I had some work on Vacuum
over a couple of releases, and

678
00:33:11,180 --> 00:33:12,840
then I came back to B-tree again.

679
00:33:13,040 --> 00:33:16,220
And I think at that point I realised,
OK, this is probably the

680
00:33:16,220 --> 00:33:17,920
most useful thing I could do.

681
00:33:18,040 --> 00:33:21,340
And then I got a little bit lucky
as well with the project for

682
00:33:21,340 --> 00:33:21,840
17.

683
00:33:22,540 --> 00:33:25,580
I worked out a couple of things
on the optimizer side that I

684
00:33:25,580 --> 00:33:27,280
wasn't necessarily counting on.

685
00:33:27,440 --> 00:33:28,180
That's more subtle.

686
00:33:28,180 --> 00:33:31,960
It's stuff that's related to whether
or not we know that the

687
00:33:31,960 --> 00:33:33,740
scan will return ordered results.

688
00:33:34,140 --> 00:33:37,180
So I realized the relationship
between that question...

689
00:33:37,360 --> 00:33:42,440
So in previous versions, you would
do a query like SELECT FROM

690
00:33:42,440 --> 00:33:49,460
a table WHERE a = 5 AND b IN
a long list, ORDER BY a, b LIMIT

691
00:33:49,740 --> 00:33:53,040
something, and it wouldn't be able
to terminate the scan early

692
00:33:53,040 --> 00:33:56,660
because, for reasons that are complicated
and not that interesting,

693
00:33:57,040 --> 00:34:00,520
the implementation couldn't trust
the B-tree code to correctly return

694
00:34:00,520 --> 00:34:01,700
things back in order.
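
The ordering property matters for exactly this shape of query. Here is a sketch of why, with invented names and a sorted list standing in for an index on (a, b); it is not the Postgres executor. If the probes for the IN-list values are issued in sorted order, the output comes out already in (a, b) order, so a LIMIT can stop the scan after the first few matches instead of collecting and sorting everything.

```python
import bisect
from itertools import islice

def ordered_inlist_scan(index, a_value, b_values):
    """Yield entries matching a == a_value AND b IN b_values from a
    list sorted on (a, b). Probing the b values in sorted order means
    the output is produced in (a, b) order, so a consumer applying a
    LIMIT can terminate the scan early."""
    for b in sorted(set(b_values)):
        lo = bisect.bisect_left(index, (a_value, b))
        hi = bisect.bisect_right(index, (a_value, b))
        yield from index[lo:hi]

index = sorted((a, b) for a in range(10) for b in range(1000))
# "LIMIT 3": only the first three probes' worth of work ever happens,
# because the generator is consumed lazily.
first_three = list(islice(ordered_inlist_scan(index, 5, range(500)), 3))
```

Without the ordering guarantee, all 500 probes would have to run before a sort could produce the first three rows.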

695
00:34:02,400 --> 00:34:05,860
So I kind of had this idea that
that was important, but I didn't

696
00:34:05,860 --> 00:34:08,600
know how important it was to everything
else.

697
00:34:08,600 --> 00:34:12,480
So I kind of got a bit lucky in
figuring that out in about 2023.

698
00:34:13,780 --> 00:34:15,840
Before I even realized that I was
pretty sure that I would do

699
00:34:15,840 --> 00:34:19,440
the skip scan thing, but there
was some serendipity with, there

700
00:34:19,440 --> 00:34:22,500
was actually a user complaint,
I think it was on Slack.

701
00:34:23,320 --> 00:34:24,300
Michael: Oh, yeah, Benoit.

702
00:34:25,840 --> 00:34:27,340
I'll link that up as well.

703
00:34:27,620 --> 00:34:29,320
Peter: Yeah, Benoit had a complaint.

704
00:34:29,540 --> 00:34:32,660
Basically, because of the limitation
I just mentioned, the query

705
00:34:32,660 --> 00:34:38,000
plan, which was for such a query
with a limit, it couldn't do

706
00:34:38,000 --> 00:34:42,980
it without creating a way of filtering
the non-matching rows

707
00:34:42,980 --> 00:34:45,140
that required heap access.

708
00:34:45,660 --> 00:34:49,540
So for reasons that are not really
fundamental and worth explaining,

709
00:34:49,540 --> 00:34:52,200
it did way more heap accesses just
to exclude non-matching rows,

710
00:34:52,200 --> 00:34:54,620
when in principle it could have
done that at the index level

711
00:34:54,620 --> 00:34:56,700
before accessing the heap at all.

712
00:34:57,040 --> 00:35:00,460
So for very roundabout, hard to
explain, and actually not that

713
00:35:00,460 --> 00:35:03,160
relevant reasons, that made a huge
difference for his use case.

714
00:35:03,160 --> 00:35:05,820
I wasn't even thinking about it,
because I'm a B-tree fellow.

715
00:35:05,980 --> 00:35:08,240
I wasn't really thinking about
avoiding heap accesses.

716
00:35:08,240 --> 00:35:11,920
But then I found, actually, that's
by far the main reason to

717
00:35:11,920 --> 00:35:12,500
do that.

718
00:35:12,500 --> 00:35:16,020
I was thinking initially about
just the simple fact that, hey,

719
00:35:16,020 --> 00:35:20,580
do you want to do the index descent
one time or a hundred times?

720
00:35:21,140 --> 00:35:23,140
Obviously, you want to do it one
time if that's feasible.

721
00:35:23,160 --> 00:35:25,940
That's where I started, but then
I realized not much later, but

722
00:35:25,940 --> 00:35:29,880
somewhat later with some help from
Edoard, oh, actually, yeah,

723
00:35:29,880 --> 00:35:32,820
you can in some cases at least
completely...

724
00:35:33,520 --> 00:35:36,880
Only to run into all this nonsense
with restrictions in the

725
00:35:36,880 --> 00:35:38,040
planner, you can completely...

726
00:35:38,560 --> 00:35:41,540
In some cases, like the one he showed,
you can completely eliminate

727
00:35:41,540 --> 00:35:42,480
a huge number of heap accesses.

728
00:35:42,880 --> 00:35:48,080
And that was by far the best, the
most flattering case for the

729
00:35:48,080 --> 00:35:52,120
B-tree work, where I think it was
like 8, 20 times faster, something

730
00:35:52,120 --> 00:35:52,580
like that.

731
00:35:52,580 --> 00:35:55,360
It doesn't really matter what the
number is, it's way, way faster,

732
00:35:55,360 --> 00:35:58,760
because you're fundamentally doing
way less I/O.

733
00:35:58,840 --> 00:36:02,500
So I didn't necessarily count on
that happening at all.

734
00:36:02,500 --> 00:36:06,820
And then it kind of did without
my initially realizing it, which

735
00:36:06,820 --> 00:36:07,360
was nice.

736
00:36:07,360 --> 00:36:10,120
Serendipity did play a role, as
it often does.

737
00:36:10,600 --> 00:36:11,340
Michael: Very cool.

738
00:36:11,360 --> 00:36:12,760
Nikolay, do you have any questions?

739
00:36:13,380 --> 00:36:16,520
Nikolay: Of course, I have comments
and questions.

740
00:36:16,580 --> 00:36:18,140
All of them are silly, maybe.

741
00:36:18,280 --> 00:36:21,000
Once again, we should avoid this
confusion.

742
00:36:21,100 --> 00:36:23,420
Loose index scan is not skip scan.

743
00:36:23,680 --> 00:36:28,760
Because I saw posts mixing these
names, but I double-checked

744
00:36:29,480 --> 00:36:36,680
Oracle and SQLite, and they both
have skip scan with the same meaning,

745
00:36:36,680 --> 00:36:38,300
so we should stick to this.

746
00:36:39,060 --> 00:36:40,080
Peter: I guess so, yeah.

747
00:36:40,080 --> 00:36:43,220
I mean, I wouldn't mind calling
it something else entirely, but

748
00:36:43,260 --> 00:36:47,000
I certainly do think the distinction
between loose index scan

749
00:36:47,000 --> 00:36:47,940
and skip scan matters.

750
00:36:47,960 --> 00:36:49,640
I was confused on that point myself.

751
00:36:49,640 --> 00:36:50,740
I mentioned this already.

752
00:36:52,280 --> 00:36:54,740
The important distinction between
the 2 is...

753
00:36:55,020 --> 00:36:58,640
So loose index scan involves an
Index scan that knows specifically

754
00:36:59,640 --> 00:37:02,620
that it's feeding into a group
Aggregate.

755
00:37:03,160 --> 00:37:06,580
So it's only correct because you
don't actually logically need

756
00:37:06,580 --> 00:37:09,800
to return all of the rows from
the Index, or indeed from the

757
00:37:09,800 --> 00:37:11,260
Heap at all.

758
00:37:11,260 --> 00:37:13,940
It's only correct because of that
high-level context that you're

759
00:37:13,940 --> 00:37:16,160
applying in the low-level code.

760
00:37:17,460 --> 00:37:19,660
Whereas, in contrast, SkipScan
isn't like that.

761
00:37:19,660 --> 00:37:22,540
SkipScan is just exploiting information
that's readily available

762
00:37:22,540 --> 00:37:23,340
to the scan.

763
00:37:23,360 --> 00:37:26,840
It's doing Index scan things in
a more clever way without really

764
00:37:26,840 --> 00:37:29,620
understanding how that information
is being consumed 1 level

765
00:37:29,620 --> 00:37:30,820
up or 2 levels up.

766
00:37:31,020 --> 00:37:32,860
So that's how I think of it.
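
The distinction can be made concrete with two hedged examples, assuming a hypothetical index on `sales (region, sold_at)`:

```sql
-- Loose index scan (not in Postgres core): only valid because the
-- scan knows it feeds a DISTINCT/aggregate, so it may jump to the
-- next distinct value without returning every row.
SELECT DISTINCT region FROM sales;

-- Skip scan: an ordinary index scan that must return all matching
-- rows; it merely navigates the B-tree more cleverly when the
-- leading column has no qualifying clause.
SELECT * FROM sales WHERE sold_at = '2024-06-01 00:00+00';
```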

767
00:37:32,900 --> 00:37:35,740
Nikolay: Yeah, you mentioned work
on loose index scan also happening.

768
00:37:35,740 --> 00:37:38,580
What do you think about the future
of that work?

769
00:37:38,800 --> 00:37:40,780
Peter: I think it's a perfectly
fine idea.

770
00:37:40,840 --> 00:37:42,880
There's nothing wrong with it.

771
00:37:43,080 --> 00:37:46,400
I did briefly look at that some
years back.

772
00:37:46,560 --> 00:37:50,820
My recollection of exactly how
it was at the time is a little

773
00:37:50,820 --> 00:37:51,480
bit vague.

774
00:37:51,500 --> 00:37:54,960
I remember noticing that it had
lots of bitrot, which was

775
00:37:54,960 --> 00:37:57,580
probably a little bit of something
that discouraged my involvement,

776
00:37:57,720 --> 00:37:59,840
because, again, I like being successful.

777
00:38:00,400 --> 00:38:01,400
Right, yeah.

778
00:38:02,720 --> 00:38:04,980
But fundamentally, it's a perfectly
sound idea.

779
00:38:05,020 --> 00:38:10,100
It's obviously a good idea that
could be implemented with enough

780
00:38:10,380 --> 00:38:10,880
effort.

781
00:38:10,920 --> 00:38:16,060
And I think that the patch didn't
have any major problems, or

782
00:38:16,060 --> 00:38:19,400
at least didn't have any major
problems that could not be addressed

783
00:38:19,400 --> 00:38:20,640
in the fullness of time.

784
00:38:20,940 --> 00:38:25,160
I recall Tom, at the time, who
was sort of asked to take a look

785
00:38:25,160 --> 00:38:30,220
at it by me, saying it was maybe
doing things the wrong way on

786
00:38:30,220 --> 00:38:32,440
the Planner, but that's kind of
above my pay grade.

787
00:38:32,440 --> 00:38:35,860
But from an indexing point of view,
obviously it makes sense.

788
00:38:35,860 --> 00:38:42,100
It's like, just thinking about
it very sensibly, it's doing potentially

789
00:38:45,660 --> 00:38:48,340
an enormous amount of extra I/O
you don't need to do logically.

790
00:38:48,480 --> 00:38:49,320
It's just not...

791
00:38:49,840 --> 00:38:54,520
Yeah, I mean, it's inarguable that
that's a perfectly good idea.

792
00:38:54,920 --> 00:38:58,340
If I haven't started there or I
haven't paid too much attention

793
00:38:58,340 --> 00:39:00,300
to it, it's only because of competing
priorities.

794
00:39:01,560 --> 00:39:03,160
I think it's totally reasonable.

795
00:39:03,900 --> 00:39:07,420
Nikolay: Yeah, we write WITH RECURSIVE
queries all the time to implement

796
00:39:07,420 --> 00:39:10,120
this, basically at the application
level.
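
The application-level technique Nikolay mentions is the recursive-CTE pattern from the Postgres wiki's loose index scan page; roughly, assuming a hypothetical index on `sales (region)`:

```sql
-- Emulate a loose index scan: fetch each next distinct region with
-- its own tiny index descent, chained through a recursive CTE.
WITH RECURSIVE t AS (
    (SELECT region FROM sales ORDER BY region LIMIT 1)
    UNION ALL
    SELECT (SELECT region FROM sales
            WHERE region > t.region
            ORDER BY region LIMIT 1)
    FROM t
    WHERE t.region IS NOT NULL
)
SELECT region FROM t WHERE region IS NOT NULL;
```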

797
00:39:10,460 --> 00:39:14,860
I'm curious if there are any specific
benchmarks which could

798
00:39:14,860 --> 00:39:19,660
be helpful for both of these patches
to check, like, certain kinds of workloads?

799
00:39:20,860 --> 00:39:24,280
Peter: Well, not really, because
as is often the case with things

800
00:39:24,280 --> 00:39:27,500
that have some compiler component
to them, it's sort of like,

801
00:39:27,500 --> 00:39:31,400
It's perfectly obvious that it's
not a quantitative improvement,

802
00:39:31,400 --> 00:39:32,700
it's a qualitative improvement.

803
00:39:32,960 --> 00:39:36,300
So you name a number, and I'll
give you a test case that is faster

804
00:39:36,300 --> 00:39:36,820
by that amount.

805
00:39:36,820 --> 00:39:38,580
Just make up a number.

806
00:39:38,860 --> 00:39:39,520
It's no problem.

807
00:39:39,520 --> 00:39:41,040
I'll get you a test case that will
do that.

808
00:39:41,040 --> 00:39:45,460
I just have to come up with something
that has sufficiently large

809
00:39:45,460 --> 00:39:47,580
amounts of things that you will
be skipping now.

810
00:39:49,760 --> 00:39:52,280
Coming up with a number, it's not
really very meaningful.

811
00:39:53,960 --> 00:39:56,920
You're doing it in the efficient
way, the obvious, sensible way,

812
00:39:56,920 --> 00:40:00,560
versus not having done that, so there's
really no practical limit on

813
00:40:01,080 --> 00:40:02,220
how much faster it could be.

814
00:40:02,220 --> 00:40:04,700
And it's hard to make generalizations
in the real world.

815
00:40:07,060 --> 00:40:09,380
That's the annoying answer to your
question.

816
00:40:09,960 --> 00:40:12,740
Nikolay: By the way, I wanted to
ask you about propagation of

817
00:40:12,740 --> 00:40:16,520
this idea from B-tree to other
types of indexes, GIN first of

818
00:40:16,520 --> 00:40:17,020
all.

819
00:40:17,200 --> 00:40:20,140
Is it a valid idea at all?

820
00:40:20,740 --> 00:40:23,500
Maybe in combination with the
btree_gin?

821
00:40:25,380 --> 00:40:28,600
Peter: It definitely could be,
but the way GIN in particular

822
00:40:29,020 --> 00:40:31,560
does multicolumn indexes is kind
of weird.

823
00:40:31,560 --> 00:40:36,980
The way that actually works is,
it essentially, rather than having

824
00:40:38,240 --> 00:40:42,620
a B-tree that has A and B stored
together, let's say it has a

825
00:40:42,620 --> 00:40:45,360
B-tree, if and only if you have
a multi-column index, it has

826
00:40:45,360 --> 00:40:50,320
a B-tree where attribute 1 is stored
with 1, whatever the value

827
00:40:50,320 --> 00:40:53,000
for A is, and attribute 2 is stored
with 2, whatever the value

828
00:40:53,000 --> 00:40:53,600
for B is.

829
00:40:53,600 --> 00:40:58,280
So the column number, in the
B-tree itself, where the entries

830
00:40:58,280 --> 00:41:02,000
live, is the most significant key,
which is very different.

831
00:41:02,280 --> 00:41:05,840
Now, if you said to me, what about
GIST, then that's very different,

832
00:41:05,840 --> 00:41:08,640
because GIST is much more like
B-tree, particularly here.

833
00:41:08,680 --> 00:41:10,460
I think with GIST it's totally
possible.

834
00:41:11,040 --> 00:41:16,180
The way that GIST works, so the
source paper I mentioned, Efficient

835
00:41:16,180 --> 00:41:19,900
Search of Multidimensional Indexes,
the name might be confusing.

836
00:41:20,140 --> 00:41:22,800
You might think, well, multidimensional
like GIST?

837
00:41:23,320 --> 00:41:25,020
And the answer is, yes, sort of.

838
00:41:25,020 --> 00:41:29,820
I'm doing work that's purely enhancing
B-trees, but some of the

839
00:41:29,820 --> 00:41:34,680
ideas with dimensionality are at
least at a high level related

840
00:41:34,680 --> 00:41:38,640
to things like multidimensional
indexing using GIST.

841
00:41:39,000 --> 00:41:41,280
So I think it's entirely possible
to do that.

842
00:41:41,280 --> 00:41:45,040
GIST is loosely ordered, so it's
kind of inherently doing something

843
00:41:45,040 --> 00:41:49,900
a bit like that anyway, where it's
not so much finding the exact

844
00:41:50,280 --> 00:41:53,600
match and the exact place where
it must be, but rather looking

845
00:41:53,600 --> 00:41:57,900
in all the places it could be exhaustively,
which is usually

846
00:41:58,180 --> 00:41:59,880
only a fairly small number of pages.

847
00:42:00,040 --> 00:42:04,640
It doesn't have the same propensity
of B-tree to go to exactly

848
00:42:04,640 --> 00:42:05,260
the right...

849
00:42:05,740 --> 00:42:09,380
The only right place for that value
that you're looking for directly.

850
00:42:09,380 --> 00:42:13,480
It's more because of the loose
ordering of the index, it's not

851
00:42:13,480 --> 00:42:14,840
quite doing it that way.

852
00:42:15,040 --> 00:42:17,580
There's a bounding box in the internal
pages which describes

853
00:42:17,880 --> 00:42:19,260
what the leaf pages contain.

854
00:42:19,600 --> 00:42:22,360
So you might have to do 2 or 3,
even for a point lookup, you

855
00:42:22,360 --> 00:42:25,840
might have to do 2 or 3 leaf page
accesses, as opposed to just

856
00:42:25,840 --> 00:42:28,300
1, which would be very often the
case with B-tree.

857
00:42:28,660 --> 00:42:30,860
So it's almost kind of doing that
anyway.

858
00:42:31,500 --> 00:42:34,680
So it seems entirely plausible
that with GiST you could do something

859
00:42:34,680 --> 00:42:35,180
similar.

860
00:42:35,820 --> 00:42:38,920
I don't think that's probably possible
in the case of GIN, though,

861
00:42:38,920 --> 00:42:42,660
because what does that even mean
in light of the scheme with

862
00:42:42,660 --> 00:42:46,160
GIN storing two different attributes
in completely different places?

863
00:42:46,160 --> 00:42:48,700
And the tree doesn't really seem
compatible with that.

864
00:42:48,720 --> 00:42:49,410
I don't know, though.

865
00:42:49,410 --> 00:42:50,640
I mean, anything's possible

866
00:42:50,740 --> 00:42:53,940
Nikolay: What if it's btree_gin, where
the first column is a

867
00:42:54,900 --> 00:43:00,160
scalar, like organization ID
or category or something, and the

868
00:43:00,160 --> 00:43:02,700
second column is like JSONB or
something?

869
00:43:05,580 --> 00:43:05,860
Peter: Maybe.

870
00:43:05,860 --> 00:43:09,140
I mean, I think it's probably more
productive to talk about it

871
00:43:09,140 --> 00:43:10,660
in the context of a use case.

872
00:43:10,960 --> 00:43:17,700
So there, I guess, intrinsically,
you have a kind of nested structure

873
00:43:18,260 --> 00:43:18,980
to JSON.

874
00:43:19,700 --> 00:43:25,580
And so having that capability could
be useful if you had some

875
00:43:25,580 --> 00:43:28,060
kind of partitioning scheme or
something like that, I suppose.

876
00:43:28,380 --> 00:43:30,900
Nikolay: I mean, JSON nested structure
doesn't matter.

877
00:43:30,900 --> 00:43:31,980
It's a second column.

878
00:43:32,100 --> 00:43:32,600
I

879
00:43:32,720 --> 00:43:33,580
Peter: know, I know.

880
00:43:33,720 --> 00:43:34,080
Nikolay: Right?

881
00:43:34,080 --> 00:43:39,240
We put a scalar value, like some
number or timestamp on the first

882
00:43:39,560 --> 00:43:40,060
place.

883
00:43:41,400 --> 00:43:43,940
Peter: Yeah, I mean, I think that
could be possible.

884
00:43:43,940 --> 00:43:45,740
I haven't thought about it too
much, though.

885
00:43:46,780 --> 00:43:48,680
You have to talk about it at two
levels as well, though.

886
00:43:48,680 --> 00:43:52,420
It's like, what would be practically
achievable within a year?

887
00:43:52,420 --> 00:43:53,440
Nikolay: Let me tell you.

888
00:43:55,520 --> 00:43:57,540
Why I care so much about this.

889
00:43:58,180 --> 00:44:03,040
Because I started to reflect on
one problem recently related to

890
00:44:03,040 --> 00:44:06,420
full-text search in Postgres and
why people move to Elastic.

891
00:44:07,540 --> 00:44:08,300
Why did I?

892
00:44:08,560 --> 00:44:12,020
Because I saw a similar situation
happening with pgvector.

893
00:44:13,780 --> 00:44:17,780
The most popular issue in pgvector
repository right now on GitHub

894
00:44:18,520 --> 00:44:20,520
is additional filtering.

895
00:44:21,420 --> 00:44:24,960
So people need like global index
with vectors, but they also

896
00:44:24,960 --> 00:44:29,940
need to be able to sometimes search
locally for a particular project,

897
00:44:30,060 --> 00:44:31,900
organization, I don't know, anything.

898
00:44:32,220 --> 00:44:35,640
And this usually is done, what
Elastic and others, they call

899
00:44:35,640 --> 00:44:36,820
it metadata search.

900
00:44:36,820 --> 00:44:41,000
Basically, it's additional filters,
usually scalars, like

901
00:44:41,000 --> 00:44:42,840
numbers or timestamps, something.

902
00:44:44,340 --> 00:44:47,040
And for vector search, it's not
solved yet.

903
00:44:47,040 --> 00:44:51,220
That's why people still can say,
like, they say PostgreSQL vector

904
00:44:51,220 --> 00:44:54,200
search with pgvector doesn't work well
for us.

905
00:44:54,780 --> 00:44:59,940
Because the usual solution is,
let's just create partial indexes.

906
00:45:00,300 --> 00:45:04,940
But if you have hundreds or thousands
of categories, it doesn't

907
00:45:04,940 --> 00:45:05,660
work well.
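
The partial-index workaround Nikolay is referring to looks roughly like this (hypothetical table and column names; pgvector's HNSW index used as an example):

```sql
-- One vector index per category: fine for a handful of categories,
-- unmanageable for hundreds or thousands of them.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
    WHERE category_id = 1;
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
    WHERE category_id = 2;
-- ... one more index for every further category_id ...
```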

908
00:45:06,160 --> 00:45:11,120
Then I realized that for full-text
search, we had a similar problem

909
00:45:11,120 --> 00:45:11,880
for years.

910
00:45:12,500 --> 00:45:15,740
Usually it's solved by the extension
btree_gin.

911
00:45:16,720 --> 00:45:21,300
Then you are able to create a multi-column
index, but this is

912
00:45:21,300 --> 00:45:22,680
a barrier for many people.

913
00:45:22,680 --> 00:45:26,380
Although this is a contrib module,
it's present everywhere, it's

914
00:45:26,380 --> 00:45:27,540
some kind of a barrier.

915
00:45:27,700 --> 00:45:30,860
And I saw cases when people even
created this extension, but

916
00:45:30,860 --> 00:45:33,760
somehow still not using it well.

917
00:45:34,120 --> 00:45:39,940
They still have this problem, B-tree
vs. GIN dilemma for planner,

918
00:45:39,940 --> 00:45:40,440
right?

919
00:45:40,960 --> 00:45:48,280
So I'm thinking, for example, if
we use BRIN more often, we

920
00:45:48,280 --> 00:45:51,900
usually will put scalar value on
first position, because it's

921
00:45:51,900 --> 00:45:54,100
more selective, by default, maybe.

922
00:45:54,100 --> 00:45:55,020
We don't know.

923
00:45:56,060 --> 00:46:01,760
And if it has a low number of distinct
values, this skip-scan

924
00:46:02,660 --> 00:46:05,360
approach would make total sense
there as well.

925
00:46:06,180 --> 00:46:06,680
Right?

926
00:46:07,480 --> 00:46:07,980
Peter: Maybe.

927
00:46:08,420 --> 00:46:12,520
I don't have a lot of domain expertise
here, so forgive me, I

928
00:46:12,520 --> 00:46:15,100
need to ask questions to clarify
what you meant.

929
00:46:15,100 --> 00:46:18,220
Are you talking about pushing down
a limit for each grouping

930
00:46:18,220 --> 00:46:19,340
or something like that?

931
00:46:20,020 --> 00:46:21,240
Nikolay: Limit is like...

932
00:46:21,400 --> 00:46:24,020
I'm not sure about limit and ordering,
I'm just thinking...

933
00:46:24,120 --> 00:46:26,880
Peter: But not a limit for the
entire query, but rather for each

934
00:46:26,880 --> 00:46:28,940
individual grouping or whatever.

935
00:46:29,600 --> 00:46:30,740
Where you want to...

936
00:46:31,420 --> 00:46:34,520
Is that what you mean to the question?

937
00:46:34,940 --> 00:46:37,740
Nikolay: Imagine we have full-text
search, TSVector, GIN index,

938
00:46:37,740 --> 00:46:41,660
and then we have additional things,
like, for example, creation

939
00:46:41,660 --> 00:46:41,980
time.

940
00:46:41,980 --> 00:46:47,460
And when we want to search within
some time window only. By default,

941
00:46:47,460 --> 00:46:50,040
full-text search in Postgres doesn't
support it itself.

942
00:46:51,260 --> 00:46:55,180
It will be a different B-tree index
on timestamp, like created at

943
00:46:55,180 --> 00:46:59,840
timestamp, and we have a global
whole table GIN.

944
00:47:00,300 --> 00:47:02,420
And then the planner decides what
is cheaper.

945
00:47:02,780 --> 00:47:07,480
Use the B-tree and then filter on the fly,
or use full text search and

946
00:47:07,480 --> 00:47:10,880
then filter or order by on timestamp.

947
00:47:12,320 --> 00:47:15,560
And the btree_gin extension gives
you the opportunity to combine,

948
00:47:16,480 --> 00:47:18,020
to have a 2-column index.

949
00:47:19,020 --> 00:47:22,760
And then the goal is to find everything,
no limits, right?

950
00:47:23,920 --> 00:47:31,040
But if we put scalar value on first
position, queries which don't

951
00:47:31,040 --> 00:47:36,740
have this filter won't work well
with this index.

952
00:47:37,740 --> 00:47:43,480
So we put a timestamp as our column
A and the TS vector as our

953
00:47:43,480 --> 00:47:47,820
column B, and use btree_gin for that
to have a 2-column index.

954
00:47:48,580 --> 00:47:52,020
And then I would like to have skip
scan for this, because if

955
00:47:52,020 --> 00:47:56,040
my query doesn't have additional
filtering and we have a low

956
00:47:56,040 --> 00:48:00,340
number of distinct values for column
A, maybe timestamp was not

957
00:48:00,340 --> 00:48:03,620
good because it will have a lot of
distinct values.

958
00:48:04,620 --> 00:48:08,140
Let's say it's some category ID
or, I don't know, like group

959
00:48:08,140 --> 00:48:09,160
ID or something.

960
00:48:09,780 --> 00:48:13,520
In this case, probably skip scan
will help there as well for

961
00:48:13,520 --> 00:48:14,240
such queries.

962
00:48:14,240 --> 00:48:17,920
Or I'm forced to put this column
to second position and keep

963
00:48:17,920 --> 00:48:20,640
full-text vector on the first position?
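
Nikolay's setup can be sketched as follows (hypothetical names; btree_gin is the contrib module that lets GIN index scalar types):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE TABLE docs (
    category_id int,
    body_tsv    tsvector
);

-- One multicolumn GIN index combining a scalar and a tsvector.
CREATE INDEX docs_cat_tsv_idx ON docs USING gin (category_id, body_tsv);

-- Filtered full-text search served by the single combined index:
SELECT * FROM docs
WHERE category_id = 42
  AND body_tsv @@ to_tsquery('english', 'skip & scan');
```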

964
00:48:21,380 --> 00:48:23,660
Okay, this is something to think
about additionally.

965
00:48:23,800 --> 00:48:25,020
I'm just very curious.

966
00:48:25,600 --> 00:48:27,320
Peter: I don't know is the simple
answer.

967
00:48:27,600 --> 00:48:29,740
In principle, you could...

968
00:48:31,060 --> 00:48:33,740
I mean, after all, what I've described
is, in the end, there's

969
00:48:33,740 --> 00:48:36,180
a bunch of details, but it is basically
quite simple.

970
00:48:36,380 --> 00:48:40,480
You have to be willing to do, at
least in the simple case where

971
00:48:40,480 --> 00:48:44,800
you omitted, fully omitted a prefix
column, you have to be willing

972
00:48:44,800 --> 00:48:48,960
to look at every single individual
partition, every distinct

973
00:48:48,960 --> 00:48:53,400
value, which adds up quite quickly
if it's something like a timestamp.

974
00:48:54,060 --> 00:48:57,380
Whereas if you had some kind of
bucket by day or something like

975
00:48:57,380 --> 00:48:59,660
that, then it would be more feasible,
because then you would

976
00:48:59,660 --> 00:49:04,600
have relatively few buckets, and
especially if you had not completely

977
00:49:04,600 --> 00:49:09,960
omitted the date column, if you
had a range in between, then

978
00:49:09,960 --> 00:49:12,800
it would be, I think, feasible
from a data structure point of

979
00:49:12,800 --> 00:49:16,000
view, not necessarily in GIN, but
in something, to do something

980
00:49:16,000 --> 00:49:16,700
like that.

981
00:49:17,220 --> 00:49:20,700
I mean, I don't know what the basis
of comparison here is: what

982
00:49:20,960 --> 00:49:22,420
is Elastic doing?

983
00:49:22,420 --> 00:49:23,540
I just don't know.

984
00:49:23,560 --> 00:49:24,660
I can't speak to that.

985
00:49:24,660 --> 00:49:25,640
I don't know.
Nikolay: I don't know the internals either.

986
00:49:26,020 --> 00:49:29,940
They just claim if you have additional
filtering, it's not slower

987
00:49:30,060 --> 00:49:31,720
at all, and usually it's faster.

988
00:49:31,720 --> 00:49:33,420
This is what their documentation
says.

989
00:49:33,420 --> 00:49:36,660
I'm very curious about internals
as well, because I just feel

990
00:49:36,660 --> 00:49:40,640
this, even from a user perspective,
even the need to create btree_gin

991
00:49:41,060 --> 00:49:44,340
extension, it's already a
barrier.

992
00:49:45,280 --> 00:49:49,020
Some people don't do it, and they
think, okay, full-text search

993
00:49:49,020 --> 00:49:52,320
is weak, and they are going to
have the same impression about

994
00:49:52,360 --> 00:49:53,260
vector search.

995
00:49:53,700 --> 00:49:58,900
And this whole topic about combination
of these powerful indexes

996
00:49:58,900 --> 00:50:01,620
with B-tree indexes, it's an interesting

997
00:50:01,660 --> 00:50:04,100
direction, I think, and it can be improved
as well.

998
00:50:04,200 --> 00:50:07,540
And with SkipScan, maybe it can
be improved further.

999
00:50:07,540 --> 00:50:10,240
This is my basic idea.

1000
00:50:10,240 --> 00:50:12,420
Maybe it's not that well thought through.

1001
00:50:13,300 --> 00:50:15,100
I promised to have silly questions.

1002
00:50:15,100 --> 00:50:15,600
So

1003
00:50:16,080 --> 00:50:16,720
Peter: It's okay.

1004
00:50:16,720 --> 00:50:18,780
There are no silly questions.

1005
00:50:18,900 --> 00:50:22,940
Well, I've got an even sillier
idea, because I have, I think,

1006
00:50:22,940 --> 00:50:27,160
a very, very Fisher-Price level
understanding of how pgvector

1007
00:50:27,160 --> 00:50:30,380
works, which is not going to do
much good here. So, you know,

1008
00:50:30,380 --> 00:50:34,160
I'm not going to say much about
something I think I know so little

1009
00:50:34,160 --> 00:50:36,060
about.

1010
00:50:36,980 --> 00:50:41,140
Maybe I'll talk to Jonathan Katz
about it next time I see him; he's

1011
00:50:41,140 --> 00:50:43,120
my go-to expert on these things.

1012
00:50:43,840 --> 00:50:44,340
Yeah

1013
00:50:45,780 --> 00:50:48,580
Nikolay: I'm sure he knows already
this problem very well, because

1014
00:50:48,580 --> 00:50:52,860
as I said, this is the most active
discussion among issues on

1015
00:50:52,860 --> 00:50:56,060
GitHub repository, and basically everybody
needs this.

1016
00:50:56,180 --> 00:50:59,440
They need a combination of B-tree
and vector search somehow.

1017
00:50:59,960 --> 00:51:05,440
Maybe not just one B-tree, not
only this scalar, but a bunch

1018
00:51:05,440 --> 00:51:07,460
of them, because they need to filter
additionally.

1019
00:51:08,460 --> 00:51:09,500
This is a challenge.

1020
00:51:09,520 --> 00:51:15,560
I also know one project which they
built some kind of quite universal

1021
00:51:15,920 --> 00:51:17,320
solution for the others.

1022
00:51:18,780 --> 00:51:23,980
When they first came to us, they
said, we are forced to index

1023
00:51:24,020 --> 00:51:27,380
every column for flexibility.

1024
00:51:27,620 --> 00:51:30,600
We know this is anti-pattern, but
we are forced.

1025
00:51:30,920 --> 00:51:35,820
I'm sure they will be happy when
Postgres 18 is out with this

1026
00:51:35,820 --> 00:51:36,320
optimization.

1027
00:51:37,040 --> 00:51:41,880
This is a super relief for them,
because they are struggling with

1028
00:51:41,880 --> 00:51:42,600
this problem.

1029
00:51:45,060 --> 00:51:49,120
It's good to have only, as Michael
said, only one index on multiple

1030
00:51:49,120 --> 00:51:52,060
columns and don't think too much
about order.

1031
00:51:52,060 --> 00:51:52,560
Peter: Yeah.

1032
00:51:53,360 --> 00:51:57,260
Especially if, as I say, the important
distinction is between

1033
00:51:57,260 --> 00:52:03,300
getting a very slow, totally unacceptable
performance, typically

1034
00:52:03,540 --> 00:52:06,900
a sequential scan on a very
large table, versus adequate,

1035
00:52:07,240 --> 00:52:08,660
sub-second but not sub-millisecond.

1036
00:52:09,520 --> 00:52:14,800
Like, a lot of people are in that
zone, and if they have a lot

1037
00:52:14,800 --> 00:52:18,900
of ad hoc queries, they can't really
predict the indexing needs

1038
00:52:18,900 --> 00:52:19,700
ahead of time.

1039
00:52:19,700 --> 00:52:22,160
I think it makes the biggest difference
there.

1040
00:52:22,580 --> 00:52:25,760
Much less so in OLTP apps, but
still to some degree, it's going

1041
00:52:25,760 --> 00:52:30,240
to get you acceptable performance
where you didn't know that

1042
00:52:30,240 --> 00:52:32,220
you really ought to have had a
certain kind of index, and then

1043
00:52:32,220 --> 00:52:35,500
you get adequate-ish performance
there too, which could make

1044
00:52:35,500 --> 00:52:36,960
all the difference in a pinch.

1045
00:52:37,360 --> 00:52:40,160
Nikolay: Yeah, well, actually,
interesting additional question

1046
00:52:40,160 --> 00:52:40,940
I have.

1047
00:52:41,840 --> 00:52:45,400
With loose index scan, we have
this wiki page describing this

1048
00:52:45,400 --> 00:52:45,860
technique.

1049
00:52:45,860 --> 00:52:50,340
We apply this technique using recursive
CTE, and there is an

1050
00:52:50,940 --> 00:52:54,240
expectation that in the future we will
stop doing this, it will be

1051
00:52:54,240 --> 00:52:54,940
fully automated.

1052
00:52:55,260 --> 00:52:58,980
Does it make any sense to think
about implementing this while we

1053
00:52:58,980 --> 00:53:02,980
are still not there, not on
Postgres 18, and it's not even committed?

1054
00:53:03,680 --> 00:53:06,540
Does it make sense to try to
implement a similar approach

1055
00:53:07,120 --> 00:53:11,960
at a higher level ourselves, somehow
splitting the query into pieces,

1056
00:53:12,040 --> 00:53:13,940
like different index scans?

1057
00:53:14,480 --> 00:53:15,200
Peter: It might.

1058
00:53:16,160 --> 00:53:18,760
I would just say, in the case of...

1059
00:53:19,140 --> 00:53:21,180
If you're able to upgrade to Postgres
17,

1060
00:53:23,420 --> 00:53:23,560
Nikolay: and

1061
00:53:23,560 --> 00:53:26,420
Peter: having done that, you're
able to also...

1062
00:53:28,120 --> 00:53:31,360
Suppose you have 10 distinct values
in your leading column.

1063
00:53:31,360 --> 00:53:34,860
If you can write an inlist that
has those 10 distinct values

1064
00:53:35,220 --> 00:53:38,680
and are satisfied that that will
work for you, that will give

1065
00:53:38,680 --> 00:53:42,180
you the correct answer, not only
when you write the query, but

1066
00:53:42,180 --> 00:53:44,960
going forward, if it's sort of
static, then you can totally do

1067
00:53:44,960 --> 00:53:45,300
that.

1068
00:53:45,300 --> 00:53:50,040
And in fact, I would expect the
actual access patterns to be

1069
00:53:50,560 --> 00:53:55,320
identical to what you would have
gotten had you had the skip

1070
00:53:55,320 --> 00:53:56,180
scan feature.

1071
00:53:56,420 --> 00:53:58,040
It is basically doing the same
thing.

1072
00:53:58,040 --> 00:54:02,380
It's just rather than having an
array with these constants that

1073
00:54:02,380 --> 00:54:05,940
come from a query, you will instead
have a magical sort of an

1074
00:54:05,940 --> 00:54:09,060
array that the B-tree scan code
uses almost the same way.

1075
00:54:09,100 --> 00:54:12,840
It's just that it generates its
array elements procedurally and

1076
00:54:12,840 --> 00:54:16,960
on demand, but it is essentially
the same thing at quite a low

1077
00:54:16,960 --> 00:54:17,300
level.

1078
00:54:17,300 --> 00:54:20,880
So it would be not just similar,
but identical, if you could

1079
00:54:21,820 --> 00:54:23,300
make it work within those constraints.
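
Peter's Postgres 17 workaround, spelled out against a hypothetical `sales (region, sold_at)` index with exactly ten known region values:

```sql
-- Writing the leading column as an explicit IN-list lets Postgres 17
-- process it as one index scan over an array of keys; that is
-- essentially what skip scan does internally, except that skip scan
-- generates the array elements procedurally and on demand.
SELECT * FROM sales
WHERE region IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
  AND sold_at = '2024-06-01 00:00+00';
```

The caveat Peter raises applies here: this is only correct for as long as the set of leading values really is static.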

1080
00:54:23,560 --> 00:54:27,880
And in particular, there'd be no
possible downside in terms of

1081
00:54:27,880 --> 00:54:30,220
whether or not it made sense to
do a full index scan.

1082
00:54:30,900 --> 00:54:35,200
Depending on the distribution of
values, it's going to be adaptive

1083
00:54:35,200 --> 00:54:36,520
in that same way I described.

1084
00:54:36,960 --> 00:54:40,180
With a big caveat, of course, you
kind of need to know that,

1085
00:54:41,380 --> 00:54:46,460
at least for those leading values,
that it's static and predictable,

1086
00:54:46,460 --> 00:54:48,260
and it won't change and break your
query.

1087
00:54:48,580 --> 00:54:51,740
Nikolay: So I guess our approaches
will change with this.

1088
00:54:51,740 --> 00:54:55,080
Usually we, like, analyze workload
and decide on indexes, we

1089
00:54:55,080 --> 00:54:57,840
decide what to put on the first
position, second position.

1090
00:54:58,100 --> 00:55:04,640
Here it seems like sometimes we
will decide to put columns with

1091
00:55:04,640 --> 00:55:07,860
lower cardinality of distinct values
or something.

1092
00:55:08,760 --> 00:55:11,040
Peter: That'll start to make way
more sense than it did.

1093
00:55:11,040 --> 00:55:12,440
I don't want to overstate it.

1094
00:55:12,440 --> 00:55:16,620
It's not like a revolution by any
means, but there are specific

1095
00:55:16,980 --> 00:55:21,060
rules of thumb that will be
weakened by it, for sure. One that

1096
00:55:21,060 --> 00:55:26,100
you already said yourself, Nikolay,
was: oh, just put

1097
00:55:26,100 --> 00:55:29,120
the highest-cardinality column first.
That's already kind

1098
00:55:29,120 --> 00:55:32,640
of suspect, actually, but it especially
is in this world with

1099
00:55:32,640 --> 00:55:33,300
this capability.

1100
00:55:34,540 --> 00:55:35,040
Nikolay: Interesting.

1101
00:55:35,860 --> 00:55:37,320
Peter: I think it's mostly confined.

1102
00:55:37,860 --> 00:55:42,620
That'll be most true by far with
your data warehousing use

1103
00:55:42,620 --> 00:55:46,560
cases, where there is a built-in
requirement for a lot of ad

1104
00:55:46,560 --> 00:55:49,640
hoc querying that you can't really
exactly predict ahead of time,

1105
00:55:49,640 --> 00:55:51,820
looking at the details of something
rather than just a summary,

1106
00:55:51,820 --> 00:55:56,540
really matters, then yes, absolutely,
that will be true.

1107
00:55:56,740 --> 00:55:58,680
Well, with OLTP, less
so.

1108
00:55:59,480 --> 00:55:59,980
Nikolay: Cool.

1109
00:56:00,060 --> 00:56:02,880
Yeah, this is an interesting direction
to explore, even if the patch

1110
00:56:02,880 --> 00:56:03,980
is not yet committed.

1111
00:56:04,700 --> 00:56:05,720
Yeah, thank you.

1112
00:56:05,980 --> 00:56:09,440
Good to have something on the to-do
list.

1113
00:56:10,440 --> 00:56:10,940
Michael: Yeah.

1114
00:56:11,640 --> 00:56:13,980
Is there anything else you wanted
to make sure you mentioned,

1115
00:56:13,980 --> 00:56:16,960
Peter, or anything anyone can help
you with?

1116
00:56:17,380 --> 00:56:22,160
Peter: Well, I always welcome input
into the work I'm doing.

1117
00:56:22,540 --> 00:56:26,040
It would be kind of nice if I could
get some feedback on the

1118
00:56:26,040 --> 00:56:29,900
stuff we were just discussing about,
hey, how as a practical

1119
00:56:29,900 --> 00:56:33,420
matter does this alter the general
advice that we give to you

1120
00:56:33,580 --> 00:56:35,140
about how to index things?

1121
00:56:36,040 --> 00:56:39,800
I'm not entirely sure how, in practice,
that will tend to work

1122
00:56:39,800 --> 00:56:40,460
out anyway.

1123
00:56:41,160 --> 00:56:43,500
It is nice that this is a very
general thing.

1124
00:56:43,780 --> 00:56:47,880
I think that will tend to make
it possible to use it more often,

1125
00:56:47,880 --> 00:56:52,440
because in a certain sense, the
optimizer only has to have the

1126
00:56:52,440 --> 00:56:55,440
right rough idea, because if it
has that, then it will pick an

1127
00:56:55,440 --> 00:56:58,040
index scan, and then the index
scan itself will figure out what

1128
00:56:58,040 --> 00:56:58,680
to do.

1129
00:56:58,740 --> 00:57:01,720
So I think that's probably going
to make it more compelling than

1130
00:57:01,720 --> 00:57:05,280
it appears to be in other systems,
where, as far as I can tell,

1131
00:57:05,280 --> 00:57:09,880
I might be mistaken, they require
the optimizer to generate a

1132
00:57:09,880 --> 00:57:12,380
distinct index node type.

1133
00:57:12,560 --> 00:57:16,440
So it's kind of either-or, which,
it seems to me anyway, could be

1134
00:57:16,500 --> 00:57:19,260
something that limited the applicability
of the optimizations.

1135
00:57:19,840 --> 00:57:23,460
So I feel like it would be nice
if I had a better idea of how,

1136
00:57:23,600 --> 00:57:27,040
practically speaking, this will
change things and where.

1137
00:57:27,100 --> 00:57:29,560
I don't think I've got a complete
understanding of that myself.

1138
00:57:29,800 --> 00:57:32,620
Like, what kind of applications,
what kind of usage patterns.

1139
00:57:33,240 --> 00:57:34,960
That would be nice to have more
detail on.

1140
00:57:34,960 --> 00:57:37,960
So if anyone wants to help me,
that would be an obvious place

1141
00:57:37,960 --> 00:57:39,120
to start, I would say.

1142
00:57:39,840 --> 00:57:41,440
Michael: Nice, thanks so much,
Peter.

1143
00:57:41,520 --> 00:57:43,200
It's been an honor having you on.

1144
00:57:43,200 --> 00:57:43,940
Great chat.

1145
00:57:43,940 --> 00:57:44,500
Peter: Thank you both.

1146
00:57:44,500 --> 00:57:45,260
Thanks, Michael.

1147
00:57:46,120 --> 00:57:47,020
I really enjoyed it.

1148
00:57:47,020 --> 00:57:47,720
Nikolay: Thank you.