1
00:00:00,060 --> 00:00:01,900
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:01,920 --> 00:00:05,280
As usual, I don't remember the
episode number.

3
00:00:05,280 --> 00:00:05,780
Michael: 137?

4
00:00:06,420 --> 00:00:06,920
Nikolay: 137.

5
00:00:07,420 --> 00:00:11,380
My name is Nikolay, Postgres.AI,
and as usual, my co-host is Michael,

6
00:00:11,380 --> 00:00:12,520
pgMustard.

7
00:00:12,520 --> 00:00:14,700
Hi, Michael, how are you doing
today?

8
00:00:14,800 --> 00:00:15,680
Michael: Hello, Nikolay.

9
00:00:15,900 --> 00:00:17,040
I am good, thank you.

10
00:00:17,040 --> 00:00:17,860
How are you doing?

11
00:00:18,460 --> 00:00:20,140
Nikolay: Fantastic, honestly.

12
00:00:21,260 --> 00:00:21,600
Good.

13
00:00:21,600 --> 00:00:23,860
A lot of things are happening,
but all good.

14
00:00:23,860 --> 00:00:28,060
I mean, like, things are changing
and a lot of things are changing

15
00:00:28,260 --> 00:00:30,600
in a constructive way, which I
like.

16
00:00:31,180 --> 00:00:35,140
So, yeah, let's discuss GIN.

17
00:00:35,660 --> 00:00:41,280
It's one of things a lot of folks
use and a lot of like it's one

18
00:00:41,280 --> 00:00:46,980
of index types many people use
still And I think it's not going

19
00:00:46,980 --> 00:00:55,140
to disappear, despite the rise
of vector-related types of indexes

20
00:00:55,240 --> 00:00:59,280
and approximate k-nearest-neighbors
indexes.

21
00:01:00,060 --> 00:01:04,860
GIN is a very interesting type
of index developed by the same

22
00:01:04,860 --> 00:01:08,000
folks who brought WAL support
to GiST.

23
00:01:08,800 --> 00:01:14,160
GiST was the only thing for complex
data types before GIN, but

24
00:01:14,380 --> 00:01:20,780
then With the rise of JSON and
JSONB, Gene became very popular,

25
00:01:21,140 --> 00:01:22,520
finding a lot of applications.

26
00:01:23,420 --> 00:01:26,640
Not only full-text search, as it
was, I think, originally, it

27
00:01:26,640 --> 00:01:28,300
was created for full-text search.

28
00:01:28,660 --> 00:01:29,160
Michael: Yeah.

29
00:01:30,360 --> 00:01:31,740
Yeah, I looked at the history.

30
00:01:32,020 --> 00:01:36,780
I was shocked to see that it was
added back in 8.2, so 2006.

31
00:01:38,140 --> 00:01:39,640
Nearly 20 years ago.

32
00:01:39,760 --> 00:01:42,700
Nikolay: Yeah, this is exactly
when I started to be involved

33
00:01:42,700 --> 00:01:43,380
with Postgres.

34
00:01:44,240 --> 00:01:50,420
And 8.2 when I worked on XML as
well and I thought how to provide

35
00:01:50,600 --> 00:01:54,240
good indexing capabilities for
XML But it was too big task for

36
00:01:54,240 --> 00:01:58,820
me being like too young guy, but
we discussed with People who

37
00:01:58,820 --> 00:02:03,000
worked on GIN these ideas, of
course, and of course I missed

38
00:02:03,000 --> 00:02:08,560
those days and I feel Sad very
sad about what's happening in

39
00:02:08,560 --> 00:02:12,840
the world and how disconnected
we became all of us Yeah

40
00:02:13,540 --> 00:02:16,080
Michael: because these guys are
all Russian folks, but also I

41
00:02:16,080 --> 00:02:21,140
was looking at the domains and
one of the TL there's a there's

42
00:02:21,160 --> 00:02:25,680
a an article mentioned in the Postgres
docs on GIN indexes

43
00:02:26,040 --> 00:02:29,240
that references the initial release
and mentions who's who sponsored

44
00:02:29,240 --> 00:02:32,620
the work which I think was a German
company but one of the domain

45
00:02:32,620 --> 00:02:35,860
the domain it's hosted on is .su
which I didn't even know I didn't

46
00:02:35,860 --> 00:02:39,180
even know what that stood for it
turns out it's USSR or Soviet

47
00:02:39,180 --> 00:02:40,820
Union which makes sense.

48
00:02:41,120 --> 00:02:44,660
Nikolay: Yeah I wanted to say that
originally the primary purpose

49
00:02:44,660 --> 00:02:47,740
was to support better performance
for full-text search.

50
00:02:47,900 --> 00:02:50,940
And originally, full-text search
was powered by GiST.

51
00:02:51,280 --> 00:02:57,760
And this version called RD3, Russian
Doll Tree, where its B-tree-like

52
00:02:57,940 --> 00:03:03,180
structure was created for basically
arrays, for sets.

53
00:03:04,060 --> 00:03:09,140
And supported operations, like
instead of less than, greater

54
00:03:09,140 --> 00:03:13,780
than, supported operations is contained,
1 array is contained

55
00:03:13,780 --> 00:03:16,080
in another, or contains.

56
00:03:16,400 --> 00:03:16,900
Right?

57
00:03:17,700 --> 00:03:19,240
So, also overlaps.

58
00:03:20,340 --> 00:03:26,400
And it just can work with this,
it can support full text search,

59
00:03:27,340 --> 00:03:31,560
but it's slow if you have large
volumes of data and a lot of

60
00:03:31,560 --> 00:03:36,240
words, different words, like vocabularies
is huge.

61
00:03:36,820 --> 00:03:38,600
Actually used vocabulary, right?

62
00:03:38,900 --> 00:03:41,880
So GiN is inverted index.

63
00:03:42,980 --> 00:03:47,680
And the name was inherited from
just just as Generalized Search

64
00:03:47,680 --> 00:03:48,180
Tree.

65
00:03:49,020 --> 00:03:50,660
I like this so much.

66
00:03:50,660 --> 00:03:54,600
Folks who are interested in databases
and want to learn something

67
00:03:54,600 --> 00:04:01,200
cool, read article by Hellerstein
from Berkeley, Joseph Hellerstein,

68
00:04:01,560 --> 00:04:07,180
and article about jumping from
single dimension B-tree to two-dimensional

69
00:04:07,540 --> 00:04:12,380
space R-tree and then generalizing
this idea to GiST, generalized

70
00:04:12,380 --> 00:04:13,120
search tree.

71
00:04:13,180 --> 00:04:14,160
It's so cool.

72
00:04:15,180 --> 00:04:18,100
And then additionally, RD-tree,
it's 4 sets already.

73
00:04:18,220 --> 00:04:21,180
And of course, you can have multiple
dimensions, like many dimensions.

74
00:04:21,760 --> 00:04:25,460
Not so many as for vectors, unfortunately,
like not 2,000, but

75
00:04:25,460 --> 00:04:28,340
for a few dimensions, GiST works
well.

76
00:04:29,920 --> 00:04:35,320
And Even there was a history for
R-tree, the original implementation

77
00:04:35,440 --> 00:04:37,000
of R-tree, native implementation.

78
00:04:37,540 --> 00:04:40,740
GiST behaved much better, so it
was replaced.

79
00:04:41,160 --> 00:04:45,160
The original implementation of
R-tree for multiple dimensions was

80
00:04:45,160 --> 00:04:47,720
replaced by GiST, not for B-tree.

81
00:04:48,280 --> 00:04:53,540
But there is extension, GiST B-tree,
for various multi-column

82
00:04:53,600 --> 00:04:57,540
indexes and so on, when you have
columns of different nature.

83
00:04:59,180 --> 00:05:03,880
And trees are great, but for full-text
search, they are not so

84
00:05:03,880 --> 00:05:04,380
great.

85
00:05:04,640 --> 00:05:10,120
So they jumped from idea of working
with trees to inverted index,

86
00:05:10,120 --> 00:05:14,080
which is a more common idea for
search engines.

87
00:05:15,060 --> 00:05:25,940
And compared to tree-like structure,
inverted index is very simple

88
00:05:27,040 --> 00:05:30,200
explanation can be, okay, we have
list of words, and for each

89
00:05:30,200 --> 00:05:36,040
word we have list of references
to places where this word was

90
00:05:36,040 --> 00:05:37,040
noticed, right?

91
00:05:37,540 --> 00:05:38,040
Like,

92
00:05:38,480 --> 00:05:39,520
Michael: like, where is

93
00:05:39,520 --> 00:05:40,580
Nikolay: links to tuples?

94
00:05:40,760 --> 00:05:40,920
Yeah,

95
00:05:40,920 --> 00:05:41,660
Michael: no, yeah.

96
00:05:41,760 --> 00:05:42,260
Nikolay: Rovers.

97
00:05:43,340 --> 00:05:47,060
So list of words, of course, normalized.

98
00:05:47,720 --> 00:05:52,520
And then for each word list of
places where it can be found.

99
00:05:52,580 --> 00:05:57,440
It's very, very similar to index
in a book, in the end of the

100
00:05:57,440 --> 00:05:57,940
Michael: book.

101
00:05:58,140 --> 00:05:58,640
Yeah.

102
00:05:59,060 --> 00:06:00,040
I read this.

103
00:06:00,360 --> 00:06:05,860
And for so many years, I've been using and reading about, you

104
00:06:06,040 --> 00:06:10,120
know, indexes in the back of books or, you know, in recipe books

105
00:06:10,120 --> 00:06:13,580
is like, there's a really famous example, really easy 1 to understand

106
00:06:13,680 --> 00:06:15,900
how indexing speeds things up.

107
00:06:15,900 --> 00:06:19,940
But it never occurred to me that they're not a B-tree structure

108
00:06:20,140 --> 00:06:25,560
they are a list of words yeah exactly it's a it's an inverted

109
00:06:25,560 --> 00:06:29,340
index so yeah and I think the thing that clicked for me as to

110
00:06:29,340 --> 00:06:37,160
like why the word inverted is used was in a standard, in a non-inverted

111
00:06:37,540 --> 00:06:43,920
index, we have an entry per row or row version and in an inverted

112
00:06:43,980 --> 00:06:48,960
1 each row could be referenced multiple times assuming...

113
00:06:48,960 --> 00:06:49,460
Nikolay: Exactly.

114
00:06:49,700 --> 00:06:52,260
Michael: In a full text search case for example, each document

115
00:06:52,260 --> 00:06:54,640
is likely to have lots of words in it.

116
00:06:54,760 --> 00:06:57,880
So that's where the word inverted comes from.

117
00:06:57,880 --> 00:07:01,820
Nikolay: Yeah, in tree-like structure, in tree, in B-tree, R-tree,

118
00:07:02,520 --> 00:07:06,760
RD-tree, GiST in general, We perform descending.

119
00:07:07,660 --> 00:07:11,580
We say we need to find this value, for example, and we jump from

120
00:07:11,580 --> 00:07:17,200
1 node to another, down to the index, towards leaf nodes, leaf

121
00:07:17,200 --> 00:07:18,140
nodes, right?

122
00:07:18,200 --> 00:07:18,700
Leaves.

123
00:07:19,300 --> 00:07:26,320
Performing some operations to understand where to go, which child

124
00:07:26,320 --> 00:07:27,040
node to choose.

125
00:07:27,040 --> 00:07:31,860
Not only direction, because it's, how to say, each internal node

126
00:07:32,520 --> 00:07:38,080
have multiple, many children, unlike binary tree.

127
00:07:39,080 --> 00:07:43,760
And my favorite example, favorite question, when hiring engineer,

128
00:07:44,600 --> 00:07:45,860
not Database engineer.

129
00:07:46,500 --> 00:07:49,520
For Database engineer, this question is mandatory.

130
00:07:50,060 --> 00:07:54,380
But software engineer, Backend engineer, you can ask what is

131
00:07:54,380 --> 00:07:57,780
B-tree and if you hear binary, this is game over.

132
00:07:59,040 --> 00:08:03,120
And this, Unfortunately, I had situations when very, very good

133
00:08:03,120 --> 00:08:07,600
software engineers, I had this feeling, do I want to ask that

134
00:08:07,600 --> 00:08:08,100
question?

135
00:08:09,840 --> 00:08:12,760
Because it's like kind of flipping the coin.

136
00:08:14,440 --> 00:08:19,460
And then I hear binary, and okay, my internal principles say

137
00:08:19,460 --> 00:08:21,160
this is a no-go.

138
00:08:22,060 --> 00:08:22,780
So, yeah.

139
00:08:23,560 --> 00:08:25,440
So, B-tree is not binary tree.

140
00:08:25,440 --> 00:08:29,180
B-tree, unlike binary, where only 2 children is maximum.

141
00:08:29,640 --> 00:08:35,200
We have many children for each internal node, maintaining some

142
00:08:35,200 --> 00:08:35,580
rules.

143
00:08:35,580 --> 00:08:37,580
Michael: In case anybody's wondering, balanced?

144
00:08:38,840 --> 00:08:41,520
Nikolay: Well, there are several theories about it, as I remember,

145
00:08:41,520 --> 00:08:43,300
but I don't remember theories.

146
00:08:43,320 --> 00:08:46,340
But okay, yeah, balanced, but actually it's almost balanced,

147
00:08:46,500 --> 00:08:52,600
meaning that, like, conceptually, the path from the root to each

148
00:08:52,600 --> 00:08:54,820
leaf can be n or n plus 1.

149
00:08:55,320 --> 00:08:56,780
This helps with balancing.

150
00:08:57,920 --> 00:08:59,040
So almost balanced.

151
00:08:59,600 --> 00:09:03,920
Okay, Again, when we work with
tree, we have several operations

152
00:09:04,000 --> 00:09:06,680
until we reach leaf.

153
00:09:07,080 --> 00:09:12,260
And leaf has pointer to the tuple
we are looking for.

154
00:09:13,320 --> 00:09:18,300
Inverted index, okay, we find our
word.

155
00:09:19,140 --> 00:09:22,040
And then we find many places for
this word, right?

156
00:09:22,440 --> 00:09:26,260
And this also means there are several
challenges here.

157
00:09:26,380 --> 00:09:30,480
I remember originally, as I understand,
maybe like my perception

158
00:09:30,480 --> 00:09:32,680
might be wrong, but it was long
ago.

159
00:09:32,680 --> 00:09:33,160
How many?

160
00:09:33,160 --> 00:09:34,160
20 years, right?

161
00:09:34,160 --> 00:09:34,660
Almost.

162
00:09:35,320 --> 00:09:40,520
I remember, GIN was lacking some
internal B-tree indexes, it needs

163
00:09:40,520 --> 00:09:41,880
B-tree indexes twice.

164
00:09:42,100 --> 00:09:47,360
First, to find the word, because
if the number of different words

165
00:09:47,360 --> 00:09:52,500
is huge, we need additional mechanism
to perform the search faster

166
00:09:52,500 --> 00:09:54,360
and it can be B-tree, right?

167
00:09:54,660 --> 00:09:59,600
And second, inside posting lists,
we also have B-tree.

168
00:10:00,020 --> 00:10:03,900
So inside GIN, there are 2 types
of B-tree indexes, as I remember.

169
00:10:05,280 --> 00:10:10,140
And I saw Oleg Bartunov's articles
and book about internals.

170
00:10:11,240 --> 00:10:17,200
And this book has, of course, a
whole section about GIN indexes.

171
00:10:17,220 --> 00:10:21,820
And I saw a representation, visualization
for GIN was, I think

172
00:10:21,820 --> 00:10:24,260
actually, the commutation also
should have it, right?

173
00:10:24,320 --> 00:10:30,540
It's 1 of the, maybe the first
illustration which was added to

174
00:10:30,780 --> 00:10:33,420
Postgres documentation was GIN,
if I'm not mistaken.

175
00:10:33,600 --> 00:10:38,400
My memory might trick me a little
bit, but I remember I saw it

176
00:10:38,400 --> 00:10:40,860
was 90 degrees rotated.

177
00:10:40,940 --> 00:10:46,180
So instead of like in book, we
have words, and then we see page

178
00:10:46,180 --> 00:10:48,660
numbers to the right for each word.

179
00:10:48,900 --> 00:10:51,980
It was rotated 90 degrees so everything
looks down.

180
00:10:52,440 --> 00:10:57,880
So it seems like we going down
all the time like in case of GiST

181
00:10:57,880 --> 00:10:59,160
or B-tree.

182
00:10:59,340 --> 00:11:04,240
So starting from search of our
word, then we using internal B-tree,

183
00:11:04,240 --> 00:11:09,920
then we go in posting link, posting
list, we also use B-tree

184
00:11:09,920 --> 00:11:11,400
and also descending, right?

185
00:11:11,460 --> 00:11:13,600
And I honestly feel confusion.

186
00:11:14,380 --> 00:11:22,740
I like the book-like visualization
Because it distinguishes from

187
00:11:22,800 --> 00:11:23,860
tree structures.

188
00:11:24,340 --> 00:11:30,700
And also we have 3 internal trees
here, and it only adds confusion

189
00:11:30,700 --> 00:11:31,420
in my opinion.

190
00:11:31,420 --> 00:11:33,080
But, well, this is the choice.

191
00:11:33,080 --> 00:11:38,400
I think the guys who I think created
visualization and wrote

192
00:11:38,400 --> 00:11:42,580
and Igor Rogov and others, they
are great in terms of education.

193
00:11:43,260 --> 00:11:47,720
And I like, I definitely admire
them, their work in education.

194
00:11:48,340 --> 00:11:51,540
So, but I personally like original
representation.

195
00:11:51,900 --> 00:11:56,100
We could write B-tree also rotated,
highlighting that it's kind of

196
00:11:56,100 --> 00:11:58,760
different beast in terms of indexes,
right?

197
00:11:59,880 --> 00:12:00,600
Michael: Yeah, good.

198
00:12:00,620 --> 00:12:04,020
Nikolay: Yeah, this is, sorry,
this was my introduction, like

199
00:12:04,020 --> 00:12:08,220
just like history and so on like
and some simple explanation

200
00:12:09,480 --> 00:12:09,960
Yeah,

201
00:12:09,960 --> 00:12:13,540
Michael: yeah, I really liked it and I really want to like diagrams

202
00:12:13,620 --> 00:12:18,900
in the docs I think they a lot of effort goes into making diagrams

203
00:12:18,900 --> 00:12:22,000
that are easy to understand unfortunately we don't have that

204
00:12:22,000 --> 00:12:25,380
many in the Postgres documentation but I also find that I personally,

205
00:12:26,040 --> 00:12:30,560
I don't learn best visually I don't think and I found the description

206
00:12:30,720 --> 00:12:34,540
in the Postgres docs of how GIN works particularly useful.

207
00:12:34,540 --> 00:12:38,560
It's really well written like most of the documentation But I

208
00:12:38,560 --> 00:12:39,660
found that really helpful.

209
00:12:39,660 --> 00:12:41,960
So I'll link to that in the show notes as well I think that's

210
00:12:41,960 --> 00:12:46,260
an incredibly good description with a few examples of I mean,

211
00:12:46,260 --> 00:12:50,640
obviously Fulltext search is the main use case, but it's not

212
00:12:50,640 --> 00:12:51,360
the only

213
00:12:51,600 --> 00:12:54,780
Nikolay: I think right now not a full text search full text search

214
00:12:54,780 --> 00:12:58,640
was a regional main use case right now when use cases JSONB

215
00:12:59,280 --> 00:13:03,380
Michael: yeah interesting well I don't honestly I don't see that

216
00:13:03,380 --> 00:13:07,540
I'd say I Think I'd be interested in the results of your poll

217
00:13:07,540 --> 00:13:11,880
I think you've put a poll on Twitter, but I I do I do think GIN

218
00:13:11,880 --> 00:13:17,780
is probably the second most popular index type of city after

219
00:13:17,780 --> 00:13:21,900
B-tree, but it's by somewhere in the region I was thinking order

220
00:13:21,900 --> 00:13:26,260
of magnitude wise it's I say it's there's fewer than 1 in 10

221
00:13:26,320 --> 00:13:29,580
maybe but probably more than 1 in a hundred so like somewhere

222
00:13:29,580 --> 00:13:33,800
in between those So it's still not super common for me to see

223
00:13:33,800 --> 00:13:34,300
them.

224
00:13:34,820 --> 00:13:36,800
Nikolay: Yeah, almost half of folks voted.

225
00:13:37,480 --> 00:13:39,780
They say 2 index types are used.

226
00:13:39,780 --> 00:13:40,440
And I guess

227
00:13:40,440 --> 00:13:40,940
Michael: it's

228
00:13:41,680 --> 00:13:48,000
Nikolay: either b3 and GIN, or b3 and vector types like HNSW.

229
00:13:49,120 --> 00:13:49,620
Interesting.

230
00:13:49,980 --> 00:13:50,700
Who knows?

231
00:13:51,280 --> 00:13:54,220
Michael: I do see the occasional BRIN and the occasional hash,

232
00:13:54,220 --> 00:13:57,100
but often those are kind of people have left them lying around

233
00:13:57,100 --> 00:14:00,800
after an experiment, after like trying them out, but not necessarily

234
00:14:01,560 --> 00:14:02,980
getting that much usage.

235
00:14:03,580 --> 00:14:06,240
Nikolay: Postgres should have telemetry enabled by default.

236
00:14:07,120 --> 00:14:08,580
I'm joking, I'm joking.

237
00:14:10,080 --> 00:14:11,780
To get usage stats.

238
00:14:11,880 --> 00:14:15,320
Michael: Well, actually, I know this is a real tangent now, But

239
00:14:15,320 --> 00:14:18,940
I do think that the usage stats can be misleading for people

240
00:14:18,940 --> 00:14:22,120
as well, because the last person I came across that had a hash

241
00:14:22,120 --> 00:14:25,580
index, they had a hash index on the same column they had a regular

242
00:14:25,580 --> 00:14:26,720
B-tree index on.

243
00:14:26,720 --> 00:14:29,540
And I said, you know, that's redundant, like, if you could, the

244
00:14:29,540 --> 00:14:30,900
B-tree could...

245
00:14:31,400 --> 00:14:33,420
Nikolay: You already pay the price for me.

246
00:14:33,420 --> 00:14:34,400
Yeah, I'm b3.

247
00:14:34,440 --> 00:14:34,940
So

248
00:14:35,380 --> 00:14:38,660
Michael: Exactly, but the hash index was still getting used because

249
00:14:38,940 --> 00:14:42,880
it's a bit smaller for that Like in so was still getting usage

250
00:14:42,880 --> 00:14:43,940
in the time I'm

251
00:14:44,020 --> 00:14:45,360
Nikolay: about hash indexes.

252
00:14:45,460 --> 00:14:45,960
Remember?

253
00:14:45,980 --> 00:14:50,220
Michael: Yeah but all I mean is
the telemetry isn't enough to

254
00:14:50,220 --> 00:14:53,400
know that you can drop that hash
index because it's still getting

255
00:14:53,400 --> 00:14:53,900
usage.

256
00:14:54,520 --> 00:14:56,000
Nikolay: I'm joking about telemetry.

257
00:14:56,000 --> 00:15:00,660
I cannot imagine Postgres would
add some telemetry stuff enabled

258
00:15:00,660 --> 00:15:01,240
by default.

259
00:15:01,240 --> 00:15:05,540
It's like it's completely opposite
to what it's usually done

260
00:15:05,540 --> 00:15:06,140
in Postgres.

261
00:15:06,600 --> 00:15:08,540
But I'm very curious of course.

262
00:15:08,860 --> 00:15:13,160
I think big platforms might have
some stats but who knows.

263
00:15:13,260 --> 00:15:18,400
Anyway, It definitely looks like
B-tree is number 1, but GIN may

264
00:15:18,400 --> 00:15:20,580
be number 2 or number 3.

265
00:15:21,180 --> 00:15:23,500
Michael: 0, I'm sorry, I understand
what you mean by telemetry.

266
00:15:23,560 --> 00:15:26,580
I thought you meant like for the
users to see what they were

267
00:15:26,580 --> 00:15:26,780
using.

268
00:15:26,780 --> 00:15:30,100
Nikolay: No, no, imagine Postgres
has telemetry sending to some

269
00:15:30,100 --> 00:15:32,240
server or user shop.

270
00:15:32,580 --> 00:15:35,540
Michael: The cloud providers could
do it like exactly anonymized.

271
00:15:35,980 --> 00:15:39,960
Nikolay: Yeah, if, if they can,
because it's also a question

272
00:15:39,960 --> 00:15:46,300
how, how much I can go to schema
analysis, usually should be

273
00:15:46,300 --> 00:15:46,800
possible.

274
00:15:46,880 --> 00:15:48,040
But yeah, okay.

275
00:15:48,100 --> 00:15:54,300
But it's better to focus on different
tasks for cloud providers.

276
00:15:54,600 --> 00:15:56,420
We can discuss this another time.

277
00:15:56,720 --> 00:15:59,020
This is not number 1 priority,
I'm pretty sure.

278
00:15:59,020 --> 00:15:59,520
No.

279
00:16:00,060 --> 00:16:01,220
So, okay.

280
00:16:01,840 --> 00:16:02,360
Michael: How about you?

281
00:16:02,360 --> 00:16:03,080
What do you see?

282
00:16:03,080 --> 00:16:06,560
Like, how often do you see GIN
being used in the wild?

283
00:16:09,340 --> 00:16:13,200
Nikolay: Well, it's used for full-text
search, but often it's

284
00:16:13,200 --> 00:16:16,360
losing to Elastic in larger companies.

285
00:16:17,420 --> 00:16:18,940
In smaller projects, it's great.

286
00:16:19,020 --> 00:16:22,920
But in larger companies, we should
discuss performance gains

287
00:16:23,680 --> 00:16:25,620
and overhead next, right?

288
00:16:25,680 --> 00:16:31,240
But yeah, what I see, it's great
until it's not, in terms of

289
00:16:31,240 --> 00:16:32,080
full-text search.

290
00:16:32,080 --> 00:16:38,320
For JSONB, kind of similar, but
there we can tune it, we can

291
00:16:38,320 --> 00:16:43,600
use smaller size of index, just
preserving all the information

292
00:16:43,680 --> 00:16:45,640
about keys and values.

293
00:16:46,240 --> 00:16:51,760
JSONB pattern ops, these things
like additional, you can tweak

294
00:16:51,760 --> 00:16:52,400
it, right?

295
00:16:52,840 --> 00:16:54,640
Yeah, path ops, maybe.

296
00:16:54,800 --> 00:16:55,820
Path ops, exactly.

297
00:16:55,840 --> 00:16:57,500
Yeah, JSONB path ops.

298
00:16:57,500 --> 00:17:01,360
So you can say I don't need everything
to be indexed, all the

299
00:17:01,360 --> 00:17:06,220
knowledge about structure, I just
need keys and values and support

300
00:17:06,220 --> 00:17:11,420
only part of search operations
I have.

301
00:17:11,800 --> 00:17:16,160
And for JSONB for sure I see it,
but at the same time in many

302
00:17:16,160 --> 00:17:21,720
cases JSON or JSONB values left
unindexed because it's just

303
00:17:21,720 --> 00:17:24,680
storage for some metadata, for
example, and we don't need to

304
00:17:24,680 --> 00:17:25,640
search in it.

305
00:17:26,400 --> 00:17:32,980
Or we decide to search only using
specific paths inside JSON

306
00:17:32,980 --> 00:17:33,480
value.

307
00:17:34,240 --> 00:17:37,840
And in this case, for both JSON
and JSONB, and even for XML

308
00:17:37,840 --> 00:17:41,740
data type, you can have B-tree because
it becomes a scalar value

309
00:17:41,740 --> 00:17:43,160
when you apply some.

310
00:17:44,060 --> 00:17:47,760
You need to get the value from
some path.

311
00:17:47,760 --> 00:17:48,540
That's it.

312
00:17:49,300 --> 00:17:51,300
Michael: It's just an expression
index, yeah.

313
00:17:51,300 --> 00:17:52,000
Nikolay: Yeah, exactly.

314
00:17:52,720 --> 00:18:01,100
So I also see quite a lot trigram,
pg_trgm being used sometimes

315
00:18:01,100 --> 00:18:06,520
with some issues, but yeah, this
I see as well, yeah.

316
00:18:07,200 --> 00:18:10,080
Michael: Again, but again, normally
related to full text search

317
00:18:10,080 --> 00:18:13,660
right for I say trigram by the
way and I thought maybe it's

318
00:18:13,660 --> 00:18:14,540
trigram I don't

319
00:18:14,540 --> 00:18:17,780
Nikolay: know I think JSONB is
number 1 in my head it's so I

320
00:18:17,780 --> 00:18:24,320
don't have I don't have analytics
so it's some subjective perception

321
00:18:24,380 --> 00:18:27,980
Michael: it makes sense right It
makes sense if you think I might

322
00:18:27,980 --> 00:18:29,760
want to do some search on this.

323
00:18:29,760 --> 00:18:30,540
It's the perfect...

324
00:18:30,570 --> 00:18:35,280
I mean, it came 10 years before
JSONB, but it's the perfect

325
00:18:35,280 --> 00:18:38,940
index type for it, if you don't
know in advance what you're going

326
00:18:38,940 --> 00:18:39,900
to want to be searching for.

327
00:18:39,900 --> 00:18:44,280
Nikolay: There is also a full text
search for JSONB, additionally,

328
00:18:44,340 --> 00:18:45,660
a specific case.

329
00:18:46,120 --> 00:18:49,200
I remember some support was added
to perform full text search

330
00:18:49,200 --> 00:18:53,740
inside JSON values, JSONB values,
also powered by GIN, I guess.

331
00:18:55,080 --> 00:18:57,540
Anyway, this index is great.

332
00:18:57,560 --> 00:19:00,940
Well, for arrays as well, I use
it many times for arrays.

333
00:19:01,560 --> 00:19:09,480
If I know I need to store arrays,
text arrays, number, like integer

334
00:19:09,480 --> 00:19:11,300
arrays, doesn't matter.

335
00:19:12,440 --> 00:19:14,940
I like how Postgres supports them
for decades.

336
00:19:15,860 --> 00:19:20,380
Arrays support in Postgres is great
except 1 thing you know it

337
00:19:20,380 --> 00:19:20,880
right?

338
00:19:22,660 --> 00:19:23,160
Michael: Nodes?

339
00:19:23,800 --> 00:19:25,760
Nikolay: No no no similar but no
it's

340
00:19:25,760 --> 00:19:26,920
Michael: I don't know it no

341
00:19:26,920 --> 00:19:33,320
Nikolay: it's indexes it starts
with 1 it's very confusing oh

342
00:19:33,320 --> 00:19:33,520
you

343
00:19:33,520 --> 00:19:35,740
Michael: confused me saying indexes
I was thinking

344
00:19:35,740 --> 00:19:40,160
Nikolay: no no yeah it's different
yeah Yeah I mean to access

345
00:19:40,160 --> 00:19:45,660
the first element, you know everywhere
else Outside Postgres

346
00:19:45,660 --> 00:19:49,800
at 0 right but in Postgres it's
1 and you need to switch switch

347
00:19:49,800 --> 00:19:52,740
your mind all the time Not everywhere
else.

348
00:19:52,740 --> 00:19:56,180
But yeah and For a race, it's great.

349
00:19:56,180 --> 00:20:01,500
For example, I remember my very
first talk in the US in 2008

350
00:20:01,800 --> 00:20:05,720
was about how cool Postgres is
for Web 2.0.

351
00:20:06,600 --> 00:20:12,240
And I talked about EAV structure,
how it's performance-wise not

352
00:20:12,240 --> 00:20:15,100
good for larger data volumes.

353
00:20:15,420 --> 00:20:20,020
And when we talked, I talked about
tagging, tags, right?

354
00:20:20,020 --> 00:20:21,980
For social media, we need tags.

355
00:20:22,960 --> 00:20:27,340
And putting tags into separate
tables following EAV, entity attribute

356
00:20:27,340 --> 00:20:29,720
value structure, is going to hit
performance.

357
00:20:30,340 --> 00:20:34,180
So instead, we can put it as arrays
in the same table.

358
00:20:35,680 --> 00:20:38,940
And then to search we can just
use GiN, right?

359
00:20:39,120 --> 00:20:42,340
And Russian Doll Tree will be effectively
used, but supported

360
00:20:42,340 --> 00:20:44,840
by GiN implicitly, right?

361
00:20:45,360 --> 00:20:52,400
So in this case you can find quickly
like give me rows where

362
00:20:53,040 --> 00:20:57,680
where we have this tag right there
will be problem although of

363
00:20:57,700 --> 00:20:59,740
ordering Right.

364
00:20:59,760 --> 00:21:04,540
This is the like Before we move
on to discussing this problem,

365
00:21:04,820 --> 00:21:07,640
I think this is the biggest problem
with full-text search and

366
00:21:07,640 --> 00:21:08,620
general GiN.

367
00:21:09,060 --> 00:21:11,140
People suffer from it, but we will
discuss it.

368
00:21:11,140 --> 00:21:14,780
Before that, let's discuss performance
gains and losses.

369
00:21:15,400 --> 00:21:16,360
Gain is obvious.

370
00:21:16,860 --> 00:21:21,000
Search speed is much better for
inverted index than for tree-like

371
00:21:21,020 --> 00:21:21,520
structure.

372
00:21:22,540 --> 00:21:26,520
So for larger volumes of data,
search is good, SELECT is good,

373
00:21:26,520 --> 00:21:27,020
right?

374
00:21:27,620 --> 00:21:33,000
If we, if like 1 comment here,
Only if we talk about the use

375
00:21:33,260 --> 00:21:34,580
Michael: of GiN only, right?

376
00:21:35,340 --> 00:21:40,280
And like only searching by 1 dimension.

377
00:21:41,040 --> 00:21:45,440
Nikolay: Yeah, well, 1 word or
few words also, it has an internal

378
00:21:45,720 --> 00:21:50,020
kind of additional internal query
language, you can say, and

379
00:21:50,020 --> 00:21:54,320
or phrase search, a lot of features
there.

380
00:21:54,520 --> 00:21:58,260
But also when we say this, search
speed is better.

381
00:21:58,260 --> 00:22:01,220
We need to think about edge cases
as well.

382
00:22:01,220 --> 00:22:06,440
For example, if we have a very,
very popular word, which was

383
00:22:06,440 --> 00:22:14,200
not excluded by putting it to stop
list or vocabulary, stop list

384
00:22:14,200 --> 00:22:14,700
dictionary.

385
00:22:15,480 --> 00:22:19,540
If we didn't put it there, we're
going to have poor selectivity

386
00:22:20,200 --> 00:22:23,340
and high cardinality of results
and performance.

387
00:22:23,680 --> 00:22:24,440
I don't know.

388
00:22:24,840 --> 00:22:26,140
The limit works, right?

389
00:22:26,500 --> 00:22:31,260
But then there are edge cases where
it's quite a popular word,

390
00:22:31,400 --> 00:22:32,300
we use GiN.

391
00:22:33,280 --> 00:22:39,120
And there is a big problem of the
need to combine GiN based

392
00:22:39,520 --> 00:22:43,400
search with something else, like
for example, additional filter

393
00:22:43,440 --> 00:22:48,460
on some scholar like timestamp
or anything, or ordering.

394
00:22:49,740 --> 00:22:50,240
Right.

395
00:22:50,580 --> 00:22:52,740
And this, this is huge problem.

396
00:22:52,800 --> 00:22:56,580
Like I want to find everything
which includes these words or,

397
00:22:56,580 --> 00:23:02,080
or these values doesn't matter,
but I also Need to get 25 last

398
00:23:02,080 --> 00:23:05,420
items from there And this is somehow

399
00:23:05,640 --> 00:23:09,060
Michael: wait it like you might
want to wait based on the the

400
00:23:09,060 --> 00:23:12,180
more the more recent The more you
want to wait it, but if it's

401
00:23:12,180 --> 00:23:16,120
got, you know, the word many many times over I I get it I think

402
00:23:16,120 --> 00:23:20,340
we're probably venturing a bit too far into full-text search

403
00:23:20,340 --> 00:23:22,760
stuff rather than GIN stuff, but you're right.

404
00:23:22,820 --> 00:23:23,620
Nikolay: It's about GIN,

405
00:23:23,620 --> 00:23:25,160
it's not about full-text search.

406
00:23:25,640 --> 00:23:29,580
For trigram, don't you want to have most popular or used or like

407
00:23:29,580 --> 00:23:30,300
the latest?

408
00:23:30,480 --> 00:23:36,620
Or for JSONB search, I want to find everything which includes

409
00:23:36,620 --> 00:23:39,120
these things, but I want very fresh.

410
00:23:40,020 --> 00:23:41,460
I want to order somehow.

411
00:23:41,940 --> 00:23:45,760
And in most cases in modern world, like when it was created,

412
00:23:45,760 --> 00:23:49,260
the idea was we are going to order by ranking.

413
00:23:49,400 --> 00:23:51,220
It's all was full search, search, right?

414
00:23:51,220 --> 00:23:56,040
Ranking like most relevant, but in social media, usually don't

415
00:23:56,040 --> 00:23:56,600
want that.

416
00:23:56,600 --> 00:23:58,640
We usually want the freshest information.

417
00:23:59,140 --> 00:23:59,640
Fresh.

418
00:23:59,720 --> 00:24:00,180
Yeah.

419
00:24:00,180 --> 00:24:04,120
So creation time or ID, if it's numeric or your ID version 7

420
00:24:04,120 --> 00:24:04,700
would work.

421
00:24:04,700 --> 00:24:06,540
So we need to order by some scalar.

422
00:24:07,400 --> 00:24:09,220
And this works not well with GIN.

423
00:24:09,600 --> 00:24:10,680
Michael: I think you're right though.

424
00:24:10,680 --> 00:24:14,000
I think maybe at scale it's always going to be a problem.

425
00:24:14,440 --> 00:24:18,000
There is a feature that was added from the beginning with GIN,

426
00:24:18,420 --> 00:24:23,440
at least for the stop word issue, and that's GIN fuzzy search

427
00:24:23,440 --> 00:24:27,940
limit that you can set in, that it recommends in the thousands,

428
00:24:28,080 --> 00:24:32,840
so that if your search would have returned more than that number

429
00:24:32,840 --> 00:24:36,320
roughly it will limit them but it will limit them randomly so

430
00:24:36,320 --> 00:24:39,920
that that plays into your which which 1 should be returned but

431
00:24:39,920 --> 00:24:43,100
the idea behind that feature is if you're returning thousands

432
00:24:43,440 --> 00:24:47,320
of options anyway how good is that search in the first place?

433
00:24:47,320 --> 00:24:48,280
Like it's a...

434
00:24:48,540 --> 00:24:50,760
So, yeah, you're right.

435
00:24:50,760 --> 00:24:53,960
Nikolay: Also statistics-related work, like over time it was

436
00:24:53,960 --> 00:24:59,080
improved, but I remember still having issues like with lossiness

437
00:24:59,240 --> 00:25:01,160
of fore a planner, right?

438
00:25:01,440 --> 00:25:02,660
Like we need to recheck.

439
00:25:03,280 --> 00:25:05,320
I don't know actually details right now.

440
00:25:05,320 --> 00:25:09,300
I'm using GIN blindly lately in terms of performance.

441
00:25:09,440 --> 00:25:14,580
But for me, the key problem is inability for GIN to combine search

442
00:25:14,580 --> 00:25:18,620
with additional Scalar-based filters or ordering.

443
00:25:18,620 --> 00:25:20,640
Ordering is number 1 problem for me.

444
00:25:20,680 --> 00:25:23,800
It led the same folks to create RUM indexes.

445
00:25:24,240 --> 00:25:28,320
Michael: Yeah, which confused me because when they launched GIN

446
00:25:28,320 --> 00:25:31,120
they said you should think of it as a genie not as an alcoholic

447
00:25:31,160 --> 00:25:34,280
drink but then they come out with RUM and it's definitely about

448
00:25:34,280 --> 00:25:35,280
alcoholic drinks.

449
00:25:35,280 --> 00:25:36,900
So I don't believe them.

450
00:25:36,900 --> 00:25:37,360
Yeah.

451
00:25:37,360 --> 00:25:38,400
Nikolay: Ambush, right?

452
00:25:38,940 --> 00:25:39,440
Yeah.

453
00:25:40,080 --> 00:25:40,580
Yeah.

454
00:25:40,840 --> 00:25:43,860
Well, and the RUM, unfortunately,
in my case, didn't work well

455
00:25:43,860 --> 00:25:46,560
because of like, it was huge and
so on.

456
00:25:46,560 --> 00:25:48,900
Maybe it was because I tried to
put...

457
00:25:49,400 --> 00:25:53,420
The idea of RUM is let's put columns
into a structure, right?

458
00:25:53,420 --> 00:25:53,500
Michael: Like timestamp.

459
00:25:53,500 --> 00:25:57,160
So that you can do the thing you
wanted.

460
00:25:57,340 --> 00:25:58,100
Nikolay: Right, exactly.

461
00:25:58,380 --> 00:26:01,330
And creation time, it's how many?

462
00:26:01,330 --> 00:26:03,420
8 bytes or 16?

463
00:26:03,940 --> 00:26:04,880
I keep forgetting.

464
00:26:05,220 --> 00:26:06,240
It was huge.

465
00:26:06,960 --> 00:26:09,480
I guess I could reduce it.

466
00:26:09,520 --> 00:26:15,520
For example, putting 4 byte integers
just to support ordering.

467
00:26:16,360 --> 00:26:20,200
There are ideas how to improve,
but still, this is an Extension,

468
00:26:20,460 --> 00:26:23,740
it's not available on many platforms,
unfortunately.

469
00:26:24,220 --> 00:26:27,540
But there is another thing, there
is a standard contrib module

470
00:26:27,540 --> 00:26:30,920
Extension which is available everywhere
called B-tree GIN.

471
00:26:32,660 --> 00:26:35,460
At least this can help you to combine
different filters.

472
00:26:35,460 --> 00:26:41,120
For example, if you want to have
filter to search in JSONB or

473
00:26:41,120 --> 00:26:45,200
full text search or trigram index,
you want to combine it with

474
00:26:45,200 --> 00:26:47,700
additional filtering, for example,
price, right?

475
00:26:47,840 --> 00:26:49,980
Or age, depending on data, right?

476
00:26:50,140 --> 00:26:52,160
This is possible, this is worth
considering.

477
00:26:52,440 --> 00:26:57,240
So to have multi-column index,
and maybe for ordering it also

478
00:26:57,240 --> 00:27:02,340
can be helpful, but if you don't
do anything, what is usually

479
00:27:02,780 --> 00:27:05,700
happening, the planner needs to
decide.

480
00:27:06,620 --> 00:27:10,760
Should it use, for example, a primary
key to order, or create

481
00:27:10,760 --> 00:27:15,640
and add timestamp to order by, and
then apply filter on...

482
00:27:16,060 --> 00:27:19,960
Instead of GIN, it will be just
applying filter for whole data

483
00:27:19,960 --> 00:27:21,840
it will find.

484
00:27:22,280 --> 00:27:23,540
And it can be very...

485
00:27:24,080 --> 00:27:24,820
It's like...

486
00:27:25,920 --> 00:27:30,940
Alternative is to perform GIN filtering,
find everything and

487
00:27:30,940 --> 00:27:32,160
then sort in memory.

488
00:27:33,100 --> 00:27:34,280
And it's hit and miss.

489
00:27:34,280 --> 00:27:38,820
Sometimes we see that a planner
thinks, a planner is very optimistic

490
00:27:39,000 --> 00:27:43,300
thinking we will quickly find just
following B-tree, by created

491
00:27:43,320 --> 00:27:48,580
at, in reverse order, like very
fresh item, dynamically check,

492
00:27:48,580 --> 00:27:51,960
does it match our filter which
is supposed to go through GIN?

493
00:27:52,080 --> 00:27:54,060
Well, it's not matching, not matching,
not matching.

494
00:27:54,060 --> 00:27:58,940
And if it's a miss, it might skip
many, many, many rows until

495
00:27:58,940 --> 00:28:01,420
it finds our 25 target house.

496
00:28:02,040 --> 00:28:05,260
And just following B-tree, not involving
GIN at all.

497
00:28:05,580 --> 00:28:07,060
And this is a terrible situation.

498
00:28:07,080 --> 00:28:09,480
I saw many incidents because of
that.

499
00:28:09,840 --> 00:28:15,540
So, or vice versa, GIN, again,
we use some popular word, it

500
00:28:15,540 --> 00:28:19,640
finds millions of entries, then
tries to sort them in memory

501
00:28:19,640 --> 00:28:21,400
to find top 25.

502
00:28:21,680 --> 00:28:26,120
Maybe it's like edge cases here
may be hitting less in terms

503
00:28:26,120 --> 00:28:29,380
of performance, but also not not
not fun at all to have this

504
00:28:29,380 --> 00:28:30,980
situation Right.

505
00:28:30,980 --> 00:28:32,520
Yeah, and this is number 1 be

506
00:28:32,520 --> 00:28:35,340
Michael: where the the fuzzy limit
could actually maybe help

507
00:28:35,340 --> 00:28:40,760
there, depending on how important
it is to not have false negative,

508
00:28:40,760 --> 00:28:44,120
like things that should show up
not showing up, I guess, depending

509
00:28:44,120 --> 00:28:45,140
on the use case.

510
00:28:45,140 --> 00:28:47,520
If you just want to show something,
let's say you're showing

511
00:28:47,520 --> 00:28:50,540
a social media feed, trying to
show engaging things to people,

512
00:28:50,540 --> 00:28:53,000
it doesn't matter if there's a
few really good ones that don't

513
00:28:53,000 --> 00:28:53,500
show.

514
00:28:53,680 --> 00:28:57,160
But then if you've got a different
use case, then it matters

515
00:28:57,160 --> 00:28:57,540
a lot.

516
00:28:57,540 --> 00:29:00,860
Like if the full-text search, if
you don't show the most relevant

517
00:29:00,860 --> 00:29:02,120
thing, that's an issue.

518
00:29:03,240 --> 00:29:08,700
Nikolay: Or for example, user just
added something, goes to search,

519
00:29:08,720 --> 00:29:12,440
expects this to be there, but it's
not there, it's not good.

520
00:29:12,440 --> 00:29:16,260
It's like, feeling of this is like
it's a bug.

521
00:29:16,400 --> 00:29:19,220
So it's in many cases, it's unacceptable,
unfortunately.

522
00:29:19,740 --> 00:29:22,360
And also there is a problem with
pending list, right?

523
00:29:23,300 --> 00:29:24,260
Pending list.

524
00:29:24,380 --> 00:29:24,680
So

525
00:29:24,680 --> 00:29:26,640
Michael: this is about inserts
rather than selects.

526
00:29:26,640 --> 00:29:28,060
Nikolay: Yeah, not posting lists.

527
00:29:28,180 --> 00:29:34,080
Posting lists For each word, basically,
we have references to

528
00:29:34,080 --> 00:29:34,580
tuples.

529
00:29:35,180 --> 00:29:36,340
It's posting list.

530
00:29:36,420 --> 00:29:41,140
And then there is pending list,
which was like the trick to,

531
00:29:42,380 --> 00:29:44,980
we forgot to mention the main problem
with GIN, actually.

532
00:29:45,160 --> 00:29:48,120
The key problem is slow writes,
right?

533
00:29:49,120 --> 00:29:52,720
Because search is fast with all
the things we discussed, but

534
00:29:52,720 --> 00:29:56,520
writes are slower than writes to
tree-like structures, to trees,

535
00:29:56,520 --> 00:29:58,020
to B-tree, to GiST.

536
00:29:58,900 --> 00:30:01,760
So even with rebalancing, I guess
it's like slower.

537
00:30:02,780 --> 00:30:05,720
And what was created?

538
00:30:05,900 --> 00:30:07,160
Fast update option.

539
00:30:07,920 --> 00:30:09,760
Michael: It's an unfair comparison,
right?

540
00:30:09,760 --> 00:30:13,580
It's almost by design that they
have to be slower because documents

541
00:30:13,640 --> 00:30:15,080
contain lots of words.

542
00:30:17,060 --> 00:30:23,680
1 row being updated or added creates
multiple writes to the index.

543
00:30:23,680 --> 00:30:26,880
So it's not a fair fight, it's
not a fair comparison.

544
00:30:27,040 --> 00:30:27,680
Nikolay: I agree.

545
00:30:27,940 --> 00:30:33,940
So to mitigate this fast update
technique was created and there's

546
00:30:33,940 --> 00:30:36,300
an option fast update when you
create an index.

547
00:30:36,580 --> 00:30:39,020
And I guess it's on by default,
right?

548
00:30:39,140 --> 00:30:39,640
Yeah.

549
00:30:39,720 --> 00:30:40,220
Yeah.

550
00:30:40,240 --> 00:30:43,940
So fast update means that changes
will go to some temporary like

551
00:30:43,940 --> 00:30:46,560
buffer, like kind of location inside
the structure called...

552
00:30:46,560 --> 00:30:47,220
Michael: Pending list.

553
00:30:47,220 --> 00:30:48,260
Nikolay: Pending list, yes.

554
00:30:48,260 --> 00:30:52,380
And then at some point when they
reach by default 4 megabyte.

555
00:30:53,420 --> 00:30:55,240
Michael: Yeah, there's a few things
that can trigger it.

556
00:30:55,240 --> 00:30:55,940
That's 1.

557
00:30:55,940 --> 00:30:56,440
There's

558
00:30:57,720 --> 00:30:57,940
Nikolay: like a

559
00:30:57,940 --> 00:30:58,350
Michael: catch there.

560
00:30:58,350 --> 00:30:58,940
What happens?

561
00:30:59,540 --> 00:31:03,180
But also, Oh, it batch processes
them, right?

562
00:31:03,180 --> 00:31:06,640
Nikolay: Yeah, and it's happening,
it can happen synchronously.

563
00:31:06,900 --> 00:31:12,100
So some write, some Update or INSERT
might suddenly be very slow.

564
00:31:12,260 --> 00:31:15,780
So slow it might hit like your
statement amount, which is like

565
00:31:15,780 --> 00:31:19,240
30 seconds, for example, or 60
depending, right, or 10 seconds.

566
00:31:19,540 --> 00:31:24,680
And this is like unpredictable
performance for writes Yeah, or

567
00:31:24,680 --> 00:31:26,400
vacuum can do it a synchronously.

568
00:31:26,400 --> 00:31:28,400
Wow Yeah, vacuum.

569
00:31:28,400 --> 00:31:32,240
I like but what I saw from practice
vacuum is good and mitigate

570
00:31:32,280 --> 00:31:39,440
this mitigate this and not allowing
this job to be done in a

571
00:31:39,440 --> 00:31:40,420
synchronous manner.

572
00:31:40,580 --> 00:31:47,040
But it works only if our duration
of vacuum is reasonable, which

573
00:31:47,040 --> 00:31:50,100
points to the direction we do need
Partitioning.

574
00:31:51,180 --> 00:31:54,780
Because if we have a huge Table,
vacuum is super slow.

575
00:31:54,840 --> 00:31:56,880
It cannot catch up with our writes.

576
00:31:56,880 --> 00:31:59,780
And pending list problem hits our
writes.

577
00:32:00,060 --> 00:32:01,520
And users notice it.

578
00:32:01,560 --> 00:32:05,860
At some point, it might be so that
we decide, OK, we need to

579
00:32:05,860 --> 00:32:10,160
switch off FastUpdate for some
Indexes, for large Tables, because

580
00:32:10,160 --> 00:32:13,940
we want predictable, maybe not
good, but predictable performance

581
00:32:14,020 --> 00:32:14,680
of writes.

582
00:32:15,200 --> 00:32:15,900
It depends.

583
00:32:16,340 --> 00:32:20,680
There is a trade-off, and I see
both cases are not ideal, you

584
00:32:20,680 --> 00:32:24,520
know, like if you have really large
volumes of data and workload,

585
00:32:25,520 --> 00:32:26,400
heavy workload.

586
00:32:26,920 --> 00:32:33,120
So Partitioning can help here and
to keep FastUpdate on, in

587
00:32:33,120 --> 00:32:33,780
my opinion.

588
00:32:34,560 --> 00:32:38,040
Michael: Well, yeah, have you tried
if you tried that I'm just

589
00:32:38,040 --> 00:32:43,140
thinking actually then you've got
to have the Partition key in

590
00:32:43,140 --> 00:32:47,020
the Query and then suddenly you've
got another dimension there

591
00:32:47,860 --> 00:32:48,700
sounds tricky

592
00:32:49,980 --> 00:32:54,620
Nikolay: and and so what we need
different Index in this case

593
00:32:54,620 --> 00:32:55,120
right

594
00:32:55,260 --> 00:32:57,560
Michael: Well it just might not
then use the GIN.

595
00:32:57,560 --> 00:33:00,960
I've had trouble convincing the
planner to use GIN indexes in

596
00:33:00,960 --> 00:33:05,340
the past when like they've got
another choice and it doesn't

597
00:33:06,180 --> 00:33:10,120
I'm probably doing I'm maybe I'm
doing something wrong but I

598
00:33:10,120 --> 00:33:15,580
wanted to try and get a bitmap
scan like anding the Index like

599
00:33:15,580 --> 00:33:19,640
with a GIN Index as 1 bitmap Index
scan and let's say a B-tree

600
00:33:19,640 --> 00:33:20,640
Index is another.

601
00:33:20,640 --> 00:33:23,600
Nikolay: Sounds like the case for
B-tree GIN for me.

602
00:33:24,240 --> 00:33:28,100
Michael: Well a multi-column GIN
index with B-tree in it could

603
00:33:28,100 --> 00:33:31,160
have helped but it wouldn't have
been as like it wouldn't have

604
00:33:31,160 --> 00:33:36,380
been structured quite as like optimally
in my opinion so I was

605
00:33:36,380 --> 00:33:41,260
hoping to get this and list really
short and then a really like

606
00:33:41,600 --> 00:33:45,700
quick scan and I didn't have luck
with multi-column B-tree GIN

607
00:33:45,700 --> 00:33:48,900
index being as like as performant
as I thought we could get it

608
00:33:48,900 --> 00:33:50,820
with a bitmap scan.

609
00:33:51,620 --> 00:33:56,520
But the partitioning thing, I would
love to be wrong, but the

610
00:33:56,520 --> 00:33:59,940
partitioning thing sounds like
you've now got a GIN index per

611
00:33:59,960 --> 00:34:01,040
partition, right?

612
00:34:01,340 --> 00:34:04,540
So you've got to get to the right
partition so then it's like

613
00:34:06,200 --> 00:34:07,200
again if it's time

614
00:34:07,200 --> 00:34:11,180
Nikolay: partition should work
partition pruning should be in

615
00:34:11,180 --> 00:34:14,560
good shape of course but this is
general rule for partitioning

616
00:34:15,420 --> 00:34:18,260
Michael: yeah but so let's say
you're partitioning by time, it

617
00:34:18,260 --> 00:34:21,000
means we've got the time filter
on, right?

618
00:34:21,100 --> 00:34:21,600
Nikolay: Right.

619
00:34:22,120 --> 00:34:23,820
Michael: Is that the case?

620
00:34:24,160 --> 00:34:26,720
Like for full text search, for
example, are we going to be filtering

621
00:34:26,720 --> 00:34:27,460
by time?

622
00:34:27,800 --> 00:34:30,760
Nikolay: Well, if we're ordering
by time, as I, like For social

623
00:34:30,760 --> 00:34:32,660
media, again, we need fresh information.

624
00:34:32,900 --> 00:34:36,300
In most cases, this is the rule
compared to previous, like pre-previous

625
00:34:36,500 --> 00:34:38,440
era for Web 1.0.

626
00:34:39,320 --> 00:34:41,500
Michael: Well, unless there isn't
fresh information, then you

627
00:34:41,500 --> 00:34:43,320
need to go back and back and back,
right?

628
00:34:43,320 --> 00:34:44,040
Nikolay: Maybe, yes.

629
00:34:44,040 --> 00:34:45,300
Well, depends, depends.

630
00:34:45,300 --> 00:34:49,820
So my question is how we limit
how we order Like if we don't

631
00:34:49,820 --> 00:34:54,000
limit don't order This is found
fundamental question to this

632
00:34:54,000 --> 00:34:54,400
query.

633
00:34:54,400 --> 00:34:56,780
Why do we request unlimited?

634
00:34:57,540 --> 00:34:59,440
Results Right.

635
00:34:59,580 --> 00:35:01,020
This is this question.

636
00:35:01,020 --> 00:35:04,860
I raised very often during consulting
practice very often I see

637
00:35:04,860 --> 00:35:08,740
no limit or no order by and this
is the question in most cases

638
00:35:08,740 --> 00:35:12,180
people say oh, actually, yeah,
it makes sense to have pagination

639
00:35:12,280 --> 00:35:13,220
here, for example.

640
00:35:14,060 --> 00:35:15,600
Or pagination with GIN.

641
00:35:16,020 --> 00:35:16,520
Yeah.

642
00:35:16,640 --> 00:35:17,220
It's interesting.

643
00:35:17,220 --> 00:35:18,400
Michael: It's difficult, right?

644
00:35:18,400 --> 00:35:18,900
Nikolay: Yeah.

645
00:35:19,080 --> 00:35:24,360
Well, again, if we order by ID
or timestamp by scalar value,

646
00:35:24,780 --> 00:35:28,000
it's like key set pagination works,
and that's it.

647
00:35:28,660 --> 00:35:31,100
And GIN is just additional filtering
in this case.

648
00:35:31,640 --> 00:35:33,720
Michael: And we're kind of back
to where we started with your

649
00:35:33,720 --> 00:35:36,680
original comment that it works
until it doesn't.

650
00:35:37,720 --> 00:35:40,960
So in scale, it's fine to not have
these, it's fine to not have

651
00:35:40,960 --> 00:35:43,940
the limit and the order by because
your data set is small enough

652
00:35:43,940 --> 00:35:46,860
that your query response times
are pretty good.

653
00:35:47,020 --> 00:35:49,900
Nikolay: But we designed for the
future, not only for now, right?

654
00:35:49,900 --> 00:35:54,520
Because, yeah, so it's always worth
asking yourself how much

655
00:35:54,520 --> 00:35:58,300
data you expect in the future,
how much data you need to return

656
00:35:58,380 --> 00:36:02,980
with this query to your application
or frontend, I don't know.

657
00:36:03,060 --> 00:36:04,140
Do we need everything?

658
00:36:04,640 --> 00:36:07,760
I saw some case, interesting case
when people decided, intentionally

659
00:36:07,960 --> 00:36:12,100
decided to return everything and
then do a lot of work on frontend.

660
00:36:12,980 --> 00:36:17,220
I saw it many times actually, with
guys who decided that our

661
00:36:17,220 --> 00:36:21,540
frontend should be very thick,
and a lot of logic there, and

662
00:36:21,540 --> 00:36:22,980
it's better to cache everything.

663
00:36:23,260 --> 00:36:25,320
Like, let's load and then work.

664
00:36:25,320 --> 00:36:27,940
Well, in my opinion, it's a dead
end.

665
00:36:27,980 --> 00:36:30,840
Because if your project grows,
it's a dead end.

666
00:36:30,840 --> 00:36:33,020
You will experience poor performance.

667
00:36:33,340 --> 00:36:38,040
And other case was guys who decided
to perform analytics with

668
00:36:38,040 --> 00:36:42,940
R, so they also needed to load
everything and then build B-tree

669
00:36:43,040 --> 00:36:47,620
and so on and kind of group by
and so on was performed only in

670
00:36:47,620 --> 00:36:53,080
memory using R, which for me looks
like, why do you do this?

671
00:36:53,300 --> 00:36:55,060
Do it inside your database.

672
00:36:57,340 --> 00:37:04,240
Let me not forget, when pending
list is growing to the limit,

673
00:37:05,540 --> 00:37:07,260
it can exceed limits, right?

674
00:37:07,840 --> 00:37:08,540
Or no?

675
00:37:09,060 --> 00:37:10,820
Michael: Is it a soft limit or
no?

676
00:37:11,000 --> 00:37:11,880
I'm not sure.

677
00:37:12,040 --> 00:37:13,420
Nikolay: This is a good question.

678
00:37:13,520 --> 00:37:15,720
But anyway, maybe it was a bug
when it exceeded.

679
00:37:15,720 --> 00:37:16,320
I don't remember.

680
00:37:16,320 --> 00:37:20,740
I remember some parts of my memory
tell me that it happened in

681
00:37:20,740 --> 00:37:21,180
the past.

682
00:37:21,180 --> 00:37:23,000
I'm not sure it was a bug or not.

683
00:37:23,000 --> 00:37:27,340
But obviously, also, select performance
will be affected if pending

684
00:37:27,340 --> 00:37:29,080
list is huge.

685
00:37:29,280 --> 00:37:34,300
For example, imagine for specific
filters, we are unlucky and

686
00:37:34,300 --> 00:37:39,000
we need to process all like we
we need to deal with it and selects

687
00:37:39,000 --> 00:37:41,100
become slower so

688
00:37:41,100 --> 00:37:43,940
Michael: yeah so it's for it's
4 megabytes right the

689
00:37:43,940 --> 00:37:44,840
Nikolay: default I

690
00:37:45,140 --> 00:37:49,760
Michael: suspect I suspect it's
not too much of a penalty if

691
00:37:49,760 --> 00:37:53,880
you leave it at the 4MB, but a
tempting thing to do might be

692
00:37:53,880 --> 00:37:59,060
to increase that so that you hopefully
have autovacuum kick

693
00:37:59,060 --> 00:38:02,060
in and do the work in the background
instead of having a random

694
00:38:02,120 --> 00:38:03,460
insert paying the penalty.

695
00:38:03,720 --> 00:38:04,220
Nikolay: Right.

696
00:38:04,280 --> 00:38:08,240
Michael: But then once you've increased
the list, then each select

697
00:38:08,240 --> 00:38:12,600
has to look through more pending
entries, especially towards

698
00:38:12,600 --> 00:38:16,000
the end, and then you're paying
the price on select as well.

699
00:38:16,000 --> 00:38:20,140
So yes, I don't think I haven't
seen in fact, actually, we probably

700
00:38:20,140 --> 00:38:23,440
should mention there's a really
good write up about these kinds

701
00:38:23,440 --> 00:38:27,720
of issues by Lukas Fittl of pganalyze who also

702
00:38:27,720 --> 00:38:28,220
Nikolay: use

703
00:38:29,060 --> 00:38:30,180
Michael: the GitLab work.

704
00:38:30,460 --> 00:38:35,800
Nikolay: I think Lukas does a fantastic
job analyzing various Postgres

705
00:38:35,800 --> 00:38:36,300
cases.

706
00:38:36,340 --> 00:38:39,840
I think that's why his company
is called pganalyze, but this

707
00:38:39,840 --> 00:38:41,420
is your joke, not mine.

708
00:38:41,680 --> 00:38:43,200
Michael: You're stealing my jokes
now.

709
00:38:43,200 --> 00:38:43,700
Nikolay: Yes.

710
00:38:44,280 --> 00:38:47,380
Well, yeah, good job with PG analysis.

711
00:38:48,920 --> 00:38:52,640
And yeah, in GitLab's case, I was
slightly involved only, and

712
00:38:52,640 --> 00:38:56,260
I remember that case was a few
years ago and the result was a decision

713
00:38:56,260 --> 00:39:00,080
for specific indexes to switch
off fast update because the table

714
00:39:00,080 --> 00:39:01,740
was huge and so on, right

715
00:39:02,500 --> 00:39:05,400
Michael: yeah anyway so switching
off fast update that means

716
00:39:05,400 --> 00:39:07,500
we're not using the pending list
anymore.

717
00:39:07,640 --> 00:39:13,160
And every insert and update pays
the price at the time of insertion

718
00:39:13,200 --> 00:39:14,480
to update the index.

719
00:39:14,540 --> 00:39:18,420
And I think that's a really interesting
trade-off to make each

720
00:39:18,420 --> 00:39:23,360
insert a bit slower, but not have
to have occasional very slow

721
00:39:23,360 --> 00:39:23,860
ones.

722
00:39:24,060 --> 00:39:27,780
I like that a lot in terms of performance,
in terms of trade-off.

723
00:39:28,140 --> 00:39:31,200
Nikolay: Anyway, kudos to GitLab
for openness as usual, because

724
00:39:31,200 --> 00:39:34,640
all the details are visible to
the public, to the community, and it's

725
00:39:34,640 --> 00:39:40,040
super helpful for, like, for general
development of Postgres and

726
00:39:40,040 --> 00:39:40,680
so on.

727
00:39:40,680 --> 00:39:45,460
And kudos to Lukas for a very great
analysis of this Postgres

728
00:39:45,540 --> 00:39:46,040
case.

729
00:39:46,640 --> 00:39:47,080
Good.

730
00:39:47,080 --> 00:39:54,280
I think we touched a little bit
of some deeper waters but not

731
00:39:54,280 --> 00:39:55,080
super deep.

732
00:39:55,080 --> 00:39:58,940
If someone wants to read internals
Igor Rogov's articles and

733
00:39:59,480 --> 00:40:02,460
books are really great in this area.

734
00:40:03,740 --> 00:40:05,940
And yeah, I think that's it, right?

735
00:40:06,660 --> 00:40:07,700
Or anything else?

736
00:40:08,040 --> 00:40:11,280
Michael: Yeah, 1 last tiny tip
that I think is worth, you mentioned

737
00:40:11,280 --> 00:40:13,000
JSONBPathOps briefly.

738
00:40:13,840 --> 00:40:18,580
I think for most other data types,
the defaults are the only

739
00:40:18,660 --> 00:40:24,880
operator class, but for JSONB,
the default is actually JSONBOps,

740
00:40:25,120 --> 00:40:28,600
not JSONBPathOps, and you can get
a performance boost with JSONBPathOps

741
00:40:29,060 --> 00:40:32,540
if you don't need set and op, if
you're just using the usual JSON,

742
00:40:32,720 --> 00:40:35,320
the usual contains operators.

743
00:40:36,180 --> 00:40:39,300
Nikolay: Final words, GIN is a
very important index type.

744
00:40:39,520 --> 00:40:41,260
It's not going to disappear.

745
00:40:41,260 --> 00:40:43,020
It's going to be used heavily.

746
00:40:44,440 --> 00:40:50,900
It's 1 of the strengths of Postgres,
like, rich set of index

747
00:40:50,900 --> 00:40:51,400
types.

748
00:40:52,200 --> 00:40:53,160
Yeah, for sure.

749
00:40:53,220 --> 00:40:53,450
Good.

750
00:40:53,450 --> 00:40:55,220
Okay, So thank you.

751
00:40:55,440 --> 00:40:56,320
See you next time.

752
00:40:56,320 --> 00:40:57,320
Michael: Thanks so much, Nikolay.

753
00:40:57,320 --> 00:40:58,140
Catch you next week.

754
00:40:58,140 --> 00:40:58,500
Bye.