1
00:00:00,060 --> 00:00:03,480
Michael: Hello and welcome to PostgresFM episode 75.

2
00:00:04,300 --> 00:00:06,720
This is a weekly show about all things Postgres.

3
00:00:06,760 --> 00:00:08,420
I'm Michael, founder of pgMustard.

4
00:00:08,420 --> 00:00:11,080
This is my co-host Nikolay, founder of Postgres.ai.

5
00:00:11,280 --> 00:00:13,460
Hello Nikolay, what are we talking about today?

6
00:00:13,860 --> 00:00:15,080
Nikolay: Hi Michael, constraints.

7
00:00:16,160 --> 00:00:16,660
Michael: Yeah.

8
00:00:17,220 --> 00:00:21,720
And specifically the 6 DDL constraints that Postgres supports.

9
00:00:21,960 --> 00:00:23,100
Nikolay: Yeah, all of them.

10
00:00:23,160 --> 00:00:24,360
Michael: We're back to basics.

11
00:00:24,480 --> 00:00:25,680
I love this kind of thing.

12
00:00:25,680 --> 00:00:26,520
So nice choice.

13
00:00:26,520 --> 00:00:27,240
Thank you.

14
00:00:27,980 --> 00:00:29,240
Nikolay: Ah, it was my choice.

15
00:00:29,440 --> 00:00:34,060
I'm writing this Postgres Marathon series of how-tos and considering

16
00:00:34,280 --> 00:00:37,000
how to create various constraints without downtime.

17
00:00:37,840 --> 00:00:42,540
Already covered checks and foreign keys and something else.

18
00:00:43,300 --> 00:00:47,540
Saying this, those who follow me closely now understand when

19
00:00:47,540 --> 00:00:49,740
exactly we record this podcast, right?

20
00:00:49,860 --> 00:00:54,060
Okay, because I read these how-tos every day.

21
00:00:54,240 --> 00:00:58,880
So, about constraints in the same order as the documentation describes

22
00:00:58,900 --> 00:01:04,440
them. But also let's talk about practical complications when

23
00:01:04,440 --> 00:01:07,620
you have a lot of data and a lot of TPS as usual, right?

24
00:01:08,240 --> 00:01:11,920
Because the documentation actually doesn't cover these topics.

25
00:01:12,380 --> 00:01:12,820
Michael: Yeah.

26
00:01:12,820 --> 00:01:16,080
Another thing I don't think the documentation covers is why do

27
00:01:16,080 --> 00:01:16,980
we have constraints?

28
00:01:17,360 --> 00:01:18,820
I guess it's so obvious.

29
00:01:18,820 --> 00:01:21,880
But I think it's worth mentioning that without these, we'd be

30
00:01:21,880 --> 00:01:24,220
in real trouble database-wise, wouldn't we?

31
00:01:24,920 --> 00:01:25,640
Nikolay: Well, yes.

32
00:01:25,640 --> 00:01:26,420
So, constraint.

33
00:01:27,340 --> 00:01:29,180
So we have a schema, right?

34
00:01:29,180 --> 00:01:32,700
And without schema, it's not good to live without a schema.

35
00:01:35,200 --> 00:01:37,500
Let's say no to NoSQL in general.

36
00:01:37,800 --> 00:01:41,180
Maybe in some cases, it's fine, but if it's financial data and

37
00:01:41,180 --> 00:01:45,940
so on, you need structure to ensure that data has good quality.

38
00:01:46,320 --> 00:01:48,940
And constraints is the next step.

39
00:01:49,400 --> 00:01:54,800
So you have a schema, so you define column names, data types, and

40
00:01:54,800 --> 00:01:59,060
constraints is an addition to all this to ensure even better quality

41
00:01:59,060 --> 00:01:59,760
of data.

42
00:02:00,060 --> 00:02:04,760
For example, you say no one can create more than two rows with the

43
00:02:04,760 --> 00:02:06,180
same value in this column.

44
00:02:06,580 --> 00:02:09,400
For example, email, and this is an interesting situation because

45
00:02:09,400 --> 00:02:11,740
usually people forget about case sensitivity,

46
00:02:12,500 --> 00:02:13,000
Michael: Right?

47
00:02:13,740 --> 00:02:14,240
Nikolay: Yeah.

48
00:02:15,140 --> 00:02:16,600
Of text or varchar.

49
00:02:17,220 --> 00:02:22,060
And then you say if there is a row in this table referencing

50
00:02:23,100 --> 00:02:24,500
for example user id.

51
00:02:24,520 --> 00:02:28,600
It means that such user should
exist and so on and so forth.

52
00:02:28,600 --> 00:02:31,740
So it's all about data
quality but also sometimes

53
00:02:31,800 --> 00:02:35,420
for foreign key constraints.

54
00:02:36,820 --> 00:02:38,720
It also provides some automation.

55
00:02:40,520 --> 00:02:41,600
I would say limited.

56
00:02:42,060 --> 00:02:46,600
If you have automation for
handling, for example, deletes

57
00:02:46,780 --> 00:02:51,420
should dependent rows in the dependent
table be deleted

58
00:02:51,500 --> 00:02:52,220
or not?

59
00:02:53,000 --> 00:02:55,020
If the main row is deleted.

60
00:02:55,380 --> 00:02:59,220
But this should be used with care
if you have a million rows

61
00:02:59,220 --> 00:02:59,720
dependent.

62
00:03:00,940 --> 00:03:04,280
Deleting 1 row might take a lot
of time and this is also not

63
00:03:04,280 --> 00:03:04,780
good.

64
00:03:04,820 --> 00:03:05,320
Right?

65
00:03:06,680 --> 00:03:10,620
Michael: Yeah, I feel like we've
already dived into a few specifics

66
00:03:10,680 --> 00:03:13,260
around unique constraints and foreign
key constraints there.

67
00:03:13,260 --> 00:03:14,155
But let's go.

68
00:03:14,155 --> 00:03:14,860
Yeah, I think you're right.

69
00:03:14,860 --> 00:03:17,180
I think the Postgres documentation
does cover them in a really

70
00:03:17,180 --> 00:03:17,560
good order.

71
00:03:17,560 --> 00:03:22,400
And it starts with check constraints,
which are super flexible

72
00:03:23,560 --> 00:03:26,020
user-defined constraints, almost,
I'd say.

73
00:03:26,200 --> 00:03:27,260
Is that a fair description?

74
00:03:27,260 --> 00:03:30,360
Where we can choose, I think it
describes them as a boolean condition.

75
00:03:30,360 --> 00:03:34,320
So it's a check that returns true
or false for each row that's

76
00:03:34,320 --> 00:03:36,240
inserted or updated.

77
00:03:37,140 --> 00:03:41,620
Nikolay: Yeah, check constraint
is usually very, how to say,

78
00:03:41,680 --> 00:03:42,180
underappreciated.

79
00:03:43,500 --> 00:03:45,560
It's underused in my opinion.

80
00:03:45,720 --> 00:03:50,200
And unfortunately, there you can
define only some expression

81
00:03:50,340 --> 00:03:51,540
related to this table.

82
00:03:51,540 --> 00:03:56,100
You cannot involve different tables
and subqueries and so on.

83
00:03:56,100 --> 00:04:00,560
But it's so it limits the expressive
power of it.

84
00:04:00,800 --> 00:04:04,300
Michael: Yeah, so we can define it like we can have a check

85
00:04:04,300 --> 00:04:07,920
constraint on a column or we can
have it on multiple columns.

86
00:04:08,000 --> 00:04:09,640
Nikolay: Multiple columns is okay.

87
00:04:09,640 --> 00:04:13,340
You can say, for example, I don't
know, like the sum of these 2 columns

88
00:04:13,340 --> 00:04:14,880
should be positive or something.

89
00:04:15,060 --> 00:04:15,920
Some crazy stuff.

90
00:04:15,920 --> 00:04:17,120
It's possible, definitely.

91
00:04:18,420 --> 00:04:19,500
And this is good.

92
00:04:19,840 --> 00:04:23,980
I mean, for example, if you say,
if you want to say this is integer,

93
00:04:24,440 --> 00:04:28,600
but it should always be odd or
even or something like that, right?

94
00:04:28,660 --> 00:04:33,000
You just define the constraint
that will be checked, if you try to insert

95
00:04:33,080 --> 00:04:36,020
something which violates this constraint,
you will get an error.

96
00:04:36,020 --> 00:04:39,520
And this is how you can achieve
better data quality.

97
00:04:39,960 --> 00:04:40,460
Michael: Yeah.

98
00:04:40,680 --> 00:04:44,380
When you say it's underused or
underappreciated, what are the

99
00:04:44,380 --> 00:04:46,740
kind of typical cases you see for
people?

100
00:04:47,440 --> 00:04:52,780
Nikolay: Typical cases, people
rely on ORM and perform all checks

101
00:04:52,780 --> 00:04:53,280
on...

102
00:04:54,720 --> 00:04:59,300
So, usual, like, 3 parts of architecture,
front-end, back-end,

103
00:04:59,300 --> 00:05:02,740
and database, usually people start
with front-end when they realize.

104
00:05:03,240 --> 00:05:07,500
And this is fair because you should
check a lot of stuff, including

105
00:05:07,500 --> 00:05:11,120
all constraints should be checked
on front to minimize feedback

106
00:05:11,120 --> 00:05:11,420
loop.

107
00:05:11,420 --> 00:05:14,380
Users should quickly see that something
is not wrong.

108
00:05:14,380 --> 00:05:18,740
Ideally before they make an action,
For example, filling some

109
00:05:18,740 --> 00:05:23,980
form, I would prefer seeing constraint
violated and an explanation

110
00:05:24,240 --> 00:05:26,920
how to fix it before I press submit.

111
00:05:26,920 --> 00:05:30,960
It's not good to press submit,
wait some time, and then have

112
00:05:30,960 --> 00:05:31,560
some retries.

113
00:05:31,560 --> 00:05:32,660
It's very annoying.

114
00:05:32,840 --> 00:05:37,200
So constraint checks on frontend
make sense a lot.

115
00:05:37,600 --> 00:05:41,380
But then people, if they use ORM,
they prefer checking it in

116
00:05:41,380 --> 00:05:45,260
code because it's more flexible,
and those who write the logic

117
00:05:46,080 --> 00:05:50,740
for Python, Ruby, anything, Java,
they prefer writing it right

118
00:05:50,740 --> 00:05:52,800
there because it's their favorite
language.

119
00:05:53,440 --> 00:05:57,660
But the thing is that if a company
grows, the project grows, and

120
00:05:57,660 --> 00:06:02,140
then you start having different
users or applications, for example,

121
00:06:02,180 --> 00:06:07,020
someone directly works with data
in some good UI, or you have

122
00:06:07,020 --> 00:06:09,720
another application written in
different code, or in the same

123
00:06:09,720 --> 00:06:12,180
code, but skipping these checks.

124
00:06:12,660 --> 00:06:17,380
Implementation of constraints in
application code is weak because

125
00:06:17,380 --> 00:06:18,300
it's not guaranteed.

126
00:06:18,340 --> 00:06:19,860
Only the database can guarantee it.

127
00:06:19,860 --> 00:06:23,160
That's why check constraints or
other types of constraints are

128
00:06:23,160 --> 00:06:28,460
good to have in the database because
it's the safest way to safeguard,

129
00:06:28,660 --> 00:06:29,040
right?

130
00:06:29,040 --> 00:06:33,840
So you're on the safe side and
nobody will violate it unless

131
00:06:33,840 --> 00:06:34,540
there is a bug.

132
00:06:34,540 --> 00:06:38,100
Sometimes I saw some bugs and a unique
constraint violation happened.

133
00:06:38,420 --> 00:06:43,180
It's good that, for example, with uncheck
soon we will have a unique

134
00:06:43,180 --> 00:06:45,080
constraint corruption check.

135
00:06:45,180 --> 00:06:45,680
Michael: Nice.

136
00:06:45,940 --> 00:06:47,860
Nikolay: Yeah, but it's a different
story.

137
00:06:47,900 --> 00:06:51,820
So check constraint, you just say
the expression always should

138
00:06:51,820 --> 00:06:53,300
be followed, right?

139
00:06:53,300 --> 00:06:54,800
It should always return true.

140
00:06:54,800 --> 00:07:00,900
If it turns false, such an insert
or such an update should be discarded.

141
00:07:01,860 --> 00:07:02,360
Rollback.

142
00:07:02,860 --> 00:07:03,360
Right.

143
00:07:04,340 --> 00:07:08,100
Michael: And what, is it worth
us discussing what to do or how

144
00:07:08,100 --> 00:07:09,620
to add one retro?

145
00:07:09,620 --> 00:07:13,420
Let's say you listen to the podcast
and you realize you should

146
00:07:13,420 --> 00:07:16,240
have some of these in place but
you don't and they're on quite

147
00:07:16,240 --> 00:07:17,060
large tables.

148
00:07:17,500 --> 00:07:18,980
How would you go about adding?

149
00:07:19,540 --> 00:07:22,400
Nikolay: Yeah, actually let me
finish about this consideration

150
00:07:22,480 --> 00:07:25,960
about front-end, back-end, and database
and relationships between

151
00:07:25,960 --> 00:07:26,320
them.

152
00:07:26,320 --> 00:07:31,880
I remember in 2004, I implemented
what's called MVC, right?

153
00:07:31,880 --> 00:07:32,720
Model View Controller.

154
00:07:32,720 --> 00:07:37,960
A very old architecture, maybe
not cool anymore, I don't know.

155
00:07:37,960 --> 00:07:41,020
But what I implemented there, I
implemented a simple thing.

156
00:07:41,320 --> 00:07:45,420
So we define constraints on Postgres
and then we, at bootstrap

157
00:07:46,180 --> 00:07:51,520
of our application, we analyze
all constraints and build logic

158
00:07:51,540 --> 00:07:56,280
to inject it to form and also with
digital signature because

159
00:07:56,280 --> 00:07:57,180
forms can be...

160
00:07:57,180 --> 00:07:59,020
It was long ago, but it's interesting.

161
00:07:59,160 --> 00:08:03,080
Forms, frontend followed constraints
from the database.

162
00:08:03,080 --> 00:08:03,860
It was cool.

163
00:08:03,860 --> 00:08:07,320
I think it's an interesting idea,
probably it should be also

164
00:08:07,640 --> 00:08:09,240
rediscovered, I don't know.

165
00:08:09,240 --> 00:08:13,380
So you just define constraints
where they should be defined in

166
00:08:13,380 --> 00:08:17,020
the database, but then the frontend follows
exactly the same logic and

167
00:08:17,020 --> 00:08:19,760
you don't need to implement it
twice because if you implement

168
00:08:19,760 --> 00:08:23,040
it twice, you will have bugs, different
logic, right?

169
00:08:23,500 --> 00:08:26,320
Michael: Yeah, well, and I've just
realized there's another reason

170
00:08:26,320 --> 00:08:28,700
which is handling concurrent sessions.

171
00:08:28,840 --> 00:08:30,400
So you might have a constraint.

172
00:08:31,100 --> 00:08:34,640
Let's say you have like an amount
of stock of an item and

173
00:08:34,640 --> 00:08:37,600
it needs to not go below 0 or an
account balance that needs to

174
00:08:37,600 --> 00:08:38,800
not go below 0.

175
00:08:38,860 --> 00:08:42,660
If you have concurrent transactions,
you need them at the database

176
00:08:42,700 --> 00:08:46,720
level to make sure you don't end
up letting a user take out more

177
00:08:46,720 --> 00:08:47,640
money than they have.

178
00:08:47,640 --> 00:08:48,120
Nikolay: Exactly.

179
00:08:48,120 --> 00:08:53,660
You cannot check if such a row exists
and then make a decision

180
00:08:54,720 --> 00:08:55,680
outside of the database.

181
00:08:55,680 --> 00:08:57,440
You need to make a decision inside
the database.

182
00:08:57,440 --> 00:08:57,940
Yes.

183
00:08:58,620 --> 00:09:03,080
But also, what I'm talking about
is having main constraints should

184
00:09:03,080 --> 00:09:08,200
be in the database, but you can mirror
them back and front, and you

185
00:09:08,200 --> 00:09:12,940
can have automation, and it's good
if somebody who develops ORMs

186
00:09:13,320 --> 00:09:18,680
or GraphQL or something would follow
this approach, considering

187
00:09:18,740 --> 00:09:21,240
database side constraints as the
main one.

188
00:09:22,660 --> 00:09:27,260
So, check constraints are not only
very flexible, not super flexible,

189
00:09:27,260 --> 00:09:31,300
but quite flexible, but it also
has this beautiful option to

190
00:09:31,300 --> 00:09:38,300
be created in an online fashion,
so to speak.

191
00:09:38,400 --> 00:09:39,140
Yeah, not valid.

192
00:09:39,140 --> 00:09:43,070
So you say not valid, quite a confusing
term.

193
00:09:43,070 --> 00:09:43,760
Michael: Yes, very.

194
00:09:44,380 --> 00:09:48,260
Nikolay: Yes, so things to remember.

195
00:09:48,340 --> 00:09:51,420
When you create something not valid,
it means that it's already

196
00:09:51,940 --> 00:09:54,260
being validated for all new writes.

197
00:09:54,840 --> 00:09:56,140
So this is super confusing.

198
00:09:56,680 --> 00:10:00,000
Michael: So it's kind of not validated
on existing data.

199
00:10:02,260 --> 00:10:05,820
Nikolay: So you cannot create not
valid constraint and then insert

200
00:10:05,860 --> 00:10:07,500
something that violates it.

201
00:10:08,400 --> 00:10:11,540
This write will provide an error.

202
00:10:12,280 --> 00:10:16,440
But what it does is add a not valid
flag when it creates a check

203
00:10:16,440 --> 00:10:16,940
constraint.

204
00:10:17,220 --> 00:10:22,160
It just triggers a long-lasting operation
of a full table scan to

205
00:10:22,260 --> 00:10:26,680
check that all existing rows follow
this logic of the constraint.

206
00:10:26,820 --> 00:10:30,280
Michael: Which is the default behavior
when you add a new constraint.

207
00:10:30,780 --> 00:10:33,680
Nikolay: So if you don't think
about it and just try to create

208
00:10:33,680 --> 00:10:38,300
a check constraint for an existing
large table, it will block

209
00:10:38,480 --> 00:10:42,260
DDL and DML, everything basically,
to this table, and it's not

210
00:10:42,260 --> 00:10:42,760
fun.

211
00:10:43,080 --> 00:10:47,460
So what you should do, if you want
to do it without partial downtime,

212
00:10:48,640 --> 00:10:50,580
is to do it in an online fashion.

213
00:10:50,920 --> 00:10:54,320
You create, so 3 steps, not 2,
3 actually.

214
00:10:54,720 --> 00:10:56,700
My how-to yesterday was not full.

215
00:10:56,920 --> 00:11:00,900
So first you create it with the not
valid flag.

216
00:11:01,240 --> 00:11:06,920
Second, you understand that all
new writes are already being verified

217
00:11:07,500 --> 00:11:08,860
automatically by Postgres.

218
00:11:09,320 --> 00:11:11,780
You yourself take care of existing
rows.

219
00:11:12,380 --> 00:11:15,540
You check existing rows are okay
with this constraint.

220
00:11:15,940 --> 00:11:22,060
Just with simple selects and if
you find some rows that violate

221
00:11:22,080 --> 00:11:25,640
it, you probably want to delete
them or update them to adjust

222
00:11:25,640 --> 00:11:29,680
the values depending on your application
logic or on your logic.

223
00:11:30,240 --> 00:11:34,840
And then only then the third step,
alter table validate constraint,

224
00:11:34,980 --> 00:11:40,380
which will scan the whole table but
this step won't acquire locks

225
00:11:40,380 --> 00:11:42,540
that would block your DML.

226
00:11:43,940 --> 00:11:47,160
It will block only DDL, but hopefully,
you don't alter during

227
00:11:47,160 --> 00:11:47,660
this.

228
00:11:47,980 --> 00:11:49,620
You don't issue any DDLs.

229
00:11:49,900 --> 00:11:52,540
Michael: That middle step's nice,
and you can even do it in batches,

230
00:11:52,540 --> 00:11:54,660
I guess, if you want or need to.

231
00:11:54,700 --> 00:11:56,840
But I guess it shouldn't be a big
deal.

232
00:11:56,840 --> 00:11:57,540
Nikolay: Yeah, depending.

233
00:11:57,600 --> 00:12:01,960
But maybe scanning the whole table can also
be fine, because it's just select,

234
00:12:01,960 --> 00:12:02,460
right?

235
00:12:02,780 --> 00:12:06,300
Well, if you update, yes, in batches,
if you found many millions

236
00:12:06,300 --> 00:12:07,540
of rows that are violated.

237
00:12:08,500 --> 00:12:12,660
But this is, it depends, but this
three-step approach is very,

238
00:12:12,660 --> 00:12:17,180
like, universal, zero downtime approach
and it's great.

239
00:12:17,580 --> 00:12:20,320
Michael: Well, what's the downside
of jumping straight to step

240
00:12:20,320 --> 00:12:20,820
3?

241
00:12:20,820 --> 00:12:24,280
Because you're kind of doing that
if you think your data is fine.

242
00:12:24,280 --> 00:12:27,720
I guess is it in the real world
you're most likely to have some...

243
00:12:27,720 --> 00:12:31,400
Nikolay: Jump, if you want, you
just need to accept this risk

244
00:12:31,400 --> 00:12:32,540
and that's it, of course.

245
00:12:32,540 --> 00:12:36,880
If you are 100% sure, step 2 is
optional, let's say.

246
00:12:37,280 --> 00:12:42,480
But also, as usual, if you issue
an alter with the not valid, you also

247
00:12:42,480 --> 00:12:45,380
need to set lock_timeout and retry.

248
00:12:45,920 --> 00:12:50,260
Because you still need to change
the metadata of the table.

249
00:12:50,900 --> 00:12:54,560
And if, for example, Autovacuum
is running its transaction ID wraparound

250
00:12:54,560 --> 00:12:58,980
prevention mode, processing your
table, you won't be able to

251
00:12:58,980 --> 00:13:03,100
acquire a lock and without lock
timeout and retries logic,

252
00:13:03,240 --> 00:13:08,800
you will start waiting and again,
in this case, you will block

253
00:13:08,800 --> 00:13:11,540
everyone, even if it's not valid,
it's not good.

254
00:13:11,540 --> 00:13:16,220
So retries are needed and graceful
alter needed.

255
00:13:17,300 --> 00:13:21,460
I wish there was such an option like
graceful and you say like how

256
00:13:21,460 --> 00:13:24,860
many retries and how long timeout
for a particular operation should

257
00:13:24,860 --> 00:13:25,240
be.

258
00:13:25,240 --> 00:13:28,400
Michael: That would be a nice word,
like instead of concurrently

259
00:13:28,440 --> 00:13:29,440
it could be like gracefully.

260
00:13:30,920 --> 00:13:32,120
Alter table gracefully.

261
00:13:32,640 --> 00:13:34,200
Nikolay: Or concurrently, something
like that.

262
00:13:34,200 --> 00:13:38,540
Because I think 99% don't think
about it until they have many

263
00:13:38,540 --> 00:13:41,940
thousand TPS and then they realize
some basic operation.

264
00:13:42,580 --> 00:13:44,600
It was always working fine.

265
00:13:44,600 --> 00:13:49,020
Sometimes probably not, but people
like, you know, okay, we had

266
00:13:49,020 --> 00:13:52,920
an issue lasting 30 seconds, something
was not right, but it's

267
00:13:52,920 --> 00:13:53,420
okay.

268
00:13:53,520 --> 00:13:55,200
And then we don't understand why,
right?

269
00:13:55,200 --> 00:13:59,620
I mean, you blocked everyone dealing
with this table for 30 seconds,

270
00:13:59,620 --> 00:14:04,940
for example, But kind of fine,
and we live with it until it starts

271
00:14:05,280 --> 00:14:09,280
annoying you too much, and then
you realize that you need to

272
00:14:09,280 --> 00:14:10,740
lower lock timeout and retries.

273
00:14:11,280 --> 00:14:15,400
I mean, it requires effort,
unfortunately, to have this, right?

274
00:14:15,820 --> 00:14:20,140
And if you're small, you don't
care, but I wish it would be easier,

275
00:14:20,140 --> 00:14:23,140
like something similar to create
indexes concurrently or refresh

276
00:14:23,140 --> 00:14:24,220
materialized view concurrently.

277
00:14:24,960 --> 00:14:31,080
So also, when you validate, I think
if, for example, a DDL-like

278
00:14:31,320 --> 00:14:34,500
kind of create index concurrently
is happening or autovacuum

279
00:14:34,500 --> 00:14:37,020
processing, you won't be able to
acquire this lock, so you need

280
00:14:37,020 --> 00:14:38,000
also to be careful.

281
00:14:38,000 --> 00:14:42,800
But in general, if there you start
waiting, it's kind of fine.

282
00:14:42,800 --> 00:14:49,400
It just makes your operation longer,
but at least no DML transactions

283
00:14:50,660 --> 00:14:54,640
performing DML operations are behind
you in line, right?

284
00:14:54,960 --> 00:14:58,740
So I mean, this is also an issue
with this final step, but it's

285
00:14:58,740 --> 00:15:04,040
not so harmful as in the first
step when you need to instantly

286
00:15:04,040 --> 00:15:06,720
inject this constraint
with a not valid flag.

287
00:15:06,980 --> 00:15:08,800
Yeah, so I think we covered it,
right?

288
00:15:08,800 --> 00:15:10,740
So let's move on.

289
00:15:11,760 --> 00:15:13,520
Michael: The next one in the documentation
is

290
00:15:13,520 --> 00:15:13,650
Nikolay: not null.

291
00:15:13,650 --> 00:15:16,960
I think this should be hidden behind
some, I don't know, concurrently

292
00:15:17,020 --> 00:15:17,660
or gracefully.

293
00:15:18,080 --> 00:15:18,980
That would
Michael: be awesome.

294
00:15:20,800 --> 00:15:24,360
The next one in the documentation
is the not null constraint, which

295
00:15:24,360 --> 00:15:27,980
the documentation points out is
probably one of the ones people

296
00:15:27,980 --> 00:15:30,720
are most familiar with seeing in
schema definitions.

297
00:15:32,040 --> 00:15:35,280
But it's just a special case of
a check constraint, which I hadn't

298
00:15:35,280 --> 00:15:36,060
thought of before.

299
00:15:36,220 --> 00:15:36,580
Nikolay: Right.

300
00:15:36,580 --> 00:15:40,980
But unfortunately, you cannot,
well, you can already, but if

301
00:15:40,980 --> 00:15:44,680
you have check constraint NOT NULL,
saying this column is not

302
00:15:44,680 --> 00:15:48,580
null, like logically it's the same
as the standard NOT NULL.

303
00:15:49,620 --> 00:15:50,120
Same.

304
00:15:50,580 --> 00:15:53,580
But primary key needs the latter,
right?

305
00:15:54,020 --> 00:15:56,240
It cannot use...

306
00:15:57,380 --> 00:15:58,220
But it can.

307
00:15:58,480 --> 00:16:02,420
Since Postgres 12, if you don't
have NOT NULL constraint and

308
00:16:02,420 --> 00:16:06,300
you define primary key or redefine
it, it will try to implicitly

309
00:16:06,460 --> 00:16:07,700
create NOT NULL constraint.

310
00:16:07,800 --> 00:16:10,520
But since Postgres 12, when you
create NOT NULL constraint and

311
00:16:10,520 --> 00:16:13,040
you already have check, it's
NOT NULL.

312
00:16:13,280 --> 00:16:16,820
It will just reuse it, skipping
full table scan, which is very

313
00:16:16,820 --> 00:16:17,620
good optimization.

314
00:16:18,180 --> 00:16:22,400
So you just create check constraint
in this three-phase or two-phase

315
00:16:22,880 --> 00:16:24,760
approach, as we just discussed.

316
00:16:25,160 --> 00:16:29,040
And then you can rely on it when
you're creating primary key,

317
00:16:29,040 --> 00:16:29,720
for example.

318
00:16:30,140 --> 00:16:30,320
Michael: Or

319
00:16:30,320 --> 00:16:33,420
Nikolay: you can define NOT NULL
constraint explicitly if you

320
00:16:33,420 --> 00:16:36,560
need it for primary key or any
other, I don't know, maybe your

321
00:16:36,560 --> 00:16:39,120
application wants regular NOT NULL.

322
00:16:39,520 --> 00:16:41,820
Relying on existing check is NOT NULL.

323
00:16:41,820 --> 00:16:46,080
And then you can drop check and
NOT NULL is still there and you

324
00:16:46,080 --> 00:16:49,440
skipped this unfortunate full table
scan.

325
00:16:49,440 --> 00:16:53,240
While it's like I'm telling this
because NOT NOW, creation of

326
00:16:53,240 --> 00:16:58,000
NOT NOW itself, Postgres doesn't
support 3-step or 2-step approach.

327
00:16:58,380 --> 00:17:01,820
If you want to create NOT NOW right
away on existing table, existing

328
00:17:01,820 --> 00:17:04,340
column, it will need to scan whole
table.

329
00:17:04,540 --> 00:17:06,880
Michael: So we don't have NOT NULL,
NOT VALID.

330
00:17:07,060 --> 00:17:07,540
Yeah.

331
00:17:07,540 --> 00:17:11,420
Nikolay: So yeah, this is like
some nuances to keep in mind.

332
00:17:12,620 --> 00:17:18,140
Generally, my recommendation is
to think more about Check Constraints.

333
00:17:18,340 --> 00:17:20,140
This is why I say they are underappreciated.

334
00:17:20,640 --> 00:17:25,380
They are good and they, you see,
here they support NOT NULL constraint

335
00:17:25,380 --> 00:17:25,880
creation.

336
00:17:26,280 --> 00:17:27,740
Since Postgres 12, not before.

337
00:17:27,740 --> 00:17:30,340
But it means all currently supported
Postgres versions.

338
00:17:30,340 --> 00:17:31,040
Michael: Yeah, true.

339
00:17:31,400 --> 00:17:33,060
Nikolay: 12 is already the oldest.

340
00:17:35,140 --> 00:17:36,140
Michael: Let's move on.

341
00:17:36,780 --> 00:17:37,760
Nikolay: Yeah, unique constraint.

342
00:17:38,180 --> 00:17:40,020
Unique constraint, this is interesting.

343
00:17:40,240 --> 00:17:43,880
I don't know how much detail we
should cover here.

344
00:17:45,100 --> 00:17:50,240
Before our recording, we discussed
the case I discovered in 2017

345
00:17:50,820 --> 00:17:52,540
and still saw in Postgres 16.

346
00:17:52,540 --> 00:17:56,420
So, unique constraint, physically
it relies on unique index,

347
00:17:56,840 --> 00:17:59,020
but it's a kind of implementation
detail.

348
00:17:59,020 --> 00:18:01,560
You can say I want a unique constraint
and Postgres will create

349
00:18:01,560 --> 00:18:02,700
unique index implicitly.

350
00:18:04,000 --> 00:18:07,000
And it's good that it can be done
concurrently, of course, right?

351
00:18:07,000 --> 00:18:10,300
Because indexes can be created
concurrently, which is good.

352
00:18:10,580 --> 00:18:11,700
That's it, basically.

353
00:18:11,800 --> 00:18:16,220
You create a constraint, but I think
you can say using, right?

354
00:18:17,040 --> 00:18:20,080
Create unique constraint using
some index if the index already

355
00:18:20,080 --> 00:18:20,580
exists.

356
00:18:22,120 --> 00:18:22,840
Or what?

357
00:18:24,340 --> 00:18:26,200
Yeah, I don't remember in detail.

358
00:18:26,200 --> 00:18:31,600
But what I do know is that although
unique constraint relies

359
00:18:31,800 --> 00:18:36,720
on an index, unique index, it's not
absolutely the same.

360
00:18:38,000 --> 00:18:41,400
Logically, again, it should be
kind of the same, but you can

361
00:18:41,400 --> 00:18:44,080
have an index without a constraint,
not vice versa.

362
00:18:44,340 --> 00:18:47,460
You cannot have a unique constraint
without a unique index because

363
00:18:47,480 --> 00:18:51,140
Postgres needs a unique index to
support validation checks.

364
00:18:52,740 --> 00:18:57,180
So imagine we created a unique index,
but we haven't created a unique

365
00:18:57,180 --> 00:18:57,680
constraint.

366
00:18:58,400 --> 00:19:05,520
One place where you can see a constraint
is needed is insert on conflict,

367
00:19:05,640 --> 00:19:06,140
right?

368
00:19:06,760 --> 00:19:08,420
If I'm not mistaken, right?

369
00:19:08,420 --> 00:19:09,480
Michael: Yeah, I think so.

370
00:19:09,800 --> 00:19:10,120
Nikolay: Yeah.

371
00:19:10,120 --> 00:19:13,180
So on conflict requires a constraint
to be present.

372
00:19:13,180 --> 00:19:18,000
And if you say you have a conflict,
like for example, do nothing,

373
00:19:18,340 --> 00:19:23,720
you cannot, if you have an index without
a constraint, you cannot

374
00:19:23,720 --> 00:19:28,180
say insert blah blah on conflict
on this constraint because the constraint

375
00:19:28,180 --> 00:19:30,600
does not exist, it will tell you
explicitly the constraint does not

376
00:19:30,600 --> 00:19:31,100
exist.

377
00:19:31,560 --> 00:19:35,120
But at the same time, Postgres
has an interesting, I think it's

378
00:19:35,120 --> 00:19:40,280
a logical bug, still not fixed, and
I reported it in 2017, and

379
00:19:40,280 --> 00:19:42,840
today I checked in Postgres 16,
it has it still.

380
00:19:42,880 --> 00:19:46,500
If you try to insert multiple rows,
it will explicitly say

381
00:19:46,500 --> 00:19:50,420
that the constraint, and it will use
the index name, is violated.

382
00:19:51,100 --> 00:19:54,280
So in one case it says there is no
such constraint, and in another

383
00:19:54,280 --> 00:19:57,780
case it says this constraint exactly
with the same name is violated.

384
00:19:58,140 --> 00:19:59,560
Okay, so inconsistency.

385
00:20:00,060 --> 00:20:03,380
I think it's just a bug that needs
to be fixed and that's it.

386
00:20:03,480 --> 00:20:04,060
Michael: That's funny.

387
00:20:04,060 --> 00:20:06,240
I'll link up the bug report as
well.

388
00:20:06,240 --> 00:20:09,140
Nikolay: Yeah, but honestly, from
a user perspective, I think

389
00:20:09,140 --> 00:20:12,520
it would be good to stop thinking
about unique constraints and

390
00:20:12,520 --> 00:20:15,240
unique indexes as something very
different.

391
00:20:15,480 --> 00:20:20,140
I cannot imagine the case when
we, like, they should go together,

392
00:20:20,140 --> 00:20:20,780
I think.

393
00:20:20,820 --> 00:20:24,160
I cannot imagine, like, we have
a unique index but why don't we

394
00:20:24,160 --> 00:20:25,820
have a unique constraint in this
case?

395
00:20:26,320 --> 00:20:30,740
I think they should go together
all the time. That's it.

396
00:20:31,000 --> 00:20:31,900
Michael: It makes sense.

397
00:20:32,640 --> 00:20:36,360
Nikolay: In this case, there would
not be any inconsistencies

398
00:20:36,760 --> 00:20:37,580
if the constraint...

399
00:20:38,360 --> 00:20:41,500
When I create a unique constraint,
a unique index is created implicitly.

400
00:20:41,540 --> 00:20:43,380
Okay, but why not vice versa?

401
00:20:43,380 --> 00:20:46,030
When I create a unique index, why 
Postgres doesn't create a unique

402
00:20:46,030 --> 00:20:46,800
constraint?

403
00:20:47,040 --> 00:20:48,340
I have no answer for this.

404
00:20:48,340 --> 00:20:51,820
Michael: You can't create a unique 
constraint not valid as well,

405
00:20:51,820 --> 00:20:52,360
can you?

406
00:20:52,360 --> 00:20:53,820
So there's no difference there.

407
00:20:53,860 --> 00:20:54,940
Nikolay: I don't think so.

408
00:20:55,580 --> 00:20:59,540
Michael: The one thing they have 
at, like a change in recent versions.

409
00:21:01,320 --> 00:21:02,920
Nikolay: As always, maybe I'm wrong.

410
00:21:04,160 --> 00:21:05,780
I think I'm not wrong here.

411
00:21:05,800 --> 00:21:06,740
Michael: I don't think so.

412
00:21:06,740 --> 00:21:09,380
The one thing that has changed with 
these in the last couple of

413
00:21:09,380 --> 00:21:15,640
years is in Postgres 15, we got 
this nulls not distinct option.

414
00:21:16,340 --> 00:21:19,120
Which I still don't, I'd love to 
hear from people that have good

415
00:21:19,120 --> 00:21:23,600
use cases for these, but it allows 
you to specify that you can

416
00:21:23,600 --> 00:21:27,180
only allow a single null value 
rather than multiple null values.

417
00:21:27,400 --> 00:21:32,280
Nikolay: You know, null, it's the 
biggest problem in SQL model

418
00:21:32,280 --> 00:21:32,920
now, right?

419
00:21:32,920 --> 00:21:34,040
We discussed it.

420
00:21:34,220 --> 00:21:36,020
Michael: We have a whole episode 
on it.

421
00:21:36,020 --> 00:21:36,520
Nikolay: Right.

422
00:21:36,740 --> 00:21:41,980
But why I think people might want 
null as like, according to

423
00:21:41,980 --> 00:21:47,640
this, how is it called, ternary 
logic, so three-value logic, true,

424
00:21:47,640 --> 00:21:49,060
false, unknown, right?

425
00:21:49,180 --> 00:21:53,360
According to this logic, null means 
unknown, and comparing one

426
00:21:53,360 --> 00:21:55,600
unknown to another unknown, you 
cannot conclude they are the

427
00:21:55,600 --> 00:21:56,000
same.

428
00:21:56,000 --> 00:21:58,160
You always say they are not the 
same.

429
00:21:58,940 --> 00:22:03,120
So the comparison should always 
yield to unknown, so basically

430
00:22:03,260 --> 00:22:04,340
to another null.

431
00:22:04,660 --> 00:22:07,020
Mixing nulls and unknowns is another 
topic.

432
00:22:07,500 --> 00:22:11,120
This means that a unique index, unlike 
a primary key, of course,

433
00:22:11,120 --> 00:22:14,760
a unique constraint, a unique key, 
let's also introduce the term

434
00:22:14,760 --> 00:22:17,700
unique key because the SQL standard 
doesn't follow this term.

435
00:22:17,880 --> 00:22:21,680
Unique key, unlike primary key, 
allows nulls in the column or

436
00:22:21,680 --> 00:22:25,020
in multiple columns if it's a multi-column 
index or constraint.

437
00:22:25,840 --> 00:22:30,420
But since we don't know if they 
are the same or not, we can allow

438
00:22:30,420 --> 00:22:31,360
multiple nulls.

439
00:22:32,040 --> 00:22:40,140
But historically, exactly because 
of the problems with manipulation

440
00:22:40,200 --> 00:22:42,080
of large tables and so on.

441
00:22:42,080 --> 00:22:46,840
For example, before Postgres 11, 
we had, if we, for example,

442
00:22:46,840 --> 00:22:51,500
add a new column and we want a default, 
it's like a full table rewrite,

443
00:22:51,500 --> 00:22:52,360
we cannot do it.

444
00:22:52,360 --> 00:22:54,660
It was fixed in Postgres 11, not 
fixed.

445
00:22:55,200 --> 00:22:58,640
A great feature that you can define 
like a virtual default, right?

446
00:22:59,060 --> 00:23:05,020
But we say a default, we cannot say 
a default, I don't want a full

447
00:23:05,020 --> 00:23:09,160
table rewrite, I don't want, so I say, 
okay, I will consider null as

448
00:23:09,160 --> 00:23:09,660
false.

449
00:23:10,760 --> 00:23:11,900
Just in my application.
```

450
00:23:13,260 --> 00:23:13,860
All right?

451
00:23:13,860 --> 00:23:18,640
So, before Postgres 11, null would
be my false, and true would

452
00:23:18,640 --> 00:23:19,460
be my true.

453
00:23:20,320 --> 00:23:25,180
In this case, I'm breaking theoretical
concepts here.

454
00:23:25,320 --> 00:23:26,980
Null should not be considered as
false.

455
00:23:26,980 --> 00:23:27,660
It's not right.

456
00:23:27,660 --> 00:23:31,980
But just I don't want this operation
to be such a nightmare.

457
00:23:32,780 --> 00:23:34,040
I have a billion rows.

458
00:23:34,900 --> 00:23:36,300
Now it will be my false.

459
00:23:36,460 --> 00:23:42,140
And this leads me to the idea I
want to be like 1 value in index.

460
00:23:42,780 --> 00:23:43,580
That's why.

461
00:23:44,060 --> 00:23:49,740
So to avoid the long heavy operations,
I give null a special

462
00:23:49,740 --> 00:23:52,900
meaning, not as it was supposed
to have.

463
00:23:53,400 --> 00:23:55,780
This is practice, it's not theory,
right?

464
00:23:56,440 --> 00:24:01,900
This is usual, like in many industries,
theory was very good,

465
00:24:02,220 --> 00:24:05,940
we developed great concepts, and
then you go to construction,

466
00:24:05,940 --> 00:24:07,700
for example, oh, this is how it's
used.

467
00:24:07,700 --> 00:24:08,560
We didn't expect.

468
00:24:08,560 --> 00:24:10,140
This is how nulls are used.

469
00:24:10,520 --> 00:24:14,700
Some people use nulls given special
meaning, and they consider

470
00:24:14,700 --> 00:24:16,020
it as a normal value.

471
00:24:16,240 --> 00:24:22,440
In this case, they might want an index
or constraint to say there

472
00:24:22,440 --> 00:24:24,400
should be only 1 row if null.

473
00:24:24,780 --> 00:24:27,780
This is my maybe there are many
other understandings but this

474
00:24:27,780 --> 00:24:30,360
is what I have from my practice.

475
00:24:31,160 --> 00:24:31,500
Michael: Cool.

476
00:24:31,500 --> 00:24:34,200
In fact, you mentioned in passing
there one other important thing

477
00:24:34,200 --> 00:24:37,080
about unique constraints is that
they can be defined across multiple

478
00:24:37,080 --> 00:24:39,160
columns in the table as well.

479
00:24:39,440 --> 00:24:43,000
It can be single columns; it's really
common, but you can do it across

480
00:24:43,000 --> 00:24:44,080
multiple as well.

481
00:24:44,340 --> 00:24:44,752
Nikolay: Okay.

482
00:24:44,752 --> 00:24:46,400
Should we move on?

483
00:24:46,400 --> 00:24:46,880
Primary key

484
00:24:46,880 --> 00:24:48,500
Michael: being a special case again.

485
00:24:49,120 --> 00:24:50,340
Nikolay: Primary key, okay.

486
00:24:50,860 --> 00:24:53,280
So not null plus unique constraint
basically.

487
00:24:54,180 --> 00:24:57,980
This pair gives you a primary key
but there can be only one primary

488
00:24:57,980 --> 00:24:58,480
key.

489
00:24:59,060 --> 00:25:00,680
So I think we already covered.

490
00:25:01,220 --> 00:25:02,360
You need not-null.

491
00:25:02,420 --> 00:25:05,640
For not-null, you probably need
implicitly or explicitly.

492
00:25:05,660 --> 00:25:09,140
You can do it yourself or just
rely on it when the primary key is

493
00:25:09,140 --> 00:25:09,640
redefined.

494
00:25:10,440 --> 00:25:12,080
For an existing large table, right?

495
00:25:12,340 --> 00:25:14,020
For small tables, no problem.

496
00:25:14,240 --> 00:25:17,900
Also, by the way, just this morning
we had a discussion, for

497
00:25:17,900 --> 00:25:21,580
example, creating this concurrently,
should we use it for new

498
00:25:21,580 --> 00:25:22,080
tables?

499
00:25:22,280 --> 00:25:23,420
In my opinion, no.

500
00:25:23,680 --> 00:25:27,000
Because creating this concurrently,
or these multi-step operations,

501
00:25:27,100 --> 00:25:32,180
in this case you lose the good,
beautiful property Postgres has,

502
00:25:32,180 --> 00:25:35,760
transactional DDL and ability to
pack everything into a single

503
00:25:35,760 --> 00:25:36,260
transaction.

504
00:25:36,340 --> 00:25:40,520
If you just define the table, follow
normal approach, don't care

505
00:25:40,520 --> 00:25:42,660
about this 0 downtime stuff.

506
00:25:42,840 --> 00:25:44,740
And you will have single transaction,
right?

507
00:25:45,060 --> 00:25:45,760
All or nothing.

508
00:25:46,080 --> 00:25:47,460
Michael: Same for tiny tables.

509
00:25:47,780 --> 00:25:48,840
Anything will work.

510
00:25:49,000 --> 00:25:52,120
Nikolay: Yeah, maybe like less
than 10,000 rows you don't care

511
00:25:52,120 --> 00:25:52,620
about.

512
00:25:52,660 --> 00:25:54,300
It takes like 100 milliseconds.

513
00:25:55,080 --> 00:25:55,680
Let's go.

514
00:25:55,680 --> 00:25:58,480
You have single step, it's atomic,
great.

515
00:25:59,380 --> 00:26:03,340
But if you have large tables, you
need to redefine primary key

516
00:26:03,340 --> 00:26:07,540
and int4 to int8, for
example.

517
00:26:07,540 --> 00:26:09,020
Michael: That's the big 1, yeah.

518
00:26:09,380 --> 00:26:12,540
Nikolay: Yeah, my team and I implemented
all types of this

519
00:26:12,540 --> 00:26:16,140
operation and we helped multiple
companies, a few billion dollar,

520
00:26:16,240 --> 00:26:19,920
multi-billion dollar companies,
public companies, we helped them

521
00:26:19,920 --> 00:26:22,180
to convert int4 to int8.

522
00:26:22,300 --> 00:26:25,620
I know a lot of interesting stuff
around it, but in general,

523
00:26:25,680 --> 00:26:30,280
you just need like not null, and
we discussed how there's also

524
00:26:30,280 --> 00:26:33,820
tricks if you post-guess 11 trick,
like default minus 1, not

525
00:26:33,820 --> 00:26:37,700
null, you can define right away,
virtually, right?

526
00:26:37,920 --> 00:26:42,280
You don't need even check constraint
but since Postgres 12 we

527
00:26:42,280 --> 00:26:46,580
rely on check constraint but we
remember primary key needs actual

528
00:26:46,580 --> 00:26:51,880
not null And we also create unique
index and when we create primary

529
00:26:51,880 --> 00:26:53,860
key, we say, using this index.

530
00:26:54,520 --> 00:26:59,440
This allows us to put primary key
creation as the final step into

531
00:26:59,440 --> 00:27:02,820
a transaction, which will probably
swap something, right?

532
00:27:03,080 --> 00:27:03,400
Yeah.

533
00:27:03,400 --> 00:27:04,900
Rename columns as well.

534
00:27:05,140 --> 00:27:07,940
And there, of course, you also need
to think about lock acquisition,

535
00:27:08,100 --> 00:27:10,760
retries, low lock timeout, like
all this stuff.

536
00:27:10,760 --> 00:27:15,600
And it's, of course, if you have
many, many, many, I don't know,

537
00:27:15,600 --> 00:27:19,940
like gigabytes, dozens, hundreds,
maybe terabytes, and a lot

538
00:27:19,940 --> 00:27:20,580
of TPS.

539
00:27:20,640 --> 00:27:23,000
You need to engineer this carefully.

540
00:27:23,000 --> 00:27:23,600
I mean,

541
00:27:23,900 --> 00:27:26,320
Michael: this is probably the most
involved of all the things

542
00:27:26,320 --> 00:27:27,120
we're talking about.

543
00:27:27,120 --> 00:27:29,340
And I think it's probably too big
to cover today.

544
00:27:29,340 --> 00:27:32,700
But there is a really good talk
by Robert Treat that I saw that

545
00:27:32,700 --> 00:27:35,820
covers this in about half an hour
in depth for people that actually

546
00:27:35,820 --> 00:27:37,120
have to do this kind of thing.

547
00:27:37,120 --> 00:27:39,800
Nikolay: I'm sure it's not possible
to cover everything in half

548
00:27:39,800 --> 00:27:41,760
an hour because there are several
methods.

549
00:27:41,880 --> 00:27:44,860
They have pros and cons and there
are many many nuances, for example

550
00:27:44,860 --> 00:27:45,480
Foreign keys.

551
00:27:45,480 --> 00:27:49,140
If you redefine primary key, you
need to deal with foreign key

552
00:27:49,160 --> 00:27:49,660
redefinition.

553
00:27:50,380 --> 00:27:51,680
And it's also interesting.

554
00:27:51,760 --> 00:27:54,760
And autovacuum and running transaction
**transaction ID wraparound prevention mode** can

555
00:27:54,760 --> 00:27:55,660
block you.

556
00:27:55,760 --> 00:28:00,480
And also if you decide to mark
foreign key as not valid and then

557
00:28:00,480 --> 00:28:02,300
you realize it blocks writes.

558
00:28:03,460 --> 00:28:08,760
Or you just forgot to drop old
foreign key and new rows after

559
00:28:08,760 --> 00:28:09,260
switch.

560
00:28:10,640 --> 00:28:12,460
So a lot of mistakes.

561
00:28:12,800 --> 00:28:16,360
Michael: Yeah, please use bigints
or **int8** in your

562
00:28:16,360 --> 00:28:17,120
new tables.

563
00:28:17,220 --> 00:28:17,920
Right away.

564
00:28:18,580 --> 00:28:20,840
Nikolay: Or UUID version 7, 8.

565
00:28:21,580 --> 00:28:22,080
Yeah,

566
00:28:22,360 --> 00:28:23,760
Michael: Or UUIDs, yeah.

567
00:28:24,520 --> 00:28:25,020
Cool.

568
00:28:25,320 --> 00:28:26,140
Foreign keys?

569
00:28:26,320 --> 00:28:30,740
Or actually, one last thing, is it
worth discussing, like, multiple,

570
00:28:30,740 --> 00:28:32,500
you can have multiple column primary
keys?

571
00:28:32,500 --> 00:28:35,680
I guess that's obvious from the
multiple column unique ones as

572
00:28:35,680 --> 00:28:36,180
well.

573
00:28:36,420 --> 00:28:37,860
But yeah, foreign keys.

574
00:28:38,240 --> 00:28:40,440
Nikolay: I don't think it's something
somewhat different.

575
00:28:40,440 --> 00:28:44,620
Yeah, so just you need to have
not null on each column participating

576
00:28:45,140 --> 00:28:46,360
in primary key definition.

577
00:28:46,820 --> 00:28:47,600
That's it.

578
00:28:48,060 --> 00:28:49,460
Unlike unique keys, of course.

579
00:28:49,540 --> 00:28:52,620
So foreign keys involve two tables.

580
00:28:53,000 --> 00:28:58,080
And creation of foreign key requires
several locks to be acquired

581
00:28:58,080 --> 00:28:58,980
on both sides.

582
00:29:00,600 --> 00:29:05,380
Fortunately, and full table scan
of both tables to ensure that

583
00:29:05,380 --> 00:29:09,600
values in referencing table have
values that are present in the

584
00:29:09,600 --> 00:29:10,980
referenced table.

585
00:29:12,340 --> 00:29:16,860
So in this case, if you just don't
care and like brute force

586
00:29:16,860 --> 00:29:19,800
approach, like defining documentation,
just create it, that's

587
00:29:19,800 --> 00:29:20,220
it.

588
00:29:20,220 --> 00:29:23,500
Well, you have an issue because
you will you are going to block

589
00:29:24,120 --> 00:29:30,260
probably not, I think lock level
there is not so bad than in

590
00:29:30,460 --> 00:29:33,460
previous cases we discussed check
constraints but you're going

591
00:29:33,460 --> 00:29:35,100
to block DDL for sure.

592
00:29:36,060 --> 00:29:38,600
Probably DML won't be blocked or
will be blocked.

593
00:29:38,600 --> 00:29:40,740
Yeah, probably it will be blocked
as well.

594
00:29:40,760 --> 00:29:44,480
I don't remember details here,
although I wrote it a few hours

595
00:29:44,480 --> 00:29:46,320
ago.
Check out my how-to.

596
00:29:46,980 --> 00:29:50,900
I specified all the locks and what
you're going to block.

597
00:29:50,900 --> 00:29:53,320
But in general, you should care
about it as well.

598
00:29:53,320 --> 00:29:57,380
And generally, under load, you
shouldn't want to create foreign

599
00:29:57,380 --> 00:29:58,660
key in one step.

600
00:29:58,660 --> 00:30:00,180
You need, again, three steps.

601
00:30:00,580 --> 00:30:04,580
First, creation with not valid,
with retries and low timeout.

602
00:30:05,540 --> 00:30:12,340
Then you need to check, again,
like with checks, Postgres will

603
00:30:12,340 --> 00:30:19,220
start checking new writes that
inserted and updated rows, or

604
00:30:19,220 --> 00:30:21,360
deleted in this case as well, because
if you...

605
00:30:21,380 --> 00:30:22,570
Michael: Very important, yeah.

606
00:30:22,570 --> 00:30:23,740
Nikolay: Yeah, yeah, yeah.

607
00:30:23,740 --> 00:30:25,640
And you need an index, but it's
another story.

608
00:30:26,120 --> 00:30:30,440
So, it will start checking to validate
this constraint for new

609
00:30:30,440 --> 00:30:33,840
writes, but for existing rows,
we're still not sure.

610
00:30:33,840 --> 00:30:39,520
So optional step two is to validate
and fix if you see problems,

611
00:30:40,760 --> 00:30:42,780
potential violation, right?

612
00:30:43,260 --> 00:30:46,920
And then third step, you say alter
validate constraint.

613
00:30:47,520 --> 00:30:52,100
Again, understanding that ongoing
index creation or recreation

614
00:30:52,120 --> 00:30:54,180
or vacuum can block you.

615
00:30:55,120 --> 00:30:57,320
So yeah, also three steps.

616
00:30:57,740 --> 00:31:03,680
I explained in detail in my latest
marathon post.

617
00:31:04,440 --> 00:31:06,040
So what else to say here?

618
00:31:07,020 --> 00:31:10,780
Foreign keys can be marked as,
how's it called?

619
00:31:11,040 --> 00:31:12,180
Deferred, right?

620
00:31:12,660 --> 00:31:14,980
Am I like deferred constraints,
right?

621
00:31:15,560 --> 00:31:21,660
So when you have a complex transaction,
you might want foreign

622
00:31:21,660 --> 00:31:26,000
keys to be checked at commit time
later, not at each statement

623
00:31:26,000 --> 00:31:26,780
time, later.

624
00:31:27,740 --> 00:31:31,940
And in this case, I don't remember
again details.

625
00:31:32,860 --> 00:31:37,880
Several years ago we had an issue,
we worked with Miro and had

626
00:31:37,880 --> 00:31:44,020
issue with using PgBouncer to fight
bloat and they used deferred

627
00:31:44,020 --> 00:31:44,520
constraints.

628
00:31:44,640 --> 00:31:48,720
So there is an article about it,
how to use PgBouncer if you have

629
00:31:48,820 --> 00:31:50,040
deferred constraints.

630
00:31:50,440 --> 00:31:53,960
And it explains a lot of details
I will provide, Nick.

631
00:31:54,340 --> 00:31:58,440
So Miro Engineering wrote it a
few years ago.

632
00:31:58,440 --> 00:31:59,280
It was interesting.

633
00:31:59,540 --> 00:32:01,180
Also issued to address.

634
00:32:02,300 --> 00:32:02,840
What else?

635
00:32:02,840 --> 00:32:03,560
I think that's it.

636
00:32:03,560 --> 00:32:05,920
Michael: I think on delete cascade
is worth mentioning.

637
00:32:05,920 --> 00:32:10,760
Like you can define at the point
of constraint creation, what

638
00:32:10,760 --> 00:32:16,260
you want to happen if, like if
you have a, a really common example

639
00:32:16,260 --> 00:32:19,460
is a table of blogs and a table
of blog posts.

640
00:32:19,960 --> 00:32:23,200
If you delete the blog, do you
want the blog posts to be deleted?

641
00:32:23,200 --> 00:32:25,780
Like what do you want to happen
in those cases?

642
00:32:25,840 --> 00:32:28,280
And that's the case where I think
it's really important to mention

643
00:32:28,280 --> 00:32:32,180
that whilst with when we define
a primary key, we get an index.

644
00:32:33,080 --> 00:32:36,220
Foreign keys defined on one table,
their referencing column is

645
00:32:36,220 --> 00:32:38,940
not necessarily, we don't check
that it's indexed as well.

646
00:32:38,940 --> 00:32:42,140
In fact, I've even read a blog
post saying you shouldn't always

647
00:32:42,260 --> 00:32:43,140
index that column.

648
00:32:43,140 --> 00:32:46,640
But I think the cases that you
should far outweigh the cases

649
00:32:46,640 --> 00:32:47,460
where you shouldn't.

650
00:32:47,640 --> 00:32:51,220
It is worth checking you do have
indexes on those, so that

651
00:32:51,220 --> 00:32:53,980
those deletes on cascade are efficient.

652
00:32:54,440 --> 00:32:54,940
Nikolay: Right.

653
00:32:55,080 --> 00:32:58,940
I always try to avoid using this
option, but I see people in

654
00:32:58,940 --> 00:33:03,180
quite large databases under a large
load use this deletion propagation

655
00:33:03,280 --> 00:33:05,040
or update propagation logic.

656
00:33:06,040 --> 00:33:08,900
So 2 things here, you're right
about indexes.

657
00:33:09,920 --> 00:33:14,280
And Postgres DBA toolkit I have
and postgres-checkup tool, they

658
00:33:14,280 --> 00:33:20,440
have reports to find, like, you
have a foreign key, there is an index,

659
00:33:20,440 --> 00:33:23,040
primary key on one side, you don't
have an index on the other side,

660
00:33:23,040 --> 00:33:27,660
so when you will need to delete,
you probably won't need to delete

661
00:33:27,660 --> 00:33:33,160
every day, but at some point if
you have to, it will be a sequential

662
00:33:33,160 --> 00:33:33,660
scan.

663
00:33:33,680 --> 00:33:34,940
Very bad, very slow.

664
00:33:35,080 --> 00:33:36,420
But this is only one thing.

665
00:33:36,420 --> 00:33:40,280
What if you're deleting a blog
which has a million posts?

666
00:33:41,200 --> 00:33:42,120
I don't like it.

667
00:33:42,120 --> 00:33:44,160
I don't like this propagation.

668
00:33:45,360 --> 00:33:51,540
If we know that only low volumes,
low number of rows will be

669
00:33:51,660 --> 00:33:52,860
automatically deleted.

670
00:33:53,240 --> 00:33:53,980
It's okay.

671
00:33:54,020 --> 00:33:57,280
But if it's an unpredictable number
of rows, I would prefer having

672
00:33:57,280 --> 00:34:00,780
my own logic with asynchronous
propagation of change, with some

673
00:34:00,780 --> 00:34:02,060
batches and so on.

674
00:34:02,620 --> 00:34:03,300
Michael: Makes sense.

675
00:34:04,400 --> 00:34:07,860
Nikolay: But I see people use it
on a quite large scale.

676
00:34:08,400 --> 00:34:09,940
Michael: Makes a lot of sense.

677
00:34:10,760 --> 00:34:13,160
Cool, I think we're down to the
last one.

678
00:34:13,320 --> 00:34:15,120
We've made it to exclusion constraints.

679
00:34:15,700 --> 00:34:19,400
Nikolay: Right, I always confuse
exclusion constraints with constraint

680
00:34:19,400 --> 00:34:22,680
exclusion, which was related to
partitioning.

681
00:34:23,600 --> 00:34:27,840
But exclusion constraints are for
time ranges, like intervals,

682
00:34:27,900 --> 00:34:31,680
and you want to define, you want
to say my intervals should not

683
00:34:31,680 --> 00:34:32,180
overlap.

684
00:34:32,320 --> 00:34:35,980
So it's kind of advanced uniqueness,
right?

685
00:34:35,980 --> 00:34:39,640
Maybe not uniqueness, but it's
like for special data, for GIST

686
00:34:39,640 --> 00:34:41,480
SP-GiST indexes and so on.

687
00:34:41,480 --> 00:34:44,820
So you define, I say, I'm building
some schedule.

688
00:34:44,960 --> 00:34:51,540
I cannot allow overlapping, or
I'm like describing some, I don't

689
00:34:51,540 --> 00:34:55,640
know, like 3D world and I have
various shapes and I don't want

690
00:34:55,640 --> 00:34:56,820
them to overlap.

691
00:34:57,240 --> 00:35:02,800
In this case, GIST and its variation,
SP-GiST, will support this constraint

692
00:35:02,960 --> 00:35:07,960
and ensure that no, like, no balls,
for example, or no cubes

693
00:35:07,960 --> 00:35:10,660
or something overlap in that space.

694
00:35:11,000 --> 00:35:12,880
So, Postgres supports this kind
of thing.

695
00:35:12,880 --> 00:35:14,920
Which is good, but it's quite a narrow
use case for me.

696
00:35:14,920 --> 00:35:16,500
I don't see it often.

697
00:35:17,320 --> 00:35:18,340
Michael: Yeah, very cool.

698
00:35:18,340 --> 00:35:21,600
I've heard, I think I've seen it
being used in examples where

699
00:35:21,600 --> 00:35:25,060
People are designing room booking
apps or things like that.

700
00:35:25,260 --> 00:35:26,920
But no, not used it myself.

701
00:35:27,260 --> 00:35:30,860
In fact, it's a tiny little documentation
entry, isn't it?

702
00:35:30,860 --> 00:35:31,360
So it's

703
00:35:31,360 --> 00:35:32,460
Nikolay: time and space.

704
00:35:32,460 --> 00:35:32,700
Yes.

705
00:35:32,700 --> 00:35:34,360
But yeah, the combination is very
brief.

706
00:35:34,360 --> 00:35:38,500
And here, I honestly don't have
anything to say.

707
00:35:39,780 --> 00:35:42,720
It's supported by some index, I
guess, right?

708
00:35:42,720 --> 00:35:48,160
So an index you create with concurrently,
usually, if you want

709
00:35:48,160 --> 00:35:49,400
0 downtime approach.

710
00:35:49,740 --> 00:35:53,600
I'm not sure if exclusion constraint
can be created based on

711
00:35:53,600 --> 00:35:54,720
an existing index.

712
00:35:54,720 --> 00:35:55,760
It should be possible.

713
00:35:55,760 --> 00:35:56,820
I have no idea.

714
00:35:57,560 --> 00:35:59,720
Michael: It says, yeah, good point.

715
00:36:00,060 --> 00:36:00,960
I don't know.

716
00:36:01,020 --> 00:36:02,720
Let us know in the comments.

717
00:36:04,060 --> 00:36:08,040
Nikolay: Yeah, but in general,
I think we covered pretty well

718
00:36:08,040 --> 00:36:11,160
how 6 types of constraints, at
least 5 types of constraints we

719
00:36:11,160 --> 00:36:15,300
covered, and how to create all
of them without downtime.

720
00:36:15,520 --> 00:36:21,600
Just for this final part, let's
think how to drop them.

721
00:36:22,280 --> 00:36:25,380
Well, I think dropping is straightforward
usually.

722
00:36:25,760 --> 00:36:26,260
Yeah.

723
00:36:27,040 --> 00:36:33,080
If it's like unique constraint,
which is based on an index, you

724
00:36:33,080 --> 00:36:37,620
probably just drop an index with
concurrently, right?

725
00:36:37,660 --> 00:36:38,160
Yeah.

726
00:36:38,320 --> 00:36:43,140
Others, you just drop them, but
you need a low lock timeout and

727
00:36:43,140 --> 00:36:48,060
retries to be involved, because
Postgres needs to change table

728
00:36:48,060 --> 00:36:51,680
metadata, and during this, if it's
blocked, it's not good.

729
00:36:51,740 --> 00:36:56,140
So again, low lock timeout and retries
are going to help here.

730
00:36:57,040 --> 00:36:58,000
Michael: What about altering?

731
00:36:58,020 --> 00:37:01,380
Like I saw a good blog post by
where people took some one of the

732
00:37:01,380 --> 00:37:06,060
use cases for check constraints
is, for example, we often say

733
00:37:06,060 --> 00:37:07,580
use text in Postgres.

734
00:37:08,120 --> 00:37:13,100
It's in a lot of recommendations
to use text and not be kind

735
00:37:13,100 --> 00:37:14,700
of constrained on how long.

736
00:37:15,420 --> 00:37:16,360
Nikolay: Not varchar, not char.

737
00:37:16,430 --> 00:37:17,500
Michael: Exactly, yeah.

738
00:37:18,480 --> 00:37:21,820
Nikolay: This, strictly speaking,
is not a constraint, right?

739
00:37:21,820 --> 00:37:23,960
But in a broader sense, it's a constraint.

740
00:37:24,140 --> 00:37:24,960
But it's- Oh, but

741
00:37:24,960 --> 00:37:26,740
Michael: you could implement it
with a constraint.

742
00:37:28,080 --> 00:37:28,840
Nikolay: Oh, exactly.

743
00:37:28,940 --> 00:37:32,600
Yeah, and I like it because you
control exactly how and when

744
00:37:32,600 --> 00:37:33,620
it will be verified.

745
00:37:33,920 --> 00:37:38,960
And you know, again, this is exactly
why, again, check constraints

746
00:37:39,720 --> 00:37:40,460
are underappreciated.

747
00:37:42,100 --> 00:37:46,040
And recently, I indeed prefer using
just text without limits,

748
00:37:46,560 --> 00:37:48,460
not varchar(n), but just text.

749
00:37:48,620 --> 00:37:52,120
And then if I need to limit, I
use check constraints, I know how

750
00:37:52,120 --> 00:37:54,340
to set it up without downtime.

751
00:37:54,560 --> 00:37:57,940
And I know how to change it if
I need to increase or decrease

752
00:37:57,960 --> 00:38:00,200
this limit, I know how to do it,
right?

753
00:38:00,240 --> 00:38:01,560
And It's good.

754
00:38:01,560 --> 00:38:03,800
Michael: But yeah, I think we only
lose 1.

755
00:38:03,800 --> 00:38:06,360
I think we only lose 1 benefit
or like 1 optimization.

756
00:38:06,880 --> 00:38:09,640
I'm not sure if there are if there
is this optimization for check

757
00:38:09,640 --> 00:38:10,120
constraints.

758
00:38:10,120 --> 00:38:13,920
Let's say we want to relax the
constraint from 200 characters

759
00:38:13,920 --> 00:38:15,140
to 300 characters.

760
00:38:15,280 --> 00:38:19,760
I think in if you're using varchar
Postgres will handle that

761
00:38:19,760 --> 00:38:22,160
knowing it doesn't have to recheck
existing data.

762
00:38:22,780 --> 00:38:24,020
But that's the only downside.

763
00:38:24,020 --> 00:38:27,940
Nikolay: If you increase n in varchar,
it won't rewrite.

764
00:38:28,320 --> 00:38:31,560
And it won't scan it, because obviously
if you just increase

765
00:38:31,560 --> 00:38:36,680
the limit, existing rows already
are okay with this.

766
00:38:37,540 --> 00:38:38,740
So yeah, I agree.

767
00:38:39,080 --> 00:38:42,260
And in the case of check constraint,
I'm not sure if we have

768
00:38:42,260 --> 00:38:43,820
any...
Michael: I don't think so.

769
00:38:44,020 --> 00:38:45,200
I've not heard of one.

770
00:38:45,200 --> 00:38:47,720
But we can do the three-step process
you mentioned.

771
00:38:49,860 --> 00:38:53,000
In fact, we can probably add another
constraint and then drop

772
00:38:53,000 --> 00:38:53,660
the existing.

773
00:38:54,920 --> 00:38:55,420
Nikolay: Exactly.

774
00:38:55,580 --> 00:38:56,340
Michael: Yeah, great.

775
00:38:57,340 --> 00:38:57,840
Cool.

776
00:38:58,040 --> 00:38:59,200
Thanks so much, Nikolay.

777
00:38:59,440 --> 00:39:00,520
Thanks, everyone, for listening.

778
00:39:00,520 --> 00:39:01,620
Catch you next week.

779
00:39:02,140 --> 00:39:03,040
Nikolay: Yeah.
See you later.