1
00:00:00,060 --> 00:00:01,480
Nikolay: Hello, this is PostgresFM.

2
00:00:01,480 --> 00:00:03,320
I don't remember the number of episodes.

3
00:00:03,320 --> 00:00:08,540
This is Nikolay Samokhvalov from Postgres.AI and my co-host as

4
00:00:08,540 --> 00:00:11,180
usual, Michael Christofides from pgMustard.

5
00:00:11,440 --> 00:00:12,160
Hi, Michael.

6
00:00:13,080 --> 00:00:13,880
Michael: Hello, Nikolay.

7
00:00:13,900 --> 00:00:15,180
I think this is 103.

8
00:00:16,560 --> 00:00:18,080
Nikolay: Okay, not bad.

9
00:00:18,740 --> 00:00:21,060
So we discussed what?

10
00:00:21,100 --> 00:00:24,200
Michael: Yeah, so it's one of those things that's come up in discussion

11
00:00:24,200 --> 00:00:24,800
a few times.

12
00:00:24,800 --> 00:00:27,180
And we said we should do an episode on that one day.

13
00:00:27,180 --> 00:00:31,340
And so we finally got to it and it's soft deletes, which is a

14
00:00:31,340 --> 00:00:35,140
pattern that a lot of us have come across whilst working with

15
00:00:35,140 --> 00:00:36,240
databases more generally.

16
00:00:36,240 --> 00:00:41,320
It's not specific to Postgres, but it's the idea of instead of

17
00:00:41,320 --> 00:00:45,560
deleting data, not deleting it, but hiding it from the application

18
00:00:45,760 --> 00:00:46,700
in some way.

19
00:00:46,860 --> 00:00:52,540
Nikolay: Let me explain it for regular, like how it feels since

20
00:00:52,540 --> 00:00:55,300
I've built three social networks in the past.

21
00:00:56,020 --> 00:00:58,120
I used soft deletes a lot, of course.

22
00:00:58,680 --> 00:01:04,560
And then like Postgres is not good with deletes due to how MVCC

23
00:01:04,600 --> 00:01:06,600
is organized, vacuum and so on.

24
00:01:07,080 --> 00:01:08,640
Deletes are not good, right?

25
00:01:08,680 --> 00:01:10,320
It's like an old idea.

26
00:01:11,000 --> 00:01:15,180
Although each update consists of delete and insert at a physical

27
00:01:15,180 --> 00:01:16,160
level, right?

28
00:01:16,940 --> 00:01:18,660
Unless it's a hot update.

29
00:01:19,000 --> 00:01:26,180
So how it feels in the eyes of millions of users I know, and

30
00:01:26,320 --> 00:01:29,320
it's not only about the projects I've built, like almost any

31
00:01:29,320 --> 00:01:30,880
social media uses it.

32
00:01:31,000 --> 00:01:35,240
You go and try to unregister your account, you press all the

33
00:01:35,240 --> 00:01:39,920
buttons say delete me or something like everything or just I

34
00:01:39,920 --> 00:01:42,820
don't know sometimes you need to write to support or something

35
00:01:42,820 --> 00:01:46,640
but usually it's good services have this button ready for you

36
00:01:46,640 --> 00:01:48,980
right So I want to delete my account fully.

37
00:01:49,760 --> 00:01:54,660
And then you find out that, for example, depending on implementation,

38
00:01:55,240 --> 00:02:00,320
then you find out that your page is, I don't know, some small

39
00:02:00,320 --> 00:02:05,040
artifacts you can see and feel that your page was not fully deleted

40
00:02:05,080 --> 00:02:07,240
or your account was not fully deleted.

41
00:02:07,720 --> 00:02:14,180
One of these artifacts, not artifacts, signs that SoftDelete is

42
00:02:14,180 --> 00:02:16,900
used is You can actually restore it, right?

43
00:02:17,180 --> 00:02:17,980
Easy, right?

44
00:02:17,980 --> 00:02:19,440
You can, for example, Facebook.

45
00:02:19,440 --> 00:02:21,360
Try to delete yourself on Facebook.

46
00:02:21,680 --> 00:02:26,040
It will give you many chances to restore, and even after you

47
00:02:26,040 --> 00:02:28,000
delete it, you still can restore it.

48
00:02:29,160 --> 00:02:32,960
So you can, like, one month later, you can just press a button

49
00:02:32,960 --> 00:02:33,390
and restore.

50
00:02:33,390 --> 00:02:38,660
If you, like, what users want when they are angry about some

51
00:02:38,660 --> 00:02:39,160
service?

52
00:02:39,580 --> 00:02:44,080
They don't want to deal with it
completely, like, something can

53
00:02:44,080 --> 00:02:45,140
happen in their lives, right?

54
00:02:45,140 --> 00:02:48,760
They can see having an account
as a problem, for example.

55
00:02:49,020 --> 00:02:52,940
They think it's a mistake to post
such pictures or to do some

56
00:02:52,940 --> 00:02:53,440
conversations.

57
00:02:54,960 --> 00:02:55,940
It doesn't matter.
I want to delete it.

58
00:02:55,940 --> 00:02:57,600
I'm a new person right now.

59
00:02:58,500 --> 00:02:59,560
It's there, right?

60
00:02:59,700 --> 00:03:00,700
So, delete.

61
00:03:01,500 --> 00:03:04,400
But then the server says, you can
restore.

62
00:03:05,080 --> 00:03:06,960
Did you delete my data or no?

63
00:03:07,360 --> 00:03:11,240
And then compliance, we have also
interesting points.

64
00:03:11,320 --> 00:03:17,920
In some cases, we have mandatory,
like demand to store data sometimes.

65
00:03:19,020 --> 00:03:19,520
Right?

66
00:03:20,220 --> 00:03:21,720
Michael: We have both sides, right?

67
00:03:21,900 --> 00:03:25,520
Yeah, we have laws that demand
we have to store data for a certain

68
00:03:25,520 --> 00:03:27,240
amount of time in some environment.

69
00:03:27,620 --> 00:03:28,400
Nikolay: And vice versa.

70
00:03:28,480 --> 00:03:32,380
Sometimes if user decides the service
must delete data.

71
00:03:32,440 --> 00:03:36,180
So delete, the word delete has
many meanings, right?

72
00:03:37,060 --> 00:03:40,180
It's not SQL statement delete.

73
00:03:40,520 --> 00:03:45,220
When you delete your account, it
doesn't mean the service executed

74
00:03:45,660 --> 00:03:46,580
actually delete.

75
00:03:47,060 --> 00:03:51,500
Most likely they just executed
updates or delete with triggers

76
00:03:51,500 --> 00:03:52,000
and so on.

77
00:03:52,000 --> 00:03:53,860
We will discuss, dive into details.

78
00:03:53,860 --> 00:04:00,140
But for users, it's really annoying
to feel lack of ability to

79
00:04:00,140 --> 00:04:01,020
truly delete.

80
00:04:01,580 --> 00:04:05,640
I must say, like I saw it and I
was on the other side of, I was

81
00:04:05,640 --> 00:04:07,620
on both sides many times actually.

82
00:04:08,260 --> 00:04:12,340
And I know how it feels when you
want to delete for sure, like

83
00:04:12,340 --> 00:04:14,840
100% I won't delete my data, it's
my data deleted.

84
00:04:15,060 --> 00:04:18,980
No, you can restore it means it
was not actually deleted.

85
00:04:19,600 --> 00:04:20,580
Michael: Yeah, for sure.

86
00:04:20,580 --> 00:04:23,320
Well, I think we can come back
to like,

87
00:04:23,800 --> 00:04:24,140
Nikolay: technical

88
00:04:24,140 --> 00:04:25,240
Michael: the user side.

89
00:04:25,240 --> 00:04:27,800
Well, I think there's I think there's
so many parts of this that

90
00:04:27,800 --> 00:04:28,340
are interesting.

91
00:04:28,340 --> 00:04:33,780
1 is kind of, If you do decide
you want this as a provider, implementation

92
00:04:34,120 --> 00:04:37,240
details are quite interesting and
there are different and Postgres

93
00:04:37,260 --> 00:04:40,360
specific details are quite interesting
as well.

94
00:04:40,640 --> 00:04:44,840
But from the usability side of
it, there are different implementations.

95
00:04:45,160 --> 00:04:48,980
I think various companies and organizations
are implementing

96
00:04:49,240 --> 00:04:53,960
features around deletion, around
kind of making sure people know

97
00:04:53,960 --> 00:04:57,720
exactly what they're doing before
hitting delete, you know, write

98
00:04:57,720 --> 00:04:59,940
the words of your account name
in here.

99
00:05:00,040 --> 00:05:03,840
Everyone's come across these kind
of make sure you type out,

100
00:05:03,840 --> 00:05:07,840
yes, I'm sure I really want to
delete my account and I will not

101
00:05:07,840 --> 00:05:09,900
beg support to undelete it later.

102
00:05:10,360 --> 00:05:14,980
Nikolay: We give you 1 month to
make, to reconsider it or something.

103
00:05:15,140 --> 00:05:18,340
Michael: Exactly, Or we can recover
it, but only in the next

104
00:05:18,340 --> 00:05:19,020
30 days.

105
00:05:19,020 --> 00:05:19,540
Type them.

106
00:05:19,540 --> 00:05:20,300
Yeah, exactly.

107
00:05:20,380 --> 00:05:23,600
Nikolay: I remember another way,
another sign that you can see

108
00:05:23,600 --> 00:05:28,360
that soft delete was applied is
instead of having 404 error,

109
00:05:28,460 --> 00:05:35,160
404 error, you go to your X page,
like you deleted it, but you

110
00:05:35,160 --> 00:05:36,920
see something like account deleted.

111
00:05:38,160 --> 00:05:41,260
It means that records still exist
in database, right?

112
00:05:42,180 --> 00:05:43,940
Just changed its status.

113
00:05:44,760 --> 00:05:48,220
So status like active, deleted,
right?

114
00:05:48,820 --> 00:05:52,080
Pending, like pending, pending,
it means you need to activate

115
00:05:52,080 --> 00:05:55,140
it, like email activation or something,
phone activation.

116
00:05:55,520 --> 00:05:58,480
Then it's active and then it's
deleted, lifecycle, right?

117
00:05:58,780 --> 00:06:01,120
Michael: In my mind, that's kind
of like a bug though, right?

118
00:06:01,120 --> 00:06:04,580
Like that's just a poorly implemented
soft deletion.

119
00:06:05,020 --> 00:06:06,480
Nikolay: Yeah, but it's so common.

120
00:06:06,820 --> 00:06:10,840
And I actually implemented, I implemented
soft deletes in many

121
00:06:10,840 --> 00:06:11,700
various ways.

122
00:06:12,040 --> 00:06:17,080
But honestly, this morning, thinking
like walking my dog, By

123
00:06:17,080 --> 00:06:20,660
the way, hello to everyone who
is running, walking dog, like,

124
00:06:20,660 --> 00:06:22,360
I don't know, like riding bicycle.

125
00:06:23,740 --> 00:06:24,240
Yeah.

126
00:06:24,520 --> 00:06:28,520
I know we promised to keep 30 minutes,
but it's good for you

127
00:06:28,580 --> 00:06:32,860
to run longer and ride bicycle
longer and for your dog as well.

128
00:06:33,340 --> 00:06:37,400
So walking my dogs I was thinking,
soft delete, soft delete.

129
00:06:37,660 --> 00:06:41,240
I also chatted with our bot to
like to brainstorm what to discuss

130
00:06:41,240 --> 00:06:46,820
and I realized that if I needed
to implement it right now, looking

131
00:06:46,940 --> 00:06:51,880
like back to my 20 years of experience,
3 social networks and

132
00:06:51,880 --> 00:06:56,980
many other systems and our customers,
all those ways to implement

133
00:06:56,980 --> 00:06:58,720
soft delete, what would I choose
now?

134
00:06:58,720 --> 00:06:59,920
I don't know.

135
00:07:00,300 --> 00:07:05,400
Maybe by the end of this discussion,
I will have some clarity.

136
00:07:05,580 --> 00:07:07,460
But it's so difficult.

137
00:07:08,220 --> 00:07:09,980
It's not an easy topic, actually.

138
00:07:11,120 --> 00:07:13,280
I expected a simple topic today.

139
00:07:15,060 --> 00:07:18,540
Michael: I think it's simple if
you only consider a single case

140
00:07:18,540 --> 00:07:19,160
at a time.

141
00:07:19,160 --> 00:07:22,460
If you want to give generic advice,
I think it becomes more complicated.

142
00:07:22,760 --> 00:07:26,340
And we also dove into the most
complicated version, which is

143
00:07:26,520 --> 00:07:29,240
deleting a whole account which
has multiple things.

144
00:07:29,240 --> 00:07:31,240
Nikolay: Yeah, let's talk about
only 1 table.

145
00:07:31,240 --> 00:07:34,040
Yeah, propagation is a different
topic.

146
00:07:34,040 --> 00:07:35,220
Consider only 1 table.

147
00:07:35,220 --> 00:07:36,280
We have many choices.

148
00:07:36,280 --> 00:07:40,820
For example, 1 I already discussed,
like status.

149
00:07:40,900 --> 00:07:43,760
You have pending, active, deleted,
something, archived.

150
00:07:44,500 --> 00:07:49,720
Or it can be a different column
is deleted by default.

151
00:07:50,420 --> 00:07:53,040
It's false, you can set it to true.

152
00:07:53,320 --> 00:07:54,840
Both are very similar approaches.

153
00:07:55,420 --> 00:07:57,040
Both I don't like at all.

154
00:07:57,440 --> 00:08:01,300
Michael: I quite like deleted at,
which is with it being null

155
00:08:01,300 --> 00:08:04,480
by default, and then a timestamp
as to when it was deleted.

156
00:08:04,480 --> 00:08:04,980
Right.

157
00:08:05,240 --> 00:08:06,900
Nikolay: I would not choose these
approaches.

158
00:08:07,340 --> 00:08:08,580
Neither of them.

159
00:08:09,240 --> 00:08:09,740
Today.

160
00:08:10,120 --> 00:08:11,500
I chose them in the past.

161
00:08:11,720 --> 00:08:12,080
Yeah.

162
00:08:12,080 --> 00:08:13,480
Today, no, thank you.

163
00:08:13,660 --> 00:08:14,060
Why?

164
00:08:14,060 --> 00:08:20,080
Because then you need to deal with,
like, optimizing your queries,

165
00:08:20,080 --> 00:08:21,220
thinking about indexes.

166
00:08:21,780 --> 00:08:25,220
Most of the time, queries work
with active data.

167
00:08:25,840 --> 00:08:30,620
So you need to keep all your indexes
probably partial with the

168
00:08:30,620 --> 00:08:37,060
WHERE clause having this filter
to deal only with active records.

169
00:08:37,660 --> 00:08:41,980
And you accumulate like non-active
data set in the same table,

170
00:08:42,700 --> 00:08:44,840
which is not useful at all, right?

171
00:08:45,520 --> 00:08:47,520
So and also foreign keys.

172
00:08:48,300 --> 00:08:49,840
Michael: Yeah, unique constraints.

173
00:08:50,060 --> 00:08:51,420
There's so many downsides.

174
00:08:51,660 --> 00:08:52,480
Nikolay: That's complexity.

175
00:08:53,300 --> 00:08:57,100
Michael: Well, and you need to,
you need to adapt your, your

176
00:08:57,100 --> 00:09:03,280
application to always view things
where deleted at is not.

177
00:09:04,920 --> 00:09:07,200
Because otherwise people will see
deleted data.

178
00:09:07,200 --> 00:09:09,640
So there is definitely complexity
to that, but implementation-wise,

179
00:09:10,240 --> 00:09:12,080
I think it's on the simpler side.

180
00:09:12,700 --> 00:09:13,980
I think there are some advantages.

181
00:09:15,040 --> 00:09:17,600
Nikolay: I maybe would choose it
only if I know that deletes

182
00:09:17,600 --> 00:09:20,420
will be much less than 1% of all
data.

183
00:09:21,100 --> 00:09:22,600
But who knows in advance, right?

184
00:09:22,600 --> 00:09:24,560
Maybe we will need to delete a
lot.

185
00:09:24,720 --> 00:09:30,200
Sometimes, from social media experience,
spammers, in ways they

186
00:09:30,200 --> 00:09:33,520
come, They can register a million
accounts and you need to delete

187
00:09:33,520 --> 00:09:34,020
them.

188
00:09:34,300 --> 00:09:37,300
At the same time, to follow your
procedures you developed, if

189
00:09:37,300 --> 00:09:43,640
you developed a soft deletes approach,
you're deleting fake accounts

190
00:09:44,180 --> 00:09:45,360
is also deletes, right?

191
00:09:45,360 --> 00:09:49,060
So you need to still keep them
in the same table and it's like

192
00:09:49,640 --> 00:09:54,520
a lot of records which will sit
on the disk and most importantly

193
00:09:55,020 --> 00:09:56,640
in memory, in caches.

194
00:09:57,260 --> 00:10:00,060
Michael: Let's talk about the use
cases for it because I think

195
00:10:00,060 --> 00:10:03,260
they're different enough that I
think it will then become clearer

196
00:10:03,400 --> 00:10:04,960
when we would use different ones.

197
00:10:04,960 --> 00:10:09,240
Like, for example, I think the
primary use case, the only use

198
00:10:09,240 --> 00:10:12,880
case I've ever used Soft Deletes
for in the past, I've always

199
00:10:12,880 --> 00:10:16,160
been at companies and working on
products where we haven't done

200
00:10:16,160 --> 00:10:17,420
soft deletes by default.

201
00:10:17,880 --> 00:10:22,060
I've only ever had it implemented
it once for an application

202
00:10:23,040 --> 00:10:26,980
because several times a year, we
were restoring data that had

203
00:10:26,980 --> 00:10:29,360
been deleted accidentally or prematurely.

204
00:10:30,280 --> 00:10:31,980
So it was a,

205
00:10:31,980 --> 00:10:36,620
Nikolay: and you didn't have that
by slab to, to, to go back

206
00:10:36,620 --> 00:10:37,060
in time.

207
00:10:37,060 --> 00:10:37,560
Easily.

208
00:10:37,900 --> 00:10:40,260
Michael: We could risk the solution.

209
00:10:40,260 --> 00:10:43,440
We, we, this is actually on the
Microsoft SQL Server database

210
00:10:43,440 --> 00:10:44,060
at the time.

211
00:10:44,060 --> 00:10:46,480
And we would actually restore from
a backup.

212
00:10:46,480 --> 00:10:47,560
That's what we'd end up doing.

213
00:10:47,560 --> 00:10:48,740
Nikolay: It takes many hours.

214
00:10:49,940 --> 00:10:52,320
Michael: The larger the database,
the longer it takes.

215
00:10:52,740 --> 00:10:54,380
And it's painful and manual.

216
00:10:54,520 --> 00:10:59,540
And we were doing this often enough
that we decided to implement

217
00:10:59,540 --> 00:11:00,280
soft deletes.

218
00:11:00,280 --> 00:11:01,500
But with a slight difference.

219
00:11:01,500 --> 00:11:04,900
I think sometimes people describe
soft deletes as if it's like

220
00:11:04,960 --> 00:11:08,520
permanently gonna be in that soft
deletion state we would do

221
00:11:08,520 --> 00:11:13,620
it and then have a like a Cron
job that anything older than so

222
00:11:13,620 --> 00:11:17,280
people would delete something by
mistake and very quickly email

223
00:11:17,280 --> 00:11:21,300
support and say, I deleted the
wrong project in this case.

224
00:11:21,340 --> 00:11:22,360
Nikolay: We need to restore.

225
00:11:22,360 --> 00:11:24,300
Yeah.
And restore procedures is complex.

226
00:11:25,200 --> 00:11:26,480
Michael: Yeah, but it was always
quickly.

227
00:11:26,480 --> 00:11:28,600
It was always within a day

228
00:11:28,600 --> 00:11:29,280
Nikolay: or 2.

229
00:11:29,680 --> 00:11:30,560
It's quicker.

230
00:11:31,080 --> 00:11:31,640
Michael: No, sorry.

231
00:11:31,640 --> 00:11:35,500
So in this case, with this application,
people that delete things

232
00:11:35,500 --> 00:11:38,540
accidentally, they knew pretty,
like almost straight away that

233
00:11:38,540 --> 00:11:41,260
they had done it and that they'd
regretted doing it.

234
00:11:41,580 --> 00:11:46,220
So we knew we could, for example,
have this data in in a soft

235
00:11:46,220 --> 00:11:49,700
deleted fashion with deleted app
with a timestamp.

236
00:11:49,920 --> 00:11:53,820
And then we could have a cron job,
hard delete, or actually delete

237
00:11:53,820 --> 00:11:56,920
data that was more than like, that
was deleted app more than

238
00:11:56,920 --> 00:11:57,780
30 days ago.

239
00:11:57,780 --> 00:12:01,300
So in that case, it meant we could
keep the structure simple,

240
00:12:01,820 --> 00:12:05,800
only add 1 column, update the application
only in a few places

241
00:12:05,800 --> 00:12:08,440
to view the data that wasn't deleted
yet.

242
00:12:08,440 --> 00:12:11,920
You know, so it was a relatively
simple case where we were having

243
00:12:11,920 --> 00:12:14,240
to restore data from time to time.

244
00:12:14,240 --> 00:12:15,120
Nikolay: It's a good point.

245
00:12:15,120 --> 00:12:21,180
So I took extreme case when we
discuss accounts and it's like

246
00:12:21,180 --> 00:12:25,200
sensitive to like personal feelings
are affected.

247
00:12:25,200 --> 00:12:30,680
But you're describing basically
the same thing as emails in Gmail

248
00:12:30,820 --> 00:12:35,900
and Recycle Bin or like how it's
called trash folder.

249
00:12:35,900 --> 00:12:37,160
Michael: Yeah, like on a desktop.

250
00:12:37,180 --> 00:12:37,680
Yeah.

251
00:12:37,700 --> 00:12:38,080
Nikolay: Right.

252
00:12:38,080 --> 00:12:42,440
So you put it there basically when
you do deletes and it sits

253
00:12:42,440 --> 00:12:44,280
there for 30 days, right?

254
00:12:44,540 --> 00:12:46,240
And then automatically hard deleted.

255
00:12:46,880 --> 00:12:47,380
Michael: Exactly.

256
00:12:47,460 --> 00:12:50,780
Nikolay: And you can restore it
anytime and users usually appreciate

257
00:12:51,140 --> 00:12:57,100
this feature and consider it as
good if you can undo your action.

258
00:12:57,260 --> 00:12:57,720
I agree.

259
00:12:57,720 --> 00:13:02,040
Like when we talk about some objects
inside the account, right?

260
00:13:02,040 --> 00:13:03,460
So some data.

261
00:13:03,900 --> 00:13:06,500
Michael: The only difference is
in this case is we didn't implement

262
00:13:06,500 --> 00:13:08,580
the ability to view what's in the...

263
00:13:08,600 --> 00:13:11,520
Like in those 2 cases, what we
could have done.

264
00:13:11,520 --> 00:13:14,440
And that's actually, potentially
would have been nice for users.

265
00:13:14,640 --> 00:13:17,660
Nikolay: Yeah, actually in this
case, this approach, now I'm

266
00:13:17,660 --> 00:13:21,000
thinking, yeah, in this case, I
would probably choose some status

267
00:13:21,040 --> 00:13:24,440
approach because it's actually
the same data, we just don't show

268
00:13:24,440 --> 00:13:26,920
it in 1 place, but we show it in
another place.

269
00:13:26,920 --> 00:13:29,620
So it's just some category, that's
it.

270
00:13:29,800 --> 00:13:30,300
Right?

271
00:13:30,520 --> 00:13:31,260
You're right.

272
00:13:31,260 --> 00:13:35,140
So in this case, I would prefer
actually status, maybe timestamp

273
00:13:35,140 --> 00:13:39,860
of changes, but maybe I would put
timestamps to special different

274
00:13:39,860 --> 00:13:45,060
like audit, like log style append
only table to track actions

275
00:13:45,060 --> 00:13:46,760
when some action was done.

276
00:13:47,080 --> 00:13:49,000
Yeah, but you're right, actually.

277
00:13:49,860 --> 00:13:55,520
Maybe this case, let's call it
in-place soft-delete when we keep

278
00:13:55,520 --> 00:13:57,520
data in the same table and so on.

279
00:13:57,520 --> 00:14:00,360
Michael: I was thinking use-case-wise,
that's if you're implementing

280
00:14:00,380 --> 00:14:04,780
it to avoid or to make restoration
easier or undelete easier.

281
00:14:05,020 --> 00:14:09,140
Nikolay: And viewing deleted data,
if some in some cases like

282
00:14:09,140 --> 00:14:17,120
trash, trash data, if you want
to allow users to review it, Why

283
00:14:17,120 --> 00:14:17,620
not?

284
00:14:18,420 --> 00:14:21,760
Michael: Yeah, well, that's the
other like, there's another use

285
00:14:21,760 --> 00:14:25,160
case for soft deletes I've heard
and seen people write about

286
00:14:25,160 --> 00:14:30,560
which is to give you the ability
to examine it or to audit it

287
00:14:30,560 --> 00:14:33,740
or to look through why it was done
or who did what.

288
00:14:33,740 --> 00:14:35,100
So I think there are some...

289
00:14:35,140 --> 00:14:35,640
Exactly.

290
00:14:36,220 --> 00:14:40,320
And I know we've had a whole episode
on audit, but it could be

291
00:14:40,320 --> 00:14:43,820
not necessarily for regulatory
purposes, but it could be for

292
00:14:45,060 --> 00:14:45,920
debugging purposes.

293
00:14:47,180 --> 00:14:51,380
Nikolay: And there we can have
a hard delete button for users

294
00:14:51,380 --> 00:14:51,820
as well.

295
00:14:51,820 --> 00:14:55,880
If like only there, not in main
place, but for like if object

296
00:14:55,900 --> 00:14:59,680
already soft deleted, we could
allow users to actually delete.

297
00:14:59,680 --> 00:15:01,540
But let's talk about Postgres spec.

298
00:15:02,480 --> 00:15:03,980
This is UX discussion.

299
00:15:05,320 --> 00:15:06,000
What else?

300
00:15:06,740 --> 00:15:07,240
Compliance.

301
00:15:07,360 --> 00:15:11,580
Sometimes you do need to delete
data for sure, like keeping it

302
00:15:11,580 --> 00:15:17,160
only in some backups and also have
some policy to delete there

303
00:15:17,160 --> 00:15:17,420
as well.

304
00:15:17,420 --> 00:15:21,000
I mean, backups should be deleted
as well at some point.

305
00:15:21,020 --> 00:15:24,940
Sometimes you do need to keep information
and soft deletes help

306
00:15:25,800 --> 00:15:28,600
to comply some policies and regulations.

307
00:15:30,060 --> 00:15:33,060
Depending on law you need to follow,
like it's hard.

308
00:15:34,080 --> 00:15:38,640
But honestly, I prefer, speaking
of law, I prefer when I need

309
00:15:38,640 --> 00:15:39,640
to delete everything.

310
00:15:39,920 --> 00:15:44,380
But for convenience and as a business
owner, it's actually much

311
00:15:44,380 --> 00:15:48,420
more convenient if you still have
data all the time and you can

312
00:15:48,420 --> 00:15:50,700
review what happened and restore
it?

313
00:15:51,180 --> 00:15:54,300
Well, sometimes, depending on the
nature of your business.

314
00:15:54,720 --> 00:15:59,500
I see more and more guys follow,
like, it's like some compliance

315
00:15:59,500 --> 00:16:04,840
rules, but I see more and more
companies implement the rule that

316
00:16:04,840 --> 00:16:08,760
Slack deletes messages after some
time and everything is deleted.

317
00:16:08,760 --> 00:16:09,440
I don't know.

318
00:16:09,440 --> 00:16:11,200
Maybe it's because of costs.

319
00:16:11,200 --> 00:16:12,800
Maybe it's some compliance, actually.

320
00:16:12,800 --> 00:16:13,640
I don't know.

321
00:16:14,060 --> 00:16:16,740
Michael: Well, I think there was
an age where this going back

322
00:16:16,740 --> 00:16:19,240
to kind of philosophical side of
things, but I think there was

323
00:16:19,240 --> 00:16:23,000
a time where people, I remember
reading headlines like data is

324
00:16:23,000 --> 00:16:25,440
the new oil and things like that
and people thought hoarding

325
00:16:25,440 --> 00:16:28,220
as much data or gathering as much
data as they could would be

326
00:16:28,220 --> 00:16:32,640
valuable in the future for analysis
reasons or for mining, that

327
00:16:32,640 --> 00:16:33,460
kind of thing.

328
00:16:33,660 --> 00:16:39,380
And I feel like with a lot of data
breaches, especially, sorry,

329
00:16:39,380 --> 00:16:42,940
I think I've got a siren in my
background, with data breaches,

330
00:16:43,820 --> 00:16:47,040
I still feel like things flipped
and people started to think

331
00:16:47,040 --> 00:16:49,380
of having data as a liability.

332
00:16:49,820 --> 00:16:53,040
And the more data they held, or
the more personal it was, the

333
00:16:53,040 --> 00:16:58,320
more at risk they are of, or the
bigger target they become for

334
00:16:58,320 --> 00:16:58,820
attacks.

335
00:16:59,380 --> 00:17:02,720
I feel like that's changed even
within the last, well, even within

336
00:17:02,720 --> 00:17:07,480
my career of how people view holding
data if you don't really

337
00:17:07,480 --> 00:17:08,040
need it.

338
00:17:08,040 --> 00:17:08,540
Nikolay: Yeah.

339
00:17:08,560 --> 00:17:09,940
Also performance reasons.

340
00:17:10,160 --> 00:17:13,100
If you keep everything, it's hard.

341
00:17:13,100 --> 00:17:18,760
I see many successful e-commerce
companies, they do need to actually

342
00:17:18,760 --> 00:17:23,240
delete data, and having soft deletes,
they at some point decide

343
00:17:23,240 --> 00:17:25,420
to clean up and actually delete
data.

344
00:17:26,460 --> 00:17:30,300
So in place of soft delete, you
just have a column or a couple

345
00:17:30,300 --> 00:17:33,920
of columns, and then you have a
lot of partial indexes, obviously,

346
00:17:33,920 --> 00:17:34,420
maybe.

347
00:17:34,680 --> 00:17:35,180
Right?

348
00:17:35,740 --> 00:17:36,240
Maybe.

349
00:17:36,340 --> 00:17:37,100
Maybe not.

350
00:17:37,100 --> 00:17:41,080
It depends on the distribution
and selectivity you have.

351
00:17:41,840 --> 00:17:43,860
And how could we improve this?

352
00:17:44,260 --> 00:17:48,900
What about having a special partition
or a set of partitions

353
00:17:50,740 --> 00:17:53,640
and include this column to partition
key.

354
00:17:54,440 --> 00:17:58,620
It would be transparent for application
if we update status setting

355
00:17:58,620 --> 00:18:00,560
from active to deleted.

356
00:18:01,460 --> 00:18:06,500
In this case, the row will be deleted
in the main partition,

357
00:18:06,500 --> 00:18:10,440
like actual working partition and
it will go to this kind of

358
00:18:10,440 --> 00:18:11,960
archive partition, right?

359
00:18:12,560 --> 00:18:13,940
Michael: Yeah, it would be like
moved.

360
00:18:13,940 --> 00:18:17,980
So I guess it's similar in terms
of like write-ahead log impact

361
00:18:17,980 --> 00:18:20,680
and but actually could it be higher?

362
00:18:20,680 --> 00:18:24,020
I guess you're forcing no hot updates
but I'm guessing you're

363
00:18:24,020 --> 00:18:27,020
not going to get hot updates anyway
because it's the delete that.

364
00:18:28,380 --> 00:18:28,880
Nikolay: Yeah.

365
00:18:29,760 --> 00:18:32,200
You cannot have hot update here.

366
00:18:32,200 --> 00:18:36,360
So 0 chances because it's actually
deleted, actually inserted.

367
00:18:39,380 --> 00:18:44,540
I guess hat updates is a good thing
to have, but in this case

368
00:18:44,540 --> 00:18:46,800
you should avoid partial...

369
00:18:48,060 --> 00:18:50,940
Back to a single table and just
a column.

370
00:18:51,180 --> 00:18:56,240
If you want hot updates to be used
when you update this column,

371
00:18:56,780 --> 00:19:00,780
you cannot use this column in partial
indexes in the where clause

372
00:19:00,780 --> 00:19:02,200
of those indexes.

373
00:19:03,260 --> 00:19:04,640
You must avoid it.

374
00:19:05,380 --> 00:19:11,320
It works well only if you have,
as I said, only low part of your

375
00:19:11,320 --> 00:19:12,700
table of deletes.

376
00:19:12,700 --> 00:19:14,660
Like deletes are rare.

377
00:19:14,760 --> 00:19:15,260
Michael: Yeah.

378
00:19:16,540 --> 00:19:20,900
So another, like another solution
I see mentioned that I've not

379
00:19:20,900 --> 00:19:22,920
used is using a view.

380
00:19:23,000 --> 00:19:28,620
So defining a view where deleted
at is, has no value for example.

381
00:19:28,620 --> 00:19:30,640
Nikolay: Yeah, I don't know.

382
00:19:30,840 --> 00:19:36,180
You know like when you're in love
with something many years and

383
00:19:36,180 --> 00:19:38,720
then this love is converted to
hate.

384
00:19:39,160 --> 00:19:45,260
This is my relationship with views
because my, yeah, my thesis,

385
00:19:45,540 --> 00:19:49,540
martial arts master, I don't remember,
was about updatable views

386
00:19:50,820 --> 00:19:55,080
and then updatable XML views and
like, oh, I spent so much time

387
00:19:55,080 --> 00:19:57,460
and I worked with them so much
time.

388
00:19:57,560 --> 00:20:02,360
Now the only views I can deal with
now these days myself in my

389
00:20:02,360 --> 00:20:04,620
projects where I have full control.

390
00:20:04,700 --> 00:20:06,660
Only Postgres views.

391
00:20:07,200 --> 00:20:07,980
Michael: That's it.

392
00:20:08,860 --> 00:20:11,920
Nikolay: Everywhere else I avoid
them because I don't want to

393
00:20:12,260 --> 00:20:14,440
deal with dependency hell later.

394
00:20:14,660 --> 00:20:16,360
Because it's just a lot of overhead.

395
00:20:17,860 --> 00:20:21,060
You just change something or add
column or drop column.

396
00:20:21,060 --> 00:20:26,880
You need to recreate your views
and logs and logs when you issue

397
00:20:26,880 --> 00:20:29,280
DDL, like, I don't want this.

398
00:20:29,640 --> 00:20:34,740
But Yes, views can be used in many
cases, but you cannot create

399
00:20:34,740 --> 00:20:36,640
an index on a regular view.

400
00:20:37,280 --> 00:20:39,060
Again, I'm speaking of performance.

401
00:20:40,900 --> 00:20:43,440
Just views will hide it, right?

402
00:20:43,440 --> 00:20:44,140
And that's it.

403
00:20:44,140 --> 00:20:45,540
I mean, hide some rows.

404
00:20:46,460 --> 00:20:46,960
Michael: Exactly.

405
00:20:47,220 --> 00:20:51,100
But it also makes the implementation
a little bit easier because

406
00:20:51,100 --> 00:20:52,280
you don't have to update.

407
00:20:52,280 --> 00:20:55,860
You don't have to make the application
aware of your soft delete

408
00:20:55,860 --> 00:20:56,360
implementation.

409
00:20:57,660 --> 00:20:58,140
Nikolay: Well, yeah.

410
00:20:58,140 --> 00:21:00,560
Michael: You could just point the
application at the view now.

411
00:21:00,560 --> 00:21:01,560
The Application doesn't

412
00:21:01,560 --> 00:21:03,220
Nikolay: see those rows at all,
right?

413
00:21:03,680 --> 00:21:05,780
Michael: Should we move on to some
other, like...

414
00:21:06,220 --> 00:21:06,720
Nikolay: Approach.

415
00:21:06,880 --> 00:21:08,200
Separate table approach.

416
00:21:08,760 --> 00:21:09,260
Michael: Yes.

417
00:21:10,200 --> 00:21:12,580
Nikolay: So straightforward approach.

418
00:21:13,320 --> 00:21:17,880
Let's just have the table with
same structure, right?

419
00:21:18,280 --> 00:21:22,260
Maybe not the same set of indexes
because we don't need all of

420
00:21:22,260 --> 00:21:23,580
them, maybe, right?

421
00:21:24,240 --> 00:21:29,900
And just insert to that table,
maybe using some trigger when

422
00:21:29,900 --> 00:21:31,040
delete happens.

423
00:21:32,260 --> 00:21:35,600
Michael: Yeah, so the 2 approaches
I've had are a shadow table

424
00:21:35,600 --> 00:21:38,260
for each table that you want to,
yeah.

425
00:21:38,320 --> 00:21:40,700
I don't know if that's the correct
term.

426
00:21:41,460 --> 00:21:43,080
Nikolay: And I also hated this.

427
00:21:44,440 --> 00:21:47,180
Because DDL, maintaining schema.

428
00:21:49,300 --> 00:21:53,100
Michael: But I did see a neat alternative
to this in a blog post

429
00:21:53,100 --> 00:21:55,440
by Brander, who's got a couple
of blog posts about DDL.

430
00:21:55,440 --> 00:21:58,740
Nikolay: So let's discuss why it's
bad, like in detail, just

431
00:21:58,740 --> 00:21:59,440
to clarify.

432
00:21:59,760 --> 00:22:00,300
Good idea.

433
00:22:00,300 --> 00:22:04,280
First of all, you need to create,
if you have many tables where

434
00:22:04,280 --> 00:22:08,220
you want soft delete to be implemented,
you always need additional

435
00:22:08,220 --> 00:22:08,600
table.

436
00:22:08,600 --> 00:22:10,140
It's just overhead.

437
00:22:10,840 --> 00:22:14,300
And also you need, every time you
create a column, for example,

438
00:22:14,300 --> 00:22:18,760
or a name column or something,
you also need not to forget to

439
00:22:18,760 --> 00:22:20,260
do it in the shadow table.

440
00:22:21,040 --> 00:22:26,540
And living with such schema many
years in 1 project, I must admit,

441
00:22:27,340 --> 00:22:29,280
it's a huge headache.

442
00:22:30,060 --> 00:22:31,580
People always forget it.

443
00:22:31,960 --> 00:22:35,400
You can try to invent some procedures
and automation not to forget

444
00:22:35,400 --> 00:22:42,480
it, but still, deployments to production
when shadow table DDL

445
00:22:42,820 --> 00:22:46,280
was forgotten, quite significant
risk.

446
00:22:46,420 --> 00:22:49,260
So It's a lot of objects to deal
with.

447
00:22:49,440 --> 00:22:54,060
Of course, you can put it to some,
say, schema named shadow,

448
00:22:54,060 --> 00:22:57,780
for example, and keep the table
names exactly the same as in

449
00:22:57,780 --> 00:23:02,200
the original schema, public or
something new, if it's a new project.

450
00:23:03,540 --> 00:23:06,540
So public schema is not popular
anymore, right?

451
00:23:07,060 --> 00:23:11,700
So, and it's good because it means
convenience because exactly

452
00:23:11,700 --> 00:23:16,800
the same table names, but it introduces
a lot of overhead, it

453
00:23:16,800 --> 00:23:22,960
requires good culture of automation,
like tooling, not to be,

454
00:23:23,360 --> 00:23:25,300
to be always in a consistent state.

455
00:23:26,640 --> 00:23:29,880
Michael: Yeah, I think there was
a lot, there are a lot of difficulties

456
00:23:30,060 --> 00:23:30,780
as part of this.

457
00:23:30,780 --> 00:23:34,780
Obviously, it does come with some
benefits, like easier to restore

458
00:23:35,020 --> 00:23:39,660
than the next, the alternative
way of doing this.

459
00:23:39,860 --> 00:23:41,680
Nikolay: Let me guess about alternative
way.

460
00:23:41,680 --> 00:23:42,980
I also implemented it.

461
00:23:42,980 --> 00:23:48,800
I didn't read this post by Brandur,
but I can guess because I

462
00:23:48,800 --> 00:23:53,280
also implement I think we walked
on the same paths, right?

463
00:23:53,440 --> 00:23:59,060
So you create a single table with
probably like some ID maybe

464
00:23:59,060 --> 00:24:04,960
some surrogate ID then table name
who deleted timestamp and then

465
00:24:04,960 --> 00:24:11,540
just JSON or JSONB column where
you just put all the data from

466
00:24:11,540 --> 00:24:12,340
any table.

467
00:24:13,040 --> 00:24:14,160
Right.
And you can...

468
00:24:15,900 --> 00:24:16,300
Exactly.

469
00:24:16,300 --> 00:24:16,420
Okay.
Why not?

470
00:24:16,420 --> 00:24:18,440
Preserving column names maybe even.

471
00:24:19,020 --> 00:24:19,360
Right.

472
00:24:19,360 --> 00:24:20,020
So you...

473
00:24:20,220 --> 00:24:20,580
Michael: Yes.

474
00:24:20,580 --> 00:24:24,100
I think they used like a to JSONB
function.

475
00:24:25,240 --> 00:24:25,740
Nikolay: Right.

476
00:24:26,200 --> 00:24:29,540
And you do everything in triggers,
so everything again is transparent.

477
00:24:30,040 --> 00:24:35,280
When a delete statement happens,
your row from any table which

478
00:24:35,280 --> 00:24:40,020
has this trigger is packed and
sent to this table.

479
00:24:40,160 --> 00:24:41,020
Very convenient.

480
00:24:41,740 --> 00:24:42,880
I like this schema.

481
00:24:43,740 --> 00:24:44,440
Michael: It's super.

482
00:24:44,620 --> 00:24:48,980
So again, this is optimized for
like, this is harder to restore

483
00:24:49,000 --> 00:24:52,540
from, right, because it's now in
a difficult format.

484
00:24:53,320 --> 00:24:57,080
And especially if like the schema
has changed in between.

485
00:24:58,260 --> 00:24:59,980
Nikolay: Well, this is a price.

486
00:25:01,100 --> 00:25:01,880
You have trade-off.

487
00:25:01,960 --> 00:25:07,100
Either you survive any schema changes
easily, or you need to

488
00:25:09,720 --> 00:25:11,780
deal with some mutations during
restore.

489
00:25:11,820 --> 00:25:15,660
You need to understand, okay, this
delete happened A year ago,

490
00:25:15,660 --> 00:25:16,800
we had a different schema.

491
00:25:16,800 --> 00:25:17,640
What to do?

492
00:25:17,640 --> 00:25:19,040
You do need to do this.

493
00:25:19,360 --> 00:25:21,100
Or you deal with schema...

494
00:25:21,560 --> 00:25:25,940
You need to maintain mirror schema
changes all the time.

495
00:25:25,940 --> 00:25:28,240
I would prefer actually deal with...

496
00:25:28,300 --> 00:25:29,020
Since restore...

497
00:25:29,340 --> 00:25:31,060
Depending on the project, of course.

498
00:25:31,280 --> 00:25:35,580
If restore is not super common,
I would prefer this schema.

499
00:25:35,860 --> 00:25:37,540
Super convenient, super universal.

500
00:25:38,440 --> 00:25:39,400
Michael: Yeah, me too.

501
00:25:39,400 --> 00:25:41,320
I think there's a lot of benefits
here.

502
00:25:41,320 --> 00:25:44,440
Not for the 1 case I did implement
soft deletes in the past,

503
00:25:44,440 --> 00:25:48,520
where the whole point was to make
restoration easier in a very

504
00:25:48,520 --> 00:25:49,780
small number of cases.

505
00:25:50,420 --> 00:25:52,440
This case is very different to
that.

506
00:25:52,540 --> 00:25:56,320
But yeah, I do think there are
some tricky parts of this, though,

507
00:25:56,320 --> 00:25:59,200
I think, for example, the compliance
thing is tricky here.

508
00:25:59,200 --> 00:26:02,320
Like, it's another place that you
need to remember to delete

509
00:26:02,320 --> 00:26:04,020
data from in the future.

510
00:26:04,020 --> 00:26:08,300
If you get a GDPR request, you
can't just search on normal tables

511
00:26:08,680 --> 00:26:09,360
for that data.

512
00:26:09,360 --> 00:26:11,920
You also have to search this delete
log.

513
00:26:12,180 --> 00:26:15,340
And it might be tricky to search
because it's JSON B data with

514
00:26:15,340 --> 00:26:16,200
like unstructured.

515
00:26:16,700 --> 00:26:18,700
So there's a there I think there
are downsides.

516
00:26:19,760 --> 00:26:20,260
Nikolay: Right.

517
00:26:20,800 --> 00:26:23,800
Is it fair to call this soft delete?

518
00:26:24,220 --> 00:26:26,340
Maybe it should be called just
archiving?

519
00:26:27,440 --> 00:26:28,200
Michael: Good point.

520
00:26:30,320 --> 00:26:33,120
There's no definition of soft delete
that I know of.

521
00:26:33,120 --> 00:26:35,180
It's not part of the SQL standard.

522
00:26:35,860 --> 00:26:41,200
So I think it's more around like
what are the, what's it for?

523
00:26:41,200 --> 00:26:45,100
And maybe we've lost sight of like
why are we implementing this

524
00:26:45,100 --> 00:26:45,360
feature?

525
00:26:45,360 --> 00:26:50,460
If it's for easy undeletes, maybe
this isn't the right solution.

526
00:26:50,500 --> 00:26:55,780
But if it's for debugging or if
it's for different purposes,

527
00:26:56,400 --> 00:26:59,280
then yeah, archiving makes some
sense.

528
00:26:59,540 --> 00:27:04,020
But I guess it is, it's deleting
it in a way that it's not deleted.

529
00:27:04,180 --> 00:27:08,000
That is the, you know, you mentioned
the, as a user seeing some

530
00:27:08,000 --> 00:27:11,400
signs that it wasn't deleted, this
avoids a lot of those, right?

531
00:27:11,400 --> 00:27:13,700
Like it's not in the same table
anymore.

532
00:27:14,140 --> 00:27:14,640
Nikolay: Yeah.

533
00:27:14,640 --> 00:27:20,280
Application can, If we define the
trigger function with security

534
00:27:20,320 --> 00:27:27,100
definer, we can revoke or just
do not grant any permissions to

535
00:27:27,100 --> 00:27:30,080
application user to deal with schema
or this table.

536
00:27:30,480 --> 00:27:31,320
And that's it.

537
00:27:31,320 --> 00:27:34,420
So application won't be able to
reach the data at all.

538
00:27:34,640 --> 00:27:38,140
At very low, like, I mean, at database
level, database guarantees,

539
00:27:38,200 --> 00:27:39,260
it's not possible.

540
00:27:39,620 --> 00:27:42,440
But at the same time, Trigger can
still insert it because it

541
00:27:42,440 --> 00:27:42,940
has...

542
00:27:43,180 --> 00:27:46,780
Trigger function has security definer,
meaning that it has permissions

543
00:27:46,960 --> 00:27:52,280
to run queries which are inside
the function using the level

544
00:27:52,280 --> 00:27:56,180
of access of the user who defined
this function.

545
00:27:56,180 --> 00:27:57,680
Michael: Cool.
I didn't know you could do that.

546
00:27:57,680 --> 00:27:58,640
That's very cool.

547
00:27:59,100 --> 00:28:00,640
Nikolay: It's called security definer.

548
00:28:01,860 --> 00:28:05,460
When you create a function, trigger
function, for trigger you

549
00:28:05,460 --> 00:28:08,600
need to create a function, you
just say Security Definer.

550
00:28:09,240 --> 00:28:13,700
And I like it because you can encapsulate
some actions which

551
00:28:13,920 --> 00:28:17,540
normal user, application user,
When I say application user, I

552
00:28:17,540 --> 00:28:21,000
mean database user which is used
by your application.

553
00:28:23,200 --> 00:28:26,820
Usually there are some tables that
this user cannot access.

554
00:28:27,180 --> 00:28:30,600
For example, this shadow table
or shadow schema with multiple

555
00:28:30,600 --> 00:28:31,100
tables.

556
00:28:31,960 --> 00:28:34,540
But I still prefer single table.

557
00:28:34,840 --> 00:28:38,680
But you can still allow, for example,
specific insert encapsulated

558
00:28:38,860 --> 00:28:45,660
in this function and just grant
permissions for specific action

559
00:28:45,940 --> 00:28:48,460
indirectly via the security definer.

560
00:28:49,540 --> 00:28:50,040
Michael: Nice.

561
00:28:50,140 --> 00:28:50,940
I like it.

562
00:28:51,140 --> 00:28:51,500
Nikolay: Yeah.

563
00:28:51,500 --> 00:28:53,860
I use it all the time for many
cases.

564
00:28:55,080 --> 00:28:58,980
So yeah, better control what can
be done, what cannot be done.

565
00:28:58,980 --> 00:29:01,920
So you basically it's good for
security reasons.

566
00:29:01,980 --> 00:29:06,420
It's good to forbid everything
first as first thing, and then

567
00:29:06,420 --> 00:29:09,600
just allow some specific action
in very controlled manner.

568
00:29:10,320 --> 00:29:10,660
Intrigue.

569
00:29:10,660 --> 00:29:13,480
Michael: I feel like there's a
phrase for this, isn't it?

570
00:29:13,480 --> 00:29:15,720
Like security by minimum permission
or something.

571
00:29:15,720 --> 00:29:17,240
I can't remember what the phrase
is.

572
00:29:17,240 --> 00:29:20,140
It's like a security principle
of least permission, give things

573
00:29:20,140 --> 00:29:21,720
the least permission they need.

574
00:29:21,960 --> 00:29:22,900
Right, right,

575
00:29:23,640 --> 00:29:24,140
Nikolay: right.

576
00:29:24,340 --> 00:29:28,660
Well, in former life, we called
it a blacklisting, white listing.

577
00:29:28,660 --> 00:29:32,060
I don't know why now it's not possible
to tell this, right?

578
00:29:32,340 --> 00:29:36,800
But it means that you will just
say nothing is possible except

579
00:29:37,340 --> 00:29:38,420
this, right?

580
00:29:39,440 --> 00:29:41,260
And yeah, it's good.

581
00:29:41,480 --> 00:29:45,560
So yeah, it's a good approach,
but you are right, Restoration

582
00:29:45,800 --> 00:29:46,660
will be tricky.

583
00:29:46,880 --> 00:29:52,500
You need to develop specific procedures,
deal somehow with schema

584
00:29:52,500 --> 00:29:53,000
evolution.

585
00:29:54,440 --> 00:30:02,160
And in this case, I would say if
we talk about some account deletion,

586
00:30:02,540 --> 00:30:03,540
I would do this.

587
00:30:03,740 --> 00:30:08,040
If we talk about something like
we have some objects user can

588
00:30:08,940 --> 00:30:14,940
create, modify, or delete, and
we want support undo for deletion,

589
00:30:15,520 --> 00:30:19,340
Like in Gmail with emails, when
you just delete email, it goes

590
00:30:19,340 --> 00:30:22,440
to trash, and then you can undo
or delete forever, right?

591
00:30:22,440 --> 00:30:23,680
There's a button, delete forever.

592
00:30:23,680 --> 00:30:24,780
This is hard to delete.

593
00:30:24,960 --> 00:30:29,860
In this case, I would prefer maybe
still in place, like in the

594
00:30:29,860 --> 00:30:33,660
same, like just a column in the
same table status or something.

595
00:30:33,760 --> 00:30:34,260
Michael: Nice.

596
00:30:34,280 --> 00:30:35,420
Nikolay: Right, right.

597
00:30:35,420 --> 00:30:39,240
Both are, we can say, both are
soft deletes with their own pros

598
00:30:39,240 --> 00:30:39,980
and cons.

599
00:30:40,600 --> 00:30:42,340
Michael: Yeah, seems very reasonable.

600
00:30:42,540 --> 00:30:45,640
Nikolay: Do you think they can
coexist in the same project?

601
00:30:47,000 --> 00:30:51,360
Not because they are, like decisions
were made by different persons,

602
00:30:52,120 --> 00:30:52,620
different

603
00:30:53,420 --> 00:30:53,547
Michael: developers.

604
00:30:53,547 --> 00:30:57,320
I think I genuinely think that
the end of Brando's blog post,

605
00:30:57,440 --> 00:31:00,900
he mentioned switching to this
in all but a couple of tables.

606
00:31:01,000 --> 00:31:03,380
So I think they are currently a
good example

607
00:31:03,640 --> 00:31:05,100
Nikolay: of a project that

608
00:31:05,100 --> 00:31:05,820
Michael: has both.

609
00:31:06,600 --> 00:31:08,160
Nikolay: You mean switching to
JSON B Shadow Table?

610
00:31:08,560 --> 00:31:11,640
Michael: Yes, instead of you calling
it in place.

611
00:31:11,640 --> 00:31:14,340
I think they still implement in
place in a couple of places.

612
00:31:15,560 --> 00:31:18,920
Nikolay: The best thing about this
single shadow table with JSON-B

613
00:31:19,080 --> 00:31:24,140
colon is that it's so easy to attach
it to any table.

614
00:31:24,280 --> 00:31:25,240
It's just a trigger.

615
00:31:25,240 --> 00:31:25,920
That's it.

616
00:31:25,920 --> 00:31:27,900
It will slow down, delete a little
bit.

617
00:31:27,900 --> 00:31:28,680
That's it.

618
00:31:29,340 --> 00:31:30,600
Michael: Yeah, because of the triggers.

619
00:31:31,560 --> 00:31:35,320
You could use it for the hard delete
and then have it for around

620
00:31:35,320 --> 00:31:36,540
for a while longer.

621
00:31:36,900 --> 00:31:39,520
Nikolay: We discussed about compliance
and so on, but, and I

622
00:31:39,520 --> 00:31:42,340
mentioned that for business owners,
like there are fears that

623
00:31:42,340 --> 00:31:45,040
like something was deleted, hard
to recover.

624
00:31:45,240 --> 00:31:47,220
It's very good for audit purposes.

625
00:31:47,220 --> 00:31:52,360
For example, you have some account
in your application, and there

626
00:31:52,360 --> 00:31:56,660
are some admins, or like owners,
managers, and there are some

627
00:31:56,660 --> 00:31:57,580
regular users.

628
00:31:57,860 --> 00:32:01,280
Some users do something, they deleted
something, and you just

629
00:32:01,280 --> 00:32:05,440
need to not only see and undo it
but see when it was done by

630
00:32:05,440 --> 00:32:06,360
whom like

631
00:32:06,500 --> 00:32:09,720
Michael: I saw another person in
a hacker news thread talk.

632
00:32:09,720 --> 00:32:10,220
Yeah.

633
00:32:10,240 --> 00:32:13,440
Not just audit but also what really
happened.

634
00:32:13,440 --> 00:32:16,080
So I saw somebody who had implemented
a chat system.

635
00:32:16,360 --> 00:32:19,980
Let's say it's a support business
to business support system

636
00:32:20,200 --> 00:32:25,240
and 1 of your internal works for
your company users said something

637
00:32:25,240 --> 00:32:28,060
they regret saying as part of this
chat and then deleted 1 of

638
00:32:28,060 --> 00:32:28,780
those messages.

639
00:32:29,020 --> 00:32:32,360
It's really useful for an admin
to see the whole history or you

640
00:32:32,360 --> 00:32:35,540
know, you mentioned social media
and a lot of social media sites

641
00:32:35,540 --> 00:32:37,605
that you edit a message, keeping
the old version.

642
00:32:37,605 --> 00:32:39,780
I edited
Nikolay: this update, not delete.

643
00:32:39,880 --> 00:32:43,280
So this leads us to like time travel
basically.

644
00:32:44,100 --> 00:32:46,820
We cannot say, name it soft updates,
right?

645
00:32:46,820 --> 00:32:51,420
But tracking all versions, it actually
also possible they are

646
00:32:51,420 --> 00:32:52,240
the same approach.

647
00:32:52,240 --> 00:32:53,960
And I think I did it as well.

648
00:32:54,000 --> 00:33:00,220
So when you have delete with this
trigger with security, security,

649
00:33:00,260 --> 00:33:04,800
definer function, trigger function,
which just archives previous

650
00:33:05,740 --> 00:33:06,240
record.

651
00:33:06,580 --> 00:33:10,780
In the trigger function, you use
old key name.

652
00:33:11,000 --> 00:33:12,680
This is just record, right?

653
00:33:12,780 --> 00:33:15,860
And you convert it to JSONB and
insert, right?

654
00:33:16,080 --> 00:33:18,820
Absolutely the same trigger can
be used for updates.

655
00:33:19,860 --> 00:33:24,340
With any update, maybe I would
have some checks if some actual

656
00:33:24,340 --> 00:33:27,440
data was changed, not just something
small was changed, right?

657
00:33:27,440 --> 00:33:30,140
But actual data, like message was
changed, right?

658
00:33:30,140 --> 00:33:31,160
Title, message.

659
00:33:31,940 --> 00:33:36,660
New dot title doesn't equal old
dot title, and so on.

660
00:33:36,660 --> 00:33:42,180
If something, just to avoid, you
know, like spam archiving, which

661
00:33:42,280 --> 00:33:45,560
like useless records in this shadow
table.

662
00:33:45,800 --> 00:33:50,140
So if I see something valuable
has been changed with this update,

663
00:33:50,600 --> 00:33:55,600
again, I back old to JSONB and
sort the same, like the same trigger

664
00:33:55,600 --> 00:33:56,100
basically.

665
00:33:56,820 --> 00:33:57,320
Right?

666
00:33:57,380 --> 00:33:58,640
Michael: Yeah, I'm not sure.

667
00:33:59,280 --> 00:33:59,940
It depends.

668
00:34:00,040 --> 00:34:03,040
It depends how and when you need,
how often you need to be viewing

669
00:34:03,040 --> 00:34:06,140
those older ones, because it might
be tricky to show the admin

670
00:34:06,140 --> 00:34:09,020
those if you, how are you going
to select that data back out

671
00:34:09,020 --> 00:34:10,460
to show them the old versions?

672
00:34:10,960 --> 00:34:12,020
I guess it's tricky.

673
00:34:12,260 --> 00:34:15,240
Nikolay: I would, by the way, no,
for updates, I think This is

674
00:34:15,240 --> 00:34:16,260
what I did actually.

675
00:34:16,400 --> 00:34:20,660
For updates, I would archive probably
both old and new records.

676
00:34:21,220 --> 00:34:25,220
Because just, yes, to be able to
show diff.

677
00:34:25,840 --> 00:34:29,000
Yes, it's like too much data to
store, but why not?

678
00:34:29,100 --> 00:34:32,160
In this case, I would extend shadow
table with additional JSONB

679
00:34:32,160 --> 00:34:32,660
column.

680
00:34:32,900 --> 00:34:34,520
Like, delete...

681
00:34:35,740 --> 00:34:38,760
Records related to deletes won't
have it.

682
00:34:39,140 --> 00:34:42,080
Like, they would have only nulls
in that column.

683
00:34:42,380 --> 00:34:45,200
For updates, I would have 2 JSONB
values.

684
00:34:45,560 --> 00:34:46,060
Michael: Interesting.

685
00:34:46,400 --> 00:34:47,620
Nikolay: Old and new, basically.

686
00:34:47,640 --> 00:34:51,360
And then we can build some diff
if needed.

687
00:34:52,100 --> 00:34:52,600
Right?

688
00:34:54,340 --> 00:34:54,840
Michael: Maybe.

689
00:34:55,600 --> 00:34:58,120
I think this is getting beyond
the scope of what...

690
00:34:58,660 --> 00:34:59,520
But this is interesting.

691
00:34:59,600 --> 00:35:00,028
I think my...

692
00:35:00,028 --> 00:35:02,620
Nikolay: No, it's useful and easy
to implement and super universal.

693
00:35:02,640 --> 00:35:06,320
You can again attach it to any
table and have simple audit like

694
00:35:06,320 --> 00:35:06,820
that.

695
00:35:07,720 --> 00:35:11,120
Michael: But what I don't like
or understand is because this

696
00:35:11,120 --> 00:35:15,260
is a single shadow table across
multiple, what are you querying?

697
00:35:15,360 --> 00:35:19,740
Like, how are you looking for that
in a, let's say it was a month

698
00:35:19,740 --> 00:35:24,400
ago, in your, how you're efficiently
finding that in that shadow

699
00:35:24,400 --> 00:35:24,880
table?

700
00:35:24,880 --> 00:35:26,340
What are you searching by?

701
00:35:26,820 --> 00:35:29,180
Nikolay: Well, depending on the
application, right?

702
00:35:29,180 --> 00:35:31,720
I would probably in this shadow
table, I would probably have

703
00:35:31,720 --> 00:35:37,060
some columns which would allow
me to search faster, or I would

704
00:35:37,060 --> 00:35:38,820
deal with searching inside JSON-B.

705
00:35:38,940 --> 00:35:39,640
Why not?

706
00:35:39,960 --> 00:35:40,460
Genetics?

707
00:35:40,680 --> 00:35:41,110
Michael: Well, yeah, maybe.

708
00:35:41,110 --> 00:35:42,240
This slows down updates.

709
00:35:46,240 --> 00:35:47,300
Yeah, there's a lot of...

710
00:35:47,300 --> 00:35:51,580
I think that it's a tricky one because
if it's a shadow table for

711
00:35:51,580 --> 00:35:55,240
all tables, you'd like having user
ID or like having other columns

712
00:35:55,240 --> 00:35:58,220
that you could search by might
not make sense for some of those.

713
00:36:00,180 --> 00:36:01,380
Nikolay: You can have null there.

714
00:36:01,560 --> 00:36:02,700
Michael: True, true, true.

715
00:36:02,900 --> 00:36:04,510
Nikolay: Well, yeah, I agree with
that.

716
00:36:04,510 --> 00:36:08,860
Especially if you add gin, insert
this implicit insert, which

717
00:36:08,860 --> 00:36:12,840
happens during delete or update,
it will be not fast at all because

718
00:36:12,840 --> 00:36:13,480
of gin.

719
00:36:14,440 --> 00:36:15,680
So yeah, overhead.

720
00:36:16,500 --> 00:36:17,440
This is the price.

721
00:36:17,440 --> 00:36:17,980
You have it.

722
00:36:17,980 --> 00:36:19,000
Michael: Yeah, I like that.

723
00:36:19,000 --> 00:36:19,940
It's another trade-off.

724
00:36:20,140 --> 00:36:22,540
It depends what you want, and it
depends how much data you have,

725
00:36:22,540 --> 00:36:24,600
and it depends what proportion
is ever going to be deleted, or

726
00:36:24,600 --> 00:36:28,300
what proportion is ever going to
be restored or looked into.

727
00:36:28,840 --> 00:36:30,060
But yeah, I think it's really cool.

728
00:36:30,060 --> 00:36:32,220
I'll link up the blog post we mentioned.

729
00:36:32,680 --> 00:36:34,340
Any last things you wanted to add?

730
00:36:35,020 --> 00:36:38,680
Nikolay: Actually you know what,
during this discussion, I was

731
00:36:38,680 --> 00:36:42,040
curious what we have in PostgreSQL
system and actually we have

732
00:36:42,040 --> 00:36:44,240
exactly this approach I just described.

733
00:36:44,240 --> 00:36:48,740
We have a shadow table, single
one for many tables to track deletes

734
00:36:48,740 --> 00:36:49,380
and updates.

735
00:36:49,900 --> 00:36:54,100
It's called audit log and we have
data before and data after

736
00:36:54,100 --> 00:36:55,360
2 JSONB columns.

737
00:36:56,120 --> 00:36:56,620
Michael: Interesting.

738
00:36:56,980 --> 00:36:57,180
Yeah,

739
00:36:57,180 --> 00:36:59,820
Nikolay: and actually a lot of
additional stuff.

740
00:37:00,040 --> 00:37:03,960
This is just to be able to show
admins of organizations what

741
00:37:03,960 --> 00:37:04,460
happened.

742
00:37:04,960 --> 00:37:06,980
Michael: In pgMustard, we have hard
deletes.

743
00:37:07,660 --> 00:37:11,320
It's slightly different because
it's just query plans, right?

744
00:37:11,320 --> 00:37:13,120
And you can always submit them
again.

745
00:37:13,280 --> 00:37:15,400
But if you delete them, they're
gone.

746
00:37:15,660 --> 00:37:18,060
So we decided not to at all.

747
00:37:18,820 --> 00:37:21,900
Nikolay: Yeah, and now an application
cannot reach this part

748
00:37:21,900 --> 00:37:23,500
of database at all.

749
00:37:23,520 --> 00:37:27,340
It's only possible like implicitly,
as I described, via security

750
00:37:27,340 --> 00:37:27,840
definers.

751
00:37:28,320 --> 00:37:30,040
So quite a good approach, very
reliable.

752
00:37:31,100 --> 00:37:31,600
Michael: Yeah.

753
00:37:33,180 --> 00:37:33,680
Nice.

754
00:37:34,740 --> 00:37:35,200
All right.

755
00:37:35,200 --> 00:37:36,520
Well, thanks so much, Nikolay.

756
00:37:36,680 --> 00:37:37,060
Catch you

757
00:37:37,060 --> 00:37:37,866
Nikolay: next week.
Thank you.

758
00:37:37,866 --> 00:37:38,758
Good chat.