A weekly podcast about all things PostgreSQL
Nikolay: Hello, this is PostgresFM.
I don't remember the number of episodes.
This is Nikolay Samokhvalov from Postgres.AI and my co-host as
usual, Michael Christofides from pgMustard.
Hi, Michael.
Michael: Hello, Nikolay.
I think this is 103.
Nikolay: Okay, not bad.
So we discussed what?
Michael: Yeah, so it's one of those things that's come up in discussion
a few times.
And we said we should do an episode on that one day.
And so we finally got to it and it's soft deletes, which is a
pattern that a lot of us have come across whilst working with
databases more generally.
It's not specific to Postgres, but it's the idea of instead of
deleting data, not deleting it, but hiding it from the application
in some way.
Nikolay: Let me explain it for regular, like how it feels since
I've built three social networks in the past.
I used soft deletes a lot, of course.
And then like Postgres is not good with deletes due to how MVCC
is organized, vacuum and so on.
Deletes are not good, right?
It's like an old idea.
Although each update consists of delete and insert at a physical
level, right?
Unless it's a hot update.
So how it feels in the eyes of millions of users I know, and
it's not only about the projects I've built, like almost any
social media uses it.
You go and try to unregister your account, you press all the
buttons say delete me or something like everything or just I
don't know sometimes you need to write to support or something
but usually it's good services have this button ready for you
right So I want to delete my account fully.
And then you find out that, for example, depending on implementation,
then you find out that your page is, I don't know, some small
artifacts you can see and feel that your page was not fully deleted
or your account was not fully deleted.
One of these artifacts, not artifacts, signs that SoftDelete is
used is You can actually restore it, right?
Easy, right?
You can, for example, Facebook.
Try to delete yourself on Facebook.
It will give you many chances to restore, and even after you
delete it, you still can restore it.
So you can, like, one month later, you can just press a button
and restore.
If you, like, what users want when they are angry about some
service?
They don't want to deal with it
completely, like, something can
happen in their lives, right?
They can see having an account
as a problem, for example.
They think it's a mistake to post
such pictures or to do some
conversations.
It doesn't matter.
I want to delete it.
I'm a new person right now.
It's there, right?
So, delete.
But then the server says, you can
restore.
Did you delete my data or no?
And then compliance, we have also
interesting points.
In some cases, we have mandatory,
like demand to store data sometimes.
Right?
Michael: We have both sides, right?
Yeah, we have laws that demand
we have to store data for a certain
amount of time in some environment.
Nikolay: And vice versa.
Sometimes if user decides the service
must delete data.
So delete, the word delete has
many meanings, right?
It's not SQL statement delete.
When you delete your account, it
doesn't mean the service executed
actually delete.
Most likely they just executed
updates or delete with triggers
and so on.
We will discuss, dive into details.
But for users, it's really annoying
to feel lack of ability to
truly delete.
I must say, like I saw it and I
was on the other side of, I was
on both sides many times actually.
And I know how it feels when you
want to delete for sure, like
100% I won't delete my data, it's
my data deleted.
No, you can restore it means it
was not actually deleted.
Michael: Yeah, for sure.
Well, I think we can come back
to like,
Nikolay: technical
Michael: the user side.
Well, I think there's I think there's
so many parts of this that
are interesting.
1 is kind of, If you do decide
you want this as a provider, implementation
details are quite interesting and
there are different and Postgres
specific details are quite interesting
as well.
But from the usability side of
it, there are different implementations.
I think various companies and organizations
are implementing
features around deletion, around
kind of making sure people know
exactly what they're doing before
hitting delete, you know, write
the words of your account name
in here.
Everyone's come across these kind
of make sure you type out,
yes, I'm sure I really want to
delete my account and I will not
beg support to undelete it later.
Nikolay: We give you 1 month to
make, to reconsider it or something.
Michael: Exactly, Or we can recover
it, but only in the next
30 days.
Type them.
Yeah, exactly.
Nikolay: I remember another way,
another sign that you can see
that soft delete was applied is
instead of having 404 error,
404 error, you go to your X page,
like you deleted it, but you
see something like account deleted.
It means that records still exist
in database, right?
Just changed its status.
So status like active, deleted,
right?
Pending, like pending, pending,
it means you need to activate
it, like email activation or something,
phone activation.
Then it's active and then it's
deleted, lifecycle, right?
Michael: In my mind, that's kind
of like a bug though, right?
Like that's just a poorly implemented
soft deletion.
Nikolay: Yeah, but it's so common.
And I actually implemented, I implemented
soft deletes in many
various ways.
But honestly, this morning, thinking
like walking my dog, By
the way, hello to everyone who
is running, walking dog, like,
I don't know, like riding bicycle.
Yeah.
I know we promised to keep 30 minutes,
but it's good for you
to run longer and ride bicycle
longer and for your dog as well.
So walking my dogs I was thinking,
soft delete, soft delete.
I also chatted with our bot to
like to brainstorm what to discuss
and I realized that if I needed
to implement it right now, looking
like back to my 20 years of experience,
3 social networks and
many other systems and our customers,
all those ways to implement
soft delete, what would I choose
now?
I don't know.
Maybe by the end of this discussion,
I will have some clarity.
But it's so difficult.
It's not an easy topic, actually.
I expected a simple topic today.
Michael: I think it's simple if
you only consider a single case
at a time.
If you want to give generic advice,
I think it becomes more complicated.
And we also dove into the most
complicated version, which is
deleting a whole account which
has multiple things.
Nikolay: Yeah, let's talk about
only 1 table.
Yeah, propagation is a different
topic.
Consider only 1 table.
We have many choices.
For example, 1 I already discussed,
like status.
You have pending, active, deleted,
something, archived.
Or it can be a different column
is deleted by default.
It's false, you can set it to true.
Both are very similar approaches.
Both I don't like at all.
Michael: I quite like deleted at,
which is with it being null
by default, and then a timestamp
as to when it was deleted.
Right.
Nikolay: I would not choose these
approaches.
Neither of them.
Today.
I chose them in the past.
Yeah.
Today, no, thank you.
Why?
Because then you need to deal with,
like, optimizing your queries,
thinking about indexes.
Most of the time, queries work
with active data.
So you need to keep all your indexes
probably partial with the
WHERE clause having this filter
to deal only with active records.
And you accumulate like non-active
data set in the same table,
which is not useful at all, right?
So and also foreign keys.
Michael: Yeah, unique constraints.
There's so many downsides.
Nikolay: That's complexity.
Michael: Well, and you need to,
you need to adapt your, your
application to always view things
where deleted at is not.
Because otherwise people will see
deleted data.
So there is definitely complexity
to that, but implementation-wise,
I think it's on the simpler side.
I think there are some advantages.
Nikolay: I maybe would choose it
only if I know that deletes
will be much less than 1% of all
data.
But who knows in advance, right?
Maybe we will need to delete a
lot.
Sometimes, from social media experience,
spammers, in ways they
come, They can register a million
accounts and you need to delete
them.
At the same time, to follow your
procedures you developed, if
you developed a soft deletes approach,
you're deleting fake accounts
is also deletes, right?
So you need to still keep them
in the same table and it's like
a lot of records which will sit
on the disk and most importantly
in memory, in caches.
Michael: Let's talk about the use
cases for it because I think
they're different enough that I
think it will then become clearer
when we would use different ones.
Like, for example, I think the
primary use case, the only use
case I've ever used Soft Deletes
for in the past, I've always
been at companies and working on
products where we haven't done
soft deletes by default.
I've only ever had it implemented
it once for an application
because several times a year, we
were restoring data that had
been deleted accidentally or prematurely.
So it was a,
Nikolay: and you didn't have that
by slab to, to, to go back
in time.
Easily.
Michael: We could risk the solution.
We, we, this is actually on the
Microsoft SQL Server database
at the time.
And we would actually restore from
a backup.
That's what we'd end up doing.
Nikolay: It takes many hours.
Michael: The larger the database,
the longer it takes.
And it's painful and manual.
And we were doing this often enough
that we decided to implement
soft deletes.
But with a slight difference.
I think sometimes people describe
soft deletes as if it's like
permanently gonna be in that soft
deletion state we would do
it and then have a like a Cron
job that anything older than so
people would delete something by
mistake and very quickly email
support and say, I deleted the
wrong project in this case.
Nikolay: We need to restore.
Yeah.
And restore procedures is complex.
Michael: Yeah, but it was always
quickly.
It was always within a day
Nikolay: or 2.
It's quicker.
Michael: No, sorry.
So in this case, with this application,
people that delete things
accidentally, they knew pretty,
like almost straight away that
they had done it and that they'd
regretted doing it.
So we knew we could, for example,
have this data in in a soft
deleted fashion with deleted app
with a timestamp.
And then we could have a cron job,
hard delete, or actually delete
data that was more than like, that
was deleted app more than
30 days ago.
So in that case, it meant we could
keep the structure simple,
only add 1 column, update the application
only in a few places
to view the data that wasn't deleted
yet.
You know, so it was a relatively
simple case where we were having
to restore data from time to time.
Nikolay: It's a good point.
So I took extreme case when we
discuss accounts and it's like
sensitive to like personal feelings
are affected.
But you're describing basically
the same thing as emails in Gmail
and Recycle Bin or like how it's
called trash folder.
Michael: Yeah, like on a desktop.
Yeah.
Nikolay: Right.
So you put it there basically when
you do deletes and it sits
there for 30 days, right?
And then automatically hard deleted.
Michael: Exactly.
Nikolay: And you can restore it
anytime and users usually appreciate
this feature and consider it as
good if you can undo your action.
I agree.
Like when we talk about some objects
inside the account, right?
So some data.
Michael: The only difference is
in this case is we didn't implement
the ability to view what's in the...
Like in those 2 cases, what we
could have done.
And that's actually, potentially
would have been nice for users.
Nikolay: Yeah, actually in this
case, this approach, now I'm
thinking, yeah, in this case, I
would probably choose some status
approach because it's actually
the same data, we just don't show
it in 1 place, but we show it in
another place.
So it's just some category, that's
it.
Right?
You're right.
So in this case, I would prefer
actually status, maybe timestamp
of changes, but maybe I would put
timestamps to special different
like audit, like log style append
only table to track actions
when some action was done.
Yeah, but you're right, actually.
Maybe this case, let's call it
in-place soft-delete when we keep
data in the same table and so on.
Michael: I was thinking use-case-wise,
that's if you're implementing
it to avoid or to make restoration
easier or undelete easier.
Nikolay: And viewing deleted data,
if some in some cases like
trash, trash data, if you want
to allow users to review it, Why
not?
Michael: Yeah, well, that's the
other like, there's another use
case for soft deletes I've heard
and seen people write about
which is to give you the ability
to examine it or to audit it
or to look through why it was done
or who did what.
So I think there are some...
Exactly.
And I know we've had a whole episode
on audit, but it could be
not necessarily for regulatory
purposes, but it could be for
debugging purposes.
Nikolay: And there we can have
a hard delete button for users
as well.
If like only there, not in main
place, but for like if object
already soft deleted, we could
allow users to actually delete.
But let's talk about Postgres spec.
This is UX discussion.
What else?
Compliance.
Sometimes you do need to delete
data for sure, like keeping it
only in some backups and also have
some policy to delete there
as well.
I mean, backups should be deleted
as well at some point.
Sometimes you do need to keep information
and soft deletes help
to comply some policies and regulations.
Depending on law you need to follow,
like it's hard.
But honestly, I prefer, speaking
of law, I prefer when I need
to delete everything.
But for convenience and as a business
owner, it's actually much
more convenient if you still have
data all the time and you can
review what happened and restore
it?
Well, sometimes, depending on the
nature of your business.
I see more and more guys follow,
like, it's like some compliance
rules, but I see more and more
companies implement the rule that
Slack deletes messages after some
time and everything is deleted.
I don't know.
Maybe it's because of costs.
Maybe it's some compliance, actually.
I don't know.
Michael: Well, I think there was
an age where this going back
to kind of philosophical side of
things, but I think there was
a time where people, I remember
reading headlines like data is
the new oil and things like that
and people thought hoarding
as much data or gathering as much
data as they could would be
valuable in the future for analysis
reasons or for mining, that
kind of thing.
And I feel like with a lot of data
breaches, especially, sorry,
I think I've got a siren in my
background, with data breaches,
I still feel like things flipped
and people started to think
of having data as a liability.
And the more data they held, or
the more personal it was, the
more at risk they are of, or the
bigger target they become for
attacks.
I feel like that's changed even
within the last, well, even within
my career of how people view holding
data if you don't really
need it.
Nikolay: Yeah.
Also performance reasons.
If you keep everything, it's hard.
I see many successful e-commerce
companies, they do need to actually
delete data, and having soft deletes,
they at some point decide
to clean up and actually delete
data.
So in place of soft delete, you
just have a column or a couple
of columns, and then you have a
lot of partial indexes, obviously,
maybe.
Right?
Maybe.
Maybe not.
It depends on the distribution
and selectivity you have.
And how could we improve this?
What about having a special partition
or a set of partitions
and include this column to partition
key.
It would be transparent for application
if we update status setting
from active to deleted.
In this case, the row will be deleted
in the main partition,
like actual working partition and
it will go to this kind of
archive partition, right?
Michael: Yeah, it would be like
moved.
So I guess it's similar in terms
of like write-ahead log impact
and but actually could it be higher?
I guess you're forcing no hot updates
but I'm guessing you're
not going to get hot updates anyway
because it's the delete that.
Nikolay: Yeah.
You cannot have hot update here.
So 0 chances because it's actually
deleted, actually inserted.
I guess hat updates is a good thing
to have, but in this case
you should avoid partial...
Back to a single table and just
a column.
If you want hot updates to be used
when you update this column,
you cannot use this column in partial
indexes in the where clause
of those indexes.
You must avoid it.
It works well only if you have,
as I said, only low part of your
table of deletes.
Like deletes are rare.
Michael: Yeah.
So another, like another solution
I see mentioned that I've not
used is using a view.
So defining a view where deleted
at is, has no value for example.
Nikolay: Yeah, I don't know.
You know like when you're in love
with something many years and
then this love is converted to
hate.
This is my relationship with views
because my, yeah, my thesis,
martial arts master, I don't remember,
was about updatable views
and then updatable XML views and
like, oh, I spent so much time
and I worked with them so much
time.
Now the only views I can deal with
now these days myself in my
projects where I have full control.
Only Postgres views.
Michael: That's it.
Nikolay: Everywhere else I avoid
them because I don't want to
deal with dependency hell later.
Because it's just a lot of overhead.
You just change something or add
column or drop column.
You need to recreate your views
and logs and logs when you issue
DDL, like, I don't want this.
But Yes, views can be used in many
cases, but you cannot create
an index on a regular view.
Again, I'm speaking of performance.
Just views will hide it, right?
And that's it.
I mean, hide some rows.
Michael: Exactly.
But it also makes the implementation
a little bit easier because
you don't have to update.
You don't have to make the application
aware of your soft delete
implementation.
Nikolay: Well, yeah.
Michael: You could just point the
application at the view now.
The Application doesn't
Nikolay: see those rows at all,
right?
Michael: Should we move on to some
other, like...
Nikolay: Approach.
Separate table approach.
Michael: Yes.
Nikolay: So straightforward approach.
Let's just have the table with
same structure, right?
Maybe not the same set of indexes
because we don't need all of
them, maybe, right?
And just insert to that table,
maybe using some trigger when
delete happens.
Michael: Yeah, so the 2 approaches
I've had are a shadow table
for each table that you want to,
yeah.
I don't know if that's the correct
term.
Nikolay: And I also hated this.
Because DDL, maintaining schema.
Michael: But I did see a neat alternative
to this in a blog post
by Brander, who's got a couple
of blog posts about DDL.
Nikolay: So let's discuss why it's
bad, like in detail, just
to clarify.
Good idea.
First of all, you need to create,
if you have many tables where
you want soft delete to be implemented,
you always need additional
table.
It's just overhead.
And also you need, every time you
create a column, for example,
or a name column or something,
you also need not to forget to
do it in the shadow table.
And living with such schema many
years in 1 project, I must admit,
it's a huge headache.
People always forget it.
You can try to invent some procedures
and automation not to forget
it, but still, deployments to production
when shadow table DDL
was forgotten, quite significant
risk.
So It's a lot of objects to deal
with.
Of course, you can put it to some,
say, schema named shadow,
for example, and keep the table
names exactly the same as in
the original schema, public or
something new, if it's a new project.
So public schema is not popular
anymore, right?
So, and it's good because it means
convenience because exactly
the same table names, but it introduces
a lot of overhead, it
requires good culture of automation,
like tooling, not to be,
to be always in a consistent state.
Michael: Yeah, I think there was
a lot, there are a lot of difficulties
as part of this.
Obviously, it does come with some
benefits, like easier to restore
than the next, the alternative
way of doing this.
Nikolay: Let me guess about alternative
way.
I also implemented it.
I didn't read this post by Brandur,
but I can guess because I
also implement I think we walked
on the same paths, right?
So you create a single table with
probably like some ID maybe
some surrogate ID then table name
who deleted timestamp and then
just JSON or JSONB column where
you just put all the data from
any table.
Right.
And you can...
Exactly.
Okay.
Why not?
Preserving column names maybe even.
Right.
So you...
Michael: Yes.
I think they used like a to JSONB
function.
Nikolay: Right.
And you do everything in triggers,
so everything again is transparent.
When a delete statement happens,
your row from any table which
has this trigger is packed and
sent to this table.
Very convenient.
I like this schema.
Michael: It's super.
So again, this is optimized for
like, this is harder to restore
from, right, because it's now in
a difficult format.
And especially if like the schema
has changed in between.
Nikolay: Well, this is a price.
You have trade-off.
Either you survive any schema changes
easily, or you need to
deal with some mutations during
restore.
You need to understand, okay, this
delete happened A year ago,
we had a different schema.
What to do?
You do need to do this.
Or you deal with schema...
You need to maintain mirror schema
changes all the time.
I would prefer actually deal with...
Since restore...
Depending on the project, of course.
If restore is not super common,
I would prefer this schema.
Super convenient, super universal.
Michael: Yeah, me too.
I think there's a lot of benefits
here.
Not for the 1 case I did implement
soft deletes in the past,
where the whole point was to make
restoration easier in a very
small number of cases.
This case is very different to
that.
But yeah, I do think there are
some tricky parts of this, though,
I think, for example, the compliance
thing is tricky here.
Like, it's another place that you
need to remember to delete
data from in the future.
If you get a GDPR request, you
can't just search on normal tables
for that data.
You also have to search this delete
log.
And it might be tricky to search
because it's JSON B data with
like unstructured.
So there's a there I think there
are downsides.
Nikolay: Right.
Is it fair to call this soft delete?
Maybe it should be called just
archiving?
Michael: Good point.
There's no definition of soft delete
that I know of.
It's not part of the SQL standard.
So I think it's more around like
what are the, what's it for?
And maybe we've lost sight of like
why are we implementing this
feature?
If it's for easy undeletes, maybe
this isn't the right solution.
But if it's for debugging or if
it's for different purposes,
then yeah, archiving makes some
sense.
But I guess it is, it's deleting
it in a way that it's not deleted.
That is the, you know, you mentioned
the, as a user seeing some
signs that it wasn't deleted, this
avoids a lot of those, right?
Like it's not in the same table
anymore.
Nikolay: Yeah.
Application can, If we define the
trigger function with security
definer, we can revoke or just
do not grant any permissions to
application user to deal with schema
or this table.
And that's it.
So application won't be able to
reach the data at all.
At very low, like, I mean, at database
level, database guarantees,
it's not possible.
But at the same time, Trigger can
still insert it because it
has...
Trigger function has security definer,
meaning that it has permissions
to run queries which are inside
the function using the level
of access of the user who defined
this function.
Michael: Cool.
I didn't know you could do that.
That's very cool.
Nikolay: It's called security definer.
When you create a function, trigger
function, for trigger you
need to create a function, you
just say Security Definer.
And I like it because you can encapsulate
some actions which
normal user, application user,
When I say application user, I
mean database user which is used
by your application.
Usually there are some tables that
this user cannot access.
For example, this shadow table
or shadow schema with multiple
tables.
But I still prefer single table.
But you can still allow, for example,
specific insert encapsulated
in this function and just grant
permissions for specific action
indirectly via the security definer.
Michael: Nice.
I like it.
Nikolay: Yeah.
I use it all the time for many
cases.
So yeah, better control what can
be done, what cannot be done.
So you basically it's good for
security reasons.
It's good to forbid everything
first as first thing, and then
just allow some specific action
in very controlled manner.
Intrigue.
Michael: I feel like there's a
phrase for this, isn't it?
Like security by minimum permission
or something.
I can't remember what the phrase
is.
It's like a security principle
of least permission, give things
the least permission they need.
Right, right,
Nikolay: right.
Well, in former life, we called
it a blacklisting, white listing.
I don't know why now it's not possible
to tell this, right?
But it means that you will just
say nothing is possible except
this, right?
And yeah, it's good.
So yeah, it's a good approach,
but you are right, Restoration
will be tricky.
You need to develop specific procedures,
deal somehow with schema
evolution.
And in this case, I would say if
we talk about some account deletion,
I would do this.
If we talk about something like
we have some objects user can
create, modify, or delete, and
we want support undo for deletion,
Like in Gmail with emails, when
you just delete email, it goes
to trash, and then you can undo
or delete forever, right?
There's a button, delete forever.
This is hard to delete.
In this case, I would prefer maybe
still in place, like in the
same, like just a column in the
same table status or something.
Michael: Nice.
Nikolay: Right, right.
Both are, we can say, both are
soft deletes with their own pros
and cons.
Michael: Yeah, seems very reasonable.
Nikolay: Do you think they can
coexist in the same project?
Not because they are, like decisions
were made by different persons,
different
Michael: developers.
I think I genuinely think that
the end of Brando's blog post,
he mentioned switching to this
in all but a couple of tables.
So I think they are currently a
good example
Nikolay: of a project that
Michael: has both.
Nikolay: You mean switching to
JSON B Shadow Table?
Michael: Yes, instead of you calling
it in place.
I think they still implement in
place in a couple of places.
Nikolay: The best thing about this
single shadow table with JSON-B
colon is that it's so easy to attach
it to any table.
It's just a trigger.
That's it.
It will slow down, delete a little
bit.
That's it.
Michael: Yeah, because of the triggers.
You could use it for the hard delete
and then have it for around
for a while longer.
Nikolay: We discussed about compliance
and so on, but, and I
mentioned that for business owners,
like there are fears that
like something was deleted, hard
to recover.
It's very good for audit purposes.
For example, you have some account
in your application, and there
are some admins, or like owners,
managers, and there are some
regular users.
Some users do something, they deleted
something, and you just
need to not only see and undo it
but see when it was done by
whom like
Michael: I saw another person in
a hacker news thread talk.
Yeah.
Not just audit but also what really
happened.
So I saw somebody who had implemented
a chat system.
Let's say it's a support business
to business support system
and 1 of your internal works for
your company users said something
they regret saying as part of this
chat and then deleted 1 of
those messages.
It's really useful for an admin
to see the whole history or you
know, you mentioned social media
and a lot of social media sites
that you edit a message, keeping
the old version.
I edited
Nikolay: this update, not delete.
So this leads us to like time travel
basically.
We cannot say, name it soft updates,
right?
But tracking all versions, it actually
also possible they are
the same approach.
And I think I did it as well.
So when you have delete with this
trigger with security, security,
definer function, trigger function,
which just archives previous
record.
In the trigger function, you use
old key name.
This is just record, right?
And you convert it to JSONB and
insert, right?
Absolutely the same trigger can
be used for updates.
With any update, maybe I would
have some checks if some actual
data was changed, not just something
small was changed, right?
But actual data, like message was
changed, right?
Title, message.
New dot title doesn't equal old
dot title, and so on.
If something, just to avoid, you
know, like spam archiving, which
like useless records in this shadow
table.
So if I see something valuable
has been changed with this update,
again, I back old to JSONB and
sort the same, like the same trigger
basically.
Right?
Michael: Yeah, I'm not sure.
It depends.
It depends how and when you need,
how often you need to be viewing
those older ones, because it might
be tricky to show the admin
those if you, how are you going
to select that data back out
to show them the old versions?
I guess it's tricky.
Nikolay: I would, by the way, no,
for updates, I think This is
what I did actually.
For updates, I would archive probably
both old and new records.
Because just, yes, to be able to
show diff.
Yes, it's like too much data to
store, but why not?
In this case, I would extend shadow
table with additional JSONB
column.
Like, delete...
Records related to deletes won't
have it.
Like, they would have only nulls
in that column.
For updates, I would have 2 JSONB
values.
Michael: Interesting.
Nikolay: Old and new, basically.
And then we can build some diff
if needed.
Right?
Michael: Maybe.
I think this is getting beyond
the scope of what...
But this is interesting.
I think my...
Nikolay: No, it's useful and easy
to implement and super universal.
You can again attach it to any
table and have simple audit like
that.
Michael: But what I don't like
or understand is because this
is a single shadow table across
multiple, what are you querying?
Like, how are you looking for that
in a, let's say it was a month
ago, in your, how you're efficiently
finding that in that shadow
table?
What are you searching by?
Nikolay: Well, depending on the
application, right?
I would probably in this shadow
table, I would probably have
some columns which would allow
me to search faster, or I would
deal with searching inside JSON-B.
Why not?
Genetics?
Michael: Well, yeah, maybe.
This slows down updates.
Yeah, there's a lot of...
I think that it's a tricky one because
if it's a shadow table for
all tables, you'd like having user
ID or like having other columns
that you could search by might
not make sense for some of those.
Nikolay: You can have null there.
Michael: True, true, true.
Nikolay: Well, yeah, I agree with
that.
Especially if you add gin, insert
this implicit insert, which
happens during delete or update,
it will be not fast at all because
of gin.
So yeah, overhead.
This is the price.
You have it.
Michael: Yeah, I like that.
It's another trade-off.
It depends what you want, and it
depends how much data you have,
and it depends what proportion
is ever going to be deleted, or
what proportion is ever going to
be restored or looked into.
But yeah, I think it's really cool.
I'll link up the blog post we mentioned.
Any last things you wanted to add?
Nikolay: Actually you know what,
during this discussion, I was
curious what we have in PostgreSQL
system and actually we have
exactly this approach I just described.
We have a shadow table, single
one for many tables to track deletes
and updates.
It's called audit log and we have
data before and data after
2 JSONB columns.
Michael: Interesting.
Yeah,
Nikolay: and actually a lot of
additional stuff.
This is just to be able to show
admins of organizations what
happened.
Michael: In pgMustard, we have hard
deletes.
It's slightly different because
it's just query plans, right?
And you can always submit them
again.
But if you delete them, they're
gone.
So we decided not to at all.
Nikolay: Yeah, and now an application
cannot reach this part
of database at all.
It's only possible like implicitly,
as I described, via security
definers.
So quite a good approach, very
reliable.
Michael: Yeah.
Nice.
All right.
Well, thanks so much, Nikolay.
Catch you
Nikolay: next week.
Thank you.
Good chat.