A weekly podcast about all things PostgreSQL
Nikolay: Hello, hello, this is
PostgresFM, your favorite podcast
about Postgres.
And my name is Nikolay.
My co-host is Michael.
Michael: Hello, Nikolay.
Nikolay: Hi, Michael.
And we have a very good guest today,
Sai.
Sai: Hey, Nik and Michael.
Nikolay: Founder of PeerDB.
We are very glad to have you here.
And we just discussed that we are
not inviting guests just for
guests.
We choose topic first, so it's
topic-centric discussion.
But my first thought when we discussed
we need to talk about
logical.
And also I saw your blog post published
recently.
My first thought was you're a very
great person to have for this
discussion, right?
So that's why we invited you.
Glad you found time to join.
Sai: Thanks Nik and Michael for
inviting me.
And yeah, logical replication and
logical decoding has been my
life since the past like 7, 8 months.
And we are trying to get into as
much depth as possible to, you
know, understand how it works and
probably down the line contribute
upstream as well.
But yeah, I'm very excited to be
here.
Nikolay: Cool.
I know Michael has a lot of questions,
so I probably will be
less active this episode.
But I will have questions as well
and maybe comments.
Michael: Yeah, I definitely have
questions.
But yeah, I completely echo everything
Nikolay said.
And yeah, we did a kind of a basics
intro episode to logical,
where Nikolay also talked us through
how to handle scale in terms
of major upgrades, at least.
So that's a common use case for
logical.
But 1 thing we didn't even touch
on, which was the topic of 1
of your recent blog posts, was
I listened back and I think protocol
versions got a single mention,
kind of like an off-the-cuff mention
at 1 point in the episode, you
know, 30, 40-minute episode.
So that was where I was thinking
of starting is, it'd be awesome
to hear a little bit about the
protocol versions, why you worry
about them recently.
Sai: Yeah, absolutely.
Thanks, Michael.
So the thing is logical replication
has this concept of protocol
versions that you could specify
as a part of the subscription
or if you are like an external
client who is reading the replication
slot you could do it as a part
of the start replication API which
lets you read the replication slot
and there are like four versions
that Postgres supports now with
PostgreSQL 16, right.
The first version is the default
version which decodes transactions
that are only committed so it doesn't
like start the decoding
process before the transaction
commits and the second is the
more advanced option which does
logical decoding of transactions
that are not yet committed, right
like so it basically does decoding
for in-flight transactions, right
that is number two and then third
is it lets you decode two-phase
commit transactions, that is third
and the fourth is it lets you decode
in-flight transactions and
apply them in parallel to the target
which is the subscriber
which is PostgreSQL basically, right.
Now in that blog which you're talking
about we compared like
version one and version two because
they are more like the common
ones.
Two-phase commits are not like
very common, right that's the reason
we didn't go into that piece but
the biggest difference is the
impact of two is it improves logical
decoding performance because
you are not you're actually decoding
while the transaction is
happening, right like you are not
letting the transaction finish
and then only start decoding but
rather you're decoding as the
transactions go on.
So the idea is you're giving more
time to the decoder, the WAL
sender process to perform the decoding
and now this has a lot
of benefits, right like number one
it reduces the slot growth, right
like so in that blog we talk about
a situation where we have
two long running transactions that
are being committed.
And with version one, the decoding
process doesn't start and the
slot keeps growing, growing, growing.
And only after the transactions
are committed, the decoding starts.
And then for a few minutes until
the decoding is finished, the
slot size remains the same.
Right now, this is with one.
With two, what happens is as the
transactions happen, decoding
starts.
Right?
Like, so you do see the decoder
doing its job.
And as in when the transactions
finish, the slot immediately
falls.
And the advantage of this is so
the summary was with version
one, the slot growth of like, I think
we showed like 5, 6 gigabytes
remain for like 5, 6 minutes until
the entire decoding finished.
But with version two, it remained
there for an instance, because
you know, like the decoding was
already happening as the transaction
was going on, right so this can
have tremendous impact in use
cases which have long running or
sizable transactions and that
is very common, right like we
recently working were working
with like a fintech customer, right
like where they had a lot
they had like sizable transactions
right like hundreds of like
you know operations happening in
a transaction and then these
transactions were like interleaved
and now the advantage with
like to the two would have helped
them a lot because as the you
know operations are happening the
decoder like you know just
decodes them.
With one the problem that was happening
was there was one large
long-running transaction which
took like an hour and then every
time the decoder needs to decode
committed transactions it was
decoding this long-running transaction
also so so the order of
time is more quadratic basically
with like a version one right
because you know wall is like sequential
right like so you keep
writing writing writing and then
like and then as and when there
is a commit like the decoder like
starts like working now the
problem is with version one as the
existing like the long-running
transaction is not getting decoded
still the decoder process
decodes it for like other transactions,
right?
So but with like, you know oh this
problem wouldn't have happened
because once this long-running
transaction is committed I mean
it's already consumed it's already
decoded if that makes sense
right so the high-level summary
is version two is very helpful
when there are like, you know,
long-running transactions that
are like, you know, interleaved.
Nikolay: So the lesson is upgrade
guys, right?
Just use the latest PostgreSQL version.
Sai: Version two comes with PG 14
version four is in PG 16.
Nikolay: Sorry.
I meant a small thing.
I've noticed you mentioned a subscriber,
which is PostgreSQL.
This phrase, attracted my attention.
It's not always PostgreSQL.
But parallelization which to apply
to long transactions it's done
only for PostgreSQL subscribers,
not for others, right?
Sai: Correct.
Exactly.
So the fourth version, which does
like, you know, parallel apply
of these in-flight transactions
is more relevant for PostgreSQL
to PostgreSQL replication.
The apply piece is, you know, PostgreSQL
and that's the standard
logical replication.
And logical decoding is more like
a subset of logical replication
where external clients like pg_BADGER
can read the slot and replicate
to homogeneous or heterogeneous
targets.
Nikolay: In your experience, does
it mean that there's no sense
to have this feature for other
situations like to Snowflake or
others.
Sai: The parallel apply feature
you mean,
Nikolay: right?
Right.
Sai: Yeah.
Good question.
So that is the onus of the ETL
tool.
So in PADB we have an option to
do parallel apply or not do parallel
apply.
So it's like a single flag that
we provide.
And the difference is with parallel
apply, we cannot guarantee
the order of transactions across
tables.
Nikolay: So foreign key and referential
consistency, similar
to multiple slots in regular Postgres
to Postgres.
Correct.
Sai: Very, very similar.
Right.
And then both, I mean, customers
do it, but the advantage of
parallelism is it will improve
latency, right?
Like replication, latency reduces,
lag reduces, etc.
But it doesn't guarantee, you know,
consistency across tables.
But the sequential does that, right?
So customers like we have both
customers.
Nikolay: Yeah, that's interesting.
I know like in many cases people
think, okay, we work at like
half of our capacity, no worries
if TPS grows using a single slot.
If TPS grows, we know the bottleneck
is on the subscriber, we will
just use multiple slots.
But then they realize that with foreign
keys, you need to basically
agree that they are broken temporarily
on the subscriber and
can cause a lot of troubles if
you point the application to such
nodes, right?
That's interesting.
Yeah, absolutely.
Makes sense.
Sai: And then I did want to call
out, right?
Like for that, I mean, there is
a setting for disabling foreign
key just like, you know, complete
that there is a setting called
session replication role basically,
you can set that as replication
and that is pretty standard, it's
like that's the reason Postgres
gave this setting so you can set
that to replication and it would
disable foreign keys and triggers
on the target.
Nikolay: But by default,
if you use multiple slots, it's
disabled.
Sai: By default, is it like disabled?
Nikolay: Yes.
I know it since recent work to
use it for upgrade, zero-downtime
upgrades.
Michael: Got it.
Nikolay: It's indeed disabled and
it's interesting.
So you need to think maybe you
shouldn't use it.
I mean maybe go back to a single
slot and just choose a different
time for your workload.
I mean, work on weekends, guys,
right?
Lower traffic and you can afford
working with a single slot.
This means only for like logical
for a temporary time, like for
upgrades, we don't need it.
Exactly.
It really
Sai: depends on the use case.
Nikolay: Yeah, yeah.
But if for a long time, forget
about foreign keys maybe.
And you said latency, but it's
also about throughput.
We need to process more bytes per
time.
Sai: Absolutely.
Nikolay: Lags can be nasty.
I don't like dealing with logical
decoding in Postgres 11, 12.
It's not pleasant at all.
So these improvements are exciting
and everyone should upgrade.
That's what I see here. You just
uncovered a lot of details.
That's great.
Sai: Yeah, absolutely. And the tricky
thing I mean with the good
beauty of like Postgres is
that like this is already inbuilt
in like Postgres core, right?
Like logical replication does
this by default.
It's just like, you know, a setting
you need to do say that like,
Hey, I want to use version 2 and
it will be significantly more
performant, reduce replication slot growth.
But if it's an ETL tool, who's
using logical decoding, the transaction
logic needs to be managed by the
ETL tool.
Because as you're reading like
in-flight transactions, we need
to keep track whether this transaction
is committed or not, and
then only we should push it to
the target.
It becomes tricky for ETL tools,
which we are working on in PADB
now as we speak.
But the beauty of Postgres is that
it just gives you the setting
out of the box and you just need
to upgrade your Postgres version
to 14, 15, and 16.
Nikolay: And with this, in 16 we
have an example.
I mean, the developers of these
tools like your company, they
now have an example of Postgres to Postgres
native logical replication to show how
it can be implemented.
So the reference example exists.
So it's good.
Yeah, that's great.
Sorry if my client interrupted
you.
No, not at all.
Michael: This is a really good
diversion.
But we've also, we've talked about
3 things already, right?
We've talked about throughput,
we've talked about lag, and we've
talked about slot size growth.
And I feel like when you said initially,
the performance of the
version 2 protocol can be better,
I'm guessing we're talking
mostly in terms of that lag metric.
Like we can start processing things
faster, therefore the lag
is lower, makes total sense.
And when you said the slot size
growth is reduced, I think I
Understand where you're coming
from.
I think the specific benchmark
in the blog post shows that the
growth is the same, but then reduces
it, like it kind of reduces
quicker.
But when you mentioned the like
interleaved long running transactions
I guess it's in a less synthetic
workload where you've got lots
of potentially overlapping long
running transactions the peak
will be lower in the version if
you're using.
Correct.
Sai: Exactly right. Like because
in the scenario we did it's like
just do 2 transactions that are
long running and then we just
had 2 of them right but then and
then they ended at the same
time but in real-world workloads
like it I mean it can be very
arbitrary right but with like version
2 as the slot consumption
is faster and like slot size falls
quicker right the cumulative
effect can be very significant
in slot growth.
Michael: Nice.
Sai: If that makes sense. And that
is the next benchmark we want
to do, where in the real world benchmark,
I want to see how does
the peak slot size like compare
with version 1 and version 2.
We are actually building that feature
so we can we plan to like,
you know, we have a few like customers
we are design partnering
with, you know, to implement this
feature.
And we have like, we know, you
know, what is the peak size slot
size we are seeing with version
1.
And with this design partnership,
we will get to know, okay,
what is the peak slot size?
My like understanding is it will
fall very quickly because like,
you know, you're giving more time
to the decoder, right?
Like it's more efficient.
And with these long running transactions,
it cannot be quadratic,
like the decoding process, right?
But that's a very good question
you pointed, Mike.
And that is the next benchmark
we want to do.
Michael: Nice.
So then the question then becomes,
are there any workloads?
And I'm thinking now that it must
only be if you had, let's say
you had a lot of long run transactions
that end up getting aborted
or rolled back instead of committed.
I'm thinking that might be the
only case where the version 1
protocol might be like you might
prefer to carry on using it
or are there other cases where
you might prefer to carry on using
the v1 protocol over the v2 one
as long as you're on 14 and above.
Sai: Even when the transactions
are rolled back the WAL is not
removed.
So it still needs to be like decoded
basically, right?
Like for future transactions.
That's exactly what happened in
a customer scenario where there
was a long running transaction,
which the customer killed basically.
It was running for like few hours,
right?
But still that impacted other transactions
because Postgres is
not yet smart where it removes
the WAL.
Nikolay: This is my favorite use
case.
So you can overload WAL sender.
If you just do a very simple trick,
you create a table with like
a million, 10,000,000 rows and then
delete rows in transaction
and roll it back.
Massive delete rollback, massive
delete rollback.
I like this workload in many cases,
this included.
And this makes WAL sender consume
100% CPU very quickly.
Because this transaction spams
WAL, writing xmax constantly
for a lot of tuples.
And then saying, okay, this xmax,
this transaction ID was rolled
back, so it doesn't mean...
Like they're still alive, right?
The next transaction comes, does
the same, and the WAL sender
is becoming crazy.
And I think there should be some
way to optimize decoding to
understand it, maybe.
Because if you start processing...
Ah, when it starts processing,
it's not known that this transaction
is already rolled back.
But maybe there is some opportunity
to optimize because if it's
known already it's already rolled
back why we can probably skip
better or something I don't know
I don't know details here.
Sai: That's a very good point Nik
and I was recently chatting
with one of the committers and we
don't even like skip rows is
my understanding basically like
so every time like we try to
decode basically So the immediate
optimization is if there is
an entry in the WAL of a rollback
transaction, then we don't
decode that entry in the WAL.
But you bring up a great point
here.
So there is a lot of scope of improvement
there.
Nikolay: Yeah, and this led us
to the conclusion that we won't
be able to deal with logical at
a few modifying transactions
per second.
It was wrong conclusion because
this was pathological workload.
And In production, I think it's
a very rare case when a WAL
sender is hitting 100% of a single
core.
Or it's not, I'm not sure.
Because the problem is you cannot
scale it.
You add a second slot, but you
see a WAL sender is hitting 100%
of a single core CPU, with the
constantly or from time to time,
and you think, okay, I will try
to distribute workload among
2 slots or 4 slots, 8 slots.
But this doesn't help.
All of them are hitting 100% of
different cores and that's it.
So this can be bottleneck, but
it looks like it's very rare right
now in production.
Sai: Yeah, absolutely.
And that is another optimization
that like, I mean, the community
could do is where we, I mean, currently
if there are multiple
slots, right?
Like the decoding process runs
for every slot, right?
If there are ways in which the
decoder runs for just 1 slot and
the other slots reuse these like
decoded changes, right?
Like if that makes sense.
Nikolay: Yeah, exactly.
The same work that's done multiple
times.
Everyone is processing everything.
This is what like definitely there
is opportunity for improvement
here.
But I think people just don't see
this as a bottleneck in production
often.
That's why it's not optimized yet.
Sai: In production, what we see
is like, it's like these 1 off
batch operations where customers
delete like a bunch of data
or add like do copy with like millions
of rows where there is
like a WAL spike.
And you know, the recommendation
that we give right, like is
guys have more disk.
I mean, it is, I mean that it's
hard because like faster and
like, you know, larger disk because
logical replication can
only scale up to 25,000 to 30,000
like, you know, messages per
second, basically.
So have larger disk so that like
once this small spike like falls,
it'll catch up.
Okay, sure.
At that time, there is more latency.
Nikolay: It's a good CPU if 25,000,
30,000 messages per second.
Because for PgBouncer on modern
Intel and AMD, I didn't see more
than 20 messages per second.
We can compare this like also messages
per second.
PgBouncer processes transactions
just by passing them to backends,
Postgres backends and returning
the result.
And the logical replication also
similar, some messages, just
some messages, right?
Different type of work because
decoding maybe is more consuming
but 30,000 sounds very good.
Sai: Yeah, 100%, but the thing is
it has to be done well, no I mean
like it has to be done properly
because one of the tips that
we do right, like which have seen
lack in a few ETL tools, is always
consuming the slot we cannot give
up on consuming the slot we
need to constantly consume the
slot and flush the slot right
and not to lag and the thing
is if you give up and start
reading it again, it doesn't read
from the confirmed flush LSN
it reads from the restart LSN.
And sometimes that restart LSN
can be very like old.
And I don't want to get into when
Postgres updates the restart
LSN because it's a more complicated
thing.
But the idea is when I give up the connection and reacquire the connection
again, it starts from the restart LSN, which can increase my
decoding times a lot.
So this is very common.
When we initially built pglogical, if the slot size was 150, 200 gigs,
we were giving up the connection periodically.
And whenever this slot is big, for like 10, 15 minutes, it was
just stuck in Walreader.
It was not even getting 1 change.
So that is one tip I recommend, where you always consume the replication
slot.
Nikolay: I was trying to recall another problem I saw when I
inspected what's happening if you start using multiple slots.
And it's kind of an interesting problem when you start publishing
changes to a single slot you use for all tables, for example.
For example, for upgrade, it makes sense.
And this is great.
It works very well.
But if you start distributing tables among multiple slots, you
need to alter the table, I think.
Anyway, you need a lock for each table individually in the group.
So you divide groups, and then you need to start publishing changes.
And this is a similar lock as, like DDL, basically.
No, not similar.
It's not an exclusive lock, but it's surely.
I remember it was, it got stuck.
It was not, it's not an exclusive lock.
Yeah.
So it's, it's not that bad as, as altering a table and adding a column.
So I remember it got stuck.
It couldn't complete, but at least it didn't block others.
Sai: Right.
And was this while creating the multiple slots, or was this just
like creating publications?
Nikolay: Creating a publication.
So you specify tables for which you need to create a publication.
And this needs to acquire locks on all of them.
It can be successful for some of them, not successful for others.
It's a single operation, so you're waiting.
And this is not good.
But not as bad as DDL, which can block selects, which come after.
We discussed many times how dangerous it can be under heavy load.
Yeah, that's interesting.
Anyway, multiple slots are tricky.
This is the lesson here.
So you need to test and learn before using them.
Michael: Yeah, absolutely.
Nikolay: Good.
What's next?
Michael: Well, before we move on from that, I'm interested in,
Are there any other like hard,
so you mentioned that rule of
thumb, 25 to 30,000 messages per
second.
Like, are there any other like
rules of thumb or hard limits
or anything that people could be
thinking, oh, we've probably
not tuned it enough because we're
not getting that much throughput
or that low lag or something like
that.
Nikolay: I guess this number depends
on the, on the core, on
the type of CPU you have.
If it's old, it can be 10,000 for
example.
Sai: Yeah.
A hundred percent.
And then, you know, this setting
is this number that I shared
is more on the higher end in a
non-local environment where the
source, the target are not local.
They are in the same region, but
in different boxes.
So network latency is a very big
factor.
1 thing we do want to see is, what
is the logical replication
performance when both are on the
same local server, which is
not in real world that doesn't
happen.
So this, whatever I'm saying, is
network latency bound.
Because it's single-threaded, and
the network latency kicks in,
and it can only scale up to say
20-30 Mbps if it's like you know
30-40 Mbps if it's done well also
right so those are also some
things to keep in mind now coming
to gotchas right like a few
things that I did like you know
make a note of of logical replication
first is it doesn't support like
replication of DDL commands
which includes like adding of columns
like dropping of columns
like you know adding new tables
creating indexes like you know
truncating tables right like so
these are not supported out of
the box.
And this is 1 of the common concerns
that we hear from customers,
because in the real world, you
know, people, I mean, you add
tables, you like you have these
Django migrations, which like
add a bunch of indexes etc etc
right.
Michael: Partitions we talked about
last time.
Sai: Yeah, but they don't do that
also.
Michael: So yeah, when you say
customers raise it as a concern,
is that before starting or like
realizing it later on?
Where does that pop up for you?
Sai: It happens both ways, right?
Like because I mean, obviously,
Postgres logical replication
is the easiest way to like, you
know, replicate databases, right?
They start off and then there's
a column that is added and then
logical replication breaks.
And now the good thing is you can
add a column manually on the
target and it would continue But
then this becomes like difficult
in production, which is when they
reach out saying that hey you
guys like are also, you know Doing
using logical decoding and
you support Postgres as a target
and that's how they come to
us.
And we do support schema changes,
but only add columns and drop
columns.
We don't yet support creating tables
and indexes and truncate
and all of that yet, basically.
And sometimes it happens before
itself.
We're like, okay, this is a, I
mean, we keep adding tables, we
keep like adding columns very often
and like this doesn't work
for us.
So like we want another solution.
Michael: Yeah, nice.
A couple of the other things that
I see people confused or struggling
with are things like sequence synchronization.
I guess that's just a one-off task
at the beginning.
Do you see anything else catching
people out?
Sai: That's a good question, right?
Like I think DDL commands is number
1, replication slot growth
issues is number 2, which we talked
a lot about basically.
Yeah.
It's like, and, you know, always
keep reading the slot, avoid
long running or like open transactions,
you know, monitor slot
growth, right?
Like fourth is, you know, use protocol
versions, right?
Like upgrade to like latest Postgres
versions and start using
the version 2.3.4 which is faster.
The third thing that we see is
Postgres logical replication doesn't
replicate toast columns out of
the box.
You need to set up replica identity
full to make that happen.
And replica identity full could
be expensive if you are not on
Postgres 16 because for updates
and deletes it could lead to
like a sequential scan.
Nikolay: And this thing what happened
CTID was used or is used
or like why it's better than 16?
Sai: It's using indexes now.
It's like using indexes basically.
Nikolay: Okay.
Sai: And the thing is,
Nikolay: If you have them.
Sai: Yeah, if you have them and
if you have the right indexes
And another thing is some customers,
right?
Like who have primary keys, but
they have toast columns.
We still like need to do replica
identity full and that helps.
So replica identity full with primary
keys is more efficient.
And there is a good blog that the
Zalando guys wrote, which
I really liked.
And that is something that I refer
to customers where, hey guys,
like you can add this.
I think it increased like CPU and
IO by 30%.
That was their analysis.
But that is about like toast columns
where you need to have replica
identity full and the impact of
that can change based on scenarios.
Michael: Yeah.
Awesome.
I didn't know about that one.
I would love to read that blog
post.
If you send it to me afterward,
I'll include it in the show
notes as well.
Sai: Absolutely.
And the fourth thing that we have
seen is logical replication,
at least, you know, logical decoding
did not support virtual
and generated columns.
If there are like generated columns,
they didn't show up on WAL
decoding.
And the way we, I mean, we easily
solve that, I mean, is on the
target that customers set up like
a virtual column or they use
like DBT or like some transformation
tool to, you know, backfill
this, right?
Like, so that is number four.
And then last but not the least,
the fifth one that we see, right.
Like, I'm still, you know, trying
to discover, it's like slot
invalidation issues where the restart
LSN becomes null, and have
seen that happen in two scenarios.
So the first scenario is there
is a setting called max_replication_slot
size or something, I mean, like which
actually limits the size
of the slot, and in this, this setting
was recently added in 13 or
14 which lets you like safeguard
from storage out of storage
issues. So as and when the slot
hits, you know, over a 2 terabyte,
right, like or if that's the setting
that you have, it basically
invalidates the slot, and in that
scenario, the restart LSN
becomes null, and the slot gets
invalidated, and you need to restart
logical replication.
The second scenarios we have seen
is sometimes it happens randomly,
also, unseen, and we are still
figuring out why that can happen.
And now we reached out to the community,
right, like, and we heard
that like it could be because of
cloud providers' Postgres, right,
like because cloud providers have
a bunch of like backup operations,
some, some forks, you know, of Postgres, right.
And then it could be. I mean, the
community doesn't say that,
like, I mean, their point is, like,
can we try to reproduce this
on Vanilla Postgres, right?
Like not cloud.
And then that was not becoming
easy for us.
But like, we have seen that out
of the 7 to 8 months, 1 to 2
times, randomly the slot gets invalidated
and we don't have like
an RCA on when that can happen.
And immediately we think that we
check, hey, is the max_wal_size
for the slot, like, less than, is
it set up, right?
But it's still set to minus 1, it's disabled,
but still, like we run
into these issues.
So I'm on that quest of figuring
out when this happens because
it's a scenario that we need to understand better.
Michael: And that's terrifying.
And yeah, how do you even begin
to go about reproducing that?
Good luck.
Sai: Exactly. So we were able to
reproduce this on one of the cloud
providers.
They're like for high throughputs,
like over 50K transactions
per second, every 15 to 16 hours,
it was like getting invalidated.
Right?
Michael: 50,000 per second and
every 15 to 16 hours.
Sai: It's a lot.
That's a lot
Michael: of messages.
Sai: That's not real.
I mean, I don't think that happens
often, but yeah.
Michael: Right, wow.
Cool.
I don't feel like we talked about
use cases much.
Is that because it kind of these
aren't use-case dependent?
I guess other than people using
two-phase commit, do you see any
differences between people that
are using logical replication
for analytics databases versus
for, do you see any difference
between use cases basically?
Sai: Good question.
Like I mean, logical replication
is very powerful.
I mean, it supports like, you know,
migration, online migrations,
number 1, then it supports like
online upgrades, right, like
which Nik mentioned, right, like
number 2.
Number 3, it also supports like
HA and backup scenarios.
So I have seen some customers who
use logical decoding or
logical replication to have HA
across regions or in like hybrid
environments.
Like, okay, I'm running stuff on
my on-cloud and I still want
to have a backup or like a replica
on my on-premise like, you
know, self-hosted environments,
right?
And the only way it's possible
is via logical replication because
cloud providers don't give access
to like WAL, right?
That is number 3, where like HA
and backups is another use case,
right?
4 is replication to like non-Postgres
like targets or even Postgres
targets for like workload isolation,
use-case isolation, where,
okay, I have my OLTP database running
on Postgres.
Now I want to pipe this to another
database.
Like it could be Snowflake or Postgres
for analytics or for search,
like I want to pipe this on like
a Clickhouse or like Elasticsearch,
right?
Like for optimizing search.
Yeah, logical replication is very
powerful and I believe that
it's going to be the future for
Postgres and down the line it
can open up like active-active
use-cases also right like where
you know, okay, you have like cross-region and I recently read
a blog where folks were using logical
replication for like active
active I'd like so I mean it opens
up like a plethora of use
cases and makes Postgres more,
you know, powerful extensible
and yeah
Michael: Yeah, I read that blog
post too.
I think it was version 16 change
that made it possible to like
not replicate changes that got
made by logical replication.
So you could kind of like set up
logical replication in both
directions.
It seems super dangerous to me.
Like, okay, cool.
Would you recommend it at the moment
or do you think we need
more features before that?
Sai: I think it's more, I mean,
it's more like intense than that,
right?
Because like conflict resolution
and all of this is tricky, right?
Like, and, you know, I mean, if
it's like very workload specific,
where like, okay, I don't touch
the same rows, right?
Like, and, you know, maybe there
it could work.
But I mean, out of the box, like
implementing that is tricky,
and it requires more like effort.
And maybe we go there in the future,
right?
Because we are seeing a few cases
with customers where they want
Active-Active and there is not
an out-of-the-box solution.
Nikolay: Why do people need Active-Active?
Sai: Good question.
I think I have like a lot of thoughts
here.
So like the thing is, I think it
really helps with like HA, right?
Like for example, I remember this
scenario in Microsoft where
customers were having like a bunch
of like SQL Server Active-Active
across regions and then this was
a bank and then 1 of the region
went down And then every second
is like thousands of dollars.
And then they immediately pivoted
to the other region and it
kind of worked seamlessly.
Nikolay: But doesn't it mean that
all clusters need to perform
the same writes?
And I didn't buy this idea that
replaying changes through logical
replication is less expensive than
applying them initially, which
was advertised in the BDR documentation.
I didn't buy this.
I hope to find time to test it
properly and write a blog post about that.
So far didn't find time yet for
this.
It's an interesting topic because
like in BDR, it means like, for
example, you have multiple regions,
4 or 8.
I know such cases as well.
And they suffer because everyone
needs to perform the same writes.
And if there is a spike in 1 region
or this, it's not a resilient
system at all.
I don't understand this.
Sai: Yeah.
A hundred percent.
I agree.
I think, I mean, that's the reason
there is a big opportunity
there.
And I recently saw one company,
like what the PGYedge who are like
trying to do something like
this.
Nikolay: Yeah, new wave of this.
I know every CTO is dreaming about
this, I know it.
Because we are like, we work in
multiple regions.
Maybe we should first check that
all our physical standbys are
in different availability zones
than primaries, right?
Sai: There are more basic things.
Nikolay: Yeah, but multiple regions
is a great thing to have,
but still I'm very skeptical.
But I see in DBA's mind, there
is a mind shift as well.
Like 10 years ago, all DBAs said
you don't need this.
But things are changing.
So in Postgres 16, this to avoid
loops, infinite loops, right?
This feature, it's interesting.
So I don't understand use cases
and how we can scale writes if
we don't split them, like in sharding.
We split them in sharding, and
that's great.
Here, everyone needs everything.
Well, I'm very skeptical.
I don't understand this part of
I don't understand this part
of landscape.
Sai: Yeah, 100%.
And like, I think it becomes very
critical in tier 0 use cases,
guys.
I mean, not like, I mean, tier 1,
tier 2, tier 3, I think it's
kind of lesser, but like, it's
more these tier 0 where like,
it's like a Chase bank or something
like that right? Like when
it becomes like hyper.
Nikolay: But you think it's possible
to build good system?
Sai: No, I don't think it's, I mean
with Postgres I am also very
skeptical, but I think there is
an opportunity there, right?
Like, and community, both community
and like, you know, I mean,
I mean, community will be very
critical here, right?
Like, I don't think that it can
happen just by logical replication.
Nikolay: Maybe if we build system,
categorizing data in tiers
as well and replicating writes
only for most critical data between
regions, right?
Michael: We're probably tiptoeing
into a different topic here.
It's
Nikolay: not different.
It's not different.
Many people think logical leads
to multi-master, definitely,
like so-called old term.
Michael: But I think the tier 0
use case feels like it's a long
way from being supported by native
Postgres.
There are a lot of providers out
there for that kind of thing.
And I think the general term for
it is being called distributed
SQL or like that seems like dist
SQL is what I've been referring
to.
But I would warn people against
using logical for this in the
short, like anytime soon.
Yeah.
Nikolay: Okay.
Sai: I agree.
Nikolay: Okay.
Then what, what other questions
do you have?
Michael: I didn't have anything
else.
I wanted to thank Sai.
Did you have anything else Nikolay?
Nikolay: Well no.
Thank you so much.
It was interesting.
Thank you for coming.
Sai: Absolutely, guys.
I really enjoyed chatting with
you and thanks for inviting me.
Nikolay: I hope you won't stop
posting interesting technical
posts.
Sai: No, no, no.
So content is our currency, guys.
I mean, the thing is, for me, I'm
building the company because
I'm very curious, right?
Like now the thing that is like
haunting me is that slot
invalidation, not like, I'm not understanding
why it can happen.
Right.
Like, so, so I've been like, so
because of curiosity, we will
be publishing a lot of content,
but
Nikolay: yeah, benchmarks, graphs,
data and so on.
Reproducible also.
Yeah, that's great.
Thank you so much.
Michael: And also, if any of our
listeners have seen that as
well, have any theories or have
a reproduction case for it, let
us know.
Nikolay: Mm-hmm.
Absolutely.
Michael: Good.
Wonderful.
Thank you.
Thank you both.
Take care.
Sai: Thanks, guys.
Bye-bye.
Bye-bye.