Postgres FM

Nikolay and Michael discuss logical replication — some history, initialization, change data capture, how to scale it, some limitiations, and ways that it is getting better.
 
Here are some links to some things they mentioned:

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

[00:00:00] Michael: Hello, and welcome to PostgresFM, where we show you about all things PostgresQL. I am Michael, this is Nikolay, this is episode 63, what are we going to talk about today, Nikolay?

[00:00:10] Nikolay: Let's talk about logical replication decoding, everything logical.

[00:00:14] Michael: Everything logical, yeah, I like it. so this was your choice, why did you choose it?

[00:00:19] Nikolay: Just because I realized we didn't talk about it and also I wanted to I'd like to congratulate our colleagues from GitLab with very, very good, successful upgrades, two major upgrades without downtime at all and without data loss at all. So it was based on logical, so I think I have something to say about logical now, based on that work.

I expect from them... We actually discussed with them and agreed that there should be materials published, textual and also talks, we actually could invite them to discuss it, it's probably a good idea, but it will be mostly about major upgrades involving Logical, but let's discuss Logical itself.

And because it's interesting that more and more systems are using it.

[00:01:06] Michael: Yeah, absolutely. And it's getting better and better, right? It's, been getting better in recent releases, it's about to get better again in the next release, and you were just telling me before the call, there's some exciting changes coming in the year ahead, hopefully, as well.

[00:01:21] Nikolay: yes, I would like to use version from 2025, I think, or so, because it will be definitely much better. It's getting improvements every year. there are still pieces that people need, but don't have yet. For example, DDL, replication of DDL, one of the biggest pieces left unsolved. But, uh, it's already much better than five years ago, definitely.

[00:01:46] Michael: Wonderful. Shall we start there then? Shall we start with a bit of history?

[00:01:50] Nikolay: Yeah. Let's talk about history. Probably. I remember when there was discussion that replication should not be implemented in Postgres core. It was like 15 years ago. There was opinion that replication is something like external to Postgres. Uh, let, let's rely on other tools.

It's not the job for Postgres. It's all strange, similar to discussions of autofillover and so on, and, such systems we are created, one of the most noticeable were Slony, once of Slony, and Londeste from Skype, and also Bukardo. I used two of these, Slony and Londeste, a lot of work, to set up and to maintain and constant headache.

Especially Slony, it was lot of work, a lot of maintenance work. And these systems were logical replication. They were based on, for example, on triggers, and some insert or any write happens on a table. It's also written to different table and propagates to different Postgres. And, was... Like, overhead was not only maintenance, but also performance, obviously, because you, basically, you double your writes. So, you need to write to two places, and to avoid data loss, it slows down all transactions, writing transactions. yeah, so, I mean, Londister was great, it was based on PGQ, PGQ was implemented In a very good way in terms of scalability, in terms of performance, because it was using partitioning and like everything was well, but still some maintenance was needed and it was not a pleasure to deal with it, but in version, like later in version, I don't remember, and some version, maybe 9. 0, not 9. 0, physical application was created and it became the main way to replicate data from one node to another.

And so, I mean, people, before physical replication implementation, people relied only on third party tools like Slony and Londesty, and only on logical replication to achieve HA, high availability. With physical replication, where wall records are replicated as is, it became much, like, less headache. I mean, physical replication also not trivial, but it's, it's much more reliable.

And it's based on the idea that we have wall already, right? Wall is our way, write ahead log, is our way to recover after crashes. So if we crash, we just redo all changes since the last successful checkpoint, and we are already in consistent and good state again. So the idea was let's be constantly in recovery state on standby nodes, constantly recovering. that's why, by the way, the function to check if it's primary or standby is called pgIsInRecovery, which is confusing because you might think we recently crashed. We didn't, I mean, we probably crashed, but if we crashed and in recovery, you cannot connect. If you can connect and ask pgIsInRecovery, it means it's not in recovery, it's just a replica. I mean, standby.

[00:05:21] Michael: Yeah, that's funny, I didn't know that.

[00:05:23] Nikolay: it also reveals the implementation details, how

replication works. I was created based on recovery. And then, uh, hot standby was created, meaning that you're not only recovering on the standby nodes, but you also allow read only queries because why not? And so of course a lot of work was done to achieve this, but, uh, I'm just like trying to have some high level overview of history here.

And then after several years of development of, physical application, obvious idea was the very same wall records we use to provide recovery and physical application can be used also for logical application. It means that with logical, it's called logical decoding, you just. Read what is in wall record, if it, like, insert this, this data, okay, we can say let's execute insert, SQL insert.

By the way, it's very confusing for those who use tools like pgWalldump, and it's now in fresh Postgres, it's available. Through SQL, you can read wall yourself and see what's there. It might be confusing to see insert, delete, , and update uppercase because it's not SQL. It's physical operation, tuple level, like If you delete, hundred records, a hundred tuples, uh, you will see a hundred independent deletes, delete, um, records in wall. And they will be translated to 100 delete, SQL statements on logical replication. By the way, it also shows that, logical can be implemented differently, in different ways.

For example, in MySQL it's very different. It can be statement level. You can just take a statement and propagate it to a different machine. But in logical, decoding logical replication in Postgres, it's based on wall. The same wall we use for physical application. So, if you have multiple... Tuples, operation with multiple tuples, you will see multiple, SQL queries on the subscriber side.

It's called Subscriber, Publisher, Subscriber. Logical has different, logical replication has different terminology than physical replication. Not primary standby, but Publisher and Subscriber, things like that. So, logical decoding was created, and it opened. We opened a lot of new possibilities to replicate based on the same world, to replicate at logical level.

Meaning you can recreate cluster, you have different cluster, you lose your bloat, you can lose your indexes, you can have different set of indexes. You can have different major version, and you can have different database management system. It can be not Postgres, but something else. Based on logical decoding, right?

So, it's great. I mean, The development of it is not super fast. I mean, it's a very complex thing, and other systems implemented things Postgres lacks many years. But, currently, in 2023, logical decoding, logical replication, especially with Release 16 15, a lot of work was done to make them mature, and it's already can be used.

For very heavily loaded systems, dozens of terabytes, hundreds of thousands of TPS. And this can help you to achieve zero downtime operations. Of course, you need to understand many things to achieve this. A lot of interesting things. And still some things are missing. Things are improving. That's like my small overview of history. Details. I don't know anyone who could, like, consider slowness or roundness these days. I mean, maybe some people think it's better right now, but I would just consider always logical decoding and replication if you need logical level.

[00:09:17] Michael: The only other thing I've seen it used for that you didn't mention is that you don't have to, unlike physical replication, you don't have to replicate everything. So you can take a

[00:09:28] Nikolay: Oh, you can filter. Exactly. Or maybe you are forced to use logical. For example, if you Postgres, RDS, Cloud SQL, they don't provide physical replication connection outside of their... So it means that if you need to replicate between clouds, for example, or on your managed Postgres service, or you need to migrate, you are forced to use logical replication.

It's a bad thing, but it's also a good thing because cloud providers, they also rely on logical replication to migrate customers. From Oracle or from, from

competitors. And this helps, uh, develop, uh, logical application further because they are big users of it.

[00:10:18] Michael: And it's, in order to achieve zero downtime, right. It's, kind of the change data capture part of it. If, if, I mean, we could migrate from those systems through dump and restore, right? But not without downtime. it's the key aspect that enables you to do it without or with minimal downtime.

[00:10:37] Nikolay: So downtime is not related to dump restore. I'm not sure why you,

[00:10:42] Michael: So let's say we're, let's say we're, moving from Oracle to Postgres and we want to, we, we could take a, a dump modifier and send it through something and put it and restore it as Postgres, right? We could take the, the,

[00:10:59] Nikolay: Well, right, I'm, I'm not an expert in Oracle and... First of all, AWS and GCP, Google Cloud, they have so called Database Migration Services, DMS, and they support from system to system, different directions. Also, there are companies like, systems like Click and Fivetran, commercial systems. Also, there is Debezium open source.

So, they help to make, to create replication from... Different database management systems. When we talk about logical decoding and replication, it's actually either between Postgres or from Postgres to somewhere else.

[00:11:34] Michael: True, True,

[00:11:35] Nikolay: But I don't know anyone who migrated from Postgres to Oracle.

[00:11:40] Michael: No, not yet actually, I don't know

[00:11:42] Nikolay: Right. But let's discuss in general, picture in general.

So when you need to use it, you have two stages. First is initialization, and second is so called CDC, Change Data Capture. Streaming changes. Not streaming. Replicating changes. Because streaming is about physical replication. Let's not mix terminology. So, initialization is very challenging if you have a lot of data.

I mean, yes, it's kind of dump restore, and it can be, dump restore can be with dash J, means like with parallel, of course, you put a lot of stress to your disk system or network system, but if you can afford it, it's good, I mean, you can move faster, and there's a big trade off, for example, if you consider of logical replication for if you have, for example, 10 terabytes and 100, 000 TPS, What you do prefer?

to read from disk slower and have, like, the problem is when you read while you are reading, you already open logical slot because you need future CDC to start from the very beginning of your, like, when you read, you read it with, uh, BridgeDump reads it with repeatable read, transactionalization.

Why? Because we need a snapshot. So, all data, all tables are in consistent state, all foreign keys matching, and so on. Even if you run pgDump j, there is a snapshot there. And this snapshot cannot go away, because we need to finish initialization, copy, initial copy first, and then start CDC from that point where this snapshot was created.

So, if you read slowly... Disks are okay. I mean, like 10 percent of your capacity in terms of disk, IO, but, it means that you need to keep this, Postgres needs to keep the snapshot very long, and this will affect, guess who? Autovacuum, right? Because Autovacuum cannot delete dead tuples, which became dead after our snapshot creation.

And so we have a trade off. We want to disturb Autovacuum less. So we want to move faster, but it means that we will put a lot of stress on to our disk system. And every time you need to decide based on capabilities of your infrastructure in particular situation. So there is no universal recipe here. For example, if we say we are creating this from some system.

Which nobody is using. For example, it can be a dedicated replica with no failover, no load balance flags in Patrony, it's like a dedicated replica. Probably we can utilize 100 percent of disk I. O. capacity and move as fast as possible, because it will report with constant byte feedback, it will report to primary.

Or we can turn off constant byte feedback, in this case, allow replication to be practically paused for a long time, but we need to adjust settings for it.

[00:14:47] Michael: Is this, this is new though, right? Is this, this is new in 16?

[00:14:51] Nikolay: What?

[00:14:52] Michael: being able to do logical replication from a physical

replica.

[00:14:57] Nikolay: it's a good question. Um, you can create a slot on the primary. Uh, remember it's position in terms of LSN and snapshot. And then you can ask, open transaction on standby, a repeatable re transaction, asking to deal with the same snapshot and read it from there. Yes. You just, it is, this, this can, it's possible, it's like synchronization, and pgDump actually has not only j but it also has, uh, I don't remember, option, it has ability to specify snapshot ID. So you can dump from standby, not disturb primary. But standby, we, we have different dilemma, different trade off, hot standby autovacuum issue again. Or able physical replication while we are dumping, because we have a long running transaction. So, host by dilemma is interesting thing.

We can zoom into it one day. But in general, it's possible. And of course, if you do it from primary... Uh, yeah, it's, so, so what you're referring to Postgres 16 feature to, have logical replication including CDC from standby. This is great. This is super great.

[00:16:22] Michael: Okay, cool. Yeah.

[00:16:24] Nikolay: But I'm talking about initialization

[00:16:26] Michael: Yeah, that's a good point.

[00:16:27] Nikolay: And last bit of information here, by the way, there is a very good post from Michael Pacquere. I'm sorry for pronunciation. I am constantly using this. There are some blog posts which I'm using for years and recommend all the time. And people say thank you all the time. Like, I think we should have like something like...

Um, golden library, how to say, like, golden hole of blog posts, because there are some posts which are useful for very long time. So there is a post about how to synchronize position of slot and pgDump. It was in 9. 5, I think, it was implemented in 9. 5, this kind of thing. The only problem, if you want to know...

Logical slot position, not only in terms of LSN, but log sequence number, but also in terms of this snapshot ID, you need to create this slot, you cannot create this slot with SQL, you need to use a replication protocol, so it's kind of different type of connection, and many vendors of logical replication systems, including Qlik, Don't understand that for years and I tried to help them, they don't understand.

They continue using SQL, in this case you cannot synchronize properly and you have duplicates if your subscriber lacks unique keys. So inserts might overlap and happen twice. So, this is initialization. One last bit about initialization. You can convert physical replica to logical. This is super cool for very large databases.

You just use recovery target LSN. You create slot, you know LSN, you physically replicate until this LSN LSN and then just reconfigure Postgres to continue with CDC, with logical replication. It's possible.

It's

[00:18:11] Michael: This is so cool. last time when we were talking about replication more generally, you brought this up, but I think it was a theory. You'd not, you'd done it in tests, but you'd never done it in production. I believe now... You must have done. Well, I'm guessing that's how GitLab did it.

[00:18:25] Nikolay: one of the tricks is this. The only problem if you first, achieve recovery target LSN, switch to logical, and then upgrade, this is a problem, and there is a post on hackers which discusses while on logical you cannot run pgUpgrade, you need to first run pgUpgrade and then switch to logical, and this is one of features I think which is also under development to make it officially supported.

Without manual actions. But it's doable already. So, yeah, this is what was used there. again, like, how to understand what you need? Which kind of provisioning you need? If you have, like, less than 1TB, probably you will be fine everywhere. If you have 1 10TB, probably you would like to read faster to...

It cause less bloat. By the way, if you cause bloat on old system and you are going to switch the new one, you don't care.

[00:19:22] Michael: It's a temporary

[00:19:23] Nikolay: okay, some degradation on since logical replication, in this case, we are dealing with fresh cluster with you are losing your bloat. Like like, have a joke like. With RDS, they don't allow physical replication, they allow only logical replication, so they don't allow me to download my bloat.

Like, I can download only my data, losing my bloat, but what if I need my bloat? In some cases, for example, testing, like lower environments, for experimentation, you probably want your bloat to be preserved, right? Because this would help to, have more, like, realistic testing. More, more like in real production,

[00:20:07] Michael: I mean, I get what you're saying, but you could create your own, like, in a way you are creating your own bloat on the logical replica, right? If you're, if you've got Autovacuum running in the same way and you're running the same workload, you probably are getting bloat in yourself.

[00:20:25] Nikolay: right? But

imagine

[00:20:26] Michael: not the same bloat.

[00:20:28] Nikolay: exactly, if, if you have like 90 percent bloat, rel pages is

higher, it will affect plans. So

it's, it's,

not, it's not a good problem. Yes. Yes. So rel pages.

[00:20:41] Michael: I'm very familiar with this problem, but you're getting, you're, you're getting kind of like a different bloat.

[00:20:46] Nikolay: Different bloat is different, I mean, and also, okay, but it's a different topic. So, if you have 1 to 2 terabytes, for example, you probably want to be very careful optimizing how you do this initialization of logical application. But, also important, if you want everything or some part of it.

[00:21:07] Michael: True.

[00:21:08] Nikolay: Above 10TB, I would probably think about, if above 10TB, and if you want everything, it's better to convert physical replica to logical, preserving all bloat, by the way,

and

[00:21:23] Michael: what? So converting a physical replica to a logical replica?

[00:21:28] Nikolay: Again, you

can, you have, you have physical replica, you replay, you say, uh, recovery target will send, putting

the...

[00:21:36] Michael: with you now.

[00:21:39] Nikolay: You create a slot, you know LSM, you put it to your physical replica. And say POST, for example, or PROMOTE, depends. I don't remember, this is also interesting detail. I don't remember our decision actually here.

We experimented with it. You can say either PROMOTE or POST when it reaches LSN. Once it's done... You can say, OK, now it's logical replica. All, all my bloat will be with me. It's, it's bad and good. I mean, it's good for testing. It's bad for, I mean, I would like to get rid of my bloat. If you provision logical replica in regular way, you are getting rid of bloat.

It's like dump restore. It's a good thing, but in this case, uh, we, if we have a lot of data, like 15, 20, 30 terabytes, probably we want to convert physical replica to logical. But only if we don't need to take on it. For example, if we want, if we need to take 10 percent of it, it's kind of resource waste.

Like you will need to perform a huge cleanup afterwards.

[00:22:40] Michael: Yeah, of course, because you're physically replicating the entire thing for only 10 percent of the benefit.

[00:22:46] Nikolay: Right, because physical replication doesn't allow you filtering. While with Logical, you can say, I want these tables only,

[00:22:54] Michael: Or even, like, as of, as of recently, you can even do, like, just these columns from these tables, or even, even, like, where, it's pretty impressive how much you can filter that now.

[00:23:05] Nikolay: right. Next step would be, let's probably replace PII with something, right?

[00:23:11] Michael: Yeah, I guess that is a change data capture problem. Yeah, you could, that's a cool idea, yeah.

[00:23:16] Nikolay: Like, let's create some rules and in some tables, some columns, we will apply some modification on the fly. And during initialization as well, this would be super cool feature, I think.

[00:23:30] Michael: Nice.

[00:23:31] Nikolay: Gexp or something. Yeah. I think it's doable. And it's doable right now already if you go down to logical decoding level, losing all features from logical replication, native for logical replication, and just apply yourself, making some transformations if needed. But again, like if it's a small database, regular approach. It's a, it's a larger database. Think about this trade off, disk versus, overhead related to vacuuming. And if it's a super big database, consider a physical to logical approach. It's an interesting trick, underappreciated, I think, right now.

[00:24:06] Michael: Nice. Do you think there's a limit, size wise, like, if you're accumulating changes on the primary during that, that flip, what's the impact?

[00:24:18] Nikolay: This is already about CDC.

[00:24:21] Michael: Okay, cool. Okay. Sorry. Do you want to move on to that?

[00:24:24] Nikolay: Yes, yes. So, so CDC is interesting. First of all, most of the problems you will experience with logical are related to CDC. Inability to apply changes from standbys, from physical standbys until Postgres 16, which hopefully will be released the day before we publish this podcast.

Since we are not online, right? Fingers crossed, right? So, this solved. Like, why this is important? Because nobody wants to start from scratch if primary is down. For example, and it's good to be able to, uh, replicate from standbys also in terms of. Overhead. Of course, logical decoding is small overhead. There are cases when, , Valsender, which sends data for logical replication, can saturate, , one core.

And if you have multiple, like, single slot, it's always single process, well, single Valsender. And usually, its utilization is quite low. In some cases, for example, if you delete a lot of rows and then rollback, delete a lot of rows, rollback, delete a lot of rows, rollback, Like, this is a harmful workload for logical replication, and at levels 1 or 2 walls per second you will see...

Wall sender saturating your CPU, even quite good one, like Xeon, Modern, like Cascade, like something, you will see 100%, and it will be bottleneck of logical replication, which, by the way, is very hard to solve, because even if you So if you, for example, decide to use four slots, for example, all of them will be parsing everything, even if you need only part of data, they will still parse all data.

So you will see four slots. But it's very rare. It's interesting. In my experience, I saw it in non production, and I thought it's a big bottleneck, but in reality, our real workload is not like that. It's not like marking a lot of tuples with X marks, meaning they are deleted and then rolling back. No, no, no.

Normal, normal writes are different, so probably you won't see a situation of false senders. If you do see a situation of false senders, or CPU situation, it's bad. It's not solvable right now. Bad news. But, again, it's rare. So, probably you don't, like, being able in Postgres 16... Being able to perform CDC from standbys, from physical standbys, it's good because you don't have overhead on primary.

different thing, what happens if primary is down, I don't know if it's possible to... Like, I'm new to PostgreSQL 16 also. So, big problem is how to sustain various failures. Logical applications should be also highly available, because right now, at least until Postgres 16, if... Primary is down, or the node, like, from where you perform CDC, Change Data Capture, is down.

You need to start from scratch, you need to redo initialization, and this is a huge headache. I see, the work to sustain switchovers, failovers. ongoing for Postgres 17, and I also know, using some tricks, Kukushkin implemented for Patroni, Kukushkin implemented support for switchover. So, we stream, we logically replicate from primary, but if we switchover, we don't lose slots.

Slots should persist. this problem, I think it's not solved in 16, but it looks like Good chances it will be solved for and it's solved in Patroni using some tricks, using your own risk. I remember, I think, Craig Ringer, Craig Ringer helped in

the GitHub issue with some good insights how to implement it correctly, but officially it's not supported. I mean, officially by Postgres it's not supported. And it's a big problem of CDC, right?

[00:28:28] Michael: yeah, that makes sense that I didn't know it was solved in, or there was a solution for it in Patroni. That's very cool.

[00:28:35] Nikolay: Yeah, so there are two problems, big, big classes of problems, high availability and performance. I already touched a little bit both. About high availability, node is down, okay, use Patrony and this feature is good. What else bad can happen? For example, DDL can happen,

right? And DDL is not solved. It's in commit first, requires review. There is, I think, version 4 of patch proposed. It's good to see that it's in work, in progress. This is super good if this will be solved. In future Postgres versions. Because if DDL comes, usually, at least you can mitigate it in many cases. I mean, what happens, uh, CDC stops.

I mean, logical replication stops. The slot starts to accumulate a lot of changes. Risk to be out of disk space. And you need to just, you just need to replay the same DDL on the, subscriber in most cases. It will help.

[00:29:42] Michael: And then it will restart automatically or you

[00:29:44] Nikolay: Yeah. Yeah. It will pick it up. here we like must say logging all DDL is a good idea. Always. All DDLs should be logged. Must be logged. You will like save a lot of nerves if you have logs of DDL.

For example, here you just fix it manually. I did it manually many times. Some DDL happens, especially if you have some automated partitioning.

And, yeah, yeah.

[00:30:11] Michael: I'd heard people, like, for example, if you're doing a major version upgrade using know.

no changes while we're doing the upgrade, but if you're partitioning that's happening automatically like

[00:30:25] Nikolay: You need to post automatic partitioning.

[00:30:27] Michael: Yeah, or create the next few or like, create the next few in advance, like, depending on how you're doing things.

[00:30:35] Nikolay: Right. Another interesting thing that some DDL can be left unnoticed. For example, if some index was created on the publisher. Nobody will notice, because indexes don't matter, but you can lose this index, and like, one month later, you will realize you have slow queries again, and you need to recreate it, because it doesn't work.

Index creation won't block you.

[00:30:57] Michael: No, but it's also not replicated.

The, well, that's also a feature though, isn't it? Like, I like the, I've heard of people using it for analytics database where you want the data index in a different way or you want, like, You want to do roll ups on it or something, and you don't want those indexes, so it might be, I wonder if they're going to do it so it's optional to, or you can control which DDL gets replicated.

[00:31:21] Nikolay: All right. Yeah, I agree, yes, yes. That's a good idea, and with logical replicas, so we don't have hosting by feedback problem. But we might have problem with... Deleg. For example, in many cases, one slot is not enough. and I'm talking not about wall sender, which is quite rare. I'm talking about logical replication worker on the subscriber side, and it's an ability to catch up all the changes in proper time.

It can be related to disk IO or CPU situation. I saw both. And in this case, you might want to consider multiple slots. But here we have an interesting problem many people don't realize. It's a minor problem actually. So the problem is that if you publish everything, this DDL is very light. You say, I'm going to publish all tables to this slot.

Okay, it's easy. Just, we can just continue. But if you want to balance, first of all, you need some balancing algorithm, and it's good if you base this algorithm based on tuple statistics from pgStatAllTables. Saying, okay, we want to distribute writes between 4 slots or 10 slots, like, as even as possible.

But then you are going to publish a bunch of tables to one slot. And this publishing requires alter table. And after table, guess what? It requires a lock. Unlike publishing all tables.

And when you want this lock, guess who can block us? our old friend, Autovacuum running in transactional zero parameter prevention mode. Which everyone, every DBA should hate. Because you prepared beautiful procedure, you are going to implement, to run it at 1 a. m. not sleeping, and then you forgot about this. So any changes, any, like, any DDL, anything, any complex changes, always should keep in mind that Autovacuum is yielding and will kill itself, unless it's running in transaction ID wraparound prevention mode, which won't yield and you need to take care of it yourself.

So this is interesting, but fortunately, this kind of issue, this kind of conflict won't lead to downtime, unlike alter table at colon. Because alter table add column will block selects. This kind of log, I don't remember details, but this kind of log, uh, which is needed to define, publication for table.

Even if you cannot acquire this log and see it for 10 minutes, for example. It happened some, some day in my, in my experience. You will just be blocked. This is the worst outcome. Selects will still be running. This is good.

[00:34:12] Michael: What about modifications?

like, updates, deletes?

[00:34:16] Nikolay: yes, so, DML won't be blocked. So, probably, likely your users won't notice it. So, it's, it's a super interesting thing how to distribute between slots. And there is also another very big caveat from having multiple slots. With single slot, all things are simple. With multiple slots, for example, imagine table A and table B, and they have foreign key between them. Table A is published to one slot, table B is published to different slot,

[00:34:46] Michael: Like a race condition or something.

[00:34:48] Nikolay: slots, two different legs.

Right, so, so foreign keys is a problem. By default, they will be ignored, by the way, by logical replication. I didn't know about it until recently. So logical replication ignores foreign keys and allows... Eventual consistency, if you use multiple slots. So Postgres, uh, eventual consistency is, uh, eventual consistency is a, um, official approach in Postgres.

As

[00:35:14] Michael: Wow.

[00:35:16] Nikolay: just use logical and multiple slots, that's it. And you can enforce it, of course, but you won't like it if you enforce it. So, it depends on the situation, of course, but if, for example, it's for big migration from major Postgres version to another major Postgres version, or you deal with glibc version change, something like operational system change, in this case, probably you want to allow it, just don't redirect read only traffic to such replicas.

[00:35:45] Michael: Yeah. Yeah.

[00:35:46] Nikolay: I don't, like, of course, don't redirect the data, because it will be split brain. Or multi master. Split brain, multi master, it's almost the same, right? I'm joking.

But,

[00:35:55] Michael: you probably want a short pause before resuming traffic. Like

[00:35:58] Nikolay: of course, to catch up all the slots, to ensure no data change, no changes are lost. Of course, of course. And in this case, it will be already in a consistent state again.

Because publisher guarantees it. Foreign keys on publisher's side guarantee it. But it's interesting, right? So if you use multiple slots, you cannot redirect it only to AFIC as well. It means testing strategy is different.

[00:36:24] Michael: the only thing I don't think we've covered that I had. to look out for was, I've seen a few people, port issues with sequences. So like, those not synchronizing between, the publisher and the subscriber.

[00:36:37] Nikolay: Honestly, I don't

remember. details. Like, I think there was some work done to support it. Eventually it should be supported. Not in 16, right?

Not yet. Right. But it's not a big problem. You just dump all values and you... Propagate them to new primary and just add some constant for safety.

sequences are 8 bytes. So if you add one million, nobody will notice it. I mean, in some cases, of course, big gaps are, is a problem. It depends. But, uh, it requires some additional coding, but it's not as, uh, annoying problem as a lack of DDL replication, in my opinion, in my case, in my cases I had.

[00:37:22] Michael: Yeah. It's, or it's solvable as opposed to not solvable I've just seen it trip people up on their first attempt.

[00:37:29] Nikolay: Right. Right. Or just use UUID, but version 7, please. Which is not officially supported by Postgres, but some, some, uh, libraries in your favorite language already support it. By the way, news from there is that, standard progress is happening. hopefully we will see standard, uh, releasing version 7. I mean, RFC for UID version 7 soon, and we already have a patch for Postgres.

If things go very well in Postgres 17, we'll support this.

[00:38:03] Michael: Very cool.

[00:38:03] Nikolay: Postgres is waiting for R f C to be finalized.

[00:38:08] Michael: Nice. And these are like timestamp ordered

[00:38:12] Nikolay: They, they call it Le Lexi. Lexi, or, yeah, so, so basically if you order by new, new ideas will be close to each other, right.

Orderable UUID, right?

There are many things about logical. There are many things. I don't remember everything. I just wanted to share some most, like, challenging things I had recently.

[00:38:37] Michael: Yeah, that's wonderful. I've learned a lot.

[00:38:40] Nikolay: about multi slot especially. I've learned myself recently many things about it, but I think, honestly, if you just, you can...

The good thing about... Logical replication, if you do it very carefully. Not allowing slot to accumulate of a lot of data to be out of disk space. By the way, Postgres 15, I think, got, or 14, got, previous versions, got some setting to say maximum number of the slot than to allow out of disk space. Right, so, if you do it carefully, users shouldn't notice your experiments. And you can run these experiments on production and test it. Prepare good procedure and when it comes to deployment, you already have everything verified right on production, right? So this is good thing about it.

And if you just start with one slot and see it's enough, it's good. You're

lucky. If you have, for example, multiple thousand... Tuple writes per second. For example, 10, 000 tuple writes per second on primary. It's quite significant load. So in this case, probably you will need to have multiple slots. In this case, if you have partitioning, you will say thank you to yourself, because if you have huge table and partition, it will go to one of the slots, representing, for example, like 50 percent of all load. Partitioning helps here. It doesn't help in some cases, but here it helps a lot. To redistribute writes among slots.

[00:40:16] Michael: Mm

[00:40:16] Nikolay: And then, uh, if you use multiple slots, be careful with this foreign key issue. Because, uh, your replica will be inconsistent in terms of referential integrity at any given time until you perform switchover.

And switchover is interesting. Probably we should discuss switchover failover one day, also.

[00:40:38] Michael: Yeah. I liked your idea of the algorithm to work out which tables should go where, but the foreign key thing makes me think if you've got kind of clusters of tables that, have foreign keys between them and a separate cluster that... Have one, like, if you could separate those, that would be cool.

[00:40:54] Nikolay: I explored this idea with machine learning with k means. It, it, it works quite well in some cases. Depends on, on your schema, so you can, right, and we, we know k here if you want four loads. Our K is four , so it's easy. But, uh, unfortunately in well developed systems, relationships between tables are usually quite complex, but it's, it's a valid idea. I agree.

[00:41:21] Michael: Yeah, awesome. Well, thank you so much, Nikolay. Thanks everybody for listening, and see you next week.

[00:41:28] Nikolay: C o of course, as usual, reminder, not to forget, we have YouTube, we have. podcast published everywhere, literally. So, Spotify, apple Podcasts, everything. Just go to posts, fa and subscribe and you'll see it. And choose your favorite method. As usual. Please subscribe. Please rev reviews, stars reviews, and leave ideas on YouTube or on Twitter, what you want.

We have a document, uh, where you can add your idea for future episodes, some feedback. So we would really appreciate it. And on YouTube, we continue publishing Polish, transcripts, subtitles, right? So if you want, you can recommend your friends who barely understand my Russian accent or, Michael's British accent and, uh, Subtitles help a lot.

And also they can be automatically translated quite well to many, many languages by YouTube itself. So you can see. subtitles in your language,

[00:42:32] Michael: Yeah, so yeah, YouTube's great for the transcription and for the translation. And the podcast ones, we do edit out some of the, the ums and the ahs and also some bits where we go off topic a little bit too much. So if you, if you want the slightly shorter, version, then those are there for you too.

[00:42:49] Nikolay: right, right. So we invest in quite an effort to, to, these subtitles and we will continue improving. Okay. Thank you. See you next time. Bye.

[00:42:59] Michael: Bye.