Postgres FM

Nikolay and Michael discuss major and minor version Postgres upgrades — what they are, how often they come out, and how regularly we should be upgrading.

Here are links to a few things we mentioned:

Postgres versioning policy
why-upgrade (by depesz)
postgresqlco.nf (by OnGres)
postgresql.conf comparison (by Rustproof Labs)
pg_upgrade
Logical replication
CHECKPOINT
amcheck
Locale data changes (e.g. glibc upgrades)
ANALYZE
Upgrades are hard (summary of panel discussion by Andreas 'ads' Scherbaum)
spilo
Recent pgsql hackers discussion about using logical and pg_upgrade together

------------------------

What did you like or not like? What should we discuss next time? Let us know on social media, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:

Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:

Jessie Draws for the amazing artwork

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Host

Nikolay Samokhvalov

Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

037 Upgrades
===

Michael: [00:00:00] Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today?

Nikolay: Hi Michael, let's talk about upgrades.

Again, your choice as usual. It's a very boring topic, which I don't like, but we all need to do it. I like to always be running the freshest version of Postgres, including the minor version, but no, unfortunately we don't have CI/CD with automatic deployments to all Postgres installations, right?

Michael: That's one of the reasons I really like this topic, though. I think a lot of people want to be on the latest version often, and there would be benefits to doing that, but there are enough downsides, or enough friction at the moment, that it's not always easy for people to be on the latest versions, and there can be quite a lot of work involved.

So I wanted to discuss that a little bit: things we can do to minimize that, and tips on things to look out for. But yeah, I was actually surprised how [00:01:00] many people... I did a quick poll on social media before the episode, and I think about a third of people are like you: they like to upgrade every year. Well, I guess that doesn't necessarily mean they're on the latest version, but chances

Nikolay: At least once per quarter, if we're talking about minor upgrades, right? And...

Michael: Yes. So should we cover that first, actually: what the differences are?

Nikolay: Yeah, let's discuss. Minor upgrades are security or bug-fix upgrades, and most often they don't change anything in terms of functionality, though sometimes behavior changes a little. Major upgrades: every year we get a big new major Postgres version with a lot of new things. Usually good ones.

Michael: Yes. It's not like some projects, right? Some projects follow semantic versioning (SemVer) and will release new features in minor versions, whereas Postgres doesn't. And I think that's really [00:02:00] important. Not only does Postgres release new features only in major versions, a major version also doesn't necessarily mean breaking changes, whereas on some other platforms a major version naturally brings breaking changes. It is, though, the only place Postgres will introduce breaking changes.

But that doesn't mean there will be any. I think they're quite rare, actually; Postgres is really good at maintaining backwards compatibility. So I think sometimes people are a bit too scared of major versions in Postgres. Obviously you need to test, but if you read the release notes, there aren't many things that your existing system will stop being able to do.

Nikolay: Right. Well, yes.

So, speaking of differences between minor and major versions, it's still worth remembering that Postgres changed its versioning scheme from three numbers to two numbers roughly five years ago, when Postgres 10 was released after Postgres 9.6, [00:03:00] and this caused two issues. Some people started to think that there is such a version as Postgres 9, but major version 9 never existed. And some people, vice versa, started to think that... no, there's no such problem. Sorry. Right? Or, I don't remember.

Michael: Yeah, I didn't see anybody confusing, for example, 10.0 and 10.1 and thinking those were two different,

Nikolay: Yeah. I also

Michael: versions. But I do see people referring back to 9.5 and 9.6 as if they're the same version. That's when it changed: 9.6 was the last point release.

So 9.6, even though I'm calling it a point release, was a major version. Then we went to 10.0, and then to 11.0 as the next major one after that. So even back when it was nine-point-something, these were major versions released annually.

And yeah, looking back now, [00:04:00] the oldest supported version no longer has that confusing numbering. So it must have been more than five years ago, which is cool.
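A small sketch of the numbering rule described here: before Postgres 10 the first two numbers form the major version, and from 10 onward only the first one does. The helper names `parse_pg_version` and `same_major` are hypothetical, and the sketch only handles released versions (no beta or rc suffixes).

```python
from __future__ import annotations


def parse_pg_version(version: str) -> tuple[str, int]:
    """Split a released PostgreSQL version string into (major, minor)."""
    parts = [int(p) for p in version.split(".")]
    if parts[0] >= 10:
        # 10 and later: MAJOR.MINOR, e.g. "13.10" -> ("13", 10)
        if len(parts) != 2:
            raise ValueError(f"expected MAJOR.MINOR for 10+, got {version!r}")
        return str(parts[0]), parts[1]
    # Before 10: the first two numbers are the major, e.g. "9.6.24" -> ("9.6", 24)
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.MINOR before 10, got {version!r}")
    return f"{parts[0]}.{parts[1]}", parts[2]


def same_major(a: str, b: str) -> bool:
    """True if both versions belong to the same major release branch."""
    return parse_pg_version(a)[0] == parse_pg_version(b)[0]


print(parse_pg_version("9.6.24"))      # ('9.6', 24)
print(same_major("9.5.25", "9.6.24"))  # False: two different major versions
print(same_major("10.0", "10.1"))      # True: minor releases of one major
```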

Nikolay: But we still have some production systems on old versions which, unfortunately, are no longer supported and haven't been upgraded yet; they lag a lot in terms of upgrades. Returning to the question of when to upgrade, there are two different directions. First, we all want to be up to date and benefit from all the fixes, including security fixes, and also from new features, including performance-related ones. At the same time, my nature is more of a developer's nature. Although I advise a lot of ops people and companies on Postgres administration, I always tell people that my nature is development. I always try to move forward with new features, new exciting stuff, and so on.

But of course, over time, since I have many years of experience, [00:05:00] I developed some practices to avoid moving too fast and too early to untested things. And at the same time there's a big idea: let's not use the latest version, because it's probably not ready for our very serious production case.

And this is reasonable. For example, when a new major version is released, some people say: okay, I will wait for a couple of bug-fix releases. Like, okay, 15.0 is released, but I will install only 15.2. I will wait. And this is reasonable, because when a new major version is released, a lot of new testing happens in reality, in production, and bug fixes follow.

Right? And you can benefit from it if you wait. But if everyone thinks this way, imagine: it won't be tested well enough. So it's, it's...

Michael: I also think in [00:06:00] practice it hasn't worked out like that. At least since I've been following Postgres, there haven't been major issues in the .0 releases except on one occasion. In the last seven or eight years, there haven't been any huge issues in a .0 that weren't also in the .1, except for 14.0, which had the... which was only...

Nikolay: 14.4, yes. But...

Michael: Exactly. So that's the only example...

Nikolay: A lot of corruption cases happened as well, and so on. No, well, in all versions something happens. For example, CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY...

Michael: That's the example I'm talking about from 14, right?

Nikolay: Right. But there were several problems with CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, not only in 14; they also happened before. [00:07:00] Issues happened in 13, or it appeared in 12, I don't remember. In 11, maybe. Almost every year something happened with this unfortunate but very important functionality, which doesn't follow ACID principles, because it can leave you with some leftovers.

And it's not transactional, of course, but under load this is what you need: CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY. And this is only one example. There are issues in GiST, and some issues in GIN indexes. Every major release always has some issues.

Michael: Of course, right. Otherwise we wouldn't need bug fixes. But all I meant was that the biggest issue I've seen in the last six or seven years wasn't fixed until .4. So anybody who waited until .1 or .2 didn't benefit from that strategy in that case.

Nikolay: Yeah, that makes sense. And this example shows the downside of this tactic: let's [00:08:00] wait a couple of bug-fix releases. I agree with you. And also, security and bug fixes can happen at any time. It can happen with 10, with 15, anytime.

Michael: Well, until it's out of support. And that's the other thing driving upgrades, right? If you want security fixes, you need to be on a supported version of Postgres, and that's quite important. Five years is a nice window in terms of length, but it's also not that long for large enterprises.

By the time you've tested an upgrade, and maybe that's taken you six months to a year, you've already lost one of those five years, and then you need to start testing the next one before that one goes out of support. So even large companies probably have to look at this on a three-to-four-year horizon at minimum.

So even if you're not doing it every year, even if you're waiting so that you're not on the latest version, you want to make sure you're on a supported version. Your window isn't that long, but it's long enough. I think it's a very good [00:09:00] policy.
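To make that arithmetic concrete, here is a rough sketch that assumes the roughly five-year support window from the versioning policy and whole-year granularity (real end-of-life dates fall on specific release dates, so check the policy page for exact ones). `usable_support_years` is a hypothetical helper name.

```python
SUPPORT_YEARS = 5  # approximate upstream support window per the versioning policy


def usable_support_years(release_year: int, adoption_year: int) -> int:
    """Years of upstream support left once testing is done and the version is adopted."""
    end_of_support = release_year + SUPPORT_YEARS
    return max(end_of_support - adoption_year, 0)


# A major version released in 2020 and adopted a year later, after testing,
# leaves about four years before you must already be on the next one.
print(usable_support_years(2020, 2021))  # 4
```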

Nikolay: Yep. So, long story short, if we are good Postgres ecosystem citizens, we should start using a new major version as early as possible, maybe on smaller or less critical databases first, to help with testing. By testing here I mean using it in production and delivering feedback. And if a bug happens, okay, you participated and found something. But I also understand those who wait a little bit.

Michael: Yeah. I also think that if you've got a system that would particularly benefit from a new feature (it might not be one of your more critical ones), there might be a reason to take that risk with that database, right? The benefits of the new feature might outweigh some of the risk.

I've definitely seen some people skip some major versions if there aren't any major features in them that [00:10:00] would make a big difference for them, but then go quite quickly onto the next major version because it has a feature that's particularly interesting and would help them a lot with some other problem.

So I've seen that strategy a little bit as well, kind of picking and...

Nikolay: Or, since an upgrade might require a lot of effort from many people, sometimes companies do major upgrades only once every two years, just to save on overhead, because it's painful. There's still a lot of pain there. But minor upgrades: let's talk about minor upgrades.

Michael: Yep.

Nikolay: So, do you think it's always worth upgrading to the very latest minor version?

For example, if I'm on 13, which version do I want?

Michael: I can't keep track. It's probably the number of quarters since the...

Nikolay: It's always printed on the main page. Well, not always, but in the latest releases and so on. So now it's 13.10. All right, so it's...

Michael: So [00:11:00] about two and a half years since it came out, I...

Nikolay: The 11th minor version in this major branch.
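A sketch of checking how far an installation lags behind the newest minor release of its branch, for Postgres 10 and later. The `latest_minor` mapping is an assumption: you would fill it in yourself from the latest-releases list on the postgresql.org front page (13.10 for the 13 branch at the time of recording). `minor_releases_behind` is a hypothetical name.

```python
from __future__ import annotations


def minor_releases_behind(installed: str, latest_minor: dict[str, int]) -> int:
    """How many minor releases an installed 10+ version is behind its branch's newest one."""
    major, minor = installed.split(".")  # 10+ versions are MAJOR.MINOR
    if major not in latest_minor:
        raise KeyError(f"no latest-minor data for major version {major}")
    return max(latest_minor[major] - int(minor), 0)


latest = {"13": 10}  # filled in by hand from the "latest releases" list
print(minor_releases_behind("13.7", latest))  # 3
```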

Michael: Good question. I'd say most of the time, yes. I don't see much downside to upgrading minor versions. I guess we're going to talk quite soon about some of the difficulties of major version upgrades, but we don't have those with minor versions. A lot of the cloud providers make it extremely easy...

Nikolay: But they lag, they always lag, because...

Michael: Do they? Sorry, but for minor versions I think they're quite good.

Nikolay: No, I mean, it depends, but sometimes it's several weeks, and if it's a security bug, that's not good.

Michael: Yeah, agreed. So what do you think? Should you always be on the latest minor?

Nikolay: Yes, I think so. But you still need to test it. Some automated testing, some workflow should be developed: test first on lower environments for a couple of days after the vendor release. And also extensions. [00:12:00] They have their own lifecycle, except those that are contrib modules inside the Postgres distribution, Postgres contrib.

So they also require testing. I had a bad incident causing a segfault, when an extension was released without proper testing and happened to reach a critical production system. It wasn't tested in both cases: not by the developers who decided to release it, and not by the SRE who forgot to put it on a stop list for upgrades and test it properly. They had automatic upgrades of the extension.

It was CentOS or Ubuntu, I don't remember, but it was an automatic package upgrade without proper testing. So, two mistakes, and we had bad consequences. Some testing is needed, but in general you should be on the latest minor version, and there is a perfect resource to check the diff in terms of [00:13:00] functionality.

why-upgrade.depesz.com. That's "why-upgrade" with a dash, dot depesz dot com. I always recommend it.
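One way to avoid the unattended extension upgrade in that story, on Debian or Ubuntu, is to pin the extension package with `apt-mark hold` until it has been tested, then release it with `apt-mark unhold`. The sketch below only builds those commands; the function name and the package name are hypothetical, and RHEL-family systems such as CentOS need a different mechanism (for example the dnf versionlock plugin).

```python
from __future__ import annotations


def apt_hold_commands(packages: list[str], hold: bool = True) -> list[list[str]]:
    """Build apt-mark commands that pin (or release) packages, then list current holds."""
    action = "hold" if hold else "unhold"
    return [["apt-mark", action, *packages], ["apt-mark", "showhold"]]


# Hypothetical extension package name, for illustration only.
for cmd in apt_hold_commands(["postgresql-15-myextension"]):
    print(" ".join(cmd))
```

Keeping the hold in place until the new extension version has passed testing on a lower environment turns the stop list into something explicit rather than something to remember.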

Michael: Yeah, it's fantastic. It's really good for major versions too. I like that it has big red warnings for security issues. It's quite a nice resource if you're ever interested in searching the text to see whether there have been any changes to a certain feature, or any commits mentioning a certain feature, in the last couple of major versions, or if you want to see what's in specific minor versions.

It's a really good resource. Should we talk through the options people have for major versions?

Nikolay: Let me interrupt you here. I just wanted to point out that on why-upgrade there is also a diff for major versions: which options were removed, which were added, in terms of...

Michael: Yes, configuration.

Nikolay: Yes, yes. So it's useful as well, because it shows [00:14:00] not only entries in the release notes but also the configuration options diff, which brings us to another resource.

I mention postgresqlco.nf quite often for details about configuration options across different major versions.

Michael: There's one other resource whose name I've forgotten, but Ryan Lambert from Rustproof Labs released it, and it shows you not just the config: it'll show you any defaults that have changed as well. I'm not sure if the depesz one does that, I mean when it's an existing configuration option but the default has changed.

Nikolay: Default is a very, very fragile topic, because there is the default defined in the Postgres source code, and there is also a default, sometimes different (for example, for shared_buffers), which is defined by the official PostgreSQL apt or RPM package. So [00:15:00] which default are you talking about?

Michael: I think he builds from source, so I suspect it's that one.

Nikolay: That's the one you probably see when you select from pg_settings. There is boot_val there, and that default comes from the sources. But usually people have slightly different defaults: if you check how, for example, the PostgreSQL apt package is built, you will see that it overrides some defaults.

So it's an interesting topic.
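The configuration diff these sites provide can be approximated by hand. A sketch, assuming you have exported `name` and `boot_val` from `pg_settings` on each version into dictionaries (for example with `SELECT name, boot_val FROM pg_settings;`). `boot_val` is the compiled-in default, so overrides made by a package or a config file, the point raised above, will not show up here; compare `reset_val` or `source` if you need those. The function name and the setting names in the usage lines are placeholders.

```python
from __future__ import annotations


def diff_settings(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare {name: boot_val} snapshots from two Postgres versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder names and values, for illustration only.
old = {"setting_a": "1", "setting_b": "on", "setting_c": "x"}
new = {"setting_a": "2", "setting_b": "on", "setting_d": "y"}
print(diff_settings(old, new))
# {'added': ['setting_d'], 'removed': ['setting_c'], 'changed': [('setting_a', '1', '2')]}
```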

Michael: I think what these sites are good for is giving you reminders of things to check. You should still do your own checks, right? You shouldn't treat these as gospel. It's more that they remind you that some of these settings might have changed as well. What's that like in your system?

Does your cloud provider change it to something different? Like, does your cloud...

It gives you an idea. I understand.

But I find them a good reminder of things I should be checking. Are there things I would've [00:16:00] forgotten without looking at some of these tools?

Nikolay: Right. Well, yes. So, regarding minor upgrades, how do we do them? I prefer doing them with minimal downtime.

Michael: Mm-hmm.

Nikolay: It's possible to do it with almost no downtime if you have a proxy or connection pooler in the middle that supports the pause/resume approach, like PgBouncer does. Others are also trying to implement it.

But that's a slightly different topic. My main advice here is: don't forget to issue an explicit CHECKPOINT, because you need to restart Postgres. A minor upgrade is basically just a replacement of binaries: you get new binaries, and you need to stop using the old ones and start using the new ones.

So, let's restart. But a restart done straightforwardly, without thinking about what's happening under heavy load with a [00:17:00] higher max_wal_size (if someone tuned checkpoints properly), might take significant time. That's because when a shutdown is attempted, Postgres first executes a shutdown checkpoint to flush all dirty buffers to disk.

Basically, it saves all pages that haven't yet been saved from memory to disk, obviously. Right? But while Postgres is doing this, no new queries are accepted, so clients start observing errors, and in heavily loaded clusters it might take a minute. That's not good. So instead of just letting Postgres do everything implicitly, you issue an explicit CHECKPOINT right before you restart.
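The checkpoint-then-restart sequence could be scripted roughly as follows, once the new minor-version binaries are installed. This is a sketch that assumes `psql` can connect as a superuser through the usual `PG*` environment variables and that the server is managed with `pg_ctl`; packaged installations often restart through systemd instead. `restart_with_checkpoint` is a hypothetical name, and by default it only returns the commands (a dry run).

```python
from __future__ import annotations

import subprocess


def restart_with_checkpoint(pgdata: str, dry_run: bool = True) -> list[list[str]]:
    """Run an explicit CHECKPOINT, then restart, so the shutdown checkpoint is short."""
    steps = [
        # Flush dirty buffers while the server is still accepting queries.
        ["psql", "-X", "-c", "CHECKPOINT"],
        # Restart right away; only buffers dirtied since the CHECKPOINT remain to flush.
        ["pg_ctl", "restart", "-D", pgdata, "-m", "fast"],
    ]
    if not dry_run:
        for cmd in steps:
            subprocess.run(cmd, check=True)
    return steps


# Placeholder data directory.
for cmd in restart_with_checkpoint("/var/lib/postgresql/data"):
    print(" ".join(cmd))
```

The two steps should run back to back: the longer the gap, the more buffers become dirty again and the more work the shutdown checkpoint has.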

In this case, the shutdown checkpoint won't have much work to do, right? Just a little: however many buffers happened to become [00:18:00] dirty again. Dirty means changed in memory.

Michael: Mm-hmm. Mm-hmm.

Nikolay: And just a few, right? If you do an explicit checkpoint (just the SQL command CHECKPOINT, as a superuser) and then restart immediately after it,

well, not immediately, of course; everything has a duration, right? You restart, only a few buffers are dirty now, the shutdown checkpoint has just a little work to do, and the restart becomes fast. One problem I see here is that Postgres doesn't report anything. There is an opportunity for improvement here.

Postgres could report what it's doing right now in terms of this shutdown checkpoint. For example, every second it could write to the logs or output something like a shutdown checkpoint progress bar, like "20% done", right? Because I've observed this many times.

Some people issue a restart or stop of the Postgres service, and after 10 or 15 seconds [00:19:00] they become very nervous, not understanding that it's still doing some work, and they start doing very crazy, bad things, for example kill -9, and so on. And this is not good at all: you basically already have a crash and need to recover.

And then during recovery they have a similar issue, not understanding what Postgres is doing right now. What I usually do is check ps during recovery. If it happens to you and you have a recovery, check ps, or top: run top and then press c to see command details. There you will see details for Postgres. Even if it's still starting up and doesn't allow connections, you will see its WAL position, and it's progressing. This is the most important thing; I discussed it earlier today on a different topic. The most important progress bar UX feature is to let you know that something is [00:20:00] happening, for good. Right? That some good work is being done right now. If you don't have it, you can become very nervous and make mistakes.

So this is what I have in mind speaking of restarts. For minor upgrades, we just need to care about the restart. Well, there are some other things, like extensions: maybe you want to upgrade them at the same time. And of course you need to check the release notes. That's why we mentioned why-upgrade: the release notes might tell you, for example, that you need to reindex some indexes.

Michael: Yes. Actually, yeah, I was thinking that shouldn't be true for minor versions, but exactly, right. Yeah. Really good...

Nikolay: Unfortunately, if you check what happened carefully, for example for 14.4, it said that some indexes might be corrupted. Well, we have some [00:21:00] tools to check which indexes are corrupted, at least for B-tree. There is the official tool, amcheck. It doesn't work for GIN and GiST indexes yet; there is work in progress, still not committed, but for B-tree we have quite powerful tools.

You can also plan to reindex everything, just for safety and to get rid of bloat and so on. But if you don't want to reindex everything during downtime (and for 14.4 we require downtime, unfortunately, unless you involve logical replication, which we'll talk about in a minute), you need to find exactly which indexes are already corrupted and reindex them.

Michael: Well, yeah, I think you're right. That would give you quite a high degree of safety. But equally, amcheck will tell you that an index is definitely corrupted, but it can't tell you [00:22:00] for sure that an index is not corrupted. So I would still be nervous, especially around 14.4. Personally, I would still have reindexed everything.

Nikolay: I think you're not right. I remember problems with early versions of GiST and GIN support in amcheck, which is still in progress: they had false positives, saying that some indexes... Well, there are different types of corruption, first of all, right? And when we check heavily, the check requires some exclusive... not exclusive, but some heavy locks, blocking at least either DDL or writes, I don't remember exactly. So you cannot just do it online on the primary; you need to do it on some copy of your database. So if the tool says nothing, there is indeed nothing, in terms of what it checked.

So, no false negatives.

Michael: Okay, this [00:23:00] confuses me. I thought it was the other way around. I think I might have even made this mistake before. I'm getting...

Nikolay: Well, I might be mistaken as well; worth double-checking. But in real life, we could not afford reindexing everything during downtime or immediately after. Actually, I think in this case, when we do a minor upgrade and we already know we have corruption because we found it, we can fix it online after the minor upgrade. And probably think about the consequences of this kind of corruption; that depends on the application, what exactly happened, and so on. But of course, it's worth checking how amcheck works. It requires additional expertise, and I don't remember everything; I'm checking myself all the time as well. I can share my own mistake, which I made twice: I thought amcheck cannot work in parallel. Of course, if you want to check, it's like with reindexing: sometimes you need to move [00:24:00] faster, like parallelizing ANALYZE and some other processes useful during upgrades.

So amcheck can be executed with the -j option if you use pg_amcheck, and I keep forgetting about it. And I have several versions of my own implementation for parallelizing it. So don't be like me; just use the existing tool. It supports it.
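For reference, a sketch of building a parallel `pg_amcheck` invocation. The client program ships with Postgres 14 and later and runs amcheck's checks across one database or the whole cluster; the options used here (`--jobs`, `--heapallindexed`, `--all`, `--database`) come from its documentation, while the wrapper function and its defaults are assumptions for illustration. Following Nikolay's advice, consider running it against a copy rather than a busy primary.

```python
from __future__ import annotations


def pg_amcheck_command(dbname: str | None = None, jobs: int = 4,
                       heapallindexed: bool = True) -> list[str]:
    """Build a pg_amcheck command line that checks in parallel."""
    cmd = ["pg_amcheck", "--jobs", str(jobs)]
    if heapallindexed:
        # Also verify that every heap tuple has a matching B-tree index entry (slower).
        cmd.append("--heapallindexed")
    # Check every database in the cluster, or just one.
    cmd += ["--all"] if dbname is None else ["--database", dbname]
    return cmd


print(" ".join(pg_amcheck_command(jobs=8)))
# pg_amcheck --jobs 8 --heapallindexed --all
```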

Michael: While we're in this area of talking about corruption, I feel like it might be sensible to discuss glibc versioning and...

Nikolay: But it's more about major upgrades. Sometimes people try to combine several big changes into one, because each one is stressful. Sometimes you need downtime; most often you need to plan downtime when you run pg_upgrade. Even if -k (--link), so hard links, is involved and it should be fast, like a minute or two, it's still a minute or two.

[00:25:00] You need to plan. Then you also probably need ANALYZE in stages, depending on the application: whether your application works well when default_statistics_target is low, so with rough statistics.

Still, whether it means downtime or not depends, but the overhead is significant, and sometimes management might decide to combine multiple steps into one. For example, if you change the operating system version, switching from an old Ubuntu (16.04, for example, is already out of the picture in terms of support, so we need to change), every glibc version change should be checked very carefully for possible corruption. So again, amcheck is very important there. But in general, if you can afford not to merge multiple big steps into one, I would do separate upgrades: a separate upgrade for the operating system, a separate one for the major version, probably hardware (hardware is usually easier, [00:26:00] and so on), just because it reduces the risk of ending up in a situation where you don't understand what caused some issue. Like, you upgraded, you see something going wrong, and you have a wider field to analyze for possible reasons, right? So root cause analysis might...

Michael: Multiply, right. It could be one, it could be the other.

Nikolay: Mm, dependencies.

Michael: So many. Exactly.

Nikolay: So root cause analysis in case of issues after an upgrade might be very expensive and problematic. So if you can afford to do separate steps, you should do separate steps. This is my usual advice. But of course, real life sometimes tells us: let's combine steps. In that case, just do proper testing, and so on.

Michael: I kind of forced us onto major version upgrades, I guess. Was there anything else you wanted to talk about on the minor version upgrade side?

Nikolay: Well, that's it. My advice is [00:27:00] CHECKPOINT, and that's it. I don't know what else to mention there. So...

Michael: I don't think so either. Only that the security patches come out, I think, every quarter, unless there's a really big security...

Nikolay: Yeah, there is a schedule. If you Google the Postgres versioning policy or something, there is a page describing the schedule.

Michael: I'll make sure to share that. And fixes get backpatched to old versions for about five years. A little bit longer, I think, but not much.

Nikolay: And those releases usually happen on Thursdays.

Michael: Nice.

Nikolay: Just...

Michael: I didn't know that.

Nikolay: It's like my favorite game, which is also released on Thursdays. It's in virtual reality, and I think those people want to have Friday to react to issues, not the weekend, right? So it makes sense.

Michael: Fair enough. Cool. Alright, so major versions. We've mentioned a couple of things already: pg_upgrade briefly, and we talked about logical. Do you have favorites in different scenarios? How do you tend to approach it?

Nikolay: Let's [00:28:00] be brief, because we're already approaching our limit in terms of time, and this is of course a huge topic. A huge topic. And...

Michael: I have a recommendation. You did a panel discussion. Was it recorded? Yeah. Was...

Nikolay: No,

Michael: Ah, that was good. I think there might be... anyway, I'll look for a recording of that, if there is one.

Nikolay: No, it was not recorded.

Michael: Okay.

Nikolay: So, yes, but there is an article with a summary; we can attach it. So if you want to upgrade right now, the de facto standard is pg_upgrade, an in-place upgrade. It involves creating a new cluster (cluster here means the PGDATA directory), dumping and restoring the schema, and then -k, hard links, which speeds things up a lot. And then you can already open the gates and [00:29:00] run ANALYZE in stages. At first, query execution plans will not be good, because statistics are not yet collected. But analyze in stages starts from very rough statistics, a low number of buckets, and then jumps, and jumps, and finally you have proper statistics.

I would recommend looking at all the nuances here: how Alexander Kukushkin implemented this in Spilo, part of the Postgres Operator for Kubernetes from Zalando. Alexander is also the maintainer of Patroni. He implemented it very well in Python, also checking things about extensions and so on.

And I suppose it's already very well battle-tested at Zalando, where they have a lot of clusters,

Michael: Yeah.

Nikolay: So I think they've already upgraded... I'm not 100% sure. I'm almost sure they've upgraded a lot of clusters. We can check with Alexander on Twitter. He's active on Twitter sometimes, so you can ask him [00:30:00] there.

By the way, just a side note: Alexander is not at Zalando anymore; he's working at Microsoft. Patroni recently started to support Citus. It's interesting. But anyway, in my opinion this is a very good implementation of automation around pg_upgrade, because pg_upgrade leaves a lot of things to you. To automate this, you need to automate many steps.
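The in-place path described here could be laid out as an ordered list of commands like the following. This is a minimal sketch, not Spilo's implementation: it assumes both clusters are stopped before `pg_upgrade` runs, that `initdb` is given the same locale, encoding and checksum settings as the old cluster (omitted here), and that extensions, replicas and configuration are handled separately. The function name and paths are placeholders; the flags (`-b`, `-B`, `-d`, `-D`, `--link`, `--check`, `--jobs`, `--analyze-in-stages`) are standard pg_upgrade and vacuumdb options.

```python
from __future__ import annotations

import os


def pg_upgrade_plan(old_bin: str, new_bin: str, old_data: str, new_data: str,
                    jobs: int = 4) -> list[list[str]]:
    """Ordered commands for an in-place pg_upgrade using hard links."""

    def new_tool(name: str) -> str:
        # Always run the *new* version's binaries.
        return os.path.join(new_bin, name)

    dirs = ["-b", old_bin, "-B", new_bin, "-d", old_data, "-D", new_data]
    return [
        # 1. Create the new, empty cluster (match the old cluster's locale/encoding).
        [new_tool("initdb"), "-D", new_data],
        # 2. Dry run: checks compatibility without changing anything.
        [new_tool("pg_upgrade"), *dirs, "--link", "--check"],
        # 3. The real upgrade; --link uses hard links instead of copying data files.
        [new_tool("pg_upgrade"), *dirs, "--link", "--jobs", str(jobs)],
        # 4. Start the new cluster ("open the gates").
        [new_tool("pg_ctl"), "start", "-D", new_data, "-w"],
        # 5. Rebuild planner statistics, coarse first and then full.
        [new_tool("vacuumdb"), "--all", "--analyze-in-stages", "--jobs", str(jobs)],
    ]


for step in pg_upgrade_plan("/usr/lib/postgresql/14/bin", "/usr/lib/postgresql/15/bin",
                            "/data/pg14", "/data/pg15"):
    print(" ".join(step))
```

A design note: `--link` is what keeps this to minutes, but once the new cluster has been started the old data directory can no longer be used safely, so a fallback path needs to be planned separately.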

It's not a single button and that's it, right? Unfortunately, this is a problem. But then there is also an approach involving logical replication to minimize downtime almost to zero. Not really zero, because you need to execute a switchover, and during switchover some of your users will see some errors.

So it's not zero downtime; it's so-called near-zero downtime. And by the way, before the switchover, also don't forget about CHECKPOINT on older versions, which require [00:31:00] restarts during switchover, and Patroni restarts the old node during switchover. This is important: in this case, the restarts happen faster. In newer versions, promotion is implemented in a better way.

So, speaking of logical: the usual approach, if you don't have big databases, is to initiate logical replication to the new cluster. Basically, data is brought there at the logical level. Think of it like dump/restore: no bloat, a good side effect, right? But this also involves quite long transactions and resources, and sometimes this process cannot converge.

If you have dozens of terabytes under heavy load, it's very hard to use. But for smaller clusters, you set up logical replication, it's working, and then you just switch over. The good thing here is that you can do many additional steps while logical replication is working. You can reindex some indexes, [00:32:00] fix any issues, and adjust some things for compatibility.
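A minimal sketch of the basic logical-replication setup described above, assuming the schema has already been created on the new cluster (for example with `pg_dump --schema-only`), since DDL is not replicated. The publication and subscription names and the connection string are hypothetical; the helper only builds the SQL text.

```python
def logical_migration_sql(pub: str, sub: str, source_conninfo: str) -> dict[str, list[str]]:
    """SQL for a basic publication/subscription pair (names assumed to be simple identifiers)."""
    conninfo = source_conninfo.replace("'", "''")  # escape for a SQL string literal
    return {
        # Run on the old cluster: publish changes for every table.
        "old": [f"CREATE PUBLICATION {pub} FOR ALL TABLES;"],
        # Run on the new cluster: by default the subscription creates a slot on
        # the old cluster, copies existing rows, then streams ongoing changes.
        "new": [f"CREATE SUBSCRIPTION {sub} CONNECTION '{conninfo}' PUBLICATION {pub};"],
    }
```

The initial row copy is where the long transactions on the source come from, which is why this path struggles with multi-terabyte, heavily loaded databases.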

Of course, logical replication has a huge list of limitations. They are still being addressed: Postgres 15 brought a lot of improvements, and 16 will bring more. For example, in general you need to think about sequences. Sequences need to be synchronized; otherwise, after switchover, you'll be reusing old values.

Basically, inserts will fail on duplicate keys if the sequences on the new cluster are behind the data in the tables. So you need to reset each sequence to match the old cluster's value, plus maybe some gap. If you use int8 (bigint) primary keys, you can afford gaps of millions. Nobody will notice such big gaps, because the capacity is huge.
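A sketch of the sequence step, assuming the last values were read on the old primary (for example from the `pg_sequences` view, whose `last_value` is NULL for a sequence that was never used) and that a gap of one million is affordable, as with bigint keys. The function name and the example sequence names are hypothetical.

```python
from typing import Optional


def sequence_sync_sql(last_values: dict[str, Optional[int]], gap: int = 1_000_000) -> list[str]:
    """setval() calls to run on the new cluster right before switchover."""
    stmts = []
    for seq in sorted(last_values):
        # A never-used sequence has last_value NULL in pg_sequences; treat it as 0.
        last = last_values[seq] or 0
        # Jump ahead by `gap` so values handed out on the old primary after we read
        # them cannot collide with rows inserted on the new cluster.
        stmts.append(f"SELECT setval('{seq}', {last + gap});")
    return stmts
```

The gap only has to exceed the number of values the old primary can consume between reading `last_value` and the switchover.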

And then you also need to think about blocking DDL during this procedure in current versions, because DDL is not logically replicated, and there are several more [00:33:00] restrictions worth checking. But again, for large clusters this recipe doesn't work, because we simply cannot initialize such a cluster at the logical level. There is a trick we can use.

Convert a physical standby to logical using a recovery target. Now we have logical replication, and we can upgrade the new cluster and then switch over, right? That's what I thought a month ago, and I was mistaken, because while pg_upgrade is running, logical replication becomes inactive and we lose some data. So, a new recipe. But by the way, there is a good discussion in pgsql-hackers, and there is some hope that future versions of Postgres will officially allow you to use logical replication and pg_upgrade together. What I'm describing here are recipes born out of necessity, you know. [00:34:00] We try to invent something for heavily loaded big systems, and there is such a recipe.

Instead of converting physical to logical and then running pg_upgrade, we just create a slot, let the physical standby reach a recovery target matching the slot position in terms of LSN, and then we don't switch to using logical replication yet. We keep accumulating some lag on the slot on the old primary, and then we are ready to run pg_upgrade.

It's very quick, right? We don't run ANALYZE yet; we run it later. Then, with the cluster already upgraded, we switch to using logical replication. So we just start using logical replication, and we don't have any data loss in this recipe at all. And then we have quite a lot of time to run ANALYZE, even without stages, before switching over to the new cluster.
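The recipe above, written out as an ordered plan. This is a hedged sketch of the steps as described in the conversation, not a verified procedure (Nikolay says it is still being tested). The publication, slot, and connection names are hypothetical, the uppercase pg_upgrade arguments are placeholders, and `slot_lsn` is assumed to be the LSN returned when the slot is created in step 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    where: str   # node that runs the step
    action: str  # SQL statement, postgresql.conf line, or shell command


def upgrade_via_logical_plan(pub: str, slot: str, slot_lsn: str, source_conninfo: str) -> list[Step]:
    conninfo = source_conninfo.replace("'", "''")  # escape for a SQL string literal
    return [
        # 1. Create the publication before the slot, so the slot can decode changes for it.
        Step("old primary", f"CREATE PUBLICATION {pub} FOR ALL TABLES;"),
        # 2. The slot starts retaining WAL from here; record the LSN it returns.
        Step("old primary",
             f"SELECT lsn FROM pg_create_logical_replication_slot('{slot}', 'pgoutput');"),
        # 3. Let the physical standby replay up to exactly that LSN, then promote it.
        Step("standby", f"recovery_target_lsn = '{slot_lsn}'"),
        Step("standby", "recovery_target_action = 'promote'"),
        # 4. Upgrade the promoted node while the old primary keeps serving traffic.
        Step("promoted standby",
             "pg_upgrade --link -b OLD_BINDIR -B NEW_BINDIR -d OLD_DATADIR -D NEW_DATADIR"),
        # 5. Attach to the existing slot; the data is already there, so skip the initial copy.
        Step("upgraded node",
             f"CREATE SUBSCRIPTION {pub}_sub CONNECTION '{conninfo}' PUBLICATION {pub} "
             f"WITH (copy_data = false, create_slot = false, slot_name = '{slot}');"),
        # 6. Rebuild planner statistics while logical replication catches up.
        Step("upgraded node", "vacuumdb --all --analyze-only"),
    ]
```

Between steps 2 and 5 the slot only accumulates lag on the old primary; once the subscription starts, that lag has to drain before the switchover.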

Yes. During this, we keep using logical replication. The only requirement here is to know for sure that [00:35:00] logical replication will keep up, I mean, that it won't lag. And here, again, it depends on the workload. In some cases, the WAL sender is our bottleneck, because it's a single process. You cannot parallelize this work, right?

If you have multiple WAL senders, each of them still needs to process the whole WAL stream. So if you had 100% CPU on one WAL sender, you will have multiple WAL senders each using 100% of a core. Not good. But on the recipient side, we can use a trick: parallelize the work among multiple slots, have multiple receivers, and so on.

In this case, part of our tables is processed by one logical replication stream, another part by others, and so on. This way we can keep the lag lower and, again, have a near-zero-downtime upgrade for large clusters. And finally, the cherry on top: if you can afford pause/resume in the connection pooler, depending on workload, you may have a real zero-downtime upgrade, right? But I've never seen this recipe described. It's something we're testing right now. Maybe we'll publish an article in the future.
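One way to do the split mentioned above is a greedy balance of tables across N streams by size. This is an illustrative heuristic, not the method described in the episode; the table names and sizes are made up, and in practice per-table write volume matters more than size on disk. Each resulting group would get its own publication, slot, and subscription.

```python
import heapq


def split_tables(table_sizes: dict[str, int], n_streams: int) -> list[list[str]]:
    """Assign each table to the currently lightest stream, largest tables first."""
    heap = [(0, i) for i in range(n_streams)]  # (total assigned size, stream index)
    groups: list[list[str]] = [[] for _ in range(n_streams)]
    for table, size in sorted(table_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)  # lightest stream so far
        groups[i].append(table)
        heapq.heappush(heap, (total + size, i))
    return groups


print(split_tables({"orders": 100, "events": 60, "users": 50, "tags": 10}, 2))
```

One caveat of splitting: each stream is applied independently, so a transaction that touches tables in two different groups is not applied atomically on the new cluster.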

Michael: You definitely should. I see this come up from time to time, but honestly, most of the people I talk to don't have zero downtime as a really hard requirement. If you can afford a few seconds here or there, or a little window once a year or every couple of years for this, the options become much, much simpler.

So I tend to see people go that route. But yeah, for huge systems under load, I can see that. That's awesome.

Nikolay: Right, and I agree. But sometimes we have a problem: if we have a small downtime, it can create requirements for additional downtime, and the window starts to grow. [00:37:00] It grows because, you know, if we need to stop the database for one minute, we need to stop some other components too. Then we already have a 10-minute requirement, then it grows to half an hour, and so on. It's not good.

But I agree, in many cases we can afford some brief downtime. In this case, my recommendation is to take care of your application code and make sure data loss doesn't happen when the database is down. All systems should be resilient to brief outages.

Michael: Or even have a read-only mode, yeah. Retries are great, or have a read-only mode where...

Nikolay: Read-only mode plus retries for writes. This is the best approach, and in my opinion it should be used by all backend developers. But the problem is that people usually understand it but postpone it, you know. It's like with CI/CD test coverage: we have good testing, but there's always room to improve, and so on.

But in reality, you need to design your system not to lose writes, have a read-only mode, and test it regularly. For example: okay, some writes failed, [00:38:00] but users didn't notice. Or they were told explicitly that the form or some data was not saved, so at least they can press the button one more time.

The worst implementation is when we told the user we saved it, but in fact lost it. That's bad, and this is what can happen.
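A minimal sketch of the "retry for writes" idea in Python. `TransientDBError` is a hypothetical stand-in for whatever your driver raises when the database is briefly unavailable; the attempt count and delays are arbitrary, and the `sleep` parameter exists only so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDBError(Exception):
    """Hypothetical: raised when the database is briefly down or read-only."""


def retry_write(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.2,
                sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientDBError:
            if attempt == attempts - 1:
                raise  # surface the failure: never report an unsaved write as saved
            # Exponential backoff with jitter: the delay roughly doubles each attempt.
            sleep(base_delay * 2 ** attempt * (1 + random.random()))
    raise AssertionError("unreachable")
```

Retrying is only safe for idempotent writes (for example, a client-generated ID with `INSERT ... ON CONFLICT DO NOTHING`), because a commit may have succeeded just before the connection dropped.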

Michael: Yeah. Not good. Cool. Alright, anything else you wanted to make sure we covered?

Nikolay: Well, we didn't cover a lot of smaller topics in the pg_upgrade area, like how to benchmark properly and how to test properly. It's a big topic indeed, but I think it's good enough for this short episode.

Michael: I think so, and we'll link out to various things we mentioned, plus a couple of good resources on this. Hopefully we've given people some good things to look into.

Nikolay: Yeah, and I hope future versions of Postgres will have a better pg_upgrade workflow, more automation around it, and so on. But unfortunately, take this pause/resume, which is [00:39:00] implemented in PgBouncer. I don't have hope here, because all attempts to bring a pooler inside Postgres have failed. So the pooler is considered an external thing, and we cannot achieve real, absolutely zero downtime using just Postgres core.

We need additional pieces, so it makes

Michael: Yeah, I wouldn't be shocked to see this solved at the operator level, or even the cloud provider level. There are lots of cloud providers; maybe they'll work together one day.

Nikolay: Or operators.

Michael: Yeah, exactly.

Nikolay: We already see this. I'm watching it, and a lot of interesting things are happening in the development of Postgres operators for Kubernetes, or Kubernetes operators for Postgres, whichever is the right term. They automate more and more. For example, I know StackGres has a lot of automation, including this, maybe not fully, but some parts of it, and they have reindexing and index maintenance automation. [00:40:00]

We had an episode about it. So yeah, I think this is a good area to expect improvements from the developers of operators.

Michael: Nice one. Well, thanks again, Nikolay. Thanks, everyone, for listening, and catch you next week.

Nikolay: Okay, bye-bye.
