A weekly podcast about all things PostgreSQL
[00:00:00] Michael: Hello and welcome to Postgres FM episode 65. Michael, this is Nikolay. Hey Nikolay, what are we talking about today?
[00:00:08] Nikolay: Hi, Michael, let's discuss backups, finally.
[00:00:11] Michael: Yeah, this has been on our list for a while, and I feel like this is a small win for me, because finally you picked the most boring topic, that we've ever had on the list, so yeah, I'm happy.
[00:00:23] Nikolay: Boring and also one of the most important ones, Some people, why they migrate, for example, for RDS from self managed Postgres. Very popular answer is, I'm afraid, uh, I don't want to be responsible for backups anymore.
[00:00:43] Michael: Yeah,
[00:00:44] Nikolay: obvious reason, right?
[00:00:46] Michael: and not just backups, right, like, just the
[00:00:49] Nikolay: of course.
[00:00:50] Michael: it, yeah.
[00:00:50] Nikolay: But backups is on the first place of the list.
Of course the list can be long. So, but backups, we are afraid to lose data. Of course, some projects are not afraid, like, some projects are not afraid to lose data and they value availability. More than, data loss, uh, risks. So, like, it's, sometimes one is more important sometimes to the other, sometimes both.
Actually, in most cases, both characteristics are important to be... 100 percent available, or like 5 nines, 4 nines, 3 nines available, and also, uh, don't tolerate any data loss, but I, I see there is a tendency to say, okay, we can be down for some time, but we cannot, uh, lose any data at all.
[00:01:43] Michael: Yeah, let's come back to that, I think that, like, that is definitely a really... Interesting discussion. I think we should quickly talk about why do we even, like, need, like, what are the basics here first? I think that's a good, like, trade off discussion, but what are the basics? Like, why, are backups something we need to care so much about?
Like, what are the main things we're protecting ourselves from or against?
[00:02:05] Nikolay: Right, so database system should not lose data. If we inserted something, it should be present there. And, the main internal mechanism not to lose data is write ahead lock, first of all. So, if we perform some write, it can be insert, update, delete. It first goes to this special... System, wall, write ahead log, and it's written there first, and not only written in memory, it's also written to disk, and if you have normal settings, if you committed your transaction, it means that the data is already on disk in wall.
As for the normal data files, tables, indexes... Not necessarily they received this change on disk, they received it in memory, but not synchronized to disk yet. This synchronization happens during checkpoints. But wall, it's definitely there. But unfortunately, sometimes we have... So, if we have a failure, database system knows how to...
Postgres knows how to reach... The old point of consistency, just replaying proper wall records since the latest checkpoint. And the bigger distance between checkpoints, the longer it takes to replay these changes, to redo these changes. so it means that if a crash happens, we are good, the database can recover itself.
But unfortunately, sometimes we lose the whole server.
[00:03:41] Michael: So, yeah, that's one reason, right? Yeah.
[00:03:44] Nikolay: Right, or something is broken, uh, with disk, for example, and we cannot achieve, consistency at all, and we cannot start, or something,
[00:03:53] Michael: So I think, you're listing the most, like, the thing people think about first, but that tends to not be the time I see backups actually used most often. It tends to be protecting people from themselves or like customers doing things that they shouldn't have done or didn't mean to do.
[00:04:09] Nikolay: that's a great point, and this explains why replicas, cannot serve, for backup recovery, for disaster recovery purposes, there is such term, DR, disaster recovery. It's like a set of measures we take to avoid data loss and so on. So, right, so, I explained only problems from hardware. You're explaining problems from humans, and they're also quite common.
For example, I deleted something and want to restore, but if you rely only on replicas, this change is already propagated on all those replicas, and it's hard to restore from there, right?
[00:04:47] Michael: And that, and I only mentioned humans that were doing things accidentally. And then we've also got to worry about people doing things maliciously, like, Ransom attacks, that kind of thing. There can be other, other reasons why backing up your data can be really important and, yeah, having a copy of it somewhere.
[00:05:03] Nikolay: Right, and in this case, not only you need to back up your data, you also need to store it in some... Second relocation, because there is a special type of attack when, uh, uh, if hacker steals access to whole cloud account, all backups are in danger as well, not only Postgres servers themselves.
And in this case, the risk to lose everything, including backups, is also not zero. And from this kind of problem, we can be only protected if we store data in some additional account and only few people can access that account. This is hard to achieve, by the way, especially, for example, if you are an RDS user, you cannot store backups on, for example, Google Cloud or Azure, because RDS doesn't allow you to access those backups directly.
Right? You can store only dumps. Maybe it's time to discuss the difference between dumps and backups.
[00:06:02] Michael: Yeah.
[00:06:03] Nikolay: So, yeah.
[00:06:06] Michael: What I, yeah, I was just gonna say, I wanted to make sure we didn't, like, move on without doing that. I think there is one more as well, actually, before we do move on, is that, that's, corruption. I think it gets mentioned a lot, but I tend to not see people... I don't tend to see it being the most used time for backups, so it's like, people mention, like, your data could be corrupted, but people tend to, like, move forwards from that rather than go backwards in my experience, at least sometimes, at least in the cases where the corruption is quite silent and hasn't been noticed for a while, for example, you don't know exactly when it, uh,
[00:06:38] Nikolay: There are mixed cases. For example, you cannot recover some table or some part of some table. So, you may be fixed it somehow with resetting a wall or something, which is quite risky. But if you have backups and you can perform point in time recovery... You can restore data that you lost, or you cannot read, and so on.
So, this, having backups is useful in this case as well. If you want additional reason why backups are useful, additional reason is, you can provision new, uh, nodes. It can be replicas, or it can be independent clones. You can provision, uh, these nodes from backup system, from archives, not touching, real...
Live nodes bec not making them busy with your cloning or provisioning activities. So this is one also a popular reason to have backups as a source of, for, uh, node provisioning.
[00:07:35] Michael: Yeah, good one. Right, so logical versus physical. And this is, this is similar to previous discussions we've had in their definitions, right? So, logical being, well, logical being taking it without bloat, maybe, is the simplest way of putting it. But, um, like recreate, recreate the objects from scratch, insert the data from scratch, uh, so recreating the entire database.
Whereas physical would be, at the file level, at the disk level, right?
[00:08:05] Nikolay: Block level, right. So right at file. Actually. Also, also, right? So if you. consider P Dam PGE store as a set of tools for backups, you are actually following official documentation because it says PG Dam is a backup tool. But, a lot of database people, including myself, are against this phrase because backups should have special characteristics.
and one of them is, uh, very. Like flexibility, right? So we can talk about RPO, RTO right here. RPO, RTO is two important characteristics for any backup system. RPO is recovery point objective, RTO is recovery time objective. It's maybe confusing, but one of them is about how much data you lose if you need to recover from zero, like from after some disaster.
For example, all nodes are lost. Whole cluster, primary, anomalies, replicas, all of them are lost. And RTO is how much time it's needed to recover. So, one is measured, for example, in bytes, another is measured, for example, in seconds. Good characteristics can be close to zero data loss, and, okay, like, roughly one hour for one terabyte to recover.
Maybe it's too much? It depends, right? But,
[00:09:34] Michael: I think.
I was going to say, I think the genius of whoever came up with these is it forces the discussion of a trade off here. I think if you ask business leaders what they want, they want zero data loss and super fast
[00:09:51] Nikolay: it depends. Ah, well, yes, they want both, of course, right.
[00:09:55] Michael: Yeah, they always want both, but this forces the discussion saying, well, which are you willing to let us have some leeway on here? Because zero data loss... Uh, in zero milliseconds is not possible, like generally, like, you know what I mean? It's so it forces that discussion or is extremely expensive to achieve.
[00:10:12] Nikolay: Right, right, right. For example, you can have quite frequent cloud snapshots for your disk, and, in this case, restoration. Well, it still will be quite slow, a lot of minutes, not hours, at least, right, usually. But, in this case, you lose some deltas between these points of snapshots. So... If we go back to PgDump, PgDump creates a set of SQL statements.
This is what it creates. You can speed it up to go with multiple workers, and then you can speed up recovery as well. For example, if you do it during some maintenance window, or in case of disaster recovery, you can go with full speed, for example, taking number of CPUs, or knowing how your disk is... How these capabilities look like and saturating these capabilities, but not significantly, right?
So, in this case, we have both characteristics in very bad situation. First of all, PgDump, you cannot do it very often. Especially if you have a large database, again, like roughly one terabyte per hour. Some people get more terabytes per hour, some people get less, it depends on your infrastructure, first of all disk, then CPU, also network, important network.
And second thing is, so you can do it only for example once per day, which is actually quite
often for dumping.
[00:11:47] Michael: but means we might lose up to a full day of data. If we, if we relying on that as our backup tool.
[00:11:54] Nikolay: why you cannot do it often? Also, let's mention it right away. So, if you do it often, you put a lot of stress. Each pgDump is a stress. We discussed it a few times. There is such setting, HotStandByFeedback. And in clusters where replicas are receiving read only traffic, usually hosted by feedback is on.
Meaning that replicas report to the primary, I need this version of data, this snapshot, these tuples. So when you run pgDump, it runs transaction. Or multiple sessions
a repeatable, in a repeatable read isolation level, right? It's not serializable at least, but it's already higher than default
read committed. And, uh, why is it needed? Because we need to get consolidated snapshot of all tables. All data and even, even if you do it in a single session, you still need it. If you, if you do it in multiple sessions behind the scene, it will tell all sessions to deal with the same snapshot to, to get, data in a consolidated form, right?
Uh, so, so consistent data in terms of, uh, version of tuples, refreshing consistency, everything. So, during this, uh, you. So, hold this snapshot, and this means that Autovacuum cannot remove dead tuples which became dead after you started. So become a problem to Autovacuum. And this means that even if we started on a replica, since it reports to the primary, It's bad.
So you can hold this snapshot for multiple hours, and during this we accumulate a lot of dead tuples, and later when we finish, Autovacuum will delete these tuples, converting them to bloat. This is how you can get a lot of bloat. Just perform frequent pgDump activities. That's it.
[00:13:58] Michael: So we can't perform frequent PG dumps. So therefore we have bad RPO, but we
[00:14:02] Nikolay: All
[00:14:03] Michael: have bad RTO
[00:14:06] Nikolay: Exactly, yeah, because not only we need to load all the data to tables, we need also to rebuild everything, first of all indexes, and also verify constraints, like we're starting from scratch, it's like, not taking our database, we're taking fresh database with the same data. This is what the Dump Restore process does.
It creates new cluster, completely new data. Not cluster, database. But with the same data. it verifies constraints, and it rebuilds all indexes, and it's very CPU consuming operation. Of course, IO also, but CPU as well. And if you have a lot of indexes, interesting thing that, of course, indexes are not dumped in terms of data.
Each index is just one line, like CreateIndex. Easy, right? But if tables are large, if you have a lot of indexes, it's a lot of work when restoring. And it means that time to recover will be significant. it can be even bigger in terms of time than data load itself. If you have a lot of indexes.
So, building indexes can take longer time than loading data. And, overall, it can exceed a simple copy at file level. That's why, Base Backup with Wall Replay might be faster.
[00:15:25] Michael: so I think you've nailed why, like, I think you've nailed something, but I want to add something quickly, which is I don't agree that pgDump is not a backup tool. I think it is, I just don't think it's a good backup tool. And I think that would be like an interesting addition to the statement. I think everybody that says it's not a backup tool would be happy adding that word in, but other people can't disagree anymore. It's just not a good one, it's definitely not the best one available to us.
[00:15:52] Nikolay: Well, yes, but, you know, like, the third law of Newton, if you apply force, there is opposite force. So, since the commutation is quite strong force, this statement is so, like, bold and written... Very clearly, we need to, somehow, to mitigate this, right? So, that's why we're trying to say... I agree with you, it's kind of a backup tool, not ideal, yes.
In some cases, it's fine for tiny databases. But if you have one terabyte already, it's definitely time to switch from... And I see so many projects, so many teams start with pgDump and then suffer. Thank you for watching! Open documentation, CBCAP tool, and they have a false feeling that they are fine until they are not, like terabyte or like thousands of TPS and maybe even hundred gigabyte is already not good. So, and I see some platform builders. Also make this mistake.
[00:17:01] Michael: Oh,
[00:17:02] Nikolay: I, I, I'm not going to, to
[00:17:04] Michael: Okay, good.
[00:17:06] Nikolay: but, uh, yeah, it happens.
[00:17:08] Michael: should probably move on, though, to physical. Like, what should people be using instead?
[00:17:13] Nikolay: Instead, we need to use physical backups. Unfortunately, if you are, are using managed Postgres, you just need to rely on what they provide. And it's a pity that, it's not a pity, I think it's a huge mistake, and, It's not fair that RDS and others don't provide access to backups. If those backups are regular Postgres backups, users should...
be able to reach them, to download them. But, instead, RDS and Cloud SQL and others they provide only access to at logical level. You can not get your database. You can only get data from your database. I'm trying to continue my, concept of, like, dump restore is like cloning your database. It's not taking your database.
Taking database means taking data directory. As it is, right? So, if you're working with RDS, they do physical backups, but only behind the scenes, right?
[00:18:14] Michael: So, yeah, this isn't something I realized, actually. So, you, you're saying, there's a couple of, a couple of tools for doing physical backups, for example, are pgBackrest or wargee. I can't use one of those to backup from RDS or the equivalents. Is that
[00:18:30] Nikolay: you can, you cannot, right, right. So, the basic backup is just copying data directory with, for example, cp, scp, rsync, doesn't matter. And of course, while you are copying the files, they are changing. And there is an internal mechanism to deal with this inconsistency. Basically, you can copy, but you will get an inconsistent copy.
And, for example, you can, again, you can perform a cloud snapshot of your... Disc again, it'll be inconsistent. I remember times when Google Cloud says they do consistent copy of disc. They, then they remove these words from their documentation. So, so internal mechanisms to deal with such inconsistency is two functions.
Pg start backup and PG start backup up. And you can do it on replicas as well. So you say pg start backup. You get a copy of your data trajectory. Anyhow, and then you'd say pgStopBackup. In this case, this copy of data directory in pgWall sub directory, there will be enough walls to reach consistency. So, yes, uh, data files are kind of from different times, points of time, right?
But when Postgres will start, it will know that it was a backup, it was, uh, a copy which was created. Between pgStartBackup, pgStopBackup functions, so there are walls and we can replay them and reach consistency point. Great. Again, it's possible to do it on replicas as well. But, of course, it's not super convenient.
It's like low level. And I remember I, like, quite experienced... DevOps guys had very big fears to do this approach. told them, if we say pgStartBackup, oh, by the way, super important, this mistake is very frequent. if you say pgStartBackup, you need to maintain this connection. if you close it... you'll lose proper files. And for example, if you, if you do, uh, full backup using JI or other tools, same, you need to maintain the connection where you start this. So if you close this session, when you say Stop Pgal capital report, that lost connection. So you, you, the, the like terabytes of data, you just backed up.
Cannot be used. So you need to be careful with this connection. And I remember how DevOps guys, like, had fears to do this, like, how come database is live, it's writing constantly, I cannot copy data directly, it's not possible. No, it's possible if you're between pgStartBackup and pgStopBackup, you're fine.
But of course it's low level, so to make it more convenient there is a tool called pgBaseBackup. Which automates a lot of stuff. And I also remember some time ago, by default it didn't copy wall files. It was, so many backups were created without wall files. If you don't have wall files to reach this consistency point, these terabytes are useless.
You can move them to a trash bin. you need walls. But right now, by default, pgBaseBackup will take care of walls as well, so you're protected right now. I also made this mistake, of course, not once. If you just follow, like in all documentation, you just follow steps, it's normal to forget about walls, and then a few hours later you have a lot of terabytes backed up, but no wall.
but it's not enough. Why it's not enough? Not only you need these walls to reach consistency point, we also want to have almost like close to zero RPO. It means that we don't want to lose anything. And if our backup starts at midnight, but disaster happens at 11 p. m. We want all 11 hours of data, right?
So, we need to replay all changes. Perfect way to replay changes is wall string. Right, so, so we just need to have, okay, we perform full backups, for example, daily or weekly, but we also need to write all walls, and for this, natural approach is to use Archive command. Most backup tools use Archive command.
only one tool I know which use different approach was Barman. I think it's already using both approaches. But, originally, Barman... I used a different approach, was like a physical replica, I'm receiving changes on streaming replication, yes, and I save this data. It's also an interesting approach, but Archive Command is kind of more reliable and it's standard de facto.
So, Archive Command just archives all walls, usually 16 megabyte files. On RDS I think it's 64 adjusted, you can also adjust it, but the default is 16 MB, and each file is just compressed, usually, additionally, so there are two things, two kinds of compression, I will describe if it's interesting, but in general it's usually compressed to some smaller megabytes and archived, so if we have whole stream of walls, Without any holes and, like, overlapping or something, if it's consistent, right?
We can replay until the latest available point, which means we can achieve near to zero RPO. But it will take time, increasing our RTO, which is interesting. And replaying might be quite slow. Usually, it's slow if your shared buffers is low. For example, sometimes we have additional replica with smaller... , number of gigabytes of memory, so we put fewer, we allocate fewer gigabytes to shared buffer and observe how replaying is significantly slower than original, so we become, we have a lag accumulated.
But without lag, very rough rule is, like, usually it doesn't take to, to replay doesn't take longer than originally. It was, right? It should be much faster. So if we need to replay 11 hours of work, and of course we will be replaying less, because some hours, it depends on the size of course, but some part of it will be replayed while we reach consistency point.
Related to full backup. But then we need to replay more and more and more, and of course it should not take longer, usually a few times faster, but still a lot. Maybe 10 times faster, it depends on data nature, hardware, a lot of factors.
[00:25:29] Michael: So what are our options then for reducing that art like that RTO?
[00:25:34] Nikolay: It's a great question. So, some tools also offer Delta backups. Instead of replaying changes as... Like, replaying changes means like we are kind of becoming a replica. Temporarily. It's the same as having a physical replica based not on streaming, but wall shipping. We just ship not from server, we ship from archives. By the way, I forgot to mention, usually these days we keep backups on some object storage such as S3 or Google
[00:26:10] Michael: Mm
[00:26:10] Nikolay: GCS or something like that. so, if you... Replay? It takes time, but there is an idea to have delta backups. Instead of performing full backup, we consider previous backup as the basis and just backup difference, delta.
hours we have full backup. And every four hours we have Delta backups. Or every week we have full backup, and every day, or every six hours we have Delta backup. why we still want to have full backup? Because maybe we don't trust fully to Delta backups, they are quite fragile.
But these
[00:26:51] Michael: and you need a full one, you need a full one at some point, right? Like there's, yeah,
[00:26:56] Nikolay: Right, and then you need to apply multiple, multiple deltas. And then I apply additional walls between deltas, we still have some distance in time, right? So it's like three, three stage process. Full backup, applying deltas, and then, applying wall, walls, right, right.
So this can speed the process up, definitely. And also reduce space, because if you perform full backups every day, It will cost you,
in terms of object storage price, price, price. So,
[00:27:29] Michael: Yeah. so what are, like, what do you tend to, I think you've said previously, you tend to use wargy for like the management of these. For backups, different.
[00:27:39] Nikolay: let me finish with, I just, realized, there is also obvious idea that, and I know platforms, they use this approach themselves, instead of restoring full backup, base backup. From archive, which will take time, we can just restore snapshot of disk and then apply VALS. And this can be super fast, minutes, for many terabytes it can take minutes, AWS, GCP, they all offer snapshots of disks.
The restriction here is that... You can restore only to a new disk usually, you cannot restore, for example, two backups on one disk. You need to create a new disk and restore from this cloud snapshot. But it can be super fast and unfortunately, for example, Doesn't support, uh, I cannot say, you know, I have snapshots, let's consider them instead of full backups, let's consider snapshots and just perform all orchestration around it and just apply walls and so on.
I'm not sure about pgBackrest, I also
think,
it's, it's so, yeah.
But the idea is restoring full backup and deltas as well, from zero, from scratch. It takes a lot of time increasing our RTO, and we need to move faster, because if it's a disaster, our service is down. In this case, cloud snapshots and restoration of disk, full disk, from such snapshots, it's good.
So, I mean, we can consider them instead of full backups, and then just perform point in time recovery to the latest point using wall replay. Restore command is responsible for this. We just configure our new fresh Postgres to continue. Restoring, replaying changes from archive and then we reach the latest point.
In this case, you can achieve quite stable and like kind of predictable RTO, which will be almost like, like dependency of this RTO on the database size will be minimal, because well, it will be, but it will not be such. Significant as for restoration in traditional way from full backup. So, I mean, you have like 50 terabyte database.
It takes the, like, I dunno, like 15, 20 minutes to restore from cloud snapshot. And then you just need to replay, uh, it depends on how active this database is during a day, if you need to replay up to 24 hours in the worst case, for example, if we, if we take snapshots only once per day, actually we should.
Snapshots more often, uh, in this case, uh, 24 hours, it's quite a lot, but maybe it'll take a couple of hours if, uh, hardware is good, right?
[00:30:39] Michael: Yeah. Interesting. It does, like, for smaller databases, though. You said it's not that dependent on database size anymore. I would guess for smaller databases, it would still be significantly faster.
[00:30:53] Nikolay: It can take a few minutes. For example, if you have one terabyte that you can restore in, in a few minutes, uh, from, and you, for example, one terabyte every four hour we perform full snapshot, um, cloud snapshot. And we, for example, have, I don't know, like, half a terabyte, no, that's too much, 100 gigabytes per day worth of wall data.
It's good to know, by the way, the distance in terms of bytes between two points, like, how much during busy day, how much of data is written to wall. And you don't, you can expect backups, in this case. And, for example, barman... Reported just number of gigabytes, it was good. Volji doesn't report, but it reports LSN positions.
If you take two LSN positions, you can just subtract one from another, and difference will be in bytes. And then, yeah, yeah, and I do it all the time. So you know the difference, you know, okay. I generate 100 gigabytes per day. Replaying 100 gigabytes on quite good software, it will take dozens of minutes, maybe just minutes.
In this case, our RTO will be, we can achieve RTO below 10 minutes in this situation, which is good. And almost zero RPO. I just explained recipe for good characteristics of backup system.
[00:32:14] Michael: Yeah. Nice. I feel like there's one thing we haven't talked about yet. I feel super guilty not mentioning in an episode on backups and that's testing your backups,
[00:32:25] Nikolay: Right, so we need to remember two mantras about backups. Pagedump is not a backup tool and you didn't
convince
me.
[00:32:31] Michael: not a good backup tool.
[00:32:33] Nikolay: Okay, not a good backup tool. Okay, good. And second mantra is, uh, uh, not tested backup is a Schrodinger backup. Uh, it means it's not backup as well. So we don't know. The state of this backup is unknown.
Even if... The tool which performs full backup exited with code zero, meaning no errors. Who knows, is it restorable?
[00:32:57] Michael: Well, and this isn't just a philosophical question, right, like, there are real things that could have gone wrong, right?
[00:33:03] Nikolay: Yes, so, my personal story is I was reading Hacker News in 2017, in February, And I was reading about GitLab case, of course. They lost a lot of data and they had many problems. backup methods applied, but poor testing of backups, and they lost data.
So this, there is a good, not super accurate, but entertaining YouTube
video about that situation in 2017, which explains it like in a very entertaining form what happened. But, again, like, kudos to GitLab team. Many years they share everything with community, with wider community, not only those who work with Git or Ruby, but also Postgres. And I remember I was super curious to know about details and to learn from mistakes, of course.
And then, uh, this is actually how... So, we started to work with them and the backup system remains to be super important focus not to allow any issues anymore. So testing is super essential and ideally you need to test every full backup and not only every full backup but also whole wall stream. So, you need to verify that you can restore from backup, that Postgres starts.
This is like minimal verification. Postgres starts and can receive connection. Ideally, you should check data which is there, right? Reading some data. Some people tend to dump to DevNow to verify that all table data can be read. Uh, p dump to death. Now it does mean index data can be read, for example, it's indexes can be corrupted and some people u tend to use, uh, arm check.
And we discussed, , how to deal with, corruption risks. So arm check is good to PGM check with, , dash J as many vcps you have because this machine probably is temporary also to, check that index data. can be read, but it takes time as well, so it's expensive to perform such operations, maybe not every time you want to do it.
But also you need to verify all walls, and ideally you restore from one backup and replay all walls until the point next backup, right? This is whole verification.
[00:35:38] Michael: that's, yeah, I think it's worth saying that the re like, you're testing for the reasons it could have not worked, right, like, you're checking that it's not empty, or that it's not you might want to
[00:35:49] Nikolay: It's
[00:35:49] Michael: recent data.
[00:35:51] Nikolay: restoreable. First
[00:35:51] Michael: so you've checked that it's restoreable because it hasn't fa the restore hasn't failed, so that's a that's a great starting point.
You're checking it's Checking if it's not empty, like that's a potential failure, maybe you're backing up the wrong database within your, maybe you're backing up one of the, you know what I mean, there's so many things that could have gone wrong. Checking if it's not empty, I like the idea of checking for recent data,
[00:36:12] Nikolay: Right.
[00:36:13] Michael: making sure it's not an old backup yeah, exactly.
so like, there's a few kind of smoke tests you can do. I think, I've seen some people doing counts of tables, but like, I don't think you need to do anything. Like, well, you can do as much as you want, obviously, maybe the more the better, but there's a limit to how much we want to do, I guess.
[00:36:33] Nikolay: Timestamps. Right. So, we can talk about two classes of verification here. First is infrastructural class. So, Restorable and Wall, WallStream is fine. We can replay until next backup. And actually, we can skip the second part here if you have... A replica which you monitor well, and it's based on wall shipping.
If that replica is fine, it's constantly replaying from backups, it means walls are fine. But if you don't have such replica, you need to do it during your backup verification, it will take time again to replay all walls. But you will know your RTO better. From such logs, right? And second layer is logical.
This is what Yur describes. Timestamps, counts, we can read data, and so on. So, arm check, corruption, logical layer.
[00:37:23] Michael: Last question, you mentioned it's difficult to get physical replicas from clouds. How do you tend to do
[00:37:32] Nikolay: It's impossible, not difficult, it's impossible.
They don't provide access to pgData, they don't provide access to backups, and they don't provide physical replication connection. Some of them do. Crunchy Bridge. They provide access to physical backups. This is a great example. So, when we say... This is your data.
You can take it. Well, yes, but I want, I not only, I want my data, I want my database. If I pay for it, I pay bigger cost than to issue two. Of course, I pay for, for backups to work. H ha everything. I want my database. If it's SREs, gimme, gimme this, . I want data director and I want backups.
[00:38:12] Michael: I was wondering if... doing logical replication to maybe a self managed instance and then backing that up with a fit, like, I'm wondering about, like, having a secondary one off site, maybe that could, it's not perfect because you've got the lag, the logical replication, lag. But, it's better than not having one,
[00:38:31] Nikolay: Well, again, we discussed there are two types of problems, again, physical and logical. Physical, like, for example, something happens with hardware, logical, human mistake or intentional attack and deleted data. If it's logical, replica without lag, and the change is propagated there as well, and so you cannot recover from replica in this case.
[00:38:55] Michael: Sorry, I'm talking about having a logical replica and doing a physical, backup from that, like having pgBackgress running off your logical
[00:39:02] Nikolay: Ah, well, yes, it's possible, but again, it's, same data, but different database. I mean, you will not get your bloat. Sometimes we want to get our bloat because, for example, for some kind of research, if some incident happens on production, we want carefully, research what happened. Perform root cause analysis.
In this case we want exact copy of our data, exact clone, including all the bloat, how data is distributed in tables and indexes. And logical replica has different distribution of data.
[00:39:36] Michael: Yeah, all I was thinking was maybe you might be able to achieve lower RTO, RPO, on a cloud system, even in the event of a major outage on that cloud, for example.
[00:39:48] Nikolay: Well, ideally you should use what provider of managed service provides you. For example, RDS, they have good reliable backups, I don't see details about testing of those backups, I cannot test them myself except restoring to our, some kind of explicit restoration and running some tests, but we kind of trust them, because if these backups were not reliable, we would see a lot of...
Blog posts, tweets, LinkedIn messages about it, right? We don't, so they are reliable, they have large scale, right? But I don't like it's not open source, like it's kind of they say it's Postgres, but I don't have access to database, I have access only to data. As I say, heavy clouds are gathering upon open source, you know?
[00:40:36] Michael: It's the cloud pun intentional.
[00:40:39] Nikolay: Right, right.
So, quick mentioning, some people do, some people who control their databases, not only data, they also consider using delayed replica, because physical replication allows you to having a replica which is like 8 hours behind, for example. And this, is exactly for the reasons of faster recovery of some...
Like, something was deleted, let's replay only a few number of minutes or hours quickly. If we do it quickly, because if we're already past, like, something happened, but we learned about it already 10 hours later, but our delayed replica is 9 hours behind. It's already too late.
[00:41:25] Michael: Or even if you find out about it like 7 hours later, you're on like a stopwatch for, you've got an hour to fix this problem.
[00:41:31] Nikolay: Right, right. But you can quickly pause
[00:41:33] Michael: Oh, true.
Yeah, yeah.
[00:41:35] Nikolay: Yeah.
[00:41:36] Michael: Good point.
[00:41:37] Nikolay: Stopping replay. And then you can, you can know when exactly the problem happened and replay exactly like one second before it, for example, and get, extract data manually. It happens. Sometimes it's a good idea to have such tool. But, for example, database lab users.
They have even better. They can provision, clone, and there is a recipe to replay walls. If a database lab engine is working based on physical replication, walls from archive, for example, you can create a snapshot which corresponds to time before the point of incident. Then you can replay, again, from point in time recovery.
There is a recipe. It's not fully automated, but it's definitely possible. And then you can... Achieve very, very good RTO in this case. This is for partial deletion, when manually we deleted something. good.
[00:42:37] Michael: Cool.
[00:42:38] Nikolay: I think we covered all main principles of backups. It's not super deep material, but at least it's something. Maybe one day we will discuss more deeper things, you know.
[00:42:54] Michael: Or specific tools, maybe. awesome. Well, thanks so much, Nikolay. Thank you everybody for listening, and catch you next week.
[00:43:01] Nikolay: Good, see you later, bye.