Postgres FM

Nikolay and Michael discuss PostgreSQL subtransactions — what they are, and four issues they can cause at scale. 
Here are some links to things they mentioned:


What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!


Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Michael Christofides
Founder of pgMustard
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

[00:00:00] Michael: Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I am Michael, founder of pgMustard. This is my co host Nikolay, founder of PostgresAI. Hey Nikolay, what are we talking about today?

[00:00:11] Nikolay: Hi, Michael. Let's talk about sub transactions and be very short with this. This is a super simple topic. Because I have , super simple rule, just don't use sub transactions unless you're absolutely necessary. And I'm going to describe details, but not going too deep. Let's keep this episode short,

[00:00:31] Michael: Sounds great. and also it's Thanksgiving. So hopefully people,

[00:00:35] Nikolay: Right, exactly,

[00:00:36] Michael: lot of our audience are listening, uh, spending a lot of time with their family. So happy Thanksgiving to anybody that celebrates, and hopefully you'll be thankful for a short episode.

[00:00:44] Nikolay: right.

and I wanted to thank you, of course, for doing this with me so long.

[00:00:50] Michael: Oh, likewise, it's been, it's been a pleasure. Where should we start then?

[00:00:55] Nikolay: So, let's start with the idea of sub transactions very briefly. The idea is, if you have a complex transaction and you want to handle some kind of failure, like for example, a statement might fail, but you don't want to lose the previous statements already completed in the same transaction, or you want, for example, to explicitly...

Undo some steps, right? it might be, actually, multiple statements you can return to previous point. So you can use sub transactions, also called nested transactions, or in some cases, although it's not very nested, but kind of, okay, it's nested. Nested transactions also, there is a SQL standard keyword I think it's SQL standard, right?

Safe point.

[00:01:49] Michael: Oh yeah, I don't know if it's standard, but I think it is because I read the Postgres docs for this and they mentioned one tiny part of it. That's not standard compliant. And so therefore I assume everything else is.

[00:02:01] Nikolay: Right. Yeah, I think it's standard. Yeah. What was that part? I also remember, but I don't remember which

[00:02:06] Michael: Oh, it gets into the details, but it's around the naming of sub transactions, so you can name them so that you can then roll back to the name, and it's, if you use the same name multiple times, the standard says you should, uh, destroy previous ones, whereas the Postgres implementation allows you to roll back to one with the same name and then roll back to it the same name again, but that rolls back to the previous one with, anyway, hopefully that helps. Made some sense.

[00:02:33] Nikolay: idea is nice, right? If you have really complex transactions and you expect that you need to roll back sometimes, you just put some safe points in various places. And you can return to them, and you can imagine it, like, there are two ways to imagine it, I, I'm not sure both are correct, let's discuss it.

So one is like, we have some flat, picture and save points, and we can jump, and another is nested, and I think nested is more, like, it's more correct. although it, uh, if you look at code, usually it's, it doesn't have... An indentation, right? It's like flat, vertical, semi indented, lines of code safe 0.1, SafePoint, like seven point name, one safe point, name two, and some statements between them.

And you can, uh, roll back to pre, to previous, one of previous SafePoint. But internal there is nesting there. Applied, because when you return to some older safe point, you lose the deeper, like, newer, fresher safe points, because they are internally implemented, like, kind of deeper. And one of the problems we will discuss, it's related to this level of nesting or depth of this nesting, right?

[00:03:49] Michael: Yes. And I think on the, just on the basics, I think it's worth mentioning that even if you know that you haven't used save points anywhere in your code, you could still be using sub transactions without

[00:04:01] Nikolay: Exactly. Yeah, all it can call it nested transactions. So like, I think nesting is the right concept here. Although again, like, it looks like not nested, but it's nested. And I think Django, they they use nesting word and they Django and maybe some other ORMs will provoke you using this, to use it, like, implicitly.

So you just do something and you don't know, but you use it. Also, PLPGSQL will provoke you to use it, because if you, at some point, if you use a lot of PLPGSQL code, as I do, at some point you will want to have beginExcept and endBlocks, except when, blah, blah, blah. So, just to handle some... Feathers, and this is 100 percent already sub transactional, because there is another begin block which defines the main body of the PLPGSQL function, right?

[00:04:59] Michael: that's the big one, I think, so the ORMs can be using it behind the scenes without you realizing, and if you've got any exception handling in functions, whether that's functions or triggers, like, are using sub transactions, or you might be using sub transactions.

[00:05:14] Nikolay: right, any exception handling, or maybe the top level exception handling at, is it possible, like, say, begin, blah, blah? We always have begin end block for like top level, right? We can put exce...

[00:05:27] Michael: or, oh no, you mean in a function.

[00:05:30] Nikolay: We can

[00:05:31] Michael: I think exception, I think as soon as you're using the exception clause, you are using,

[00:05:36] Nikolay: Yeah, I think so as well, but I'm just curious. Okay. But definitely if you use additional begin exception end blocks inside the body of the lpsql function, it's already a sub transaction for sure. So, it's an implicit sub transaction. So, what are the problems with them? Let's be short, as we promised.

The very first problem is that we have still 4 byte transaction IDs, 4 4 billion, 4. 2 billion space, and we can use only half of it, because one half of it is the past, one half of it is the future. So there is a transaction ID wraparound problem. What happens after 2 billion, 2. 1 billion transactions if, freezing didn't happen, if you blocked Autovacuum, for example, with long transaction or something, or just turned it off, but it should wake up even if you turned it off.

Okay, so we have four byte. Transaction IDs and to optimize how these IDs are spent, there is also the concept of virtual transaction ID. So if you just run select or some read only transaction, it won't increment the transaction ID counter, right? So you won't waste XID value. if you use sub transactions, a transaction with a couple of sub transactions is already free.

[00:07:02] Michael: Yeah. That's the big issue, right? Uh, well, not the big issue, but in this

[00:07:06] Nikolay: a big issue,

but you move faster. You start spending these quite, like, not cheap XSEDEs faster than if you didn't use subtransactions. So, but also there are multi XSEDEs, it's a different topic, right? So this is not, I think it's not a super big topic, it's just worth understanding that we move faster if you use subtransactions quite often.

[00:07:30] Michael: Cool. Let's move on to problem two.

[00:07:32] Nikolay: Yeah, second problem is this nesting. Nesting level, like depth of nesting. There is a cache, and if you reach nesting level 64, I don't know who does it, but this is quite well documented in mailing list and other places, quite well documented overflow problem.

[00:07:53] Michael: Yeah. So we're talking about doing 64 levels of, that nesting depth. I did see a blog post. Was it by. It was on the Cybertech blog, I think by Lawrence. Was that on this? Yeah. So I think it's quite easy to demonstrate this problem in

[00:08:10] Nikolay: and we had benchmark for it, synthetic benchmark, we did it. It was easy, so you just Too many save points, and you reach the level 65. This is cache per backend, so it's like local cache, and in this case, degradation is obvious.

[00:08:25] Michael: Yeah. So it's a, not a hard limit, right? It's just a, you pay a performance penalty at that point. And I've never seen this in the real world. Have you, have you ever seen a case



[00:08:34] Nikolay: only in synthetic

[00:08:35] Michael: Yeah. Okay. Fine.



[00:08:37] Nikolay: so not in production. And, , it's not a hard limit in terms of the problems it gives, but it's a hard limit. You cannot change it. I mean, it's a constant in source code, so it's hard. But it's not huge. You can go there, but probably don't want to go there. So a third problem is related to multi seeds, this additional space of transaction IDs, multi transaction IDs. Multisect ID, so like, actually, I see it's not very well documented, these things, for example, multiseed, the wraparound also might happen. We don't see big blog posts like from Sentry or MailChimp as for regular transaction IDs. But it still can happen. So documentation doesn't have good words, like details about how to properly monitor it and so on.

It's possible. Anyway, multi exec, it's a mechanism, it's like different space, and it's a mechanism when, when multiple sessions establish a lock on the same object, right? For example, like share lock, for example. For example, when you have foreign keys, you You deal with multi exact, like implicitly again, so Postgres establishes multiple locks on the same object, and exceed value is assigned.

And there is a problem, like this is a very good blog post from Nelson. L H? L H A G? How to pronounce? You

[00:10:08] Michael: I have no idea, but Nelson seems like a bet for the first name. We'll link, I'll link it up in the show notes, of


[00:10:14] Nikolay: Right. This is excellent blog post from 2021. I'm not sure which company it was, but the blog post explains several problems actually as well. It explains like, there is a well known problem. Honestly, I must admit, I didn't know about that problem as well, but the blog post assumes everyone knows it. Okay.

So the problem, it's not related to sub transactions, but there is a problem with select for update. So, there's like performance cliff, this blog post explains it, so once I read it, I started like, be very careful when recommending to use select for update to, parallelize, for example, queue like workloads. And

[00:10:55] Michael: even select for share, right?

[00:10:57] Nikolay: select for update. This is, it's not related to sub transactions. We start from there. So it says, there is a performance cliff related to select for update. Okay, it explains details, blah, blah, okay, but, and now I understand, okay, select for update, we should be very careful with it. So, and that's why I became a bigger fan of single,

Session consumption of queue like workloads, because usually with single session you can move very fast. Because once you start using SelectForUpdate, you can meet this performance cliff, of course, at larger scale, like you have many thousands of TPS, for example. So, okay, forget about SelectForUpdate, then the post explains SelectForShare.

We, we, we're, oh, sorry. Sorry, I... I may be messed up with this. You think SelectForShare is the known problem and SelectForUpdate is where sub transactions come to play with, uh, yeah, so, yes, sorry, sorry, sorry, yes, so SelectForShare is very well known problem as blog post explains, but then if we use SelectForUpdate and sub transactions...

We have similar problems as with select for share. I'm, I'm very sorry.

[00:12:11] Michael: No, no, no.

[00:12:13] Nikolay: So if, if you use select for update, for example, QQ like workloads and you have multiple consumers and you don't want them to fight with a local acquisition, so you use select for update and also you use sub transactions, you might meet a similar performance CL as for select for share

So anyway, uh, watch out. be very careful if you use a lot of like thousands of TPS. Be careful with both SelectForShare and SelectForUpdate. This is like, the explanation. Again, I'm, I'm, like, making mistakes here because I didn't see it myself in, in, in production. I saw it in synthetic workloads again, but not in production.

That's why I already keep forgetting, forgetting. What do you think? About this problem, particularly.

[00:13:01] Michael: Yeah, you've made me question which way round it is, and I'm not even sure if, uh, uh, worth rereading this if this is something that you make extensive use of at high volume and want to be aware of, but if neither of us have seen it in the real world, especially you, it feels to me like this is not the one most likely.

Like, we've, we've done three issues already, and I think it's the fourth that's going to be most likely to actually affect people and most interesting. Yeah.

[00:13:26] Nikolay: not sure, because again, this post by Nelson explains. If you use Django, for example, and uses nested transactions, you have save points, and then you think, oh, it's a good idea. I have parallel workers consuming, like, fighting with the same rows to consume, like. Q like workloads or like lock them, I'm going to use select for update and here you have the similar performance as they explain for select for, for share.

But again, it happens only at very, under very heavy loads, like

thousands of TPS

[00:13:57] Michael: Yeah, and one point they made in fixing it, uh, was they changed. Yes, for Django, uh, was there was a, a parameter they could change that was the past save point equals false,

[00:14:10] Nikolay: Transaction Atomic, yeah,

[00:14:12] Michael: which completely fixed the issue, right? Like

[00:14:15] Nikolay: yeah. Disable save points, exactly.

[00:14:18] Michael: it's great that people can do that. And it's really interesting that that's not the, it's not the default.

[00:14:23] Nikolay: Again, we fix problems, getting rid of sub transactions.

[00:14:27] Michael: and they actually, before we move on from that post, they included one more line that I really liked, which was them talking to somebody else who they considered an expert. , on the Postgres side and, uh, they quoted sub transactions are basically cursed. Rip them out. That was, yeah, cursed or cursed,

I guess.

[00:14:46] Nikolay: now, now I'm the, like, the, the, like, um... How to say, I'm, I'm bringing these thoughts to people and sometimes I, I see pushbacks like, uh, no, no, no. Like, we're still good. Well, sorry, . It's not only my thought. I, I inherited it from this post as well, but, so let me explain The we what we saw at GitLab

[00:15:06] Michael: Yeah, this is problem number four.

[00:15:09] Nikolay: T four and I, I saw it myself and.

Help troubleshoot, and they had also a great blog post about this, and this led to a big effort to eliminate sub transactions in GitLab code. So, there is another cache, which is kind of global, not per session, and it has also a threshold 16, hard coded in source code, there is a constant 16. And, when you reach, Sixteen sub sub transactions, on replica, uh.

So, and you have a long running transaction, different one, some, some select, for example, or just somebody open transaction and sitting, not consuming ACID anymore. On the primary or on all replicas, at some point, you will see a huge degradation and some sub subtrans SLRU contention. Lightweight log, if you check, so it looks like a huge spike in pages activity of active sessions and these active sessions are sitting in WaitEventSubtrans SLRU.

Of course, it happens again under heavy load, like thousands of TPS, so you won't see it if you don't have a load enough. But it's a very hard problem to overcome, so it's like... And also, worth to mention, it's great that you can see it in wait events, uptranslates LRU. Also, very great that, Amazon, , RDS guys documented, uh, wait events.

They have best, wait event documentation, Postgres itself doesn't have. this level of detail for wait events, so I'm constantly learning from those documents. And also, if you have Postgres 13 or newer, there is a new, um, system view, pgStatusLRU. When we saw that GitLab, unfortunately, it was an older version, so I was thinking, oh, so, so, so sad I'm not having this, view, because it provides a lot of content.

Good counters, and you can see, okay, events of trans SLRU is popping up there, so you, this helps with diagnostics. after spending some offers, we replicated this problem in synthetic environment. And discussed it, and the conclusion was we need to eliminate sub transactions. It was, again, as I said, a big effort,

[00:17:27] Michael: I found that really interesting, by the way. I found it really interesting why reducing sub transactions wasn't enough. And it was really a case of actually eliminating them.

[00:17:37] Nikolay: Yeah, well, sorry, I said 16, 32. Here, like, 64 is for this local cache, and here 32, uh, this limit of, uh, SLRU buffer size. So, yeah, you can reduce them, but it will, it will happen only. At some extent, the problem is this, uh, long-lasting, um, uh, redundant transaction on the primary. It can be like you, you have this, and also the level of, , seed consumption also.

Right? And if you have high TPS. The higher TPS you have, the shorter this long trans read only transaction needs to be to. So you can hit, or your replicas hit, hit this performance cliff. So for example, if you have lower TPS, you can still still have this problem, but you'll need to wait more with this long running transaction.

[00:18:34] Michael: You will only, you will only hit it if like the queries, let's say, Dozens of seconds or minutes, or maybe even longer. Whereas you, when, I think you showed nicely in your blog post, or maybe it was the GitLab one, if you ramp that up enough, this could even be achieved within seconds. Like I think the synthetic one you, you were benchmarking was like 18 seconds or something, but it was quite easy to see that this could be low numbers of seconds as well, quite easily.

[00:18:59] Nikolay: yeah, this, like, we thought about, like, maybe we should just, control long running transactions on primary and, uh, be more, , proactive, for beating them, but six, like, seconds, right? It's, like, already too short.

[00:19:12] Michael: The reason I brought, the reason I wanted to hammer that home is I don't think most people think of two second queries when they're talking about long running transactions,

[00:19:19] Nikolay: Yeah, long is relative,

[00:19:21] Michael: Yeah. True.

[00:19:22] Nikolay: And so also if you don't use replicas for read only traffic, there is no problem here. So it doesn't happen with primary. It was a difficulty when we needed to troubleshoot it because with single node experiment you cannot reproduce it. You need two nodes.

Replica, to see how it works. But again, we have publicly available reproduction in synthetic environment, so you can play yourself and see how it hits you. There good attempts to improve it, also Andrei Borodin involved. This guy is everywhere. When I have some problem, I see some effort already, he participates.

So there is a, effort to increase the salary buffer size and also change algorithm. Search algorithm there and so on. It's interesting. I hope, uh, eventually it will make it into source code. And finally, my approach is let's eliminate subtransactions altogether. , if you aim to grow till, like, Many thousand TPS and so on. But I think we need to eliminate the source code of application because, like, we have less control on it. But sometimes I still use sub transactions, for example, when deploying some complex

[00:20:40] Michael: interesting.

[00:20:42] Nikolay: right? Because when you need to do something, and then you need to perform some... Additional step and this step might fail because you are gentle. I mean, you, you know, you need to acquire a log, but you have a low transaction, a low, log, time out and retries you can do this retry. At higher level, but you will need to redo all the steps before it, right? Or you can retry here with sub transactions.

[00:21:13] Michael: and crucially you want a transact, like the transaction's giving you a lot of benefits that it will roll back if it fails, ultimately. Like that, so the transaction itself, the, the top level transaction is really valuable. So you, you

can't, you can't, get rid of that.

Like you need, you can't do it in steps. You need to do it all or nothing.

[00:21:32] Nikolay: right. For example, you have some table, you put some data into it, and then you establish foreign key, and you need to establish a foreign key, you need to retry some stuff. But we also need to... To make sure these retries don't last long, if you don't want to keep all the logs you already acquired so far, because logs, the rule, remember, the rule, logs are, like, are being held until the very end of the transaction, rollback or commit.

They cannot be freed at the middle of the transaction, right? This is impossible, so you should be careful with all locks already acquired. So yeah, here we can use certain actions, retry logic, and then you just deploy it when we have lower level of TPS, and we definitely don't have long transactions. Read only transactions or other transactions on the primary. This is the key. Yeah, yeah. So, it's an interesting and it's like story of two years old already. but it was super interesting to see these things. And I, last probably my last words like, I'm very thankful that it's implemented to Postgres and I hope it will be improved.

I mean, I'm not... An enemy of sub transactions, like some people might think. I think it's a great feature, but we should be very careful under heavy loads.

[00:22:48] Michael: Yep, makes a lot of sense. I, there's a couple more super minor things that I wanted to mention before we finish and that's the Subtrans SLRU. in Postgres 12 and older. So I know Postgres 12 is the only older version that is still supported, but, it was, it had a different name, right? Subtrans control lock, just in

[00:23:06] Nikolay: Right, so it was renaming happened.

[00:23:09] Michael: Yeah. And the last one was the GitLab. Blog posts ended on really positive note that the reason they were able to diagnose this with you with some help from other people internally was that Postgres source code is well documented and that the source code is open. And this would have been a really difficult issue to.

Diagnose on a closed source system anyway, I thought that was a really nice point that even though Postgres does have this issue or, can hit this issue at scale using save points or sub transactions in general, the fact that it's open gives it that huge benefit of people being able to diagnose this issue themselves.

[00:23:52] Nikolay: Right, makes sense. Yeah, like, just keeping in mind, that's it. Yeah, and use, use other people, issues already documented to improve.

[00:24:03] Michael: Nice one. Well, thanks everybody. Thank you Nikolay happy Thanksgiving. Hope you enjoy it and see you

[00:24:10] Nikolay: bye.