Postgres FM

Nikolay and Michael discuss pg_stat_statements — why everyone should use it, but also some downsides!
 
Here are links to a few things we mentioned: 

~~~

What did you like or not like? What should we discuss next time? Let us know on YouTube, on social media, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

[00:00:00] Michael: Hello and welcome to Postgres fm, a weekly show about all things post Chris Gu. I am Michael, founder of PG Mustard. This is my co-host Nikolai, founder of Postgres ai. Hey, Nikolai, uh, I picked the topic this week and we're gonna be talking about PG stat statements,

[00:00:15] Nikolay: let's do it.

Good choice.

[00:00:17] Michael: you find this one interesting. Finally. the reason for this is I'm also hosting this week a blogging event that was started by, I believe, why Booz, probably amongst others, called PG SQL Friday, so. Each month somebody hosts this, somebody different, and they picked the topic. And the topic I've chosen for this week was statement. So I'm hoping to give our entry as an audio one. I'll, I'll write a blog post as well, but, it'd be nice to have an audio entry from us as the podcast.

[00:00:46] Nikolay: Okay, we count on you,

[00:00:47] Michael: Yeah,

[00:00:49] Nikolay: but, uh, it's a good choice. Uh, this is extension. It's hard to find any, user who doesn't use it,

[00:00:55] Michael: yeah. It's a really popular, I mean, we call it an extension, but it's, it when, um, I mean, we've both done the, there was a, there's a really good series of interviews by Andreas, or ads, uh, Sheba, I'm not sure if I'm pronouncing that correctly, but, this interview series that he does, he asks people, what's your favorite extension?

And it's not, everybody saying this, but the most common answer, it's peach stat statements. And I think that mix of. Utility, super useful. but also as it is another one of these contra modules, uh, because it comes with Postgres. Most people have access to it if they want to use it.

[00:01:33] Nikolay: Yeah. Well, s assessment is, the major, observability part of POGO ecosystem, uh, in addition to logs, single log file, actually, right. auto explain we discussed last, last year, I think is much, much more popular. By the way, let's briefly mention that, our previous episode about auto explain was released on Thursday.

And it was, you claimed that Heroku doesn't have what to explain. Right. But we recorded it on Wednesday and on the very same day, Heroku released it. it took like 10 plus years. Right? Because like among all those years, uh, they could use choose any day, but they choose the, the same day you claim that doesn't exist.

That's funny.

[00:02:17] Michael: Yeah, exactly. And to let listeners in on, on kind of what happens behind the scenes. We normally record on a Tuesday, but this week we couldn't, um, get the, we couldn't get a time we could both do until the Thursday. So we really were recording last minute on the Thursday for a release on the Friday. So I couldn't believe the timings you sent me some, history from their, I think it's from their GitHub showing that they were working on it a week before, so I could have found out if I'd looked deep enough.

[00:02:42] Nikolay: The proof to, decrease our ego and, and say that it was not, it was not a reaction. It was like independent event. But it's a very interesting coincidence. so Heroku has auto to explain, and of course they have, PK statements.

[00:02:56] Michael: Yes. And actually a little bit of trivia to link the two. I didn't realize that they were both added in the same release of Postgres 8.4,

[00:03:05] Nikolay: Hmm. Interesting.

[00:03:07] Michael: 2009, I think,

[00:03:09] Nikolay: yes. 2008 was, uh, 8.3 when XML data happen. Functions went to the core. I remember it by, by my heart. So, but, uh, next, next year was eight four.

[00:03:21] Michael: So super old as well. It keeps getting better. Lots of, additions to pg sta statements even in recent years. so it's, an old extension, but one that people keep improving.

[00:03:33] Nikolay: lot of disadvantages.

[00:03:37] Michael: Go on.

[00:03:37] Nikolay: well, we should discuss why it's needed and so on. If you want start, if you want to start from negative parts, we can start from them as well. Like, I, I, I don't care.

[00:03:46] Michael: I'm intrigued,

[00:03:47] Nikolay: For a starter doesn't register. Failing queries, so some failing queries can take a lot of time and resources.

Consume a lot of resources, but you don't see them. If you analyze only PK statements, and this is big issue because probably it's like a half of your traffic, which is failing queries. Right? Some, some, in

[00:04:07] Michael: If you're in a

[00:04:08] Nikolay: in ex in extreme cases. Right. But Patricia said, must tell you it's okay.

[00:04:12] Michael: yeah, let's start with those then. So any other big, limitations or downsides?

[00:04:16] Nikolay: Yeah, of course, For example, if, our, listeners check out, the latest, pogs hiking recording on pogs tv. By the way, pogs TV reached, 2000 subscribers. Not a huge number, but it's growing pretty well. I like, thank you. And also our podcast published there, I think, uh, contributed a lot.

And so we had a session with Kirk and Andre as usual, and we wanted to improve. Performance overhead, of pta, Kash. It's very related to performance overhead of PTA statements because basically it's the same code. so PTA statements, has special like list of, queries in normalized form, so-called normalized form.

I, in my materials, I quite often call them query groups to highlight that inside those queries, normalized queries, we can have many, different queries with different plans, right? So I call them like grouped query group. so, when a new entry needs to be inserted, it needs an exclusive, look on shared memory, area for and in, in this case, If some cell are happening concurrently in parallel, those cellex, hold, uh, shared lock and, uh, your insert needs to wait.

So for example, imagine we have like 5,000 that's default, records and producer statements, and somebody is slowly reaching them during, for example, hundred milliseconds. It's quite slow.

And while they are reaching it, one session, uh, they hold Sherlock and any new query, which is not yet known to produce statements.

Trying to be inserted. It is blocked and we have like a spike up to a hundred milliseconds in latency for that query. It can be any query which is trying to insert and some, sometimes it's some set command we can discuss it Also, also disadvantage, it's called track utility commands. Uh, set application names set.

Something like it's any set, set,

[00:06:21] Michael: Off by the foot, right?

[00:06:23] Nikolay: owned by default, I think, no,

we should check. I usually see it from my perspective in client, uh, setups. You know, we have many layers of defaults including cloud defaults from RDS or so I usually see it being on and it's an issue, uh, exactly because of what I explained here.

So if you have, cases when new entry needs to be inserted, quiet often to purchase the statements. For example, you resided state, uh, you, you called Patricia statements reside, every minute, for example, extreme case

[00:06:57] Michael: You are right, by the way, is on by the default.

[00:06:59] Nikolay: yeah, so if you need to, if you reset it whole, like it's empty and a lot of new entries need to be written and all the needs, to lock whole structure, whole producer statements, content quickly, like for very short time.

But exclusive lock is needed. And if some monitoring is, uh, reading all the content often, for example, every 10 seconds, why not you have a contention issue here at, uh, shell Buffer's area. p statements have their own shell buffers area, not, not general shell buffers. And there are plans to improve this So in this case, if like, or for example if you have a lot of different, uh, groups and, but statements keep only 5,000 according to some, actually, I don't know details about this algorithm, but, uh, some kind of list recently used maybe, I don't know,

[00:07:56] Michael: It says it gets, are you talking about the ones, like how does it deallocate or how does it, um, things drop off, it says the least

[00:08:02] Nikolay: yeah, sometimes some query is known, but it's evicted because there is no space, only 5,000 max statements dot max if is 5,000 by default, you can increase it, but not more than 10,000, I guess.

Right. Which is quite, almost the same just to X. Right? So it's if query is evicted and needs to be registered again, the problem is that registration is synchronous process. And, uh, when you call some query. Does know it and you need to insert, like it should be inserted in background synchronously, right?

With the same backend, uh, you have latency spike for your query execution, right? And in this case, this eviction and like and forth, it's also a problem. And this can be, definitely the case when you, for example, have, track utility in producer statements on track.

Utility means all set commands also considered, as queries. And the problem is that all current POCUS versions, utility commands are not normalized. And if you have a Java app, I had it in my life, a Java app which calls set application name. Two, and then session ID in each session.

We do this every time. This session ID is quite unique, it's a new session for several queries. In the same session it'll be the same, but we need to set it for each session and spas, our PTA statements, content. I saw like 50% of all, normalized queries in PTA statements, which I call query groups.

It was, uh, set commands and in this case, this. amplifies the problem of, this contention and I observed it for some eCommerce during Black Friday. It was interesting case. I was wondering why we have latency spikes up to, it was like up to 500 milliseconds, but every five minutes.

But we've shift like 0 2, 0 7, minute 12, minute 17 and so on. And then I recalled, oh, it was me who put some, pull, like select from start, from Ments remotely to, to have some poor man storage, uh, because monitoring was weak. And I implemented this and I shifted just because I know like sometimes people have a lot of other Chrome jobs, starting at, even, time. So I shifted at two minutes and it helped me re recognize this. It's my, my own script, is doing this. And then I, I reproduce this problem and like, so a couple of weeks ago we had this session on Thursday with Andre and Kirk, and, uh, we implemented like some proposal, when, new entry is inserted instead of synchronous unconditional insert, we try to insert it.

We make several attempts. during like 10 microseconds and if it's failing, we give up and just do it next time. Why not? And I propose like, I think it's a good idea. we discussed it with producer K cash maintainer. There are reasons to say that maybe it's not a good idea. So there is ongoing discussions in Preta repository, but again, since the code is very similar, I agree with the point that it should be fixed on in both extensions, cash, that maybe somehow differently.

But the problem, problem is quite serious in heavily loaded setups. So current advices turn track utility off

[00:11:26] Michael: Makes sense.

[00:11:26] Nikolay: and this problem goes down not, not fully disappears, but goes down, pressure decreases. But good news in postal 16, utility commands are normalized by P statements. So it's good. So in set application name to something will become single entry in PSA statements,

[00:11:47] Michael: Yeah. So there's a commit for this, right? We should probably say that It's not guaranteed that that'll be in 16, but it's very, very

[00:11:54] Nikolay: yes, yes, yes, It's not huge, uh, it's not much. Right. So I hope, it'll make it to positive

[00:12:00] Michael: Yeah. Fingers crossed. So thank you to everyone involved in that.

[00:12:05] Nikolay: Any, you you want more bad things?

[00:12:08] Michael: Yeah, let's do it. Why not?

[00:12:10] Nikolay: Okay. But Well, of course, I, when like, sometimes people listening to me say, you are pocus hater. You know,

[00:12:17] Michael: Anybody that knows, you knows that's not true.

[00:12:20] Nikolay: I hate it. Like, new audience when, when they join they think, I, talked so many bad things about pogs.

So I'm a hater. Well, if, if I'm a hater, why? Why I'm using it? 18 years everywhere.

[00:12:33] Michael: and it's not like you are locked. It's not like you're locked in. But, one thing I just want to check quickly, the, you mentioned PTA statements, max having, Most 10,000. I can't find documentation that suggests differently, but when I go to edit it on a cloud provider, for example, it says the max value's much higher than that.

So I'm a

bit, confused.

[00:12:52] Nikolay: I might be easily wrong here and we should, we should check source code. Of course. Uh, cloud providers can re restrict, restrict additionally. For example, I was checking cloud SQL documentation yesterday and learned that, for example, they allow you to set shell buffers, not more than 60% of memory or, and effective cash size.

Not more than 70 or 80%, I don't remember. But sometimes I want more than a hundred percent because I want to full positive, uh, to make planner workers on different size machine. So, cloud providers can cannot additional, uh, limits. But here I would check source code and maybe it's more than 10,000, right?

interesting question. What's the overhead of, uh, having bigger number here?

[00:13:32] Michael: Yeah, I was wondering whether that would cause more contention or less, but actually,

[00:13:38] Nikolay: depends on the workload.

Yes,

[00:13:40] Michael: makes more sense.

[00:13:42] Nikolay: of course. On, on this, uh, number as well, but on workload. Definite, definitely. And, um, when P statements appeared, as you mentioned, 14 years ago, right.

it was like a leaf because, uh, before that we used only logs, and then Berger understanding that it's a tip of iceberg. Because only above some log mediation statement, like one second or 500 milliseconds, uh, definitely tip of iceberg. He, Patricia statements register everything. Uh, almost everything we discussed, well, it's limited for normalized number of normalized queries, but for each, query, which is already known to statements, to increment metrics,

[00:14:24] Michael: Like in an aggregated

[00:14:26] Nikolay: right in, in a reliable way com cumulatively in a reliable way. Yes. So it, that's, that's great. Unlike you, if you, uh, use log-based query analysis, you always have the problem that you see only the tip of weisberg in terms of duration or you see only slow queries. But, fast queries can be an issue. Many, in many cases

[00:14:47] Michael: Yeah, exactly. That probably leads us nicely into the times that it's most useful. So like one of the, one of the times I see it used most

[00:14:56] Nikolay: no, no, no. We talk about bad things. It, it was, it was, it was small, small break. Uh, so, so bad thing is that, overhead is significant from producer settlements. It's noticeable.

[00:15:08] Michael: So, have you seen any, like, I've looked for benchmarks on this kind of thing. I've seen numbers mentioned, but very, very little data behind

[00:15:16] Nikolay: There is common consensus, it's kind of 7%, but it's so like dump number out of nowhere, so it did depends on many things. It's hard to measure. It's worth measuring for particular system,

right? Particular workload. And this with it, without it. But, worth mentioning that if you, created extension, if you actually edited to sharper libraries and then create extensions global for all logical databases inside this PO cluster, right?

I guess it's some kind of several percent,

[00:15:48] Michael: That's it. Like I've heard a lot of single digit percent I've heard, I've heard probably more commonly I've heard 2% as like a rough, like estimate. But, I guess is no point, throwing out ne well, we didn't even talk numbers when we talked or to explain did we really? But people need to measure it themselves.

But I wish there was a bit more, like a few more, publicly available benchmarks on this

[00:16:09] Nikolay: Right. Also, also trick timing enabled versus not enabled, right? Because business assessments also can have it or can skip it.

[00:16:18] Michael: are we talking

[00:16:19] Nikolay: track right? Track IO timing. io timing, sorry. Yeah.

[00:16:23] Michael: Yeah. So, well, that's a recent edition, I think. Was that p was that Postgres 15? I think only, or only a recent maybe. Maybe a couple more versions back than that, but, it's good that we can have that now, impeach that statements, but there is warning in the, in the docs again, that that does cause additional overhead.

[00:16:41] Nikolay: Right. So this, I think it makes sense to measure overhead only for particular systems, but in general you can claim quite reliably that it should be below 50% and I personally would pay up to 10%. Right. Those good discussion about the overhead from observability tools recently, um, uh, raised by Jeremy Schneider mentioning, uh, Oracle experts, and Tom Kate, I guess, and so on, saying that, this overhead is negative because even if you see it's as positive, you optimize queries using this tool.

So you already on very good side. Right. So ne it's negative overhead. It's

interesting point. Yeah, it's interesting point. So you, if you analyze whole picture, having. This tool is beneficial and you can improve your performance using it, especially if you do it routinely. But speaking of bedside of producer statements, again, uh, lack of, uh, visibility of ongoing queries as well.

Yeah.

[00:17:40] Michael: things need to have completed

[00:17:42] Nikolay: Right. uh, also lack of plants.

[00:17:45] Michael: and planning is off by default, right? Like we don't even get planning time. Even in the newer versions that do include planning time, which is not that many. It's off by default. So, even the, the, I use Cloud Secret at the moment and they don't even let you turn it on. So it is quite difficult to get planning information.

[00:18:03] Nikolay: Yep. But I'm talking about the, this fact that a single normalized query can be, can contain many cases of like if you, for simple example, when I usually use select star from some table where column equals one or equals two. For example, if 99% of values is one and only 1% is two, in one case, you will get sequential scan and another case you will get index scan.

It's two different cases, very different in terms of performance and PK statements does allow you to distinguish them at all.

[00:18:39] Michael: No, good point. So yeah, no example plans.

[00:18:43] Nikolay: lack of weight event analysis because it could, could be combined with perta statements, for example, if you have a normalized query, you understand that, like if you could build some pie chart understanding when we spend time. Okay. We have total time meantime mean mark's time.

We, we, we have this

standard

[00:19:01] Michael: of the building blocks. Yeah. Uh, like query,

[00:19:04] Nikolay: we identifier

as

well

now.

right? In some cases, uh, we'll lose time doing some actual work. It can be cpu, can be io In some cases we just wait being blocked by other sessions. Uh, in this case, CPUs free can do something else, right? But where is this for?

Right? We don't have for at all. are many other things, and I know Perona, try to address it. Uh, we've produced that monitor. It's quite good and worth looking. but there overhead I would definitely measure, as we discussed recently, I would definitely measure and understand before going to production.

But it's an interesting, uh, alternative statements.

[00:19:42] Michael: If they give you even more ability to tune queries, maybe it's even more negative negative as an overhead.

[00:19:48] Nikolay: maybe. Yeah, it, it has a, it addresses many, things like, uh, error queries and so on. by the way, the biggest disadvantage in my opinion is the lack of examples. until, uh, very fresh POCUS versions, it was very hard to find examples in logs and so on. Uh, like we have, we have some obviously not good, normalized query in p attachments.

We see it's slow or it's too frequent. Okay. What are our examples? Especially okay, if it's frequent, probably we can find, oh, but how to find the proper place in code. It's also like when I call disadvantage, uh, uh, it, it doesn't mean something is bad. Something is maybe not just yet developed yet. there is opportunity here.

So, for example, we found some query, which is not frequent, but just slow in terms of average latency. We need to troubleshoot it. We need to run, explain lies to perform mi, mi micro uh, optimization, single query optimization, right? And in this case, uh, we need examples because, uh, as as mentioned, parameters define.

Concrete query, like we need example, uh, to work with

[00:20:59] Michael: Yeah.

[00:21:00] Nikolay: And here we can go either to, we can like, we can go to logs, we can go to activity, or we can use a modern approach with BPF or something like,

[00:21:11] Michael: Or like, sometimes sometimes developers are familiar enough with their application to know what a, what the kind of parameters that could be, uh, inserted would be. So like if it's a, if it's like a slow dashboard that's been loading in their SaaS application and they know that like an account ID is likely to be of this format and then you've got, you know, probably a date range or some something you could like know that maybe

the

things that people have available. Exactly. So

[00:21:38] Nikolay: but it's, it happens and it, it is great. But in general case, it's very un, it's not uncommon to, to hear. It works for me. Well.

[00:21:46] Michael: yeah.

[00:21:47] Nikolay: because different parameters were checked and we like, uh, for 90% of cases for this query, we have good, uh, situation. But for 10% we have very bad situation and we need those examples.

And uh, good news is that, uh, in modern POCUS versions 15, future 16, maybe even 14, uh, previous one, they had, uh, connection between observability parts has four ages. Always it had, it always, uh, if you consider currently supported positives versions, all ments have, uh, query 80 some weird numbers, sometimes negative.

And, logs also have the same number, uh, number in modern Postgres and also produced activity. And activity is, is my favorite choice because logs are usually above only log duration statement. It can be also to explain by the way, auto to explain doesn't log right.

[00:22:41] Michael: I think again, it's committed to 16, but

[00:22:44] Nikolay: Oh, great, great, great. So we log slow log pro.

Sometimes it's easier way, but uh, in many cases, uh, we have a case, but, uh, we don't have good examples because none of them, reached loging duration statement. In this case, we got to activity and then we bump into the problem. Our limit. Track activity size, I don't remember always. Like it's, uh, one, 1020 24.

And uh, to change we need to restart. Right. And in serious setup, it's,

[00:23:15] Michael: A big ask.

[00:23:16] Nikolay: well it's, you need to like trade off because the restart and, yeah. So it's worth increasing this direct size to 10,000 or something cause This will allow you to collect samples of queries easier and with query.

You joined this data with Patricia statement's data and understand the whole picture. Is this according to Monds? And here are the examples. Even fast examples, we will capture them just with some sampling. It's easy,

[00:23:45] Michael: Nice. Yeah.

[00:23:47] Nikolay: Yeah. So this is, this is also like kind of disadvantaged require some efforts to benefit from pivar statements because otherwise, it's just a building block because you cannot use it directly without any dance around it because, uh, you, you need examples. You also need snapshots because it's okay, we have cumulative medics. They are growing, growing, growing. But it doesn't remember when it was ette, by the way. It was also added to 16. Right? Ette time for, or no, I remember the patch, but I don't know, was it, committed or no?

[00:24:22] Michael: do you mean specifically?

[00:24:24] Nikolay: If you reset PK statements and then you have beautiful numbers, which you know, they are like, kind of can be analyzed, you don't need two snapshots.

You can work with single snapshot. if you know the reset time, for example, you reset at one hour ago, you have snapshot, you know, these numbers are observation. During one hour, you have averages, right? But, uh, if time is not remembered, unlike, uh, regular statistics, uh, pta, you can check it in PTA database when it was reset.

But for Risa settlements, you cannot check. That's why I

[00:24:59] Michael: There's a view, isn't there, isn't it like PTA statements, info, or something? It'll sh.

[00:25:05] Nikolay: maybe in some new versions, in, uh, current versions, it's not possible to understand. I u always resist statistics if I do it. I do, I try to do it, uh, less often according to, because of the problems we discussed earlier. But if I do it, I usually resist statistics, uh, both database statistics and producer statements.

And that's how I know the number or

deal with two snapshots.

[00:25:28] Michael: Yeah, added in Postgre 14. you can now see

[00:25:32] Nikolay: Oh, I'm, I'm legging. Good. So it's in 14 and 15 then. Okay, good. Good.

[00:25:38] Michael: Yeah. It contains two things. One is when, like a timestamping UTC of when the last reset was, and the other is about, Reallocation. So like how many, you can see how many queries have

[00:25:52] Nikolay: have been kicked

out

[00:25:53] Michael: or how many query groups. Yeah.

[00:25:55] Nikolay: Evicted That's great. Yeah. I missed that. I don't have, uh, six, I still don't have very good, uh, examples of even 14. I work, work in production systems. I observe from myself, I work, work with older versions, so that's how I missed it.

Okay. That's great news. Yeah. Thank you for correcting me. So anyway, we need to deal with two snapshots and here we come. Like, okay. It's a building block and, uh, if you want to work with it, you can work directly. It requires a additional effort, but it, it would be good if all, uh, monitoring systems supported it quite well, but they, there is no, like, almost none of them supported.

Well, for example, consider calls in total time. We need. All aspects of it, all dimensions. we need to take total time and understand, like, every second how many seconds we spent to process queries, assuming that we know all of them. Of course, 5,000 by default, not, not all of them, but if you disabled set by the way, set can also, if you disable, set you excluded from consideration and also not good, right?

So ob, I mean in terms of observer effect, but if like seconds per seconds, this is beautiful metric. If you forget about, this contention in terms of transactions and blocking issues, it would roughly. give you an understanding how many course you need, right? To process this workload.

If it's like two seconds per second, okay, two course, would be probably enough, but we will probably have a hundred percent. it's very, very rough approach, but it gives you understanding of your workload. And the same for reef buffers. Block numbers shall block, uh, heat, red, uh, temporary blocks and so on, local blocks.

we need to divide them. We need to deal with, two, actually three, maybe derivatives here. First is divide ev all metrics by calls. This give you average for individual query total time by calls will be average time you have it anyway. And all blocks by call will be on average H each call each query.

Takes this number of blocks, heat this number of blocks red and, and, and so on, iot timing as well. So we can divide by cost and also we can divide it by duration between, uh, distance, time, distance between two snapshots. So that monitoring system should do it. And this also super needed. And for example, data Datadog provides some of them, but not all of them.

And in some cases I had, uh, like why I don't have it here. Uh, and I, I'm speaking not about, uh, dashboard's, final, final dashboards and Datadog, but in, uh, metrics they collect, they don't collect this and you cannot, calculate it yourself. So there's lack of info. They don't collect everything. This is the problem.

And if you check out all other monitoring systems, this problem also the same. And once you already, like if you analyze, yesterday, for example, if you don't have this data, you cannot, understand what's happening.

[00:28:51] Michael: And the, the reason this is a problem, right, is because if we, forgot direct access to page stat statements. Right. But we only have it since the

[00:28:58] Nikolay: the

last

recent

reset. Yeah, it's

[00:28:59] Michael: current snapshots information.

[00:29:01] Nikolay: course we need to, to, to see spikes when it start, incident started, incident ended, and, and, uh, inside it we, I would like to have all, uh, metrics, producer statements has, uh, divided by calls and divided by time diff and I mentioned third one, it's percentage. So it's, it's interesting to understand for, for analysis, like top down analysis, you, you want to understand what, which query, is very, data intensive in terms of, uh, hits and, and and re reeds of in she buffers. Uh, I would combine the m divide and see, oh, maybe hits and reads separately. Also good because sometimes I had incidents when reeds was not a problem.

Heats was a problem, and it was not collected by monitoring, not presented. But, uh, eventually we realized that if it was, we would, uh, understand the root cost analysis, much better, much faster, because some query was dealing with data fully cashed in the buffer pool, but it was so intensive, not only because of frequency calls, but just like, because single query reading so many gigabytes from the buffer pole.

And it was not visible because nobody expected that. hit. Should be present in monitoring as well. It was surprise. And this, this is like, I, I would take everything divide. I know it, there is a price in terms of monitoring the storage and so on, but it, it's worth it because, we already spent some percent of overhead in POGO itself to collect this data.

Please deliver it to monitoring. Right? So currently I see monitoring, uh, they fail to, do this job pretty well. And, uh, some highlight in, in, in, I will have, tutorial, three hours tutorial. We will talk about various aspects of POGS monitoring and this is a very big aspect of it. how we work with producer statements and similar extensions and how we present data, which metrics we need and which derived metrics we need as well. Right.

[00:31:04] Michael: Nice. Am I allowed to talk about any

[00:31:08] Nikolay: okay. Okay. So let's about talk about positives. Uh, a hundred percent. You should install it. It's super useful despite of all downsides. Yeah.

[00:31:19] Michael: I imagine there's an exception to this, but for most applications I've seen they would benefit from having this information. A few times I've tried to help people and they didn't have it installed or like the hosting provider didn't put it on by default and we were kind of hamstrung until they turned it on for a while.

Like, it's very difficult to start to get an idea of what's going on without this kind of information, without any logging. So it help even if, you aren't doing anything with it currently in the future, if you need help from somebody, having it on can, can be really beneficial. one thing that you mentioned dividing by a number of calls a lot and I think I see quite a lot of guides and that's the kind of example query.

They give querying ptat statements looking at a few things like. Total execution time, and then mean execution time, uh, using that calculation and a few other things, but then they order it by mean execution time. and I understand looking at the slowest queries by average time, but more often I see people that want to look at queries by total load.

So they're looking to reduce total load on the database more than, the, the average ti, you know, the, the worst performing in terms of length of query. So I, I prefer to start looking at that ordered by total time, and that, goes to what you were saying about looking at even fast, fast in inverted commerce

[00:32:38] Nikolay: queries that are,

[00:32:39] Michael: a huge number of times.

[00:32:41] Nikolay: well, yeah. You know, I have, uh, my, my approach here. I think we need to order by everything almost. I, I like, we can, we need to have different tables. And if we, for example, if our monitoring also limits, uh, the interest, not like five, 5,000 tickets, it cannot handle all of them and it needs to, uh, limit them.

We probably need to, to have a few lists. For example, if we limit hundred or 500, we need to have several, uh, ordering, uh, approaches. And of course, total time versus meantime is the most popular. But also there are IO metrics ordered by hits plus reads, or ordered by hits, ordered by reads, two lists.

But speaking of total time versus meantime, in order by. Uh, I have simple approach here. if you care more about humans, you should order by meantime. If you care more about, machines, computers, you need to order by total time because optimization goal, optimization goal, if you target to reduce, uh, resource utilization, you order by total time because total time, it's like final metric.

Of course, there are, again, there are some local issues, blocking issues, which can increase total time. And unfortunately we cannot distinguish if we had a wait event in four, in statements combined easily, right? We could say, okay, we order by total time, but we exclude a lock. we don't want to, because lock, it's like some application logic problem.

It's not a resource uh, um, utilization. But a rough approach is like order by total time and optimize all queries, which exceed 20% of that calculate percentage. If it's, if some query group normalized query exists 20% good candidate for optimization. If you care about machines, probably

users won't notice cost.

Yes. So, well, yes, if you want, if you reduce, uh, if you reduce utilization that sometimes also converts to

[00:34:34] Michael: Well,

[00:34:35] Nikolay: dollar

[00:34:36] Michael: And, and users care. If like a system is overloaded, like that's affecting everybody on the system, for example.

[00:34:44] Nikolay: on edge case if it's, uh, saturated. But if it's, uh, 20% C P U and you order by total time, you reduce it. And maybe the people won't notice because maybe it's a query which, uh, executes below one millisecond on average, but it's so frequent. Right. It's super fast query, already well optimized, but it's, it's called thousand times per, per second.

I had many cases, like for example, select one, and you see 30% of total time is we select one. What? Like how, like why do we need it? We pay for it, right? But users don't see it because it's just one millisecond or even below one millisecond if to select one, it should be like some dozens of microseconds probably.

But, uh, if you care about humans, or you have not all tp, but some analytical system, and again, users, it means right, uh, we need to order by meantime, average time, and find, users suffer. Due to those queries. And if we optimize them, probably our computers won't notice because, for example, it's accessible for less than 1% of, uh, total time.

But it's still, in terms of meantime, it's very bad. Like above one second, users need need to wait. Or political system, like one minute, like it's very low resource, uh, usage, but very bad user experience. So the resource usage, is total time. UX is user experience is meantime, this is my rule. It's quite easy to explain and people usually understand it,

[00:36:12] Michael: I like it

[00:36:12] Nikolay: but let's not forget about buffers, right? Our motto is buffers first. Sometimes not maybe here if like, but if. If you think, okay, total time is big or meantime is big, what's happening here? We should take, uh, shabo hits, hit and red, and then divide either by uh, or like we can take it as a whole between two snapshots and let's say if we care about total time or we need to divide by calls.

If we, if we care about meantime and we understand, okay, meantime is huge, but because on average single query deals with gigabytes of data and that's problem probably it's a select count or select some or something aggregates, right? Let's think

[00:36:55] Michael: Yep.

[00:36:55] Nikolay: to improve it.

[00:36:57] Michael: Yeah, so you can look at, uh, it is really good for splitting out all of the different buffer numbers. So I think there's about 10 different types that you get through. Uh, the same ones you see in, in explain, explain, analyze buffers. You get via p stat statements. And it could be used, like we could use look at temp blocks, for example.

We could look at which queries most often or using the most temporary space, like what and what do they use on average. Like that kind of thing could be really handy. The other thing that they've added there isn't buffers related, but I think could, I didn't realize they had and could be really useful for certain, for like one certain case.

Well, will they do add? Uh, but I was gonna say just in time compilation statistics, I've seen quite a lot of people considering upping the cost limits for that or, or

[00:37:40] Nikolay: turning off

[00:37:42] Michael: But this would

be a really good way of

[00:37:43] Nikolay: turn it off completely for lgp. That's

[00:37:45] Michael: Yes. But it's quite cool that you can look in PTE statements and see which queries are using it and which ones by average, or like you can, so you could

[00:37:53] Nikolay: Yeah.

[00:37:54] Michael: that for that ad hoc analysis,

so,

[00:37:56] Nikolay: decision

versus just listening to me. Right, right, right. Yeah, that makes sense. Definitely. But, , speaking of, uh, hits and reads and so on, if you, divide by calls, you have average, it's super important to have average for hits and reads. it makes sense also for like temporary files, but, temporary files, sometimes we just quickly realize we need to increase work, ma, that's it.

But like, but if you think about like, sometimes yes, it makes sense and Sher buffer is the most interesting. And, uh, this is like, okay, our average is this, oh, gigabytes for single query not, not good, but if for, for total time, centric analysis. for machines, right? If we consider, , hits, we again, like we can divide by, duration number of seconds between two snapshots and we can imagine this flow.

Okay? This query group is responsible for like, okay, one gigabyte per second or a hundred gigabytes per second, uh, dealing with sharp buffers and hundred megabytes per second of dealing with underlying, page cash. Cause Reese is maybe not only discs page cash as well, depending on how on its size. So we don't have visibility without just that KC cash to, to this.

And this is super good for, engineers to understand these, these streams. Like, okay, this generates this IO and this even without timing, it's already very, super useful number of buffers per second.

[00:39:18] Michael: Especially in a world where you're seeing, we're seeing, you know, Providers that charged by that, you know, if that's the metric that you're paying

[00:39:25] Nikolay: Aurora, for example,

right? Aurora? Mm-hmm. Yeah. Makes sense.

But walls, wall numbers, it was added to post 13 maybe, or 14, 13. I think both to explain, analyze and, in 13 and both to explain, analyze and produce some statements and that's super good. Yeah, sometimes you have, uh, terabytes of data generated of wall per day.

You enable compression to reduce it. It's already better, but still a lot. It's, uh, like it's increasing. Your backups are growing, replication needs to transfer. All of these things, probably it's because of index amplification, probably because of something else. And purchase assessments can show you which queries are responsible.

For example, 20% of all, of all generation among called queries, of course, actually create index also will go there, right? It'll be registered by purchase assessments, create index

[00:40:19] Michael: But yes, if you

[00:40:21] Nikolay: or something.

[00:40:23] Michael: Would it be under utility commands?

[00:40:25] Nikolay: Uh, no, I think it's ddl. It's not utility. Utility set. Yes. And auto, auto won't show up here. Auto. I can also generate some wall processing tables and indexes. It generate some wall, but we won't see it in statements. But normal sql including DDL or, or index maintenance, like recreated index. Concurrently we will see them and we can again, understand percentage for each query group, for each normalized query and uh, make conclusions.

Okay. Our index maintenance, it's a lot of wall, but we move it to weekend, so we are fine. Or these inserts together, like whole group of inserts. It's responsible for, I dunno, maybe not inserts, updates. Updates, for example. Responsible for, 10% of all, all wall generation or 20% and we check, we probably drop or adjust some index.

We'll make these updates hot. Uh, reduce index simplification problem or maybe drop some indexes, sometimes unused, redundant. We had episode about index maintenance. It's important, and in this case, less wall will be generated and we will be better in terms of replication, backups, and so on. Less pressure to those subsystems,

[00:41:45] Michael: I feel like we've gone into quite a lot of detail here. I'm glad we have, but I feel like we might be coming to the end, would

[00:41:50] Nikolay: Yeah. I'm, I'm just showing, like, I'm just illustrating, why everyone should enable in create extension PTA statements in just, in just one database. It's not needed to install it everywhere. It's a single thing. Actually. You can install it to POS database, for example, and work with super user and you will see everything from all users, all databases, all, all queries globally.

[00:42:15] Michael: And there's a, there's a chance on, on some cloud providers, it's on by default, so you might not have to, but worth checking.

[00:42:21] Nikolay: Well, yeah, our, the super user. And so, yeah. By the way, also interesting dimensions, db ID or database and user, like which user, if we have like segregation, well we, if we split like divide and conquer this parts of our application, use this DB roll DB user humans, they have their own database users. So we can perform analysis and we have segmentation here.

We can perform analysis and understand who is using what, who is generating, like, uh, is responsible for total time and uh, IO metrics and so on. And also super interesting case, like, okay, we found some query. We, we talk about, performance analysis starting from database point, but database is just the heart of any system, right?

We have also applications, humans, many things and , sometimes we need to trace. And understand which code is, is doing this. And this is super hard topic because, because for example, if you, that data doc, it has application performance, , monitoring, apm, So if you analyze performance from application side, You can trace and say this is the code generating it.

you can also have network latency, , involved in, in latency analyze, but it's, pro and it's cons as well. Like it's, it's beneficial because you, for application, you have final latency, right, including network around trip time. But, , you cannot distinguish how much of those milliseconds or microseconds is our network from pogs.

You don't see this like network latency and you cannot trace to, uh, code unless you make some tricks. For example, sometimes people start putting some comments. For example, for Ruby there is a, library called marginal. It can, , help you trace , it can be url, for example, to put to comments and, uh, PTA statements will ignore comments.

It will, uh, use first comment, keep it, but all subsequent will be ignored in normalization process. And this is, this can be used. So you put some comment here and you can trace then, okay, this query group is from that parts of my application, that URL or this module or this file. you need to invent some, some way and it requires some efforts.

There is interesting, uh, effort from some Russian, developers. , for RIS K cash originally. So the idea to tag, so you have some dimensions, uh, for ris K cash entries. you tag, like for example, this is application module equals this. And, uh, you start distinguishing, , different calls and metrics and you can, uh, have specialized segmentation depending on your application logic.

I think it would be good to consider similar improvement for PTA statements as well, because this tracing problem, it's, it's sometimes will lose in like database experts. , we lose with our tooling, , we lose to application, , tooling. This APM stuff, they say, you know, it's more convenient for us because we see which code exactly generated it with your business attachments. We cannot do it. Okay. I'm talking about downsides again. All right.

[00:45:31] Michael: I think time to call it a day. that was really helpful. Thank you so much, Nikolai. thanks everyone for listening

[00:45:36] Nikolay: Thank you for, uh, this topic. It, you see, it resonates me inside me a lot because, uh, yes. I spent many years working in this area. I hope it was helpful. Good luck with

[00:45:46] Michael: on

[00:45:46] Nikolay: Thank you. Okay, good

[00:45:48] Michael: one.

[00:45:48] Nikolay: you too. Bye. Bye. Bye Thank

[00:45:50] Michael: Bye.