Postgres FM

Nikolay and Michael discuss benchmarking — reasons to do it, and some approaches, tools, and resources that can help.

Here are links to a few things we mentioned:

Towards Millions TPS (blog post by Alexander Korotkov)
Episode on testing
Episode on buffers
pgbench
sysbench
Improving Postgres Connection Scalability (blog post by Andres Freund)
pgreplay
pgreplay-go
JMeter
pg_qualstats
pg_query
Database experimenting/benchmarking (talk by Nikolay, 2018)
Database testing (talk by Nikolay at PGCon, 2022)
Systems Performance (Brendan Gregg’s book, chapter 12)
fio
Netdata
Subtransactions Considered Harmful (blog post by Nikolay including Netdata exports)
WAL compression benchmarks (by Vitaly from Postgres.ai)
Dumping/restoring a 1 TiB database benchmarks (by Vitaly from Postgres.ai)
PostgreSQL on EXT3/4, XFS, BTRFS and ZFS (talk slides from Tomas Vondra)
Insert benchmark on ARM and x86 cloud servers (blog post by Mark Callaghan)

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:

Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:

Jessie Draws for the amazing artwork

What is Postgres FM?

A weekly podcast about all things PostgreSQL

032 Benchmarking
===

Michael: [00:00:00] Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai. Hey, Nikolay, what are we talking about today?

Nikolay: Hi, Michael. Benchmarking.

Michael: Simple as that. Very easy word.

Nikolay: Database benchmarking. But actually not only database benchmarking: I think you cannot conduct database experiments, benchmarks, without also involving smaller micro-benchmarks. So let's talk about them as well, of course in the context of databases.

Michael: Yeah, it is an extremely complicated topic that I'm sure we can at least start simply with, and I'm looking forward to seeing how far we get. It's not something I have a ton of experience with personally, but I have tried to dip my toe in a little bit, so I'm looking forward to hearing what the more complex side of it can

Nikolay: Why difficult? You just run pgbench and see how many TPS you have. That's it. It's easy.

Michael: Right? The more I learn about it, the trickier [00:01:00] it gets, but yeah.

Nikolay: Actually, I absolutely agree with you. I consider myself a 1% database benchmarking expert, not knowing the other 99% and constantly trying to learn and study more areas. So I agree, this is a very deep topic where the more you know, the more you understand that you don't know.

Michael: Yeah. Awesome. So where should we start?

Nikolay: We should start with the goals. For benchmarking, there are several cases I can think of. The number one goal is usually: let's check the limits. Why is that? Because by default pgbench does exactly that. It doesn't limit your TPS, it doesn't limit anything, so you are basically performing stress testing, not just regular load testing, but stress testing.

You are exploring the limits of your system. And the limits of course depend on various factors, on the database itself and the workload itself. But if you just take pgbench and create, I don't know, a hundred million records in pgbench_accounts, I think it's -s 10,000 or so, I don't remember.

By default, one in the initialization step means a hundred thousand rows in the table, right? So basically only one table is big; the other three tables are small. And then if you run it as is, it will run as many TPS as possible. If you remember that you have not only one core but many cores, you'll probably try to use -c and -j, and then it's already more interesting, but basically you're performing stress testing and exploring limits. It's definitely a good goal, but I think it should not be the number one goal for database engineers. The number one goal should be [00:03:00]: let's make data-driven decisions when comparing several situations. For example, before a major upgrade we want to compare the system behavior before and after.
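
For reference, the sizing arithmetic and the client options mentioned here can be sketched in a few lines of Python. Going by the figure of 100,000 pgbench_accounts rows per scale unit, a hundred million rows works out to a scale factor of 1,000. The database name, client count, thread count and duration below are illustrative assumptions, and the commands are only assembled, not executed.

```python
import shlex

# pgbench creates this many pgbench_accounts rows per unit of scale factor (-s).
ROWS_PER_SCALE_UNIT = 100_000


def scale_for_rows(target_rows: int) -> int:
    """Smallest scale factor that yields at least target_rows account rows."""
    return -(-target_rows // ROWS_PER_SCALE_UNIT)  # ceiling division


def init_command(scale: int, dbname: str) -> list[str]:
    """Command line for the initialization step (-i) at a given scale (-s)."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def stress_command(clients: int, threads: int, seconds: int, dbname: str) -> list[str]:
    """Unthrottled run: -c client connections, -j worker threads, -T seconds."""
    return ["pgbench", "-c", str(clients), "-j", str(threads),
            "-T", str(seconds), dbname]


# Hypothetical values, chosen only for illustration.
scale = scale_for_rows(100_000_000)
print(shlex.join(init_command(scale, "bench")))
print(shlex.join(stress_command(clients=16, threads=4, seconds=300, dbname="bench")))
```

Without a rate limit, the second command is the stress test described above: pgbench sends transactions as fast as the server accepts them.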

Second goal, and I think it's more important than exploring limits. Knowing limits is definitely a good thing, but we don't want our system in production to live near its limits. We want it to live with CPU below 50%, and other resources, network, memory and disk I/O, should be below some thresholds. So that's the second goal: supporting decisions. I think it's more important, and I wish pgbench's default behavior was to limit TPS, to have a more realistic situation, and focus on latencies.

So the second goal is making decisions. And what about other goals? What do you think? There are maybe a couple more.
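
A minimal sketch of the "limit TPS and focus on latencies" idea, in Python. It assumes pgbench's -R option (target transaction rate) is used to throttle the run, and that per-transaction latencies in milliseconds have already been collected somehow; the sample numbers and the nearest-rank percentile method are illustrative choices, not something prescribed in the episode.

```python
import math


def rate_limited_command(target_tps: int, clients: int, threads: int,
                         seconds: int, dbname: str) -> list[str]:
    """pgbench run throttled to roughly target_tps transactions per second (-R)."""
    return ["pgbench", "-R", str(target_tps), "-c", str(clients),
            "-j", str(threads), "-T", str(seconds), dbname]


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Average plus a few percentiles, which say more about tails than the average."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical latencies: mostly fast transactions with a slow tail.
sample = [1.0] * 95 + [20.0] * 5
print(rate_limited_command(500, 16, 4, 300, "bench"))
print(summarize(sample))
```

With a controlled rate, two runs can be compared on latency at the same throughput rather than on peak TPS.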

Michael: Well, on the [00:04:00] decisions front, I think that's super interesting, and I think there's maybe even a couple of categories within that. We might be checking something we hope would be an improvement, so it's kind of like an experiment where you have a hypothesis. I might be checking something that I hope has improved matters, or I might be checking something like the upgrade and trying to check that there aren't any regressions. So either way, I have kind of a null hypothesis, and I might be hoping for an improvement, or I might be just checking that there isn't a big change in the other direction.

The other times I see this are new feature work, people planning ahead. So if people want to plan ahead, see how much headroom they have. But I guess that's seeing where the limits are. I guess it's the micro-benchmark side of things that you are talking about, but even in just little performance optimizations, if people are doing performance work, they might want to check on a smaller

Nikolay: I have an idea. Let's keep goal number one as a stress test to understand limits, and let's split goal number two into two [00:05:00] goals, number two and number three. Number two will be exactly regression testing. So you want to ensure that it won't become worse, at least, but maybe it'll be improved in various aspects.

And goal number three is, for example, an urgent situation when you need to deploy a new system and you need to make a choice between various types of instances. So it's not regression, it's making a decision: what type of platform to use, Intel versus AMD or ARM, and so on, which type of disks to choose. So you're comparing various things and making a choice. As for micro-benchmarks, I think this is an underlying goal which needs to be involved in each goal. But I have a number four goal, actually.

Michael: Yep. Go on.

Nikolay: Benchmarking to advertise some products.

Michael: Yep, the famous benchmarketing. But because a lot of the other ones are internal benchmarks, it's the one that a lot of us [00:06:00] see other companies publishing. The majority of benchmarks we'll see out in the world are these kind of marketing posts, which are often vendors comparing themselves to others based on some

Nikolay: Right. Sometimes I like this. For example, I remember the post from Alexander Korotkov about how to make it possible to have millions, 1 million TPS, with Postgres. It was 9.4 or 9.5, and some improvements were made in the buffer pool behavior and so on to reduce contention.

It was a good post, and it was also done kind of in pairing mode with MySQL folks. So MySQL also achieved 1 million TPS. Select-only, but still, both systems achieved that at relatively the same time. So four goals, right? Big goals. And I think number two and three, and maybe two is more important, because you do upgrades at least once per year or two years, usually, right?

To keep up with the pace of Postgres [00:07:00] development, that's recommended. And also, sometimes you switch to a new type of hardware or cloud instances. Sometimes you upgrade the operating system, and the glibc version changes, and so on. Sometimes you change provider. But what I would mention here is that I think it's not a good idea to use full-fledged benchmarking to check every product update.

So if you just release a feature, first of all, it's a very frequent situation. Some people have a couple of deployments or more per day, and benchmarking is an expensive task to do properly. So in this case, as usual, I can mention our episode about testing. I recommend doing it in a shared environment. I would not call it benchmarking, but it's a kind of performance testing in a shared environment, using a single connection and focusing on I/O, number of rows and buffers, which we covered in another episode. So this is for developers. But for infrastructure folks who [00:08:00] need to upgrade Postgres annually or biannually, and upgrade operating systems once per several years, benchmarking is very important, right?

Michael: Yeah. And just to check, you mean biannually, like every two years, you don't mean people trying to upgrade twice a year.

Nikolay: Yeah. And exploring limits is also important, but I would say this is an extremely hard topic. First of all, let's discuss how we do it. How do we do benchmarking? Straight to the point: the most difficult part is the workload, right?

Michael: In making it realistic, or what do you mean by that?

Nikolay: Exactly. So there are several approaches to having a proper workload. The first is very simple: you take pgbench and you rely on what it does, but it's a very, very simple workload. And I remember some years ago, less than 10 actually, so not [00:09:00] that long ago, pgbench was criticized heavily for not having various kinds of distribution.

When it sends SELECTs, single-row SELECTs based on primary key lookup, it chooses IDs randomly, and this is an evenly distributed choice. And it's not what happens in reality. In reality, we usually have some hot area, right? Of users, or of posts, or of some items, and a kind of cold area, right?

And I remember the sysbench author criticized that, saying sysbench supports Postgres. He's a MySQL guy. So it was quite heavy criticism. I remember quite emotional articles, in Russian actually. And the idea was, right, sysbench supports Postgres and it supports Zipfian distribution.

Zipfian distribution is closer to reality, to how social media is working, [00:10:00] with this hot area. But since then, pgbench has got this support, and when we run pgbench we can choose Zipfian distribution to be closer to reality. But of course, it'll still be very far from reality, right, because it's some kind of synthetic workload.
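
To make the "hot area" point concrete, here is a small Python sketch comparing how much of the traffic lands on the most popular 1% of keys under an even distribution and under a Zipf-like one. The weights use the textbook form 1/rank^s; the key count and the exponent are illustrative assumptions, and pgbench's own random_zipfian function in custom scripts may be parameterized differently.

```python
def zipf_weights(n_keys: int, s: float) -> list[float]:
    """Normalized Zipf-like access probabilities: key of rank k gets weight 1/k^s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def uniform_weights(n_keys: int) -> list[float]:
    """Every key is equally likely, as with an evenly distributed random choice."""
    return [1.0 / n_keys] * n_keys


def hot_share(weights: list[float], hot_fraction: float) -> float:
    """Share of all accesses that hit the most popular hot_fraction of keys."""
    hot_count = max(1, round(len(weights) * hot_fraction))
    return sum(sorted(weights, reverse=True)[:hot_count])


n = 1000  # hypothetical number of distinct IDs
print("uniform:", round(hot_share(uniform_weights(n), 0.01), 3))
print("zipf   :", round(hot_share(zipf_weights(n, 1.0), 0.01), 3))
```

Under the even distribution the top 1% of keys receive about 1% of accesses, while under the skewed one they receive a far larger share, which changes how much of the working set stays cached.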

How can we do better?

Michael: Is it worth noting that for some things pgbench is fine, depending on exactly what we're testing? I saw Andres Freund, for example, when he published about the improvements to connection management on the Microsoft blog. Very, very good blog post. But just to show how it worked well with memory management at higher numbers of connections, pgbench is great. So depending on exactly what you're testing, the nature of the workload may or may not matter so much.

Nikolay: Right, but this is for Postgres development itself, and it's about limits again. I'm talking about a general [00:11:00] situation where a database engineer, like a DBA, DBRE, or some backend developer who wants to be more active in the database area, usually needs to compare, and they need to talk about their own system. So in this case pgbench can still be used, but it should be used very differently. We need to move away from purely synthetic workloads, something living in space, right, far from reality.

We need to move closer to reality. And on the other side of possible workload types is the real production workload. The main approach here is to have mirroring of the production workload, which should be very low overhead, which is tricky.

Just imagine some proxy which receives all queries from the application and sends these queries to the main node, which is the production node, and also sends them to a [00:12:00] shadow production node, ignoring the responses from it. It would be great to have, right? I don't know of any such tool developed yet, but hopefully it'll be created soon.

There are some ideas.

Michael: A previous employer of mine, GoCardless, released a tool called pgreplay-go, and the whole

Nikolay: This is different.

Michael: I know it's different, but the aim is the same, right? It's to

Nikolay: No. I mean, yeah. Well, of course the aim is the same: to have a workload similar to production, ideally identical, right? With mirroring, you can achieve identical. With replaying, it's much more difficult.

Michael: But I would say that for most people, yesterday's workload is very likely to be similar to [00:13:00] today's and to tomorrow's, right? If you're talking about getting close to

Nikolay: right.

Michael: But yeah, I do appreciate that. I think also even mirroring production isn't quite what we need, right? We kind of want to anticipate. When we're benchmarking, we want to know how it will perform not with today's workload, but with tomorrow's, you know, or in a month's

Nikolay: Goals. If you talk about regression, I would like to replay, I would choose this, because in this case I can directly compare, for example, the behavior of Postgres and the next major Postgres version, or the behavior of Postgres on different Ubuntu versions, right? Or, for example, compare different instance types. And of course you can take it and put it into your fleet, as a standby node. In this case, if the glibc version didn't change and a major Postgres upgrade didn't happen, you can, for some kinds of testing, do [00:14:00] that.

And it would be kind of like A/B testing or blue-green deployments, because some part of your customers will really use it. But mirroring gives you less risk, because if something goes wrong, customers will not notice. But still, you can compare all the things, like CPU level and so on.

Replay was the next item on my list. I like replay. I spent a couple of years exploring this path. Right now I don't use it directly, replaying from logs. I believe in this idea less; my belief dropped a little bit after I spent some time with pgreplay and pgreplay-go.

Why? Because the hardest part of replaying a workload is how to collect the logs. Log collection under heavy load is a big issue, because it quickly becomes a bottleneck, and the observer effect hits you very quickly. Fortunately, these days [00:15:00] we can log only some fraction of the whole stream of queries, because there are a couple of settings for sampling, right? For example, we can say we want to log all queries, but only 1% of them. And we can do it at transaction level, which is good. And then when replaying, we can say, okay, let's go 100x in terms of what we have. But in this case, the problem will be that you will work with a smaller set of IDs, for example, right?

You need to find a way, and we come to the same problem of distribution, how to find the proper distribution. So in reality, what we usually do is combine a synthetic approach, like simulation: we take information from pg_stat_statements, right? And then we try to find a way to get some query parameters.

And then we can use various tools, for example JMeter, or even pgbench. With pgbench you can feed multiple files using -f [00:16:00] and you can balance them: you can put some weights in terms of how often each file should be used, with an at sign and some number. So in this case, you can say, I want this file to be used two times more often than the other file, for example.

And you can have some kind of workload; you can try to achieve some kind of situation you have in production, at least having the top hundred queries by calls and the top hundred queries by total time. Usually we combine both of these things, and you can replay quite reliably in terms of reproduction.
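
The weighting described above could be derived from observed call counts roughly like this. It is a sketch in Python under stated assumptions: the script file names and call counts are hypothetical (in practice the counts might come from the calls column of pg_stat_statements), and only the -f file@weight syntax is taken from pgbench itself.

```python
def script_args(calls_by_script: dict[str, int], max_weight: int = 100) -> list[str]:
    """Turn call counts into pgbench '-f file@weight' arguments.

    Weights are scaled so the most frequently called script gets max_weight;
    every script keeps a weight of at least 1 so it is never dropped entirely.
    """
    top = max(calls_by_script.values())
    args: list[str] = []
    for path, calls in sorted(calls_by_script.items()):
        weight = max(1, round(calls * max_weight / top))
        args += ["-f", f"{path}@{weight}"]
    return args


# Hypothetical query groups, one script file per group.
observed = {"lookup_user.sql": 200_000, "insert_event.sql": 100_000, "report.sql": 500}
command = ["pgbench", "-c", "8", "-j", "4", "-T", "600", *script_args(observed), "bench"]
print(" ".join(command))
```

Here lookup_user.sql is picked twice as often as insert_event.sql, matching the "two times more often" example, while the rare report query still appears occasionally.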

Also, a big question. Replay gives you the good thing that you can replay multiple times, right? So you can use a workload you already created and replay it, and have a whole week playing with it. With mirroring, you don't have that. Usually it's only right now: if you want to compare 10 options, you need to run 10 instances right now, and if tomorrow you have some other option, you [00:17:00] don't have exactly the same workload anymore.

So there are pros and cons for these options as well. But I still think mirroring is very powerful for regression testing. In some cases, replay. Well, replay is hard, but possible if you understand the downsides of it.

Michael: Yep. Makes sense. On the pros and cons side of things, where do you end up? Is there like a size? I'm thinking replay might be good up to a certain volume, for smaller shops that still want to check, where performance is still important, but they've not got a huge number of transactions per second.

I'm guessing replay is a nicer way to go.

Nikolay: Yeah, I just want to warn folks about this observer effect. When you enable a lot of logging, you definitely need to understand the limits. So I would test it properly and understand how many lines per second we can afford in production, in terms of disk, first of all. And of [00:18:00] course, the logging collector should be enabled, with CSV format.

It's easier to work with. And I've myself put a couple of very important production systems down just because of this type of mistake. So I would like to share this: don't do it. I knew it could happen and still I was not careful enough, and had a few minutes of downtime in a couple of cases.

But in one case we implemented one kind of crazy idea. Any change like enabling the logging collector, changes in the logging system, requires a restart. This is a downside. So you cannot, for example, say I want to temporarily log a hundred percent of queries to RAM, using a RAM disk, tmpfs, right?

And then switch back. But in one case it was my own social media startup, so I decided to afford that risk and [00:19:00] go that route. We implemented quite an interesting idea: we allocated some part of memory to that, and we set up very intensive log rotation.

So we sent those logs to an archive, and a new log was created, so as not to reach the limit in terms of memory. And in this case we enabled a hundred percent of queries to be logged and we had good results. So we collected a lot of logs for replay, right? So it's possible, if you can afford restarts.

Actually, in that system we enabled it permanently. So just one restart, and it's working. But of course there is some risk to reach the limit and have downtime. It's quite a dangerous approach.

So let's rewind to replay versus mirroring versus simulation. I actually call it a crafted workload.

Michael: Crafted workload. Oh, like the hybrid [00:20:00] approach.

Nikolay: Yeah, yeah. You use pg_stat_statements, understanding the proportions of each query group, but pg_stat_statements doesn't have parameters. You extract parameters from somewhere else. For example, you can use pg_stat_activity, or actually you can use eBPF to extract parameters. It's possible. It's a very new approach, but I think it's very promising.

Michael: Have you ever used, there's an extension, I think by the PoWA team, for their monitoring tool. They do, I think. Yes,

Nikolay: Yeah, that's a good idea. I haven't explored that path. I think it's also a valid idea, but the problem will be correlation. If a query has many parameters, all ideas to use pg_statistic, for example, to invent some parameters, kind of almost random but with some understanding of distribution, are not so good, because correlation is a big thing and usually you don't have CREATE STATISTICS.

Not many folks these days use CREATE STATISTICS, so [00:21:00] it's hard. But extracting from pg_stat_activity is, I think, my default choice. The default there for the query column is 1024 characters only. I think it should be increased, like to 10k usually, because if you have large queries, you definitely want to have them in full. This is one of the defaults that also should be adjusted, recalling our last episode. Usually people do it, but unfortunately, again, you need to restart.

Michael: Yep.

Nikolay: But once you increase it, you can collect samples from there, and the majority of queries will be captured. The only problem will be how to join these sets of data.

And modern Postgres, the latest versions, have query ID both in pg_stat_activity and in pg_stat_statements, right? And maybe in logs, I don't remember exactly. But if you have older Postgres, for example Postgres 13, 12, 11, you need to use a library called [00:22:00] pg_query to join these sets and, for each entry in pg_stat_statements, to have multiple examples from pg_stat_activity.
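
The join described here can be sketched in Python on already-fetched rows. This is an illustration under assumptions: the rows are plain dictionaries with made-up values, the key names queryid (pg_stat_statements) and query_id (pg_stat_activity) follow recent Postgres versions, and on older versions the shared key would have to come from a fingerprinting library instead.

```python
from collections import defaultdict


def attach_examples(statements: list[dict], activity_samples: list[dict],
                    per_query: int = 3) -> list[dict]:
    """For each pg_stat_statements-like row, attach a few concrete query texts
    (with real parameter values) sampled from pg_stat_activity-like rows."""
    examples: dict[int, list[str]] = defaultdict(list)
    for sample in activity_samples:
        bucket = examples[sample["query_id"]]
        # Keep only distinct texts, and at most per_query of them per group.
        if sample["query"] not in bucket and len(bucket) < per_query:
            bucket.append(sample["query"])
    return [{**stmt, "examples": list(examples.get(stmt["queryid"], []))}
            for stmt in statements]


# Hypothetical data standing in for rows read from the two views.
statements = [
    {"queryid": 101, "query": "select * from users where id = $1", "calls": 9000},
    {"queryid": 202, "query": "select * from posts where id = $1", "calls": 4000},
]
samples = [
    {"query_id": 101, "query": "select * from users where id = 42"},
    {"query_id": 101, "query": "select * from users where id = 42"},
    {"query_id": 101, "query": "select * from users where id = 7"},
]
for row in attach_examples(statements, samples):
    print(row["queryid"], row["calls"], row["examples"])
```

Query groups without any captured example, like the second one here, show up with an empty list, which makes gaps in the sampling visible before a workload is crafted from it.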

So this is how we can do some kind of crafted workload, and then we use some tool, even pgbench is possible, to generate, to simulate the workload. We understand it's far from reality, but it's already much better than a purely synthetic workload. So let's talk about the general picture very briefly. I have a structure.

I have a couple of talks about benchmarking. I call them database experiments. Regression is the number one case for me usually. But if you imagine something as input and something as output, the input is quite simple. It's just the environment: instance type, Postgres version. The database you have, actually, is also an object. Then the workload, and maybe some delta.

If you want to compare before and after, you need to describe the difference. [00:23:00] You can have multiple options there, of course, if you compare multiple options, right?
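
One way to write down the input side of such an experiment is a small record type, sketched here in Python. The field names and example values are hypothetical and only mirror the items listed above (environment, Postgres version, database, workload, delta); they are not taken from any particular tool.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ExperimentInput:
    instance_type: str       # environment: machine or cloud instance type
    postgres_version: str
    database_snapshot: str   # which copy of the data the run starts from
    workload: str            # how load is generated
    delta: dict = field(default_factory=dict)  # the one thing being changed


def variants(base: ExperimentInput, deltas: list[dict]) -> list[ExperimentInput]:
    """One input per option under comparison; everything but the delta is fixed."""
    return [replace(base, delta=d) for d in deltas]


# Hypothetical baseline and two options to compare.
base = ExperimentInput(
    instance_type="example-8cpu-32gb",
    postgres_version="15",
    database_snapshot="snapshot-A",
    workload="pgbench -c 16 -j 4 -T 600",
)
for run in variants(base, [{"wal_compression": "off"}, {"wal_compression": "on"}]):
    print(run)
```

Keeping the delta as an explicit field makes it harder to change two things at once by accident.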

Michael: I mean, it is just like a science experiment, right? You have all the things that you're keeping the same, and then hopefully just the one thing that you're changing, or if it's multiple things, the list of things that you're changing. And then you've got your null hypothesis and you're checking that.

In fact, actually, that's a good question. Do you take a statistical approach to this? Like, are you calculating how long you need to run these for, for it to be significant?

Nikolay: We will come to the output and analysis. We discussed workload so much because it's very hard to have a very good workload in each case, but the most interesting part is output and analysis. And for that you need to understand performance in general. Without that understanding, you will just have something measured.

Like, okay, we have this number of TPS. A very basic example: we have this number of TPS, or we had controlled TPS, and we have these latencies for each query, and average latency [00:24:00] or percentiles involved. But the question is, where was the bottleneck? And, referring to Brendan Gregg's book, chapter 12, question number one is: can we double our throughput and reduce latencies? Why not?

To answer that, you need to understand the bottleneck, and to understand the bottleneck we need to be experts in system performance in general, and databases in particular. So we need to be able to understand: are we I/O bound or CPU bound?

If CPU, what exactly happened? If I/O, what is causing this? To give you an example: a couple of years ago, we had one client who considered switching to AMD on Google Cloud. Somehow we jumped straight to database benchmarking, and it showed that on AMD all latencies are higher, and throughput, if we go with stress [00:25:00] testing, is worse.

Even if throughput is controlled to be the same, latencies are higher. Like, what's happening? Without analysis we would just conclude that it's worse. But with analysis we quite quickly came to suspect that disk I/O has issues there, on AMD instances. EPYC Rome, I think it was the second generation of EPYCs.

So then we went down to micro-benchmarks, finally, and checked fio, removing Postgres from the picture. And we saw it even when comparing basic fio tests. Everyone should know how to do it, it's quite easy: random reads, random writes, sequential reads, sequential writes.

We saw that indeed these disks misbehave. We cannot reach the advertised limits: two gigabytes per second of throughput, a hundred thousand IOPS. We checked the documentation and saw that we should have them on AMD, and we didn't have them. We went to Google engineers, Google support, and at some point they admitted that there is a problem.

By the way, disclaimer: recently we checked once again, and it looks like they don't have it anymore. Probably fixed. And it was definitely not a problem with AMD itself. It was something in Google Cloud particularly. But this raises the question: why didn't we use fio checks in the very beginning, right?

And the answer is, I think we should always start with micro-benchmarks, because micro-benchmarks are actually much easier to conduct, and faster, right? You just take your instance, your virtual machine or real machine with disks, and you run fio for I/O checks, to check disks, and you run, for example, sysbench to check CPU and memory, and run the same tests, lasting some minutes. It's very fast, and you compare, [00:27:00] right?

And in this case you don't need to think about a very hard workload, or to fill all the caches. Oh, well, okay, in the case of memory you probably need to, but with fio disk checks it's easy. I mean, you don't need to fill the buffer pool and so on.
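
The basic micro-benchmark suite mentioned above (random reads, random writes, sequential reads, sequential writes with fio, plus sysbench for CPU and memory) might be assembled like this in Python. Block sizes, file size, queue depth, runtime and the test file path are illustrative assumptions, and the commands are only built and printed, not run.

```python
def fio_command(name: str, rw: str, block_size: str, filename: str,
                seconds: int = 60) -> list[str]:
    """One fio job using direct I/O against a dedicated test file."""
    return [
        "fio", f"--name={name}", f"--filename={filename}",
        f"--rw={rw}", f"--bs={block_size}", "--size=10G",
        "--direct=1", "--ioengine=libaio", "--iodepth=32",
        "--time_based", f"--runtime={seconds}", "--group_reporting",
    ]


def basic_disk_suite(filename: str) -> list[list[str]]:
    """Random read/write with small blocks, sequential read/write with large ones."""
    return [
        fio_command("randread", "randread", "8k", filename),
        fio_command("randwrite", "randwrite", "8k", filename),
        fio_command("seqread", "read", "1M", filename),
        fio_command("seqwrite", "write", "1M", filename),
    ]


def sysbench_command(test: str, threads: int, seconds: int = 60) -> list[str]:
    """sysbench 'cpu' or 'memory' test with a fixed duration."""
    return ["sysbench", test, f"--threads={threads}", f"--time={seconds}", "run"]


# Hypothetical scratch file; pointing fio writes at a device or file that
# holds real data would destroy that data.
for cmd in basic_disk_suite("/mnt/bench/fio.test"):
    print(" ".join(cmd))
for test in ("cpu", "memory"):
    print(" ".join(sysbench_command(test, threads=8)))
```

Running the same short suite on every machine before the database benchmark gives a baseline to check against the advertised disk limits.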

So in this case, it will already give you insights about the difference in machines. And also an important point: we actually had it in a couple of cases where we automated benchmarks. I think it should always be done. Even if you don't change instance type, if you don't change disk type, you still need to do it to compare.

If you run benchmarks on different actual machines, even if they're the same type, you need to compare them, because sometimes you have a different CPU even if it's the same instance type, or you can also have some faulty RAM, for example, or faulty disks. So micro-benchmarks help you [00:28:00] understand that you are having an apples versus apples comparison in terms of environment.

Michael: Yeah. And I guess you made some really good points. It's not just about not trusting the cloud providers. It could also be something else that's gone wrong.

Nikolay: Yeah, especially since it's becoming more popular to consider moving out of clouds. I just read another article this morning about this, about cost optimization. So yeah. And then analysis. Analysis is a huge topic, really huge. But my approach is: let's collect artifacts, as many as possible, and store them.

Let's automate collection of artifacts. So for example, let's have snapshots before and after each run for all pg_stat system views, including the pg_stat_statements extension, of course, so we are able to compare them. And of course all logs should be collected. For example, one of the mistakes is: okay, we have better TPS, but if [00:29:00] you check the logs, you see that some of the transactions failed.

That's why it was so fast on average, right? Error checking is very important. And if you don't collect logs automatically, you'll probably be very sad afterwards, because if you already destroyed the machine or reinitialized it, you don't have those logs. Artifact collection should be automated.

And as much as possible, you need to collect everything.
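
Comparing before and after snapshots comes down to subtracting cumulative counters. Here is a minimal Python sketch on in-memory data; the snapshot layout (a dictionary keyed by query ID) and the numbers are hypothetical, and the counter names calls and total_exec_time are assumed to match the pg_stat_statements columns of recent Postgres versions.

```python
def diff_snapshots(before: dict[int, dict], after: dict[int, dict],
                   counters: tuple[str, ...] = ("calls", "total_exec_time")
                   ) -> dict[int, dict]:
    """Per-query increase of cumulative counters between two snapshots.

    Queries that first appear in the 'after' snapshot are treated as starting
    from zero.
    """
    delta: dict[int, dict] = {}
    for queryid, after_row in after.items():
        before_row = before.get(queryid, {})
        delta[queryid] = {c: after_row[c] - before_row.get(c, 0) for c in counters}
    return delta


# Hypothetical snapshots taken right before and right after one benchmark run.
before = {101: {"calls": 1_000, "total_exec_time": 500.0}}
after = {
    101: {"calls": 6_000, "total_exec_time": 3_000.0},
    202: {"calls": 40, "total_exec_time": 800.0},
}
for queryid, d in diff_snapshots(before, after).items():
    mean_ms = d["total_exec_time"] / d["calls"] if d["calls"] else 0.0
    print(queryid, d, f"mean {mean_ms:.2f} ms")
```

Storing the raw snapshots alongside the computed differences means questions raised later can still be answered after the machine is gone.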

Michael: Yeah, I was gonna say, because you can spend, you could be spending hours on these things. , in terms of even just letting them run, nevermind all the setup and everything. It's not trivial to run it again to, to get these things. Or if, if somebody, if something looks off, if somebody, if you presenting it to your team or somebody doesn't trust the results, they can look into it as.

Nikolay: Yeah, actually one of the concepts Brandon, Greg explains in his book and talks, uh, that, benchmarks should not be running without humans involved. Uh, like humans should be like, because you can [00:30:00] have good thoughts, just checking artifacts afterwards. But, during benchmark you can have some idea to check something.

And it's better to be there, at least in, in the beginning, first several round. but still, benchmarks can be automated and run without humans, but it just requires better level of implementation. And, uh, I agree with you. Like, you collected artifacts, your colleagues may raise, new questions. If everything is automated, you can.

But of course it's usually quite expensive in terms of resources, hardware and time, human time, engineering time. but like I also wanted to mention in terms of artifacts collection, one of the artifacts I, I consider is, uh, dashboards and charts from monitoring. And, I'm very. surprised how still most monitoring systems don't think about non-production and benchmarking.

If you, for example, have Grafana and register each new instance, which you have temporarily. You will have spam in your host list. Right? [00:31:00] But some instance, lived only a couple of days, for example, already destroyed, but we still need the data to be present. And, instance names are changing depending on benchmarking activities.

And you have spam. What I like is to use Netdata, which is installed using a one-liner, and it's in-place monitoring, already there, and then you can export dashboards manually. Unfortunately, it's only a client-side, browser feature, so automation here is not possible. I raised that with the Netdata developers, and they agreed that it would be great to have an API for exporting and importing dashboards.

But at least right now you can export to a file, and then later you can open several dashboards for several runs and directly compare CPU, RAM, all the things. I would like to have this in all monitoring systems. Especially with an API, automated, it would be great. Right now we can provide, in our show notes, a couple of [00:32:00] examples of how my team conducts this kind of benchmark.

It's quite good. We sometimes store them openly. So for example, let's enable WAL compression. Will it be better or worse? We can conduct experiments and store them, and some experiments of this kind are publicly available. You have artifacts there, so you can load them later into Netdata, compare the runs without and with compression, and play with those dashboards yourself.

And of course, we automate everything so anyone can reproduce the same type of testing in their own situation. So I actually like benchmarking. It's a very powerful thing, but of course it requires a lot of knowledge. So we try to help other companies in this area as we can, in terms of consulting as well.
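
The without/with comparison described here can be sketched in a few lines of Python. This is a minimal illustration, not the tooling Nikolay's team uses: it assumes the text summary of each pgbench run was saved as an artifact, and the function names `extract_tps` and `compare_runs` and all sample numbers are hypothetical.

```python
import re
from statistics import median

# pgbench prints its result in a summary line starting with "tps = <number>".
# Older versions print two such lines; this sketch keeps the last one.
TPS_RE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def extract_tps(pgbench_output: str) -> float:
    """Return the last tps value found in a saved pgbench summary."""
    matches = TPS_RE.findall(pgbench_output)
    if not matches:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    return float(matches[-1])


def compare_runs(baseline_tps: list[float], candidate_tps: list[float]) -> dict:
    """Compare median TPS of repeated runs, e.g. without vs. with a setting."""
    base = median(baseline_tps)
    cand = median(candidate_tps)
    return {
        "baseline_tps": base,
        "candidate_tps": cand,
        "change_pct": (cand - base) / base * 100.0,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measured results.
    without_setting = [1510.2, 1498.7, 1523.4]
    with_setting = [1602.9, 1588.1, 1611.5]
    print(compare_runs(without_setting, with_setting))
```

Using the median of several runs rather than a single run is a deliberate choice here, since one run can be skewed by caching or background activity. A throughput number alone does not explain a difference, so the monitoring exports mentioned above are still needed to interpret it.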

Michael: Nice. In terms of resources, you mentioned Brendan Gregg's book. Are there any other case studies that you've seen people publish that you think are really worth reading through, [00:33:00] or any other

Nikolay: Oh, yeah. I like Tomas Vondra's micro-benchmarks for file systems in the Postgres context. It's so cool. It's not only micro-benchmarks, it's real; I think pgbench was used there. In terms of benchmarks people conduct, I very rarely see super informative ones. Usually when I see some benchmark, I quite quickly find some issues with it, and I have questions and want to improve it. So I cannot mention particular benchmarks except this file system benchmark, but maybe there are some cases.

Michael: Yeah. There's only one that comes to mind in addition. There are some great ones in the Postgres community, but slightly outside of that, I think Mark Callaghan is doing some good work publishing comparisons between database management systems for different types of benchmarks.

I think that's pretty cool. Worth checking out for people

Nikolay: Well, I like it if everything is open and can be [00:34:00] reproduced and fully automated. And then we can mention here this new Postgres-based project called Hydra, like an open source Snowflake. We had an episode with them on Postgres TV recently, and they used a tool from ClickHouse. But of course the first question there was why such a small instance and such a small database size, and they admitted this is so, but this is what this tool does. I hope they will publish a new round of benchmarks with a more realistic situation for analytical cases.

Because in the analytical case we usually have a lot of data, not enough RAM and so on, and instances are usually bigger. But what I like there is that a kind of de facto standard tool for analytical databases was used, so you can compare a lot of different systems, including Postgres itself, Timescale, and this new Hydra project, and you can reproduce it yourself.

It's [00:35:00] great. So this is the kind of approach that I do like, although I still have questions about whether it's realistic or not.

Michael: Of course. Wonderful. Any last things you wanted to make sure we covered?

Nikolay: Yeah, I read Brendan Gregg's book. I actually have the second edition of Systems Performance here. A lot of good things. Become a better performance engineer if you want to conduct benchmarks and understand what's happening under the hood. Without understanding of internals, it's quite black-box benchmarking, and it's not working properly.

And I also did it many times and failed, in terms of: okay, we see A and we conclude B, but what happened in reality was C, right?

Michael: Awesome. Well, thank you so much. Thanks everyone for listening, and catch you next week.

Nikolay: Thank you. As usual, a reminder: please distribute, help us grow. [00:36:00] Michael showed me numbers this morning and they look very good, I would say. So we are growing, but I would like to grow more and more, of course, because I like feedback, and it drives me to share. Of course, I like when people suggest ideas.

Actually, we had a few more suggestions last week, so probably we should review them and use them in the next episodes. But I particularly ask everyone to consider sharing with your colleagues, at least those episodes which you think can be helpful to them. And as usual, please subscribe.

Don't forget to subscribe and like, but sharing is most important. Right. Thank you. We love your feedback. Thank you so much.

Michael: Yeah, absolutely. Thanks

Nikolay: Thank you. Bye-Bye.
