A weekly podcast about all things PostgreSQL
Michael: Hello, hello and welcome
to Postgres.FM, weekly show
about all things PostgreSQL.
I am Michael, founder of pgMustard.
I'm joined as usual by Nikolay,
founder of Postgres.AI.
Hey Nikolay.
Nikolay: Hey Michael.
How are you?
Michael: Nope, not falling for
that.
And we are joined by a wonderful
guest, Lukas Eder, founder of
Data Geekery and the creator of
jOOQ, a very popular framework.
Thank you so much for joining us,
Lukas.
Lukas: Thank you for having me,
Michael.
Michael: It's our pleasure.
So, how do you describe jOOQ and
what's it helping folks with?
Lukas: So, jOOQ is an internal
domain-specific language written
in Java where it models the SQL
language directly as a Java API.
This helps Java developers to write
SQL queries more efficiently
and more correctly within their
Java applications.
There's a code generator that generates
the entire schema as
Java objects and you can interact
with those objects and create
type safe queries.
That's the base of it.
Michael: Yeah, it's super cool.
But I'm super impressed with how
much you focus on developer
experience in general.
I hear quite a few people say they
really love jOOQ and they
give about 15 different examples
of things they love about it.
But what are the main things you
think that people particularly
like about it?
Lukas: Well, the first thing is,
and I've studied the market
extensively before making jOOQ
and the first thing is it really
looks as much as SQL as it's possible
to do when you model an
internal DSL API in Java.
So, of course, there's some limitations.
But when you have in mind there's
a group by clause, you just
start typing group by where you
expect it, and then it just starts
compiling.
So there's no surprises.
There's almost no surprises in
terms of SQL syntax.
So if you know SQL, then you immediately
know the jOOQ API.
You don't really have to learn
it.
I guess that's 1 of the most popular
things, But other than that,
it's also very nicely integrated
into the Java language or Kotlin
and Scala, that works as well.
And if you've ever coded PL SQL
in Oracle or PL/pgSQL in Postgres,
You kind of like the way how the
SQL language is embedded in
the procedural language.
So if you loop, if you iterate
over a result set in those procedural
languages, you have the same kind
of type safety and you have
the possibility, for instance,
to insert bind variables into
your statements at the right spot,
and you don't have to worry
about the infrastructure and the
logistics of connecting to the
database, and it just feels embedded.
And I think that's what a lot of
people really want to do, even
more than having the type safety,
is this feeling of embeddedness
into the target language.
And when you map your result sets
to Java objects, that also
feels very native.
So it feels like the database is
part of the Java of the JVM.
Michael: Yeah.
Even to the point of getting like
auto-complete suggestions in
IDEs, but like that is super nice.
Lukas: Yeah.
There are little details like this.
So when you say autocomplete, there's
also possibility to comment
on your database tables and comments
and that translates to Java
doc.
So whatever you comment directly
in the database, it's translated
to documentation inside of your
job program you hardly ever have
to go to the database to see what's
there so.
Michael: Nice.
Lukas: Yeah.
Michael: In terms of the history
how like I did look it up in
or at least the GitHub history
How long have you been working
on it and how does that feel?
Lukas: I think the first public
version started in 2009.
There were prototypes before that
in 2008 I think but 2009 was
the first public version, not on
GitHub then, it wasn't first
port.
I don't think GitHub was very popular
at the time.
And then Subversion.
That's how old it is already.
I had the idea already back during
my university studies at EPFL
in Lausanne.
There was a professor who said
you could model any kind of programming
language as an internal DSL.
I'm not sure if you use that term
yet.
I think DSL was more coined by...
More recently, but he said you
could model any language in terms
of an API if you just have Java
interfaces that return interfaces
that are return interfaces.
And this kind of idea stuck with
me for all this time, but I
didn't have a use case back then.
And when I started working in the
industry, in 2006 was my first
Java job.
I did some PHP work before, but
Java started in 2006.
I've seen that all the companies
that I've applied to, and even
the 1 that I worked at, they implemented
dynamic SQL libraries
because this was, this was something
that everyone needs, right?
So everyone has dynamic SQL queries.
When you have a UI with various
input boxes and they're all optional,
so you have to compose your SQL
queries in a dynamic way.
And everyone wants to avoid string
concatenation because it's
not safe and it's boring.
So they did that for both, they
had one for SQL and they had one
for Hibernate query language, which
is essentially the same thing,
just more limited, but at the same
concept.
And no one, no one actually thought
about making this a true DSL.
It's well, it was always just a
query builder with some weird
API.
So you had Java style methods where
you add clauses to the query
object.
And it didn't feel like SQL.
It felt like a tool to solve exactly
this dynamic SQL problem,
but it didn't feel like you didn't
enjoy using this thing.
And I think JPA still has this
thing with their criteria API,
which is still very useful if you
want to do dynamic JPQL.
But I've never heard anyone enjoy
using that API because it's
just, you have to learn it.
So you have to learn one more thing
and it only serves this one purpose
or at least that's how, what it
feels like.
It serves this purpose of dynamic
JPQL.
And that's when I started prototyping
it.
And in the very beginning, I had
to first implement the infrastructure
wasn't very diesel style API either
so I created the query builder
that could do some select from
where very simple conditions But
then I started to really implement
this idea because I remembered,
okay, this professor, he had this
idea and I'm going to try this.
Is it really possible?
And it turns out it was.
And it's really crazy how many
SQL statements today we have in
Juke that are all using this kind
of approach where you just
start with the same object and
then you start typing and you
auto complete your SQL syntax and
it's all one huge graph of methods
that helps you construct your SQL
queries.
So Juke has come a long way since
2009 I'd say.
Michael: Yeah, right.
I was looking in your documentation
about how many different,
how many different, you call them
families of SQL dialects that
you support.
So how many different databases,
but also the different versions
of different databases that may
or may not support different
syntax and SQL features.
How do you maintain that?
Lukas: Well, first off, luckily
the database products are all
very, very backwards compatible,
If we take one or two of them, we
don't really care as much, but
most of them really value backwards
compatibility a lot.
So supporting new versions is just
looking at the new features
and adding support for them, but
the old stuff still works.
So at least that's already covered
there, more or less.
But other than that, I mean, supporting
so many dialect per se
is, yeah, it's a lot of work and
a lot of testing, of course,
a lot of automated regression testing
to make sure nothing breaks.
And also a Juke really values backwards
compatibility a lot,
just like the database products.
So when you embed the joke query
into your database application,
you don't want stuff to break.
So if I'm not allowed to break
stuff, I'm probably not breaking
stuff.
So this kind of simplifies maintenance
as well, because stuff
that once works is probably not
going to break just like that.
I mean there's some internal refactorings
but there are no really
major, how to put it, major paradigm
shifts where everything
works in a different way now and
I was breaking everything.
So with automated regression tests,
it's not...
It's hard, but it's not that hard,
I think.
The hard part is just to find out
some weird syntax that implements
a standard feature on this particular
dialect.
And that's quite creative at times,
especially with the newer
dialects that have just been released
recently that are not that
mature yet, or Some dialects are
very opinionated.
So you kind of have to think in
terms of why did they make those
decisions and how can I map standard
SQL onto this database?
But I think that's 1 of the things
that people really like when
they work, especially when they
work with different database
products.
So if you have a company that works
with maybe Postgres and SQL
server, you don't actually have
to think about the differences
all the time.
So you can just write substring
in the jOOQ API and you don't
care if it's subster or inster
or whatever the database product
calls it.
It's just top string and jOOQ will
translate it for you.
Nikolay: But there are differences
as well, right?
And JSON is a big topic, right?
Lukas: Like huge.
JSON?
Nikolay: Yeah.
Yeah.
Those functions and yeah.
Lukas: It's crazy.
I mean, Postgres was 1 of the first
to actually support JSON,
but it didn't actually
Nikolay: before standardizing
Lukas: stuff.
So someone has standardized it,
Oracle did a couple of years
ago, and now Postgres started implementing
those standards from
what I've seen.
I've still not engaged with that
part yet, but you think now
that there's a standard, things
have stabilized, but still every
Database product does it differently.
It's crazy.
The devil is in the details.
Nikolay: Yeah.
And have you, like, do you need
sometimes to check the standard
or like you don't need it at all?
Lukas: I do it
Nikolay: all the time.
Lukas: Yeah.
Okay.
I'm trying to, the Juke API, if
it's not about a vendor-specific
feature, the Juke API really tries
to follow the SQL standard
in terms of syntax.
So a vendor-specific feature would
be Postgres on conflict clause,
which I think was a mistake.
They should have done merge from
the beginning and not invent
something new, but now we have
both.
And in that case, Juke also models
the on-conflict part like
Postgres did, and then SQLite copied
it and a couple of others,
I think.
But for instance, merge is a standard
SQL feature and many database
products have some extensions which
I then study, but I always
first look at the standard.
What does the standard do?
Because that's a thing that's going
to still be there in 50 years.
And different implementations might
have their quirks, which
sometimes I hope they deprecate
eventually and move towards the
standard.
Because usually I think the standard
is quite nicely...
Nikolay: Well, they behave differently,
so many people got used
to on conflict.
Lukas: Yeah, I get the point of
doing on conflict.
I mean, it's simpler for 80% of
the use cases, obviously.
Nikolay: Yeah, also merge only recently
received support of returning
clause, which is not standard at
all, right?
As I remember.
Lukas: The SQL standard has the
table, the data changed out of
the table, which was implemented
by DB2.
And then some minor database products
like H2 implemented as
well.
And it's not as powerful as returning,
especially because it
only allows you to fetch either
the data before or after the
updates for update statements,
which is also part of merge.
You maybe want to return both versions
to the version before
and after the update.
With returning in principle, that's,
that's a feasible.
So Oracle implemented this recently.
I'm not in follow.
Did Postgres allow for accessing
both versions of the row?
Nikolay: I don't remember as well.
I'm only like returning, well, if you DELETE, definitely you
have ability to return old thing you deleted.
Lukas: Well, there's only the old thing.
Old thing is deleted.
Nikolay: Let me quickly check.
Lukas: Why not?
But with updates, I mean, SQL Server always supported both versions.
So before and after.
And DB2 with the standard syntax supports only, you have to choose
which one, but you can't return both.
And Oracle 23 AI now supports both as well, like SQL Server.
Oracle uses the Postgres syntax, the non-standard syntax also
with the returning keyword, so maybe you can start from there
if you don't do it yet in Postgres.
Nikolay: I think in Postgres for updates we cannot return old
data.
We have access to it in triggers but it's different, right?
So only the new value can be seen there, but for deletes, it's
Lukas: probably more useful.
Nikolay: Well, I can imagine some cases when we want to report,
for example, what happened.
Lukas: Yeah.
When you do auditing, you want to have both.
Nikolay: Yeah.
Maybe let's add it.
Because
Lukas: standard supports this data change delta table, and it
accepts a merge statement as well.
So in principle, in the standard, you could have returning with
a different.
Nikolay: But it feels like more like a heavier approach in terms
of manipulation.
Returning is just like one line, right?
Return a star, that's it.
Lukas: Yes, yes.
Super easy.
It seems easier, yeah.
Also, there are tons of limitations.
I mean, you put your data change delta table inside of a SELECT
query, but then you can hardly do anything with that SELECT query
for instance, I'm not sure what exactly what is forbidden, but
unions I think are forbidden and joins are forbidden, so maybe
even aggregations.
I'm not sure anymore, but you're very limited with what you can
do.
So I'm not sure if there's any benefit with allowing that in
the SELECT statement as opposed to just returning.
But ultimately it's kind of the same thing as what Postgres does
when you put the ML statements in, in with.
So that that kind of behaves the same way, at least from a user
perspective.
Nikolay: Speaking of with, I know Juke supports CTEs.
Does it support recursive CTEs?
Yeah, yeah.
Yeah.
So everything, lateral join and so on, everything.
Yeah.
Cool.
Lukas: There's a high chance if SQL supports something and it's
not really very funky like the match_recognize clause, then Juke
will support it as well.
So match_recognize could be supported as well in Juke, but it's
such an edge case and still only supported in Oracle and some
very esoteric database products that Juke doesn't support yet.
So, so I'm skipping this for now, but with is, is everyone uses
with, so Juke supported as well, including recursive with, I
mean, there's some table valued functions that are like generate
series in Postgres, which have to be emulated elsewhere.
So if you want to ever, probably hardly anyone does that migrate
from Postgres to whatever, and you want to translate your old
generate series queries, then Juke will translate that to a recursive
with.
Michael: Cool.
Nice.
Nice.
In fact, I feel like you skipped over that a little bit, but
that's such a cool feature that basically...
Translation?
Yeah, exactly.
We can get access to standard features that Postgres hasn't implemented,
or that whatever database we're using hasn't implemented yet
because you do the transformation, I guess.
How do you, translation did you say?
Lukas: Yeah, I call it translation.
So there's even, you can use Juke as a purely translate product.
So some customers actually do that.
Mostly when they migrate from Oracle to Postgres, you can just
embed Juke as a JDBC driver into, into your Java application.
And you don't actually have to use Juke directly.
So let's say this is a legacy application and uses JDBC or something
JDBC based like Hibernate or MyBatis, and you just put Juke
in the middle as a translating layer and probably 80% of your
queries will work directly on Postgres.
Of course, there's always edge cases and you have to do testing,
but at least your migration project will go much, much faster
because you can directly use Oracle queries on those other database
products.
Like for instance, Oracle has connect by and that can be translated
to with queries with recursive.
That's very hard.
So some cases don't work, but maybe many cases do.
And you don't have to actually look at those anymore.
Or Oracle special outer join syntax can be translated to left
joins in, in, in standard SQL.
So your migration just goes much faster.
You can also use it not as an embedded library, but just as the
web, the website is free or you use it locally as a CLI tool,
which is to have files, you have input files and Juke translates
those directly.
And you can manually check if it's correct.
So translation is a, is a, I'd say minor use case.
So a lot of companies work with only 1 database product and maybe
10 years, they, they might reconsider their choices and migrate.
But even then most, most companies stick with the database product
they started with.
And so many Juke users don't actually use this feature, but some
do.
Nikolay: May I ask slightly off topic question about JDBC since
you spent a lot of time with Java, obviously.
There is such thing as extra float digits.
Right?
Lukas: Extra what?
Nikolay: Extra float digits.
Okay.
There is such setting, and by default it's 0 in Postgres.
Or 1.
Yeah, no, 1.
It's 1.
Lukas: I've never heard of it.
Nikolay: Okay, but yeah, I remember some bugs in my code, and
I usually try to use pure SQL and psql, and you know, if I can.
I Remember that Java developers implemented the very same thing,
same queries, behavior was very different.
And then I noticed that the DBeaver, who else, other Java-based
IDEs also behaved the same way Java application did.
It was not the same as I saw in psql.
It happened with these extra float digits.
If you didn't notice this, this is maybe some very edge corner
case.
But I wonder like, why did the JDBC keep different default is
like no layers of some defaults.
This can be very annoying.
But okay.
Lukas: Well, the main thing that you always see The main difference
when you work with JDBC is, as opposed to psql is, in psql you
usually don't work with bind variables, so you don't have the
whole kind of effect that you get when you have bind variables,
although in Postgres This might not be that big of a difference,
but when you work with JDBC, you always have prepared statements
and bind variables.
And let's say my background is
mostly Oracle.
So with bind variables, you have
an execution plan cache and
the cache will store some assumptions
about your bind variables,
like a normalized distribution
and expectation regarding cardinalities,
et cetera.
So your actual bind variable might
be, if your dataset is skewed
and your actual bind variable may
be completely different from
what Oracle expects.
And, and then the execution plan
will be bad.
But from what I take, Postgres
doesn't have this problem because
it doesn't have this feature.
So this would be 1 of the main
differences.
But you were referring to some
logical error, not...
Nikolay: Yeah, Default and values
returned are different from
what I see in PC.
Yeah, but
Lukas: I've never heard of this.
Nikolay: Well, okay, maybe it's
already fixed by the way.
It was like 5 years ago or so I
saw it.
So, yeah.
Well, on the topic of JDBC,
Michael: I mean, do you have any
issues with it?
It seems like it's a remarkable
piece of software that's just
chugging along, but equally it
doesn't get much love.
It doesn't get much love.
Like I only, I generally only hear
negative things about it,
but it must be amazing given how
much it's used.
Lukas: I think it's the best API
that Java has, even better than
collections and everything else.
It's really, I mean, okay, maybe
they stole it from Microsoft
from ODBC, it's kind of the same
thing, but It's really the best
thing that Java has because it's
such a good standard.
I mean, everything can build on
top of it and you don't even
have to write SQL queries.
So there's, for instance, Neo4j,
which is a NoSQL database with
their own query language and you
can use JDBC with that database
as well as long as you wanna have
some tabular results And it's
a very, very good abstraction of
the network protocol layer.
And the reason why people hate
it, it's not targeted at developers.
So you can use it directly if you
want, But it's a bit hard to
use.
You have a lot of objects.
You have the connection object
and you have to create a statement
object and you have to remember
to close it.
And you have to remember to create
a result set object and close
that as well.
And it's not integrated with the
rest of the JDK libraries.
Like for instance, when you iterate
the result set, you have
your own API methods instead of
just a list or iterable.
So you can't use the for each loop
on a result set, for instance.
So the ergonomics of JDBC is not,
not up to date.
I once tried to convince the spec
expert group, it's a mailing
list to, to update their API.
There's a couple of things that
I think in, in at the time it
was Java 8, but even now you could
improve a couple of things
without changing anything fundamental
in JDBC.
For instance, you can skip the
statement part.
A lot of times you don't actually
have to think about preparing
the statement explicitly.
You have to do it when you want
to reuse the prepared statement.
So you can save some time in case
there's a resource or some
cache connection directly.
But a lot of times that's not the
case in some database products
and a lot of times it doesn't even
matter at all.
So you could just have a connection
and run the query.
So that's a minor improvement that
would just reduce 1 step for
each statement you write.
Yeah, but Oracle couldn't allocate
any resources to such a project,
even if the people on the list
agreed that many ideas were reasonable.
So I guess that's it.
It's an amazing piece of software
for integration products like,
like Juke or Hibernate or MyBatis
and everything else to build
on top of and all the drivers they
just work.
So a database company can just
publish a driver and it will work
in all kind of software.
For instance, if you, if you work
with DBeaver, it just works with
all the database products, right?
So you don't have to think about
how do you, how do I connect
to this new database thing, like
the cows or, or DuckDB or whatever,
it just works.
All these database products, they
just work because they all
use the same API in Java.
So it's really a wonderful piece
of technology, I think.
So I totally agree.
Nikolay: Yeah, this is exactly
where I saw this problem, comparing
to, I mean in DBeaver, comparing
to Java.
And I realized, okay, it's also
Java application, so yeah.
Good.
Lukas: Yeah, but that was probably
just the driver bug.
I don't think it has to do with
the database as an API.
I mean, obviously the drivers can
have bugs.
Nikolay: Right, right.
So You see bugs often, right?
I mean, in this...
Lukas: Yeah, not necessarily in
the drivers.
I mean, the drivers, they really
solve a very basic problem.
So it's mostly about network connections
and these kind of things
that no 1 actually cares about,
unless you have to optimize something,
like multiplexing or whatever.
But I think this stuff kind of
just works.
And when I find bugs, it's more
SQL related.
So it's inside of the database
and I do find a ton of them very
rarely in Postgres.
I have to say, I think in my career
I found 2 or 3 bugs in Postgres
only, as opposed to hundreds.
Michael: We didn't pay you to say
that.
Lukas: No.
It's really very good.
It's surprisingly good.
I mean, also, absolutely no regressions
at all.
So if there's a bug, it's an edge
case that no one thought about,
then it's really a very weird kind
of thing.
Nikolay: Yeah.
By the way, how do you check regression?
Just conversion from...
Lukas: In the database products?
Nikolay: Yeah.
For Juke, for example, some code
is written and then you just
check the conversion to SQL or
you check results as well.
Lukas: No, you can't just search
the SQL statement.
I mean, Juke's integration tasks
have a standard database, which
has about 5 tables.
It's very simple.
So there's a bookstore with books
and authors, and it's not a
big database.
So it has 4 books and 2 authors,
4 languages, this kind of thing.
But then I just know there are
4 books and they have these titles
and every kind of query has to
return exactly the same result.
So there are so many queries being
run in each database product,
maybe about 10, 000 queries, I
think, and also update statements
and all kinds of statements that
make the assumption that if
you have this kind of input database,
then this must be the output.
Or also, if I create a table with
Juke, then it has to have these
properties and throughout the API,
it has to be the same thing
for every database product.
Nikolay: Right.
I just wonder, earlier you mentioned
that usually database systems
maintain reverse compatibility,
but sometimes it's broken.
For example, in Postgres 12, CTEs,
before they were materialized,
like optimization fans.
So every step is materialized before
12, but in 12 default behavior
changed.
Right.
So, so is it something that Juke
should care about or no?
Lukas: I don't think that's a logical
regression.
I mean, it was a conscious
decision and it's only affects
the performance.
There's no logical difference of
the result.
So I mean, databases are allowed
to do these kinds of things.
I re I recall there was once a logical change in the UPDATE statement
in Postgres, which was a incompatible change when you UPDATE
a row.
So you had used the row syntax, you use parentheses And you UPDATE
2 Columns at once and suddenly both required the row keyword,
but only there.
This was a very weird.
Nikolay: 1 more recently, 1 more change related to subqueries.
I don't remember exactly.
It was a demonstration included in the generate series, I remember.
And yeah, maybe it was like up to 5 years ago, but there was
some change which was logical as well.
But I understand, you're like SQL standard, don't care about
Indexes, care only about results.
Yeah.
Right, I see, I see.
Lukas: No, I mean, a juke doesn't make any guarantees at all
with respect to performance either.
So, so from a Juke perspective, Juke doesn't try to fix your
SQL.
If you write a really crap query with, I mean, crap, what does
it even mean?
So if you write a bad query with a lot of nested subqueries,
et cetera, et cetera, in principle, there's nothing wrong with
it.
Right?
So there's, you will find hundreds of blog posts that, especially
from, from old times when optimizers were really bad still, where
people advised against using derived tables, then they advised
against using correlated subqueries, and then, but there's nothing
logically wrong with these things, right?
So you can nest as many levels as you want, as long as it's correct,
it's correct, and the optimizer should figure it out.
And Who is Juke to judge you for your query style?
So Juke doesn't fix these things or even assert anything, and
it would be completely wrong for Juke to do these kind of things,
because you want to write exactly this SQL, and Juke should render
exactly this SQL.
You have your reasons, right?
So...
Nikolay: I understand.
And my question is, I'm curious, how do people usually approach
performance?
For example, if some query is written using Juke with a chain
of SELECTs from blah blah, And then it turns out it's slow.
And then we find the fix, and we need to adjust something in
the query if it was a rock SQL.
Is it usually simple to adjust a Juke version of.
Lukas: Yeah.
I mean, there's no difference.
I mean, you're probably going to
use the, I'm not, I'm not sure
anymore what the Postgres version
is called, but you have your
statistics views in your database,
your performance schema, and
you query those to see what's,
what went wrong and then you make
assumptions, you rewrite the query
and you test again.
Maybe even using a benchmark, of
course then the benchmark is
written in Java, but I don't see
a difference when you write
jOOQ query or native SQL query,
It's the same process.
Michael: I was watching a really
good video by Kevin Davin, who
did a talk on jOOQ, but 1 of the
things that he really liked
about the product was it logs out
to the console, to the actual
SQL statement as well that it ran.
Lukas: When you do a debug level.
Michael: Yeah, which is super helpful
for performance.
So if that was slow, you can then
run it in your favorite editor
in PSQL wherever with EXPLAIN,
ANALYZE, et cetera, et cetera.
BUFFERS.
Yeah, always BUFFERS, hot topic
this week.
But yeah, so that seems like a
really developer friendly way
of getting back to the SQL and
then diagnosing the performance
issue.
Lukas: This was a feature from
day 1.
My assumption was that when you
develop, you have debug mode
logging.
And when you go to production,
you switch that to info or even
warning.
So you're not going to have the
performance penalty of re-rendering
the query.
It's even formatted, so you can
actually see the query.
It's not just 1 line, which is
when you execute the query, it's
just 1 line of SQL string, so to
have it more compact.
But when you debug log it, the
version contains the bind variables.
So you can just copy paste the
whole query with the bind variables
in a formatted way and study it.
I thought that was what everyone
wanted.
And Also you get the first 5 records
of the result set also in
the debug log, because that's probably
also what you want while
you develop at least.
Michael: Yeah.
In a pretty little table.
Nikolay: It's nice.
Lukas: So you can immediately see
what's going on.
And even then, I mean, 1 of the
things, 1 of the reasons why
I implemented it this way is when
you have a huge result set
with a lot of columns, that bothers
you while developing.
So you kind of start thinking,
do I really actually need all
these columns?
Because if you don't, then you
have a better debug log.
So I kind of think you have to
punish developers as early as
possible for their performance
problems.
I mean, jOOQ doesn't judge you.
It just gives you some tools.
Nikolay: Just giving you some convenience
looking at logs.
Okay.
Yeah, that's interesting.
So, reduce number of columns.
Good.
Michael: Changing tact a little
bit, I think you've got a really
interesting perspective on Postgres
from a broader landscape
of SQL dialects and databases in
general.
What kind of things do you see
that are relatively unique to
Postgres?
Or do you see people in the Postgres
world particularly using
or liking?
Lukas: I mean, the 1 thing that
is very unique to Postgres, there
are 2 things from my perspective.
The first 1 is it's very developer-centric.
So you can see that with other
database products like Oracle,
DB2 or SQL Server, they're very
production centric, very operation
centric.
They used to base their entire
sales on these kinds of things.
So this means you have a lot of
production tools as well, which
are great actually.
So I'm still waiting for the like
of Oracle Enterprise Manager,
if you know it, for Postgres where
you can analyze production
workloads and, and, and query what,
what kind of workload you
had 5 months ago, and it's still
there.
It is still happy.
I mean, you have to pay for these
kinds of extras, but it's still
there.
You can still analyze anomalies
in terms of performance, what
happened 5 months ago.
So these kinds of tools are a bit
lacking, but on the other hand,
because it's so developer focused,
you have this whole extension
system, which is unprecedented.
And I haven't seen it since either.
So anyone can extend anything in
any layer of the, of the database
product.
So you have these wonderful extensions
like PostGIS for instance,
which is, It's not really an extension
because it's also part
of the SQL standard, but it has
been implemented as an extension
just, I guess, to show what is
possible and such a huge additional
product can be made as an extension.
And you have indexes and data types
and all kinds of things that
other vendors have to either offer
out of the box or they don't
have it.
You can't extend Oracle really.
So that's really something very,
very amazing.
But as I said, from the other perspective,
from an operations
perspective, this is something
that personally, I think Postgres
is still very much behind Oracle, for instance.
So I'm an Oracle guy.
I can't really comment on SQL Server or others, but the Oracle
database where I worked at before I made jOOQ, it could easily
handle very, very complex queries that produced 500 lines of
execution plan on billions of rows, and it would run in milliseconds.
And I don't think you can do that with Postgres.
I wouldn't risk it.
So we all remember the days when there was a hard limit of number
of joins in Postgres, from which, starting from where, you didn't
have any smart optimizer anymore to reorder the joins in the
right order.
I forgot what the limit was, but I think you had to be very careful
if you have more than 10 joins or something like that, because
then it would just join from left to right syntactically, and
this is horrible.
Nikolay: Join collapse limit should be 8.
Lukas: Yeah, that's the join collapse limit.
So, once you know these kind of things, obviously, make the right
choices, and you probably won't run into big issues, but it's
just, it's a very unique focus that I often missed in the past
from Oracle, for instance, or completely ignored developer experience
for decades.
It was like, you don't even have a Boolean type, right?
So this is hard for me to understand.
It's such a simple thing.
You think it's such a simple thing.
So very, very useful, but Oracle wouldn't have implemented it
until recently.
So this is a focus that Postgres has and few of the others had,
even MySQL wasn't very developer friendly from that perspective.
So this is very unique.
If developers were the only ones to choose the database product,
it's always Postgres, because I mean, you're just gonna implement
everything with Postgres.
You can write stored procedures, you can have your own indexes,
you can extend it yourself.
Probably you don't have to because someone already wrote an extension
for whatever you need.
Oh, the second thing is, it's amazing how standards compliant
it is.
So a lot of times, Postgres really waits until something is standardized
before they implement it instead of innovating.
Nikolay: Right now we are waiting on UID version 7 because of
standards.
Like it didn't get into version 17 of Postgres because RFC was
not finalized.
Every library already started to support it, but Postgres decided
to wait on RFC, which is like very conservative decision, right?
Lukas: Most of the times.
Yes.
So I think that's a good thing
in the end, because the standard,
in my opinion, in most parts is
well-written.
It's a very complicated document,
of course, but I think in terms
of syntax, it's kind of everything
feels like SQL.
So some database products think
they have to invent some syntax
And it just doesn't look like SQL
anymore.
It's just very weird, funky thing
that doesn't fit the rest of
the language.
And the standard is very, very
consistent, I think, in most parts.
And to wait for the standard to
appear is a good thing.
So sometimes there's an exception
to the rule, as I said, on
conflict was 1 of these exceptions.
JSON as well, in case of which
I think it was something everyone
needed and the standard was severely
lacking.
And Postgres just did their own
thing.
So I kind of think a lot of people
kind of regret the heterogeneity
of the JSON API.
So you have to always consult the
docs to see if something uses
a JSON path, which is a standard
thing in the JSON world, or
if it's just an array of path elements,
or if this function returns
a table or a set of something,
or whatever data structure.
So I think now with the JSON standard,
this is going to be much
cleaner once it's completely rolled
out.
So I'm not up to date where the
Postgres stands.
I'm still not supporting that syntax,
but from what I've seen,
it's going into the right direction.
Nikolay: Yeah, some parts support
it, some parts I think are
still bending.
Yeah.
Yeah, because I
Michael: think A lot went in in
17.
Nikolay: Right.
Lukas: Yeah.
And speaking of this, I'm going
to be really curious.
I saw a commit recently regarding
temporal primary keys.
Is that correct?
And it is absolutely not the standard.
So this is very curious.
I'm very curious about this.
Are you aware of this commit?
Michael: I'm not.
Lukas: It's still on a branch.
I think it's not, it's not going
to be merged, but I mean, Postgres
has these range types.
Right.
Nikolay: Right.
Lukas: And they're useful obviously,
but the SQL standard since
2011 defined how you should write
temporal databases, both by
temporal.
So you have system versioning and then logical versioning.
And this is something that is still lacking in Postgres.
And I think a lot of people would like to see that.
Nikolay: Yeah.
In Postgres, there is an opinion that it's in the past, because
it was
Lukas: in the past,
Nikolay: it was removed.
Michael: So that's a nice temporal joke.
Lukas: An extension I think?
Michael: It was in Postgres, the original Berkeley version.
Nikolay: This is how MVCC inherited from these versions
of tuples.
So your time travel basically was supported very, very
long ago, but then it was removed.
So like interesting idea.
Michael: I think for
performance reasons.
Nikolay: Yeah, So I think so.
But yeah, this is a super interesting, like specific topic.
Sometimes we do need to implement something with triggers and
so on, like, which is very
Lukas: Yeah.
Yeah.
It's very hard to do manually.
I think the standard helps a lot with, with, I mean, you can
write an update statement and it's automatically transformed
into a delete and insert.
So if you have to split the record logically, this is really
a pain to do manually, right?
So it's
Nikolay: not super pain because you can have, for example, Postgres,
you know, like I remember your very old post, everything is a
table.
Or maybe it was a post of some 10 tips or something, and 1 of
the tips was, consider everything as a table.
In Postgres, you can collapse all columns into a single record
type and insert.
For example, in JSON form right now, I would insert it and I
would have a shadow table.
I implemented this several times.
You have a shadow table which serves as like a guy for all tables,
which has, has trigger to track changes.
Right.
And then you, you just track timestamp action, like it was delete
or update, who did it, and then whole row you pack it in a single
column, breaking first normal form.
Lukas: But then you normalize in a very bad way, and secondly,
this sounds like a lot of work.
Nikolay: It's not a lot of work, it's like 3, 5 lines of code,
that's it.
But it's not possible in jOOQ 3
version because triggers are
not supported, right?
Triggers are supported only in
Enterprise version.
Lukas: I mean, if you want to write
the trigger with jOOQ, yes,
then you have to use the commercial
additions, but hardly anyone
writes triggers with jOOQ.
I mean, this is only useful when
you have to have, when you support
5 database products and you want
to have, you want to write the
trigger only once you write it
with jOOQ.
But I mean, jOOQ doesn't care if
there's a trigger on the database
table, right?
So if there's 1, so it's transparent
to jOOQ.
Yeah, okay.
You can't really say jOOQ doesn't
support them.
It's just, you can't create them
with jOOQ.
Nikolay: Great answer.
Free version, you mean like an
Lukas: enterprise version?
Free version, yeah.
Nikolay: Yeah.
I'm very curious.
I'm not against this.
It's good.
I mean, it's great that you build
both like open source product
on Apache tool license and business.
I was just curious, like, have
you ever had thoughts about going
full closed source or fully open
source?
Well,
Lukas: it was fully open source
in the beginning.
So when I started this, I had no
idea this could be a business.
I mean, I just had to try it and
then put it out there for people
to see and give me feedback.
Is this a good idea?
Should I do it like that?
And there was a very active early
community of people who worked
with the all open source jOOQ.
And then I started seeing even
more and more banks and insurance
companies using it, but never contributing
as you know, the story.
And I thought it was my fault.
Nikolay: I get it.
Pong Jones, let's, let's generalize
Oracle users.
Lukas: Well, that was the choice
I made, but, but it doesn't
matter.
You know, It doesn't matter what
they're using in the end.
I just never got any...
I mean, I got contributions, but
I didn't care about code.
I cared about money much more than
code.
So, to make them a sustainable
business, in the end, I thought...
There are models where you pay
for support, but in my opinion,
if you want to make money with
support, then you have to earn
your money by working for it.
So I earn money when I work, and
if I have a licensed model,
then I earn money when customers
work, right?
So I thought this was going to
scale much better.
Michael: I like that a lot.
I also think if you charge for
support, you're incentivized to
hurt the experience.
Lukas: And I think, but you're
going to say that, but that's
how I think.
Michael: But I think, I think it's
not surprising that as somebody
that focuses so much on developer
experience and wants to keep
pushing that forward and making
that good, it's unsurprising
that you don't want to be incentivized
to make it bad like that.
It makes perfect sense to me.
Lukas: Yeah.
I mean, that's a, that's my thought
as well.
I, I don't want to name companies
or names, but I've heard some
open source products are very,
very unusable and you really need
the support and not all of them.
Of course.
I mean, Postgres and Linux are
examples, which aren't, which
this isn't the case, but there
are some products that are super
complicated and they're just pushed
out there and you're free
to use it, but you won't figure
it out.
So you have to call someone even
to install it and that's what
I'm thinking.
So in those cases, you need a sales
representative that pushes
the product down someone's throat
and they make a lot of money
with the support contract.
And I think it's less honest than
licensing, which people think
about, okay, is this a good product?
Should I really spend money on
it because I have to pay it up
front right from the day 1 I have
to pay money to use this product
and they're gonna test it really
really really well and there's
a lot of competition so I mean
people could use hibernate or
whatever but people choose juke
despite the cost which in the
end is very cheap for a company.
So I think cost is never the issue,
but, they're still going
to evaluate it because they have
to still go through purchasing
and make it a business case for,
for, for the purchase.
So I really want to have a very,
very good experience.
And 1 part of this is I also always
answered all the questions
on Stack Overflow within a short
period of time, because if I
answer it, it's gonna be authoritative
and I'm gonna answer with
all the details and all the caveats
and all the edge cases.
So anyone else who finds the question,
we'll find this question
and the answer.
And they're not going to ask me
again.
Right.
So Google is my friend and now
Chat GPT is my friend as well,
because Chat GPT can answer jOOQ
questions very well because
of all the answers I've given on
Stack Overflow.
Nikolay: Yeah, but open knowledge,
right?
Lukas: Yeah.
I don't want to earn money with
knowledge.
I mean, I could sell books and
all these kind of things, but
it's just the wrong kind of effort,
I think.
Nikolay: Destruction, right?
Yeah.
From the core.
I wonder for enterprise, like paid
versions, is source available
or it's like kind of
Lukas: yeah you get the source
code
Nikolay: well that's great
Lukas: the right to modify so you
can if you have a bargain and
Nikolay: you can fix it yourself
and propose a fix.
Lukas: Or maybe it's not even a
bug, but you think it's a bug,
but you disagree, you can fix it.
And some companies do that.
So this
Nikolay: is what we like in commercial
software, like full transparency
and see how it works
Lukas: inside.
I mean, most people don't actually
fix stuff, but maybe they
want to debug stuff to better understand
it.
And if you've ever worked with
a commercial JDBC driver, like
I do, with the Oracle driver, it's
always like, Oh my God, there's
a null pointer exception in the
middle of the driver and I have
no idea what I'm doing wrong from
an API perspective.
So I have to reverse engineer it.
Not, not with the code, but just
try an error with API calls
until I find out why the exception,
why the bug inside of the
driver happens.
And with source code, it would
be much easier, right?
So bugs happen, but if you know
exactly why the bug happens,
then you have a much easier life
than if you have to guess.
Nikolay: Yeah.
Last question from me.
What do you think about nulls in
general?
Lukas: Nulls?
Nulls, yeah.
Or they're a blessing in a curse.
I mean, I understand why they have
been introduced, because it
was easy to do, right?
And I mean, There are 2 takes here.
So nulls in most languages are
mostly a pain because they lead
to exceptions.
And null in SQL is mostly a pain
because it doesn't lead to exceptions.
It's just a value like many others,
but it's a special value
and it just propagates.
It's these kind of philosophies
and both are right and wrong.
But what else would you want to
do, right?
I mean, you kind of need an absent
value, right?
I'm not sure if the idea of the
unknown value is so useful, but
an absent value is very useful.
I mean, sometimes you just, you
can't normalize everything to
6 normal form to avoid NULLs, right?
So it's very pragmatic.
Nikolay: It's very pragmatic indeed.
Yeah.
But in SQL, we have so many places
where people make mistakes
all the time.
Lukas: Yeah, SQL is weird about
it.
I mean, this unknown value, I used
to do a SQL training and it
was about 30 minutes about it was
just NULLs and how it's different
here from there.
For instance, 2 NULLs are not distinct,
right?
Nikolay: Yes.
Lukas: But they're not the same
either.
So there are different terms in
the SQL standard, same and distinctness.
And then even operators are inconsistent.
So personally, I think Oracle did
the pragmatic thing when they
concatenate a NULL value to a string,
you probably don't want
the result to be NULL, I think.
You would just want to ignore the
concatenation.
This is pragmatic, but it's very
unlogical and leads to a ton
of side effects, which probably
means that it was a bad idea
to start with.
Nikolay: And there are NULLs in
other languages, I mean, in regular
application languages.
And I remember how I struggled
trying to explain this 3 value
logic and like, observing what
people do in code, in regular
application code with NULLs.
They do things I would never do
like in SQL, right?
So different philosophies, right?
Lukas: Yeah, there are different
philosophies.
But I'm not sure if the SQL philosophy
is really useful.
I mean, I'm trying to explain how
it works by explaining NOT IN
in the NOT IN predicate.
When you have a NULL in NOT IN,
everything falls apart.
I mean, it's very logical and consistent.
You can explain it with 3 value
logic, but it's never useful,
right?
This kind of invalidates the whole predicate.
Nikolay: Yeah, it's a bomb.
Lukas: It's a bomb?
And I mean, usually people don't
put a NULL value in a NOT IN
list, but what about a subquery?
So NOT IN SELECT something, and
then you have a NULL value in
there, and you don't get results.
And it's not obvious from just
reading the code or the query.
there, and you don't get results.
Why?
It's horrible.
Horrible.
We
Nikolay: had the whole episode about no, so if Michael long ago,
yeah, yeah, let's stop because definitely it's not, it's not
going to go away.
And I, I'm quite sure I will make more mistakes with nulls in
my life.
Lukas: But maybe the 1 thing I really think SQL should have done
differently, and there are some database products that actually
do it differently, is the default data type should be not null,
right?
So when you create a table, When you don't write anything at
all, then it should be not null.
The fact that the default is nullable is wrong.
Nikolay: This forces you to define default.
Or, well, no.
Michael: The default word is confusing.
You mean, if I create id int8, I shouldn't have to say not
null.
Lukas: Yes, you should have to say I want this to be nullable,
because then it's a conscious decision and they know what you're
doing.
But the fact that maybe by accident you have something not null,
something nullable.
Yeah.
It's just wrong.
Nikolay: And this isn't standard, right?
Or if you remember?
Lukas: I don't remember.
I think maybe the standard says it's implementation specific.
Nikolay: If it's a standard, We definitely
Michael: can't change it now though.
Lukas: No, you can't.
But I remember Sybase is 1 of the database parts that does it
differently and ClickHouse as well.
Interesting.
Another 1 I forgot.
Nikolay: So in ClickHouse every column is by default not null?
Yes.
That's cool.
Okay.
Lukas: Although they messed it up as well.
So you can still insert null in a not null column.
You just get a 0.
What?
It makes it worse.
Nikolay: Okay.
Okay.
Lukas: That's terrible.
Amazing.
I think this is a performance idea.
You actually, if you just ignore
the check, you kind of don't
check for NULLs, then you're faster,
but whatever.
Michael: Lukas, I'm super conscious
of your time.
Is there any last things you wanted
to say?
Lukas: I hope there are many many
more exciting Postgres releases
in the near future.
I'm looking forward to all the
good stuff that's coming.
Especially if it's ever coming,
temporal tables from SQL 2011,
that will be very exciting.
Michael: Nice, okay cool.
Nikolay: Thank you for coming, I enjoyed.
Lukas: Thank you for having me.
Nikolay: Thank you.
Michael: Yeah, very much so, and catch you
next week Nikolay.
Nikolay: Bye bye, have a great week.