Talking Postgres is a podcast for developers who love Postgres. Formerly called Path To Citus Con, guests join Claire Giordano each month to discuss the human side of PostgreSQL, databases, and open source. With amazing guests such as Boriss Mejías, Melanie Plageman, Simon Willison, Floor Drees, and Andres Freund, Talking Postgres is guaranteed to get you thinking. Recorded live on Discord by the Postgres team at Microsoft, you can subscribe to our calendar to join us live on the parallel text chat (which is quite fun!): https://aka.ms/TalkingPostgres-cal
Why people care about PostGIS and Postgres
===
CLAIRE:
Welcome to Path to Citus Con, the podcast for developers who love Postgres, where we're going to talk about the human side of open source and databases and Postgres and the many PG extensions. This podcast is now available on all of your podcast platforms, so you can subscribe, and rate, and review, and you can get to past episodes and links to all the platforms at aka.ms/PathToCitusCon, all one word. The transcripts are included on the episode pages on Transistor too, you just click on the Transcripts tab. Thank you to the team at Microsoft for sponsoring this, all of these community conversations about Postgres. I'm Claire Giordano.
PINO: And I'm Pino de Candia.
Today's topic is "Why people care about PostGIS and Postgres." We have two amazing guests from the Postgres community. Our first guest is Paul Ramsey, joining us from British Columbia, Canada today. Paul is chair of the PostGIS Steering Committee and co-founder of the PostGIS Project. PostGIS, of course, is probably the most famous Postgres extension.
Welcome, Paul.
PAUL: Thanks for having me.
CLAIRE: And our second guest today is Regina Obe, joining us from the east coast of the United States, president of Paragon Corporation, a consultancy that does Postgres and PostGIS consulting. Regina is the author of several books, including "PostGIS in Action" and "pgRouting, A Practical Guide."
Welcome Regina.
And hopefully you're not on mute but if you are, hopefully you'll come off mute soon. We're really glad that you're here with us today.
We'll figure out what's going on with your audio in just a second. So PostGIS is not just the most famous Postgres extension, but arguably the most popular. For those of you who aren't familiar...
REGINA: Hey, I'm sorry about that.
CLAIRE: Yeah, you're here. Okay, good. Awesome. Yeah, let's get going. All right.
So those of you who aren't familiar, PostGIS, we keep pronouncing it different ways. We'll talk about that in a minute. But it's a geospatial extension to Postgres. It's open source. It's been around for 21 years, first released in 2001. And it adds support for storing and indexing and querying geospatial data in Postgres.
And it has integrations with all the third party tools you might expect. The usual suspects like QGIS, GeoServer, MapServer, ArcGIS, Tableau, etc. So, my first question is, am I pronouncing it right? Like, Paul, Regina, how do you pronounce PostGIS?
REGINA: I pronounce it "Post-jis."
PAUL: I pronounce it either way. There's no, there's no correct way. It's... I guess there's a more canonical pronunciation for PostgreSQL that people have more or less come to agree upon, although some people drop the QL. For PostGIS, I hear people use both and I'm happy for people to use both, it's certainly easier to say "post-jis" but if you say "post-jis", that means you're pronouncing G-I-S as "jis", which nobody in the real world would ever do.
No one would say that, say that "I'm a 'jis' professional." They would say I'm a "G-I-S" professional. And GIS is an industry term of art which not everybody knows the meaning of. It stands for "Geographic Information Systems". And it's been around as a term since I guess the... the mid eighties when it became possible to start to take mapping data and process it at a reasonable scale inside computers.
So PostGIS is actually kind of a double play on words, we throw the "post" in front so it sounds a little bit like Postgres and we put the GIS in the back and that sort of gives a clue to the fact that our way of doing GIS operations inside the database is kind of a new and quote unquote "more modern" way of going about it than the old desktop and workstation GIS programs that came before spatial databases.
CLAIRE: Well, so that's, I think you're doing exactly what we hoped for, which is laying the introductory groundwork about what these things are. So another question we have for you is, I mean, obviously many of us have heard that PostGIS is a geospatial extension to Postgres, but what does it mean? To be this geospatial extension to Postgres in terms that not just Postgres experts will understand, but maybe people who are new to this will understand.
REGINA: Yeah, I mean, the, the best thing I've heard about it is it's computer vision. So, for example, "PostGIS in Action" is listed as a computer vision book. And the reason for that is because normally when you want to go measure something, like you want to measure your backyard, you go out and you take your stick and you measure it, but if you model your whole world in a database, then you can just ask the database: "what's the length of my fence?"
PAUL: Yeah, it brings the real world into the database. I mean, that's the practical utility and the same practical utility as GIS has. There's a whole world out there. If you model it in your computer, you can have a digital version of that world that you can then ask questions about that might be hard to ask otherwise.
PINO: Paul, can I ask, what did people do before PostGIS? Did they use other databases or do this in the application layer or something else?
PAUL: It depended a lot on the software. At the time that PostGIS came out, this is 2001, there was one other mainstream spatial database out there, which was Oracle at the 8i.
They added a spatial type, the SDO_GEOMETRY type. And then, but it was not super widely used, like it was a brand new feature of Oracle. The only people who would kind of know about it would be like database experts who were using Oracle. Now, the people actually doing practical work in geospatial at the time were using workstation-based GIS systems, and they would end up having very local database systems.
So the database would be running on the workstation next to the GIS where the GIS would itself be manipulating the files. So it's sort of. Indicatively, sort of most popular GIS system at the time was a system called ARC/INFO, where ARC stood for the spatial side and INFO stood for the database side.
They used an embedded database called INFO, which has long since disappeared the successor to ARC. INFO was a program called ArcView, and ArcView used dbf files to hold the non spatial parts of the data sets. And dbf was the file format that FoxPro used at the time. So you get, it was kind of these, these very local versions of databases.
And that was fine, because mostly people were worried at that point with capturing the spatial side of the problem and asking the spatial side of the questions. And most GIS professionals were siloed in these special little GIS departments not interacting directly with the enterprise databases of the organizations they were working for.
PINO: I was just going to tack on another question, which is so PostGIS helped bring maybe democratize GIS for users and maybe at the same time other changes, you know, the Internet, all the websites that were created, created a lot more use cases for GIS. Would you say?
PAUL: What do you think, Regina?
REGINA: I would say so, because I think it's the first database that you could actually interrelate with relational. Because as Paul mentioned, Oracle was the only one. There was another, ArcSDE, which is Esri's companion to ArcInfo, was kind of gaining steam. So people were either divided using ArcSDE or they were using Oracle or they were using their desktop tool.
So I think PostGIS provided that sweet spot of being affordable and approachable to both the GIS people and also the relational people. So I was from the relational world. Not from GIS. GIS meant nothing to me when I first got introduced to PostGIS.
CLAIRE: So when Pino and I were preparing for this conversation, we came up with this long list of questions.
There's so many things we're curious about. But one of the things that I promised at least one person in the audience that we would cover is, it's kind of in the title of the episode. Like, why should people care about PostGIS? And what I'm hoping you can do is walk us through a few examples of different use cases of like how people use PostGIS in the real world or in their applications or to solve problems or whatever.
What are those different use cases?
PAUL: Yeah, so you've sort of... obviously it's been 20 years, so there's been a sort of a progression, a change in the story of how people used or why people would use PostGIS. And you guys have kind of touched on the first one already, which is the idea of democratization in the web.
In the early 2000 era, one of the big motivating drives behind people in the GIS world was to start to make their data more accessible. And the way to do that is to push it out on the web. Once you start looking at that problem, particularly in that that era. The tools to do it were sometimes really expensive. Certainly Oracle Spatial was a part of the Enterprise edition of Oracle.
Basically you're talking like a quarter million dollar ante to play that game.
REGINA: Yeah, and on top of that, Oracle Spatial was in addition to the $20,000 you were paying for Enterprise. You had to fork over another $40,000 for the Spatial part.
PAUL: There's a whole range of organizations for which that was like, not on.
And there were also at the same time, as PostGIS was coming up, a number of different open source technologies for doing web map publishing, and they naturally used PostGIS as one of the input data sources that you could map out. They'd also use like, GIS file format as data sources, but file format had serious drawbacks for any sort of web map that had any GIS kind of data.
It's much more difficult to update the live data in files than it was to have a database that has your data and update that database. It was far more reliable and far easier to build it into live workflows. So we sort of started off with these small organizations picking up.
CLAIRE: So wait, web map publishing, help me out.
Like, can you give me an example? What is web map publishing? Who was doing that?
REGINA: Yeah. So actually I got into PostGIS after I got into web map publishing, because the first thing I needed to do was publish city of Boston data on the web. And so I used map server for that. And I was using shapefiles that the mapping department gave me.
And so that actually came first. PostGIS just became an easier solution than trying to deal with DB, dBase files and shapefiles.
CLAIRE: Okay, so these are city departments, county departments. City planners, people like that.
PAUL: Yeah. And it's worth noting that, again, early 2000 era, who has large corpuses of spatial data, it's almost exclusively government but at all levels, so national, state, regional city.
They would have in the previous 15 to 20 years as part of the GIS revolution, taking what were previously paper maps and digitize them for computer storage and analysis. So they were the ones who had large quantities of data. Question is, how do you make that data visible to your constituents in the new web era? And that's where the web mapping revolution came from.
CLAIRE: Are there other use cases too, now?
PAUL: Yeah. As you start working your way from that early era to the present you cut across at 2007, the introduction of the iPhone, which is like a huge super, I guess it was introduced a year earlier anyways, regardless, the second version they released had a GPS chip in it. And this is, like, cataclysmic change because all of a sudden the idea that everyone is going to be walking around with a location sensor in their pocket really changes the idea of who's going to be gathering this data and who's going to be analyzing it and for what purpose.
The idea that governments are going to be the prime holders and users of geospatial data just completely disappears. All of a sudden you've got all kinds of folks, anyone building a modal app potentially has questions about that data.
CLAIRE: Okay, Paul, you cut out for a second. So anyone building a mobile app, dot, dot, dot, finish the sentence, please.
PAUL: Oh, is going to need to be able to store, manage, and query that data. It's not just going to be governments who are the sole users of geospatial technology anymore.
REGINA: Yeah. And to some extent that also caused the spring up of all this crowdsourcing, like OpenStreetMap, everyone's running around with a phone that they can now define their whole area with.
PAUL: Yeah. There's also a feedback loop in the web publishing. As you get web publishing technology, it becomes possible to put a base map, that is to say a map which has things like streets and mountains and water and that kind of contextual information up on the web, and then ask people to annotate that base map with things they know. And so that allows you to sort of cycle through and change the data gathering loop, the data gathering loop prior to web mapping and crowdsourcing really involved things like flying planes and taking pictures and then digitizing the features you see in the pictures off the picture or doing the same thing with satellites.
REGINA: Yeah, to some extent that's even gotten better with drones. So everyone can practically afford a drone now.
PAUL: Yeah, I haven't gotten away yet. It's gotten even more, particularly with AI being able to slurp features directly off the imagery. It's become even more important, that channel of data-gathering, but putting the maps on the web allowed you to open up a whole new channel, which is crowdsourcing.
And actually, if you look at the publications and articles from around the 2007 to 2010 period in the GIS world, the word crowdsourcing, or I can't remember the name. "Neogeography" was all based around this idea that you put out maps and get the world to annotate your maps. And that becomes a whole new source of information.
And people were just agog that this was possible. "Oh my God, people will provide mapping data. We don't need to go out and gather it in a sort of primary source way anymore."
CLAIRE: I think that Aaron just dropped a note in the chat, but it's worth calling out that this crowdsourcing, this feedback loop with respect to maps has been super useful in disaster situations, right?
Getting people who are out there helping with the rescue to be documenting and capturing what's happened, right? What's been demolished or what's changed or what's not there where they're finding people. Is that right?
PAUL: Yes, 100% sure. It's a project called Ushahidi, which I'm not sure if it's still extant that really leverage SMS for third-world crisis management.
And there's a whole offshoot of the OpenStreetMap project called the Humanitarian OpenStreetMap. Who are very, very good when disaster strikes at getting super recent satellite and aerial imagery from the major satellite providers, putting it online, and then getting a group of people to digitize both the previous state and then the current state of the area that's constructed by disaster.
So they can quickly do things like figure out where the worst hit areas are and things like that.
PINO: So I hadn't heard about OpenStreetMap before. Maybe Regina, could you say a little bit more about that? I'm curious about how does that get organized? Where does the data go?
REGINA: Yeah. Yeah. So OpenStreetMap is an interesting project because they actually started on MySQL and then they started having performance problems with MySQL and they were also using PostGIS for their tile servers.
So they decided to consolidate on Postgres. So most of their editing stuff is just using raw Postgres and still the tile servers use PostGIS but that's kind of how it started. It started as a crowdsourcing, you know, rather than pay Google millions of dollars to get base maps. Why don't we just build our own?
And it started in UK, I think.
CLAIRE: I mean, small, small segue here for a second. I think my very first talk at a Postgres conference, I needed a map. And so I used Google Maps for my screenshot, and I put it in the talk, and I think at least five or six people came up to me afterwards and said, "Yeah, you should probably use OpenStreetMap for that, right?"
So that you're using a map that is built on top of Postgres. That'll resonate better in the Postgres community. So anyway, now all my map screenshots are from OpenStreetMap. So Regina, have we missed any of the examples of use cases? Talked about web map publishing, we talked about how government entities use it for their planning and understanding of the territories and the cities and stuff. Talked about crowdsourcing, humanitarian crowdsourcing in the case of disaster. What are other use cases?
REGINA: Okay, so I think a lot of the really interesting use cases for it are non-visual. So for example, I have clients in insurance. So they use this information to basically price their portfolios based on locations, like: what's the likeliness of hurricane, what's the likeliness of fire, and all that stuff, to come up with the deductibles and everything.
And then a very popular area is real estate. So right now, most governments provide their parcel data, which defines the footprints of all the land parcels they have. And also, they often provide the building footprints. And so real estate developers, I have a couple of those as clients as well, use that to assess what are good plots of land to build new houses based on whatever the divisional restrictions are on rules.
So they'll scan the whole list of parcels based on area and proximity of things. And they can assess, "oh, I can group these parcels together to, and build this building which I can sell for X." And I have to pay the person who currently owes it X so they can determine what their profit margin is.
CLAIRE: I remember, in Paul's keynote back in Lisbon he had some example that had to do with Starbucks. And I don't remember the specific example that he used right now, but I'm thinking about retailers. Like, if I were a retailer, would I be using something like PostGIS to help determine the location of my next coffee shop or my next store?
REGINA: Yeah, but you'd probably also be using pgRouting, which I should throw that out because not only do you care about the proximity of your Starbucks, but you want to know how easily they are accessible from streets and train stations and so forth and how long it would take someone to get there.
And you also want to do something. You don't want two Starbucks too close to each other. So you'd want to apply traveling distance calculations to that.
CLAIRE: All right, let's do a quick definition for people in the audience who've never heard of pgRouting. What, what is it?
REGINA: Okay, so pgRouting is an extension that basically packages a whole bunch of algorithms for routing, for network routing, and even though it doesn't use PostGIS for those computations that uses PostGIS to derive networks from spatial data, which is the most common use of pgRouting.
So you take OpenStreetMap data or you know, you take train lines, train stations, bus stations, and then you define a route, and then you assign costs to how far, what's the cost to go from this station to this station, which stations aren't accessible from each other.
And from that you get the true distance that people have to travel. Because, like, if you look at Boston, even though a place might be one mile away, it might take me 60 minutes, just because of how the city is planned out.
PINO: You talked about some use cases earlier that made me wonder if there are other tools like pgRouting, certainly one of them I was going to ask about, but when you attach video images or temporal data, current conditions, what does that involve?
REGINA: When you attach temporal data... So there's another extension called MobilityDB, which actually tracks things in motion. And one of the cool things about it is you could take, like, let's say you have GPS coordinates and travel, you can actually determine how fast you were traveling at this portion of your trip, whether you've accelerated and so forth, which is the single, you know, apply the speed.
CLAIRE: So shout out to the team at MobilityDB. They're actually, it started as a research project, I believe at the Université libre de Bruxelles and it works with Citus, so that's how I know the team, right? They did a bunch of work to make the MobilityDB extension work, not only with PostGIS, obviously, but also with Citus.
And so I got to work with Mahmoud Sakr on a blog post that explains how to use all three of those things together. And anyway, so I'm a fan of their project.
REGINA: And they've also contributed a whole bunch of functions to PostGIS.
PINO: Does PostGIS itself include support for things like, if there are images of the street, not necessarily video, but images keeping track of what's north, what's south? What's direction and images that are attached to geo coordinates?
REGINA: Well, sort of. So that within PostGIS, there's another extension called PostGIS raster.
So you can geo-reference a image. You can load it in and load it as a PostGIS raster format. And then if you geo-reference it, you say the beginning of this point corresponds to this long lat. Then you have information that you can then you use to generate vector data. But you can also determine things like if you're measuring population and so forth, you could store that as imagery too, so I think, like, Facebook, for example, stores their population data as raster.
So you don't really think of images as being matrices, but they really are. And so it's a consolidated way of storing matrix information that you can then intersect with your other data, you know, with your relational data is say, "oh, what's the average population in this area?" Or "what would be the temperature generally at this time? at this time of the year,"
PINO: Temperature, climate too. Sure.
CLAIRE: Yeah. Okay. So now that Paul is back, I just want to, you can test out your voice here live in front of everyone, or you can hop onto the private voice channel for a moment with Aaron and he'll join you there. And you can test there. What do you want to do Paul?
PAUL: Well, I think I'm just going to answer your question and see if that works.
CLAIRE: Yeah, that works. Oh yeah, that's better. You're all good. Okay. I'm still reeling from the definition of pgRouting. I just assumed it had something to do with, you know, computer routing and didn't realize it was being used for spatial data.
And when you wanted me to call out the name of your book on pgRouting, I was like, well, why does Regina want me to do that? Like, that doesn't have anything to do with PostGIS, and I was wrong.
REGINA: Yeah. And I'm working on a second edition, which I'm hoping I would be done with this year, but I've been saying that since last year.
CLAIRE: I'll try to get links to both of your books dropped into the chat for people who want to learn more. How do you do that? How do you, how do you do your day job and find time to write a book?
REGINA: How do I do my day job? I don't know. I don't have a day job. I guess that's the answer. Cause I don't really. I work at very odd hours, aside from when I have to go to meetings during the day, but normally I think I do my best work like 2 a. m. in the morning when I'm...
CLAIRE: Person after my own heart.
REGINA: Yeah. Well, the concept of having a "day" versus "not-day", it doesn't make sense to me.
CLAIRE: Okay. So we're still fishing for more use cases, understanding like more applications for PostGIS.
Are there others we haven't talked about yet?
REGINA: Well, there's one that Leo likes using, which has nothing to do with spatial, but I think he uses it less now that now that we have ranges, wait, not ranges Multiranges. So his old use case for PostGIS was to model time, because he had to do, yeah, he had to do health analysis.
So he had to determine like how long someone has stayed in the hospital, and he needed to store that as a single record.
CLAIRE: What does that have to do, what does time have to do with PostGIS?
REGINA: Because you can model time as a line string.
CLAIRE: Okay.
REGINA: And so you can store that as a single record and do all sorts of analysis on it. You can have like the whole history of somebody's health in one column.
PINO: That's amazing. Do you find a lot of these use cases that are very creative? Maybe for me, I'm new at this, but. Did that surprise you when you first heard of it?
REGINA: Yeah, Leo tends to come up with creative things.
CLAIRE: Wait, who is it that we're talking about?
REGINA: Oh, my husband, so.
CLAIRE: Who works in the same space?
REGINA: Well, he doesn't, he's more very relational-focused, more so than spatial-focused.
CLAIRE: Got it.
REGINA: But he likes the algorithms he sees in PostGIS. And he's also used pgRouting for decision analysis. So, for example, pgRouting, even though you think of it mostly for road network for road routing, you can apply it to anything that's graph-based.
So, in the case of, so we do a lot of work with labor unions. So they always have a need where they need to model grievance, so whenever a employee has a grievance, there's a whole bunch of steps that you have to go through, and then if you miss this one, then you have to go down that route to solve it, so he uses pgRouting to come up with these decision matrices.
That dictate, like, if you're here, you need to go here afterward.
PAUL: Yep. It's important to bear in mind, geospatial is really a specialization of a set of tools, and all PostGIS is providing at its core are a bunch of tools for handling data in cartesian space, up to four dimensions of cartesian space and tools that understand that problem.
And similarly, pgRouting, while it's mostly used for roads and so on, is again like providing tools that can do graph solving. So if you can state your problem as a graph problem, then you can solve it using pgRouting. On the Cartesian side, I've seen people use PostGIS for CAD/CAM, for computer-assisted design, computer-assisted manufacturing where they store the parts in the database and then use SQL to manipulate the parts and do cutouts and things like that using the same spatial tools that land managers use to handle ownership parcels, but it's all just cartesian math.
CLAIRE: And I imagine that in the energy space, so oil and gas, I imagine PostGIS is there as well. Is that right or not?
PAUL: Yeah, third dimension going down volumetric problems. What's an oil body? Well, it's a volumetric shape. So people definitely do three space modeling below the surface.
REGINA: In fact, the oil and gas industry kind of invented the whole spatial reference, right? Because if you think about it, I forgot which one.
PAUL: Yeah, well, the oil and gas industry was, I guess, as an international industry that was interested in where things are ended up generating the first and currently really the most widely used consolidated database of map projections because they could be operating anywhere. They'd be operating in Azerbaijan. They could be operating in the Gulf of Mexico and all of those areas have their own unique map projections. So the European Petroleum Survey Group is the owner of the big database of map projections and you'll find EPSG written into a whole bunch of places in the PostGIS documentation because to understand map projections and map reprojections, you kind of have to understand how the data lives in the EPSG database.
REGINA: Yeah, and so basically all that is just to say that even though the world is round, we need to model it as flat. And so we need to constantly be switching which projections we're using to model it for whatever purpose in whatever area we're trying to model.
CLAIRE: So if we go back to the title of this episode, why people care about PostGIS and Postgres we've talked about the different use cases and applications and probably haven't hit all of them, but hit a lot of them.
Why do people care about it? Like why, when you mentioned PostGIS in the Postgres world, at least there's an affinity that people are supportive of the project. Even if they don't work on it. Why do people care?
PAUL: People in the Postgres community, like the developer community, care about it because we've been around as a major user of the Postgres core since 2001, so pre, before the extension system was even invented, and we have a surprisingly large user community.
In terms of, like, volume of data, the people using Postgres for big web apps and log tracking and, so on, probably have the most data in terms of like number of installations, like individual Postgres clusters, I would not be surprised to find that the geospatial world. has the most clusters because every small organization that has GIS data to store would stand up a Postgres database, stick their geospatial data in it, and then tell people with a perfectly straight face that they manage their data in PostGIS.
They wouldn't even mention the Postgres part because from a geospatial point of view, it kind of goes along for the ride. So we've been around as a big, complex extension in the Postgres community since 2001, and we've always been the testing ground for big extensions. Until timescale showed up there was nothing even close in terms of the level of complexity and how deep a third-party extension would hook into the core functionality of the database. And there's been all sorts of improvements that have been made to core as a result of questions or problems we had. I don't know if I want to enumerate them because they're really super obscure, but they've allowed us to do things like have our planner work really well with spatial data, have our functions be able to hook in to those core planning abilities, our ability to cost functions because our functions tend to be really, really CPU costly in a way that Postgres functions aren't.
All that stuff has gone into core since we existed, has been really immensely helpful in making our project better.
REGINA: Yeah, I think a lot of the impetus for improving GIST is also PostGIS. Yeah. Because, oh, didn't Refractions subsidize some of the GIST improvements?
PAUL: Yeah, so I mean, the, the spatial indexes are built on top of the GIST API, index API in Postgres.
And things like the ability to have GIST tied into the Write Ahead Log. So you could do crash recovery and things like that were funded by the PostGIS community. PostGIS was the first community, what's the word, the community funded support project they did. So my company funded part of it. And then another half dozen companies around the world also chipped in money so that we could pay the developers of the GIS functionality for their work in making GIST indexes crash recoverable. Yeah. That's way back in like 2005.
REGINA: Yeah, and that's improved so many things besides spatial because you think about the JSONB and support and the hstore and even the MultiRange and everything.
So much depends on GIST these days.
CLAIRE: Awesome.
REGINA: So I think to Paul's point the reason why PostGIS is important is because it demonstrated, it showed the world how extensible Postgres is. And I think that just kind of led to other organizations using it and building on top of it. And then on to the other side, in terms of the fact that so many governments were building on Postgres and exposing their data that way, for example, OpenStreetMap a lot of the backups were kind of designed to be loaded into Postgres with a tool called osm2pgsql. So that, I think, increased the adoption by a lot of people, because if you needed any spatial data at all, Postgres was the simplest database to get spatial data into. So you would choose it over MySQL because as soon as you say, "Oh, I need this," then MySQL was out of the race.
CLAIRE: So you both have been involved in PostGIS from the beginning. Is that right?
REGINA: I was probably a couple of months into it.
PAUL: Yeah, three or four months.
REGINA: Yeah.
CLAIRE: What do you think, Pino? Should we ask the birth story? Do we want to know where PostGIS started?
PINO: Oh, yes. Let's do that. Let's go there.
PAUL: Okay. So 2001. May of 2001, at that point I was running a small consulting company here in Victoria, working for the provincial government, mostly working in the geospatial field.
Because all of our contracts came from the provincial government, we were very tied to the cycle of when they would release and sign contracts and the government changed that year. So all the civil servants were very risk averse in terms of signing new contract contracts. They didn't know whether what their budget situation was going to be that year.
So we ended up having like three months where we got almost no direct revenue. We, you know, didn't all go on vacation for three months. We were kind of young, excited about the field, so we thought, you know, let's see if we can build something to store the spatial data in a database. And we had used Postgres for one of our data processing projects the year previous, so we were kind of familiar with it.
And that experimental work in the spring of 2001 ended up being released at the end of May, 2001 as PostGIS version 0.1. And, you know, in the ordinary course events, that sort of would have been: we released it and nothing happened, but amazingly, we released it and, like, this almost incredible cavalcade of people showed up and said, "Hey, that's exactly what we wanted" because there was just this missing piece in the open source web mapping field and the open source geospatial world that everyone was really, really hungering for.
So we just happened to show up with the right piece at the right time, and we got people like Regina showing up, we got folks from county governments in Georgia showing up, and we got the project leaders of other open source geospatial pieces like MapServer and GeoServer and the Google library showing up within the first three, four months and that, community was kind of what made it fun from then on.
And it was like, Oh, now you're not just performing. Like, once you're performing for an audience, it's just way more exciting. Each release is way more exciting because all people will be so happy and I can't wait to see who else shows up and wants to use it. And that sort of cycle of enthusiasm was what kept me interested in and involved in the project really for the next 20 years.
PINO: How did you get, I'm just going to tack on a question there. In terms of a truly open project taking contribution versus the initial single company contributed code. How did that evolve?
PAUL: We started off yeah, as a project Refractions, the first website was PostGIS at refractions.net. And it stayed that way for quite a lot while.
In terms of getting contributions, part of it was, it did kind of end up acting a lot like just sort of a classic company-led project in that, we had always, one person on staff who could do core work and we devoted half their time to doing core core work. And we just paid for that. We got some external contributions, but not a lot.
We ended up turning one of the external contributors into a contractor because he was so good and he's still part of the community now. A guy named Sandro Santilli. It remained kind of a company project until 2008, 2009. The open source geospatial community in general set up a foundation to try and provide a neutral home for all these various projects.
It's called the Open Source Geospatial Foundation, osgeo.org. And at that point I had left the consulting company, the people, the owners who were left with it didn't want to invest a great deal into it anymore. Although they still kept a sideline in the business. So they were over time convinced to release the code to OSGeo.
So OSGeo became the holder that led us to governance and OSGeo suggested that all its projects use an Apache-style governance format. That's when we went from kind of a benevolent dictator situation where I was the benevolent dictator to a project steering committee set up very much model on the Apache Foundation model.
CLAIRE: And both you and Regina are on the PostGIS Steering Committee today. Is that correct?
REGINA: Yeah. And so is Sandro. So Sandro, the 3 of us have been, I think, the longest lasting.
PAUL: Mm hmm. I was actually surprised to find out how early Sandro showed up too. He was like within the first year too.
REGINA: Yeah, I think he was before me because he accepted my first patch.
And I remember I thought he was a jerk.
CLAIRE: What?
REGINA: I thought Sandro was a real jerk. He just...kept on scrutinizing everything, like my indentation was wrong, my line breaks were wrong, I should have done this.
CLAIRE: Okay, so just in case Sandro's listening to this podcast after the fact, you don't think he's a jerk today, right?
REGINA: No, no, no, we're the best of friends.
PAUL: Yeah, no, I mean projects require all sorts of people and Sandro is very detail-oriented. One of the reasons we have a really fabulous and deep regression testing system is because he plunked it in there early on when he was a contractor. I was just asking him, so what should we do next, Sandro?
And he said, well, we should do regression testing. I said, I don't really know what that is, but okay, go ahead and do it. And now we have this incredibly deep regression test suite so it feels very safe to make changes to the code base. Like, I rewrote whole chunks of the serialization and felt safe doing it because of all the work that Sandro had done putting that regression test sweep in place.
REGINA: Yeah. So all the, all the upgrading and everything, that's all Sandro's great work. Otherwise we'd have real pains.
CLAIRE: Okay. So you both have basically been working on PostGIS for over 20 years now.
How does that... when you get up in the morning and you're brushing your teeth and you're, or actually I recognize that it may not be that you're getting up in the morning and brushing your teeth before you go to work because Regina already explained, she does her best work at 2 in the morning, but my point is does your work still pass the toothbrush test?
Are you are you looking forward to it still after 20 years?
REGINA: Yeah. Yeah, I am. I think part of it is just the people I work with. I mean, I like Paul and Sandro a lot. And Darafei who's kind of well, he's been busy with other things, but yeah, I enjoy the company. And also I still find the topic very interesting.
I still think, wow, you can take a binary blob of stuff and you can apply functions to it and answer questions is really cool.
CLAIRE: Paul, what about you?
PAUL: I find it comes and goes. It depends a lot on whether I have a project, like a particular piece that I'm working on, which I've allowed to sort of get deep enough into my brain that it's a flow state thing.
So then I'm showing up in the morning going, okay, I'm going to get back into the flow and I'm going to be working on this and I'm going to feel that. That's the part that made me quit running a consulting company and go back to being a programmer, is that mental feeling. So to the extent that I'm getting that on a regular basis, then yeah, it's a serious good toothbrush time.
When I'm out of flow state, I'm just sort of forcing myself to find things or doing small bugs or even forcing myself to write a talk, which is the hardest part of doing a talk, the writing of it. And then it's kind of... then it's kind of harder to get up in the morning, but I get rewarded two or three times a year with you know, these big sort of month-long projects where I really put my head down and get into that mental state and then I go, "oh, yeah, that's that's why, that's why I do this."
PINO: Can you, since this is a little bit of a segue from what you just said, what are the challenging problems coming up for the next year or two? Do you have already have a view into that?
PAUL: Yeah, well, my big project right now is tying s2 into Postgres so that we have native access to some of the geodetic algorithms that are in that library.
It's a very big library. It's very C++-y, so a lot of it involves this translation layer between the model, the geometric model that PostGIS has and the geometric model that s2 has, which aren't quite the same enough to make it really, really trivial. So I've been enjoying that when I've had the time to get my head into it.
My other one I've not really started on, but as Regina mentioned, we have this raster subsystem, which was built in, I don't know, 2010 thereabouts. It was built in an era sort of pre, before everything was in the cloud, and I feel like we need a new upgraded raster-handling facility that thinks about raster objects as cloud objects first rather than as sort of local block objects.
And I keep coming back to that cause it feels like something that has an immense amount of utility what requires a new way of going at the problem.
PINO: Do you see any opportunities for the new AI advances? So either whether LLMs or--large language models-- or vector databases to advance some things that were already viewed as completed in PostGIS?
PAUL: There's some stuff, there's some places where that stuff really blows my mind in the geospatial world, but not directly tied to the actual spatial database or database side geocoding is kind of a really hard problem and it kind of blew my mind how good large language models are at standardizing addresses.
You can say, "here's the standard address format that the US Postal Service uses. Here's some random ugly ways that people type addresses in. Can you standardize them for us?" And we have an address standardizer in Postgres that's built on regex and it's pretty okay for the United States. But the Large Language Models just do the most crazy stuff and do it as well as a person would, which is all you can ask.
But that's not really a database op, it's just like a really super common use case op that the large language models just sort of chew up without any trouble whatsoever and no special programming either. And so that sort of boggled my mind. The other side of it, things like feature extraction and so on, is definitely a huge...place where AI is going to keep on changing and increasing the pace of data collection in the geospatial world, but it's usually built as an external, not-in-the-database thing. So it'll just result in, like, more data flowing down the pipeline to the database, but not necessarily a change in the database.
I don't think, what do you think Regina?
REGINA: You know, I've been trying not to think about it because I think everyone is thinking too much about it. I don't like thinking about things that everyone's thinking about.
CLAIRE: Well, actually that touches on my question. I mean LLMs, everyone is talking about it. And I feel like every developer out there is having to think about and decide how much of my time should I spend spinning up on this?
Trying to understand how LLMs are going to change my world, are going to change how I do what I do, are going to change my roadmap. How do you figure that out? Is it a fad or is it a fundamental shift?
REGINA: Yeah, I think it's too hard to tell when something is a fad and when it's a fundamental shift.
So I try not to bother until it's demonstrated that it's here, because one of the things that my brother always told me, you know, I'm always jumping into new stuff. He claims that he invented microservices 30 years ago. And so he said, don't jump on all this new stuff. It's just going to cloud your mind and waste your time. Just let everyone else waste their time before you bother with it.
CLAIRE: What do you think, Paul?
PAUL: Technological conservative. I think, is an IDE tech? Great technological shift? I mean, in terms of the capabilities they provide, if you're a programmer, practicing programmer, and you haven't at least tried it out on a few problems that you have, then you're doing yourself a disservice.
Like, you gotta at least try it and give yourself a chance to understand what it can do for you before you reject it. I've already been converted into at least understanding the utility of it for my day-to-day work.
REGINA: Are you using Visual Code yet?
PAUL: I'm using it for places where the project and the tooling help me.
So when I'm doing work in Go, I'm doing it in VS Code because the tooling works great and fits in with my projects and it makes me more effective and efficient. And I will definitely be asking chatGPT for coding help when I get back and start doing more writing of Go and I've used it for debugging.
I've used it for all sorts of stuff. And it's, you know, it's not replacing me, but it's saving a lot of my time.
CLAIRE: So wait, how is the tooling helping you? Can you give me a specific example in VS Code with Go?
PAUL: Oh, no, because I haven't actually used it. But in general, Python scripts for doing database or for doing data manipulations.
Just saying, "Hey, ChatGPT. I need a Python script that does this, this, this, this, and this." And it pipes out something which runs right away. And then it's really, really fast to get it to hone in and improve the Python, like it spits out really kind of archaic Python, but you can tell it, "no, do it this way", "no, do it this way," and soon enough you have a very attractive Python script, and then I've had some C bugs, like, really kind of obscure looking language, and say, "Hey, ChatGPT, what are the five most common things that spit out this particular error message?" And, oh, sure enough, you know, item number three was the mistake I'd made.
REGINA: Yeah, I mean, I think I've used VS Code longer than Paul. So even though I'm a conservative...
PAUL: I'm also conservative in some ways.
CLAIRE: We're not name calling, no judgment, no labels here. Just trying to understand your perspective.
REGINA: Yeah. From my perspective, I think that integration with Git is really good. So I can mouse over something and quickly say, "Oh, it's Paul's fault."
PAUL: It absolutely is.
CLAIRE: Is it usually Paul's fault?
PAUL: I'm the cowboy. We all have our roles, right? So Sandro is the detail-oriented one, and I am the cowboy. What would you say your role is, Regina?
REGINA: I think I'm the one that keeps peace.
PAUL: The middle child.
REGINA: Yeah, I mean, I make mistakes too. Sandro, he doesn't really make mistakes.
He just creates really complicated tests that take our CIs five hours to run. So the only thing I have to do with him is like kill some of his tests.
CLAIRE: So if anybody listening to this knows about PostGIS and has always been curious about like how to get started, how to kick the tires, how to figure out if this is an area they want to work in, how would you advise them?
How does someone get started in this space?
REGINA: I think the same advice as I'd give with Postgres is just to look at the code. And see what parts interest you of it. And of course, figure out how to compile it.
PAUL: Yeah, it's not, becoming a developer is not necessarily an easy thing. Obviously the first step is going to be, can you build your own copy?
Can you set up a development environment that works for you? Having a real problem of your own is like super important from a motivation point of view. The old "itch to scratch" motivation is really core. For people who are coming just, like, from the database side though, "I'm a database developer, I want to learn about this spatial thing."
Rather than pointing them, we have, on postgis.net, you'll find links to documentation, you'll find links to a nice step-by-step tutorial, which is a great introduction to a spatial SQL. But before even going to that, I tend to say for people who are totally fresh and totally new to the geospatial world, actually start with qgis.Org. QGIS is, in many respects, kind of an old school desktop GIS program, but because it's very visual, because it lets you see the maps and understand visually what you're doing it's a wonderful introduction to sort of the spatial gestalt before you get into writing SQL queries that use spatial functions.
So starting at qgis.org and doing their tutorial. And then, you know, as a database person coming over and doing our spatial SQL tutorial, I think is, like, the best path for someone who's coming from database world and wants to be geospatial database.
REGINA: Yeah, and we also have a PostGIS workshop, which we're beginning to translate and actually we've got a good percentage of translation in Chinese and Japanese, I think. So that's a good one to look at.
And personally, for me, since I'm more of a database person than a visual, I still like using pgAdmin and the geometry viewer in it. It provides me just the right level of being able to write queries and see it right away without having to jump full blown into some, into a spatial tool. But QGIS is a good tool. If you want the full immersion experience, then QGIS is the best. And I should add, QGIS exists because of PostGIS, because the developer QGIS.
PAUL: Wanted a viewer!
REGINA: Yeah, the initial developer of QGIS, Gary Sherman, he wanted a way to view his PostGIS data, so he created QGIS. And QGIS is now bigger than PostGIS.
PAUL: By a long shot, by a long shot. And it's an object lesson that everyone starts out writing a simple tool and then ends up writing a GIS. You just can't help it, you just keep adding and adding and adding, and before you know it, "oh, I accidentally wrote a GIS."
CLAIRE: wow.
PINO: And QGIS is the desktop program you mentioned a little bit earlier, right?
PAUL: Yep. Yes.
PINO: Okay. Okay.
CLAIRE: Yeah. I'm on their website now and I'm looking for their tutorials so I can drop it into the chat. What if it probably is under documentation? I don't know. I'll keep looking.
PAUL: Yeah, that's correct.
REGINA: Yeah. They also have, I think whenever they do a release, they have a visual release, which is kind of neat.
I think they have a video of all the key features that each release has.
CLAIRE: They have a "Learn GIS basics." "A gentle introduction to GIS." I wonder if that's what you're talking about.
PAUL: Is that it? That's the one. Yeah.
CLAIRE: Okay, cool. I'll drop it in the chat. So that's great advice about how to get started.
I love it. Especially love that there's a way to get started for people who are more visual. But also a way to get started for people who want to be just looking at the code and at the data and using SQL queries and looking at things from that angle.
REGINA: Yeah, and we have mailing lists, and we also have Matrix.
CLAIRE: The workshop that you mentioned, the PostGIS workshops, the ones that you're getting translated, what kind of workshops are those? Are you talking like in-person half a day, or in person three days, or is it all virtual, or what do you mean?
REGINA: It's all virtual, but it's based on in presence workshops that Paul gave. So it's, I don't know, how many hours do you think that is, Paul?
PAUL: If you, if you do it end to end, it'll take up a full day. You generally like end up doing like the first half for introductory workshops. And that's like three to four hours, depending on how much you stretch it out.
REGINA: Yeah, and I started adding up a topology and a raster section to it, which probably would take a half a day too...
CLAIRE: Now, before we started the recording today, we were talking a little bit about FOSS4G NA, which is North America. That's what the NA is. Version of geospatial event that's happening in October in Baltimore, or Balmer, as my husband's family used to say. In the local pronunciation back in the day, but you both are involved in that.
So just if somebody wants to learn more and they happen to be in Baltimore in October, what do you want to tell people about that thing?
REGINA: So there's going to be, you'll see quite a bit of Postgres use and PostGIS because I think most enterprise systems use PostGIS these days, but you also see quite a few talks on QGIS. Machine learning. So basically, if you want to be immersed in the whole open source geospatial world, that's the conference to go to. And there's another one that's, I forget when the next one's going. I think the next one's planned for Brazil, the international one. So there's usually... the Open Source Geospatial Foundation that Paul mentioned earlier, they're the ones that manage the FOSS4G, which is the international conference.
And that one, it rotates around the world each year, but the topic matter is more or less equivalent, except the North America one has a more North America focus.
CLAIRE: And you both are giving talks at FOSS4G this year. Is that right?
REGINA: Yes.
CLAIRE: So what's your talk about, Regina?
REGINA: So I am going to talk about growing the FOSS community.
I mean, in a project. Because I feel like I'm getting typecast talking about PostGIS, so I'm trying to get away from talking about them.
PAUL: No, keep talking about PostGIS. I don't know if we have the schedule. Do we have an introductory talk on PostGIS in the schedule?
REGINA: I'm not sure if it is, it's not me.
PAUL: I don't know. See, I wanted to back off and let someone else sort of pick up that thing rather than...
REGINA: I am doing the PostGIS workshop. So if you want a true immersion of PostGIS, the day before the conference is the PostGIS workshop, which is three hours.
PAUL: I'd like to point out one thing about FOSS4G, both the North American and international editions, which people in the PostGIS community might find surprising, which is this strange little niche community of open source geospatial people has gatherings that bring in between 600 and 1200 people each time which when you compare it to pretty much any Postgres conference, with the exception of EU, is a couple times larger.
But even our smaller regional conferences, like the FOSS4G North America, will be as large as the PGConf.EU event is. These are really quite substantial events and pretty much every attendee is a PostGIS user because PostGIS is so ubiquitous in our community. So it gives you an idea of like how big the PostGIS community is on a person by person basis.
CLAIRE: Which means that every attendee is a Postgres user.
PAUL: Correct.
REGINA: Yes. Okay, yeah, I always found it kind of odd how the two groups, I mean PostGIS people know they're using Postgres. So that's kind of known. But on the PGConf, they don't it because when I first started going to conferences, I actually decided to go to a PGConf once.
So I usually go to PGConf, the New York one. And it always shocked me that, oh, there are all these people that know what PostGIS is when in the other group, but then this group: PostGIS what? And so we would give a lot of talks and it was kind of lonely because here we were these kind of weird animals that talked about space and everyone was talking about relational.
CLAIRE: I went to a geospatial event in Ottawa years ago, must have been at least 10 years ago. And I met a man from Australia and I can never remember the name of this book. So I've got to ask you both now, maybe, you know, the name of a book and he recommended a book to me. And it was all about the concept of "where."
And he was trying to explain to me that this space that he chose to work in, he was spending his whole career in geospatial, was really interesting because it's so fundamental to human beings existence. And he's like, "look, anytime you meet someone, what's the first question you typically ask them?"
You know, it's, "where are you from? Where do you live?" You know, "where did you study?" "Where do you vacation?" "Where" is so fundamental. And so anyway, he was trying to persuade me how important a space it is. And I think it. It clearly is but do you know the name of that book? Some book about the concept of "where?"
PAUL: No, it does not ring any bells, but I mean, you're touching on what, to come back to your initial question, like, why should people care about PostGIS? And the answer is because it's very difficult to build a database that does not have some sort of spatial component to it, even if you're not being explicit about it. You will be modeling objects that exist in the real world and they will have location and you can either not model that location or you can model it, but if you do model it, PostGIS will help you do it and it will help you ask and answer questions that otherwise you'd be unable to ask and answer of your model.
REGINA: Yeah, I mean, one use case. I, I remember now in government, we did community notification. So it's this idea that, hey, you're building something here. You need to notify everyone within five miles. And so the way you did that is you take the voter information that you got from the voter, you know, the voter department, and you would geocode, then you'd have this rich piece of data that you could just mail out people and say, "Hey, we're developing, you know, this thing, come to our meeting."
And so, yeah, getting to the point of the "where," so where is the universal foreign key that most data has? And I think that's an important point, because when you think about relational, you always think about hard foreign keys that can be easily joined, but you can also use "where" as a join.
PINO: And since we touched on the subject of books, and we mentioned Regina's book earlier, any books for folks getting started on PostGIS and GIS, more generally?
REGINA: I can't think of any.
CLAIRE: Besides your books, Regina.
PAUL: Yeah. You're hard pressed not to pick up "PostGIS in Action". It is the best PostGIS introductory book that I've seen.
REGINA: I think Locate Press does have a lot of books.
PAUL: So if you look at the look at Locate Press and you'll also find a "QGIS in Action", which is a good starting point for that.
REGINA: They have a ton of of spatial books, a ton on QGIS. And then they have another one for theming. Well, I think that's a QGIS too, but then they also have a training. I think it's more for elementary and so forth. So I would recommend look at locatepress.com. They have a lot of good intro spatial books, especially on open source geospatial because that's their focus.
CLAIRE: Thank you. PGConf New York City is happening in October, early October in New York City and PGConf.EU is happening in December this year, which is unusual time of year for them. But that's another story. And it's happening in Prague this year. Are either of you going to either of those or giving talks on PostGIS at either of those events?
REGINA: No. Nope. I was rejected.
CLAIRE: From New York City?
REGINA: Yeah, but that's okay. I think I wanted to do a workshop and they said no. And I said, well, then I want to have three talks. So I'm not complaining. I got rejected.
CLAIRE: Well, all the best speakers all get rejected sometimes and you know that, so.
Yeah, happens to all of us. The reason I'm asking is just in case anybody listening is going to those Postgres events and wants to hear your talks. So sounds like FOSS4G is the place to go then.
PAUL: It's the place to go this year. It's going to be a good event. It's the first North American FOSS4G since 2019. So I expect it to be very well attended. The geospatial community tends to cluster. Again, this is because government has traditionally been the sole source of geospatial data. So one of the clusters is Washington, D.C. So it's going to be a very well attended event.
CLAIRE: And wait, when is that one?
PAUL: Oh boy. The exact date?
CLAIRE: So no, no, no. Okay.
REGINA: Yeah. He was talking cause all, all the government's going to is in DC. So it's going to be easy for government to attend it.
PAUL: October 23rd and 25th.
CLAIRE: Regina just read my mind. I was trying to figure out why Washington DC meant that the Baltimore event would be well attended. And obviously it's cause they're only, like, a half hour away from each other.
PAUL: Right. Yep.
REGINA: So different states. They have a FedGeoDay just before the event.
CLAIRE: Okay, so before we wrap up and say our thank yous and all of that, Pino, did you have other questions that we didn't get to?
PINO: I'm all out. This is fascinating. But we could go on for many hours.
CLAIRE: We want to thank you, Paul and Regina for joining us today and sharing some of the history of PostGIS and how you got involved and the other people involved as well in the birth story of this fabulous Postgres extension as well as the use cases.
I found the use cases part, especially enlightning to just try to envision how people are using this technology. And I know since I've worked on the Citus extension to Postgres we've always, like, looked up to PostGIS, right? You're the canonical success story in the extension world for this database.
So, thank you both for joining us. And we will be publishing this within a matter of days. Carol and Aaron. Carol Smith and Aaron Wislang, our two co producers, they've been in the background and they're fabulous. And so Carol will be getting this puppy up on all the podcast platforms. So you'll be able to share it with your friends and colleagues in the very near future.
I also want to let you know that the next episode is going to be on Wednesday, October 11th, also at 10 a. m. Pacific PDT, and our guests are going to be Andres Freund and Heikki, and I'm going to try to pronounce Heikki's last name, Linnakangas, but I'm not sure I got it right. And the topic for Andres and Heikki is going to be "How I got started as a developer and in Postgres."
And we'll drop, for those of you on the chat, we'll drop a calendar invite link that you can use to block your calendar if you want to join us again for the live recording in October.
PINO: Before we leave, just want to ask everyone a favor, especially if you've enjoyed this podcast, please, please, please rate us and review us on your favorite podcast platform.
It helps other folks find this new show.
CLAIRE: Thank you. Thanks, Regina. Thanks, Paul. Thanks Pino.
REGINA: Thanks for having us.
PINO: Great chatting with you all.