Hi. I'm David Keyes, and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.
David Keyes:Join me and learn how R can help you. Well, I'm delighted today to be joined by Kyle Walker. Kyle Walker is associate professor of geography and director of the Center for Urban Studies at Texas Christian University, and also a consultant who helps folks with geospatial stuff in general, a bunch of different things there. So, Kyle, thanks for joining me today. I really appreciate it.
Kyle Walker:Well, thanks for having me, David. I appreciate being on. I'm a big fan of the work that you're doing. And we learned we're fellow Oregonians. At least I'm a displaced Oregonian.
David Keyes:Yeah. I'm the opposite. I'm in some ways the more typical Oregonian in that I'm the one who moved here 10 years ago. Cool. Well, I'm excited to talk to you because you have developed several packages, but the one we're gonna talk about today is called tidycensus.
David Keyes:And we'll get into that a bit more in a few minutes. But just to give people an overview, tidycensus is a package that allows you to interface directly with the Census Bureau and get data directly from it, which has been a huge time saver for me and my work. But first, before we get into tidycensus, I wonder if you could give me an overview of your background, how you started using R, and how that fits with your background.
Kyle Walker:Yeah. That is a great question. And it's kind of a circuitous route, which is one reason that I'm particularly excited to be on a podcast entitled R for the Rest of Us. That's very much how I came to R. So, I'm from Oregon originally. I did my undergrad at the University of Oregon, studied geography, and really was far more interested in foreign languages than programming languages.
Kyle Walker:So I majored in French as well in undergrad, and then went on to do a PhD in geography at Minnesota. And early on, I was more interested in the qualitative side of research, doing interviews, and picked up a little geographic information systems (GIS) along the way, but really didn't evolve in grad school much beyond point-and-click software like desktop GIS software and point-and-click SPSS. And so I was first introduced to R in graduate school. My advisor encouraged me to take a stats class outside of our college.
Kyle Walker:So over in the statistics department, which is very well regarded at Minnesota. And I'll tell you, it was not love at first sight. First day of the class, I remember it pretty distinctly. I was taught by a grad student, and the teaching style was basically, well, this was before RStudio.
Kyle Walker:The teaching style was the instructor had this long text file of R commands and would spend the entire class period copy-pasting them into the console while we all frantically tried to type down everything that he did. And I just didn't get it. I remember the first exercise he asked us to do. He said, okay, use R to generate a 5 by 5 matrix of zeros, and then use a for loop to replace all the values in the diagonal with ones. And I had never written a line of code in my life before.
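For readers who have never seen that first exercise worked through, here is a minimal sketch of it. It is written in Python rather than R, purely as an illustration, and the variable names are hypothetical.

```python
# Build a 5 by 5 matrix of zeros as a list of five rows.
n = 5
matrix = [[0 for _ in range(n)] for _ in range(n)]

# Use a for loop to walk the diagonal: row i, column i.
for i in range(n):
    matrix[i][i] = 1

# Print one row per line so the diagonal of ones is visible.
for row in matrix:
    print(row)
```

The whole exercise comes down to one idea: the diagonal cells are the ones whose row index equals their column index.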
Kyle Walker:And I had no idea what he was talking about. And I ended up auditing the class, and it turned me off from R, frankly, for years. I didn't touch it again for 3 years. So I ended up getting a job in New York City. We moved out there because my wife got promoted with her company. And I found a job in New York City doing GIS for a pension fund.
Kyle Walker:And I was just doing point-and-click ArcGIS work, which was valuable, but it wasn't reproducible. And I had a colleague who worked with me, and he would look over at my screen. He would say, you know, how do you remember anything that you did? And I didn't have a good answer for that, because I wasn't documenting any of my work. The point-and-click software was sort of pushing me into, I would say, bad habits. And so I started to dabble around with coding a little bit.
Kyle Walker:I learned a little Stata. I learned a little Python to automate some of my GIS work, but where I really got into R was toward the end of my stay at that pension fund, and then really my first year at TCU. And I came across, like I think many people in the field did, that famous video that Hans Rosling did with the moving bubbles that show changes in life expectancy and income levels by country over 200 years. And I was enthralled by that video.
Kyle Walker:And I wanted to learn how to make those types of visualizations. And so I started looking into it. Do I need to learn JavaScript to do this? Do I need to learn D3? And that was, you know, that was an uphill battle.
Kyle Walker:I bought some books and tried to go the D3 route, but really where I ultimately was able to move in there was exploring R. I came back to R. I had a bad experience with R a few years prior, but R had evolved to the point that there were some libraries that had come out, and this is around 2012, around 10 years ago now, that either were an interface to the old Google Charts API that did Gapminder-style bubble charts, or were some really exciting innovations at that time pioneered by people like Ramnath Vaidyanathan, who's at DataCamp now, and Kent Russell, who's timelyportfolio on Twitter, who were working on this library called rCharts, which was basically an R interface to a bunch of D3 interactive visualization libraries. And I wanted to use these in my teaching.
Kyle Walker:And so that was the motivation to get back into R. Interestingly, it wasn't statistics. It wasn't heavy-duty programming. I didn't really learn to write code until I was almost 30 years old.
Kyle Walker:And it was that I wanted to make these interactive graphics to use for my students in class. And R was the gateway to that. And then eventually you just kind of build on top of that. And I started to enjoy it more because I had something specific that I could create with it.
David Keyes:Yeah. It's funny. I mean, there are so many things there, but I can't tell you how many people I've talked to who had a terrible first experience with R. And in some ways, it's a real testament to R that people come back to it in spite of those terrible initial experiences, because what you went through is unfortunately, I think, very common. And I think the other thing is, you know, coming at it with a really specific use case in mind
David Keyes:is so important, as opposed to having some kind of general, you know, oh, I wanna learn to do something fun in R, but it doesn't mean anything until you really have something there. So, I know when you were working at the pension fund, doing kind of point-and-click stuff, like you said, you took the path that many people do, where you feel like, oh, I should write down the steps that I'm going through. But as I think we all know, that doesn't happen very much.
Kyle Walker:Right. You're on a deadline and you just need to do it.
David Keyes:Everyone knows you should do it. Nobody actually does it. So talk about the difference in your workflow between that and what you have now in a code-based environment where you're working with R.
Kyle Walker:It's a great question. You know, I think back to that professional experience, and I mean, it really was a transformative point. You look back at those little times that changed your career. And, you know, when my colleague, my friend, Derek Darbs, looks over at my computer and says, like, how do you remember anything that you did? And a light bulb just goes off.
Kyle Walker:Why? I guess I don't know. I just sort of remember it. And then I started writing down in a Word document, step by step, the steps I was taking. But at the end of the day, it was still pretty limiting.
Kyle Walker:I mean, I remember talking to my boss. He asked me to do something, and I don't remember what the specific task was, but there wasn't a tool in ArcGIS to do it. So it's the kind of thing that frankly would involve using some sort of purrr map or lapply type workflow where you had to really iterate through something. And I didn't know how to do that. So I told my boss, well, it can't be done, which is really not something you should tell your boss.
Kyle Walker:And I ended up getting away with it. But that was the thing: in the old workflow, I was very much constrained by what the software could do. And frankly, desktop GIS software is very powerful software. You can do a lot with it.
Kyle Walker:And if you do learn to script with it, you can extend it. But at the end of the day, you're still more limited. You're constrained to a degree by what the software can do. With R, what's different about it? Certainly there's the reproducibility piece, where you can document everything that you have done and show where you went wrong and then fix it.
Kyle Walker:That is massively important, but R is in many ways sort of the gateway to so many other pieces of software. And that is immensely empowering. I've heard R described as the ultimate user interface. You know, R allows us to interact with these other technologies that, if you're learning each of those technologies by themselves, can quickly get overwhelming. Oh, I need to learn GIS, and I need to learn LaTeX to compile documents, or I need to learn JavaScript so I can make web maps or interactive graphics, or I need to learn SQL so I can interact with databases.
Kyle Walker:And this is not to dismiss the value of any of those technologies or any of those skills, but the fact that you can have a central portal through which you can actually engage with all of these technologies and then bring them together into a single workflow is immensely empowering. I mean, coming from a geographic information systems background, and frankly, one of the core motivations for writing tidycensus, which we'll talk about in a little bit, is exactly that. All of these things where you'd need 4 or 5 different technologies to get it done, you can put it all together in a single technology and accomplish what you need. That's the big difference.
David Keyes:Yeah. Definitely. I mean, it's a really good rundown of how R allows you to kind of spread your tentacles and do all these things. I mean, for me, for example, I came from Excel. And so when I started using R, it was very much like, okay.
David Keyes:I was doing just, you know, simple descriptive statistics in Excel. Let me see how this works in R. And it was once I started really getting into R, I was like, oh, I could use this to, for example, make maps, which is not even something I had considered doing in Excel. And, I don't know.
David Keyes:Maybe I haven't used Excel much recently. Maybe there are ways to do it now, but at least when I was using Excel, it just wasn't even a thing. So I think in a lot of ways, the benefit of R is not just, you know, kind of recreating what you've done in other tools, but, like, opening your mind to think about working in new ways that you hadn't done before, which actually sort of gets us into tidycensus. So before I dig into the nuts and bolts of tidycensus, can you walk through how you got from, okay, I'm gonna learn R so I can make some things to show students in my classes, to developing a package that, you know, allows you to access data directly from the Census Bureau?
David Keyes:What were the steps involved to get you to that point?
Kyle Walker:That's a good question. Yeah. I find it to be kind of an interesting path because, again, I can't emphasize enough, I never thought of myself as a package developer or software developer. I didn't learn to code till I was almost 30 years old, and I didn't see myself as a programmer.
Kyle Walker:And I think that's something I tell my students a lot. You know, in some cases, students will come in to my class and they'll say, well, I'm not a computer person. I'm not a software person. And I say, well, you have to stop right there. For one, if you have a smartphone, you're already a computer person, and you're programming by knowing the sequence of buttons or apps to tap to do things.
Kyle Walker:So you just have to reorient yourself, but also, you know, think about their trajectories. And this is one thing that I think is really great about the R community. You have these vastly different trajectories, where you don't have to be sort of the genius 11-year-old who's designing software in middle school. You have people who have come to it from a variety of backgrounds. And so thinking about the evolution of tidycensus, you know, I really started engaging in R programming around, again, 2012, 2013, around Ramnath's rCharts infrastructure.
Kyle Walker:And there were a few people who were kind of making charts and tweeting them out. So the early sort of rstats Twitter community was pretty important for me getting started out. And I've always worked very heavily with data from the Census Bureau. Coming from the University of Minnesota, my tool of choice has long been NHGIS, the National Historical Geographic Information System, which is a wonderful project that provides sort of harmonized and aligned spatial and tabular data that you can download. But like many census analysts, I was very accustomed to going to the census website and going through all the steps.
Kyle Walker:So, okay. I need to get my spatial data. So I'm gonna go to the TIGER/Line shapefiles website. I navigate through the menus. I'm gonna pull down the data that I need.
Kyle Walker:I'm gonna unzip it. I'm gonna load it into ArcGIS. Now I need to get my demographic data. So I'm gonna go to what was then American FactFinder, find the right tabulation, pull that down as a CSV, load that into ArcGIS. Now I need to join the tables together.
Kyle Walker:Oh, but the ID column in the shapefile and the ID column in the CSV file, one is coded character and one is coded integer, and I can't make the join. So I need to modify that. And this was sort of the process. And I wrote a lab for my introductory GIS students to do this, because I knew it was important that they learn how to work with census data. And they just spent so much time on it.
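The join problem described here, where one ID column is stored as text and the other as an integer, can be sketched in a few lines. This is a minimal illustration in Python, not Kyle's actual workflow; the column names are hypothetical and the `value` numbers are made-up placeholders, not real census figures.

```python
# Rows standing in for a shapefile's attribute table: the ID is stored
# as text, so the leading zero in "06037" is preserved.
shapes = [
    {"GEOID": "06037", "name": "Los Angeles County"},
    {"GEOID": "41051", "name": "Multnomah County"},
]

# Rows standing in for a downloaded CSV: the same ID was read as an
# integer, which silently drops the leading zero. Values are made up.
table = [
    {"GEOID": 6037, "value": 100},
    {"GEOID": 41051, "value": 200},
]

# Naive join: the text "41051" never equals the integer 41051.
lookup = {row["GEOID"]: row for row in table}
naive = [s for s in shapes if s["GEOID"] in lookup]
print(len(naive))  # no rows match

# Fix: convert the integer IDs to zero-padded 5-character text first.
fixed_lookup = {str(row["GEOID"]).zfill(5): row for row in table}
joined = [
    {**s, "value": fixed_lookup[s["GEOID"]]["value"]}
    for s in shapes
    if s["GEOID"] in fixed_lookup
]
print(joined)
```

Converting toward text rather than toward integers is the safer direction here, since geographic identifiers with leading zeros lose information when treated as numbers.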
Kyle Walker:I would feel bad about it because it was a laborious process to get through. And this was for one analysis. Every single analysis, you would have to do this. And so I started dabbling a little bit in R package development, read Hadley's book on R packages, and made my own personal R package. It's still up there on GitHub if you wanna see it. It's kinda clunky, but it does a few things.
Kyle Walker:It's called kwgeo. And then I started kind of experimenting with some of these things. And so, you know, I was on Twitter again, and someone had tweeted out something like, I wish there were a package that brought census shapefiles into R and automatically did that. And a friend of mine, Eli Knaap, who's working out at UC Riverside, said, well, Kyle, why don't you do that?
Kyle Walker:And so I thought, sure, this has always been tedious. I don't like going to the census website every time and pulling the shapefiles. If I can do that in R, that would be fantastic. And that was the tigris package. And so I wrote tigris first, back in 2015.
Kyle Walker:And I didn't really know what I was doing, but Bob Rudis, who for those of you in the rstats Twitter community is one of the most prominent voices, noticed the package. He just sort of came in and volunteered his time to make it actually work. And, you know, now nearly half a million downloads later, it's pretty heavily used. So tigris was really my first major R package. But moving into tidycensus: after tigris came out, I started getting some consulting requests, and people would sort of ask me to give talks on the package.
Kyle Walker:And I was using tigris pretty heavily in my own work to really bring in the spatial data, but I didn't have a seamless way to get the demographic data as well. And that was frustrating, that the process was still fairly slow. And so I started writing some scripts that automated the process, basically used R to pull down some census data from the API and then join that with tigris automatically to get enriched spatial and demographic data. And I started to think, well, you know, this could be an R package, and there are a lot of different ways that you can work with this data, but this is something I would use all the time, if I could have something where I could literally just say, give me income data for Multnomah County, Oregon, with spatial data, and I can map it right away.
Kyle Walker:And if I can do that in a line of code, that would be phenomenal, and it would make my work so much easier. And so I ended up just kind of digging in for a few months and writing the package. And the response was really, really good. It's one of those things when you develop an R package: sometimes the community picks it up, sometimes it's mostly something you use for yourself. But people have found tidycensus really useful.
Kyle Walker:And that makes me really happy. Though frankly, even if the package weren't successful, I would have still saved so much time, because it is literally software that I use every single day. So that was sort of the evolution, maybe a long-winded version of the evolution of the package, but it's kind of how it came to be.
David Keyes:I mean, that's interesting. I didn't realize, I guess, that you did the tigris package for getting shapefiles first, and it was only after that that you realized, oh, tidycensus would meet a need as well. Because for me personally, I came to tidycensus first
Kyle Walker:Mhmm.
David Keyes:because at that point, maybe I was doing a little bit of mapping, but I wanted, you know, just to automate the process of getting data from the Census Bureau. And so tidycensus was great. And then later on, I realized, oh, this guy, Kyle, has also written this package called tigris that allows me to just get shapefiles if I want, or you can, within tidycensus, bring in those shapefiles alongside the demographic data, which is super handy. So, this section of my book is talking about, you know, ways of automating your work, and it's using tidycensus as an example of working with an API.
David Keyes:Yeah. So I know people just coming to programming hear "API," and I know for me, it was kind of a scary thing. So can you explain, for someone who's not familiar, what is an API, and how does it work in the context of tidycensus?
Kyle Walker:It's a great question. And frankly, it is something that is often useful to demystify. You know, I teach at TCU. I've been teaching for several years now basically an intro to data analysis and visualization using Python for, frankly, non-CS students. So mostly liberal arts and journalism and PR students take the class, and some business.
Kyle Walker:And so the class is designed in a way to kind of say, here are these programming concepts; how do we make them intelligible to someone who isn't deeply embedded in software engineering? And APIs are intimidating at the outset. For one, what is an application programming interface? You know, that sounds kind of intimidating, and it's especially intimidating if you see a JSON endpoint, which looks like a web address, and then you put it in your web browser and it spits out just this huge block of JSON.
Kyle Walker:And you look at it, if you haven't seen anything like that before, and think, what on earth is this? And so basically the way that I like to describe an API, at least in terms of web data resources: a data API is a way that you can access data programmatically over the internet. So there are lots of different ways to access data.
Kyle Walker:You can go to a website and you can download an Excel spreadsheet or download a CSV file. You can connect to a database. There are lots of different ways to do it. Where an API is really useful is it exposes data in sort of a developer friendly format. So a format that can be readily consumed by another website or a programming language like R or Python, and allows you to stream that data directly into your application.
Kyle Walker:So it's just trying to make it so developers can get access to data. And, you know, when I'm teaching about APIs, we will interact with a variety of APIs. I'm in Fort Worth, Texas, and Fort Worth, like many large cities, has a contract with Socrata to build out an open data API. And so we step through it. You know, JSON isn't so intimidating.
Kyle Walker:It's just key-value pairs rather than rows and columns. But really kind of demystifying that, and showing this is just a different way of thinking about data, ends up being pretty important. And that's where tidycensus comes in. What tidycensus tries to do is all of the tedious aspects of getting census data. It tries to do that for you so that you can focus on the fun aspects of census data. So making maps is fun; analyzing data and finding out insights about your community is fun and interesting.
Kyle Walker:But setting up a connector to an API, or figuring out how to align columns in a merge, is more tedious. And so tidycensus tries to take away all the tedious stuff and do it for you. So what tidycensus will do is, users will request data for a given level of aggregation. We call that geography. So in census terms, there are what we call enumeration units, which are statistical areas.
Kyle Walker:If you've heard of a census tract or a census block group, these are sort of small areas at which the census tabulates data. And then there are also what are called legal entities, so counties and states, which are both levels of aggregation in the census and actual government units. And so you request data, say, for counties, and then you plug in one or more census variable codes. And what tidycensus will do is it will assemble all that information and construct a call to the Census open data API. It will go to the appropriate endpoint, which is typically the dataset from which you're requesting data.
Kyle Walker:It will communicate with the census website and bring the data back. The data comes back in JSON format, so JavaScript Object Notation, and then tidycensus does all the work of tidying up that JSON for you. You know, there are a lot of different ways that you could get data back; tidycensus returns data in the format that I like the best. And so it's kind of following Hadley Wickham's concept of tidy data by default.
Kyle Walker:So it's what we'll typically call long-form data, but it'll do all that sort of reshaping internally and give you back the data. So that's kind of the process by which it works.
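The process Kyle walks through (assemble a request from a geography and variable codes, receive JSON, reshape it into long-form data) can be sketched roughly as follows. This is a hedged illustration in Python, not tidycensus's actual implementation: the function names are hypothetical, the URL pattern and header-row layout are assumptions modeled on the Census Bureau's public data API, no network call is made, and the numbers in the sample response are made-up placeholders.

```python
import json
from urllib.parse import urlencode

def build_request_url(year, dataset, variables, geography, state):
    """Assemble a request URL (assumed pattern, hypothetical helper)."""
    base = f"https://api.census.gov/data/{year}/{dataset}"
    params = {
        "get": ",".join(["NAME"] + variables),
        "for": f"{geography}:*",
        "in": f"state:{state}",
    }
    # Keep commas, colons, and asterisks readable in the query string.
    return base + "?" + urlencode(params, safe=",:*")

def to_long(response, variables):
    """Reshape a header-plus-rows response into one row per
    geography and variable (long-form, tidy-style data)."""
    header, *rows = response
    long_rows = []
    for row in rows:
        record = dict(zip(header, row))
        geoid = record["state"] + record["county"]
        for var in variables:
            long_rows.append({
                "GEOID": geoid,
                "NAME": record["NAME"],
                "variable": var,
                "estimate": float(record[var]),
            })
    return long_rows

variables = ["B19013_001E", "B01003_001E"]
url = build_request_url(2020, "acs/acs5", variables, "county", "41")
print(url)

# Stand-in for the JSON text an API might return; values are made up.
sample_json = """[
  ["NAME", "B19013_001E", "B01003_001E", "state", "county"],
  ["Multnomah County, Oregon", "100", "200", "41", "051"],
  ["Lane County, Oregon", "300", "400", "41", "039"]
]"""
tidy = to_long(json.loads(sample_json), variables)
for row in tidy:
    print(row)
```

The wide response has one row per county with one column per variable; the long form has one row per county-variable pair, which is the shape that grouping, filtering, and plotting tools tend to expect.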
David Keyes:Yeah. That's great. Actually, it's funny because I was gonna ask, like, why tidycensus, because there is one other, what's the, it's like censusapi, or there's some other package, right? To be honest, I've never used it. I know it goes beyond what tidycensus does in terms of allowing you to access different types of data or something. I don't know the specifics. But I know one other thing that differentiates it is the tidy aspect: your package, tidycensus, is very focused on getting data into that tidy format.
David Keyes:So explain why you designed it, I guess, in that way.
Kyle Walker:Yeah. That's a good question. So, Hannah Recht is the developer of censusapi.
David Keyes:That's right.
Kyle Walker:I mean, she is a fantastic, brilliant programmer. And if you haven't seen her data journalism work, it's really, really good stuff. And censusapi is a massive accomplishment. And it's a package that I use quite regularly. So censusapi is another package that connects with the Census APIs.
Kyle Walker:It has sort of allied goals to tidycensus, but different goals. So with tidycensus, the reason why I wrote it was I wanted a package that gave back census data for the datasets that I used and that allowed for automated joining with spatial data, because I needed spatial census data for my projects, in consulting and in my academic research. And so I'm pretty opinionated about the format that I like to work with. And so I wrote the package, frankly, originally just for my own work. I thought, you know, this is something I want to have, and so I'm going to make it.
Kyle Walker:And then I'll open source it. If somebody else finds it useful, great. You know, if nobody else finds it useful, that's fine, because I'm still going to use it.
Kyle Walker:As for Hannah's package, the reason why I say it's a tremendous accomplishment is that it actually connects to every single Census API endpoint, and the Census has hundreds of datasets, some of which you've probably never even heard of. The big ones are the decennial census and the American Community Survey. So for listeners who are less familiar, the decennial US census is a complete count of the US population that takes place every 10 years and focuses on a select number of demographic characteristics of the US population, such as race, age, sex, and occupancy.
Kyle Walker:The American Community Survey is an annual survey of a subset of US households, around 3.5 to 4 million US households now. And they do it every year on a rolling basis. And that asks all sorts of other demographic questions. So, the core demographic questions like race, ethnicity, age, and sex, but also education, income, housing tenure, housing stock, family status, lots of other things. And so tidycensus focused originally on those 2 datasets.
Kyle Walker:And over the years, with the help of my coauthor, Matt Herman, who joined as coauthor a few years ago, it's incorporated a few other datasets, including individual-level microdata, which is one of my favorite features of the package, and then migration flows data and the population estimates. We're always sort of adding new features as we go. censusapi automatically connects; Hannah wrote the package to be generalizable. So you can actually go in and specify which dataset you want.
Kyle Walker:And it provides a single general interface, through a function called getCensus, to any of those Census API endpoints. It's a challenge to maintain, certainly. I admire the work that she does quite a bit because, you know, these APIs change from time to time, and occasionally modifications are made. It returns the data in a more, I would say, raw format than tidycensus does. tidycensus does some sort of opinionated data wrangling
Kyle Walker:internally. If you want something more raw from the Census API, that's closer to what the actual request gives back, then the censusapi package is a good place to look.
David Keyes:Gotcha. So it brings it in, does that wrangling. Basically, it puts it in a tidy format. I'm actually in the process right now of teaching people about tidy data as a concept and the logistics of getting your data in a tidy format, so that's a huge benefit. To be able to, you know, access data from the decennial census, ACS, and a few other things and get that data back in that tidy format is a huge time saver. Great.
David Keyes:Is there anything else you think could be useful to talk about?
Kyle Walker:Yeah. Honestly, again, being able to do this quickly was my major motivation for writing the package. And what this opens up is that there are so many different kinds of maps you can make. So, I have a book coming out. It's called Analyzing US Census Data: Methods, Maps, and Models in R, from CRC Press this fall.
Kyle Walker:You can preorder it today. And it's also available to read for free online. So, you know, it'd be great if y'all go pick up a copy because, you know, that helps me maintain the free version, but it is free, and chapter 6, which incidentally is the most visited chapter of the book by far, is all about mapping census data. So you can learn in there how to make all sorts of types of maps: dot-density maps, which I quite like, graduated symbol maps, interactive maps. There are a lot of different options that you can explore and, you know, frankly, that's what often excites me the most when I see people using tidycensus, because this is the creative part of analyzing and visualizing data.
Kyle Walker:And if tidycensus helps people get to that creative aspect of their data work faster, then there are so many interesting things that can be done with census data, because they're so applicable to a wide range of different fields. And I will say as well, because, David, you have listeners from all over the world who might be saying, you know, so we've got the US census data, but I'm interested in my country, or I'm interested in a different country. So I encourage you to check out my book, chapter 12, which talks about international census data and shows some resources for census data around the world and these different methods that we've been talking about. There are a few other packages that are really, really good that can apply similar types of visualization methods to non-US contexts.
David Keyes:Cool. That's great. Yeah. And I get that. Like, talking to you, you know, I'm based in the US. I know a lot of folks are, and I use US Census Bureau data a lot.
David Keyes:But the overall idea that you can use R as a way to access data through an API and pull it in in a way that's so much more efficient than all the manual alternatives, that applies way beyond this specific tidycensus example. And the nice thing about R being open source is people have made packages to allow you to access data from the Kenyan census or, you know, any other source. I've been pretty shocked, actually, at the wide variety of packages that can help you do that. Super. Well, this has been really, really helpful.
David Keyes:I will definitely include a link to your book. Where else can people find more information, about you if they're interested in learning more about your work?
Kyle Walker:Yeah. Absolutely. So I'm on the web at walker-data.com. That's my consultancy website. And so if any listeners are needing help with any of this stuff, feel free to send me a note.
Kyle Walker:I work with everyone from large companies to individuals, and you can follow me on Twitter. I'm off and on Twitter in terms of how active I am, but that is often where I share new features that I'm developing. I also have a mailing list that you can join. The link to that is in the tidycensus documentation, and check out my GitHub as well. So that's a good place to stay on top of new features that are coming through.
Kyle Walker:tigris and tidycensus are the packages that people are most familiar with, but I have a few other packages too that you might find useful. So, check out my GitHub. I'm reasonably active over there as well. But, yeah, drop me a line if anyone wants to chat further. I'm at kyle@walker-data.com, and I love hearing from people.
David Keyes:Great. Well, thanks again, Kyle. I really appreciate you taking the time to share all of this with us.
Kyle Walker:Yeah. Of course. Thanks, David. This is a lot of fun.
David Keyes:Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it. David@rfortherestofus.com. Thanks.