Hi. I'm David Keyes, and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.
David Keyes:Join me and learn how R can help you. I'm joined today by Alison Hill. Alison is the director of knowledge at Voltron Data. Prior to this role, she worked in data science and education roles at IBM and RStudio, and she was a professor at Oregon Health and Science University in Portland. She has a PhD in developmental psychology, quantitative methods, and evaluation from Vanderbilt.
David Keyes:Alison has long taken a keen interest in helping people use R Markdown to make their work more efficient, more accurate, and more reproducible. Thank you, Alison, for joining. I'm delighted to have you on the show today.
Alison Hill:Thank you. I'm delighted to be here with you.
David Keyes:I know you've just switched to a new role. What does a director of knowledge do? What does that mean?
Alison Hill:Yes. Well, Voltron Data is a company that's all about bridging hardware, software, and communities. So the director of knowledge role that I've taken on there is about trying to build bridges internally between our different teams so that we can all stay united on the mission. You can imagine that a startup has explosive growth in new employees, and that's exactly what we're experiencing, which also means an explosion of ideas and also documents. So there's a lot of knowledge being shared, a lot of knowledge that you'd like to have transferred to certain teams and people, and especially knowledge about really highly technical concepts.
Alison Hill:So, Voltron Data is all about accelerating the Apache Arrow open source toolkit. We've got some pretty technical people working on that, and some pretty technical, high ideals and things that we're trying to achieve. So I'm shepherding that information and trying to figure out a strategy for us to bring all the employees along for the ride and help everybody do their job better. That means trying to rein in some of the systems and the ways that we talk with each other, but also the actual content: what do people need to know in order to feel like they can talk about what we do as a company, what we're aiming for, and our vision and our mission? So it's kind of a broad purview, but it's been really fun.
Alison Hill:I've been there for about 3 weeks and I'm still in the, like, drinking-out-of-the-fire-hose phase. And
David Keyes:Sure. Sure.
Alison Hill:And it's been really exciting. There are also, you know, a lot of people on the more technical side who are really committed to open source and community engagement. So I really love that flavor of Voltron Data's mission.
David Keyes:Right. Where does R fit in, in terms of your current role?
Alison Hill:Yeah. It doesn't fit in much in terms of my day-to-day work. Right now, I'm doing a lot of knowledge sharing and knowledge transfer in Google Workspace and Notion, and reading through a lot of materials and resources that have already been created. But, yeah, the company as a whole does have this mission of making it easier to work with big data in whatever language you wanna use, using whatever user interface you wanna use. So right now they're supporting Python and R users, through supporting dplyr and lubridate, which are some of the core tidyverse packages.
Alison Hill:So, you know, the idea is that you're able to work with big data wherever it is, and be able to use whatever tools you want to use. They really wanna support this, like, polyglot workflow. You kind of bring your own language and you can work with the data that you need to. So I really love that part of it, and there are definitely some friends on teams that are working on the Apache Arrow R package, as well as some of the other software that I think of as glue, which is helping make those exchanges between R and big data systems a little bit easier. So there are a few open source projects that are going on right now.
Alison Hill:Substrait is one of them. dplyr support is a part of Arrow. And then there's also a Python crew who's working on corollaries to dplyr in Python, so being able to use a Python package called Ibis to interface with data the way that you'd like to. So it's pretty exciting, and it's nice to be in a place where you can really support open source development, but also be focusing on the people at the company who would like to understand all of that, but need it at a level, and delivered in a way, that's a little bit easier than reading through GitHub issues or GitHub release notes.
David Keyes:Sure. Yeah. It seems like one of the things that runs through all of your work is really being able to communicate effectively and bring together different audiences, that kind of thing. And I've asked you on today because I wanna talk about R Markdown, which is, if nothing else, about being able to communicate efficiently and effectively.
David Keyes:So before we actually dive into talking about R Markdown, I'm curious about your background, how you got into R, and what switched for you in terms of your work when you did move to R.
Alison Hill:Yeah. Well, I got into using R when I was a new professor at Oregon Health and Science University. And I got into it because I had done all of my graduate research in psychology. I had done all the statistical analyses in SAS, and all my courses had been in SAS. And I really liked SAS.
Alison Hill:For those who aren't familiar with it, it's sort of like a baby step toward a command-line tool. You are writing out text to be able to interact with data, but it's a little bit different than working with a programming language like R or Python. And so I used SAS, I was a happy SAS user, and then I joined OHSU and I realized the cost of a SAS license. The director of the program that I was in had come from Bell Labs originally, which is where the S language was originally developed, and R sort of evolved out of S. So he was super comfortable with base R and he was like, look, I use R, it's open source.
Alison Hill:I think you should try it. And at that point I was in this computational program, working at a center at OHSU that's no longer in existence, but it was called the Center for Spoken Language Understanding. What we did was train people who were machine learning researchers and natural language processing researchers to work with medical and health-related data, and to use advanced computational methods to derive unique insights about healthcare-related issues like children's response to treatments, symptom progression, things like that. And so I started learning R and I was like, wow, our students could learn to use R to do the statistical analyses that they need to do. So I found myself really drawn to helping people do the work that they needed to do to do their jobs.
Alison Hill:So I started teaching R for data science at OHSU, and that was sort of my hook. I felt like it had superpowered my workflows so I could do better research. And then I started using R Markdown because I had collaborators, and my collaborators didn't know R. I was still kind of that island R user: I was in a computational department, but everybody else there actually used Python. So I used R and I was kind of this lone wolf, knitting my little documents and creating things that were shareable, but also things that were really dynamic.
Alison Hill:So I could go into a meeting with a collaborator who didn't want to use anything, who maybe was the clinical subject matter expert on my team, and be able to show them, like, here's the data. Here's what I did with it. Here are some visualizations. And we could have really productive meetings where they could ask, what if we looked at the group in this way? Or how is that different if we use this variable versus this variable? And it allowed me to just go into my R Markdown document, edit the code as we were talking, regenerate plots and tables as we were meeting, and then I'd be able to give them an actual artifact.
Alison Hill:Like, I'd be able to knit to PDF or knit to an HTML document that I could share with them, and then they'd have that in their hands and on their computers so that they could look at it later. So for me, it really supercharged my ability to actually collaborate with other people and not feel like I was doing science in a vacuum.
David Keyes:That's interesting that it supercharged your ability to work with non-R users,
Alison Hill:So, like,
David Keyes:at least initially, as much as with R users, the way you describe it.
Alison Hill:Yeah. And as I became more of an evangelist, I started training more people underneath me, because I wanted to work with research assistants and graduate students who were using my same tools. I started teaching classes in data visualization and in data science, and then those people would come to work in my lab with me. At that point it was really supercharged fun, because we could share documents (we were using GitLab to share our code with each other) and really quickly iterate and have fun with poster presentations, papers, progress reports, anything where we were consuming the data and trying to get a peek at what was happening and how our research was going. It was a lot more fun when I had other people who were R users to join in also.
David Keyes:Yeah. Definitely. So you mentioned R Markdown briefly. I always have trouble describing R Markdown to non-R users, so I'm curious, maybe starting out talking about R Markdown:
David Keyes:How do you define it when you're talking to people who are not familiar with it?
Alison Hill:That's a really good question. Yeah. I think of R Markdown as something that, if data scientists didn't have it, they would have had to invent it. Because if you're sitting there writing your code, well, somebody who doesn't interact with code frequently, or who has only consumed the output of people's code, might be surprised that a lot of people just use scripts and go line by line and create things, but don't necessarily have a shareable artifact of what they did. And for me, R Markdown was both, like, a place to do work.
Alison Hill:So it was kind of an interactive experience, because I was using the RStudio IDE, my integrated development environment, so I could run code as I was working. It allowed me to code while thinking, so I could write notes to myself, try different things, and iterate quickly. But then it also allowed me to package it up with a nice little bow and be able to say, like, okay. I'm gonna basically print this off. It's sort of like a Google Doc print to PDF.
Alison Hill:It's like I can give you this kind of fossilized version of my work, and I can also edit that. I can make it more or less relatable to you. If you don't wanna see my code, I can just mute all my code and show you only my plots, or I can actually use it for teaching materials. So if you want to learn how to code, you can see my code and exactly what it produced. So it has that Swiss army knife element.
Alison Hill:And I think that's what's tough about explaining R Markdown to people too: it's not one thing. It's an R package. It's a file format. It's also sort of an ecosystem, and then you've got all these different verbs around it and all these nouns about file formats and packages. So I think it's hard because you can use the words R Markdown to connect with any of those concepts.
Alison Hill:So I think if you're on the outside looking in, it's helpful to define it at the different levels. It's a file format: a .Rmd document is what you need to be in, but that's just a plain text document that has some special R chunks in it that allow you to include code with real words.
David Keyes:Yeah. And so R Markdown is often referred to as a form of literate programming. I'm curious, what does literate programming mean, and what's the value of literate programming?
Alison Hill:Well, literate programming is a concept that was developed by Donald Knuth. I believe that's how you say it; there's a pronunciation on his website. It was really this idea of being able to weave together code plus narrative. And I think in the original fleshing out of the idea, it was really more for programmers to be able to write more literate code. Even the labeling of it: I think in some of the original writings he talked about how he labeled it literate programming on purpose, to give it a little bit of a value judgment, so that, like, you don't wanna be an illiterate programmer. Right?
Alison Hill:Like, you want to be a literate programmer. So it was very intentionally labeled that way. But his idea was that he ended up feeling like he wrote better code, and the people that he worked with wrote better code, when they were weaving together documentation at the same time. So being able to say not just what you're doing, but why you're doing it and why you're doing it this way. And I think that filtered down from the programming domain to scientists and data scientists.
Alison Hill:So anybody who needs to work with data can also be inspired by that idea and think, okay, great, that's also a really helpful concept for me to be able to package up my own ideas and work and have it make more of an impact. So I think it's one of those great programming concepts that makes a whole lot of sense. And I think you and I kind of share this reference to the curb-cut effect. It makes it easier for everyone when you build a system that is more open and accessible to people who are not the person who wrote the code. So the original intention was for one developer to be able to see another developer's program and understand it better.
Alison Hill:But, you know, data scientists have sort of co-opted it and said, okay, here's actually a way for you to better understand the science that I'm doing.
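To make the "code plus narrative" idea from this part of the conversation concrete, here is a minimal sketch of what weaving a document can look like. It is written in Python only so that it is self-contained; it is not how R Markdown or knitr is implemented, and the names `SOURCE`, `CHUNK`, and `weave` are hypothetical, introduced just for this illustration. The `echo` flag is loosely analogous to knitr's `echo` chunk option, which is one way to hide code and show only results.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example tidy

# Hypothetical miniature of a literate document: prose plus one code chunk.
# Real .Rmd chunks contain R code; Python chunks are an assumption made here
# so the sketch can run without R installed.
SOURCE = "\n".join([
    "Missingness check.",
    "",
    FENCE + "{python}",
    "values = [4, None, 7]",
    "print(sum(v is None for v in values))",
    FENCE,
    "",
    "One value is missing, so we flag it.",
])

# Matches a fenced chunk and captures the code between the fences.
CHUNK = re.compile(r"^`{3}\{python\}\n(.*?)^`{3}$", re.DOTALL | re.MULTILINE)


def weave(source, echo=True):
    """Run every chunk and splice its printed output back into the prose."""
    env = {}  # one shared namespace, so later chunks can see earlier results

    def run(match):
        code = match.group(1)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, env)  # only run documents you trust
        output = "\n".join("#> " + line for line in buffer.getvalue().splitlines())
        return (code if echo else "") + output

    return CHUNK.sub(run, source)


# echo=False keeps the narrative and results but hides the code itself.
print(weave(SOURCE, echo=False))
```

Calling `weave(SOURCE)` with the default `echo=True` keeps the code next to its output, which is closer to the teaching-materials use described earlier in the conversation.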
David Keyes:Yeah. That makes sense. So one critique that I hear a lot, and this isn't specific to R Markdown, it's about R in general, is that learning R takes a while. And so I'm curious what makes it worth it, and in particular, in what ways do you think R Markdown especially makes it worth taking the time to learn R?
Alison Hill:I think it really depends on what you're trying to achieve. A lot of times when I would talk to new data scientists especially, or even researchers who are doing academic research science, figuring out how to talk to other people in your groups was really important. And so if you are feeling that pain, then I think R Markdown is kind of the best Swiss army knife solution. It has a lot of benefits for yourself as well. If you take a spring break and you come back, you know exactly where you left off, because you've left yourself a nice little trail, and it goes beyond commenting code. A lot of people in scripts will just comment and they think that's enough, but it's really more than that.
Alison Hill:It's also being able to explain your thought process and why you're breaking things down that way, but also the output itself. If you have a plot, it doesn't stand on its own. It really helps to have words around it. Like, here's what you're seeing in this plot, or here's what I'm noticing and pulling out from this plot. So if you're in that place where your work is difficult for you to understand when you come back to it, or where you're doing all this work but it's not really bubbling up to the next level, to the people that you work with, so they can appreciate it, understand it, and give you feedback on it, then I think that's a good sign that it might be helpful to think about a way to share it more easily so that people can consume it.
Alison Hill:And I think it kind of masks itself in this shroud of, like, things aren't quite flowing and collaborations aren't quite happening, so what could I do differently? But certainly R does have a steep learning curve. I think it's a lot better than when I learned it. When I first learned it, the core tidyverse packages were out there, but I don't even think tidyverse as a name was out there yet. So I was using ggplot2, and I was using some of the dplyr package functions, but it wasn't all knit together.
Alison Hill:And for me, when I was teaching a lot of beginners, I found that to be a more pleasant on-ramp for people, especially people who don't have any background with any kind of coding. If they're coming from the place of, like, I need to do Excel plus, then I think that's a nicer on-ramp for them. Personally, I think we might share that ethos. But for me, I think that's made it a little easier.
David Keyes:I mean, yeah. Tell me if I'm misinterpreting what you said, but one thing I heard is that in some ways using R Markdown forces you to, well, not really verbalize, but to type out what you're doing. And that process can actually help you get clearer about what it is that you're doing. Because if you can't articulate it to yourself, then obviously you're gonna struggle to articulate it to others.
David Keyes:So as opposed to just having a script where you're like, oh, yeah, I know what this does, but then if you're actually forced to articulate it, you might struggle. R Markdown basically forces you to do that step of articulating it.
Alison Hill:Yeah. Exactly. And even if you're not able to actually share the rendered artifact of an R Markdown process, which is usually some kind of static file that people can't really interact with too easily but can consume, even if you're just using it as an interactive workspace, you can imagine that, especially in today's fully remote, asynchronous work environment, it's so much easier to hop on a video call with a colleague, share your screen, and walk them through, like, okay. Here's where I did some data exploration.
Alison Hill:And if they say, well, wait, wait, wait. Did you look at missingness or something like that? You can just hop back and be like, oh, right. Right. I can just skip to this little place where I have a bookmark, essentially, that says this is where I did the missingness analysis, and you can say, okay.
Alison Hill:Let's take a step back and talk about missingness first, then, and go through that. And then they can say, oh, but you might have missed this. Maybe this participant, or this set of participants, should be excluded from this analysis because we figured out that the equipment we were using wasn't correctly calibrated before that, and we haven't gotten that into the data dictionary yet. There are all kinds of different things that you can get from that collaborative experience that are a lot easier if you've got the code and your rationale behind it in one place. Otherwise what you end up having, or at least what I did, which is not optimal, is a script, and then little files describing what each script did. And they were like, okay.
Alison Hill:This is where I did this. This is where I did this. And then here are some notes about, like, oh, this was weird. I should go back and check on this. And then the fusion of those two things, like, those never line up again for you.
Alison Hill:So even if you're, like, the most meticulous and paranoid documenter, that is an unhappy workflow.
David Keyes:Yeah. Definitely. That's great. Well, if people wanna learn more about you, what would be the best place for them to do that?
Alison Hill:My website. My website is a pretty complete resource on me. It's apreshill.com, and I need to update it. But it has a list of all the talks that I've given, along with slides and video recordings if they were available, and also links out to all my projects.
Alison Hill:And then I would say that I blog about every 6 months, maybe, when I get the time. I have a 5-year-old, so I'm not prioritizing that right now, but you can see some of the things that I've worked on, both at RStudio and since then, on my blog as well.
David Keyes:And just to point out to folks, your website is made with blogdown. Right?
Alison Hill:Yes.
David Keyes:Which is kind of a related package.
Alison Hill:"blogdown: Creating Websites with R Markdown" is a great resource if you want to start pretending like you're a front-end developer. You can just bypass the front-end developer stuff and go straight to making fun, cool websites. It's pretty empowering, and if you need to develop a personal website, it's a really great package for that.
David Keyes:Great. Well, I'll include a link to your website and the book in the show notes for this episode. Awesome. Well, Alison, thank you so much for joining me today and sharing about R Markdown.
Alison Hill:Thank you, David. It was really nice to see you.
David Keyes:Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it: david@rfortherestofus.com. Thanks.