Hi. I'm David Keyes, and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways. Join me and learn how R can help you.
David Keyes: Well, I'm delighted to be joined today by Meghan Harris. Meghan is a data integration specialist at the Primary Care Research Institute at the University of Buffalo. There, she brings together data from multiple sources to create insights that benefit people affected by opioid use disorder. And when Meghan's not creating slick data pipelines, she makes art using R, which she posts on Twitter at @meghansharris. That's Meghan S. Harris, H-A-R-R-I-S.
David Keyes: Meghan, thank you for being with me today. I appreciate it.
Meghan Harris: Thank you for having me. Super excited to talk about this today.
David Keyes: So, Meghan, just to learn a bit about you, when did you start using R?
Meghan Harris: So, the first time or the successful time? Sure. The very first time I started using R was probably in 2019. I had started a position as an evaluation associate, and there was a project I inherited where someone had used R, but that person was already gone. And I was kind of just like, I don't know what this is. So when I found out what it was, me being as hard-headed as I am, I was like, I'm gonna make a Shiny app, without learning how to upload data into it or any of it.
So, clearly, I failed, and I felt really rejected and dejected by RStudio. So I kind of just took a few months off. Then I had some downtime at the beginning of 2020, in January, and that's when I was really like, okay, let's do this right. Let's actually do it piece by piece.
So we could say 2019 or we could say 2020. It depends.
David Keyes: I mean, honestly, I think your experience is actually pretty common. I personally did a workshop in 2015 at the American Evaluation Association conference on making data viz with ggplot, but I had never used R before. In retrospect, it was a terrible way to be introduced to R, because if you don't understand the fundamentals, making data viz in ggplot isn't gonna mean anything to you. It was only later, when I went back and similarly said, okay, I'm intrigued by this, let me sit down and go through it step by step, that I actually started to figure it out.
Meghan Harris: I kind of had the same thing. And, David, I don't know if you remember this, but you kind of helped me go back into this, because I really took another look at it after I came back from the American Evaluation Association conference in 2019. By that time, I had already been trying. And then I saw it fleshed out, because I think your presentation was on R Markdown, if I'm remembering correctly.
Once I saw someone else doing something, and you weren't the only one doing something in R, other people were too, I got so hyped and jazzed about it. I was just like, no, this is where I need to be. So I used the inspiration from everyone's work at that conference, came back, and that's when I went full force, and things just started to click for me.
I don't know. That's the only way I can describe it. Things just started to click.
David Keyes: Yeah. Definitely. You do kind of hit that moment, and then things make sense.
Meghan Harris: Yeah.
David Keyes: And for me, what that feels like is that even if I don't know how to do something, I know what I need to do to get to the point where I can understand it. Whereas before that, it's like, I don't know how to do... So I was gonna ask you why you switched, but clearly I know the answer: you were kind of forced to, and then you wanted to. I'm curious what changed for you when you actually started really diving into R and using it.
Meghan Harris: Yep. So what changed for me was the method. When I first started trying to learn, I had this code that I completely did not understand. I did not understand what was going on.
The person whose project I inherited was using it to do regressions and stuff, so I was just like, oh my gosh, what is this? So there was this piece of needing to understand it, and, like you said, I also felt like I was forced to learn something to get this project out to the client.
What changed after I came back from that conference was a shift in mindset, honestly. When people ask me how I got started, what I tell them is that I used to literally be like, okay, here's the R for Data Science book, I'm gonna read this cover to cover. I used to have a study schedule, and I felt so frustrated because nothing ever felt like it stuck.
What changed for me was realizing that I work in a position where my role's not really in data. I'm an evaluation associate, but there's some aspect of data, and there was some flexibility. I could do an evaluation report, but I also had opportunities to clean some data or do a visualization or something like that. So that's where I was just like, okay.
I have this data set, and I need to clean it so that I can put it in Tableau. I learned Tableau before I learned R. So it was about breaking it down and finally understanding, okay, I don't need to read a book cover to cover, because that just doesn't work for me personally.
What I need is to understand what it is I'm trying to do. Then I'll go ahead and spend two weeks googling how to do it, but at the end of those two weeks, I will have done it. It was a loop of constantly needing to Google new things for a few weeks to figure out how to do my job. And then it ended up being like, oh, I've made a ggplot before.
I gotta do that again. Oh, okay, I've cleaned this before. I gotta do this again. That's where things just started sticking for me, just having to do it.
Meghan Harris:You know?
David Keyes: Yeah. Definitely. I've talked to a lot of people who said they tried a really similar approach, like, let me read R for Data Science from front cover to back cover. And it's hard, because even if it makes sense while you're reading it, when you then go into RStudio yourself and try to work, it doesn't quite click until you're actually doing something in context.
So I think that was really smart. That was when you were working as an evaluation associate in a different job. You then moved to a position at the University of Buffalo.
Can you give an overview of what your job is there, what types of things you do, and how you use R?
Meghan Harris: Yes. So I have been able to create entire pipelines for my local opioid department. Our local government health department has an opioid department. When they created the position, I don't think they really intended someone to come in and do programming.
It was just basic data work. I don't wanna say analysis, because data analysis and analysts are not basic, but the posting itself was really broad. There wasn't a lot of, oh, you should have this experience. What I found out the hard way was that there really wasn't any structure there.
So basically, I had my interviews and showed them examples of processing and analysis I'd done in R. During the interviews, I could tell no one really understood how powerful the R language was.
All they knew was that I knew how to do something they didn't know how to do, and it sounded like a good fit, so they hired me. So I always say I totally got lucky. Not to discredit the work I did to get there being self-taught, but I got lucky. I came in and they were just like, okay, here's the data, go. That's it.
So I had to start piecing together this pipeline, and it took a long time. I'll have been in this position two years in a few weeks, actually. I came in and started doing these data landscapes of, okay, where is all this data coming from? Because that's essentially what the job is.
I am a person who wrangles all of this data from different sources, whether it's our local police departments, surveys that the Department of Health created themselves, or census data. It doesn't matter. I'm pulling it in and putting it in a centralized location. What ended up happening over the years was that they needed all these different deliverables. So we have pipelines that go from R into Tableau.
We have ones that go into Shiny dashboards. Part of that is like the example I uploaded for you to review, where we have a modularized set of pipelines doing a whole bunch of stuff to produce one output. So in a nutshell, that kind of embodies everything. There's a lot packed into that, though.
David Keyes: Yeah. Well, data integration specialist does seem like a good title for the types of things you're doing. So the reason we're talking today, the reason I reached out to you, or, well, you shared with me that you had done some work where you brought data in using Google Forms. Is that right?
Meghan Harris: Yep. Google Sheets. Yep.
David Keyes: Yeah. And was it a survey done with Google Forms that then went to Google Sheets? Can you explain the overall process, like what the project was? Obviously, we're not gonna talk about that specific project, because it has private data that we can't go into.
Meghan Harris: Right.
David Keyes: But we'll talk about it in the context of another example you shared with me. So maybe just start out. This is a very long-winded way of asking: can you give an overview of what the workflow of this example project you shared with me looks like?
Meghan Harris: Sure. In the actual work I've done with UB and the county, we had people going to Google Forms themselves and entering data, answering our survey questions. Whether you use Google Forms or have data entered manually into Google Sheets, the flow is kind of the same. When you use Google Forms, I believe you have the option of sending responses to a spreadsheet. I'm not sure if it does that automatically, but in my experience, whenever I have Google Forms data, it's always getting exported into a Google spreadsheet.
So basically what I was tasked with was, okay, we have a lot of survey data just sitting there in Google Forms, do something with it. For the project I unfortunately can't share, at a high level, we had hundreds of old Google Forms responses, and we needed to put them into some kind of PDF report. That's where I started getting this pipeline idea of having different sections, like, okay.
This is the data coming in. This is how I can check whether there are differences or new data. It's a stepwise process for whatever is needed. If I need visuals, I have a script for that. If I need processing or cleaning, I have one for that. So it really depended on what was in front of me at the time.
David Keyes: Yeah. And the piece I'm most interested in for this conversation is the way you use R to connect directly to Google Sheets.
Meghan Harris: Mhmm.
David Keyes: If other people have worked with Google Sheets, a lot of them do their analysis in Google Sheets itself. They'll have their raw data, and then a separate tab where they're doing their analysis or making pivot tables. I don't know exactly, because that's not my world, but I have a broad sense. Other people might download the data from Google Sheets and work with it in Excel or something like that.
Meghan Harris: Mhmm.
David Keyes: Or even work with it in R. But you actually used a package, googlesheets4, that connects directly to Google Sheets. Can you talk about what that package does, how it works, and what the main advantages to using it are?
Meghan Harris: Sure. To my understanding, and I've honestly only used googlesheets4 for this purpose, there's so much more you can do, like interacting with the Google console and platform through R, and I've been meaning to take the time to sit down with it. But for this project, it's used to make an API connection between your R console and the Google server. Before you run it the first time, you have to install it, the standard way you install and load packages in R.
After you've done that, when you go to use it the first time, it goes through an authentication process where whatever internet browser you use, Chrome, Firefox, whatever, pops up and says, hey, R is trying to break into your Google stuff. Are you cool with this? So then you go ahead and log in. And, thankfully, that'll be the only time you have to do that.
After that, any time you run it, you just have to confirm, oh, hey, we have this email saved from last time, do you wanna use it? Yes. The advantages of doing that depend on what's going on.
If you're working on a team, for example: I'm the only person working in this department right now, but there's gonna be a point where I'm gonna have to transfer things or pass them along. For people who don't know R but are interested in learning, what I've been realizing is that people get really scared when they have to deal with directories. I know, because I used to be one of those people. So you definitely can go to Google Sheets and download a CSV or whatever.
You can do that. But taking the time to set up that authentication process with googlesheets4 takes that step out. It matters most when multiple people can edit and add data, which is my situation. At my job, there is one account that is shared by many people, and it drives me insane, because that is not best practice, but that's what it is. Because there's so much uncertainty, so many changes that can happen, it's really good to be able to pull from the source. Otherwise you can have a saved file somewhere locally on your computer and think that it's updated and fine.
But if you literally forgot to re-download it, then you're not working with the updated data. So while you can totally do it without the package, it's probably really good that you use it. I feel like it maintains some integrity as well, so you can keep tabs on what's going on in your data. I hope I answered the question.
David Keyes: Yeah. No, I think that's helpful, especially in a situation where multiple people are potentially editing. Say you're doing a survey, like you were in this example. You don't know if more people have submitted the survey since the last time you ran your code.
So if I heard you correctly, one of the advantages is that you can always be sure you're getting the most up-to-date data, because your R code pulls it in directly from Google Sheets, as opposed to a manual step of you going into Google Sheets, hitting File, Download, and then working with it.
Meghan Harris: Absolutely.
David Keyes: You talked about an API connection. For folks who aren't familiar, an API is basically an interface that allows one program to interact with another, in this case allowing R to interact with Google Sheets. I'm just curious. I asked about the advantages of this type of workflow. Do you see any disadvantages? Any negatives you've ever encountered?
Meghan Harris: Yeah. With this little example, not so much, but with my bigger one, I think the disadvantages are not related to the package itself. It's honestly an organization or department thing, like, why do we have so many people on one Google account making changes? Usually it's been pretty solid. There might have been a rare time where something did not connect properly with the API.
I don't remember if it was because I was running the code back to back to back while trying to debug something. That's when I started adding these little things in the code. People ask, why is that there? It's to make it pause so that we don't lock ourselves out, or to check for things. I realized I had to debug a lot.
And when you're constantly pulling, pulling, pulling... I know we're gonna get into the code, but for this example there's a really basic data validation: has the number of rows in the data changed since we last pulled it? For this one, I had it so that if the count matches, it doesn't have to go through a processing script or something like that. But you can change it so that it won't keep calling the API if you already have what you need in the session. Honestly, I think that was the only issue I've ever had.
I personally just haven't run into any issues doing it this way yet.
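The row-count check and pause Meghan describes can be sketched in a few lines. This is a language-agnostic illustration written in Python, not her R code: `fetch_rows` and `process` are hypothetical placeholders (in her pipeline the read goes through googlesheets4), and the one-second default pause is an assumed value.

```python
import time
from typing import Callable, List, Optional, Sequence

Row = Sequence[object]


def should_reprocess(current_count: int, last_count: Optional[int]) -> bool:
    """True on the first pull, or whenever the row count differs from last time."""
    return last_count is None or current_count != last_count


def pull_and_maybe_process(
    fetch_rows: Callable[[], List[Row]],
    process: Callable[[List[Row]], None],
    last_count: Optional[int] = None,
    pause_seconds: float = 1.0,
) -> int:
    """Pull once, pause, then run `process` only if the row count changed.

    `fetch_rows` and `process` are hypothetical stand-ins for reading the
    sheet and for the downstream cleaning/reporting script.
    Returns the new row count so the caller can pass it in next time.
    """
    rows = fetch_rows()
    # Short pause after each remote call so rapid re-runs (e.g. while
    # debugging) are less likely to trip the API's rate limits.
    time.sleep(pause_seconds)
    if should_reprocess(len(rows), last_count):
        process(rows)
    return len(rows)


if __name__ == "__main__":
    sheet = [["id", "answer"], [1, "yes"]]
    processed = []
    count = pull_and_maybe_process(lambda: sheet, processed.append, None, 0)
    count = pull_and_maybe_process(lambda: sheet, processed.append, count, 0)
    print(len(processed))  # 1: the second pull saw the same row count
```

A row count is a cheap but coarse signal: it catches new survey responses, but not edits to rows that already exist.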
David Keyes: Good to know. Alright. Well, Meghan, thanks for joining me today. I really appreciate it. If people wanna connect with you, where would be the best place to go?
Meghan Harris: The best place is probably Twitter, at my handle. It's my full name: @meghansharris. Meghan S. Harris.
I'm also on LinkedIn. I believe that name is just my first and last name, so M-E-G-H-A-N, a little dash, and Harris, H-A-R-R-I-S.
David Keyes: Super. Well, thanks again, Meghan. Appreciate you talking with me.
Meghan Harris: Thank you so much, David. Thank you for inviting me.
David Keyes: Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it at david@rfortherestofus.com. Thanks.