R for the Rest of Us

In this episode, I speak with Chris Knox, who is currently the Head of Data Journalism at the New Zealand Herald. Prior to that, he worked at the New Zealand Ministry of Health, where he led an analytics team focused on New Zealand's COVID response.

During our conversation, Chris explains why he considers R the optimal tool for data analysis and reporting, especially when dealing with frequently changing data sources and parameters. He also emphasizes the benefits of using R in a collaborative environment, where junior analysts can be quickly integrated into the data analytics and reporting process and assume significant responsibilities, thanks to the reproducibility of R code.

Connect with Chris on Twitter (@vizowl)

What is R for the Rest of Us?

You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting, to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.

David Keyes: 00:00

Hi. I'm David Keyes, and I run R for the Rest of Us. You may think of R as a tool for complex statistical analysis, but it's much more than that. From data visualization to efficient reporting to improving your workflow, R can do it all. On this podcast, I talk with people about how they use R in unique and creative ways.

David Keyes: 00:18

Join me and learn how R can help you. Well, I am joined today by Chris Knox. Chris is head of data journalism at the New Zealand Herald. Prior to that, he worked at the New Zealand Ministry of Health, where he led an analytics team focused on that country's COVID response. Chris, thanks for joining me today.

David Keyes: 00:39

I appreciate it.

Chris Knox: 00:41

Oh, you're welcome. Happy to be here.

David Keyes: 00:44

Great. Well, let's talk a bit. We've talked previously, but, you know, I'd like to have you give an overview for folks who are listening to this. You have told me that R was really instrumental in helping to stand up the COVID response of the government of New Zealand, to help stand that up really quickly. Can you talk and give an overview of what role R played in the initial COVID response?

Chris Knox: 01:19

Yeah. Absolutely. So just for a little bit of background, I joined the New Zealand COVID response in January 2021, so it had already been going for a while by then. I was working in a different role at the Herald, and then moved over to help out with the COVID response. And I have moved back.

Chris Knox: 01:42

But so, yeah, for the very early stages, I'm talking based on what I observed rather than actually being there. Sure. But, yeah, when I arrived in January, the entire way that we did all of our reporting was basically based on a big set of R code, and that had already been stood up. So, obviously, in March 2020, when New Zealand went into its first lockdown, we didn't have anything in place for COVID reporting, or anything like that.

Chris Knox: 02:22

And I think that, like a lot of countries, our general infectious disease reporting was not designed to be kind of day to day. I mean, there were systems in place for reporting notifiable diseases, but, you know, at the end of the year, after everything had been tidied up. Right. And the big change that COVID reporting brought into the health system was the need to report on essentially incomplete data. So under normal health reporting circumstances, you wouldn't talk about something where, you know, potentially, the person you're talking about hasn't even been notified that they've got COVID yet. Right.

Chris Knox: 03:09

Because, basically, you're reporting about positive tests that have come through the lab. Certainly, you're reporting on things before case interviews have been completed or anything like that. So maybe for a little bit of background, with New Zealand's COVID response, we were very lucky to have a bit of a slow start. We saw what happened with COVID around the rest of the world, and when it eventually arrived in New Zealand, we went into a very strict lockdown and eliminated COVID, so that we had quite a long time where there was no COVID transmitting in the community. And so when there were cases, they went through detailed case interviews and contact tracing, and we had a number of community outbreaks in 2020 and 2021, which were largely a book.

Chris Knox: 04:03

Delta was the last one, and that was almost eliminated before Omicron arrived. Then that was the end of that type of response, and so we moved into a situation more like the rest of the world with, you know, thousands of cases a day. So, yeah, most countries didn't have this kind of detailed case interviewing going on for most of 2020 and 2021.

Chris Knox: 04:31

And so we were reporting from, basically, information coming in from labs and from case interviews. And the nature of what was being collected was changing all the time as COVID changed. And so the information that was being collected was being loaded into databases, but the way that we were having to report things kept changing; at some point, the reporting requirements essentially changed every day. And so we were producing daily reports in the morning for ministers, and then a daily announcement went out to the public at 1 PM every day. And so there's a big set of analysis and code sitting behind those reports and announcements.

Chris Knox: 05:22

And it was essentially the fact that we were able to run that workflow reproducibly, but also keep changing it, that enabled us to do that. And, obviously, as things settled down, any particular type of information that was being collected would settle down. Some things were being captured in operational databases and then moved into reporting databases. Some things were being captured via phone calls and written down on spreadsheets. And so we were constantly dealing with that. And very early on, you know, there was a bit of technological catch-up that needed to be done in some places.

Chris Knox: 06:17

So DHB is district health board in New Zealand, so it's very much the way that our health system was managed. And so, yeah, I guess the summary is lots of constantly changing sources of data, but the need to consistently report on changing parameters. And so I can't see any workflow other than something like, basically, an R pipeline to actually make that work, where you've got the ability to reproducibly run most of what you're doing and then put the effort into changing the things that need to be changed.

David Keyes: 07:04

And can I ask you more about that? You said you can't imagine another workflow other than what you had with R. Why is that? Like, what would be the issue with a different type of workflow?

Chris Knox: 07:15

I mean, I guess some of these types of systems work on, you know, a more classic BI type system where you have everything going into a database and then something like Qlik or Tableau building a dashboard that sits on top of that. And my experience of those types of systems is that building something like a dashboard to do this inevitably results in a big development requirement, whereby changes in the back end require quite a lot of things to change in the front end. So it becomes quite difficult, I think, because the schemas at the back end are changing all the time, so it's quite a lot of effort to keep a dashboard or something like that up to date.

Chris Knox: 08:11

I think the big advantage of an R, or a sort of text-based, workflow is that everything's written down, and you just run those functions and things happen. Whereas if it was a more point-and-clicky type of analytics workflow... because the other thing is that this was running 7 days a week with constantly changing team members. We were burning people out reasonably quickly. And so there was turnover quite a bit, particularly at key moments. And so having someone come in who had an understanding of R, and basically, you know, you could sit down and run that, make sure there's no errors.

Chris Knox: 09:06

You know, people can pick that up really quickly. Whereas if you were having to train people to go through an actual point-and-click based workflow, it would have been a lot harder. So I think that having most things written down in code, being run reproducibly with small changes where needed, and lots of error checks, yeah, is what I mean by the workflow.

Chris Knox: 09:31

Yeah. I I think that that just gave us that flexibility.

David Keyes: 09:35

Gotcha.

Chris Knox: 09:35

And then over time, the whole workflow moved into Git. So we had a GitLab instance. So everything was checked in, which actually meant that we kept the code. Each day's code was kept, so you could go back, you know, and be like, oh, what analysis did we actually run on the second of July or whatever, which was quite useful. And, again, trying to capture what you did in a point-and-click type analysis environment, I don't think is possible in the same way. And then also we introduced targets, which is a library that is part of the rOpenSci ecosystem, and that was fantastic.

Chris Knox: 10:30

So that made a huge difference. That revolutionized the way we were doing things. The previous way that things had been set up, which, you know, worked extremely well but became quite time consuming, was basically your standard setup: we've got a central R script that calls a bunch of other R scripts, one after the other, and that worked well. But when the data got large, it got pretty slow. And the problem was if something went wrong at the end, you'd have to fix it and then rerun the whole thing.

Chris Knox: 11:09

And so there were some days when the 1 PM update didn't happen at 1 PM. Gotcha. And that's not ideal. And so, to kind of speak to what a targets workflow does, it basically allows you to define little steps of your workflow.

Chris Knox: 11:34

And if the input data and the code don't change, then it won't rerun that; it caches what you've done. So then the initial load of data could be cached, and then we could look at the edge cases and handle those without having to rerun the entire workflow.
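The caching idea Chris describes can be sketched in a few lines. This is not the targets package or its API (targets is an R library); it is a minimal Python illustration, and the names `run_step`, `load_cases`, and `summarise`, along with the sample records, are hypothetical. A step is skipped when a fingerprint of its code and its inputs is unchanged.

```python
import hashlib
import json

# Hypothetical in-memory cache: step name -> (fingerprint, result).
_cache = {}
run_log = []  # names of the steps that actually executed


def run_step(name, func, *inputs):
    """Run func(*inputs) unless its code and inputs match the cached run."""
    code = func.__code__
    payload = (
        code.co_code                                  # compiled bytecode
        + repr(code.co_consts).encode()               # constants used by the code
        + json.dumps(inputs, sort_keys=True, default=str).encode()
    )
    fingerprint = hashlib.sha256(payload).hexdigest()
    cached = _cache.get(name)
    if cached is not None and cached[0] == fingerprint:
        return cached[1]                              # unchanged: reuse the result
    result = func(*inputs)
    _cache[name] = (fingerprint, result)
    run_log.append(name)
    return result


def load_cases(raw):
    return [row for row in raw if row["status"] == "confirmed"]


def summarise(cases):
    return {"total": len(cases)}


raw = [{"id": 1, "status": "confirmed"}, {"id": 2, "status": "probable"}]

# First pass: both steps execute.
cases = run_step("load_cases", load_cases, raw)
summary = run_step("summarise", summarise, cases)

# Second pass with identical data and code: both steps come from the cache.
cases = run_step("load_cases", load_cases, raw)
summary = run_step("summarise", summarise, cases)
```

With this structure, a late fix to one step only reruns that step and anything whose inputs changed as a result, which is the property that mattered when a failure near the end of the daily run threatened the 1 PM deadline.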

David Keyes: 11:51

Can you can you give me an overall sense, you know, at a very high level of kind of what were the steps that your code was doing? You know, I imagine it was grabbing data from a database, maybe doing some analysis. But I'd love to have you just kind of, again, at a very high level, walk me through what it what were the major steps?

Chris Knox: 12:11

Steps. Yeah. So at the highest level, we had a database of cases. And then the first thing that the code had to do was work out whether or not we'd reported a case previously. And so there was a bit of sort of workaround.

Chris Knox: 12:25

Particularly once things got busy, people ended up being recorded in case databases more than once, so we were deduplicating. And then the next step, the key step for most of the time, was identifying whether cases were border cases or community cases. Usually, that would be recorded as part of the case entry, but we'd generally want to validate that as well. And so we'd have data on flight records, and also, for a lot of the time, international arrivals went into managed isolation. And so if a case was identified, we'd want to check what facility they were in, whether there'd been other cases reported in that facility, and we'd report on how many cases that arrived at the border were detected on the first day.

Chris Knox: 13:34

Essentially, we weren't concerned. We expect cases to arrive at the border and be detected on the first day. But if a case is being detected on sort of day 7 or 8, then we would flag that up to people that would then go and investigate. And so basically, I guess, yeah, it's merging up all of those databases to then summarize it.
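The deduplicate, join, and flag steps described above could look roughly like the following. This is a hedged Python sketch, not the ministry's code: the field names, the sample records, and the `LATE_DETECTION_DAY` threshold are all assumptions made for illustration (the conversation only mentions "day 7 or 8" as an example of a late detection).

```python
from datetime import date

# Hypothetical records; every field name and value here is illustrative.
cases = [
    {"person_id": "A1", "test_date": date(2021, 3, 2)},
    {"person_id": "A1", "test_date": date(2021, 3, 2)},  # same person entered twice
    {"person_id": "B2", "test_date": date(2021, 3, 9)},
    {"person_id": "C3", "test_date": date(2021, 3, 5)},  # no arrival record
]
arrivals = {
    "A1": {"arrival_date": date(2021, 3, 1), "facility": "Facility X"},
    "B2": {"arrival_date": date(2021, 3, 1), "facility": "Facility X"},
}

LATE_DETECTION_DAY = 7  # assumed threshold, not a figure from the source


def deduplicate(rows):
    """Keep the first record seen for each person."""
    seen = {}
    for row in rows:
        seen.setdefault(row["person_id"], row)
    return list(seen.values())


def classify(case, arrivals):
    """Join a case to arrival data and label it as border or community."""
    arrival = arrivals.get(case["person_id"])
    if arrival is None:
        # No matching arrival: treat as a community case; lateness does not apply.
        return {**case, "type": "community", "late_detection": None}
    day = (case["test_date"] - arrival["arrival_date"]).days
    return {
        **case,
        "type": "border",
        "facility": arrival["facility"],
        "day_detected": day,
        "late_detection": day >= LATE_DETECTION_DAY,
    }


report = [classify(case, arrivals) for case in deduplicate(cases)]
to_investigate = [row["person_id"] for row in report if row["late_detection"]]
```

Here the duplicate entry for A1 is dropped, A1 is a border case found on day 1, B2 is a border case found on day 8 and so ends up in `to_investigate`, and C3 has no arrival record and is labelled a community case.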

Chris Knox: 14:00

So it's like, you know, deal with this many cases today, and we don't need to worry about any of them. Kind of with sort of one level of messaging, or it'd be like, here are 2 cases of concern, and then people would go and check those out. And then information would come back from that, and then we'd be like, yesterday, there were 2 cases of concern, but they're not actually of concern because they were that sort of thing. So

David Keyes: 14:24

Okay. And so you talked about, you know, the first step being some deduplication, and then the piece you were just talking about, trying to figure out if it was community spread or a case from outside of New Zealand, the border cases. Yep. After you did that analysis, what did the reporting look like? Like, how did you do reporting?

David Keyes: 14:47

I mean, I know you said there was a 1 PM

Chris Knox: 14:50

Yes. Right.

David Keyes: 14:51

Thing that went up. What does, yeah, what did that look like?

Chris Knox: 14:55

So our reporting was, ironically... because I was coming from a data journalism background, a lot of my previous work had been in fancy interactives and that sort of thing. But, actually, the primary output of our R code was Word documents. And so there's a lot of R Markdown to generate those. And so, basically, the primary kind of reporting was collections. So at 9 AM, the first piece of reporting was actually a text that was sent to high-level ministers and a relatively small distribution, saying this is how many cases are going to be reported in New Zealand today.

Chris Knox: 15:45

This is where they're from.

David Keyes: 15:47

Sorry. I'm sorry to interrupt you. How did it get from R to sending a text? What was that process?

Chris Knox: 15:53

So that was unfortunate. It was a Signal text, and that was the most manual part of the job that

David Keyes: 16:00

we had

Chris Knox: 16:00

the analyst running had to do it. So we basically run everything, get a summary, and be like, okay. We're gonna send a text.

David Keyes: 16:08

I see. Okay. So that was

Chris Knox: 16:10

Go out.

David Keyes: 16:10

That was manual. Okay.

Chris Knox: 16:12

Yeah. Yeah. I mean, it might have been nice to connect R up to Signal, but we never kind of quite had the right but it's Okay. To do that. But, yes, that was manual.

Chris Knox: 16:21

And then there was a situation report that was sent each day at 11, and that was ultimately a PDF that started off as a Word document. And we would generate a whole bunch of charts and tables that went into that. And a lot of work went into ensuring consistency across all of that reporting, which was tricky because data would change. Status of cases and things changed between 9 AM and 11 AM. And so we cut off the data at 9 AM, so most of it would stay fixed, but some of it would change.

Chris Knox: 17:14

You'd have to take that into account. And then at 1 PM, basically, there was a media statement, which is a pretty standard format, which we'd provide input into, which was a high-level summary of the numbers that was drawn from that 11 AM situation report. And then at that point there's a website that we'd update at 1 PM. And both updated.

David Keyes: 17:45

Am I correct that the 11 AM and the 1 PM summaries were those Word documents that you knitted from R Markdown, or was it some other format?

Chris Knox: 17:55

No. They were. So actually, the 11 AM one had input from other sources as well as just what came out of our analytics pipeline. So the way that we would do it is we'd have a kind of data sitrep, which we would knit.

Chris Knox: 18:16

And then if there were issues or things needed to be changed, we'd knit it again. And then the final compilation of that report was a manual step of actually picking up the latest versions of the charts and tables and copying them into another report. But, basically, we had everything set up so that the output in the automatically produced Word document was essentially identical to what was in the manually produced one. So it was just a really straightforward operation for people to pull it over. But we didn't go as far as getting non-programmers to try and write in a Markdown type environment.

Chris Knox: 18:58

That was sort of a step too far. So there were other things that were produced that were fully Markdown based, on a less frequent basis. But that one in particular, there was always, yeah, narrative commentary that went with it, and there were other people writing that.

David Keyes: 19:19

Okay. And Yeah.

Chris Knox: 19:20

And then the 1 PM was more of a web update. So that was based on... we basically had a set of web table templates, and we'd update those templates using R. So the final output of that daily work pipeline was a bunch of HTML, which we'd then send off to the web team. And they would

David Keyes: 19:47

The copy of the Yeah.

Chris Knox: 19:49

Yeah. Into the Okay.

David Keyes: 19:51

And so all the steps, if I understand correctly, were, you know, bring in data from the database, do that deduplication, figure out the community cases versus the border cases, tell folks if there are cases of concern that they should be looking into. Make a summary to send off to high-level folks at 9 AM, do another deeper summary that went off at 11 AM, that was done with R Markdown, knitted to Word, and then a media summary, both in R Markdown to Word as well as HTML given to the web team to put online. Are those all the steps? Is there anything that we've

Chris Knox: 20:39

been missed? That's based. Yep. Yep. I mean, no.

Chris Knox: 20:41

Those are all the steps. It didn't always look exactly like that, but yeah. Sure. And I guess the other thing is that the border and case-of-concern decision-making steps involved accessing a whole other set of databases. So it's the case database, and then we'd be joining that onto the travel database and the testing database and the managed isolation databases and that sort of thing.

Chris Knox: 21:09

So kinda each of those decisions is a join and then kind of a

David Keyes: 21:14

Gotcha. Yeah. And how I guess, I I should have asked this before, but how how big was the team that was working on this?

Chris Knox: 21:25

It varied a bit over time, but probably averaged about 5 or 6 people.

David Keyes: 21:29

Okay. I mean, it's kind of amazing actually when you think about, you know, a relatively small team like that being able to automate all of this so that you can, you know, produce this much output every day. And I also actually liked what you were saying about how, you know, despite the fact that there was some turnover, R kind of kept things going by having everything in a code-based environment so people could, you know, just pick up other people's code and run it as opposed to, you know, if Larry who does all the analysis in Excel leaves, then, you know, you're kinda screwed at that point because you don't have it written down as to what he does. Yep.

Chris Knox: 22:13

Yeah. I think that's one of the most overlooked things. Obviously, setting up Larry's analysis in Excel is usually faster than writing it up in code. And it often feels like it's harder to onboard people into another environment. But actually, if you've just gotta sit down, run this, look for error messages, almost anyone can do that.

Chris Knox: 22:43

You know? And so we had some people come in with essentially no R experience at all.

David Keyes: 22:47

Oh, really?

Chris Knox: 22:48

And they'd be, yeah, they'd be dropped into sort of a week of shadowing someone doing it. And then Wow. Then they could, you know... and, obviously, they came in because they understood data.

Chris Knox: 23:04

And so they were there for that sense checking. They weren't there for their R skills because they didn't have them.

David Keyes: 23:12

But they were

Chris Knox: 23:12

learning them fast. And, obviously, if something went wrong, then they had to go and talk to someone who knew what was going on. But it at least meant that, you know, relatively junior people could pick up the reporting workflow as long as someone who knew what was going on was around to solve problems when they happened.

David Keyes: 23:30

Yeah. That's interesting. I didn't realize that people were coming in and working on this so quickly after starting working with R. That's interesting. You talked about how, you know, at some point, you moved to GitLab and collaborating that way. How were you collaborating on code prior to that?

Chris Knox: 23:51

The worst way. I mean, so initially, the code was just sitting on the shared drive, which... Okay.

David Keyes: 24:01

That's so true.

Chris Knox: 24:03

And the collaboration was, are you in that file? Right. Right. Okay.

Chris Knox: 24:09

I won't go in that file, type of thing. So, I mean, there's ways to set up your workflow to mitigate that with lots of small files. Right. Right. But that's

David Keyes: 24:23

I mean, that's amazing to me that, you know, you both had people who were relatively new or totally new to R who were onboarded relatively quickly. You weren't using, you know, version control. And, you know, at the same time, you were dealing with this huge, you know, world-changing event. You were able to bring people on to R, change your team's workflow, and make it all work. Do you have any tips or thoughts as to what made that possible?

Chris Knox: 24:57

I don't know. I mean, everyone was highly motivated. I've never worked in a team that worked so well. And I think that sort of sense of urgency that something like responding to COVID gives you just Right. Makes people more willing to just be uncomfortable.

Chris Knox: 25:17

You know? Like, it was tough. Yeah. And, actually, onboarding people into Git was much harder in the end than onboarding people into an R reporting workflow. Everyone we had relatively quickly picked up the value of having an R-based reporting pipeline.

Chris Knox: 25:40

I think that sort of spoke for itself. But the abstractions needed to sort of get into version control were a lot harder for people to pick up if they weren't coming from that sort of software background. And I guess we didn't initially have Git because, like, all of this is also going on within a background. A lot of fantastic work was done before I got there, basically to push the ministry to be quite innovative in this space. You know, the health laptops, just the whole computing system, is phenomenally locked down.

Chris Knox: 26:18

Mhmm. You know, coming from a journalism background where it's pretty open and you can just install whatever open source tool you want. Right. With health data, you know Sure. There's a lot of restrictions on what you can do, and we were working with phenomenal amounts of identifiable information about people.

Chris Knox: 26:43

And so, you know, you've gotta be a lot more locked down. And one of New Zealand's big hospitals had just been shut down by a cyber attack. Oh, wow. And had to completely rebuild their system. So there's a lot of reasons to be very, very careful.

Chris Knox: 27:02

And so just walking in and saying, I need this open source tool. People are like, nah. Yeah. And so the initial reason there was no version control was it wasn't an option.

David Keyes: 27:13

I see.

Chris Knox: 27:13

But there was a software development team that had started using GitLab, and once it had been approved for them, we were able to make the argument that it was really valuable for data analysis.

David Keyes: 27:30

That's interesting.

Chris Knox: 27:31

And, also, the more software-focused people were like, why do analysts need version control? So it took quite a bit of convincing to introduce Git-based reproducible ideas. Sure. And then the other thing that worked for us was that the ministry had set up a big hosted internal RStudio Server instance. So ultimately, everyone was running on this one big instance, and by the time I left, I think we were close to 300 users.

Chris Knox: 28:15

Wow. Was it in the ministry? So, yeah, the Ministry of Health has become a massive user of R, and that's overall. So within the COVID response, there was our team. There were probably 20 or 30 analysts working in R.

Chris Knox: 28:38

And then R was also used heavily by the team that was managing the vaccine rollout and all the vaccine rollout reporting as well, and then there's a whole bunch of others.

David Keyes: 28:51

Okay.

Chris Knox: 28:52

But, yeah, it's definitely become the tool of choice for analysts. Right? For health, yeah, analysts in New Zealand.

David Keyes: 29:02

Why has it why do you think it's become so popular among folks at the Ministry of Health?

Chris Knox: 29:10

I think, I mean, well, I guess, one, the lack of a significant licensing fee means that, you know, the costs of some of the Oracle systems that were being used before, and other things like that, were prohibitive to expanding people's teams. And so, yeah, I think that you can just drop anyone in, and also, I think, just the relatively easy way people could pick it up and start doing simple tasks, and then, I guess, just increasing recognition of the need for reproducible pipelines, and that sort of thing. And I guess R has become much more popular in New Zealand, which is ironic, you know, because it originated here. But it was very statsy. I guess it was popular in, you know, academic statistics Sure.

Chris Knox: 30:25

Environments. It didn't sort of propagate out into the rest of analytics. But now there are some pretty big inroads.

David Keyes: 30:34

Right.

Chris Knox: 30:35

And so, yes, I mean, just the fact that you can get 300 people who are going to sign up.

David Keyes: 30:43

Right.

Chris Knox: 30:44

You know, there's now a good pool of people with those skills, and people are like, you know, I've been doing this type of analytics, but I want to learn R. So, you know, people are wanting to add it. Like, people are seeing it as a skill that increases their employability. And so people are willing to learn it.

Chris Knox: 31:07

Whereas, you know, maybe not that long ago, you'd be like, well, I need you to learn R. And people would be

David Keyes: 31:13

like, oh. Right. Right. Yeah.

Chris Knox: 31:17

So the person who set up the original pipeline prior to my joining, he came from a similar view of things to me. And so I think the key thing is just having this clear vision that we want to have an R-based pipeline that's almost entirely reproducible, and we're just gonna make it work. And apart from the benefits that we've talked about, I think that it's a relatively simple mental model. And so it's something that everyone can be working towards, but also something that people can be picking off small pieces of. I think if you have a classic BI pipeline, it requires, you know, DB admins and all kinds of specialist skills to get it to work.

Chris Knox: 32:18

And it also is quite monolithic. Whereas, we actually had something that I nicknamed the spreadsheet of doom. I'm a very poor user of spreadsheets. You know, for a long time, I only ever had a read-only version of Excel, and I would use it to open things up and have a look so that I could see what the columns were to load in. But, obviously, there's lots of people out there that are much more capable than me of using spreadsheets.

Chris Knox: 32:50

And so we had this spreadsheet that basically served as the error checking for the pipeline. So we'd get in a few numbers from a different source, and they'd go into the spreadsheet, and then the pipeline would run. And if the spreadsheet and the pipeline matched, we're good to go.
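The check that spreadsheet performed, comparing a few independently sourced numbers against the pipeline's output, can be sketched as a small reconciliation function. This is an illustrative Python sketch only; the function name `reconcile` and the metric names and figures are hypothetical, not taken from the team's pipeline.

```python
def reconcile(pipeline_totals, reference_totals):
    """List every reference figure that the pipeline output does not match."""
    problems = []
    for key, expected in reference_totals.items():
        actual = pipeline_totals.get(key)
        if actual != expected:
            problems.append(f"{key}: pipeline={actual}, reference={expected}")
    return problems


# Hypothetical figures: the pipeline's own summary and a few numbers
# obtained from a different source.
pipeline_totals = {"new_cases": 12, "border_cases": 9, "community_cases": 3}
reference_totals = {"new_cases": 12, "border_cases": 9}

problems = reconcile(pipeline_totals, reference_totals)
good_to_go = not problems  # publish only when nothing disagrees
```

Only the figures present in the reference source are compared, so the check can start with a handful of numbers and grow as more independent sources become available.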

David Keyes: 33:10

Gotcha.

Chris Knox: 33:11

And if they didn't, then we were not good to go. Mhmm. But it got very complicated. One of my key triumphs was to eliminate that spreadsheet and build the error checking in another place. But also, I think that's what was great about this kind of pipeline: we could actually be like, this is what we're working towards, but we can't do this bit.

Chris Knox: 33:37

We can't automate this bit now. So let's just do it in a spreadsheet. And then we had a couple of people set that up. And so I think that that that approach is really empowering for people. So that, we were you know, we didn't have analysts sitting around waiting for for people to finish setting up things in databases, you know, that sort of thing.

Chris Knox: 33:58

People could actually just get in and do stuff, and you could see the impact of it. You know, people would start work on Monday, and by sort of Wednesday, they'd see the work that they did in the media. You know, like, it's quite

David Keyes: 34:12

Right. Right.

Chris Knox: 34:13

It's a pace that you don't normally get, you know, under a normal analytics type of situation. You'll arrive and do something for quite a few years, and then it might show up

David Keyes: 34:24

Right.

Chris Knox: 34:25

Somewhere. You know? I'm evangelizing this R pipeline pretty heavily, but I do think, as I reflect on it, that as well as providing those technical advantages, it did actually help as a way of assembling the team. I think that it's an abstraction that people can work with and be empowered by. And then also because you can be like, hey, you're new, but here's a little bit we need to tidy up.

Chris Knox: 34:53

Come and understand how that works. Tidy it up. Do this one piece. And, yeah, then that person becomes the expert on that one piece.

David Keyes: 35:04

Interesting. So you're saying it actually facilitated collaboration in ways that might not have been possible with another tool. Am I understanding you correctly?

Chris Knox: 35:12

Yeah. Yeah. Basically.

Chris Knox: 35:14

Basically. Because, yeah, it helps ensure that you're not too siloed.

David Keyes: 35:20

That's interesting. I haven't thought about R in that way, but that absolutely makes sense. So, well, let me ask you one last question, because I wanna be respectful of your time. If you were advising the government of New Zealand, or indeed any government, I'm curious if you would have any thoughts for them with regard to R, things they should be thinking about to help prepare for the next COVID?

Chris Knox: 35:50

Yeah. I was actually thinking about this yesterday, ironically. It was good luck that we were able to stand up that RStudio Server environment. Other people had been pushing for that, and it kind of aligned with the COVID response. And I think that someone like the government needs a bunch of those systems ready to go, and actually needs a central strategy. So, like, I think what we did was great, but I think it would help if we could have a government-wide strategy around response analytics.

Chris Knox: 36:31

Like, this is what it's gonna look like. This is how it's gonna work. These are the resources in place. Obviously, it's much harder to do something like that at sort of an all-of-government level. But I think at a lower level, it's recognizing that in a crisis like COVID, you don't know what you're gonna need to do.

Chris Knox: 36:50

Essentially, the way to plan for it is to set up systems that are flexible. You know, you could build a fancy software system and a dashboard and all sorts of things and be ready to go, but it wouldn't be right.

David Keyes: 37:05

Sure.

Chris Knox: 37:06

So, basically, having the investment in the people and the tools in place. And then I guess the biggest weakness of our response was the way that we published data. So another thing I think would help for this sort of thing is having in place a much better open data kind of culture and platform and structure. If that had been in place, then we could have basically plugged into that and done it. The way that we published wasn't ideal. It was very table based and wasn't very friendly to machine reading and that sort of thing.

Chris Knox: 37:54

So there's a lot that could be done to improve that, but trying to make those improvements while responding to a crisis is tricky.

David Keyes: 38:04

Yes. I can imagine that being the case. Great. Well, Chris, thank you so much for speaking with me today. This has been really interesting to learn about the role that R played in New Zealand's COVID response.

David Keyes: 38:17

So, yeah, again, thank you very much.

Chris Knox: 38:19

Perfect. Thanks for having me on.

David Keyes: 38:23

Thanks again for listening. I hope you found this conversation interesting. If you have any feedback, I'd love to hear it: david@rfortherestofus.com. Thanks.
