SWIMM UPSTREAM

Jason Gauci is Argo AI’s Director of Engineering and has led machine learning teams at Facebook (now Meta) and Google. Jason also hosts the Programming Throwdown podcast.

Looking back at his experiences leading a research lab at Facebook, Jason reflects on the challenges of disruptive technological changes. Hint: Sometimes, being on the vanguard is indistinguishable from failing.

What is SWIMM UPSTREAM?

The Swimm Upstream Podcast is a collection of conversations about knowledge sharing, code documentation, change management, scaling dev teams and more. Our guests come from all over the tech world, with some really interesting insights, stories, and… coffee hacks. Join Tom Ahi Dror, Co-Founder of Swimm (a Continuous Documentation platform that streamlines onboarding and knowledge sharing within software engineering teams), as he talks with some of the most inspiring engineering and dev team leaders in the industry.

Tom 0:00
Here is part one of our episode with Jason Gauci, Director of Software Engineering at Argo AI. Jason wears many hats and prior to joining Argo, he led teams at Meta and Apple, building core machine learning infrastructure. Jason is also the founder and host of the Programming Throwdown podcast. Find the link to his podcast in the episode description.

Welcome to the show Jason!

Jason 0:08
Hey Tom, thanks for having me.

Tom 0:50
Let’s kick things off with some warm-up questions. What have you been listening to on Spotify?

Jason
I’m originally from Florida and recently moved to Austin. My wife actually got me really into country music, which is kind of a rare thing for programmers. I think most people expect programmers to listen to electronic or trance music. But there’s this hilarious song, "Fancy Like" by Walker Hayes, and it basically goes through a person saying, “Oh, I’m really fancy, so I’m going to take you to Burger King with the Frosties and everything.” It’s a throwback to when my wife and I met 17 years ago in college and we were kind of broke, so it really just throws us back to that era. Also, whenever my son and I are in the car, we listen to Minecraft parody songs.

Tom 1:20
Wait, which Minecraft songs?

Jason 1:22
So there's a whole community of people who just make songs about Minecraft.

Tom 1:28
Wow.

I thought I knew too much about Minecraft because of my son. It appears I didn't. I didn't know about the songs. I know about the books, and videos of people playing constantly, and so on. So it's good to know about the songs. I'm going to talk with my son about that as well.

So, tell us a bit about yourself and where you are right now.

Jason 1:54
I've been to a variety of different research labs at various corporations. I was at Google research, I was working on ML things at Apple, and then the past six years, I was at Facebook research, and I've spent the past three months at a startup. I did the research thing for a long time, building core ML infrastructure for these companies.

Someone from Argo reached out to me and it was a really compelling story. Actually, someone in our neighborhood passed away in a car accident and that really hit me. We send people out at 5:30 in the morning to go to school early. In Texas, it gets really hot, so you have to play sports early. This kid was driving and it was super early and he was really tired, driving on winding roads going 60 miles an hour. This is where we need a self-driving car, you know, we need to have some way to protect people. And so I'm at Argo, spinning up a machine learning infrastructure group here to help out all the different teams across Argo that are doing ML.

Tom 5:35
That sounds amazing.

We're focusing this season on change and we're very interested in situations where change is needed, and change happens, and whether it sticks or not. And we're interested in changes that involve both technology, but also a human element, organizational element, and so on. And, you had a change in mind, that you are a big part of. So tell us about the organization where you were and the role that you were in when this happened?

Jason 6:11
I have a background in reinforcement learning, robotics and control theory, a lot of this and that. That background served me really well at my first job. But then when I went to Google, I did something completely different. I worked on predicting outcomes, so it's not really a control or a dynamics thing. It's just to predict what happened and try to forecast. Facebook gave me the opportunity to get back into that sort of control-theoretic mindset. The challenge was that the entire industry of recommender systems had been running what I would call an open loop. So for example, if you know collaborative filtering, collaborative filtering is this way of saying, you know, people like you bought X, therefore you will buy X. So you can imagine this giant matrix, and all the rows are all the people, and all the columns are all the products, and you're trying to just guess at all those cells in the matrix.
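
To make the matrix picture concrete, here is a minimal sketch of user-based collaborative filtering. It is only an illustration of the general idea, not the system used at any of the companies mentioned. The people, products, and purchase values are made up, and `cosine` and `score_items` are hypothetical helper names.

```python
from math import sqrt

# Hypothetical purchase matrix: rows are people, columns are products.
# 1 = bought, 0 = not bought (an unknown cell we want to guess).
purchases = {
    "ana":  [1, 1, 0, 0],
    "ben":  [1, 1, 1, 0],
    "cara": [0, 0, 1, 1],
}

def cosine(a, b):
    """Cosine similarity between two purchase rows."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_items(user):
    """Score each product the user has not bought by similarity-weighted votes."""
    mine = purchases[user]
    scores = {}
    for item in range(len(mine)):
        if mine[item]:
            continue  # already bought, nothing to guess
        scores[item] = sum(
            cosine(mine, row) * row[item]
            for other, row in purchases.items()
            if other != user
        )
    return scores

scores = score_items("ana")
best = max(scores, key=scores.get)  # product index with the highest guess
```

In this toy data, "ana" and "ben" bought the same first two products, so the product that only "ben" bought gets the highest score for "ana". Nothing in the sketch looks at what happens after the recommendation is shown, which is the open-loop property Jason describes.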

Tom 7:24
Yeah.

Jason 7:25
The thing about it is, it’s totally open loop. So you take these guesses and you hand them off to some system that sends people emails or ranks things on a search page or something like that.

Tom 7:36
Or suggests a song on Spotify?

Jason 7:40
Yeah, exactly. Yeah.

The thing that's missing is the sort of counterfactuals. So you know, if I put this thing first, that's going to make it more popular. And so that's going to make it show up more often, that's going to fill in more of those cells in the matrix for that row. And so it's going to be a self-fulfilling prophecy. And that's one example. There are many other sorts of feedback loops that you have to compensate for, right? When I was at Apple, there was this interesting hack that people were doing. Let's say you work at Facebook, and I know you work at Facebook, and I work at a startup making a video game. So the two of us are talking at a restaurant, and you tell me, "Jason, Facebook's coming out with a new app tomorrow." Well, I know Facebook is super popular, and I know their apps are going to get a lot of downloads just because they're Facebook. So what I'm going to do as the indie game developer is go on Amazon Mechanical Turk and pay a bunch of people to install the new Facebook app, and then install my game. And if I do that just right, then my game is going to become the recommended game for that app. So people who download that app will get a pop-up from Apple saying, oh, you know, people who got this app also got this game. And now a self-fulfilling prophecy kicks in and I don't have to do any advertising. And my game has become really popular. So this is sort of a trick that people are exploiting.
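
The self-fulfilling prophecy can be sketched as a small deterministic simulation. This is a toy model under stated assumptions, not a description of any real ranking system: two items have identical true appeal, the first slot gets a hypothetical `position_boost` in exposure, and the ranker simply puts the item with more logged clicks first.

```python
def simulate(rounds, position_boost=3.0):
    """Return logged clicks for two equally appealing items after `rounds`."""
    # Item 0 starts one click ahead purely by chance.
    clicks = [2.0, 1.0]
    appeal = [0.1, 0.1]  # identical true click rates (assumed values)
    for _ in range(rounds):
        # Open-loop ranker: whichever item has more logged clicks goes first.
        first = 0 if clicks[0] >= clicks[1] else 1
        for item in (0, 1):
            exposure = position_boost if item == first else 1.0
            # Expected clicks this round for 100 visitors (no random sampling).
            clicks[item] += 100 * appeal[item] * exposure
    return clicks

clicks = simulate(rounds=10)
share_of_first = clicks[0] / sum(clicks)
```

Even though both items are equally good, the one that started slightly ahead ends up with roughly three quarters of the logged clicks, because the ranking itself generates the data that justifies the ranking.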

Jason 9:21
We set out to rethink the way that all of the decisions are made at Facebook, which is definitely no small feat and kind of ended up touching all different parts of the company.

Tom 9:36
Was this a crisis? A threat? Was this something that was on the minds of decision-makers?

Jason 9:43
I think folks in leadership were excited about the idea. But on the ground, I think that it was a pretty disruptive change. In the beginning, for example, people were saying, this new system, how does it compare to our existing system on all of these metrics? And all of those metrics are like apples and oranges, so I couldn't even generate the comparison. It's like if you have a system that copies someone else's chess style. And so I've done a really good job of copying Bobby Fischer's style. And then I compare that with alpha chess or AlphaGo Zero on chess, right? It's kind of apples and oranges, right? You could have done a great job copying Bobby Fischer's style and still lose to AlphaZero, right?

So I would come with my own metrics, they would have their metrics, and then the conclusion would be, well, it's a ton of work to switch to your system, so let's just call it a day. And that's kind of how the first year went, really.

Tom 10:50
So you're saying you had buy-in from leadership, but you needed buy-in from people that would need to implement your methodology, right? And that was going hard, so, what was your in?

Jason 11:09
Good question. So in the beginning, I had buy-in from leadership. Leadership quickly got fatigued and there was also some churn in leadership. So the people who had the most buy-in left, and that definitely didn't help. And so there was definite fatigue a year in. And you're right, there wasn't the buy-in at the ground level. So it's a really tough spot, I'll be honest, an extremely difficult situation.

What I ended up doing was actually finding that one person who is going to be our champion, and I found a team. They were doing something that was more on the fringe, they weren't on the critical path. They weren't on the golden line of the company's roadmap. But that person was super passionate about our technology. And that ended up being a critical piece of the puzzle.

Jason 12:11
I remember at the time my new boss saying, well, even if you succeed, it's not going to have a big impact. So it's like, you brought a 20-ton baseball bat and you're planning on hitting this home run. But even if you hit the home run, it doesn't matter.

It's not an important piece of the company. So even if you give some huge improvement, it's not going to do anything material in the short term. So I went against that advice because I felt like we needed that proof of concept, that we had so many lessons to learn. We needed someone who had that intellectual capital and they were willing to spend to learn these lessons. And so while we were learning these lessons, the things on the team from the outside looked like they were getting worse. It's like, well, you know, the potential isn't there anymore and they're still not getting the result.

Jason 13:18
One thing I realized is that being on the vanguard of something, like really on the vanguard of something, is indistinguishable from failing. We ended up finally cracking the code on this, and there are a lot of different technical pieces here. We wrote some research papers later explaining everything technically, like all the things we had to do. But we found the key pieces of the puzzle with that team, and then we went after a team that was more impactful - one that was more aligned with the company goals. And in this agglomerative way, we kept building up teams. And then at some point, the project became big enough that it was more than just me.

Tom 14:16
The ball was already rolling, right?

Jason 14:19
Yeah, exactly. And so then it was really about, you know, let's hand the baton and just help people over the fence. And it became like, really, really exciting. But at that point, the change had already happened.

Tom 14:38
This sounds very much like a startup.

Jason 14:41
Yeah.

Tom 14:41
You need to really try out your concept and you know that it's not going to work out on the first try, right? And it sounds like it didn't work on the 11th try, right? You need a lot of friction and you need grit and passion. And you had that, obviously, and the champion that you mentioned had that too, and it wouldn't have worked otherwise, right? And let me ask you - after it did work, was the implementation as hard as you expected it to be at first?

Jason 15:21
I think the implementation ended up being much more difficult because we went in with a bit of naivete. Like, we went in thinking, it's open loop, and so if we close the loop... Let me just give a quick explanation. Open loop in robotics just means you're not reacting to feedback. So imagine you're controlling even a Lego robot or something, and you say, I want the motor to turn for three seconds. Well, after three seconds, you know, maybe you just bumped into a wall for three seconds, maybe you're three seconds further out, whatever that means. You don't know. Alternatively, you could say, let's do closed loop. So I'm going to go until I see a red dot on the ground. And so you're closing the loop, you're looking for the red dot. And as soon as you see the red dot, you're changing your behavior.
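
The Lego robot example can be sketched in a few lines. This is a minimal illustration on a one-dimensional track, with made-up speeds and positions; `open_loop` and `closed_loop` are hypothetical function names, and the "sensor" is just a position check.

```python
def open_loop(speed, seconds, wall_at):
    """Run the motor for a fixed time and never look at the world."""
    position = 0
    for _ in range(seconds):
        # The robot cannot pass the wall, but it has no idea whether it hit it.
        position = min(position + speed, wall_at)
    return position

def closed_loop(speed, red_dot_at, wall_at, max_steps=100):
    """Drive until the sensor sees the red dot, then stop."""
    position = 0
    for _ in range(max_steps):
        if position >= red_dot_at:  # feedback: check the sensor every step
            break
        position = min(position + speed, wall_at)
    return position

# The red dot is 4 units away and the wall is 10 units away.
blind = open_loop(speed=2, seconds=3, wall_at=10)
guided = closed_loop(speed=2, red_dot_at=4, wall_at=10)
```

With these numbers the open-loop robot drives 6 units and overshoots the dot, while the closed-loop robot stops at 4 because it changes its behavior as soon as the sensor reports the dot.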

Jason 16:14
So we thought, well, if we close the loop around some company objective, and that is actually a whole other tangent around the company objectives and how to set them appropriately in code and all of that, but if we close the loop around one of these objectives, we'll just do better. What we didn't realize is that the number of unknown unknowns in one of these problems is enormous. Think about the Android app store. Think about all the different millions of apps. There's probably 1000 apps created every hour or something.

Tom 16:58
Yeah.

Jason 16:58
And there are billions of people, right? And so it's such a large space. If you just try to operate as a closed-loop robot, it's kind of like that analogy of the robot that tries to path plan out of the burning building, and while it's generating the plan, the building burns down. So a lot of the traditional algorithms don't work and we ended up inventing a lot of new things. So the complexity blew up in that way. But yeah, once we knew what we were missing, that really put our minds at ease, because as we started filling in those pieces, we could see things get better, versus in the beginning, when just nothing really worked and so you're not getting a lot of signals.

Tom 17:44
So this all happened with the first champion, right? Everything that you're describing right now, am I right?

Jason 17:51
Yeah, a lot of it happened with the first champion. But we also cut a lot of corners. Going back to your startup analogy, you might find a small or medium-sized business that would love your product and hasn't spent any time or energy in that space. So you provide a valuable service for them and the baseline is nothing. It's easy to get a win, right? And then you go to Macy's and you say, well, I'll do your machine learning and it's like, oh, they have, you know, a whole department or many departments. And so the bar is way higher, and so you have to do a lot more. And so, I guess that's another reason to really find the right person.

Tom 18:34
Thanks everyone for tuning in. That's all the time we have for today. To read episode transcripts, check out our past season, suggest an episode, or join our growing community of developers, head to swimm.io. That's Swimm with two M's, dot IO.

That was part one of Jason's episode. Stay with us for part two.