Practical AI

Autonomous driving is not just a big tech or closed-source game; it's becoming accessible through open innovation and real-world deployment. Daniel and Chris sit down with Harald Schäfer, CTO at Comma AI, to explore how OpenPilot is bringing self-driving to everyday vehicles using open source AI. We dive into the intersection of machine learning, robotics, and simulation, including how world models are enabling training at scale and shaping the future of autonomy.


Creators and Guests

Host
Chris Benson
Cohost @ Practical AI Podcast • AI / Autonomy Research Engineer @ Lockheed Martin
Host
Daniel Whitenack
CEO @Prediction Guard & cohost @Practical AI podcast
Guest
Harald Schaefer

What is Practical AI?

Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).

The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!

Narrator:

Welcome to the Practical AI Podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.

Narrator:

Now onto the show.

Daniel:

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I'm CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI and autonomy research engineer. How are you doing, Chris?

Chris:

Doing very well today.

Daniel:

How's it going? It's going great. I was commenting to our guest today, just before we started recording, that earlier this year I was in the car with one of our engineers, shout out to Ed, and he's like, hey, have you heard about this cool thing? We're driving in the car.

Daniel:

Right? And he's like, there's this cool thing you can put in your car and make it an AI-assisted driving car without it being a specific self driving car. And so he forwarded me the information about Comma, and I'm really excited today to welcome Harald Schäfer, who is CTO at Comma AI. Welcome, Harald.

Harald:

Thank you. Thank you for having me. Excited to be here.

Daniel:

Yeah. Yeah. Well, obviously, I kind of alluded to some of what you're involved with, but maybe you could give us just a little bit of background about yourself and Comma, and how you ended up in this spot of working on the things that you're working on.

Harald:

Sure. So Comma makes this device, like you said, that you can install in cars and that gives them autonomy features they didn't have before. You know, things like auto steer and better ACC. So on the highway, you kind of get some level of autonomy. And the software that runs that is OpenPilot, and that's a completely open source autonomy stack for cars.

Harald:

By far the most popular open source self driving stack online. And I think it's currently even the most popular robotics project on GitHub. So that's kind of where we're at. I've been working on this a really long time. I've been at Comma for nine years now.

Harald:

So it's basically been my entire professional life. I don't think there's much to say about my professional life that's not related to Comma. I came to the US about ten years ago, graduated, and started working here. And I've been working on OpenPilot and this type of stuff ever since.

Daniel:

So when you started that journey, what was the state of self driving autonomy? And, on both the commercial, closed source side and the open source side, what did that look like then compared to now, if you look back on that journey?

Harald:

So when I joined, we didn't have a product. It was just a project, and you could kind of install the software if you went through the hassle of installing a beefy laptop in your car and wiring up all the power, or you could retrofit a phone, but you'd have to do all that yourself. So that was the state the project was in when I joined. The company was pretty young at that time. George, the founder, had worked on it for a little over a year, maybe two years, and that's the state it was in.

Harald:

It was a usable ADAS system, but the product side was really not in an integrated state. So that's where we were. In terms of open source, I think there is no genuine open source ADAS product that's useful in any way, other than us. That was true then. That's true now.

Harald:

But the commercial side has obviously changed massively in that time. In 2017, when I joined, highway autonomy was bad, maybe usable in some cases, but most people probably wouldn't use it because it just makes too many mistakes, it's too uncomfortable. That's obviously not true at all anymore. You know, OpenPilot is really good. Over 50% of the miles driven by people that have our system are driven by the system and not the human.

Harald:

Obviously, there's Tesla FSD, which is at an even higher level in terms of what kinds of things it can do, and you can get over 90% engagement there if you really wanted to. And you've got Waymo, which is like a supervised robotaxi. That's an actual product. You know, it's unclear if they're a money making product, but it is a real product that people can use. And, obviously, none of this stuff existed in 2016.

Harald:

As for how I got into it, I think it's worth mentioning, because this is kind of the journey that I just talked about. I remember, I don't know, I must have been really quite young at the time, but I saw the Chris Urmson TED Talk about Waymo, where he had a closing statement that his kids were growing up and he was hoping they would never need a driver's license. And I think they must have been less than 10 at the time, and, you know, they have a driver's license now. So I think that's the timeline we're talking about, from that prediction about Waymo being able to displace the need for people to have a driver's license. That obviously didn't manifest, but we did make a lot of progress, and there is some cool stuff on the market now, even if it doesn't mean that all driving is autonomous.

Chris:

I'm curious. As you were getting into this, what was it about this particular problem that attracted you? What was it that caught your imagination enough to grab you for such a long period of time, and to still hold you today? What was it versus all the other things out there that you might have dived into?

Harald:

I mean, something that George was saying a lot of the time, and that really resonated with me and is still true, is that self driving was the most interesting applied robotics problem, period. It was a place where you could make products that were essentially immediately useful. You know, we don't have 100% reliable autonomy. You need supervision for it to be useful, but people buy our products because it adds value to their lives. This is actually quite unique in robotics.

Harald:

There are not many use cases where this is the case. Some kinds of hyper specific industrial robots are obviously useful. Robot vacuum cleaners. I'm a big fan of robot vacuum cleaners. There's robot lawnmowers.

Harald:

And that's roughly it. There are not that many places where you can do applied AI, applied robotics in the real world. And that's why I thought self driving was so cool. And then the other thing is I just really like the open source nature that we have with OpenPilot and that we're trying to promote. I think an open source future is just generally better for everyone.

Daniel:

Well, yeah, I'm particularly interested in this conversation because a lot of times on the show we have really interesting people on, and great discussions, but sometimes they're constrained in terms of what level of detail they can go into about architecture and things like that, naturally, because they're working on certain proprietary things. With so much of Comma and OpenPilot in the open source world, I'm interested, maybe from a high level perspective, for those out there listening who might not have an idea of what sort of architecture and main components are part of a self driving or autonomous system. OpenPilot is maybe a piece of that and fulfills a role. You have the devices.

Daniel:

Could you just give us a mental model for how to think about how these pieces fit together? What's where, and what components are needed to make the system work, I guess?

Harald:

So we're talking about ours in particular. Right? Or just in general? Yeah. Sure.

Harald:

Yeah. So for ours in particular, we have a device that you can install in the car. The device has compute, and it has some cameras, and then some other sensors like GPS and IMU that are not strictly needed, but they're in there. That device then runs some machine learning models that look at the road and tell you roughly where to drive, and concretely that is a longitudinal acceleration and a curvature of the road. So that's like an angle of the steering wheel.

Harald:

I'll talk about how that works internally later, but at runtime, this is really all that happens: there is just a machine learning model that takes in the video input and outputs those actions of acceleration and curvature. And then that goes to some API that we develop that interfaces with all these different car models. So we reverse engineer the CAN bus of the car so we can understand what messages we need to send to command steering, gas, and brake, and do all the auxiliary stuff like the engagement state, all those kinds of things that are needed to do autonomy in a car. So that's basically what's happening at runtime. We just have a device, it runs some models, it outputs the actions to take, and then there's a car API layer, reverse engineered for every type of car we support, that sends messages on the CAN bus to take those actions.
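To make that runtime flow concrete, here is a minimal sketch of the loop Harald describes: camera frames go into an end to end model, the model outputs a longitudinal acceleration and a curvature, and a per car interface layer turns those into CAN messages. The class names, message IDs, and scaling below are illustrative assumptions, not openpilot's actual API.

```python
# Hypothetical sketch of the runtime loop described above: camera frames in,
# acceleration and curvature out, translated into car-specific CAN messages.
# Class names, message IDs, and scaling are illustrative, not openpilot's API.
from dataclasses import dataclass

import numpy as np


@dataclass
class Actions:
    acceleration: float  # m/s^2, longitudinal
    curvature: float     # 1/m, signed; maps to a steering angle per car


class PolicyModel:
    """Stand-in for the end to end driving model (video in, actions out)."""

    def run(self, frame: np.ndarray) -> Actions:
        # A real model consumes a history of frames; this placeholder just
        # returns "hold speed, drive straight".
        return Actions(acceleration=0.0, curvature=0.0)


class CarInterface:
    """Stand-in for the per-car layer built from reverse engineered CAN messages."""

    STEER_MSG_ID = 0x123  # placeholder arbitration IDs, not real values
    ACCEL_MSG_ID = 0x124

    def apply(self, actions: Actions, engaged: bool) -> list[tuple[int, bytes]]:
        if not engaged:
            return []
        # Encode the requested curvature and acceleration into CAN payloads.
        steer_cmd = int(actions.curvature * 1000) & 0xFFFF
        accel_cmd = int(actions.acceleration * 100) & 0xFFFF
        return [
            (self.STEER_MSG_ID, steer_cmd.to_bytes(2, "big")),
            (self.ACCEL_MSG_ID, accel_cmd.to_bytes(2, "big")),
        ]


def runtime_step(frame: np.ndarray, model: PolicyModel, car: CarInterface,
                 engaged: bool) -> list[tuple[int, bytes]]:
    """One control tick: frame -> actions -> CAN frames to send to the car."""
    return car.apply(model.run(frame), engaged)
```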

Harald:

Then there are the models themselves that we train. We've been talking about end to end training for a long time. They're end to end in the sense that they take in video and they output actions. There's no intermediate space of, you know, cones detected here, traffic lights detected there. All of that doesn't exist. And at a very high level, our training stack looks like this: we have hundreds of millions of miles of humans driving.

Harald:

We sample some of that. We teach the model: okay, if a human's in this situation, this is the most likely trajectory they're going to take. If you train on that directly, you get a system that doesn't really work. This is a well known machine learning problem that's not necessarily well understood, but you can't just do imitation learning and expect things to work in the real world. You need to expose the model during training to mistakes and show it how to recover from those mistakes.

Harald:

And we do that by training in a simulator where we can introduce, say, drifting off the center of the lane, and then we can supervise: okay, this is how you would recover from this. That's, at a very high level, how things are trained and how things work. I don't know if there's anything I missed there or any questions we can discuss.
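A rough sketch of the two part training recipe described here, under the assumption of a simple behavior cloning setup in PyTorch: an imitation loss on recorded human trajectories, plus a recovery loss on states that have been perturbed in a simulator and labeled with a corrective trajectory. Shapes, names, and loss weighting are assumptions for illustration, not Comma's actual code.

```python
# Illustrative sketch of the training recipe above: imitation on recorded
# human driving, plus simulator rollouts where the agent is nudged off course
# and supervised on a recovery trajectory. Shapes and weights are assumptions.
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Placeholder policy: image features in, (acceleration, curvature) out."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)


def imitation_loss(policy: TinyPolicy, feats: torch.Tensor,
                   human_actions: torch.Tensor) -> torch.Tensor:
    """Plain behavior cloning: match the actions the human actually took."""
    return nn.functional.mse_loss(policy(feats), human_actions)


def recovery_loss(policy: TinyPolicy, perturbed_feats: torch.Tensor,
                  recovery_actions: torch.Tensor) -> torch.Tensor:
    """Supervision on perturbed states: from an off-center state, match the
    trajectory the simulator labels as the way back to the lane center."""
    return nn.functional.mse_loss(policy(perturbed_feats), recovery_actions)


def training_step(policy, optimizer, batch, recovery_weight: float = 1.0) -> float:
    # batch is assumed to hold human data plus simulator-perturbed data
    loss = imitation_loss(policy, batch["feats"], batch["human_actions"]) \
        + recovery_weight * recovery_loss(policy, batch["perturbed_feats"],
                                          batch["recovery_actions"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```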

Daniel:

No. I was just wondering, on that note: the OpenPilot project, is that the software stack that interfaces with the models and executes the policy and interacts with the car API, or does it fulfill a different role? Because it is more broadly a project related to autonomy and robotics generally, or am I misunderstanding?

Harald:

Yes. So, I mean, the goal of OpenPilot is to be general robotics, but we're clearly focused on driving right now. But a lot of the things written in OpenPilot are things like: okay, well, there's a UI, and then there's a localizer, which is written in classical code, that takes in all these input sensors and makes the best estimate of the current motion of the device. You know, that's robotics for driving. Then, like you said, there's this whole layer of interfacing with the car.

Harald:

So that's also part of OpenPilot: how do you communicate with the car and manage the state machines. And then in general, it's also an operating system, right? There are many different threads and processes running, and those intercommunicate and need to get managed. So that's OpenPilot. But a lot of the decision making about what happens and how to control the car essentially happens inside the neural network.

Chris:

That's pretty cool. I'm curious about the space you've chosen in terms of addressing the need. Below you, there are traditional vehicles without any autonomy capability at all, and then kind of above you there are the fully integrated vehicles, things like Teslas and their competitors, where the hardware and software are all working together for the whole vehicle, and that gives you another level of capability. How do you target the level of capability that Comma is addressing? When you're building what is essentially an add on to an existing vehicle that doesn't have any autonomy capability, how do you think about where you can add value by bringing autonomy capability, and also, what's the constraint? What's too much without having a fully integrated platform from the get go, manufactured out of a factory?

Harald:

So, just to be clear, our mission is to solve self driving cars while shipping intermediaries, and now we're kind of evolving that to solving robotics. So that is how we see everything. We're trying to build solutions that genuinely do solve the AI and the applied robotics problem in a real and genuine way. That's the starting point of how we think about it. We don't really start from the product or the user side.

Harald:

But given that starting point, we then think: okay, if we want to make progress on the robotics problem, on self driving cars, what intermediaries make sense? We don't want to be doing research in a lab and saying, oh, we're never going to release anything until it's done. We want to make progress and, in the meantime, be able to ship useful features. So we really focus on how to create some kind of end to end solution that's scalable and can become the most intelligent driving agent, one that can drive better than a human.

Harald:

But then we just kind of see: okay, given this system, can we apply it in a way that's useful to people? And then we figure out which cars it works well enough on that they can be supported. Because you were talking about constraints: there are some cars that have certain constraints that make them unusable for OpenPilot, and then that's not as interesting. Does that kind of answer your question?

Harald:

I'm not sure.

Chris:

I think so. Yeah. Definitely. And I'm glad you expanded on that. I may have had too narrow a vision in my understanding of what you were addressing there.

Chris:

So I like the way that you are kind of iteratively solving that problem in the large.

Harald:

Yeah, so I mean, the distinction is that we're trying to make incremental steps towards some late stage solution, steps that also have incremental usefulness to people. I think that's where we differ from a lot of other people. A lot of other people take an everything or nothing kind of approach, and I think that's just generally a bad approach. We want to make money along the way. We want to pay our own bills.

Harald:

We run our own data center. And, you know, those limitations obviously mean that we run with 100x less compute in our data center than Waymo or Tesla, and that obviously has some side effects. But I think in the long run this isn't really a big deal, as long as we're making progress towards some solution and shipping useful products, and I think the long term future looks bright.

Daniel:

You've mentioned a couple of times this idea of end to end, which you said has been part of what you've been talking about for some time. Some listeners might not totally get the implications of that. Could you maybe talk through what an end to end model is in your case, and how it might compare or contrast with other approaches that have been used in autonomy that wouldn't be considered end to end solutions?

Harald:

Sure. Yeah, end to end is a matter of interpretation. When I say end to end, what I mean is we have examples of humans driving competently, just by recording them. That data contains the information about how to drive, and we want to let a machine learning model distill that information and learn how to drive. So, directly taking in just raw sensor data and outputting a policy that looks like that, or some really good subset of that.

Harald:

In contrast, something that would not be end to end at all, for example, is some kind of semantic segmentation network that detects where all the lane lines are, detects where all the traffic lights are, and then builds this huge grid. And then you write some algorithm that says: okay, don't touch lane lines, don't hit kids, and don't crash. And then that goes through some kind of optimization, again hand tuned by someone, that produces some trajectory. We've been focused on the end to end approach for a long time. End to end has generally made really fast progress over the last several years, so basically everyone's interested in end to end to some extent.

Harald:

What people have actually shipped is more of a mixture now. Waymo is, I think, slightly less end to end. They're doing end to end research, but they still have a lot of classical detection. I think that's relevant. Tesla, I think, has a little bit on the back end, but they're also trying to shift fully towards end to end.

Harald:

But, yeah, I think that's kind of the state of things.

Chris:

As you look at the trajectory that you're currently on, both in terms of where you've come from, where you're at now, and the near term future, with the recognition that the vision is to solve autonomous driving in the large, to paraphrase you: where do you see yourself now on that path, and why are you where you're at now on that path? How are you envisioning your approach to a solution compared to the Waymos and the Teslas and the others of the world, given that they have different approaches there?

Harald:

Yeah. I mean, a lot of the difference is just constraints, right? The reason we're so passionate about this end to end thing is because it requires less human effort to get to the same levels of capability. Waymo and Tesla, for a long time, were able to get capability by doing things like having humans hand label things in scenes, by piping back data that showed a lot of uncertainty and then having humans label it.

Harald:

They had this whole data engine thing. They've used many different approaches to patch the holes in what is, I think, everyone's vision of this end to end thing. Whereas we're constrained: we want to be profitable, and we don't have that sort of money. So we've been focusing on this idea from the beginning, because we think this is the right long term strategy. I think most people see it as the right long term strategy, but because we're so focused on it, in just strictly the strategy sense, we're slightly more sophisticated.

Harald:

But, you know, in the capability sense, I think not quite. One big innovation that we have, and I think we're the first people to do this at this point, is that our models are trained in simulation, but not just any simulation: they're trained in a machine learning simulation. You know, like those video generation models you see, like Sora, all those types of things. We train our models now in a diffusion simulator; the videos are all generated. Again, Waymo and Tesla are exploring these things, but they haven't been as focused on it. I think they've not quite shipped this yet, and they also have a higher bar to meet before they can ship it. But for us, this is the most efficient way to make progress, so I think we're slightly closer to this kind of end vision of the strategy.

Daniel:

Could you dig into that a little bit? Because I think that is a really interesting point, and some people have maybe heard mention of a world model on the hype side of things, or other things like that. You talk about your world model in the most recent release blog post, and this fact that the agent you've released is maybe the first, to your knowledge, that is fully trained in this learned simulation. Could you pick that apart a little bit, maybe more on the practical side, for our listeners? In the sense of: what do you mean by a world model?

Daniel:

What are the challenges associated with actually creating that world model? That might be a first place to start, because even before training the policy model, the model that you end up wanting to use, if you're going to train it in that simulation, you have to have a really good simulation.

Harald:

Yeah. So, I mean, I talked about this in the beginning: it's just an assumption that we have to make, that it's impossible to just do imitation learning. You have to do some tricks on top of that to get something that can recover from mistakes and not drift out of the lane. And everyone has different strategies for how to deal with that.

Harald:

Like I said, we think the long term strategy is that you just need a really good simulator. So our solution has always involved training things in simulation. It's just that our previous simulators were classical: they would estimate depth and then reproject it. That has artifacts. You make some assumptions that don't hold.

Harald:

So you want a really good simulator that can capture the world completely. And obviously, if we think about end to end again, the end to end way to do that is just to tell a machine learning model: okay, try to simulate the world. The difficulty here is, first of all, you need to make video that looks somewhat realistic, because otherwise you have all these artifacts that can be exploited. And, I mean, it's only in the last couple of years that we've gotten anywhere close to that. And then the other big challenge, which is where we differ from most video generation models, is that if you want to make a robotics simulator with this approach, you need it to be accurate in terms of responding to inputs.

Harald:

If you tell your simulator, okay, the car turns left 10 degrees, the simulator actually has to produce video that reflects that 10 degree left turn. It's not enough for the video to just look realistic if it doesn't respond accurately to the inputs. So I'd say those are the challenges in making a simulator: it has to look photorealistic, it obviously has to be as diverse as the real world, and it has to respond accurately to inputs. Those are the challenges we were working on.

Harald:

Video photorealism is kind of solved for us, right? A lot of people are working on this. And then I think where our stuff deviates from all the published papers is in trying to make it respond accurately.
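Conceptually, "responding accurately to inputs" means the learned simulator is conditioned on the commanded action as well as on past frames, and it is stepped in closed loop with the policy. A hedged sketch of that interface, with placeholder classes rather than Comma's real diffusion model:

```python
# Conceptual sketch of an action-conditioned world model used as a simulator:
# each step is conditioned on recent frames *and* the commanded action, so a
# "turn left" input must produce video that actually turns left. Classes and
# shapes are illustrative only.
import numpy as np


class ActionConditionedWorldModel:
    """Stand-in for a learned (e.g. diffusion) video model."""

    def step(self, frame_history: np.ndarray, action: np.ndarray) -> np.ndarray:
        """frame_history: (T, H, W, 3) recent frames; action: (2,) accel + curvature.
        Returns the next predicted frame, which should reflect the action."""
        # Placeholder: a real model would run a conditioned generation step here.
        return frame_history[-1]


def closed_loop_rollout(world_model, policy, seed_frames: np.ndarray, horizon: int):
    """Instantiate the simulator on real video, then let the policy drive inside it."""
    frames = list(seed_frames)
    actions = []
    for _ in range(horizon):
        history = np.stack(frames[-8:])                   # condition on recent frames
        action = policy(history)                          # policy picks accel + curvature
        next_frame = world_model.step(history, action)    # simulator must honor the action
        frames.append(next_frame)
        actions.append(action)
    return np.stack(frames), np.stack(actions)
```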

Chris:

I'm curious. As you implement the notion of a world model, how are you approaching it? Is it more of a training mechanism, or is it something where you're doing real time planning en route? How does it fit into the overall architecture, and how are you guys using it?

Harald:

The world model? Yeah. So the world model, in our case, acts as a simulator. So you can just instantiate it on some real video.

Harald:

And then basically you take control in simulation and you give the world model actions like turn left 10 degrees, and then the world model will produce the next image. So that's one aspect of how it's used. Additionally, the world model also supervises the recoveries during training. The world model that's producing these images also sees the future. There is some real future attached to this scenario that we're making.

Harald:

And so we can then introduce deviations, and the world model, which sees the future, can also say: here is a likely trajectory to get from this deviated state that we're in now to that future. Sorry, I'm trying to explain this in a way that's not extremely confusing, but it is just extremely confusing. So

Chris:

that is

Harald:

how our training stack works: the world model simulator produces the images and supervises the recovery trajectories, which are eventually what gets into the car.

Daniel:

And on that point of the trajectory to what gets in the car, one of the things that might be helpful to highlight: like you mentioned, we've talked about different models, the world model, the model that's on the car, the kind of harness around that, if you will. You also mentioned having a data center. Am I correct that that data center and kind of centralized infrastructure is mainly geared towards that training and simulation? And then a lot of the real time inference for individual cars and decision making, does that happen on device?

Harald:

Yeah. Exactly. So everything at runtime is strictly on the device. The device doesn't need any connectivity other than for updates or to send data back so we can use it to train.

Harald:

And the data center, we don't run any user facing services there. It's just the training data center that runs all these different training jobs. And yeah, just to be clear, there are two main models involved: the world model that acts as the simulator we train inside of, and then the agent policy, which is in comparison a really tiny model. That's the one that trains inside the simulation and gets shipped to the devices.

Daniel:

Yeah, that's interesting. I guess, in addition to some of these things, like your ability to create more realistic imagery for the simulations, hardware, or the capability of running models in edge environments, has obviously advanced over the years since you started this. Could you give us a little bit of a picture of that world and what's different now versus when you started, both in terms of what you need to do to run this sort of model in that kind of edge environment, and maybe the tooling or the ease of doing that now versus years ago?

Harald:

Sure. Actually, on our product end specifically, there's not been as much progress as in other places. We still run a relatively old chip, because our device sits on a windshield and there's quite a limitation on how much heat we can generate. So we haven't really followed the trend in how much these other systems have increased their compute power. That's something we're now trying to solve: we're going to sell an external GPU that you can plug in, that can be under your seat or something.

Harald:

That way we can kind of match the compute that things like FSD and Waymo use. Right now we use probably less than a hundredth of what an FSD computer has. There's definitely been an increase in efficiency; those cars have better chips than they used to. But a big change has just been that they put in more power hungry chips than they used to. I think there's been more of a recognition that it's worth spending what's there.

Chris:

I'm curious, as you're looking at that as a possibility, in terms of upgrading the hardware, the processing capability on board: what is an immediate term type of capability that you would add to your stack? Compared to what I'd see looking at the Comma website and the demos listed there, if you could have a quick hit by having extra compute, what are the things that you guys are talking about, at least in the open, that you can share with the public?

Harald:

I mean, we just started working on this, but our tests currently indicate that it recognizes green lights in nuanced situations twice as often if we have a 10x bigger model. And an external compute unit, a GPU like this, would give us a 100x bigger model in the limit. So that's roughly the type of improvement I think you can expect. And that's also roughly the scaling that exists: we're at a hundredth of the compute of an FSD computer. An FSD computer is more capable, but on the highway you're not even really going to notice the difference.

Harald:

You really need an exponential increase in compute to get these marginally noticeable gains. But I think, you know, at 100x with the external GPU, that's roughly the kind of thing we're talking about: it's twice as reliable at detecting lights in weird, nuanced situations, or maybe more if we optimize it a bit.

Daniel:

And could you talk us through, I guess, more of the user experience side within the car? So if I'm installing a Comma four, what does that experience look like for me? Typically, if I imagine I'm driving a car with lane assist, I know it's on maybe via an icon, and I feel the steering wheel move; in a Tesla with Autopilot, etcetera, there's a different kind of experience.

Daniel:

Right? What does the user experience look like from this standpoint, in terms of putting the device in and what happens as I utilize the system?

Harald:

Yeah. So, to install it, in most cars there's a CAN bus connection right by the rearview mirror. So you take the trim cover off, you plug in our thing, and you stick it to the windshield, and that's pretty much it. I mean, that is it. So then it's connected to your car.

Harald:

And then, as for engaging, we just use the CAN signals from the engage button of the car's cruise control. So you would essentially engage cruise control. You get feedback from our device. There's a UI, there are sounds that you get, but that's how you interface with it.
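As an illustration of what "using the CAN signals of the cruise control engage button" might look like in practice, here is a small sketch using the python-can library. The arbitration ID and bit position are placeholders; the real values differ per car and come from reverse engineering.

```python
# Sketch of watching for the stock cruise-control "set/engage" button on the
# CAN bus with python-can. The message ID and bit below are placeholders.
import can

ENGAGE_MSG_ID = 0x200   # placeholder arbitration ID for the cruise button frame
ENGAGE_BIT = 0x02       # placeholder bit meaning "SET pressed"


def wait_for_engage(channel: str = "can0") -> None:
    with can.interface.Bus(channel=channel, interface="socketcan") as bus:
        for msg in bus:  # iterate over received CAN frames
            if msg.arbitration_id == ENGAGE_MSG_ID and (msg.data[0] & ENGAGE_BIT):
                print("Cruise SET pressed: hand control to the driving model here")
                return
```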

Harald:

And then, like I said, we are extremely reliable on the highway. Over 50% of our users' miles on the highway are driven by the system. And the thing we're really focused on now is getting really smooth red light behavior, making city driving really smooth. We've had that in a beta mode for a couple of years, but we're trying to get it to the point where it's so reliable and so comfortable that you prefer it to your own driving, which is the state it's in on the highway.

Chris:

I'm curious. As you are kind of giving superpowers to your existing car in terms of what it can do, and you talked at the beginning of the conversation about the larger problem of solving for autonomy and self driving: what are some other use cases that you think this would apply to fairly easily, without a big jump or a big development effort? Something where you could say, okay, this works here, I can put it over here and over here too. Do you have any thoughts in mind, and does the company have any intention of looking at alternative use cases as kind of opening up new lines of business, or anything?

Harald:

Yeah. I mean, we're open to this. And I'll give you a very concrete example of something that we worked on. We use this world model simulator so that you can simulate scenes. If you had a simulator where you had to manually put in traffic lights and assets and, say, the timing of the traffic light, that obviously wouldn't expand to anything except a road.

Harald:

We're also interested in indoor robotics. And I think the first challenge in indoor robotics that's basically not been solved in some kind of machine learning way is indoor navigation. The most basic thing you would want your indoor robot to do is just drive around and figure out where it can go in the house, what the map of the house is. And, you know, your Roomba, or I mean Roborock, whatever is the best now, can do this, but not in some kind of end to end machine learning way. You would have to hand code a LiDAR SLAM algorithm or a vision SLAM, and probably still have to hand tune some stuff that's specific to a house.

Harald:

So as a short project, we translated all of our stuff to indoors and just had a robot drive around indoors and try to path plan. And that kind of worked. I'd say the state of machine learning is not good enough yet to make that reliable, but that's what we're imagining: as we make these things more generic, they will transfer more easily to, for example, indoor navigation. The next thing, of course, is action. In this case, I'm just talking about driving, but at some point you want your robot to do something other than just drive.

Harald:

And then you're talking about things like, let's say, moving an arm, or grasping something, or moving your head. This would require some really high level machine learning approach that treats the driving actions of curvature and longitudinal acceleration the exact same way it would treat the movement of an arm. So that's the direction we're thinking in. Machine learning is not at all in a state in robotics where this is close. But I think that's our long term view: how do we create a machine learning system that can understand that moving an arm is in principle a similar thing to moving a steering wheel, and learn about it in the same way?

Harald:

Right now that's far away.

Daniel:

Outside of maybe larger simulations or more training of the same types of models or the same architecture, what do you view as the kind of outside the box things, the unsolved problems, when you're thinking about the solution space you're working in to get to some of the larger vision you're talking about? What are some of the main research areas that are still unsolved and most relevant to this type of problem?

Harald:

Yeah. I'd say there are three things: controls, RL, and continual learning. I think those three things are necessary for this kind of end vision of robotics, and they currently don't work at all. None of those three things work at all.

Daniel:

And could you break down a little bit what you mean by each of those things? Yeah.

Harald:

So controls is one that we particularly deal with, much more than other companies, because we don't have any control over how the car responds to requests for steering, gas, and brake. And in general, the cars that we support respond very poorly. You will ask it to put torque on the steering wheel, and it will do it delayed. It will not do it the way you ask. It will do some weird internal logic that we sometimes don't fully understand.

Harald:

So we deal with very crappy controls, essentially, and we solve those problems with classical control solutions. Machine learning: we've tried this so many times. We have open challenges about this, and as far as I understand, no one in the research community has made significant progress on this either. When it comes to low level controls, machine learning just has no good solutions. I think you probably need something that looks like RL to solve that.

Harald:

To give you an example of what we do: we learn the tire stiffness of all the cars it runs on, and it has to learn it live on every car, as well as the friction coefficient of the tires. Without that, we can't get good control. And that stuff is all classical optimization. No machine learning there. So that's the controls thing.
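As a sketch of the kind of live classical estimation described here, the snippet below fits a single "effective tire stiffness" parameter online with recursive least squares from noisy steering and lateral response data. The one parameter model and the signal names are deliberate simplifications, not Comma's actual identification code.

```python
# Recursive least squares fit of a scalar "effective tire stiffness" that maps
# a steering-derived regressor to measured lateral acceleration. A deliberate
# simplification of the live parameter identification described above.
import numpy as np


class RecursiveLeastSquares:
    """Single-parameter RLS with exponential forgetting."""

    def __init__(self, theta0: float = 1.0, p0: float = 100.0, forgetting: float = 0.999):
        self.theta = theta0      # current estimate
        self.p = p0              # estimate covariance
        self.lam = forgetting    # <1 lets the estimate track slow changes (wear, load)

    def update(self, x: float, y: float) -> float:
        """x: regressor (e.g. steering term), y: measurement (lateral accel)."""
        k = self.p * x / (self.lam + x * self.p * x)
        self.theta += k * (y - x * self.theta)
        self.p = (self.p - k * x * self.p) / self.lam
        return self.theta


# Toy usage: recover a "true" stiffness of 2.5 from noisy drive data.
rng = np.random.default_rng(0)
est = RecursiveLeastSquares()
for _ in range(2000):
    steer_term = rng.uniform(-0.3, 0.3)                  # steering-derived regressor
    lat_accel = 2.5 * steer_term + rng.normal(0, 0.05)   # measured response + noise
    est.update(steer_term, lat_accel)
print(f"estimated stiffness: {est.theta:.2f}")
```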

Harald:

I don't know. Any questions about that, or anything more to elaborate on? Does that make sense?

Chris:

It does. I'm kind of curious. I actually want to go back for a moment to something we were talking about early in the conversation that's been on my mind to ask, and there wasn't a good moment before now, so I'm just going to dive back. When you guys first decided to write your own software in the form of OpenPilot, I'm curious, as a two part question: why did you elect to do that from scratch at the time? What was the motivation versus using something like ROS, the Robot Operating System, or, obviously, ROS 2, which is replacing it now, or some other alternative out there?

Chris:

And then as a tag on, if you will: what drove the decision to open source it versus keeping it proprietary up front, in those early days? Because I see that you guys take a lot of pride in being an open source company, and we certainly are very supportive of that. We like that. But I'm curious what your motivations were for writing it and making it open source.

Harald:

So the "why not use ROS" decision predates me, so I can't really say exactly what the thought process was.

Chris:

Totally fair.

Harald:

But in general, OpenPilot is extremely efficient. When it comes to inter process communication, I think OpenPilot does this better than anyone else, including ROS. I don't know exactly what the details are. I think there's some stuff about zero copy messaging and things like that that just makes it more efficient.

Harald:

I mean, we run on a not that new phone chip, right? So these things do matter. Yeah. I can't say much more about ROS than that.
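openpilot's actual messaging layer isn't shown here, but the zero copy idea can be illustrated with the Python standard library: a producer writes camera frames into a shared memory block, and a consumer maps the same bytes without serializing or copying them. The frame size and segment name are illustrative assumptions.

```python
# Illustration of zero-copy IPC: instead of serializing frames and copying
# them through a socket, a producer writes into shared memory and a consumer
# maps the same bytes. Standard library only; not openpilot's implementation.
from multiprocessing import shared_memory

import numpy as np

FRAME_SHAPE = (874, 1164, 3)  # illustrative camera resolution
FRAME_BYTES = int(np.prod(FRAME_SHAPE))

# Producer: create the segment once and write frames in place.
shm = shared_memory.SharedMemory(name="camera_frame", create=True, size=FRAME_BYTES)
frame_view = np.ndarray(FRAME_SHAPE, dtype=np.uint8, buffer=shm.buf)
frame_view[:] = 128  # "capture" a frame by writing pixels directly into shared memory

# Consumer (normally another process): attach to the same segment, no copy made.
shm_reader = shared_memory.SharedMemory(name="camera_frame")
reader_view = np.ndarray(FRAME_SHAPE, dtype=np.uint8, buffer=shm_reader.buf)
print("mean pixel value seen by consumer:", reader_view.mean())

# Cleanup (the producer owns the segment's lifetime).
shm_reader.close()
shm.close()
shm.unlink()
```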

Harald:

As for the open source aspect, it is pretty important for it to be open source, because there are many cars supported and the community helps port them, right? If we had a closed source stack that interfaced with the cars, it would be extremely difficult for anyone to add support for a new car, which is a pretty important part of the ecosystem. So that was kind of a requirement from day one. Then there are parts that don't need to be open source for the whole ecosystem to work, like the machine learning model running and all that.

Harald:

So that's more from a principle point of view. I like open source. Everyone that works here likes open source. I think if you buy a device and you don't get to control what runs on it and you don't get to know what runs on it, you should question whether you really own the device or you're in some weird contract with someone. So philosophically, we're very much in favor of this.

Harald:

We've generally become more open source over time. People complain that our training stack is not open source. That's not something we're against. We want to open source more stuff. We just open sourced a lot of the data center management stuff that we have, which is all stuff that we need as well.

Chris:

And I have one other quick question I wanted to throw in. I'm looking at the OpenPilot GitHub, at the bottom where it gives the percentages of different languages. As an approximation, with other things thrown in, you're really about two thirds Python and one third C plus plus. One of the things we've been doing a lot lately on the show is hitting different autonomy stories and sharing a little bit more about autonomy as a major use case for AI. Obviously Python is the primary language these days for AI, but how do you differentiate what needs to be in Python versus what needs to be in C plus plus, or in the alternative high performance languages that are out there now? How do you think about that architecturally, as a CTO, in terms of saying this is where we want to play for this particular type of function? Do you have a methodology or a philosophy around that?

Harald:

I mean, I think everything that can be in Python should be in Python. I think that's the answer. Ultimately, most of the development happens in Python. If you're debugging, that's most likely in Python. The machine learning training is in Python.

Harald:

The more things that are in Python, the easier it is to go from experiment to shipping, and that's what we want to optimize for. Some things cannot be in Python for various reasons. You know, there's a whole layer that interfaces with the car on a separate chip, that has safety implications and is written according to specific standards, and that's written in C. So there's no Python option there. Sometimes there are performance reasons why something can't be in Python, but I'd say that's rare, especially since most of the compute happens in neural networks anyway.

Harald:

So yeah.

Daniel:

and just just circling back, making sure we get kind of those other two elements. You had mentioned these three kind of open challenges, I guess. The first around controls. I believe the second was RL. I I, I'm blanking on the the the third you mentioned.

Daniel:

Can you hit those last two? Yeah. Yeah.

Harald:

Yeah. So, I mean, they're all very related, I think, those three things. So when I talk about controls, I think the solution to controls is some kind of RL. I think what's special about controls is that imitation learning doesn't really work at all. And imitation learning is basically how we got all of the machine learning progress that we've gotten.

Harald:

Right? You learn on some large corpus of tokens, and now all of a sudden your model is somehow smart. But when you're talking about tight feedback loop stuff, imitation learning doesn't work there. You need RL. And by and large, RL just doesn't work.

Harald:

You know, there are some types of RL that they use in the LLM stuff that, from what I've read, aren't exactly the type of RL that will work for controls. Now we've got RL strategies that seem to work for humanoid robots; as far as I understand, a lot of the cool tricks there are some type of RL in a very constrained, very accurate simulation environment. But RL is just not in a state where we can say: okay, here is the reward function to optimize, which in our case is, you know, don't oscillate the steering wheel and also do what you're asked to do. That's a very simple reward function, and it's still not trivial to optimize in a noisy real world environment.

Harald:

So that's one thing. Yeah.
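For reference, the "very simple reward function" mentioned above, track the requested curvature and don't oscillate the steering wheel, might be sketched like this. The weights and signal names are assumptions, and as Harald notes, actually optimizing it with RL on noisy real hardware is exactly the open problem being described.

```python
# A minimal sketch of the kind of reward described for low-level control:
# follow the requested curvature, don't oscillate the commanded steering.
# Signal names and weights are assumptions for illustration.
import numpy as np


def controls_reward(requested_curvature: np.ndarray,
                    achieved_curvature: np.ndarray,
                    steer_torque: np.ndarray,
                    tracking_weight: float = 1.0,
                    oscillation_weight: float = 0.1) -> float:
    """Reward over one episode of control; higher is better."""
    tracking_error = np.mean((requested_curvature - achieved_curvature) ** 2)
    # Oscillation measured as the mean squared change in commanded torque per step.
    oscillation = np.mean(np.diff(steer_torque) ** 2)
    return -(tracking_weight * tracking_error + oscillation_weight * oscillation)


# Toy comparison: a smooth controller scores higher than a jittery one.
t = np.linspace(0, 10, 500)
target = 0.01 * np.sin(t)
smooth = controls_reward(target, target + 0.0005, np.sin(t))
jittery = controls_reward(target, target + 0.0005, np.sin(t) + 0.2 * np.sign(np.sin(50 * t)))
print(f"smooth: {smooth:.6f}  jittery: {jittery:.6f}")
```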

Chris:

Yeah. Keep... I'm sorry, I was just interested. Keep going. I kind of cut you off right there.

Harald:

I know. That's okay. But yeah. So controls and RL are kind of related, and continual learning comes in there too. Because, like I said, there are many things that make controls a problem that needs to be learned live on that car, on that drive even.

Harald:

If you inflate your tires, it will affect how OpenPilot drives, and you need to learn that. We do continual learning now with a lot of classical optimization, but ideally that would be a smart neural network that can understand all those things. If it starts raining and you start losing traction, as a human driver that's something you notice and adjust to, but modern machine learning strategies can't do that.

Chris:

This has been really, really cool. I love having conversations like this, and thank you for sharing your knowledge. I love the fact that it's open source, so we can really talk about it in depth. We've mostly talked about where you're at now and some of the near term opportunities, but I'm kind of curious, to put you in a different frame: when the work day is over and you're doing whatever you do to chill out at the end of the day, what is in your head about places to go with this kind of technology going forward? And I don't mean things that Comma's necessarily going to do right now, or even things that you have on the plan, on the roadmap, but more specifically, when you're just thinking, that would be something I would like to see this evolve to at some point, maybe without a known path.

Chris:

What are some of those things that you could see, in your capacity as CTO of an autonomy company? What are the types of things that excite you that maybe the rest of us aren't as familiar with, or haven't thought about, that you'd like to see this grow into? Any of those kinds of dreams and aspirations you can share as we close?

Harald:

Sure. But it's not going to be that visionary, I think.

Chris:

Okay. Fair enough. No pressure.

Harald:

I think I'm very pragmatic as well. I want very simple things, which is: I really like my dishwasher. I really like my vacuum cleaner. I think they make my life a lot easier, and I want more tools like that, that use some kind of technology to make my daily chores, the stuff that's just annoying, like driving, easier. And those things will happen. I hope they happen as soon as they can.

Harald:

These are really, really hard problems, and when they do happen, I want them all to be open source, such that you can own them, you can control them, and if there's spyware in your house, you can delete it. So yeah, I'd say that's the general vision. I just want simple robotics products that make your life easier in ways that are tedious, and that aren't owned by a big corporation.

Chris:

Good answer.

Daniel:

Yeah. I think that's a very appropriate and good way to close out today. Harald, it was really nice to have you on the show. We look forward to having you back sometime to talk about all the cool things that I'm sure will happen this year and in the future roadmap for Comma.

Daniel:

Thank you so much for the work here, and I hope to talk to you again soon.

Harald:

Cool. Yeah. Hope so too. Thank you very much for having me.

Narrator:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.

Narrator:

Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.