Tiny DevOps

In this episode, Luca Ingianni helps me tackle the question: Does DevOps make sense for embedded systems software?

Show Notes

Luca Ingianni is a former aeronautical engineer turned IT and DevOps practitioner. He is a teacher and advisor on a mission to teach and advise engineers to apply DevOps in the ways that work best for them and their customers. He is also the co-host of The Agile Embedded Podcast.

In this episode, we talk about applying DevOps principles to "non-standard" technical stacks, particularly to answer the question: Does DevOps make sense for embedded systems software?

Today's Guest
Luca Ingianni
Co-host of The Agile Embedded Podcast
Personal web site

Resources
Book: Test-Driven Development for Embedded C by James Grenning

Watch this episode on YouTube.

What is Tiny DevOps?

Solving big problems with small teams

Jonathan Hall: Does it make sense to do DevOps or to what extent does it make sense to do DevOps if you're working on an embedded system?

Voiceover: Ladies and gentlemen, The Tiny DevOps Guy.

[music]

Jonathan Hall: Hi, everyone. Welcome to another episode of Tiny DevOps, the show where we believe you don't need a thousand engineers to do world-class DevOps. I'm your host, Jonathan Hall. Today I have with me a special guest, Luca Ingianni, who is the co-host of a couple of other podcasts, one on embedded agile, which is closely related to what we're going to talk about today, DevOps on embedded systems.

The reason I want to talk about this is one of the common complaints I hear people make about DevOps, or maybe not complaints, but detractors, is DevOps only works if you're doing web software, or if you're doing SaaS or if you're doing something like that. I understand why they might say that because certain types of software don't have an operations component or at least not the same type of operations component, but I want to challenge that assumption.

Honestly, it's the assumption I've made in the past. One of my email bootcamps, I make a claim about, I think it's continuous deployment, and I say, on the other hand, if you're building firmware for diesel fuel injectors, or pacemakers, or self-driving cars, these rules may not apply to you. You have more experience in this area, so I'm hoping you can either tell me I'm completely wrong and we can still do DevOps in these areas, or maybe I am actually right and I just happened to guess right.

In any case, today you're the expert on this topic. I'm hopeful that you can shed some light, but before I go too far to that area, Luca, why don't you introduce yourself and tell us about what you do.

Luca Ingianni: Hi, everyone. I'm Luca Ingianni. I'm actually an aeronautical engineer by training. I just tumbled into IT and never quite found my way back out, these things happen. Maybe that has influenced my thinking a little bit and my perspective on a lot of things that I see in IT. I started my career in embedded systems like helicopter avionics, that sort of thing.

I sort of, by accident or by necessity, stumbled upon what I now know is called DevOps. Even though I wasn't familiar with the term at the time, it just came about as something that felt necessary for me and my colleagues to improve our practices and get better at building helicopters or whatever it was we were doing.

Then eventually, of course, I found out that somebody had stolen all of my ideas and called them DevOps. I seem to have a knack for explaining engineering techniques, thought models, et cetera, to engineers and to non-engineers. This is what I've made my profession. Instead of being a mediocre engineer, I try to be a somewhat better trainer, coach, mentor, whatever you need to improve your development practices.

Jonathan Hall: When did you discover DevOps and discover that somebody had stolen your ideas? What timeframe was that?

Luca Ingianni: Of course, it wasn't like a single instance. It accreted over the years. I know that I used Hudson back when Jenkins was still called Hudson, that must have been like 2010ish, to do continuous integration back when that was still something very few people considered seriously, just because I felt that I needed to have some technical underpinning to improve my cycle times.

Then, of course, over time, I came to the realization that this only gets me so far, I actually need to improve the practices of me and my colleagues, all the other stuff that surrounds that technology. I guess that took me a couple of years, probably until 2015 I would guess, until I realized that, by accident, I'd become a DevOps guy.

Jonathan Hall: I think that's a common story among people who are doing DevOps, that we were doing all these things all along and then we realized there was a name for it. Before we dive into the details, just at a really high level, does it make sense to do DevOps or to what extent does it make sense to do DevOps if you're working on an embedded system?

Luca Ingianni: I guess in order to answer that question, we need to take a step back and agree on some kind of a definition for DevOps, because I think this is what some of the confusion stems from. Like, "I don't think you will ever see Kubernetes installed on a fuel injector," or I sincerely hope that will never happen.

If we think of DevOps as a set of technical practices, continuous integration, I don't know, Kubernetes-type stuff, whatever, then maybe you can make the case that some of that is not technically suited very well to an embedded environment. But if you broaden your view and you say there's a lot of process attached to this, there is a lot of mindset, of culture attached to that, which is the view that I take, then, of course, it makes perfect sense, because at the heart of it, DevOps is just this mixture of lean and agile and all those practices that come from traditional engineering and are at their core about risk management.

Of course, that's a good thing, no matter what you do. Certainly, your practices may look a little different but all of the fundamental thought processes apply just as well. Why wouldn't you do continuous deployment for an embedded system? Especially if it's something as benign as a kiosk. Maybe if it's a pacemaker, then you should be more cautious about what you're doing, but even in the context of a pacemaker, if you've done your job right, there's actually nothing fundamentally scary about continuous deployment to pacemakers, because at some point you're going to deploy anyway and you're going to be as certain as you can that you've done your job right before you do. The pure act of deploying doesn't change that. Either you missed a bug or you didn't miss a bug, whether you have a manual deployment decision or not.

Jonathan Hall: That's one thing I always try to drill into the teams I'm coaching on, especially continuous deployment is putting a manual check in there usually doesn't actually add any safety. Somebody is making a choice one way or the other. They're either making it when they hit that merge button, or they're making it at some point later, probably removed from the code they're creating, and making it with less information. It sounds like you agree with that, even in the case of a pacemaker, it still is true.

Luca Ingianni: Yes, of course. All of the risk mitigation happens long before somebody clicks deploy or doesn't click deploy, doesn't matter to me. All of your tests must have concluded long before that, all of your safety critical designs must have happened long before that.

Jonathan Hall: That's great insight that your risk mitigation must happen before.

Luca Ingianni: To speak to your thing about when the deployment decision happens, I think with continuous deployment, the deployment decision happens long before somebody clicks on merge or whatever. It happens as you install your continuous deployment script. That is when you make the business decision that, "We are going to deploy as soon as we are technically able to."

You could make other decisions. You could make the decision to wait until it's Monday morning, for instance, so everybody's in the office and you can catch issues as soon as they appear. That's also a valid decision. You made the decision to not wait at all. Business decision, not a technical decision, not a risk mitigation decision.
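
The point that the release timing is a policy chosen up front, and not a per-release safety check, can be illustrated with a minimal sketch. Everything here is hypothetical and not from the conversation: the `DeployPolicy` names, the `earliest_deploy_time` function, and the 09:00 start of the working day are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum


class DeployPolicy(Enum):
    """Hypothetical business policies, fixed when the pipeline is set up."""
    IMMEDIATELY = "immediately"
    MONDAY_MORNING = "monday_morning"


def earliest_deploy_time(policy: DeployPolicy, build_passed_at: datetime) -> datetime:
    """Return when a build that has already passed every check may go out.

    The build is equally safe (or unsafe) under both policies; only the
    timing differs.
    """
    if policy is DeployPolicy.IMMEDIATELY:
        return build_passed_at

    # MONDAY_MORNING: the next Monday 09:00 at or after the build time.
    # 09:00 is an assumed office start time. Monday is weekday 0.
    days_ahead = (0 - build_passed_at.weekday()) % 7
    candidate = (build_passed_at + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0
    )
    if candidate < build_passed_at:
        # The build finished on a Monday after 09:00: wait a week.
        candidate += timedelta(days=7)
    return candidate
```

Under these assumptions, a build that passes on a Wednesday afternoon ships at once under `IMMEDIATELY` and the following Monday at 09:00 under `MONDAY_MORNING`. Nothing about the artifact changes in between.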

Jonathan Hall: Talk about that. You say it's not a risk mitigation decision because most people frame it as such, where are they wrong?

Luca Ingianni: I understand those people. It does feel scary, doesn't it? To not have any way of stopping this train once it leaves the station. But you really have to ask yourself, what are you gaining from waiting until a later time? If you do some manual exploratory testing before you actually deploy into production, okay, fine. Then you do, in fact, mitigate some risk, and that's a valid decision, nothing wrong with that. But if you don't, if all you do is stick your newly created artifact in a bucket, and then at some later point reach into the bucket and stick it into production, you might as well reach into the bucket right now. There's no difference. There's only a difference in your head.

Jonathan Hall: Do you have an example, maybe a story of a time when you've been working on an embedded system and you started to apply DevOps principles that you could share with us to give context to the rest of this conversation?

Luca Ingianni: One project that I worked on a couple of years ago, I tried out a lot of this DevOps stuff and I had just realized that I was doing DevOps, so I was all in on the DevOps thing.

Jonathan Hall: Of course.

Luca Ingianni: Anyway, let me explain to you what this product was. That was actually a fairly awesome machine. It's called a wire bonder. Broadly speaking, it's a kind of industrial robot that is used in microchip production to make electrical connections from the actual silicon chip to the pins. They are connected with tiny little wires, literally hair-thin wires placed with micrometer precision, ultrasound welded on, then you put the wire on the other spot, weld it on there, and then yank the rest off.

And it did that very quickly, of course. So awesome machine, really precise, really fast, lots of confusing constraints. You can't just move your robot arm whichever way you want. Maybe you might bump into other components and that would be really bad. Very interesting machine. Being a proper machine, it had a lot more complexity to it than only software.

Of course, we had complex software with pathfinding algorithms and we needed to-- I don't know, calculate breaking points and sense the surface and all kinds of things, but also we had electronics in there, from power electronics, big, strong servos that literally took somebody's finger off once, to tiny little sensors and very sensitive scales, and we had mechanics. We had hardware. We had a big beefy frame. Imagine this thing is as big as a cupboard or something.

Jonathan Hall: When you joined, what was the state of operations development? What were the practices they were going through when you joined and then how were you able to improve that?

Luca Ingianni: This was a fairly remarkable company. They're fairly successful. They are one of a few companies worldwide that are able to build those machines, but they were very set in their ways. As often happens in embedded systems, you get electrical engineers and mechanical engineers and somebody finds out one of them can code in Turbo Pascal or something. They're now the head of the software department.

They were very behind, as is often the case in the embedded industry. The electrical engineers who wrote the firmware didn't have version control. That was in 2018. Actually, it is a testament to their skill that they were able to not make a mess of it, but boy, were they making their life harder than it needed to be.

Essentially, there was nothing. We had very little in the way of formal QA, we had very little in terms of version control, in terms of software development processes. I was trying to set some of this up and, of course, that wasn't my official mandate. I was there as a requirements engineer, but they just drove me mad. I needed to do something about this.

We got to the point that we had automated unit testing. We had automated integration testing and also testing on the target system. In embedded systems, typically, you develop on a PC and then you flash on to a potentially really tiny microcontroller with a kilobyte of RAM or something. That's called the target in embedded systems speak.

We had testing on the target. We were working our way towards actual product level tests, BDD style tests. There was a lot of movement across the board, I think, towards more modern practices.

Jonathan Hall: Were you doing continuous deployment, continuous delivery?

Luca Ingianni: Continuous deployment didn't make sense just from the standpoint of the hardware landscape. Our devices were not set up for that. We could have rigged something to make it work, I don't know, with Docker containers that contain the actual firmware or something, but we didn't get that far, and it didn't make sense because most of those industrial systems, if they are networked at all, are on very separate networks, not connected to the open internet.

At some point, a truck would come and pick our device up and drive it away. After that, you don't get to deploy anymore, unless you've given a support engineer a USB stick and sent them on a plane.

Jonathan Hall: Were you at least able to do continuous delivery in the sense of every build would create an artifact that was deployable?

Luca Ingianni: Yes, we were, but that was the interesting lesson I took from that, which was that there was no point because the rest of the organization didn't know what to do with that. We were able to drop them new artifacts every half hour, if you wanted to, but there was nobody to give them to in an organizational sense. Everybody else did their own thing, was working in their own silos at their own pace, and we were merrily throwing out new iterations to nobody really, which is the big lesson. I think that DevOps feels like it's an IT topic, a topic for the IT organization. It isn't. It's much broader than that.

That is something that I've often heard in conversations with other DevOps managers, leaders, whatever that the biggest surprise was finding out that introducing DevOps to their own part of the organization, to the IT organization, was, maybe not easy, but it was straightforward and they knew what they were getting into, but the big surprise was that they needed to make it work for the entire rest of the organization.

Jonathan Hall: You're absolutely right. We often say, often with frustration in our voice, that DevOps is a culture or a cultural shift, not a technical shift. We usually say that when we've done the technical stuff and DevOps is still failing in some way, right?

Luca Ingianni: Yes.

Jonathan Hall: We did all the technical stuff, but it's still not DevOps. Why not? I think we recognize in the back of our minds, usually, that it's about a culture change. Then it's always frustrating when we can't make that change in other parts of the organization.

You also made a point that brings me back to my original premise that a lot of people assume that DevOps doesn't work in non-web or non-SaaS software. Part of the reason I think people assume this is because operations and what you're describing-- the operation, so to speak, of what you're describing involves a forklift and potentially a representative on an airplane, right?

Luca Ingianni: Exactly.

Jonathan Hall: That's operations in your product here. Even so, what I hear you saying is that even though maybe technically in the dictionary sense, DevOps isn't happening here, most of the principles up to the point of basically continuous deployment and continuous delivery are still valid. The idea of streamlining your work, shorter feedback loops, automation, things like this, would you agree?

Luca Ingianni: Oh, definitely. First of all, I just want to point out that this kind of problem with deployment is not unique to embedded systems. If you think of Microsoft Word, even Microsoft can't make a deployment decision on behalf of the customers. You decide when you click on upgrade or whatever it is. They're trying to move away from that, but fundamentally, this is not a very unheard-of scenario.

Yes, you're right that technically in this particular scenario, we didn't have Ops, so it couldn't have been DevOps. Technically, you would have been right but also you would have been very boring, [chuckles] because we're still doing engineering. We're still thinking in terms of value streams or we should be, anyway, and yes, of course, some things will look different.

Also, we are working with hardware and hardware just has longer cycle times. If I design a new board layout, PCB layout, and I want to actually turn it into a prototype, I need to get that sent off to be etched, to have components placed on it, and then it gets sent to me by mail because how else? That whole thing just takes a week or two. My cycle time by necessity is two weeks plus whatever time I need to actually do my actual design. Yes, it looks different. So what? Of course, it will look different.

Jonathan Hall: Let's talk about some of the specific practices you've done and how they applied to this project. Maybe we can just start at the left of the product cycle and work our way towards deployment. You mentioned unit testing. I imagine that when you say unit testing, you mean testing individual functions that didn't matter that this was on an embedded system versus some Java runtime or whatever else. Did you also have automated tests that did depend on the hardware environment?

Luca Ingianni: To a degree. Of course, eventually, you'll bump into an interface to the actual hardware and if the actual hardware is not there, then what do you do? Either you mock it out, or you just don't test that sort of thing, that's also a valid decision. We did unit tests for essentially everything.
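
"Mocking it out" can be sketched roughly as follows. This is a host-side illustration in Python, not the project's code: the `FakeForceSensor` class, the `read_newtons` method, and the `steps_until_contact` function are hypothetical names, and the surface-sensing logic is an assumed stand-in for whatever sits behind the real hardware interface.

```python
class FakeForceSensor:
    """Test double that replays canned readings instead of touching hardware."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def read_newtons(self):
        # Hand back the next scripted reading, one per call.
        return next(self._readings)


def steps_until_contact(sensor, threshold_newtons, max_steps):
    """Lower the tool one step at a time until the sensed force reaches the threshold.

    Returns the index of the step at which contact was detected,
    or None if no contact was sensed within max_steps.
    The logic only depends on `sensor.read_newtons()`, so it does not care
    whether the sensor is real or fake.
    """
    for step in range(max_steps):
        if sensor.read_newtons() >= threshold_newtons:
            return step
    return None
```

A unit test then scripts the readings, for example `steps_until_contact(FakeForceSensor([0.0, 0.01, 0.2]), 0.1, 3)`, and checks the decision logic on a PC with no target attached.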

I was also working in a real-time environment, so we had a central control loop that ran at 16 kilohertz, which is really fast even for embedded systems. Come hell or high water, we had to be done with all of our work in the worst-case scenario in 62.5 microseconds. If we didn't and we missed the deadline three times in a row, then the scheduler would notice and would just yank power from the system. That kind of very timing-dependent stuff is testable in isolation in the shape of a unit test, but it wasn't worth the trouble.
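
The timing budget follows from the loop rate: at 16 kHz, one cycle lasts 1/16,000 of a second, which is 62.5 microseconds. The sketch below works that out and models the "three misses in a row" rule on a PC. It is an illustrative simulation only; the `DeadlineWatchdog` class and its method names are hypothetical, and the real scheduler's behavior beyond what is described above is assumed.

```python
LOOP_FREQUENCY_HZ = 16_000
# One cycle of a 16 kHz loop, in microseconds: 1_000_000 / 16_000 = 62.5
CYCLE_BUDGET_US = 1_000_000 / LOOP_FREQUENCY_HZ
MAX_CONSECUTIVE_MISSES = 3


class DeadlineWatchdog:
    """Hypothetical model of a scheduler that cuts power after repeated overruns."""

    def __init__(self, budget_us=CYCLE_BUDGET_US, max_misses=MAX_CONSECUTIVE_MISSES):
        self.budget_us = budget_us
        self.max_misses = max_misses
        self.consecutive_misses = 0
        self.power_cut = False

    def record_cycle(self, elapsed_us):
        """Feed in one cycle's execution time; return True once power has been cut."""
        if self.power_cut:
            return True  # Once tripped, the watchdog stays tripped.
        if elapsed_us > self.budget_us:
            self.consecutive_misses += 1
        else:
            # An on-time cycle resets the streak: only misses "in a row" count.
            self.consecutive_misses = 0
        if self.consecutive_misses >= self.max_misses:
            self.power_cut = True
        return self.power_cut
```

Feeding it cycle times such as 70, 70, 40, 70, 70 microseconds does not trip it, because the on-time 40 microsecond cycle breaks the streak; one more 70 microsecond cycle after that makes three misses in a row and does.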

Jonathan Hall: You also said when you started they weren't even using source control at all. What steps did you make in that direction? I'm assuming that you did use source control at the end.

Luca Ingianni: Yes, so the software department, the people who wrote the high-level control software, they were already using source control. I think they were using CVS and they were in the process of moving to Git. The embedded, or rather, the firmware folks, they were aware that they should start using version control. As it happens, they just never got around to it.

Jonathan Hall: Always, I hear a lot of people ask me questions like, "How do I convince my team to do X?" Where X is something that would improve productivity, or at least the person asking believes so. This is a good example of that, I think, or could be. You convince them by addressing a problem they're feeling.

Luca Ingianni: This was the first time I consciously started to introduce DevOps somewhere and I didn't do it as well as I might have. I guess the lesson that I took from that was that I should have been both more forceful in going for DevOps practices and saying, "No, this is how we're going to do this. You should really be doing this because it will have advantages for you," and at the same time doing exactly this other thing that you pointed out, which was making it really clear what was in it for an individual engineer, like the firmware folks. I should have been really plain about, "Okay, you're going to waste a week learning Git, or however long, and you're going to mess up every now and then until you finally get the hang of it, and then it's going to be annoying and tedious, but, in return, you will get this thing, which was very difficult before, to be really easy. Wouldn't that be nice?"

I think it's reasonable for everybody involved to be egoistic about this and say, "Look, what do I gain from this?"

Jonathan Hall: What kind of monitoring did you have in place? Because that's another part of DevOps that often gets swept under the rug, and we already discussed that your operations was a different piece than a typical DevOps organization, but I wouldn't be surprised to hear that you had some sort of monitoring in place or maybe the monitoring was your customer service department when something went wrong, they would call you.

Luca Ingianni: By necessity, a lot of it was the customer service department because we-- once the forklift comes and takes the machine away, then it's gone and we can't really look at the logs, but yes, the machines themselves collected a lot of logs both on a technical level and also on a manufacturing process level. They would know how well all of their movements worked, et cetera, et cetera.

That tended to be analyzed at the customer's plant by their process engineers and, of course, if they spotted a problem they would call us. We had ways to assemble all of the logs and all of the data and stick it into a zip file and put it onto a USB stick, and then they could use sneakernet: walk over there, take the USB stick, stick it into another machine that was connected to the internet, and they could mail it to us and we could have a look.

What we could, of course, have been doing would have been to do much more monitoring on a development process level. We had started that. We tracked metrics like test coverage, that sort of thing. Yes, it's amazing, this was not a big company. They had maybe 20 engineers total or something. We all knew each other, of course, and we would bump into each other at the coffee machine, but there was not much visibility into what everybody was doing.

I wish I would have been more forceful about changing that. I don't even know what that would've meant at a company like that, a Kanban board or something. We could have fit the work of 20 engineers on a common Kanban board.

Jonathan Hall: How do you think DevOps should be affected or the way we do DevOps should be affected when you're dealing with life or death situations, say a pacemaker or a self-driving car or the space shuttle or something like that where human lives are on the line. How, or should that affect the way we apply DevOps?

Luca Ingianni: That's an awesome question and it's a very difficult question. One of the things we are aiming for in DevOps, of course, is more frequent deliveries, more frequent updates, which is a good thing, of course. On the other hand, it is fraught with some risk that you change the behavior of this system, which is exactly the point, of course, but it forces your users to keep learning and to keep being aware of those changes.

That's fine if you're building a soda machine or something, but I'm always reminded of this accident that happened maybe three, four years ago when a Tesla owner was driving down the road in autopilot mode and it was a road that they'd been driving literally everyday, it was part of their commute to work.

Tesla had installed an update for the autopilot as they do, just over the air, and as it turns out, this autopilot update had introduced a change that I find eminently reasonable. Apparently, it used to be that autopilot would look at the left-hand side lane divider and follow that to keep lanes. Now they said, "You know what? We're going to find the left-hand side lane divider and we're going to find the right-hand side lane divider, and we're going to center ourselves in there."

Makes a lot of sense, you'd drive nicely in the center of your lane, perfect, until, of course, the road split and the car just nicely averaged itself into a barrier that was at the center of that road split. Very reasonable change, and no observable change of behavior, as far as the driver was concerned, until at the last second the car just didn't veer left as it used to, instead just continued straight on and, unfortunately, the poor driver died.

This shows to me that we, in fact, as a society, need to learn ways to deal with machines that all of a sudden can change their behavior without any advance notice or without any sign. If a mechanical thing looks different, we can tell, "Oh, that might behave differently, I should pay attention," but the car looked just as it did yesterday. And over-the-air updates, yes, good idea. They probably fixed a bunch of bugs that could have led to accidents otherwise, so, yes, over-the-air updates are also a good idea, but what do we do about those behavior changes?

I think we, as a society, need to find a solution to that because I can't see a technical solution. Like, we can't make drivers read changelogs before they start their drive. Nobody does that. I think fundamentally, applying DevOps principles to something that interacts with the real world, as embedded systems do, changes the rules of the game in ways that we have not yet quite understood.

Jonathan Hall: I agree. I guess there's no answer today. We need to hash this out over the next decade or so as a society at large, right?

Luca Ingianni: Yes, exactly. That's going to be, yet again, an interaction between the creators of such devices and the users of such devices on how do we deal with changes that somehow break behavior in unexpected ways, in ways that are maybe dangerous or maybe just annoying?

Jonathan Hall: What advice can you offer to somebody who wants to start implementing these things and maybe they're struggling? Maybe they're in the same situation you were in when you joined this team, maybe they're excited about DevOps, they're trying to implement it on their team whether it's an embedded team or a SaaS team or whatever, what do you suggest?

Luca Ingianni: Fundamentally, I think the important thing is to get started somewhere and then build outward. If you don't have version control, maybe start there. If you already have it, maybe the next step would be CI, whatever. Keep building more parts of the system but do it in an intentional way, don't install CI for the sake of having installed CI.

Know what you're going to do with it. Know what you're going to do with the feedback. If you don't build an organization that is able to deal with this fast flow of feedback, then all you've done is installed new software and now you get to maintain it and it won't really change anything.

I think this is an important point, to be intentional about how to employ tools to improve your practices, your processes, your philosophies, and then just keep learning and keep iterating. Of course, if you want to learn more about this sort of thing, I have an entire podcast for that, which is called The Agile Embedded Podcast, that I'm doing together with Jeff Gable, who's a much better engineer than I am, where we talk about these kinds of things at length.

Jonathan Hall: Good resource. I've listened to the podcast a few times. It's very informative, highly recommended.

Luca Ingianni: Thank you.

Jonathan Hall: Any other resources you recommend, whether it be about embedded or just DevOps in general, for somebody in this situation?

Luca Ingianni: Yes. There's a book that I might recommend which sounds like it's very specific, but it's actually a lot broader than it first appears, which is called Test-Driven Development for Embedded C by James Grenning. As it turns out, James Grenning is one of the original signatories, is that how you say it? of the Agile Manifesto. He's a dyed-in-the-wool agile practitioner, and he's a hardcore embedded systems guy, and he can write very well.

This is a really interesting book that builds up TDD from first principles, both in terms of code and in terms of practice, and makes it sound shockingly simple, which it is. It's a really awesome book for somebody who's maybe dabbling in that kind of low-level work and just wants to look behind the scenes of what the TDD framework is doing or something.

Jonathan Hall: Is embedded C a prerequisite? If you're not using embedded C will you still get something from the book?

Luca Ingianni: Yes, you will. I guess it helps if you're fluent in C, or fluentish in C, but even though all of the examples are geared towards embedded, at that low level, at the level of unit testing, it doesn't really matter what it runs on, does it?

Jonathan Hall: All right, Luca, thank you so much for coming on. How can people get ahold of you or get in touch with you if they're interested?

Luca Ingianni: Since Jonathan already noticed that my name is hard to pronounce, I went for a much easier URL, so you can find me at luca.engineer. Luca is L-U-C-A.engineer. That will take you to my website where I try to write blog posts and where you can certainly catch me, write me an email or something to ask me questions. As you can tell, I like to talk about this sort of thing. Don't hesitate to reach out. Don't hesitate to ask me any questions. I'd be delighted to hear from you.

Jonathan Hall: Luca, is there anything else you would like to add that I failed to talk about or failed to ask about that you think our listeners should hear?

Luca Ingianni: Maybe just that it can't be repeated often enough that there is really no excuse for not moving towards DevOps, whatever that means for your concrete situation. Yes, you're not a web startup. Yes, you're doing-- I don't know. You're a small shop or you're doing embedded-- or you're building spaceships, whatever. It doesn't matter. You will find a way to apply the principles and philosophies of DevOps to your situation, to your engineering position, and you should.

Jonathan Hall: Great. You heard it from the expert. Start doing DevOps.

[laughter]

Jonathan Hall: Wonderful. Thank you, Luca. It's been a pleasure talking. I've learned a lot. I hope our listeners have too. Thanks for coming on today.

Luca Ingianni: Thanks for having me. This was a lot of fun.

Jonathan Hall: This episode is Copyright 2021 by Jonathan Hall, all rights reserved. Find me online at jhall.io. Theme music is performed by Riley Day.

[00:34:43] [END OF AUDIO]