Kube Cuddle

In this episode Rich speaks with Liz Rice from Isovalent.

Topics include: How Liz got involved in cloud native, Kelsey Hightower’s Tetris demo, writing eBPF programs, use cases for Cilium, why she joined Isovalent, Cilium’s new service mesh, Liz’s time on the CNCF’s Technical Oversight Committee, Golang.

Show Notes

Thanks for all of the support that the podcast is getting on Patreon. If you’d like to help keep the podcast sustainable for only $2 a month, you can get more info here.

Links:

ZX80 / Timex Sinclair 1000 / Commodore 64

Kelsey Hightower’s Tetris demo

Thomas Graf

Brendan Gregg

Liz’s talk at KubeCon LA

A Beginner's Guide to eBPF Programming with Go

Hubble

DTrace

Beyond printf & tcpdump: Debugging Kubernetes Networking with eBPF (from KubeCon LA)

Tetragon

The Cilium Project Update at KubeCon Valencia

Liz’s talk about the Cilium Service Mesh

The CNCF’s Technical Oversight Committee

The Charlie meme

That XKCD cartoon

What is eBPF? by Liz

Listener question from @isugimpy - Thanks!

Episode Transcript

Logo by the amazing Emily Griffin.

Music by Monplaisir.

Thanks for listening.

★ Support this podcast on Patreon ★

What is Kube Cuddle?

A podcast about Kubernetes, and the people who build and use it.

Kube Cuddle - Liz Rice
===

Rich: Welcome to Kube Cuddle, a podcast about Kubernetes and the people who build and use it. I'm your host Rich Burroughs. Today I'm speaking with Liz Rice, the Chief Open Source Officer at Isovalent. Welcome Liz.

Liz: Hi, Rich. Thanks for having me.

Rich: Oh, I'm, I'm so excited to have you here. I've been kind of a fan of yours for, for quite a while. So it's really great to have you on. I obviously wanna talk a lot about Cilium and eBPF and, and all of those things, but first, a little bit about you.

So, um, can you tell us how you got into computing in the first place?

Liz: When I was really small, I, I wanted, uh, the, one of those like TV console, computer games, like one of my friends had one and I really wanted that. But instead, my mom got us a ZX80, not a ZX81, a ZX80. It was like, you know, one K of memory and BASIC. And I gotta say, I was initially a little bit skeptical that this was gonna be as much fun as the TV console thing, but actually it really was.

And I pretty much from then, you know, why would I want to do anything else for work if I can do computers.

Rich: That's amazing. I had a Timex Sinclair 1000, um, which was, uh, little, little thing that had the, um, the weird, like embedded key kind of pad that you, you pressed down on the thing. And then, it, uh, loaded and saved things to cassette tape, which

Liz: Oh yes. Yes. So one of my computers, I had, I had a few as a sort of child and then a teenager and, and one of them was a Commodore 64 and that had a cassette. Well, first of all, I remember having to sort of save up a lot of money to, you know, buy the special tape player cuz you couldn't just use your mum's tape player.

You had to have the special one and then it also, it did this thing where it, as I understood it, it had each, the data was sort of recorded twice in succession and it would load for the first one and then it would load the second one. And as it went along, if the second one didn't precisely match the first one, no, I'm gonna stop.

Now I'm giving up and you know, this would take hours and you get to the end and it would be like, eh, something went wrong really near the end. Thank you.

Rich: That's really interesting that it did that. Kind of a preview towards a lot of our data security practices

Liz: Yeah. I don't think there was much in the way of error correction going on. So,

Rich: Yeah.

Liz: Yeah.

Rich: Um, and how did you get into like this cloud native Kubernetes stuff?

Liz: Okay. So my early career was really in networking protocols. I worked on that for quite a long time, and then I had a few years away from that kind of level of technology and working on some consumer things like I worked at Skype for a while. Worked at Last.fm, did some like more product management consultancy in, in kind of hardware and, and things that were really quite different from this.

Um, and, at some point I was working on a, a TV and movie recommendation startup that was kind of ticking over. It was doing okay, but it wasn't really growing. And, um, one of my friends, Anne Currie, who you may have come across, she had kind of come across this whole containers thing, and simultaneously the startup next to ours in this accelerator was kind of going mad about Docker.

And I thought, oh, there's something, something going on here. You know, these, these two sources are saying these container things should be, you know, is interesting. And I ended up, uh, you know, looking at it, thinking, oh yeah, this is interesting. And Anne and myself and, and another friend Ross, uh, formed a startup to explore some container scaling technology that, um, you know. We all do like horizontal pod autoscaling now, but we were way ahead of our time, and at the time people were really happy if they were running one container in a VM, you know, that,

Rich: Yeah, well, they're just running stuff, running stuff with the Docker daemon, right? Like, no real management or orchestration or anything, just, I'm gonna fire up the Docker daemon and, and get this container going.

Liz: Exactly. And we were trying to persuade them. Eh, you might want to try and pack a few more containers into that, that giant VM that you're running. And, um, yeah, that was, efficiency was not the problem that people were trying to tackle at that point in time.

Rich: Well, I mean, when you think about it, yeah. You know, the stuff that people were doing, you know, it's exactly the analog, right? Like we were, you'd have a host and it would be a lot of times dedicated to one service and you would run like one instance of that service on the host. And that was it. That, and it'd be at like, you know, 3% CPU utilization,

Liz: Yeah, exactly. And then,

Rich: used to do things.

Liz: And then Kelsey came along with his amazing Tetris demos showing how, you know, you could pack more workloads into the space or into the resources available. And, uh, I think that changed a lot of people's minds. And at that point, you know,

Rich: It's amazing that you mentioned that because that was actually my intro to all of this stuff.

So, um, I mean, I knew about Docker a little bit already and had played with it some, um, but, Kelsey did that talk at a small conference here in Portland and I saw it and, and I was just blown away because I had been living in that world where, um, you know, he talks about, uh, also about how we were the schedulers, right?

Like you had a spreadsheet of like, what app was running on what host. And, and, and I was that guy. I was like the scheduler, you know? And so when I saw this, when I saw this idea that you could just, basically, the, these nodes just became a bunch of compute and memory and, and storage, and you didn't have to worry about what was running, where, um, it just blew me away.

It was such a great idea.

Liz: Yeah, absolutely. Yeah. So,

Rich: That's really interesting.

Liz: Yeah,

Rich: So, uh, definitely wanna talk, um, about eBPF a lot. Um, I wondered if you could maybe start off by giving us like a really, really high level view of, of what it is, uh, for folks who might not know.

Liz: Sure. So the acronym stands for Extended Berkeley Packet Filter, and you can really forget that, cuz it doesn't say anything really about what eBPF actually does today. What eBPF allows us to do is to run custom programs within the kernel and we attach them to events. So whenever that event happens, we can trigger some custom behavior.

And it's been used a lot, uh, initially I think for observability. So every time you hit that event, increment a counter, share that, you know, the, the counter levels with user space and you've got amazing observability data about all these, you know, whatever kinds of events you want from across the kernel. But also being used a lot for networking.

And the first time I came across eBPF it was seeing Thomas Graf's talk about Cilium at DockerCon. I thought, oh, that's, that's an interesting, this, this looks interesting. Yeah. I'll keep an eye on that. But at the time it was, you know, advanced kernel feature that not many people were running and, you know, who knew whether it was gonna take off, but over the years, I'd kind of been keeping an eye on it.

And, uh, and now it's in everybody's kernel so we can all use it.
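The event-and-counter pattern Liz describes can be sketched in a few lines. This is a plain Python simulation, not real eBPF: `ToyKernel`, `count_by_process`, and the event names are hypothetical stand-ins chosen for illustration, and the `Counter` only plays the role that an eBPF map would play when sharing data with user space.

```python
from collections import Counter


class ToyKernel:
    """Stand-in for the kernel: keeps programs attached to named events."""

    def __init__(self):
        self.hooks = {}          # event name -> list of attached programs
        self.counts = Counter()  # plays the role of a map shared with user space

    def attach(self, event, program):
        """Register a program to run whenever `event` fires."""
        self.hooks.setdefault(event, []).append(program)

    def fire(self, event, **ctx):
        """Simulate the event happening; run every program attached to it."""
        for program in self.hooks.get(event, []):
            program(self.counts, ctx)


def count_by_process(counts, ctx):
    """The 'custom program': increment a counter keyed by process name."""
    counts[ctx["comm"]] += 1


kernel = ToyKernel()
kernel.attach("sys_enter_openat", count_by_process)

for comm in ["bash", "curl", "bash"]:
    kernel.fire("sys_enter_openat", comm=comm)

# No program is attached to this event, so nothing is counted.
kernel.fire("sys_enter_read", comm="bash")

# "User space" reads the shared counters.
print(dict(kernel.counts))
```

In a real setup the attached program would be compiled to eBPF bytecode and verified by the kernel before it runs; the sketch only shows the attach, trigger, and read-back flow.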

Rich: I, uh, my intro to it was actually, on Twitter, through Brendan Gregg.

He was talking about the flame graphs he was making with it and, and all of that. And so my initial kind of context for it was that, right, was that it's this like observability thing. And so I saw the talk that you did at KubeCon Los Angeles.

Um, I think it was a recorded talk, but, but it really, it really kind of started to open my eyes to a lot of the other sort of use cases and, and things that, that eBPF can do.

Liz: Yeah, it's, it's actually been really eye opening, joining Isovalent. So I mean, the reason why I joined that company is because that's where all the kind of expertise in eBPF is. And in fact, there's quite a lot of hand in hand development of Cilium and eBPF in the kernel. So there are folks at Isovalent that are, so who are making that kind of enablement in the kernel to allow these new features to exist.

And it certainly was eye opening for me, as somebody who kind of thought, I know a little bit about eBPF, but the range of things that I didn't know and still don't know, you know, it's, it's definitely a learning curve cuz it's essentially, you know, I, I can write a Hello World, no problem.

But you quite quickly get into the point where your eBPF programs are acting on kernel data structures. So that means you kind of have to understand what those data structures are there for and what they've got in them and why they're doing what they do. So that's one of the reasons why eBPF is sort of, so I don't know, intriguing and difficult and exciting all at once.

Rich: Yeah. Yeah. I think that, um, I guess that's what I find interesting about, you know, tools like Cilium is that, you know, most of us, I think are not gonna be writing eBPF programs, right? Like I'm never going to do that in my lifetime. And so having these tools that, you know, allow you to kind of harness the power of it without having to actually write the programs yourself is, is pretty fantastic.

Liz: Yeah. I, I think that's exactly right. I mean, I'm someone who likes to kind of see the code, touch the code. I'm not very comfortable just seeing boxes and arrows. But I think you can just do a little bit of eBPF code and sort of quite quickly realize, yeah, okay, now I've got a mental model for how this works and I can use some other tools and I can understand, I dunno, things like how programs are being loaded into the kernel, how they're communicating with each other, but I don't need to know all the details.

I don't need to understand every line of code to have a good mental model for what's happening.

Rich: So you've actually given a talk, uh, I think several talks about, how to write eBPF programs. And I wondered if you would kind of share a little bit of that process with us. Like what, what is it like to actually write a Hello World?

Liz: Yeah. So your first challenge is figuring out what language and what library you want to write the code in. And that's quite a, uh. Things are evolving quite quickly in that space. So for example, when I was first looking at eBPF really BCC was the choice and you'd write your user space code in Python and, uh, your kernel code, the eBPF program itself in C. And BCC would take care of all the compilation for you at, at the point where you run the Python code, it would compile the eBPF code for you and load it.

And that has benefit or had benefits at the time in terms of you're compiling on the machine you're going to run on. So, you know, you've got compatible kernel header files in place, but it's quite slow. It requires you to have the whole tool chain on that machine. Maybe not ideal. So since then, there's been quite a lot of, improvement.

There's been a lot of advance in the kind of portability of eBPF programs. There's a thing called Compile Once Run Everywhere, which allows you to, well kind of, as the acronym suggests, you can compile it on one machine and run it elsewhere. And in order to make that happen, there's some really clever things going on, sort of adjusting the data structure offsets sort of on the fly, sort of in real time, as you load the program into the target machine.

Um, so things like that have changed the landscape and changed what you can do and how accessible it is to write BPF programs. Um, you can now write in Rust, I'm, you know, I've, I've done a tiny, tiny bit of Rust just to, just to poke at it. I'm certainly not a Rust expert, but the Rust compiler now supports eBPF as a target. So that's quite a nice option for, yeah,

Rich: And you've done a talk about writing them in Go.

Liz: Yes, yes. So when you are writing eBPF code in Go, Cilium has a, an eBPF library and Cilium uses Go as the user space management and coordination side of what we do. Um, and so the, that eBPF library is a good option for, um, loading and, and accessing maps, and so on that you'd wanna do from user space. And then the Go, the eBPF code itself is written in C in that case.

Um, but all these different, there are few, couple of different Go libraries. There's, um, a couple of different sort of flavors of, of BCC now. There are some different languages and there's support for libbpf, which is part of this Compile Once Run Everywhere, um, kind of approach. Yeah, you have choices to make.

So that's your first thing you kind of have to identify, um, and not all, uh, not all of these libraries support all the different attachment points that you might want to attach your BPF programs to, or the different types of events, um, and the different program types that you can attach to those events.

So it can be a bit of a, take a bit of research or a bit of trial and error to figure out for your, you know, example that you want to do. What is the, the, uh, the right language? What's the easiest approach. At

Rich: Yeah. I imagine that, um, this is moving very fast, like you said. Um, I will put a link in the show notes to at least like one of your talks about this, so that if folks, um, are interested in like writing some, some eBPF code, they can, get a little bit of an introduction from you.

Um, so Cilium, how, how would you describe Cilium itself?

Liz: At some level, I might change my description depending on who I'm talking to because for many people in the

Rich: So these are, these are Kubernetes nerds that are listening,

Liz: So for, for Kubernetes folks, I would say I would start from the position, this is a Kubernetes CNI, Kubernetes networking plugin. It's not exclusively a CNI. So we do have people using Cilium for load balancing in traditional networking environments.

So I'm, depending on who I was talking to I might, you know, lean towards that. But I think for, for the majority of us, certainly in the, in the cloud native community, it's best known for being a networking plugin.

And it was written using eBPF from scratch, it sort of natively uses eBPF. And that allows us to track the different endpoints in a Kubernetes cluster.

Um, you know, they're assigned IP addresses, but we also track them with these, um, Cilium endpoints. And we are aware of the Kubernetes identities. So in normal, you know, in a, the life cycle of a cluster, pods will appear and disappear and appear and disappear. And the IP addresses get reused for different pods.

So IP addresses are not terribly helpful, apart from, at the sort of immediate moment when you're sending and receiving a packet, but if you come back and look at some network logs later, the IP address could be pretty useless. But because we map them to the Kubernetes pods and services, we have this much richer information about where network traffic is flowing to and from, and, and which services are communicating with which other service and so on.
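To make the IP-reuse point concrete, here is a small Python sketch of why a bare IP address is ambiguous in old logs and how recording identity alongside it resolves that. It is an illustration only and not how Cilium stores identities; `IdentityLog`, the pod names, and the timestamps are all hypothetical.

```python
from bisect import bisect_right


class IdentityLog:
    """Remembers which pod held each IP address, and from when."""

    def __init__(self):
        self._history = {}  # ip -> list of (start_time, pod_name), in time order

    def assign(self, ip, pod, at):
        """Record that `pod` received `ip` at time `at` (assumed increasing per IP)."""
        self._history.setdefault(ip, []).append((at, pod))

    def pod_at(self, ip, when):
        """Return the pod that owned `ip` at time `when`, or None if unassigned."""
        entries = self._history.get(ip, [])
        idx = bisect_right([start for start, _ in entries], when) - 1
        return entries[idx][1] if idx >= 0 else None


log = IdentityLog()
log.assign("10.0.1.7", "default/checkout-5d9f", at=100)
log.assign("10.0.1.7", "default/batch-job-x2", at=250)  # same IP, reused later

# The same address resolves to different workloads depending on the time.
print(log.pod_at("10.0.1.7", 120))
print(log.pod_at("10.0.1.7", 300))
```

A flow record that carries the pod or service name directly avoids this lookup altogether, which is the richer information Liz refers to.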

So we get really nice visibility through a tool called Hubble that's part of the Cilium

Rich: Yeah. And so this information is coming right from the kernel.

Liz: Correct. Yes. Yes. Um, so we are able to hook in, um, there's a few different places in the network stack where you can hook in, um, one of them being the socket level. So we know when applications are sending packets into, into the top of the network stack. There's also probably my favorite event type in eBPF called XDP, which is eXpress Data Path.

And this is when you are receiving a network packet into a, into an interface, the earliest possible point you can attach an eBPF program to, it's before it goes into the networking stack. In some cases it can actually be offloaded, or some network cards support offloading the eBPF code to be processed on the network card.

So it never even hits the kernel at all, which I think is amazing.

Rich: That is, I had no idea about that.

Um, so what are, what are some of the big use cases for Cilium? We talked about that it's a CNI, but I, I was at the talk that you and Thomas and a couple other folks did at KubeCon in Valencia, and there was a slide that was like, um, Cilium is a CNI. And it had like 16 different bullet points on it, of like all these different things that Cilium can do.

And it just, it just blew me away.

Liz: Yeah, it it's, it's pretty, uh, pretty full on set of capabilities. Yeah. So, uh, where do I start? It's very good at efficient networking because we can bypass essentially the host networking stack in a lot of cases, rather than having to, well, in, most pods run in their own network namespace, and that means they have their own network stack.

So in a traditional non eBPF environment, a packet has to go all the way through the host network stack through the virtual ethernet connection into the pod, and then through the pod's networking stack. But with Cilium because we know, ah, here's this packet. I know the pod that it's destined for. I can just send it straight into that network namespace without having to go through the host network stack, which gives some pretty significant improvements in performance.

And for similar reasons, if we, um, we've uh, recently been working on Cilium as a service mesh, and there's a similar performance gain to be made there because we can just shorten that network path, make it much, uh, shorter for a network packet.

It doesn't have to go through an enormous number of, uh, loops up and down the stack in order to travel from, from one pod to another, or, or to exit the, the, the node.

Rich: That's super interesting. So, uh, so the CNI is one of the use cases, but there's some other things too, right? So it is an observability tool.

Liz: Yes. Yes. So as well as, you know, sending packets from, from A to B, we can also report the fact that that packet has gone from A to B. Um, that's the Hubble component, which is optional, but highly useful. I think it's one of the features that a lot of users, the ability to track down what packets are flowing. Uh, enforcement of network policy is another important feature. Um, so the policy is loaded as eBPF programs, and we can very quickly inspect packets to see whether they're in or out of policy and, and discard them if they're, if they're not. Load balancing. So I mentioned earlier that, you know, we have users using Cilium as a, a load balancer outside of a Kubernetes environment, but it's also, if we think about kube-proxy, kube-proxy is essentially a, a load balancer between, or sending traffic to the different pods that back a service.

And we can do that very efficiently with, with Cilium, and replacing the kind of iptables implementation. Um, so again, it's all about efficiency. Integration with legacy networking stacks is also a big part of what people come to us for. So maybe they have a, a highly scaled network, telco use cases being, being a good example.

Um, and they need perhaps BGP connections. And that's something that we can offer as well.
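The two per-packet decisions mentioned here, a policy verdict and a choice of backend pod for a service, can be sketched together in Python. This is a toy model under stated assumptions, not Cilium's or kube-proxy's logic: `ALLOWED`, `BACKENDS`, and `handle` are hypothetical names, and round-robin is used purely because it is the simplest selection scheme to show.

```python
import itertools

# Hypothetical allow-list of (source workload, destination service, port).
ALLOWED = {("frontend", "checkout", 8080)}

# Hypothetical pod IPs that back each service.
BACKENDS = {"checkout": ["10.0.1.7", "10.0.2.4"]}

# One round-robin iterator per service (an assumption made for simplicity).
_next_backend = {svc: itertools.cycle(ips) for svc, ips in BACKENDS.items()}


def handle(src, dst_service, port):
    """Return the chosen backend IP, or "DROP" if the flow is out of policy."""
    if (src, dst_service, port) not in ALLOWED:
        return "DROP"
    return next(_next_backend[dst_service])


results = [
    handle("frontend", "checkout", 8080),
    handle("frontend", "checkout", 8080),
    handle("batch", "checkout", 8080),  # not in the allow-list
]
print(results)
```

The point of doing this in eBPF rather than in a long chain of iptables rules is that the lookup stays a cheap map access however many services and policies exist.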

Rich: Oh, that's really interesting. Yeah. I, you know, I had a chance to talk to Thomas Graf um, at KubeCon at the, at the Isovalent booth. And, um, I mentioned to him that this, this reminded me a lot of DTrace, you know, in the Solaris world, just in, in that it gave you the ability to sort of plug more directly into the kernel.

Right. And, and get a lot more information about what was going on. And, and it was programmable, you know? So, uh, so it kind of seemed like a bit of an analog to me. Um, but I think that, uh, I remember a lot of us when DTrace came out, were super excited about it. Right. Because suddenly you could, there were those bits of information that you could just never figure out how to get when you were troubleshooting a problem.

And suddenly you had a way to like, kind of just plug directly in and, and pull out a lot of stuff that wouldn't have been observable before.

Liz: Yeah, I never used DTrace myself, but, um, I've certainly heard that parallel drawn and, and I, I think it's, I think eBPF is probably giving us a more complete, um, programmability if you like, but, uh, I think certainly in terms of being able to access, you know, this data directly from the kernel, there's certainly a parallel.

Rich: Yeah, I think that like, that, that troubleshooting kind of use case is, is I, I saw a talk from someone. I, I don't, I can't recall her name, but I'll, I'll put it in the show notes. Um, that was also at Los Angeles and, and they were talking about an example of like troubleshooting a problem, using Cilium and, and it was like, wow, there's, there's like, um, again, you know, there's, there's all these times in my, where I've like been troubleshooting these really, really hairy problems and been really frustrated because I couldn't get like the answer that I needed.

And, and I saw this talk and I was like, I wish I would've had this, you know, 15 years ago. It would've saved me some pain and maybe I would've gotten a bit more sleep too.

Liz: Yeah, I think, uh, it's often, it's a common reaction that people see the, the visibility that we can give with Cilium and they kind of go, oh, wow. You know, um, and we're doing that even more with, um, Tetragon, which is a new kind of sub-project in, in Cilium, which gives us this visibility into, uh, what's happening at a process level.

So being able to see and potentially even enforce policies around things like, what files a pod is allowed to access or what system calls or what network connections, the kind of runtime security element of, of eBPF, which is, um, kind of pushing things forward quite a lot. Again, is another example of having that kernel knowledge in the team and being able to kind of leverage that for eBPF tooling, kind of pushing the boundaries of what we can do.

So it's kind of fascinating.

Rich: Yeah. I mean, that's a great situation to be in right when you're kind of working both sides of it. Right. When you've got people like helping advance the kernel capabilities and also people developing the software that uses it.

Liz: Yeah. And, and having that full understanding of what, you know, those kernel data structures mean. So knowing what the right attachment points are and knowing how to handle the information that you have at those, at those events.

Rich: Yeah. I, I remember when you joined, Isovalent, um, I actually talk to people about this sometimes because, one of my heuristics when I like look at startups and think about them is like, who's there. Right? And especially, who is leaving like another company that's actually pretty cool and going to join this place.

Right. And when, when you went there, it definitely got my attention. And then, um, a while later, my friend Duffie Cooley as well. And, and I was like, okay, there must be something going on here. And, and I wonder if you know, like what it was that, that really made you want to go there.

Liz: I, a big part of it was this just incredible depth of knowledge around eBPF, which it was an area that I was really interested in and, and I was keen to focus on. Um, really awesome people. You know, gotta be the number one thing. And like who, who are you gonna spend eight hours of your day talking with communicating with, I, I love that they are lovely and smart, you know, that's super important.

Um, I mean, you're, you're absolutely right that the, the folks that, you know, some of them may be not quite as well known. Some of them are, you know, perhaps better known folks from the community. Um, Bill Mulligan, who joined us from the CNCF, Swarna Podila who came to us from HashiCorp. There's been some really great folks who've been joining us.

I mean, I feel a bit like I probably missed out some names and I'll feel guilty later, but

Rich: Oh, that's

Liz: Yeah. There's just some amazing folks on team.

Rich: You're speaking off the top of your head. They, they can't hold that against you. I, I know Swarna, I've, I got a chance to talk with her some at Valencia too. And, um, yeah, she just seemed super excited about what you all were doing. Um, so I saw your two talks there, um, at the conference and they were just packed, right?

Like there was the, the talk that you did with Thomas, and then there were, uh, was someone there from Datadog and someone from Google, kind of talking about their use of Cilium which was really interesting. Um, and then you did a separate talk where you talked about the new service mesh capability in Cilium. I guess it had one already, but there's a, a newer service mesh that uses Envoy.

That's been recently released. It's in beta now, maybe.

Liz: Yeah, well, we are at, at time of recording almost at the point of being able to GA the, uh, some, some features of it. So, um, it's been, it went into beta end of 2021. And the idea here is, Cilium uses Envoy as a proxy anyway, whenever it needs to terminate at layer seven.

Rich: Yeah.

Liz: So, and Envoy is, as many people will be very familiar with, it's used in a lot of service meshes as the network proxy, but it's typically been used in the sidecar model with one instance of Envoy per pod. And we had an instance of Envoy sitting on the node and I think it was a pretty, um, natural evolution of thought, and actually something that Thomas has been talking about since before I, I joined, I think there's a talk from EnvoyCon I think from two or three years ago.

Um, it it's a, it, it's not that big a leap to think, well, if we've got this instance of Envoy and we have kernel visibility across all of the network connections on this cluster anyway, could we rationalize what we're doing here, move all that layer seven functionality into the single instance of Envoy on the node, and save having all this sort of duplicated resources and routing tables and, and the complexity of injecting sidecars.

And it turns out that we can. So, and, and, you know, get a significantly shorter data path as a result.

Rich: Yeah. You had a slide in your talk where it was kind of the before and after where it showed this diagram of like all these pods that have this extra sidecar, and then suddenly it's the same number of pods, but all the sidecars are gone and, and that really had an impact on me because, um, I think a lot about, you know, the efficiency of what we're doing, um, in terms of the bigger scale of that.

Right. And what the impacts are like, like those extra pods that are running in those sidecars are, they're taking up power in the end. Right. And if you can suddenly, you know, cut the number of, of, of containers you're running in half or something, gotta be a big savings.

Liz: Absolutely. And, and I think that's what's so attractive about it, that every time you inject a sidecar container, I, the first one, that seems like a great elegant model, but at some point you kind of look at it and think there's a lot of these, you know, and a lot of them have the same information contained within them, you know, in terms of things like routing tables.

So yeah, it, it, it makes a lot of sense to try and rationalize those, I believe.

Rich: Yeah. I guess we're kind of doing deduplication or

Liz: Yeah. And, and I, I think it doesn't just apply in service mesh. I think there's, um, you know, other examples where eBPF's being used for things like observability tooling, where it just makes sense to instrument the kernel, and it automatically has visibility over everything. You, you don't have this issue that, well, if you didn't inject the sidecar, then that pod is invisible to your tool.

Which is really quite interesting from a security point of view, because if you do have a malicious workload running, you know, if I'm an attacker, I'm probably not gonna inject a sidecar into my pod that I'm trying to persist on your cluster, but you know, if it's instrumented in the kernel, then I can't, I can't avoid being seen, well, I can't easily avoid being seen.

Rich: Yeah, no, that's, that's another good point too. And you know, um, my first job was at an internet provider. I was a system administrator and I definitely saw some boxes get hacked. And I saw a lot of rootkits, you know? And that's like one of the kind of typical patterns, right, is that you, substitute the binary, like the, your ls binary, right.

That doesn't show like certain things or your ps binary that ignores the processes you want it to ignore. And, and yeah, it definitely seems like, you're gonna be better off getting that information directly from the kernel instead of like trusting these other tools.

Liz: Yeah. I mean, there's probably a whole, you know, arms race to be had of BPF related malware as well. But, uh, you know, it it's an arms race. Well, and I think

Rich: I think, I think Kris Nova is, is is

Liz: yeah.

Rich: doing some things.

Liz: Yeah, I haven't watched what she's done yet, but I have seen some, some things on Twitter. It looks pretty, uh, pretty intriguing.

Rich: Honestly, it frightens me. Um, it, my reaction when I saw that she was doing that stuff was, um, uh, boy, I'm glad I'm not an SRE anymore.

Um, so besides, uh, this work, um, that you're doing at Isovalent, um, you previously were a member of the Technical Oversight Committee at the CNCF. And I wondered if we could chat about that a little bit, because I think that's a lot of times those, those processes that happen at the CNCF, are maybe not, uh, the most intuitive to folks.

They don't necessarily have a, a ton of, um, I guess, visibility into what's happening. So can you like give us a high level overview of like the TOC and what it does?

Liz: Yeah. So I was on the TOC for three years, which was, you know, plenty of time to, I guess, hopefully see some changes and, um, you know, hugely privileged to work with some really interesting people and, and learn a lot from some, some amazing folks on, you know, during that time. The TOC really was, as the CNCF was founded, there was this sort of three arm, um, governance model, if you like, of the governing board, the, a marketing group, and the TOC. Four, four arms, if you include the staff as well, but the TOC was really intended to provide this sort of neutral, um, assessment of projects and whether they were cloud native and, and to, uh, provide a sort of overview of the direction of, you know, what does cloud native mean and, and how should this evolve?

And, and it, to some extent, I think that has, um, it it's been encapsulated in the definition of cloud native, which was one of the things that the TOC wrote. Arguably, you can tell that it was written by a committee. I, but I think that sense of

Rich: a, it's a, difficult thing to define though.

Liz: I mean, it's very difficult. Yeah. It's one of those things where, you know, it's a bit like pornography, you know, you know it when you see it.

Rich: Right. Right.

Liz: Um, yeah.

Rich: One of the things that the TOC does is decide which projects make it into the CNCF, like which become sandbox projects. I think also, like, how they go through that maturity model, how they graduate. Is that correct?

Liz: Mm. Yeah, exactly. And I think as a TOC member, you're trying to help end users understand the maturity of these different projects. That's the goal of this model. You know, we're saying that a graduated project is de-risked and loads of people are using it in production. And, you know, we're not saying it's bug free, but we're saying, you know, this is something that we think you'll be able to depend on.

It's not going to, you know, disappear overnight. So a well founded project. And then incubation being the sort of step below that. When the CNCF was first created, there were just these two levels. Um, so incubation was it's, it's not graduated yet. But then, um, around about the same time as I joined the TOC there was this need for somewhere that you could experiment. The idea to be able to collaborate on projects, have a kind of safe, neutral home for it.

And, um, and that became the sandbox. There was always a bit of a tension: you want really great products, projects rather, really great projects and, um, ideas and that kind of innovation, but you don't want people just joining for the marketing benefit. So at some point we realized, well, perhaps the right thing to do here is to withdraw all, you know, make it so there are no marketing benefits to being in the sandbox.

It's just there to offer that collaboration ground. Um, whether we entirely succeeded in that, I mean, I think the fact that you're a sandbox project is still something that people like to, you know, be able to claim. Um, but yeah, it's a tricky balancing act, I think, of trying to define whether you think a project is going to go anywhere and be useful, but also having a low bar.

You know, we went back and forth on this a fair bit during my time on the TOC, around like how easy we wanted it to be to create a sandbox project without wasting the time and resources from the organization. But, you know, while still enabling lots of projects, um, it's always gonna be a tricky balance. And also the sheer work involved in looking at all these projects. You have an idea that seems great when there are ten projects in the world and then suddenly a hundred more projects want to come and get involved. You have to change the processes and you have to change the way you think about it. So we've definitely seen over time, the CNCF and the approach to handling projects has had to change.

And, you know, we're all humans, we've all kind of tried to adapt and probably made some mistakes along the way. But, you know, I think everyone has tried to act in, you know, in good faith. And, uh, so, you know, maybe somewhere along the line, there have been some projects that were disappointed and maybe some of those will have been a mistake, but maybe there were a whole lot of good decisions made as well.

I hope so.

Rich: I mean, I was thinking about this because, you know, it's the meme at this point, of like Charlie from It's Always Sunny in Philadelphia, you know, pointing at the CNCF landscape. It's like, it's so overwhelming for people. And I was thinking about that position on the TOC, that like, you're looking at this firehose of projects coming in, and I'm sure that you're seeing more than are actually even represented on that landscape.

Liz: Yeah, absolutely. And it becomes very difficult to kind of feel even remotely knowledgeable about all of these different areas. And that was one of the reasons why we brought in what are now called TAGs, the Technical Advisory Groups: we really needed to lean on the community for their expertise.

The eleven members of the TOC just can't know everything, but then you add in a layer of, um, it's kind of unavoidable bureaucracy, if you like, that these TAGs are gonna meet independently of the TOC. It adds in some extra kind of communication overhead. And it's certainly, I think, absolutely the right decision, because we've been able to get so many more people involved and, um, get more knowledge and experience both sort of fed into the decision making process, but also build up in all those individuals who can have a meaningful role in a TAG that serves them well in their careers.

Rich: Yeah, it absolutely makes sense to me. You know, like you said, you can't be an expert at everything, you know? Um, yeah. Uh, I'm wondering, so one thing I've observed in my career is that people who are in, I guess what I would call consultancy kind of roles a lot of times get a really interesting perspective on the industry, just in the fact that they get to see what's happening in a lot more different companies than the normal person would. I wonder if there's some of that that goes on with the TOC, like you're, you know, seeing all these different projects. Are there any kind of patterns or interesting things that you kind of learned from seeing this sort of big aggregate number of projects?

Liz: Definitely. I think in fact, before I joined the TOC I was the KubeCon/CloudNativeCon Program Chair. And, well, I did that with Kelsey and then with, um, Janet Kuo. And as part of that, we had to do this project update presentation. And I went from kind of being vaguely aware that there were a whole bunch of other projects to, oh, I'm standing on a stage in front of thousands of people telling them, you know, just giving them these little snippets of news about these projects.

And I realized that not that many people kind of take the time to, you know, get even that quite superficial level of knowledge that I had about them. And that was one of the reasons why I kind of thought perhaps the TOC is something else that I can get involved with, because I wouldn't pretend that my knowledge was deep, but it was certainly broad.

And then you spend more time in the TOC seeing more of this breadth, hearing more about what people are using, aren't using, that kind of, um, you know, you're sort of trying to feel from the community what's working and what people are gravitating towards next. And actually that kind of group of consultants that you mentioned, there are quite a few people that I consider friends who work in that kind of role and are gold mines for that sort of, you know, just like, tell me what you are seeing in this space, you know?

Rich: Yeah. Yeah. I always find it really fascinating to talk to those people, because a lot of companies have the same problems, you know, they really do. And I think that it's that way with a lot of things, like I've been on, um, program committees, you know, for a few different conferences, and it's the same sort of thing.

Once you start seeing conference submissions in aggregate, you know, you're looking at 500 of them, suddenly the patterns of like, what works well and what doesn't, and, you know, what's likely to get somebody in, become like super obvious in a way that somebody wouldn't, you know, get, um, unless they'd kind of been through that experience.

Liz: That's so true. And you really see the topics that people are excited about. You know, it's never just one person who's submitting a talk that kind of turns into the next big hype thing. It's because, you know, ten different people from different organizations have submitted on it and you think, ah, okay, there's something going on here.

Rich: Yeah, definitely. Um, that is definitely one, uh, kind of pro tip I will give folks who are interested in doing more speaking, is like, get yourself on some program committees if you can do that. It's not always easy, especially if you're somebody who's newer to the industry.

Um, but, um, but a lot of times, even like your local DevOpsDays will want people to help like review talks or, or, um, you know, there are a lot of other conferences out there, but I, I personally have found it like super helpful.

Liz: Yeah, I think it's really helpful in terms of seeing what people are interested in. You can tell the difference between a good submission and a bad submission. Depending on the program committee, you might get to hear what other people felt or thought about different submissions, and you can use that to help you craft your own talks and your own proposals going forward.

It's a really good idea.

Rich: Yeah. Um, so to shift gears, I think maybe, uh, one more time. Um, so you just have such a varied background. There's like all these different things I wanna talk to you about. It's really fascinating. Um, you, uh, um, are a Go programmer and, um, you're somebody that Google has recognized as an expert in Go.

And, uh, I am not a Go programmer. I've dabbled at it a bit, but I'm just curious, like, cuz there've been a lot of changes, you know, in the last few years with the way dependencies are handled and now the generics getting added, and I'm just curious kind of what your take is on like where Go is at and how you're feeling about that.

Liz: Mostly feeling a pretty high degree of imposter syndrome, because there was definitely a period of time where I felt pretty plugged in and on top of it, and, you know, I kind of knew what was going on. Um, and I would say I'm less clued up than many. There are people who are far more expert than me. But I think, I mean, the generics conversation was so interesting to sort of observe at the kind of meta level of, do we, don't we?

And I think the Go team have always been, one of the strengths of Go is that they've had that really strong team of, you know, it's not really a community project in the sense that CNCF projects are community projects. They've got this sort of tight group of... Exactly. And that's, their job is to hold that language sort of together.

And that, I think, is why it was so successful from the get-go.

Rich: Yeah.

Liz: Um,

Rich: Oh, I see what you did there.

Liz: That was terrible. Uh, but yeah, right from the start, they were really, um, you know, they had these very strongly held beliefs about the language. And I think that served it very well, and seeing how that kind of community input has been handled and how they've learned along the way to listen to the community and get feedback and explore different options without making a commitment.

I've been party to some really interesting meetings where they've kind of rolled out, here is an idea, how do you think this would work? You know, what problems can you see? And just by having a large number of people in the room with like different ideas and different approaches, they, you know, quite often come up with some hesitation about a particular approach, and maybe it's back to the drawing board. It was so interesting to see that back and forth.

Yeah.

Rich: Yeah. Yeah. Um, okay. I do wanna ask you about this. So, um, we're recording about a week after KubeCon, for those folks who are listening. And just a few days ago, there was an announcement that, um, VMware is being acquired. Um, I don't need you to comment on that part of it. Um, that's not your job. But the thing that concerns me about it, I guess, is the fact that there's obviously a lot of people working there who are very involved in the Kubernetes community who are probably pretty concerned about their jobs at this point.

Right. You know, and you've got people there who are, you know, helping to run SIGs and all this stuff. And I guess the bigger picture kind of question to me is like, you know, are you concerned about like how this is gonna impact Kubernetes itself, or, you know, how do we keep from being tied so much to like a specific vendor or company?

Liz: It's a great question. And I think it really speaks to why the CNCF exists. You know, there is no doubt that if VMware or any one of those sort of major companies were to walk away, it would have an impact and it would require a bunch of, you know, figuring out who's gonna do what when. Individuals, obviously, you know, they may be impacted if this happens. I am confident that they will be able to find roles.

You know, I mean, cloud native is such a, um, it's a commercial success for so many organizations that there are plenty of companies out there who want cloud native skills. So while, you know, people may be in a difficult situation, I would hope that that was a short term issue. And I think longer term, the CNCF is set up to sort of withstand that kind of existential threat by being that neutral holding ground.

You know, that's the main benefit of that foundation is ensuring that all of those large companies can have skin in the game, but without being the sole kind of the, like that XKCD cartoon with the one Jenga brick that you pull out and everything will collapse.

I don't think VMware is that brick at all.

Rich: Yeah. Yeah. I think that, I guess for me, it's like, there's a difference in being able to find a job and being able to find a job that will let you do the kind of contributions you were doing before. Right? Like those aren't necessarily the same thing. But I think you're right in that a lot of those folks are very experienced and have good networks and are probably gonna land pretty well.

Liz: And the need for those roles and those contributions, I think, is quite well understood by individuals in a number of organizations, and maybe, you know, it's gonna result in some of those people having to lobby internally for like, actually, yeah, we do need to spend a bit more of our resources on employing people to do these contributions.

But I think that there is such a momentum behind the whole community and the whole sort of movement that is cloud native, that where those conversations have to take place, it probably will be successful. That's what I...

Rich: Yeah. What is your view of like how these big vendors kind of cooperate and work together in the Kubernetes project? From my point of view, and I'm very much an outsider, you know, I'm a guy with a microphone, but I'm not super involved in the CNCF, but generally my take on things has been that these big vendors do tend to be really working for the good of the project itself.

Liz: I think by and large, that's true. There, there are certainly cases where, you know, we could be a bit cynical about why, you know, resources have been invested in a certain direction and not in another. Um, but equally I have some sympathy that, you know, all these companies have businesses to run and they have to do things, you know, not just out of some kind of altruistic desire to help the community, but they also have to have a bus...

You know, it may not be a very direct connection between a contribution here and the business success there, but they have to have some kind of sense that they're connected in some way. And I would definitely like to see that. There are occasions where I've certainly seen in the governing board a lot of, yes, we should really do a thing.

Great. Okay. So are you gonna put some people, you know, give people some time to do the thing? Cuz if you don't, where's it gonna come from?

Those, those discussions do happen from time to time and, and sometimes they're, you know, sometimes they fall on deaf ears and sometimes they do actually result in, in changes and, and resources being applied.

And I think sometimes there is a tendency for big companies to think, oh, I can solve this problem with the application of money. And sometimes it's not just money that's needed, it's actually people and skills. And if it was easy for people to employ people with that money, then you know, everybody's life would be easier.

Rich: Yeah, absolutely. I do wanna point out that, of course, you know, we don't know what's gonna happen with the VMware thing. You know, um, there are some not great precedents from, you know, previous acquisitions this company has made. And so I think people do have a legitimate reason to be concerned, but, um, we don't know for sure what's gonna happen.

Um, also I mentioned on Twitter that, you know, if you're a listener, you work at VMware, you're putting yourself out there for other opportunities, you know, if I can help in any way, please feel free to hit me up. If I can connect you to someone at a company that you're interested in, or anything like that.

Um, Liz, I have one sort of a two part listener question for you from Jamie Howard. And this is a Cilium thing. Um, so the question is, with, uh, with Cilium introducing its own service mesh, will the Istio integration be maintained further?

Liz: So we have an existing Cilium integration with, uh, with Istio as a control plane, which, uh, in its sort of, you know, current form supports the sidecar model. And the advantage of that is really to do with shortening the network path. With the sidecarless model, it will shorten even further. Now what we are really doing with Cilium service mesh is primarily about the data plane.

It doesn't necessarily matter to the control plane abstraction how many proxies there are. You know, you don't configure an Istio resource that says, you know, please add my Envoy proxy here, here, and here. It's implicit in the fact that you are injecting Istio, but as a person configuring those Istio CRDs, you are sort of a step removed from where the proxies are.

And with Cilium service mesh, we're kind of agnostic to the control plane. So it will really come from what do users want? What do people want to contribute? What do people want to pay for? And what is their preferred control plane for configuring that data plane?

So I think to some extent, whether that sidecar model continues to be supported or proves unnecessary will be part of what comes out in the wash, as people start to deploy Cilium service mesh and, well, I mean, as we and people across the project are working on integrations with different service mesh control planes. So for example, we've seen quite a lot of people who, you know, don't have super sophisticated use cases, who are perfectly happy with a Kubernetes ingress and they don't need, you know, a whole load of other sort of services or service manipulation. Um, and that's by no means, you know, the whole of service mesh solved, but we see a mapping, whether it's Service Mesh Interface, Gateway API is certainly something that we expect we will map into Cilium service mesh, um, Istio control plane.

If there is demand for configuring the sidecarless data plane with Istio CRDs, then, you know, implementation will follow. It's really a case of what user requirements are.

Rich: So you're saying that would maybe be more like the Envoy model where there's, you know, not a sidecar per pod?

Liz: Exactly. So you could have the sidecar per node, but configured through Istio CRDs, with a kind of reconciliation between those CRDs and the Envoy configuration on the node.

Rich: Okay. Well, I guess we'll see. That's,

Liz: Somebody just has to type it in. They just have to type it in and it'll be fine.

Rich: No, I mean, I think it makes sense. And I think that, um, obviously listening to the community, you know, and what people really want is very important when you're building projects. I think that, um, there are enough Istio users out there, I'm guessing, that it seems likely that people would wanna put the effort in to make that happen.

Liz: Yeah. I think that's, I mean, it's certainly the initial set of beta testers, and I think they're self-selecting in some sense, you know, they've been attracted by Cilium, but we were actually quite surprised at the amount of enthusiasm there was for implementing SMI as the control plane. People are saying that has what we need.

And, uh, you

Rich: Yeah.

Liz: So yeah, it will be driven by the community.

Rich: Okay. Uh, well, that's all that I have for you, Liz. Um, I'm super glad that we were able to have this chat. Um, you're a pleasure to, to talk with, um, is there anything you'd like to mention to folks as we're signing off here?

Liz: I feel it would be remiss of me not to mention that, um, I wrote a little report that's been published by O'Reilly, and you can download a copy for free from the Isovalent website, or well, for the cost of your contact details. And it tells you what eBPF is. So if you're intrigued about eBPF and want to learn a little bit more, then go download a copy of that and let me know what you think.

Rich: Is that what people were lining up to get signed at KubeCon?

Liz: Yeah, that was a lot of fun. I love signing books cuz you get to meet people, even if it's just for a couple of minutes, and sometimes they tell you, you know, some really nice little nugget about what they're interested in, and it's just so rewarding. Really nice.

Rich: Yeah, I will definitely put a link to that in the show notes. And I'll also link to your Twitter, you're @lizrice there. And yeah, thanks again for coming on. It's been, uh, great to chat with you.

Liz: My pleasure. Thanks for having me.

Rich: Kube Cuddle is created and hosted by me, Rich Burroughs. If you enjoyed the podcast, please consider telling a friend. It helps a lot. Big thanks to Emily Griffin who designed the logo. You can find her at daybrighten.com. And thanks to Monplaisir for our music. You can find more of his work at loyaltyfreakmusic.com. Thanks a lot for listening.
