Oxide and Friends

Andrew Stone of Oxide Engineering joined Bryan, Adam, and the Oxide Friends to talk about his purpose-built replay debugger for the Oxide setup textual UI. Andrew borrowed a technique from his extensive work with distributed systems to build a UI that was well-structured... and highly amenable to debugging. He built a custom debugger "in a weekend"!

Some of the topics we hit on, in the order that we hit them:
The (lightly) edited live chat from the show:
  • MattCampbell: I'm gathering that this is more like the fancy pseudo-GUI style of TUI, which is possibly bad for accessibility
  • ahl: we are also building with accessibility in mind, stripping away some of the non-textual elements optionally
  • MattCampbell: oh, cool
  • ahl: Episode about the "Sidecar" switch: https://github.com/oxidecomputer/oxide-and-friends/blob/master/2021_11_29.md
  • MattCampbell: ooh! That kind of recording is definitely better for accessibility than a video.
  • uwaces: Were you inspired by Elm? (The programming language for web browsers?)
  • bcantrill: Here's Andrew's PR for this, FWIW: oxidecomputer/omicron#2682
  • uwaces: Elm has a very similar model. They have even had a debugger that let you run events in reverse: https://elm-lang.org/news/time-travel-made-easy
  • bch: I’m joining late - 1) does this state-machine replay model have a name 2) expand on (describe ) the I/o logic separation distinction?
  • ahl: http://dtrace.org/blogs/ahl/2015/06/22/first-rust-program-pain/
  • zk: RE: logic separation in consensus protocols: the benefit of separating out the state machine into a side-effect free function is that it allows you to write a formally verified implementation in a pure FP lang or theorem prover, and then extract a reference program from the proof.
  • we're going to the zoo: lol i’m a web dev && we do UI tests via StorybookJS + snapshots of each story + snapshots of the end state of an interaction
  • ig: At that point you could turn the recording into an “expect test”. https://blog.janestreet.com/the-joy-of-expect-tests/
  • we're going to the zoo: TOFU but for tests 🥰
  • uwaces: Are you at all worried that you are replicating the horror that is the IBM 3270 terminal? I have personal history programming on z/OS where the only interface is a graphical EBCDIC 3270 interface. The horror is that people write programs to interact with the graphical window (assuming a certain size).
  • ahl: https://docs.rs/serde/latest/serde/#data-formats
  • ahl: SHOW NOTES Bryan as "semi-elderly" engineer
  • MattCampbell: didn't Bryan write a blog post on this?
  • MattCampbell: http://dtrace.org/blogs/bmc/2008/11/16/on-modalities-and-misadventures/
  • uwaces: https://www.replay.io
  • ahl: https://devtools.fm/episode/9
  • ahl: e.g. https://altsysrq.github.io/proptest-book/intro.html
  • we're going to the zoo: https://github.com/AFLplusplus/LibAFL
  • ig: Are you using proptest, quickcheck, or something else?
  • nickik: This really started with Haskell https://hackage.haskell.org/package/QuickCheck Its also cool that it does 'narrowing' meaning it will try to find an error, and then try to generate a simpler error case.
  • endigma: how different is something like this from what go calls "fuzzing"
  • Riking: Fuzzing does also have a minimization step
  • we're going to the zoo: https://github.com/dubzzz/fast-check
  • Riking: Property-based testing tends to be structured differently in philosophy, while fuzzers are more aligned to "give you a bag of bytes"
  • nickik: http://www.quviq.com/products/erlang-quickcheck/
  • endigma: yeah I can tell its a different structure, but the overall goal seems similar
  • we're going to the zoo: they are nonexclusive approaches to testing
  • papertigers: I think Kelly was doing a bunch of tests at Joyent based on quick check and prop test. First time I encountered it
  • we're going to the zoo: libafl provides a #[derive(Arbitrary)] macro that will provide the correct values for a struct
  • uwaces: Lots of stuff in Rust existed first in Haskell (build.rs, quote!, Derive macros, Traits, etc.)…
  • nixinator: https://tenor.com/view/%C3%B3culos-escuro-exterminador-terminator-arnold-schwarzenegger-gif-14440790
  • we're going to the zoo: “what do these means” depends on who you ask lol
  • we're going to the zoo: fast-check is 🔥 for TypeScript
  • endigma: if the tested function is deterministic and the test is testing arbitrary input and testing against the result to be derivative in some way of the input function by some f(x), don't you end up re-implementing the tested function to provide the expected result? how does the author choose what properties of a system to test without falling into a "testing the test" pit?
  • we're going to the zoo: Rust: “Here comes the Haskell plane!”
  • nixinator: Isn’t rust == oxidation
  • endigma: yes
  • endigma: in a scientific sense
  • nixinator: Iron oxide 🙂 lol
  • nixinator: Very good!
  • GeneralShaw: Is prop test a way of formal verification? Is it same/different?
  • ahl: https://dl.acm.org/conference/aadebug
  • ig: I mean, Haskell is an academic research language at it’s core. It naturally is going to try new things and try and push the envelope, that’s what many of the core developers use the language for.
  • uwaces: Not all of the Haskell ideas are good :). Rust's thesis when it started was “let’s take the good boring ideas that are >20 years old and leave the exciting new ones out”. Haskell is all about the exciting new ideas that might be bad (Lenses, lazy evaluation, etc…)
  • ig: Rust had Servo as it’s driving force in the early stages as well, so was choosing features that made implementing Servo easier.
  • endigma: the parallel between haskell and elixir is interesting, elixir being "the other functional language" that exists in the sort of limelight
  • nickik: Not really, formal verification proves that it satisfied some condition, property based testing basically just throws a bunch of stuff against your code and tries to break it.
  • ElFurbe: "score some horse"
  • ElFurbe: Outstanding
  • nickik: In Switzerland at least horse meat is totally normal, just buy it in a standard boring store.
  • rolypoly: Ballmer curve, but with horse, and for debugging.
  • uwaces: On that topic Rust has some exciting usability developments for Bounded model checking: https://github.com/model-checking/kani — proving correctness of property tests.
  • ig: Okay I tuned out for a minute and now I’m wondering if I’m having a fever dream.
  • GeneralShaw: Oh that sounds like Constrained random tests, but somehow takes the properties as the constraints
  • endigma: debugging -> stroke -> horse meat
  • nickik: Good horse: https://www.migros.ch/de/product/mo/3851110
  • Nahum: The word he was looking for was probably "elder"
  • ig: Event sourcing is also in that same CQRS family.
  • ig: In terms of google able terms
  • endigma: isn't cqrs command query separation
  • ig: Event sourcing becomes harder when you need to do GDPR right to amend and right to be forgotten.
  • uwaces: Yay for struct opt!
  • ig: Thanks Andrew! Great episode.
  • nickik: Datomic style databases allows you to have traditional-ish database but you can also subscribe to the event log. To comply with GDPR you can use 'Excision'. That will delete the data but it remember the transaction that did the removal.
  • endigma: Datomic looks really interesting, never heard of this style of db before, sort of like the git db
  • ig: Yeah, and if you didn’t build that in from the start you might end up needing an O(n) processing of the event log to excise.
  • ajs: Kani looks super interesting
  • ajs: I've had it on the backlog to play with for a while
  • ig: Most DBs have a commit log, most don’t expose it externally. Event sourcing reimplements a lot of what’s in the commit log.
  • nickik: Maybe more practical then full datomic, datascript (https://github.com/tonsky/datascript) is datomic in a browser. Good store for React applications to build on.
  • nickik: Eventsourcing can scale to much larger size then you can handle with one Datomic style DB. But unless you really need it, its kind of a pain.
  • endigma: is there anything preventing implementing it as a data structure ontop of a more conventional db?
  • nickik: Datomic allows you to add arbitrary data to your transactional log, so for example you can attach to a transaction that it was done by user-x, through API version 2.2 and so on. That's quite neat.
  • nickik: That's exactly what datomic does, its designed to be read-scalable on big key value stores, but it works fine on SQL Databases! See: https://docs.datomic.com/on-prem/overview/storage.html
  • endigma: oh thats pretty cool, i suppose the datom model would work well with hyperscale k/v
  • endigma: from what i'm reading datoms are a sort of tuple though, k/v doesn't normally index by more than one k
  • endigma: i wonder how batching lookups works to get the k/v of a particular entity
  • endigma: or if they all just happen separately and its optimized for that
  • endigma: Although I'm thinking like etcd
  • uwaces: No. It just automates example creation. The same general framework can be used to do formal verification re:Kani and bounded model checking.
  • Cyborus: ah, it seems i'm a bit late
  • nickik: It does not use the K/V store directly. It puts large batches into one V. Then they have an external index that is a bunch of trees, and the leaves are these batches of datoms. This has some information: https://tonsky.me/blog/unofficial-guide-to-datomic-internals/ or check out Rich Hickey's talks on YT.
  • endigma: Sure, so more similar to the goal of fuzz tests than unit tests.
  • we're going to the zoo: https://www.bjaress.com/posts/2021-07-03-fuzz-testing-vs-property-based-testing.html a reasonable approach will use both a naive and structured generative test
  • we're going to the zoo: a fuzz test is just a property test that claims “for any possible input, the program should only output the types i expect / a known exception”
  • endigma: if thats correct it makes a lot of sense why you might want to make a framework to write these sort of assumptions, perhaps something like go-testdeep
  • endigma: (sort of)
  • endigma: https://earthly.dev/blog/property-based-testing/
If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Speaker 1:

I will, let's see, maybe describe how we got here a little bit. So, Adam, I know we have mentioned Demo Friday here in the past, and our credo around demo-based development here at Oxide.

Speaker 2:

Yeah. I mean, I feel like bordering on extensively. But, yeah, like, demo day is a prominent feature.

Speaker 1:

And it is deliberately unstructured, so it's a forum for people to demo stuff for their peers. Our kind of view is that, I mean, you get the biggest shot in the arm, I think, from your peers. You get big shots in the arm, obviously, from customers and from folks that are looking at your work, but especially your peers. It's a big shot in the arm when you can show your work off to your peers because they can appreciate some things that can be hard to appreciate. So we've had a lot of fun with this every week, and it's always been very uplifting and interesting.

Speaker 1:

You kinda never know what's gonna happen. I'm always worried we're gonna have demo Friday and nobody shows up, but I think it's happened once. I think that actually has Like, alright. No demos today. End of demo Friday.

Speaker 1:

But, basically, it's, there's always something going on. And so, Andrew, I did not know you had this one coming at all. And, could you describe first the body of software that you that you have been working on, kind of what the this body of software does, and then maybe we'll get to describe a little bit what the actual demonstration was.

Speaker 3:

Sure. So, I mean, I'm with you. I actually didn't know this demo was coming about a week and a half ago, and then I started working on it. But it was planned, as possible from the beginning. And so, like, what what we have, is this giant rack, right, that users are gonna plug into their data center and they need to connect it to their network, and they need to connect it to an identity provider, and they need to do all sorts of initial configuration so the the control plane will actually boot and be reachable, from their internal network.

Speaker 3:

And so they can deploy VMs and do whatever else they need to do on the rack. There needs to be some if the control plane is not up, there needs to be some way to actually log in and configure the rack, in the first place so we can set up the control plane. And so Adam and I and another colleague of ours, John, we sat down, last February or so and started working out, well, what's what's that configuration actually gonna look like? And we talked about using, you know, various ways to do it with an API, through the web, just have a CLI. We basically landed on some sort of CLI, that would where a user would plug their laptop into the technician port, and that would allow them to do the low level configuration.

Speaker 3:

So go ahead,

Speaker 1:

How do we configure the well, how do we configure the network without the network? I mean, it feels like we we've got a bit of an inception problem. So how do we just physically, how do we connect into this thing? Yeah.

Speaker 3:

So I am probably not the best person to explain how the switch works, but there are some, there's there's basically 3 sets of ports. There's the front facing ports, the rear facing ports, and these and 2 technician ports. And so the front facing ports are the ones that connect, to the user's data center. Right? And, like, that's what gives them their fast 100 gig networking.

Speaker 3:

And then the rear facing ports are the ones that I believe are just for inside the rack to allow slides to talk to each other over the rack. And then there's these 2 technician ports, and these have, you know, regular old RJ 45 cables, you know, endpoints you can connect, an Ethernet cable into. So that's, those basically open up into some sort of environment, that allows you to configure do something to the rack. Right? And, they plug in those those, technician ports, are reachable on the management network, I believe.

Speaker 3:

Or is it the bootstrap network?

Speaker 2:

I'm It's the management network. Yeah. And and we need to do a a whole show on the management network, because it's I think it's pretty outstanding. But we've talked in the past about the the data network, like how all the nodes talk and how it talks you know, how we do, you know, multipathing in in a in a smart way and so forth. But there's another network, and, actually, there's yet another network, but the management network to control all these nodes out of band.

Speaker 2:

And that's the network that this technician port is plugging into. So, the rack rolls into your data center. You plug in with an RJ45 connector to your laptop, do, you know, LLDP to discover the link-local address and SSH in, and then we're in Andrew's body of software.

Speaker 3:

Yeah.

Speaker 1:

And this is the this is the body of software that's gonna basically configure this thing, and it's gonna be effectively, it's a terminal it's gonna be a terminal UI. And, you know, these are kind of these are the apps that often these are essential, but can often get neglected. How dare you say it, Andrew?

Speaker 3:

Yeah. So I certainly neglected it. I didn't think I would be the one to write it. I have many other pieces of code required to run on the rack. And I thought at one point, like, we're talking about making a web UI or something, but I figured there'd be somebody with some sort of user experience knowledge who would be building this.

Speaker 3:

But apparently, that is not the case. Although to be fair, I have worked with an NGR designer to correct some early mistakes.

Speaker 2:

It it it should be said that, you know, we we chose a CLI because we really wanted it to work for everyone in every environment, and we didn't you know, this is you're setting up just the earliest earliest small amount of configurations so that you can plug into the rest of the network, and we just wanted to keep it simple. From from a sort of connectivity perspective and the technologies involved, we didn't want people, like, debugging JavaScript or whatever. We just want to keep it simple. And I say

Speaker 1:

not browser deniers is what you're saying.

Speaker 2:

No. Totally. We're we're we're browser maximalists for sure, but we just wanted something simple to get to the browser.

Speaker 1:

That's right.

Speaker 2:

I would also say, you know, Andrew, you talk about this being neglected software. You know, what I sort of imagined was the most basic user interface conceivable of, like, you know, something prompting you 1, 2, 3, or 4. But Andrew has discovered an entire universe of TUIs, of, like, text based user interfaces, and has built something unbelievably beautiful.

Speaker 3:

Yeah. So it was very much by accident because, Adam, I I figured it would be just like what you said as well, especially if I was gonna be the one building it. Like anything else, you you find, you know, an interesting niche and you start digging in and having fun. And, boy, was it a lot of fun. When you start having to make things real and get to the production engineering side, you know, things get can get less fun.

Speaker 3:

But it's certainly a fun thing, to to dig into. So so we do have, there's a text, a text user interface, and we talk to it through this technician port. And so there's a there's a whole subset of software that runs here. Right? And so where is the software running?

Speaker 3:

It's running on what we call the scrimlet, which is a server sled that is directly connected to the switch. And there is some magic routing, which I am not gonna discuss because I can probably not explain it well, but that essentially allows the technician port plug-in, all that data, to come over, out a port on the technician switch. And so when a user does SSH in, it ends up in illumos land and the SSH command runs our shell, which is called Wicket. So it's a captive shell. It's a text user interface.

Speaker 3:

It's built on top of tui-rs and crossterm in Rust. And it presents the user with essentially, like, a graphical interface inside the terminal. So they're SSHing in and this is their login entry point to allow them to configure the system. And so should be

Speaker 1:

said that just in general, these are the kind of interfaces that Rust makes really easy and delightful to write. Honestly, there's a good, great support. It's been really fun to write these kind of applications.

Speaker 3:

Yeah. Totally. And so, like, credit where credit's due, like, I would not have even remotely thought of doing anything like this, but we have a manufacturing station that was also built by Josh Clulow, and he implemented a text user interface for it. And I was literally about to start working on this, and it was a Demo Friday thing. He demoed it, and I was like, well, why am I gonna do, like, a CLI when I can have this beautiful captive shell?

Speaker 3:

Because Josh does have some aesthetic sense, and, it was nice. And so I initially copied him, which it got changed later for various reasons.

Speaker 1:

Okay. I cannot resist pointing out an implementation detail of Josh's manufacturing software, which, honestly, on the one hand, no customer sees. On the other hand, it runs on the manufacturing line. It's very important. The font that he uses is the same font that Sun had on the OpenBoot PROM.

Speaker 1:

And, I mean, Adam, I felt like I I teared up when I saw that. I was just like,

Speaker 3:

we're home. We're home. Throwback. Yeah. Totally.

Speaker 3:

Awesome throwback. Anyway, sorry. Andrew No. No. No.

Speaker 3:

It's like it's super it's super impressive. And, like, at that manufacturing station, like, Josh, I actually talked to him and I I was like, walk me through your your mental state and, like, what is your point of view on this type of software? Because, like, if you know Josh, yes, and that, like, he definitely has a point of view and, like, specifics specific reasons for why he did something. And so I wanted to understand it and see, like, if I wanted to go down exactly the same path or something different. And so I did do something slightly different, but it was more because I knew this was going into, going to live on the rack, and it wasn't gonna be just me operating on it.

Speaker 3:

And so I didn't wanna, like, write a terminal library from scratch. I want to reuse some available widgets, and there are third party widgets. We're using some of them. So anyways, we have this text user interface, and it is mostly stateless. And what I mean by mostly stateless is that it talks to a downstream service called wicketd, and that is what talks to other services on the rack, including our management gateway service, which allows us to talk over the service processors on each switch and configure the rest of the rack.

Speaker 3:

And I don't know if Bryan wants to get more into the service processors.

Speaker 1:

Well, so at the moment, like, now I wanna jump actually kind of forward in time to your actual demo. So this is kind of all background to appreciate. And you had built Wicket and had demonstrated it in the past and had made great progress on that in terms of, kind of, the initial application, effectively, to configure the rack. And so now maybe fast forward to the demo. Also, the way you did it, you little sneaky guy.

Speaker 1:

It was very you I think you got an inner showman in you, Andrew. I was very, in terms of the way you demoed it. So maybe you wanna describe the demo a little bit.

Speaker 3:

Yeah. So unlike PT Barnum, we at Oxide are transparent in what we are selling, but That is not always true for internal demos. And, like and so, I I

Speaker 1:

feel that transparency does not apply in the first two minutes of a demo of a demo Friday.

Speaker 3:

Yeah. And apparently, yeah. Yeah. And, you know, secrecy also for keeping things fun. Those are only for fun reasons, not for any nefarious reasons, I must say.

Speaker 3:

Okay. So I started the demo, and, like, this is, like, the third time I've demoed Wicket for various things. One was after the initial prototype. Two was after a redesign, when, like, we could do some emergency updates where we could actually update service processors and RoTs through some software. So, like, that was the second demo.

Speaker 3:

And so the third one, I was like, okay. I have some new functionality I wanna show you. And so I essentially said, like, watch closely, watch what I'm doing, watch the screen. And everybody's watching and probably going, okay.

Speaker 3:

Like, what is this? I see what's going on here. It it makes a lot it it makes sense, but, like, I've seen this before. You're just walking through the UI, and then I just lifted my hands up and the UI kept moving. And so, it turns out what was going on was a recording, but it wasn't a screen recording.

Speaker 3:

It was a recording of a live Wicket session that I was able to capture with a magic key press that I'm sure Bryan's cat could learn. And that recording was being replayed. And then I went to see how it works. Yeah.

Speaker 1:

Okay. Well, that's not true.

Speaker 3:

K. Well, my memory is terrible, so whatever.

Speaker 1:

But you you didn't go through no. Then it's, like, not and then, of course, everyone's, like, wait a minute. What? And then you'd be proceeded to show this debugger that you'd written. Do you not recall the It's right.

Speaker 3:

I mean, so so the debugger is what does the replay. So okay. So I did, I did But then you also

Speaker 1:

you're going backwards.

Speaker 3:

Yeah.

Speaker 1:

Go

Speaker 3:

ahead. It did not go... the debugger is not reversible.

Speaker 1:

For the go backwards. Those screw.

Speaker 3:

It did not go backwards. It can be made to go backwards. I have built things that have gone backwards in the past. We are not using immutable, like, functional data structures. And so it is very cheap to capture the state or I mean, it is very expensive in the current system to capture all the state because we're making a copy of it all in memory, for every step through a UI screen.

Speaker 3:

Really not the way you wanna build it. And I'll I'll talk about the internals. But, yeah, what I showed was that you can reset this. You can then, kick off the you can kick off the player recording. You load the recording, through, like, a file, and it's a there is, like, a little CLI.

Speaker 3:

That's probably what Adam and I were envisioning for, like, a normal captive CLI, built on something called Reedline, which is a tool from Nushell. And that just, like, gives you interactive, command-line-driven abilities. And so what I showed was that you can load a recording from a file, hit play on it, and then as the recording is playing, you can also pause it and you can resume, and you can show, like, what the current event that's being executed is and then the internal state of the system.
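The load, play, pause, resume, and inspect controls described here can be sketched in a few lines of Rust. This is only a minimal illustration under stated assumptions: `Event`, `State`, and `Player` are hypothetical names invented for the example, not types from Wicket or its debugger, and the recording is an in-memory list rather than a file.

```rust
// Hypothetical event and state types, invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Key(char),
    Tick,
}

#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    typed: String,
    ticks: u64,
}

impl State {
    // Pure state transition: no I/O happens here.
    fn update(&mut self, event: &Event) {
        match event {
            Event::Key(c) => self.typed.push(*c),
            Event::Tick => self.ticks += 1,
        }
    }
}

// A hypothetical replay player that feeds recorded events into the state.
struct Player {
    events: Vec<Event>,
    next: usize,
    paused: bool,
    state: State,
}

impl Player {
    // "Load" a recording; the player starts paused at the first event.
    fn load(events: Vec<Event>) -> Self {
        Player { events, next: 0, paused: true, state: State::default() }
    }

    fn resume(&mut self) {
        self.paused = false;
    }

    fn pause(&mut self) {
        self.paused = true;
    }

    // The event that will be applied next, if any remain.
    fn current_event(&self) -> Option<&Event> {
        self.events.get(self.next)
    }

    // Apply one event unless paused or finished; returns true if one was applied.
    fn advance(&mut self) -> bool {
        if self.paused {
            return false;
        }
        match self.events.get(self.next) {
            Some(event) => {
                self.state.update(event);
                self.next += 1;
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut player = Player::load(vec![Event::Key('h'), Event::Tick, Event::Key('i')]);
    player.resume();
    player.advance();
    player.pause();
    // While paused, inspect the pending event and the internal state.
    println!("next event: {:?}", player.current_event());
    println!("state: {:?}", player.state);
    player.resume();
    while player.advance() {}
    println!("final state: {:?}", player.state);
}
```

Because the player only ever calls `State::update`, pausing is just a matter of not feeding the next event; nothing about the state machine itself has to know it is being debugged.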

Speaker 1:

At that point It's about now when Jordan is, like, did I miss what what debugger are we talking? Did he did I, like, miss something? I mean, it's like, no, no, no one knows anything about this. But this so everyone is seeing this for the very first time. You can't like it over.

Speaker 1:

So it was it was pretty neat. So, Andrew, I think then you, at that point, you you relented and explained kinda how you had implemented this that allowed for the this kind of replay debugging.

Speaker 3:

Yeah. So this is a technique that is fairly widely known, but unfortunately, like, not widely known enough, I would say. And it's, it's really I mean, there there's many different terms for it and and different techniques, but it's essentially, if you can maintain if you can separate your IO your your your IO from your state from your mutable state, you can record you can basically treat all mutations of the state as individual operations, and then you can record those operations. And you can replay your entire history by just, passing the operations through the input as the input to a state machine. So you have this mutable state and the state takes a message essentially, and it updates its own internal state and possibly output something some some output messages.

Speaker 3:

Right? And so, that's how Wicket is built, essentially. There's an event loop. There's a global state, and then there are a bunch of messages coming in. And all that makes this possible is that those messages are totally ordered, and we're able to serialize them and save them to a file.

Speaker 3:

And then with that, you can take a live dump, and replay all those events in order.
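As a rough sketch of the record-and-replay idea described above: every state change goes through one `update` function, each event is serialized as it is applied, and replaying the saved events through a fresh state reproduces the live state. The `Event` and `State` types and the one-line-per-event text encoding are assumptions made for this example; a real implementation would more likely use a serialization library and Wicket's own types.

```rust
// Hypothetical event and state types; not Wicket's actual definitions.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Key(char),
    Tick,
}

#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    typed: String,
    ticks: u64,
}

impl State {
    // The only place state is mutated: no I/O, no clocks, no randomness.
    fn update(&mut self, event: &Event) {
        match event {
            Event::Key(c) => self.typed.push(*c),
            Event::Tick => self.ticks += 1,
        }
    }
}

// Encode one event per line so a recording could be written to a file.
fn encode(event: &Event) -> String {
    match event {
        Event::Key(c) => format!("key {}", *c as u32),
        Event::Tick => "tick".to_string(),
    }
}

// Decode one line of a recording; malformed lines yield None.
fn decode(line: &str) -> Option<Event> {
    let mut parts = line.split_whitespace();
    match (parts.next()?, parts.next()) {
        ("tick", None) => Some(Event::Tick),
        ("key", Some(n)) => char::from_u32(n.parse::<u32>().ok()?).map(Event::Key),
        _ => None,
    }
}

// Rebuild the state by feeding recorded events, in order, into a fresh state.
fn replay(recording: &str) -> State {
    let mut state = State::default();
    for event in recording.lines().filter_map(decode) {
        state.update(&event);
    }
    state
}

fn main() {
    let live_events = vec![Event::Key('o'), Event::Tick, Event::Key('x')];
    let mut live = State::default();
    let mut recording = String::new();
    for event in &live_events {
        // Record the event, then apply it to the live state.
        recording.push_str(&encode(event));
        recording.push('\n');
        live.update(event);
    }
    // Replaying the recording reproduces the live state exactly.
    assert_eq!(replay(&recording), live);
    println!("replayed state: {:?}", replay(&recording));
}
```

The replay only works because `update` is deterministic: given the same events in the same order, it always produces the same state.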

Speaker 2:

Yep. Andrew, I don't know if this is ascribing too much foresight to you, but I really got the sense when you were talking about this for the first time that you sort of had this in mind from the beginning. Right? That as you were designing Wicket, you were thinking about maybe not this specific debugger, but debuggability in its mastermind. Okay.

Speaker 3:

You have to do this ahead of time. Right? Like, if you start so for instance, like, Rust makes this it's easy to build this. Like, it's easy to build many things in Rust, but you can't, like, do it after the fact. You you have to separate your IO, like, in any language.

Speaker 3:

You have to, like, you have to do this purposefully. And so if you're writing, if you started out with a bunch of, like, Tokio tasks, right, and those tasks were receiving messages and mutating local state and, like, spawning things and Done. The state was spread all over the place, you could never do that. Right? You can never Yeah.

Speaker 3:

You can never or totally order those operations. And so I knew from the beginning that I wanted to totally order things because, a, it makes the system easier to understand. But, like, that's that's what a UI is. Like, UIs have been built this way for a long time, as far as I know. Right?

Speaker 3:

At least I mean, it's possible that I've only read, like, UI papers from the seventies and eighties. But, like, essentially, there's some sort of main event loop and interactions are getting pumped through that. I think this is, like, somewhat how, like, GTK works and other things like that. The main thing is that you separate all your state mutation from, like, any nondeterministic operations. So one other thing you have to do to make this work is that, in this scenario, there's 2 main cases of nondeterminism.

Speaker 3:

3. So there's events flowing from the downstream system such as, like, a sled gets pulled and the event fires up, right, saying, okay. Here's an alert. This sled is no longer there. You wanna update the UI.

Speaker 3:

There's also, a user pressing a key press to say do something, and that should trigger a redraw on the UI. And then there's, the third thing, which is just time. Right? If we wanna animate things, and there are some animations in this UI, we need to have, like, a regularly timed tick operation. Like, you don't wanna just have the main loop, like, setting a timer and then processing that concurrently because now those ticks are not ordered with respect to the other events coming into the system.

Speaker 3:

And so instead, we have this channel, really a queue, which allows us to totally order all those events. So anything coming in, user key presses, downstream events from the from the rack, and timer events are all normalized into a Rust enum and then popped on this channel. And so now you have a total order. That total order just gets replayed through the state machine, and the state machine looks at what's happening and says, okay. What do I need to draw now?
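A minimal sketch of that normalization step, using only the standard library: three independent producers (standing in for key presses, downstream rack events, and timer ticks) all send variants of one enum into a single channel, and the consumer sees them in one total order. The `Event` variants and the `collect_events` helper are hypothetical, and plain threads stand in for whatever asynchronous machinery the real system uses.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical normalized event type covering all three sources.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Key(char),
    Downstream(String),
    Tick,
}

// Spawn three producers and return every event in the order it was queued.
fn collect_events() -> Vec<Event> {
    let (tx, rx) = mpsc::channel::<Event>();

    // Simulated user key presses.
    let key_tx = tx.clone();
    let keys = thread::spawn(move || {
        for c in ['a', 'b'] {
            key_tx.send(Event::Key(c)).unwrap();
        }
    });

    // Simulated alert from the downstream service.
    let rack_tx = tx.clone();
    let rack = thread::spawn(move || {
        rack_tx.send(Event::Downstream("sled removed".to_string())).unwrap();
    });

    // Simulated timer: ticks go through the same channel as everything else,
    // so they are ordered with respect to keys and downstream events.
    let timer = thread::spawn(move || {
        for _ in 0..2 {
            tx.send(Event::Tick).unwrap();
        }
    });

    keys.join().unwrap();
    rack.join().unwrap();
    timer.join().unwrap();

    // Every sender has been dropped, so this iteration terminates.
    rx.iter().collect()
}

fn main() {
    // The interleaving may differ between runs, but within a run there is
    // exactly one order, and that order is what would be recorded and replayed.
    for (i, event) in collect_events().iter().enumerate() {
        println!("{i}: {event:?}");
    }
}
```

The single consumer is the important part: whatever order events arrive in, the state machine only ever sees one sequence, and that sequence is what gets saved.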

Speaker 3:

There are some optimizations there just to, like, not draw when we don't have to. Another idea I got from Josh, but it's essentially feeding all these things into a single threaded event loop. And there is Tokio in the works, but that IO handling, and whether it's asynchronous or synchronous, is totally extracted out from the main loop, which does the drawing. It allows us to replay that drawing in a debugger later on.

Speaker 1:

Yes. I mean, a bunch of important stuff there. One, I think just and you said this, but just to really underscore the point, like separating out your IO from your logic is really important, And it allows that because it there's just so many positive things that come from that. And, conversely, when your IO is convoluted with your logic, there are so many things that become just much more difficult. It's like having a 2 pack a day smoking habit.

Speaker 1:

It's just like, oh, man. There's gonna be so many health problems now that are just gonna be really hard. And if you could separate this out, there's a lot of positive things that come out of it. One question that was in the channel was, were you inspired at all by Elm, which is the programming language for web browsers. I don't know if you talked to Crespo about his affair with Elm, but, really interesting language.

Speaker 3:

No. So, like, you know, I am not a UI person. Like, this was, I I won't wanna call it my first, like, terminal UI or whatever. Like, I've worked on a few a few things, a few toys before, like, in college and elsewhere. But but no, I haven't done any real UI work and I was not super familiar with Elm or how it worked.

Speaker 3:

I learned about that actually while I was was doing this. I was I was trying to dig up papers, and I learned that this is very similar to the Elm architecture, but it was after I kinda made this decision. So, like, this decision really comes from my background, which is building, like, distributed systems and, in particular, like, consensus systems. And so if you wanna model, like, a consensus protocol, you have a bunch of independent actors and they all maintain local state, and you can't in reality, you can't get a total order of overall events because they're remotely located whatever. In a simulation, you can certainly do that.

Speaker 3:

And so if you model it the same way where each replica or actor in the system has its own state and is just getting fed events, no matter if those events are coming from the network or from a debugger, you can write property based tests and you can do all sorts of things to ensure that each independent system, and the system as a whole, behaves as you want it to. And then you can also pass output events and messages between those replicas, all single threaded. Right? You can basically run your whole system, your whole consensus network, in a single thread and total order everything and make sure that global invariants are upheld. Right?

Speaker 3:

This is not a real, real test. Right? You're leaving out the network. You're leaving out a lot of nondeterminism. But, like, if you can get that working, and the only way you can get an input into these actors is through a message, then you're pretty confident that, barring stupid bugs, your system's gonna behave the way you want it to behave.

Speaker 3:

And so I just built it like that. Like, I this is the I said this is the second time I built a a debugger like this. And the first time was for another project I was working on at VMware, that sadly got canceled for, you know, unknown reasons.

Speaker 1:

I'm sure, like, really awesome well founded reasons had nothing to do with

Speaker 3:

Always a good decision. Yeah.

Speaker 2:

It always and articulated to everyone.

Speaker 1:

Yes. That's right.

Speaker 3:

Very clearly spelled out. Not I I have no confusion about it whatsoever.

Speaker 1:

And and there's no lingering resentment, so it it all worked out.

Speaker 3:

Yeah. So I started building that from yeah. No lingering resentment at all. I didn't not every other every other position I took after that was was not a slight disappointment. It's fine.

Speaker 3:

No. Don't get me wrong. My time at VMware was great for the most part, and I worked on a bunch of interesting projects. But the thing I was hired to work on lasted a much shorter time than I expected it to last. And so that project was something called Haret, and I was implementing Viewstamped Replication.

Speaker 3:

The Viewstamped Replication Revisited paper, from scratch, and I was doing it in Rust. And this was June of 2015. So if you know your timeline, Rust had just gone 1.0. There certainly was no Tokio.

Speaker 1:

I don't know. When is that relative to your blog entry on Rust? That is about the same time, isn't it?

Speaker 2:

Oh, jeez. Yeah. I gotta look that up. What what month in 2015 was?

Speaker 3:

June.

Speaker 1:

Yeah. That's that's

Speaker 2:

right around that time. Yeah. Right.

Speaker 1:

Right. My first introduction to Rust was reading Adam's blog entry, and, well, I was, like, okay. Well, in conclusion, I will never use this language. This sounds extremely painful. And Adam's last line was like, in conclusion, this is kind of interesting.

Speaker 1:

I really enjoyed this. And I'm like, did you read your own blog entry? But, I mean, I think that it was a painful time in a lot of ways. I mean, there were things that you could tell, when I arrived in 2018, had just gone through a big step function in terms of reducing pain, but I think in 2015, Rust was raw.

Speaker 2:

Yeah. Yeah. You had to really want it. I mean, there were lots of times when the compiler would tell you to go fetch rocks and then tell you those aren't the right rocks. Go get some other rocks.

Speaker 3:

That is very accurate.

Speaker 1:

Did the Rust compiler from 2015, did that prepare you for being, for being a parent of a toddler? Adam, just like, you know what? I was a Rust user in 2015. Like, you're gonna have to do more than this kid.

Speaker 2:

I would say better certainly better than any other programming language. I mean, your c plus plus is, like, close, but a a little more psychotic and and less sort of arbitrary.

Speaker 1:

Well, and this is where you also have, you know, certainly, the ownership model, which was new to me with Rust. New to

Speaker 2:

me as well.

Speaker 1:

And did

Speaker 3:

you just say c plus plus was less psychotic?

Speaker 2:

No. No. No. Pardon me. C plus plus more psychotic, kinda less arbitrary in its whims.

Speaker 2:

I'm just thinking about which language prepares one better for having a child. It's better.

Speaker 1:

Yeah. I mean, the memory unsafety and kind of rampant data corruption definitely helps you prepare for being a parent, I think.

Speaker 2:

And Yeah. Every child I think it's the short of it.

Speaker 1:

Exactly. But so, Andrew, you had been a Rust user in those early days. And one thing I would love to understand is building this debugger was super fast once you had this whole thing built out. It was, like, days is my understanding.

Speaker 3:

Yeah. It was about 2 days, which, like, it was just started. Right? It it's actually at a point today, so it's been a week ish. And now it has, you know, like, single stepping and, you know, you can do a few fancier things.

Speaker 3:

You can you can play at different speeds by manipulating the the tick time. Since we're just sending messages, we can choose how long to sleep when we see a tick message in the debugger.
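As a rough sketch of what variable-speed playback and single stepping might look like when ticks are just recorded messages: the `Event` type and both functions below are hypothetical, and the sleep function is passed in so the loop itself stays free of side effects. A real player could pass `std::thread::sleep` for `sleep`.

```rust
use std::time::Duration;

/// Recorded events (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Key(char),
    Tick,
}

/// Replay a recording, pausing `tick_sleep` before each tick.
/// A shorter `tick_sleep` plays the recording faster, a longer one slower.
pub fn replay<A, S>(events: &[Event], tick_sleep: Duration, mut apply: A, mut sleep: S)
where
    A: FnMut(&Event),
    S: FnMut(Duration),
{
    for event in events {
        if *event == Event::Tick {
            sleep(tick_sleep);
        }
        apply(event);
    }
}

/// Single stepping: apply only the event at `cursor` and return the new cursor.
/// At the end of the recording the cursor is returned unchanged.
pub fn step<A>(events: &[Event], cursor: usize, mut apply: A) -> usize
where
    A: FnMut(&Event),
{
    match events.get(cursor) {
        Some(event) => {
            apply(event);
            cursor + 1
        }
        None => cursor,
    }
}
```

Because the original wall-clock timing is not part of the recording in this sketch, playback speed is purely a property of the player.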

Speaker 2:

And has it has this been useful for for debugging some some problems that you've encountered while developing this?

Speaker 3:

Of course not. It's a total toy right now. No. Like, no. The real purpose so so part of the reason I did it was because I knew we could, and it was a weekend.

Speaker 3:

And I thought I could get it mostly written over the weekend, because I just I don't know. I was bored, and I didn't wanna do real work. I had started working on, like, a real thing for work and decided I wanted to do something else that was tangentially work related, which is honestly how most of my projects go. Like, most interesting things that come out of my head are very, like, immediately relevant to the work I'm doing, but not strictly on the path to shipping a product. And so that's that's how this popped out.

Speaker 3:

But, like, I had the idea that I wanted something like this. I wasn't sure it was necessary, but then I get excited. And so, like, the real driving motivation besides just me being bored and, like, really want to build this because I thought it would be cool to show visual state, was that we have to add we were talking about, so a coworker and I were talking about adding progress bars. And, also, like, we're trying to get this system running on the what we're calling the dog food rack. And I don't have full access to that rack at all points in time.

Speaker 3:

And so I was thinking, well, it'd be nice if I could just simulate all the events I want in the UI. So when I add animations or change styling, I don't have to walk through and be plugged into a real rack to do this. But I wanna look at it as a human. I'm not just looking to, like, run a test and see if it pass or fails. Like, I wanna see how it looks.

Speaker 3:

I was like, oh, would it be neat if I could just, you know, step through all the important parts of the UI that I care about while I'm making changes and then just run the recording after I make a change and see what it

Speaker 1:

looks like.

Speaker 2:

That's what

Speaker 1:

and you demonstrated that. That's what I actually love. It's like, alright. So let's, you know, let's see if we if we, you know, change this color or we change the way this thing looks or just you know, you're gonna make some cosmetic change, and we wanna, like, really understand quickly how that's gonna look. And now we can just go replay it and see and Yeah.

Speaker 1:

Get this, which I thought was really neat.

Speaker 3:

Yeah. And so, like, you can see. So I don't think I did that on the demo. I mentioned it. But, like, I did do it for Benjie the other day.

Speaker 3:

Benjie's our designer. But, like, you can, like, you can see where there's bugs. Like okay. So for instance, I created a little, like, style module, and it just it just contains some colors and, like, names them through functions. Right?

Speaker 3:

It just says, like, for plain text, make it off white, whatever our specific off white color is. And so, like, you change that. And if you change that, you're gonna see all the places you're using that as you run through the recording. And, like, when I first made the change, I noticed that it did not change where I thought it was gonna change, meaning that something was hard coded in that place. Right?

Speaker 3:

And so that was, like, really nice. That would be a really hard bug to catch. But, like, when you're just watching it and you see all the colors change, except there's one thing you expect to change that doesn't change, then you know you gotta go fix that spot. So, like, there's unexpected, like, serendipitous bugs revealed to you that you probably wouldn't even notice if you were slowly, like, manually walking through things unless you were looking for them. Well, so there there there we go.

Speaker 1:

That's your answer to Adam's question. How's it help find bugs? That sounds great.

Speaker 2:

Yeah. And and could these could these event streams also form the basis of, like, CI where you're, kind of taking some recordings and replaying them into the new code to validate that everything's working properly?

Speaker 3:

Yeah. So this was a larger idea I had that I'm not sure I wanna build in particular, or at least not right now. But you could so you can imagine that these events are coming through and they're manipulating what is essentially a bunch of rectangles on your screen. And you can write some tests that say, when these mutations come in, I expect this rectangle to look like this, or I expect the modal to be up. If a modal's up, and, like, the background should be this color.

Speaker 3:

And you can actually copy out the buffers. Right? And then check, after you run some operations, that the buffer mutations look as you expect. So you can totally automate, like, what the UI is supposed to be. With tui-rs, like, you have full access to the terminal buffer.

Speaker 3:

It's not that big because it's just, like, some some characters on a screen. Right? It's not a full, like, 4 k resolution thing. So, yeah, you you can you can write tests through that. And, like, I thought about it.
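To make the idea of asserting on buffer contents concrete, here is a minimal sketch. It deliberately does not use tui-rs: `Buffer` below is a hypothetical stand-in for a terminal buffer (a plain grid of characters), and the modal, the events, and the strings drawn are invented for illustration.

```rust
/// A tiny stand-in for a terminal buffer: a fixed grid of characters.
#[derive(Debug, Clone, PartialEq)]
pub struct Buffer {
    width: usize,
    height: usize,
    cells: Vec<char>,
}

impl Buffer {
    pub fn new(width: usize, height: usize) -> Self {
        Buffer { width, height, cells: vec![' '; width * height] }
    }

    /// Write `text` starting at (x, y), clipping anything off-screen.
    pub fn put_str(&mut self, x: usize, y: usize, text: &str) {
        if y >= self.height {
            return;
        }
        for (i, ch) in text.chars().enumerate() {
            let col = x + i;
            if col >= self.width {
                break;
            }
            self.cells[y * self.width + col] = ch;
        }
    }

    /// One row of the grid as a string, for assertions (y must be in range).
    pub fn row(&self, y: usize) -> String {
        self.cells[y * self.width..(y + 1) * self.width].iter().collect()
    }
}

/// Hypothetical UI events and state.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    OpenModal,
    CloseModal,
}

#[derive(Debug, Default)]
pub struct State {
    pub modal_open: bool,
}

/// Pure state transition: no drawing, no I/O.
pub fn update(state: &mut State, event: Event) {
    state.modal_open = match event {
        Event::OpenModal => true,
        Event::CloseModal => false,
    };
}

/// Drawing is a function of the state alone.
pub fn draw(state: &State, buf: &mut Buffer) {
    *buf = Buffer::new(buf.width, buf.height); // clear the previous frame
    buf.put_str(0, 0, "rack");
    if state.modal_open {
        buf.put_str(0, 1, "[confirm?]");
    }
}

/// Feed events through the state machine, then render one frame.
pub fn render_after(events: &[Event], width: usize, height: usize) -> Buffer {
    let mut state = State::default();
    for event in events {
        update(&mut state, *event);
    }
    let mut buf = Buffer::new(width, height);
    draw(&state, &mut buf);
    buf
}
```

A test can then call `render_after` with a recorded event sequence and compare rows of the result against expected text. In this sketch, drawing into a buffer that is too narrow clips the text rather than failing.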

Speaker 3:

And so this system is that we've talked about recordings and how you can play your recordings. Well, so tui-rs and Wicket allow you to resize the screen. So I operate in tmux a lot, and I resize the screen so it's, like, small like a laptop screen, like it will be in the data center. And when you resize, the placement of everything changes. And so if you were to, say, take your recording in a large screen, and then without doing any resizing run it on a small debugger window, it's probably gonna crash.

Speaker 3:

Just because the resize events are ignored right now in the debugger, and it's gonna try to place things outside your terminal window, like, if your window is too small. But that is just an artifact of not implementing that functionality yet. You could imagine that if we know we're running in tmux, we can automatically resize the window. And Rain, my colleague, they pointed out that you could actually use, like, a PTY and so have, like, a virtual screen that's always the right size and then process resize events that way. So that would be a cool way to do it, but I haven't dug into that yet.

Speaker 3:

And so right now, we're just using this to iterate fast on debugging. But it also is running in production or well, nothing is running in production now. But, like, it could be. It's enabled so that you can store up a certain number of events and then dump them, if you see an anomaly.

Speaker 2:

So, Andrew, if I'm if we're in the data center, you know, sweating next to the machine and something goes wonky, we can hit a keystroke, and it'll emit the stream of events to, like, bring back home to diagnose what went wrong?

Speaker 3:

Yeah. Yeah. Exactly. So it just dumps it to a text file, which you can configure a location. Oh, it's not text.

Speaker 3:

It's serialized. What am I using? CBOR? A number of reasons.

Speaker 2:

Oh, gotcha.

Speaker 3:

But yeah. Like, what there's another interesting component there, which is, like, okay. You have the stream events. Well, how long are you gonna be sitting at this screen messing with it and ticks going off? Like, you can't let that stream of events build up indefinitely.

Speaker 3:

You will take up a lot of memory, and your files will get quite big. And so, like, you typically wanna constrain the history of events. And again, because we have a state machine and we have access to our global state, we can take a snapshot of that state, when we have, you know, say, a max number of events or we've used a certain amount of memory. And that that allows us to keep, like, a a fixed memory usage while this is running and not kinda destroy the machine, which which lets us use it in production. And so, like, let's say you have a 1,000 events, and that's all you're allowed to record, and that'll give you whatever, 10, 15 seconds of recording time.

Speaker 3:

Once that initial 10, 15 seconds is up, you take a snapshot, you clear out your vector of events, and you start building up from there. And so what that means is when you take a dump, you're not gonna start from the beginning. You're not gonna start from the Oxide splash screen. You're gonna start from wherever, you know, within that last event window. But everything will still be consistent.

Speaker 3:

And so if you have a couple minutes worth, and you see a problem, and you wanna hit a few keys and see what happens, you can just do that and then know that, like, 30 seconds later when you when you hit the hot key to take the dump that you'll have the the anomaly that you wanna capture.
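The snapshot-and-truncate scheme described above can be sketched as follows. This is an illustration under assumptions, not Wicket's recorder: `Recorder`, `Dump`, and the fields of `State` are hypothetical names, the cap is assumed to be at least 1, and real code would serialize the dump to a file rather than return it.

```rust
/// Recorded events (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Key(char),
    Tick,
}

/// The global state that events mutate.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct State {
    pub ticks: u64,
    pub typed: String,
}

impl State {
    pub fn apply(&mut self, event: &Event) {
        match event {
            Event::Key(c) => self.typed.push(*c),
            Event::Tick => self.ticks += 1,
        }
    }
}

/// What gets written out when the user asks for a dump.
#[derive(Debug, Clone, PartialEq)]
pub struct Dump {
    pub snapshot: State,
    pub events: Vec<Event>,
}

impl Dump {
    /// Start from the snapshot and replay only the retained events.
    pub fn replay(&self) -> State {
        let mut state = self.snapshot.clone();
        for event in &self.events {
            state.apply(event);
        }
        state
    }
}

/// Keeps at most `max_events` events in memory (assumed to be >= 1).
pub struct Recorder {
    max_events: usize,
    snapshot: State,
    events: Vec<Event>,
    pub state: State,
}

impl Recorder {
    pub fn new(max_events: usize) -> Self {
        Recorder {
            max_events,
            snapshot: State::default(),
            events: Vec::new(),
            state: State::default(),
        }
    }

    /// Apply an event to the live state and record it. When the history
    /// is full, snapshot the state as it was before this event and start
    /// a fresh history from there.
    pub fn handle(&mut self, event: Event) {
        if self.events.len() >= self.max_events {
            self.snapshot = self.state.clone();
            self.events.clear();
        }
        self.state.apply(&event);
        self.events.push(event);
    }

    pub fn dump(&self) -> Dump {
        Dump { snapshot: self.snapshot.clone(), events: self.events.clone() }
    }
}
```

Memory stays bounded by one snapshot plus `max_events` events, and replaying a dump still lands on the current live state even though the recording no longer starts at the beginning.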

Speaker 2:

Well, and that seems huge for the kind of deep menued system that we're building there. If you wanna debug, you know, the 15th step that someone's taking, or not even debug it, but if you just wanna kinda keep on building on it, it's such a pain in the neck to have to start from the beginning each time. So having this thing fast forward you to the part that you're interested in, that seems like a huge time saver.

Speaker 1:

Yeah. Well, and I love just the ability, obviously, to debug postmortem. I mean, as our colleague Cliff said shortly after he arrived at Oxide, y'all are really into postmortem debugging. We are really into postmortem debugging.

Speaker 1:

But just the ability to to have that state that can then be sent elsewhere for purposes of understanding the system is just extraordinary. Andrew, something you said briefly that I wanna go back to because I think it's I mean, that this was in Rust allowed you to extend it in this new way or or develop this kind of extension from Citibank really quickly. And, I mean, I know yeah.

Speaker 3:

I don't I mean alright. Continue. Sorry. Continue the question. I mean

Speaker 1:

Well, I was just gonna say, and I mean, you know, for anyone who has implemented in Rust, this will be old hat. But for folks that are new to Rust or haven't implemented in Rust, they may not realize the load bearing role, as it has in so many things, that Serde is playing here. Serde makes this super quick to implement.

Speaker 3:

Yeah. So that is that is very much a key. Right? Like, I I cheated. Like, I have on so much other software.

Speaker 3:

Right? Like, if you were working in c plus plus or other languages, there may be similar functionality. Right? But, like, Serde is the serialization and deserialization library built for Rust. So I was talking about Haret before.

Speaker 3:

Serde also didn't exist when I was writing Haret. There was something called Rust

Speaker 2:

I was just saying that. Yeah.

Speaker 3:

It was the precursor, and it was okay. It was much slower, and it was definitely not as well documented. But, like, it works. So I was able to cheat back then as well, just not as well. Like, the cheating wasn't as effective.

Speaker 3:

I got, like, an 89 on the test.

Speaker 1:

But so describe a little bit how one uses Serde. This is one thing remarkable about it is how

Speaker 3:

Yeah. This is so this is, like, just amazing to me. Like, this blew me away. So, like, I've never personally written a procedural macro. I have some knowledge of how they're built.

Speaker 3:

I've looked into it and always gotten distracted and gone on to something else. But I have used procedural macros in pretty much all my code since I started with Rust. And so what a procedural macro does is it allows you to essentially give an annotation to some part of your code base, and it will automatically generate code by reading that code and modifying it according to what you instructed with your annotation. And so for Serde, what it does is it takes a structure or any sort of data structure in Rust, and you can annotate it with the Serde Serialize and Deserialize tags, and that will automatically generate serialization and deserialization code. And so that is what allowed me to very quickly hook this up to just writing things to a file.

Speaker 3:

So one question that arises is, okay, like, well, what serialization format? Well, that's the great thing about Serde: it's pluggable. And so it defines, like, a visitor pattern, and then there are different serialization formats that implement the Serde callbacks, essentially. They implement the serialize call, and so the macro uses them or generates enough information. I'm not fully aware of, like, what the path here is.

Speaker 3:

You can combine various formats and Serde itself, and then all the user has to do is say use this format and use Serde, and it generates serialization code of the right format for you.

Speaker 1:

Well and importantly, deserialization code. Because serialization code is I mean, it

Speaker 3:

can be True.

Speaker 1:

Yes, but it's actually like, it it it is much easier to write serialization code than deserialization code.

Speaker 3:

Yes.

Speaker 1:

And rarely is one good without the other. So you needed to both serialize and deserialize here. And, I mean, I've just done this over and over and over again, where the ability to just literally annotate a structure by denoting that it's gonna derive the serialization and deserialization traits, and then, like, it all happens just magically, and you get this really robust deserialization. Certainly speaking for my past self, I mean, I would YOLO the serialization, and the deserialization, for something like this, would be just the Max Power way, as we say. Making a Simpsons reference: the wrong way, but faster. Whereas with Serde, like, you're actually getting really robust deserialization here as well.

Speaker 3:

Yeah. I mean, it depends what format you're using, etcetera. But, like, when you say deserialize this struct, you know essentially what type you're gonna be deserializing into, and Serde will attempt to deserialize into that type. And if it can't, it will give you an error, and it'll tell you where it errored out. That's not true for every implementation of every format in Serde, but it works pretty well.

Speaker 3:

And, like, the way Wicket is using it, and the Wicket debugger is using it, is it's kind of the interface between the two. Right? Like, these were separate commits. Right? Like, in Wicket, I had the serialization code.

Speaker 3:

I dumped out all the events to a file. I created the snapshot format, which has that state that we talked about and the events. And I dumped that out to a file, and then the debugger came later. It just read that file, deserialized those events, and replayed them. So there was some abstracting of shared code so that we would play things the same way we do in actual Wicket, so that, like, you're not writing two separate implementations that can diverge.

Speaker 3:

And so there was a little trickiness there to get that how I wanted it to. But for the most part, like, the the the interface layer between those two things is just serialized events, and it's automatically serialized. Like, I didn't do any work to, like, pick you know, you know, do anything with that format.

Speaker 1:

Yeah. I feel this is, like, one of these examples. Right? And I know there's a lot of, you know, discussion happening out there about where is Rust a good fit, where's Rust not a good fit. But it's the presence of this kind of stuff that, at least at Oxide, means we see a really broad use case for it, up and down the stack, pretty much.

Speaker 1:

It just feels like there there's lots and lots of software that would benefit from this pattern and and this approach and kind of the ability to have a, a language in an environment that could do that has been a big, big win for us, I dare say.

Speaker 3:

Yeah. I mean, in a past life, I've used Tcl or Python to do something quick and dirty like this. And I wrote, like, our own IDL, like, a protobuf IDL, at my past job, you know, in c plus plus. Yeah. I I

Speaker 1:

did one of these in Perl. That was a mistake.

Speaker 2:

That was probably, like, 2004 or something. I don't know. I feel like based on the choice of Perl, you can really pin it down.

Speaker 3:

Yeah.

Speaker 1:

Well, no. That would no. No. Well, no. Sadly, this would have been 2004.

Speaker 1:

It tells you something that I was doing this in, actually, 2008. So it'd probably have been 2004 for me if it was a more defensible decision. But that was actually where I ended up redoing that CLI in JavaScript, actually. JavaScript that was in Node. That's when I first kind of realized that you could use JavaScript, and it was, like, it was useful, but, boy, nothing compared to what we can do with Rust, and being able to apply this just ridiculous power tool throughout the stack has been great.

Speaker 1:

And then so, Andrew, in terms of, like, applying this kind of pattern, because this kind of feels like the all of the fun of distributed systems without any of the, like, real pain in the ass of distributed systems. Am I wrong about that?

Speaker 3:

No. You're exactly right. Because there are no communicating processes here. Like, I'm just recording events from a single threaded thing anyways. Right?

Speaker 3:

Like, yeah, there are downstream services, but, like, those are just REST services, and we're just polling them and grabbing, like, inventory updates. There's no complexity on the distributed systems front here. But I guess, like, what Bryan said to me early on is, like, as the semi, you know, elderly engineers, like, we have this synthesis of knowledge. Right? And you learn that you can combine things and take your experience from one place and use it in another.

Speaker 3:

It's not always the best idea, but in this case, it worked out nicely.

Speaker 1:

Yeah. I do think there's also a lot of value in, when you've got these parts of the system that everyone is kinda looking at someone else to go implement, that is really important. Right? This is ultimately the first actual software that a customer runs on the rack. And instead of viewing it as regrettable, you're viewing it as a real opportunity to, like, okay.

Speaker 1:

Wait a minute. I don't have to implement this in the same way that it's been implemented. I can actually apply all these new techniques. And how can I go do this in a way that's gonna make it faster to implement and it's gonna make it more robust and so on? So I think that there's a lot of value to that kind of versatility where you're going into a new domain, and you said, like, you don't do a lot of UIs or a lot of terminal UIs, but this is a real opportunity to apply that distributed systems wisdom to a new domain.

Speaker 3:

Yeah. It's I mean, really, you should take any opportunity. I mean, I just like, me personally, I do not mind digging into new domains, like, as long as I have the the time and the right support system. Right? And Oxide's Oxide's great for that.

Speaker 3:

Maybe not so much on the time front lately, but, like, having just the ability to dig into something new is great. And, like, if it's needed, like, you feel that there's value there, you can go dig in, and you learn. I mean, there's just an ability to learn something new, and really, like, you can really dig deep and, like, essentially, this was totally demo driven development. Like, I did this to show off to my peers. I wasn't like this was not like, I don't know.

Speaker 3:

Like like, this definitely adds value, and I think the customers are gonna like it. I think it looks great. I think it's gonna look better for real. Like, it'll it'll it'll continue to get better with more people working on it, right, and more designs, like, skills and and coordination across everything. But, like, the impetus for it really was to, like, make it look cool for the demo at first, and see and see how well it worked if I get it done.

Speaker 3:

Right? And so, like, the first iteration of it was about 2 weeks of work and had a bunch of, like, very weird, like, UI rendering and it had, like, mouse over events and, like, the bulk of that was thrown away. But, like, the core of it being a text user interface and showing a rendering of the rack and other things were kept. And so if you have time to experiment, it's always fun to step outside your own comfort zone and see what you can do. Honestly, that's been my whole experience at Oxide.

Speaker 3:

I have a, an interesting position. I came in thinking I was gonna be doing one thing and kinda did a bunch of other stuff. It's been interesting. Just like in VMware. Yeah.

Speaker 3:

That's true. It's true. In a different

Speaker 2:

way. Yeah. In a very different way.

Speaker 3:

Yeah.

Speaker 1:

You know, I wanna make a joke about canceling projects, but I can't even do that. It's just too cruel. I so hate it when... it just brings up, I mean, in many ways, there are many aspects of Soul of a New Machine that merit a reread, but the fact that the birth of that project is when one project prevailed over another in the shootout at Hojo's. And that definitely stuck with me reading that on the second read. Sadly, all the shootouts at Hojo's.

Speaker 1:

But no shootouts at Hojo's at Oxide. So this is all, all very, very good stuff.

Speaker 3:

Yeah. I mean, we have so much to do. It would be hard to have a shootout. Like, I think we can all find our own domains to be dominant in. Yeah.

Speaker 1:

Exactly. That's

Speaker 2:

right. No reason to, like, try to squeeze someone out. Bryan, you broke the seal on JavaScript. I did wanna mention, and I mentioned this to Andrew: have you checked out this thing called replay.io? I learned about it because our colleague, Justin, runs another podcast, called devtools.fm, which is great. And he had recommended this episode, and it's a replay debugger for JavaScript in the browser, which sounds tantalizing. The internals of it are ludicrous because, like, you know, JavaScript in the browser has none of the properties that Andrew was able to take advantage of. And so they're basically both interposing on system calls to get those raw events.

Speaker 2:

Yeah. Exactly. Taking advantage of the fact that JavaScript is, you know, mostly single threaded, but then also getting deep in the underwear of the virtual machine in terms of, like, locking primitives and stuff like that. Anyway, a neat system. Reminded me of some of the stuff Andrew was talking about, and has built in the Wicket debugger.

Speaker 2:

But, also, like, you need to start recording from the beginning of time, and desynchronization is a huge problem. Anyway, neat system.

Speaker 1:

Yeah. And you also have this kinda tension about how general purpose versus special purpose your debugging infrastructure is. On the one hand, if you've got general purpose debugging infrastructure that can debug any program or debug any JavaScript, obviously, it's got a very broad surface area. On the other hand, it can be really nasty to get a lot of that stuff to work. And, I mean, the one thing I love about this, Andrew, is this is very special purpose debugging infrastructure.

Speaker 1:

This is just for this application effectively. But as a result, it's it's extremely powerful, and there are a bunch of problems that you don't have to solve because of the way you've architected the application. And I think that we don't spend quite enough time on application specific debugging infrastructure. I think it's really, really powerful, and I think that it it so often, will help inform good architecture decisions when you think about this from from the get go.

Speaker 3:

A 100%. Like, this is the same I mean, you can make the same argument for certain types of testing. Like, I I don't buy the whole, like, certainly, I don't wanna get into this rabbit hole, but, like, test driven development, building out the perfect architecture by, you know, writing your tiny unit test functions. But I do think, like, more sophisticated testing strategies can let you know when you can't see whole subsystems, in your test process because they're just inaccessible. So, like, property based testing, I think, is a is a good way to exercise large portions of your system, with a large input space.

Speaker 1:

Can you elaborate on property based testing? Because I actually had not really seen it prior to Oxide. I know it's out there, but could you just elaborate on it a bit?

Speaker 3:

Yeah. So it's actually kinda similar to what I built here, at least the way I would use it for distributed systems. So there's two separate kinds of property based testing. I guess the simplest way to describe it is you would have a function. Let's say you have a function that reverses a list, and what you're gonna do is reverse twice.

Speaker 3:

This is a simple example that most property based testing tutorials go through. It's like you have a function, and it's gonna reverse a list twice. And when you reverse a list twice, the property is that you should have the same list you started with. Right? So what property based testing does is it generates a bunch of inputs automatically. So you give it some sort of generation function, and it'll generate inputs that correspond to the right type.

Speaker 3:

And then the test infrastructure will do that, and then it will run them through your actual function. So you're not, like, writing 7 or 8 individual tests, giving various inputs manually, saying, like, here's a list a, b, c. Here's a list a, b, c, d. Like, the property based test will generate those, and it'll start growing them and getting more random over time as it runs. And so it's just testing the property that you want. You can name that property, like, reverse twice gives me the original list.

Speaker 3:

That could be the name of your property. And all it's doing is checking that that assertion holds given a set of inputs. And so it has to be deterministic. Like, otherwise, you're gonna be in a world of hurt. Because, like, if your test just started not doing that properly, coding bad.
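The reverse-twice property described above can be sketched in a few lines. This is a minimal illustration that uses only the Rust standard library, not PropTest or QuickCheck: the `Lcg` generator, `gen_list`, `prop_reverse_twice`, and `check` are hypothetical names invented for this sketch, and a real tool would also shrink any counterexample it finds.

```rust
// Tiny deterministic pseudo-random generator (an LCG) so the sketch needs no crates.
// Hypothetical helper; real property testing tools ship their own generators.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

// Generation function: a list of integers whose length is at most `max_len`.
fn gen_list(rng: &mut Lcg, max_len: usize) -> Vec<i32> {
    let len = (rng.next_u64() as usize) % (max_len + 1);
    (0..len).map(|_| (rng.next_u64() % 1000) as i32 - 500).collect()
}

// The function under test.
fn reverse(list: &[i32]) -> Vec<i32> {
    list.iter().rev().cloned().collect()
}

// The property: reversing twice gives back the original list.
fn prop_reverse_twice(list: &[i32]) -> bool {
    reverse(&reverse(list)).as_slice() == list
}

// Runs the property against `cases` generated inputs, letting the size bound
// grow with each case. Returns the first counterexample, if any.
fn check(seed: u64, cases: usize) -> Option<Vec<i32>> {
    let mut rng = Lcg(seed);
    for case in 0..cases {
        let input = gen_list(&mut rng, case);
        if !prop_reverse_twice(&input) {
            return Some(input);
        }
    }
    None
}

fn main() {
    match check(42, 200) {
        None => println!("property held for 200 generated lists"),
        Some(bad) => println!("counterexample: {:?}", bad),
    }
}
```

Because the generator is seeded, the same seed always produces the same inputs, which is the determinism the discussion calls for.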

Speaker 3:

But, like, this is why separating your IO and making things deterministic matters. This is how I kinda learned to structure my code that way and how I ended up learning to build these debuggers. Because in order to make distributed systems property tests deterministic, you don't want those tests to call into, like, the local clock while they're running. You want them to pass in the time or the duration for an expiry event. Right? And so that way, every time you run through and you get that same order of events, you're gonna get the same result in the test, and so they're not flaky.
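One way to picture passing time in rather than reading the local clock is a state machine whose only notion of time is an event it is handed. The sketch below is an assumption-laden illustration, not code from Omicron: `FailureDetector`, `Event`, and `run` are hypothetical names, and the three second timeout is arbitrary.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
enum Event {
    // A heartbeat arrived from the peer being monitored.
    Heartbeat,
    // Time is an input: the caller says how much time has elapsed.
    Tick(Duration),
}

#[derive(Debug, Clone, PartialEq)]
struct FailureDetector {
    timeout: Duration,
    since_last_heartbeat: Duration,
    suspected: bool,
}

impl FailureDetector {
    fn new(timeout: Duration) -> Self {
        FailureDetector {
            timeout,
            since_last_heartbeat: Duration::ZERO,
            suspected: false,
        }
    }

    // Pure state transition: no clock reads and no I/O, so the same
    // sequence of events always produces the same state.
    fn handle(&mut self, event: &Event) {
        match event {
            Event::Heartbeat => {
                self.since_last_heartbeat = Duration::ZERO;
                self.suspected = false;
            }
            Event::Tick(elapsed) => {
                self.since_last_heartbeat += *elapsed;
                if self.since_last_heartbeat >= self.timeout {
                    self.suspected = true;
                }
            }
        }
    }
}

// Replays a list of events from a fresh state (hypothetical 3 second timeout).
fn run(events: &[Event]) -> FailureDetector {
    let mut detector = FailureDetector::new(Duration::from_secs(3));
    for event in events {
        detector.handle(event);
    }
    detector
}

fn main() {
    let events = vec![
        Event::Tick(Duration::from_secs(2)),
        Event::Heartbeat,
        Event::Tick(Duration::from_secs(2)),
    ];
    // Two runs over the same events end in identical states.
    assert_eq!(run(&events), run(&events));
    println!("suspected after replay: {}", run(&events).suspected);
}
```

In production the surrounding I/O layer would turn real timer expirations into `Tick` events; in a test, the generator simply invents them.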

Speaker 3:

And so, like, you want that state manipulation. And so the simplest form of that, a stateless test, would be just generating a bunch of inputs and passing them to a function that's like a pure function. Right? But there's also the stateful version, where you have an actual model. You have the state of your system under test, and then you have a model state.

Speaker 3:

And so you're actually writing a separate state machine for the model that's perhaps simplified, and you're running the events. You're generating a list of events again, and then you're running it through the model, and then you're running it through your system under test, and you're comparing them. Your properties now are that the invariants for the model actually hold and match the system under test. So for instance, you can say you will never have duplicate sequence numbers in this replicated log. So, like, you'll never see diverging agreement among all the replicas. And so all you need to do to keep track of that is to essentially maintain in your model that the replicas... alright.

Speaker 3:

I'm confusing myself right now. But, in essence, you're maintaining a model and you're maintaining the invariants for that model and then comparing them to the system under test and making sure that they match. And the property based testing part of that is just generating the inputs and running through those assertions and invariants and allowing you to see if they match. What makes it really interesting is that most property based testing tools, when they fail, they give you a history. And that history can be really long and complex.
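The model-versus-system comparison can be sketched with a deliberately small stand-in: a fixed-capacity ring buffer as the system under test and a `VecDeque` as the simplified model. This is not the replicated log from the discussion; `Ring`, `Model`, `Op`, `first_divergence`, and `gen_ops` are all hypothetical names chosen for the illustration.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(u32),
    Pop,
}

// System under test: a fixed-capacity FIFO ring buffer.
struct Ring {
    buf: Vec<Option<u32>>,
    head: usize,
    len: usize,
}

impl Ring {
    fn new(cap: usize) -> Self {
        Ring { buf: vec![None; cap], head: 0, len: 0 }
    }

    // Returns false when the buffer is full.
    fn push(&mut self, value: u32) -> bool {
        if self.len == self.buf.len() {
            return false;
        }
        let tail = (self.head + self.len) % self.buf.len();
        self.buf[tail] = Some(value);
        self.len += 1;
        true
    }

    fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        let value = self.buf[self.head].take();
        self.head = (self.head + 1) % self.buf.len();
        self.len -= 1;
        value
    }
}

// Model: an obviously correct queue with the same capacity rule.
struct Model {
    queue: VecDeque<u32>,
    cap: usize,
}

impl Model {
    fn push(&mut self, value: u32) -> bool {
        if self.queue.len() == self.cap {
            return false;
        }
        self.queue.push_back(value);
        true
    }

    fn pop(&mut self) -> Option<u32> {
        self.queue.pop_front()
    }
}

// Runs the same operations through both and returns the index of the
// first operation where they disagree, or None if they always match.
fn first_divergence(ops: &[Op], cap: usize) -> Option<usize> {
    let mut sut = Ring::new(cap);
    let mut model = Model { queue: VecDeque::new(), cap };
    for (i, op) in ops.iter().enumerate() {
        let agree = match *op {
            Op::Push(v) => sut.push(v) == model.push(v),
            Op::Pop => sut.pop() == model.pop(),
        };
        // Invariant checked after every step: both hold the same number of items.
        if !agree || sut.len != model.queue.len() {
            return Some(i);
        }
    }
    None
}

// Deterministic operation generator (a simple LCG, seeded by the caller).
fn gen_ops(seed: u64, n: usize) -> Vec<Op> {
    let mut s = seed;
    (0..n)
        .map(|_| {
            s = s.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            let r = (s >> 33) as u32;
            if r % 3 == 0 { Op::Pop } else { Op::Push(r % 100) }
        })
        .collect()
}

fn main() {
    for seed in 0..100u64 {
        assert_eq!(first_divergence(&gen_ops(seed, 200), 4), None);
    }
    println!("model and system under test agreed on every run");
}
```

The index returned by `first_divergence`, together with the generated operations, is the kind of failing history the discussion goes on to describe.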

Speaker 3:

But they run through a process called shrinking. And shrinking actually reduces the state space. So it steps through. It kinda does a binary search to see where the test would fail. So it starts running tests with similarly patterned inputs that are smaller to try to see if it can get the minimum sequence of events that cause the test to fail.
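A rough sketch of shrinking a failing event history: try deleting chunks of events, halving the chunk size as you go, and keep any smaller history that still fails. This greedy strategy is a simplification chosen for illustration and is not how PropTest or QuickCheck shrink internally; `shrink`, `violates`, and the counter events are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Inc,
    Dec,
}

// Toy system with an invariant: the counter must never reach 3.
// Returns true when the history violates the invariant (the test "fails").
fn violates(events: &[Event]) -> bool {
    let mut count = 0i32;
    for event in events {
        match event {
            Event::Inc => count += 1,
            Event::Dec => count -= 1,
        }
        if count >= 3 {
            return true;
        }
    }
    false
}

// Greedy shrinker. Assumes `fails(failing)` is true on entry. The result still
// fails, and removing any single remaining event makes it pass.
fn shrink<T: Clone>(failing: &[T], fails: impl Fn(&[T]) -> bool) -> Vec<T> {
    let mut current = failing.to_vec();
    let mut chunk = (current.len() / 2).max(1);
    loop {
        let mut i = 0;
        let mut removed = false;
        while i + chunk <= current.len() {
            let mut candidate = current.clone();
            candidate.drain(i..i + chunk);
            if fails(&candidate) {
                // Keep the smaller failing history and retry at the same index.
                current = candidate;
                removed = true;
            } else {
                i += chunk;
            }
        }
        if chunk == 1 && !removed {
            break;
        }
        if chunk > 1 {
            chunk /= 2;
        }
    }
    current
}

fn main() {
    use Event::{Dec, Inc};
    let history = [Inc, Dec, Inc, Dec, Inc, Inc, Dec, Inc, Inc];
    assert!(violates(&history));
    let minimal = shrink(&history, violates);
    println!("shrunk {} events down to {:?}", history.len(), minimal);
}
```

For this history the nine events shrink to three consecutive `Inc` events, which is a much easier thing to stare at in a debugger than the original run.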

Speaker 3:

And that is a really nice thing because that allows you to more easily debug it. So if you have 2 things, if you have the property based testing infrastructure that can run through your system tests and show you the failing history, and you have a debugger that knows how to play those same events, you can now take that recording of the failing test and run it through your debugger. Right? And so these would be all application building. But if you structure your code in a way that supports property based testing, it's most likely gonna support this type of event replay debugger, and you can use the actual failing test to run through the debugger and inspect the state and see what's going wrong.

Speaker 1:

Yeah. That's really cool. And so Adam dropped in a link to the PropTest crate, and my understanding is that this started with QuickCheck. So, Nick in the chat dropped in a link to QuickCheck from Haskell.

Speaker 1:

And is this kinda coming from the Haskell side of it? I mean, what have you used for the property?

Speaker 3:

So I believe it was introduced with Haskell, for QuickCheck, and that was like a John Hughes jam. No. That was a Simon Peyton Jones jam. But John Hughes was the... so I first started with property based testing in Erlang, when I was working at Basho.

Speaker 3:

And so I wrote a lot of property based tests and stateful property based tests. And we used a proprietary tool from Quviq. It was, like, Erlang QuickCheck. It wasn't a very creative name. But it was an awesome tool.

Speaker 3:

And it actually did much more sophisticated things than either the original QuickCheck, as far as I know, or any of the open source tools for Rust. Like, it would generate symbolic inputs, and so it would, like, symbolically shrink the state space, and it had all sorts of funky stuff. It would also do randomization of the, like, run time. So it would not only manipulate the inputs, but it would also manipulate the Erlang scheduler under the hood through something called Pulse. And there's a Tokio project, I think it's called Loom, that does something similar to test, like, Tokio.

Speaker 3:

And so it manipulates, like, the state space, or the underlying schedules of the tasks that are running. But then when I was working on Haret, PropTest did not exist, so I used the QuickCheck implementation at first. And I kinda built up on top of it to build a debugger there and write some tests. But here, I've used PropTest exclusively. So I haven't written as many property based tests as I would have liked before.

Speaker 3:

But, yeah, I use PropTest at Oxide.

Speaker 1:

And would that be your recommendation for folks who know Rust and are looking to get

Speaker 3:

Yeah. So QuickCheck, they're different. Right? Like, QuickCheck is a cool tool. I think it was influential.

Speaker 3:

The QuickCheck for Rust, I'm talking about, in that it introduced the Arbitrary trait. With QuickCheck, like, there's a why PropTest versus QuickCheck section, I think, in the PropTest book. And so there's, you know, an ebook that comes with the PropTest open source stuff. And it explains that QuickCheck does its shrinking and generation based on types. Like, generation and shrinking in PropTest are much more coherent.

Speaker 3:

So, like, you can take a type and then compose them and generate new outputs that aren't strictly mapped to, like, a Rust type. And it will shrink in the same order that those outputs are derived. And so it always kinda does the right thing. You can do a few more complex things, I think, with PropTest. And it's got a few more tools at its disposal.

Speaker 3:

And it's got a really nice introductory book. I think they say it's mostly feature complete. I don't think they're accepting, like, new features to PropTest. It just works. Like, you can build things on top of it.

Speaker 3:

I, like, I would recommend using PropTest. Yeah.

Speaker 1:

Yeah. This is something I really need to personally add to my arsenal. I really have not. Adam, have you used PropTest?

Speaker 2:

No. You and me both. That link to PropTest is, like, one of my forgotten tabs of must read material that seem to pile up at an alarming rate at Oxide.

Speaker 3:

Yeah. It's I will say, like, don't be intimidated. It's it's much easier than probably most things you guys have done. It's probably slightly easier to use than implementing DTrace.

Speaker 2:

But is it easier to use than using DTrace?

Speaker 1:

Yeah. But I'm I'm I'm semi elderly now according to

Speaker 2:

you, Eric. Semi elderly. Thank you for that, Andrew.

Speaker 3:

Yeah. Well The

Speaker 1:

So, and actually we should do an episode on this: uwaces (I'm probably mispronouncing that) in the chat is pointing out that a lot of the stuff that I would say we like about Rust existed first in Haskell. So build.rs and, quote, derive macros, traits, etcetera, which is kind of interesting. I mean, honestly, it's part of what I love about Rust, that it is drawing on a bunch of different folkways. And I know I've joked with Steve Klabnik that I personally look forward to the HOPL talk on Rust about pulling in all these different things. Because, you know, if you had told me a decade ago that there were things in Haskell that would be really meaningful to me, I would have been curious for details because that would seem plausible.

Speaker 2:

Uh-huh. Curious and incredulous. Yes.

Speaker 1:

And curious and maybe incredulous. Well, yes. Maybe incredulous. In part because I think that, you know, what Rust has done is kind of taken a bunch of... I mean, Haskell is... I'm gonna get myself into huge trouble here. But, I mean, Haskell has neophilia.

Speaker 1:

Right? I mean, it's a great laboratory for exploring different ideas. And when you combine all of those ideas... Adam, do you remember being at AADEBUG in 2003? I mean, obviously

Speaker 2:

How could I forget? One of my favorite conferences I've ever attended, I say somewhat ironically.

Speaker 1:

I'd say that completely seriously. That was so much fun. So, Andrew, this is the automated and algorithmic debugging conference, and, I mean, I thought it was

Speaker 2:

so much fun. So much fun, but maybe not my favorite conference, but I did have a great time.

Speaker 1:

So much happened at that conference. There

Speaker 2:

was Yes.

Speaker 1:

I felt like I was going to the Olympics. I was so excited. I was

Speaker 2:

Oh, because they didn't have it every year, and Bryan had gotten a paper in, a terrific paper in. And we thought this was going to be truly the Olympiad of people who loved debugging as much as us.

Speaker 1:

And it was a ragtag band. There were, like, 60 people there, half of whom were... I definitely learned about, boy, the French Prolog mafia. Do not mess with those people. Like, you end up in a dumpster if you... do you remember, they had some sharp opinions about Prolog? But then do you remember the person presenting on Haskell and needing to solve an NP complete problem to generate an error message? Do you remember this?

Speaker 2:

I do remember this. Yes. Keep thinking

Speaker 3:

wow. What

Speaker 1:

a what a language.

Speaker 2:

Language. Oh my goodness. I gotta

Speaker 1:

do a traveling salesman problem. Actually get to a TSP to actually, Hamiltonian circuits in order to be able to generate an error message to understand why your why your type inference is failing, which I that was kind of, like, the only thing I knew about the language for a long time. So it it definitely did not feel like, it it it felt implausible that it was gonna have immediate impact, but it definitely is.

Speaker 2:

Also, dear listener, for you thinking, oh, like, I can't wait for the next AADEBUG so I can attend. That was actually the last AADEBUG. That's

Speaker 1:

that was the last AADEBUG because, like, they can't get a space. They got kicked out of the Denny's or whatever, and now they can't get a space. I mean, it's really pretty sad. And we thought it was only happening every few years, every Olympiad, because it was so extraordinary that you just couldn't have it every year, but it's like, no. No.

Speaker 1:

They can't get their shit together every year. This is also the conference in which, well, I mean, a couple of things. Adam, you had had the idea when we were leaving, you told me that you were very upset that California had taken away our right to eat horses. Yeah. And in particular, as California is wont to do, we had just passed a ballot initiative banning the human consumption of horse meat.

Speaker 1:

And, you take that as personal offense. I think it's fair to say. And you and you were like, you know, I'd never wanted to eat horse meat, but now obviously I do, and we're going to Belgium for this conference, Ghent, Belgium, amazing city. And we are gonna and I just remember you hatching this, like, when we were leaving, like, I wanna eat horse in Belgium. And I just felt like I I mean, I it was my calling to help you on your quest.

Speaker 3:

I mean, I'm sure

Speaker 2:

I'm in. Let's go.

Speaker 3:

I always think of horse meat when I think of Belgium.

Speaker 1:

Well, okay. There you go. So we are so well, so now it's important to say that this is like definitely pre, internet on the phone era. This is 2003. So it's like post internet, but kind of like barely.

Speaker 1:

And in particular, we've got no way of and we don't speak the language. I mean, it's Flemish. I speak French.

Speaker 2:

They they speak the language, you know, our language. But yeah.

Speaker 1:

Yeah. But we're looking for but but we're looking for horse. We need to go we're looking for horse meat. We need

Speaker 3:

to be we need to be able to go into this call. Yeah.

Speaker 1:

Yeah. What we learned is that, like, horsemeat is not actually you can't just, like, show up in Belgium and ask for horsemeat. It doesn't work that way.

Speaker 2:

Actually, we learned something else too. We learned that horse is slang for heroin, and everyone at the conference thought we were really trying to party extremely hard even for AADEBUG.

Speaker 1:

Well, because we said we wanted to score some horse, and it was illegal in California. As long as we're in Belgium, we wanted to score some horse. And they're like, you guys look like you have your okay. What? It's like, no.

Speaker 1:

Don't We were

Speaker 2:

just gonna have some beers after it, but I I mean, you guys do

Speaker 1:

you. Like, come on. When are we gonna get back to Belgium? Let's get some horse. And then yeah.

Speaker 1:

Okay. No. No. It's not. And then we had the whole waterzooi fiasco where we were convinced that waterzooi was horse, but it turns out that was hippopotamus.

Speaker 1:

Were they serving hippopotamus at that restaurant? I feel we never got to the bottom of that. It feels like it would have been more exotic. Anyway, we finally passed, and, Andrew, just to your point that you always think of horse meat, we passed a butcher shop that had a giant horse's ass. Right, Adam?

Speaker 1:

Am I imagining this?

Speaker 2:

Yeah. No. This is all true.

Speaker 1:

This feels very dreamlike right now. But there's a giant horse's butt, like, hanging, like, a shingle outside of this thing. We're like, these guys have to have horse.

Speaker 3:

But I feel

Speaker 1:

like, do they have horse in the end?

Speaker 2:

No. No. Sure.

Speaker 3:

It wasn't like a centaur and you just didn't see the front?

Speaker 1:

I did. Might have been a centaur.

Speaker 2:

We went the whole the the the the short story made very long. We got but we got no horse.

Speaker 3:

It's a shame. Well Feel good for the horse

Speaker 1:

to make it even longer by having no. The we got no horse, but the the store was closing in 10 minutes, and I had no cash. And we and I had to run across town for an ATM.

Speaker 2:

Yeah. We ended up we had another inspiration too, which is we wanted illegal cheese. That is to say Oh, that's unpasteurized raw milk cheese. That was our downgrade from

Speaker 3:

from the horse. Yeah.

Speaker 1:

From horse.

Speaker 2:

Yeah. Yeah.

Speaker 1:

I'm sorry, everybody. This is way down there. Who did this? But, you know, when Andrew said semi elderly, I just obviously took offense to that, and I wanna show that I'm actually full elderly by just I

Speaker 3:

was paraphrasing the Elderly. I thought experience experience was the word I was looking for. I just

Speaker 1:

No. I take it as praise.

Speaker 3:

Rambling. I take

Speaker 1:

it as praise.

Speaker 3:

Look. I exactly. It goes on and on and on, and

Speaker 1:

there was a horse's ass there. Yeah. Give give me 2 bees for some horse, you'd say. Well, and this this is fantastic. Great work.

Speaker 1:

That was really, exciting stuff. And I think also it's it just feels like there's a lot that a lot that is practical that you can take from here into whatever software you might be writing. This is all open source. I dropped a link to your PR, Andrew. Hopefully, that's the right thing to drop.

Speaker 1:

But, this is all part of Omicron, which is our control plane. So if you wanna go explicitly check it out, definitely go for it, and, look for the intro view. I think you've even integrated the debugger now to avenue. I I know it's,

Speaker 3:

that It's

Speaker 1:

all the way anyway.

Speaker 3:

That PR is the debugger, I think.

Speaker 1:

Oh, nice. Yeah. There we go.

Speaker 3:

Yeah. That's what I that's what I opened up today. And it was just because I wanted to add the single stepping and, like, fast forward functionality, essentially, before I opened it up.
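The single stepping and fast forward functionality mentioned here falls out naturally once state only changes through recorded events. The following is a hedged sketch of that shape and not the code in the PR: `Debugger`, `State`, `Event`, `update`, `step`, `run_to`, and `jump_to` are hypothetical names, and the toy state is a text input rather than the real UI state.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    KeyPress(char),
    Backspace,
    Submit,
}

#[derive(Debug, Clone, Default, PartialEq)]
struct State {
    input: String,
    submitted: Vec<String>,
}

// The only place state changes: a deterministic function of (state, event).
fn update(state: &mut State, event: &Event) {
    match event {
        Event::KeyPress(c) => state.input.push(*c),
        Event::Backspace => {
            state.input.pop();
        }
        Event::Submit => {
            let line = std::mem::take(&mut state.input);
            state.submitted.push(line);
        }
    }
}

struct Debugger {
    recording: Vec<Event>,
    position: usize, // number of events applied so far
    state: State,
}

impl Debugger {
    fn new(recording: Vec<Event>) -> Self {
        Debugger { recording, position: 0, state: State::default() }
    }

    // Single step: apply the next recorded event. Returns false at the end.
    fn step(&mut self) -> bool {
        match self.recording.get(self.position) {
            Some(event) => {
                update(&mut self.state, event);
                self.position += 1;
                true
            }
            None => false,
        }
    }

    // Fast forward until `target` events have been applied (or the recording ends).
    fn run_to(&mut self, target: usize) {
        while self.position < target && self.step() {}
    }

    // Going backwards is a reset followed by a replay from the beginning.
    fn jump_to(&mut self, target: usize) {
        if target < self.position {
            self.state = State::default();
            self.position = 0;
        }
        self.run_to(target);
    }
}

fn main() {
    let recording = vec![
        Event::KeyPress('h'),
        Event::KeyPress('i'),
        Event::KeyPress('x'),
        Event::Backspace,
        Event::Submit,
    ];
    let mut debugger = Debugger::new(recording);
    debugger.run_to(3);
    println!("after 3 events: {:?}", debugger.state);
    debugger.step();
    println!("after 4 events: {:?}", debugger.state);
    debugger.jump_to(2);
    println!("back at 2 events: {:?}", debugger.state);
}
```

Replaying from the start to go backwards is the simplest option for short recordings; periodic snapshots of `State` would be one way to avoid the replay cost on long ones.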

Speaker 1:

Well, a lot of great practical lessons. And then there was one other question in the chat. Like, what do you call this model? I mean, to me, it's kind of an actor model. I mean, I'm not sure

Speaker 3:

I've Yeah. I'd say, like, just is there Like, event driven state machines. Right? Like Yeah. Right.

Speaker 3:

So there's, like, there's another model. I guess it's a it was popularized, I don't know, maybe 10 years ago or something. And it's not always useful for a variety of reasons, but, like, you have this queue, like CQRS. Right? Command, query, replay.

Speaker 3:

I forget what it stands for. Essentially, you have this queue of objects and that allows you to do the same thing. Or this queue of, like, HTTP requests or whatever, you know, and you can replay them through your downstream system. And so people were building similar systems like this so they could debug their business logic, essentially, by recording all the information that came in from the outside world through this queue. I mean, the key problem is, like, now you're recording an indefinite amount of information, and you can't take, like, a global snapshot of all your microservices consistently.

Speaker 3:

So, like, it doesn't really work for very large systems. You can think about modeling many systems like this. And yeah. I mean, that's one of the takeaways. Another one is just, like, if you structure things correctly, it's a few days of work to really build a debugger, especially in a language where, you know, crates, like, tools, libraries, are open source and you can just build upon them.

Speaker 3:

Like, I didn't build my own readline library, and I didn't have to wrap anything in rlwrap. And, yeah, I didn't build my own command line parser. Like, all those libraries exist in Rust, and it's the ecosystem as much as anything that makes stuff like this possible to do quickly.

Speaker 1:

Awesome. Well, great work, and, should you have an oxide rack, this will be the first thing you see is is, Andrew's software. You'll see a really amazing terminal experience on the headset. Had this had, much of humanity's scarce resources poured into it because it is it's gorgeous. So great stuff, Andrew.

Speaker 1:

Thanks for joining us, and, I I'm just gonna tease next week a little bit, Adam, if you don't mind.

Speaker 2:

Sure.

Speaker 1:

So next week, we are gonna be talking about the cabled backplane, and in particular, a bit of a mishap with the cabled backplane, and how we debugged it. We're gonna have some terrific guests. We're gonna have a mechanical engineer from our manufacturing partner, and we are gonna have an engineer from Samtec, which is our cabling partner. So we've got 2 terrific engineers along with Robert Keith here at Oxide, and that one is gonna be a really fun discussion. So join us next week for our special guests.

Speaker 2:

And is that on Monday, Bryan?

Speaker 1:

That'd be on Monday. Yep. Great. Excellent. Alright, Andrew.

Speaker 1:

Thanks again. Yeah.

Speaker 3:

Thank you.

Speaker 1:

Take care, everyone. Bye.