Oxide and Friends

Oxide and Friends Twitter Space: May 10, 2021

A Requiem for SPARC with Tom Lyon
We’ve been holding a Twitter Space weekly on Mondays at 5p for about an hour. In addition to [@bcantrill](https://twitter.com/bcantrill) and [@ahl](https://twitter.com/ahl), speakers included special guest Tom Lyon plus Joshua Clulow, Dan McDonald, Dan Cross, Tom Killalea, Theo Schlossnagle, Antranig Vartanian, and [@perlhack](https://twitter.com/perlhack).
We recorded the space; the recording is here.
Some of the topics we hit on, in the order that we hit them:
  • [@2:06](https://youtu.be/79NNXn5Kr90?t=126) SPARC 30th anniversary dinner > SPARC was an amazing achievement for its time, > but there were some nasty trade-offs made.
  • [@2:56](https://youtu.be/79NNXn5Kr90?t=176) illumos announcement on the end of SPARC support
  • [@4:37](https://youtu.be/79NNXn5Kr90?t=277) “There is no photography allowed in the bring-up lab” story
  • [@6:23](https://youtu.be/79NNXn5Kr90?t=383) UltraSPARC-II E-cache parity error
  • [@8:51](https://youtu.be/79NNXn5Kr90?t=531) Register windows > Most people don’t know, about that first SPARC, > there was no integer multiply or divide.
    > It would trap on the instructions.
    > I feel so decadent, I’ve just been sprinkling multiplications around my code for years.
  • [@9:55](https://youtu.be/79NNXn5Kr90?t=595) popc instruction (also called Hamming Weight)
    • IBM Stretch 1961, and the one-of-a-kind IBM Harvest made for the NSA
    • Henry Warren’s 2002 Hacker’s Delight Ch. 5 shows a ~20 instruction algorithm (no branches, only adds/shifts/masks by constants) > Warren: According to computer folklore, the population count function is important to the > National Security Agency. No one (outside of NSA) seems to know just what they use it for, > but it may be in cryptography work or in searching huge amounts of material.
    • According to Agner Fog, Ice Lake performs popcnt with a 3 cycle latency, and Zen 3 with just 1 cycle latency.
    • Phil Bagwell’s 2001 Ideal Hash Trees depend on pop count > Bagwell: Note that the performance of the algorithm is seriously impacted > by the poor execution speed of the POPCT emulation in Java, a problem > the Java designers may wish to address. 
      • Persistent versions of Bagwell’s trees are used for the built-in hash maps of Clojure, and in libraries for Scala etc.
  • [@11:39](https://youtu.be/79NNXn5Kr90?t=699) This was the debate between Roger Faulkner and Jeff Bonwick: register windows
  • [@12:35](https://youtu.be/79NNXn5Kr90?t=755) Register fishing: Bryan’s version and Adam’s version > When you want to know the state of some other process, you have to flush > those register windows to memory to be able to recover the stack trace.
    • [@14:30](https://youtu.be/79NNXn5Kr90?t=870) Delay slot > We sat around the lunch table talking about how crazy it would > be to have a branch that executed right after a branch.
    • DCTI couple (delayed control transfer instruction)
    • [@15:31](https://youtu.be/79NNXn5Kr90?t=931) “Well, the instruction set doesn’t allow that..” story > Bedlam. As far as Solaris kernel discussions go, bedlam.
    • Leibniz vs. Newton
  • [@20:14](https://youtu.be/79NNXn5Kr90?t=1214) Annulled branches
  • [@22:17](https://youtu.be/79NNXn5Kr90?t=1337) Praise for SPARC
    • SPARC address space identifiers > When we were porting Solaris to x86, and deciding what fraction of the > address space would belong to the kernel vs the user, it felt disgusting to me.
  • [@25:26](https://youtu.be/79NNXn5Kr90?t=1526) Software-filled TLB > They just didn’t have the room to cram a hardware page table walk into the chip.
    • MIPS would give you a trap on a VAC conflict (virtual address cache)
  • [@27:34](https://youtu.be/79NNXn5Kr90?t=1654) It was slow, it was late, and it had a lot of problems, it was wrong.
    • UltraSPARC-III, code-named “Cheetah” > It’s weird, I compile this thing over and over, and every 80th time when > I compile and run it, it’s 40x slower..
    • UltraSPARC-IV+, code-named “Panther”
  • [@32:17](https://youtu.be/79NNXn5Kr90?t=1937) Does the Viking I-cache bug ring a bell?
    • SuperSPARC, code-named “Viking” > You’d have to DC balance the I-cache. If you had too many zeros, > they’d start flipping to ones.
    • E-cache parity error > It was due to everything but high energy particle strikes.
    • Radioactive boron in our SRAM manufacturing process
  • [@38:52](https://youtu.be/79NNXn5Kr90?t=2332) “Move it further from the tube” story > When you’re going to have a customer do something, you have to remember there’s > a human being on the other end of that. You cannot have them chasing your theories. > You need to be transparent and honest with them.
  • [@42:25](https://youtu.be/79NNXn5Kr90?t=2545) Micron DRAM story
  • [@44:38](https://youtu.be/79NNXn5Kr90?t=2678) High priced consultants and cosmic rays > They literally lined the roof with lead.. and it didn’t change the error rate at all!
  • [@46:47](https://youtu.be/79NNXn5Kr90?t=2807) And the SRAM manufacturer was..
  • [@48:11](https://youtu.be/79NNXn5Kr90?t=2891) Aftermarket
  • [@51:34](https://youtu.be/79NNXn5Kr90?t=3094) What’s that tapping sound? > Seeing how that particular sausage was made, very very slowly, was discouraging.
    • Regatta On a Chip
    • UltraSPARC-T1, code-named “Niagara” 
      • [@57:15](https://youtu.be/79NNXn5Kr90?t=3435) The only thing we could get to run fast was benchmarks..
    • “Balanced” computing
  • [@59:18](https://youtu.be/79NNXn5Kr90?t=3558) Sun Fire V880, code-named “Daktari”
  • [@1:04:14](https://youtu.be/79NNXn5Kr90?t=3854) RISC
  • [@1:06:04](https://youtu.be/79NNXn5Kr90?t=3964) Intel 432 “The only constants you need are 0 and 1”
  • [@1:09:12](https://youtu.be/79NNXn5Kr90?t=4152) Machine learning in the 80s?
  • [@1:11:37](https://youtu.be/79NNXn5Kr90?t=4297) The best historical analog for Oxide? > I loved that it was deliberate hardware-software co-design.
    • IBM AS/400 > Bill is amazing. He’s clearly the smartest person I’ve ever known. > But you never know what time scale he’s operating in, whether he’s telling you > to do something for tomorrow or for the next century.
    • Optative voice
  • [@1:14:42](https://youtu.be/79NNXn5Kr90?t=4482) How early in Sun’s history were people talking about doing their own CPU?
  • [@1:17:11](https://youtu.be/79NNXn5Kr90?t=4631) Finding SPARC bugs > I had a little Sun 4c that I had cranked up to 26k hertz, > and at 26k hertz it stopped at the banner message. > And I came in the next morning and it was at the login prompt! > This little poor machine had managed to boot! > I hit enter and it immediately panicked.
    • Processor state register (PSR), processor interrupt level (PIL)
  • [@1:20:35](https://youtu.be/79NNXn5Kr90?t=4835) OpenBoot, Forth
  • [@1:21:54](https://youtu.be/79NNXn5Kr90?t=4914) Long lived E10K, code-named “Starfire”
  • [@1:24:01](https://youtu.be/79NNXn5Kr90?t=5041) Invasive physical attacks
  • Tom Lyon and Joseph Skudlarek’s USENIX 1985 paper: All The Chips That Fit
  • Sun keyboard photo
  • [@1:29:56](https://youtu.be/79NNXn5Kr90?t=5396) The real secret of Sun’s success is that we built them to make ourselves happy. > Open source software in general, you develop it for yourself. > That way there’s at least one person who likes it.
(Did we miss anything? PRs always welcome!)
Our next Twitter Space will be on May 17th, 2021 at 5p Pacific! Join us; we always love to hear from new speakers!

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Speaker 1:

Adam, can you hear me? I can

Speaker 2:

hear you.

Speaker 1:

I can hear you. Oh. What happened? Did it die? I I I it it died instantaneously.

Speaker 1:

I tried to, yeah. It died instantaneously. Sorry. See, I I see Tom is here, which is great. And let me just I'm I'm gonna just tweet out the link, Adam.

Speaker 3:

Yeah.

Speaker 1:

And then, folks, we're gonna

Speaker 3:

get going here

Speaker 1:

in just just a second or 2. But you you need, like, some intro music or something. I don't know. I I I I I've got, like, I've got a lot of requests for for for,

Speaker 2:

more production values?

Speaker 1:

Just a little bit.

Speaker 4:

Hey there.

Speaker 2:

Should we have a sound board for a few of these things? Hey, Tom. Good to meet you, and I'm delighted that you installed the mobile app just for this occasion. I'm very honored.

Speaker 3:

Who who is this speaking?

Speaker 2:

This is this is Adam.

Speaker 3:

Adam, nice to meet you too.

Speaker 1:

Well, the thing that's great is that, like, Tom is a Twitter employee. And we were able to do something that his own W-2 could not achieve: get him to install the Twitter app, which I thought was pretty cool.

Speaker 3:

I have a strong aversion to apps in general.

Speaker 1:

Yeah. I can't blame you, and thank you for highlighting the right folks. And I saw Tom Killalea joined. Tom has also pinged a bunch of the Twitter Spaces folks, which is great. I had a really good conversation with them actually last week, which was fun.

Speaker 1:

So, fortunately, where they wanna take things is exactly where we want them to go to facilitate conversations. So it's really fun. So I don't know. We are we recording? We are we, As

Speaker 2:

far as I know, we're recording. I see the thing around on my computer, so it seems to be recording. And I feel like there are a couple of, like, Phish fans holding up their, like, tape recorders in the audience, according to some Twitter mentions I've gotten. So I'm delighted to hear that if my recording fails, someone else might have my pen.

Speaker 1:

Excellent. Okay. That's great. So, so this is not gonna be and, Tom, I know you went to a, like, anniversary of SPARC. Like, what was that?

Speaker 1:

The 30th anniversary

Speaker 3:

of SPARC? Uh-huh. So that was, like, 4 years ago.

Speaker 1:

How, how how uniformly positive was that? Because I don't feel this is gonna be uniformly positive. It's

Speaker 2:

just Okay.

Speaker 3:

I Well, that

Speaker 1:

Go ahead.

Speaker 3:

That was uniformly positive except except for, you know, people who who don't like Oracle. Right.

Speaker 1:

Yeah. Yeah. Yeah. Of course. Of course.

Speaker 1:

That yeah. That's just red meat. Yeah. Of course. But I feel like I don't know.

Speaker 1:

I kinda have a complicated relationship with SPARC. I don't know, Adam, what you kind of felt, Tom, what you felt about it. But there are things I really love about it.

Speaker 3:

It was an amazing achievement for its time, but there were some nasty trade-offs made.

Speaker 1:

So

Speaker 2:

And, Bryan, before we get to this, it's probably worth, you know, as we get to 5 o'clock, just mentioning how we got here, and the inspiration for this SPARC-themed,

Speaker 1:

SPARC-themed, yeah. So we got here because we, in illumos, which has inherited the Solaris and SVR4 heritage before it, illumos is turning off the SPARC support. We are desupporting SPARC, which I thought was kind of a nonevent, but it ended up being like a top Hacker News story. So it's like, okay. And I can see why.

Speaker 1:

I mean, I can understand why, because it's more for what it represents, of course. But, you know, it was a struggle to maintain SPARC support for the developers, and no one really had a SPARC box. And I guess they've become very expensive on eBay, and, I mean, rightfully so. They're good boxes.

Speaker 1:

I mean, an SS2 is still the most rugged thing that humanity has ever built. SPARCstation 2, that is. And, Tom, were you still at Sun when the SS2 was developed? Is that

Speaker 3:

I don't even remember. I was definitely there. Probably, SS1 was, like, '89. So I left in '94. So there was plenty of time for more stuff.

Speaker 1:

Oh, okay. Yeah. Definitely. Alright. So Adam's asked about the origin, so we thought, you know, this would be an appropriate requiem for SPARC, because, I don't know, NetBSD will still support it and so on, but it's an opportunity to reflect back on our wounds.

Speaker 1:

Tom, I have to ask you about a story that Steve Chesson told me that I would love to know whether it's true or not. I'm not sure if you're gonna be able to speak to it one way or another. There is no photography allowed in the bring-up lab. Back in the day, you go to the SMCC lab, and there's a big, like, no flash photography. And the way Steve Chesson told it to me, they all gathered for a group photo around Campus, which was the first SS1, I believe.

Speaker 3:

Right. Right.

Speaker 1:

Someone took a group photo, and the actual packaging was not on the CPU. And after the group photo, flash photo, it began to glow white as all of the electrons had been excited. All the electrons absorbed their new energy, turned into a giant conductor, and bricked the first CPU, is what Steve Chesson told me.

Speaker 3:

I I have not heard that story. I thought it was just gonna be for spy potential because, you know, people would would always put these giant circuit diagrams on the walls Right. So it'd be easy to take pictures.

Speaker 1:

Right. Well, that's why I remember asking Steve about it, because it felt a bit out of Sun's character. Sun did not seem to care that much about industrial espionage. That seemed to be pretty low on the concerns.

Speaker 3:

Until until somebody stole all the source

Speaker 1:

code. Somebody stole all the source code. Yes. There were a couple of acts of industrial espionage. So, Tom, do you wanna maybe talk about some of your first exposure to SPARC? First of all, this wake is not gonna be complete without talking about some of the real causes of the E-cache parity error.

Speaker 1:

So I just want to brace any and all attendees that if you were a Sun customer of a certain vintage, your eye begins to twitch when the E-cache parity error is mentioned. So we'll be getting on the SPARC bugs too, but maybe, Tom, you can kick us off with some of your first exposure. For those of you who don't know, Tom is employee number 8 at Sun. Right, Tom?

Speaker 3:

Right. And you Yeah. And so Yeah.

Speaker 1:

Go ahead.

Speaker 3:

Wow. So my direct SPARC involvement was really, at the very beginning, making sure that IO architecture was possible, you know, IO interrupts, that kind of stuff. And then, I realized I didn't have much to do with it after that, other than it being one of the many processors in the fleet, because we had all the 68000 stuff. We had 386 for a while. You know, so processor architectures were always coming and going from my point of view.

Speaker 1:

Interesting. SPARC was just kinda one of the masses for you.

Speaker 3:

Yeah. But the amazing thing about SPARC is the first SPARC was not the SPARCstation 1. Right? That was a fairly advanced system, but there was a Sun-4 board that was just like the Sun-3s. And the processor chip for that was really just gate arrays.

Speaker 3:

So it was really amazing that they could get the clock speed up, and the rest of the processor was so freaking simple that it it hurt the architecture for a long time.

Speaker 1:

And was this one of the brief moments in which SPARC had a faster clock than the competition? A time that Adam and I don't really recall.

Speaker 3:

Yeah. I don't recall any of the numbers either or but it it was a heck of a lot better than what we could get out of Motorola.

Speaker 1:

Interesting. So my first SPARC, I think, was, like, 25 megahertz. I think I had my 16 megahertz 386SX, and it was 25 going to 40 on SS1, if that makes sense. And that's as an undergraduate. But to me, what was compelling about SPARC, well, what were your favorite features about SPARC, Tom, in those early days from a software perspective?

Speaker 3:

Oh, favorite? Oh, god. No. It was. I never did like the register windows.

Speaker 1:

Okay. Here we go. We got right there. That's good. That's good.

Speaker 1:

We went right there. Okay. Cool. So you you do not like register windows. Tell me why.

Speaker 3:

Oh, it's just too hard to deal with the interrupts and the spilling and predicting performance and blah blah blah. But the mind-boggling thing, that most people don't know about the first SPARC, is that there was no integer multiply or divide.

Speaker 1:

There was Wait. Really? Really?

Speaker 3:

Really?

Speaker 5:

So that's so did it

Speaker 2:

did it trap on this?

Speaker 1:

Yeah. Go on. How did that work?

Speaker 3:

Yeah. It would trap on the instructions, and it turns out most multiplies and divides in source code are by some constant, and so the compiler could optimize those pretty nicely.
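Editor's sketch: the point about constant multiplies can be illustrated in a few lines. With no hardware multiply, a multiplication by a known constant can be rewritten as shifts and adds, one term per set bit of the constant. The function names below are hypothetical and this is only an illustration of the general idea, not a description of what Sun's compilers actually emitted.

```python
def mul_by_const(x: int, c: int) -> int:
    """Multiply x by a non-negative constant c using only shifts and adds.

    Illustrative only: walks the bits of c and adds (x << bit position)
    for every bit that is set.
    """
    if c < 0:
        raise ValueError("this sketch only handles non-negative constants")
    result = 0
    shift = 0
    while c:
        if c & 1:
            result += x << shift  # one shift-and-add per set bit of c
        c >>= 1
        shift += 1
    return result


def times_ten(x: int) -> int:
    """The same idea written out by hand for c = 10 = 8 + 2."""
    return (x << 3) + (x << 1)
```

For a constant such as 10 this costs two shifts and one add, which is why a trapping multiply instruction was tolerable for much ordinary code.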

Speaker 2:

I feel so decadent. I've just been sprinkling multiplications around my code for years. Just

Speaker 1:

Right. Letting some trap handler clean them up. Right.

Speaker 3:

So that's an example of how far they had to go to reduce this thing down.

Speaker 2:

Well, and Tom, was it true at that time, then, that popc, the population count instruction, was in silicon, but integer multiplication was not?

Speaker 3:

I'd be surprised if popc

Speaker 1:

popc was definitely not in silicon. Yeah. I Yeah.

Speaker 2:

Wait. Not not not ever? It was it was

Speaker 3:

Oh, later. Yeah.

Speaker 1:

I didn't think popc was ever. Wasn't, like, the running joke that that's all the NSA wanted? And I

Speaker 6:

thought popc wasn't implemented in Yeah.

Speaker 1:

I got it. It was an instruction?

Speaker 2:

Yeah. So

Speaker 1:

popc is an instruction in SPARC.

Speaker 2:

In 2000, 2001, it was a trap. And I was told, you know, as neat as it sounded, to not use that as an instruction because it wasn't gonna help anything go faster. But I assumed at some point it had been selected, but maybe popc had just always been, you know, always trapped into the kernel.

Speaker 1:

That was the NSA instruction, was my understanding. Yeah.

Speaker 3:

Oh, Tom,

Speaker 1:

is that your Yeah. Yeah.

Speaker 3:

Uh-huh. Actually, I've been reading about the Stretch machine and Harvest at the NSA. It was all the same pop count stuff from the early fifties.

Speaker 1:

I never understood how you could use popc to perform evil or to intervene in foreign civil wars, but apparently, you can.

Speaker 7:

I don't know. I I I I that

Speaker 1:

that never really made sense to me why popc

Speaker 2:

That's why.

Speaker 1:

Has these dastardly deeds.

Speaker 3:

That's why you're not a spook.

Speaker 1:

That's true. This is why I I couldn't hack it because I don't Do you want

Speaker 6:

me to do you want me to see if I can find out from Belvin Blaze or somebody else like that? Sure.

Speaker 1:

Yeah. Yeah. Yeah. I'd be curious to know why popc is so nefarious. But so, yeah, Adam, I don't think it was ever in silicon, but I may have been wrong.
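Editor's sketch: when popc traps instead of executing in hardware, software has to count the bits itself. The show notes mention a branch-free algorithm from Hacker's Delight built from adds, shifts, and constant masks; the version below follows that well-known pattern for 32-bit values. The function name is hypothetical, and this is an illustration rather than the code any particular kernel trap handler used.

```python
def popcount32(x: int) -> int:
    """Count the set bits in the low 32 bits of x without branching."""
    x &= 0xFFFFFFFF                                  # treat x as a 32-bit word
    x = x - ((x >> 1) & 0x55555555)                  # sums of adjacent bit pairs
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # sums per 4-bit nibble
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # sums per byte
    x = x + (x >> 8)                                 # fold bytes together
    x = x + (x >> 16)
    return x & 0x3F                                  # the count fits in 6 bits
```

Each step adds neighbouring fields in parallel, so the whole count takes a fixed handful of operations regardless of the input.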

Speaker 1:

So, alright. So, Tom, you did not like register windows because you had to go deal with them. And this was the debate between Roger Faulkner and Jeff Bonwick at Sun about register windows. And, Tom, you should know that you were on the side of the now late Roger Faulkner. Roger and you would have seen eye to eye, because Roger's view on register windows was that they were horrific, because he was implementing /proc and having to deal with actually debugging these things and dealing with spill traps and all of the under-the-covers machinations required to get register windows to work.

Speaker 1:

But Jeff's perspective was, these are great. Like, I need some registers. I just, like, save, and then I restore later. And, like Yeah. Yeah.

Speaker 1:

It's like, I get it. It's a problem for someone else. But you know what? That's your problem. That's not my problem.

Speaker 3:

Yeah. Yeah. If if you're trying to look into some other other process and look at the stack, it's like, well, good luck with that.

Speaker 1:

Well, yeah. The the yeah. Sorry. I'm glad I

Speaker 2:

Or or trying to diagnose performance pathologies as you're fluctuating between, you know, spill and fill traps.

Speaker 1:

Right. Yes. Adam, I mean, we get we gotta talk about DTrace fish, about the r as I mean

Speaker 2:

Yep. I mean, obviously, that's that's that's the whole point.

Speaker 1:

That's that's right. That's right. It's alright. Go ahead, please.

Speaker 2:

That's right. Well, I mean, so register windows, just for the uninitiated. You know, SPARC has a bunch of registers, way more than 32-bit x86. And then more than you could even see. So you would save to rotate through the windows and restore to rotate back, so that every time you get a new function call, you'd get a whole new collection of registers, and the output registers from the function that was calling you become the input registers for you.
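Editor's sketch: the overlap between a caller's output registers and a callee's input registers can be modelled with a circular register file and a current window pointer. The class below is a toy model with hypothetical names. It assumes 8 windows of 8 "out", 8 "local", and 8 "in" registers and leaves out the global registers and the spill and fill traps discussed in this episode, so wrapping around all the windows here silently reuses old registers where real hardware would trap.

```python
class RegisterWindows:
    """Toy model of overlapping register windows (no globals, no traps)."""

    def __init__(self, nwindows: int = 8):
        self.nwindows = nwindows
        self.cwp = 0                        # current window pointer
        self.regs = [0] * (nwindows * 16)   # 16 fresh registers per window

    def _index(self, kind: str, n: int) -> int:
        if not 0 <= n < 8:
            raise IndexError("register number must be 0..7")
        # Outs and locals belong to the current window; the ins are
        # physically the outs of the next-higher window (the caller).
        offset = {"o": 0, "l": 8, "i": 16}[kind]
        return (self.cwp * 16 + offset + n) % len(self.regs)

    def read(self, kind: str, n: int) -> int:
        return self.regs[self._index(kind, n)]

    def write(self, kind: str, n: int, value: int) -> None:
        self.regs[self._index(kind, n)] = value

    def save(self) -> None:
        """Function entry: rotate to a new window."""
        self.cwp = (self.cwp - 1) % self.nwindows

    def restore(self) -> None:
        """Function return: rotate back to the caller's window."""
        self.cwp = (self.cwp + 1) % self.nwindows
```

A short usage example: writing 42 to out register 0, calling `save()`, and reading in register 0 yields 42, while the callee's locals are a separate set from the caller's.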

Speaker 2:

So it was sort of neat. And I don't know, Bryan, if this is your feeling as well, but, like, it just is great. And then as you dug into it, it had all these pathologies. Like, you had a fixed number of registers, and if your function needed more or needed less, you can't adjust that. And when you want to know the state of some other process, you have to flush those register windows to the stack, to memory, to be able to recover the stack trace.

Speaker 2:

And so in DTrace, we often wanted to know the value of registers buried in the stack trace. Now, so, Bryan, for DTrace fish, what problem were you trying to solve? Because I think it was slightly different than what I was trying to solve.

Speaker 1:

I well, we I was trying to do the same kind of thing. You know, like, I wanted to to without causing a spill trap, I wanted to actually go grab a I think I just wanna go grab the stack pointers that are actually causing a a spill, if I recall correctly.

Speaker 2:

That's right. And you're yeah. That's right. You were doing it, I think, for kernel registers.

Speaker 1:

That's right.

Speaker 2:

Maybe, and I was doing it for user registers. So, one of the other kinda esoteric characteristics, or at least now esoteric characteristics, of the SPARC instruction set was the delay slot. So you would take a branch, and then the instruction after the branch would also execute. So in kernel development, we sat around the lunch table talking about how crazy it would be to have a branch that executed right after a branch. Am I misremembering this?

Speaker 2:

Like, I remember talking about this all the time and how it would be a nifty Well,

Speaker 1:

the yeah. Yeah. In particular, if you have a branch in the slot. So the delay slot follows a branch. That is always executed unless it's a branch always, annulled, in which case it's not executed. And if the instruction in the slot is itself a branch, then that's called a DCTI couple.

Speaker 1:

And you execute the target of the initial branch, and then you execute the target of the second branch. And what
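Editor's sketch: the behavior described here falls out of SPARC keeping two program counters, PC and nPC. A branch only replaces nPC, so the instruction already queued in the delay slot still runs. If that delay-slot instruction is itself an unconditional branch, exactly one instruction at the first target executes before control moves to the second target, which is the "instruction picking" trick mentioned in this discussion. The tiny interpreter below is a hypothetical model with made-up opcodes (`nop`, `ba`, `halt`) that handles only unconditional, non-annulled branches.

```python
def run(program, start=0, max_steps=20):
    """Execute a toy program under PC/nPC semantics.

    program maps an address to an (opcode, argument) pair, where the
    opcode is "nop", "ba" (branch always to the argument), or "halt".
    Returns the list of addresses executed, in order.
    """
    pc, npc = start, start + 4
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        if op == "ba":
            # A branch changes only nPC; the delay slot is already in PC's queue.
            pc, npc = npc, arg
        else:
            pc, npc = npc, npc + 4
    return trace


# A DCTI couple: a branch sitting in the delay slot of another branch.
dcti_couple = {
    0:   ("ba", 100),   # first branch
    4:   ("ba", 200),   # delay slot holds a second branch
    8:   ("nop", None), # never reached
    100: ("nop", None), # the single "picked" instruction
    104: ("nop", None), # never reached
    200: ("nop", None),
    204: ("halt", None),
}
```

Running `run(dcti_couple)` visits addresses 0, 4, 100, 200, and 204: one instruction at the first target, then straight on to the second.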

Speaker 2:

And I just read, I was reading some papers this weekend on SPARC in this, you know, light some candles and really enjoy it. But apparently you couldn't put a branch in the delay slot of a conditional branch.

Speaker 1:

That's right.

Speaker 2:

That was illegal by the okay. You knew that. I didn't. But, not that I ever needed such a thing, but it was an interesting quirk.

Speaker 1:

So do you remember when the author of KernInst came out to give a kernel technical discussion?

Speaker 2:

Yes.

Speaker 1:

And he had done he had done some things that are, like, you know, fine for an academic project, but were never gonna work in a production setting. And he had done some, instruction instrumentation that wouldn't always work. And we were actually wondering whether he was gonna work at Sun. And one of the junior engineers, I mean, very junior, was, like, 22 at the time, raises his hand and asks him about, what what happens if there's a branch in the slot. And the way he asked it was almost overly deferential because it was such a young young engineer.

Speaker 1:

And though this guy answered, he said, well, the instruction set doesn't allow that. And the room, it was like uproar. It was like a murder verdict had just been announced. I mean, it's like everybody's talking at once, like, 5 hands shoot up. Some people start talking.

Speaker 1:

I mean, it was like it was mayhem. I mean, Adam, do you mind am I

Speaker 3:

am I

Speaker 1:

am I I'm exaggerating maybe a little bit.

Speaker 2:

Bedlam. As I mean, as far as, like, Solaris kernel technical discussions go, bedlam.

Speaker 1:

Bedlam. Absolute bedlam. Chaos. It is we need to take a break. We need to like, we then cool.

Speaker 1:

Everyone needs a clock. But I actually remember at the time thinking that he was so arrogant in telling this young engineer that this thing that was very much a thing, a DCTI couple, did not exist. Which is, like, man, you're at Sun. Like, just, like, read the room a little bit. And I remember thinking, like, I don't think this guy should work here.

Speaker 1:

Just the way he treated that question was really, really bad. But, yes. So DCTI couples very much exist, and we definitely were looking for a way to use instruction picking. What That's

Speaker 2:

right. So on the back of, like, those lunchtime discussions, and then us both wanting to go, you know, grovel through these register windows to pluck out particular instruction values, or pardon me, register values. Independently, we both discovered this, or invented, or wrote this mechanism to rotate through the register windows by hand and then use instruction picking to pluck out the specific register that we needed. Because, after trapping into the kernel and executing in

Speaker 3:

It it

Speaker 1:

the We're both now. We we we had both developed this in parallel. And each of each of us were, like, we're very proud of ourselves. And then we compared notes, and each was convinced that the other had cribbed their work.

Speaker 2:

That's right. I mean I mean, just throwing our shoulders out, patting ourselves

Speaker 1:

like that. Right. We're so proud independently. It was both kinda sad, basically. And the architect the ice is dead.

Speaker 4:

You both invented the calculus?

Speaker 1:

That's exactly right. Yes. Okay. I yes. Yes.

Speaker 1:

Leibniz and Newton is a much better analogy because that way, that guy that must have to wear a cell here.

Speaker 2:

I I I went back to the code, and yours is slightly better documented than

Speaker 1:

But your

Speaker 2:

Yours, like, gives a nod to the fact that there's something tricky going on and, like, there's a one line comment, explaining this very subtle mechanism. And mine is just like, well, obviously. He is like, you know, you know, 75 instructions Right. Including, like, a pile of unreachable instructions and a a jump that has a branch in its delay slot, should be obvious to the, to the uninitiated Well,

Speaker 1:

you're very kind, but, you're I think we also agreed that it perhaps this is just an elaborate setup to get me to confess this because I will I will confess this on the record that yours was first. So they Oh. The and this whole thing 15 years late, 20 years later.

Speaker 2:

When I went when I went to the Twitter spaces team and I told them that they needed to build this product Right. Because I just that's that's all this is The

Speaker 1:

the the very the very long game. Josh, you're either being arrested or you're in the street. I'm not sure what's going on.

Speaker 4:

I'm terribly sorry.

Speaker 1:

Yeah. No worries. The register window police are coming after Clulow. Yep. You

Speaker 6:

sure it's just in the register spill clean

Speaker 1:

Oh, god.

Speaker 3:

So an an old

Speaker 4:

Yes. An annulled branch then, did it just stall for the delay slot?

Speaker 1:

No. So an annulled branch takes the slot and, but so, normally, the slot would be executed regardless of whether the branch was taken or not.

Speaker 4:

Right. But in the presumably, that was some sort of artifact of the pipeline.

Speaker 1:

Oh, absolutely an artifact of the pipeline. Yeah.

Speaker 4:

So in the annulled case, was that just twice as expensive? Like, did it just twiddle for the slot?

Speaker 1:

That is a good question. Tom, do you have any idea? I do not know the history of annul, I mean, that whole mechanism. The delay slot is wonky. The fact that, like, they can be annulled is a real confession that, like, okay. This is a lot of disasters.

Speaker 1:

This is actually

Speaker 4:

it's actually impossible to program.

Speaker 1:

Actually, this is a mess. Sorry. Or you just end up with, like, nops in the slot. Right? Which is like, that was like the classic, like, GCC unoptimized code was nops in the slot.

Speaker 1:

Yeah.

Speaker 3:

I think this annulment stuff is pretty newfangled. I don't think I knew any of that.

Speaker 4:

Just the the comma a thing, right, on the in the assembly text or whatever.

Speaker 8:

That's right. That's right. Yes.

Speaker 3:

That's right.

Speaker 1:

And then it had the the sense of being annulled. First, like, what does it even mean? Like, this is, like, our Catholic union never existed between the the the source and target instruction. I mean, I think it's a super weird word to use, and then it means the opposite in the branch always case.

Speaker 2:

That's right. That that that they that it inverts the meaning in the case where it's most common. So you have the most

Speaker 4:

So ba,a

Speaker 1:

Never executes what's in the slot. Yep. That's right. I think I'm getting that right. I mean, I'll walk that back. I've obviously not written an annulled branch in a long time.
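Editor's sketch: the "inverted" meaning of the annul bit discussed here can be written as a small decision function. As the editor understands the SPARC V8 rules, without the annul bit the delay slot always executes; with it, a conditional branch executes the slot only when the branch is taken, while an unconditional `ba,a` never executes it. The function and parameter names are hypothetical.

```python
def delay_slot_executes(always: bool, taken: bool, annul: bool) -> bool:
    """Return True if the delay-slot instruction runs.

    always: the branch is an unconditional "branch always" (ba)
    taken:  the branch is taken (ignored when always is True)
    annul:  the ",a" annul bit is set on the branch
    """
    if not annul:
        return True      # plain branches always run their delay slot
    if always:
        return False     # ba,a: the slot is skipped even though the branch is taken
    return taken         # conditional with ,a: the slot runs only if taken
```

The asymmetry is the point of the joke in the conversation: for conditional branches the annul bit cancels the slot on the not-taken path, but for `ba,a` it cancels the slot on the only path there is.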

Speaker 2:

I just opened up my trusty SPARC architecture manual, which on the page where it describes annulled branches has a, like, a flight coupon for, like, a United flight to Chicago from, like, 2002. So you knew it was

Speaker 1:

an This time capsule just fell out of my SPARC manual. Alright. So one of the things, to praise SPARC a bit.

Speaker 2:

One of

Speaker 1:

the things that I so in my first job doing kernel development on x eighty 6, and I had this big this is back in the day when you get all of Pentium described in a single thick manual. And this thing is, like, almost phone book thickness. I love that manual. I still have it. But it was probably printed out.

Speaker 1:

And there were so many times that summer I was in the guts of x86 and the task state segment and segmented memory and all this other bullshit. And I remember looking up at, like, the super thin volume that was SPARC V8 at the time, being like, how do they get the entire instruction set described in so few words? And it's like, it could've been even fewer words if not for the annulled branch nonsense and all that stuff.

Speaker 1:

It could've been even tighter. But I definitely, like, lionized SPARC for its elegance, for sure, before I had implant in this garbage.

Speaker 2:

I gotta say, I I I don't know if this is, if this is a minority opinion, but the alternate address

Speaker 1:

I love that.

Speaker 2:

I always found very elegant. I mean, this is such an anachronism. But, you know, early on, when we were kind of hyping up the Solaris port to x86 and then AMD64, and sort of deciding what fraction of the address space would belong to the kernel on, like, a split address space. But the fact that on SPARC you had the full address space, whether it's 32 bits or 64 bits, with an asterisk on that, for the kernel, and then were able to have the full address space for userland as well? Like, oh, that seemed really cool.

Speaker 1:

It is very cool. So the address space identifiers allowed you to do a load from a different address space, which was your secondary address space. So for those who've done kernel implementation, you can imagine that all you do is annotate a load when you wanna load from userland. And then you're in a totally disjoint address space.

Speaker 6:

It There's something Sorry.

Speaker 1:

Go ahead. Go ahead. Keep going. Right.

Speaker 6:

No. No. No. No. No.

Speaker 6:

You bring up ASIs, and you're missing one of the cool parts about ASIs in SPARC starting with V9 and later. But sorry, I didn't mean to interrupt.

Speaker 1:

Not at all. Because I actually know exactly what you're gonna say, so you should say it. I agree with it. I think I know what you're gonna say.

Speaker 6:

Yeah. Mhmm. Starting in V9, somebody in SPARC design land had the bright idea, because, oh my god, we have PCI. We have all these other Intel-related things and, oh, shit, they're all little-endian.

Speaker 6:

Oh, we also have algorithms that have implicit little-endian bias. I'm looking at you, MD5. So you could use an alternate space identifier that would go to the little-endian view of the same address space and, boom, auto swap.
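To make the ASI idea concrete, here is a toy model of a load that is annotated with an address space identifier. It is a sketch under loud assumptions: the `Asi` names and the `ToyMmu` class are invented for illustration and do not correspond to real ASI encodings or to any real MMU interface.

```python
from enum import Enum


class Asi(Enum):
    # Illustrative names only; real SPARC V9 ASIs are numeric encodings.
    PRIMARY = "primary"
    SECONDARY = "secondary"
    PRIMARY_LITTLE = "primary_little"
    SECONDARY_LITTLE = "secondary_little"


class ToyMmu:
    """Two disjoint address spaces, selected per load by the ASI."""

    def __init__(self, primary: bytes, secondary: bytes):
        self.spaces = {
            "primary": bytearray(primary),
            "secondary": bytearray(secondary),
        }

    def load32(self, addr: int, asi: Asi = Asi.PRIMARY) -> int:
        use_secondary = asi in (Asi.SECONDARY, Asi.SECONDARY_LITTLE)
        little = asi in (Asi.PRIMARY_LITTLE, Asi.SECONDARY_LITTLE)
        space = self.spaces["secondary" if use_secondary else "primary"]
        # Same bytes, different interpretation: the "auto swap".
        return int.from_bytes(space[addr:addr + 4], "little" if little else "big")


mmu = ToyMmu(primary=bytes([0x12, 0x34, 0x56, 0x78]),
             secondary=bytes([0xDE, 0xAD, 0xBE, 0xEF]))
print(hex(mmu.load32(0)))                      # default big-endian view
print(hex(mmu.load32(0, Asi.PRIMARY_LITTLE)))  # byte-swapped view, same bytes
print(hex(mmu.load32(0, Asi.SECONDARY)))       # the other address space
```

The point of the sketch is that the address stays the same and only the annotation on the load changes which space, and which byte order, you get.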

Speaker 8:

It was very

Speaker 1:

it was very nice in hardware. The ASIs were really very elegant. And, Adam, you realize that was very prophetic, because ultimately we needed to implement kernel page table isolation on x86. That's Meltdown. Right?

Speaker 1:

The fact that they shared the same address space was actually hugely problematic. So, no, ASIs were really nice. They were really nice. Yeah. Tom, I've got a question for you, because I saw you were mentioning this.

Speaker 1:

The software-filled TLB, did that date back to the earliest SPARC?

Speaker 3:

Oh, yeah. Yeah. There was no room for any extra TLB logic. So you got a trap and you had to figure it all out.

Speaker 1:

Interesting. So that was just an area of concern primarily

Speaker 3:

in terms of A what?

Speaker 1:

In terms of chip area, in terms of just, like, the they they should not have the room to cram our repair and Shabalock into it.

Speaker 3:

Right. And then the other really annoying feature, not so much of the processor but of the whole earliest Sun-4 architecture, was the physically tagged caches and TLBs. There's basically no chance of any coherency between I/O and processor, or even any fancy processor coherency.

Speaker 1:

Oh, interesting. So this is the virtually indexed, physically tagged caches?

Speaker 3:

Yeah. Something like that. I forget the details. But it was just flush, flush, flush every time you wanted to do I/O.

Speaker 1:

Oh, interesting. Well, certainly, with the virtually indexed, physically tagged cache on UltraSPARC I and II, you could have the same line be in different colors of the cache. You have what's called a VAC conflict, and that was a very, very bad state to get in. I don't know if you ever had to deal with any of that shit, but it was very bad. Yeah.

Speaker 2:

Yeah. I remember those performance pathologies being, like, inscrutable.

Speaker 1:

And, you know, I always thought that MIPS did a really nice job on this. Because I'm like, why am I, the software... like, can someone help me detect this? MIPS, you talk about rest in peace, would give you a trap on a VAC conflict, which I always thought was really nice.

Speaker 1:

And then you could clean up one half of it. All right. So the software-filled TLB went back to the earliest days, because I actually used that quite a bit. I developed something... so you were there for the early days of SPARC, Tom. Adam and I were there for more like the later days of SPARC.

Speaker 1:

And before it got, like, good again. I feel like there was, you know, the kind of Niagara era, where it was, like, T2, where it was arguably good again. Adam and I were just there for the worst of the days, I feel. And so there was a variant, UltraSPARC III, very cruelly named Cheetah, because it was not... My god. It was

Speaker 3:

It was not fast.

Speaker 1:

It was not fast. It was slow. It was late, and it had a lot of problems. It was wrong. This part of the problem with microprocessors is these things tend to come in clumps.

Speaker 1:

Like, a microprocessor that's late is almost, by definition, especially in that era, slow. And then it's almost certainly late because it's, like, a mess. Big pipeline. And in particular, their TLB: so UltraSPARC I and II had a 64-entry fully associative TLB for all page sizes. And we would lock down, like, I don't know, 3 or 4 pages of that.

Speaker 1:

We had done much work in the operating system for large pages, especially for database workloads. And in UltraSPARC III, they decided that, well, no one's using large pages. We actually were definitely using large pages. So we are going to have 2 TLBs, a large page TLB and a small page TLB. And the 8K TLB, for 8K pages, is gonna be 512 entries.

Speaker 1:

It's like, okay. That's good. Two-way set associative. And you're like, oh, two, really? Can I have more ways, please?

Speaker 1:

Yeah. Because two-way set associativity is, like... and we definitely did have problems where the I-cache was two-way set associative. And so if you were going, like, jump, jump, jump, and those 3 mapped to the same two-way set, the 3rd jump would kick out the first one. And so you go return. And so you'd have this benchmark.

Speaker 1:

You're like, you know what's really weird? Like, I compile this thing over and over and over again. And, like, every 80th time when I compile it and run it, it's 40 x slower. And you're like, yeah. Yeah.
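The jump, jump, jump pathology can be reproduced with a tiny cache model. This is a sketch assuming LRU replacement and made-up sizes (4 sets, 32-byte lines); the real I-cache geometry and replacement policy are not given in the conversation.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets=4, ways=2, line_size=32):
    """Count misses in a toy set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_sets      # which set the line maps to
        tag = line // num_sets       # identifies the line within the set
        ways_in_set = sets[index]
        if tag in ways_in_set:
            ways_in_set.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(ways_in_set) >= ways:
                ways_in_set.popitem(last=False)  # evict least recently used
            ways_in_set[tag] = True
    return misses


stride = 4 * 32  # addresses this far apart land in the same set
two_targets = [0, stride] * 10
three_targets = [0, stride, 2 * stride] * 10

print(count_misses(two_targets))    # only the 2 cold misses
print(count_misses(three_targets))  # every one of the 30 accesses misses
```

Two conflicting targets fit in the two ways; a third makes every access evict the line that is needed next, which is why an unlucky link layout could be dramatically slower than a lucky one.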

Speaker 1:

Shit. But then the large page TLB, they took from 64 entries down to 16 entries. And then, for other reasons of other things being broken, we had to lock, like, 5 of those entries down. So you end up going to this, like, super tiny TLB. Basically, it was a huge step backwards in so many different ways.

Speaker 1:

It was not... it was bad.

Speaker 6:

They had fewer register windows too in Cheetah.

Speaker 1:

Oh, right. So you'd spill

Speaker 6:

more often.

Speaker 1:

That's right. You'd spill more often. Adam, did you end up locked in any of those meetings? Was I the only one? Did I dive for the team? I think

Speaker 2:

it was just you.

Speaker 1:

I think I

Speaker 2:

I think I I was next. You you pulled me in on the on the next one, the Panther meetings.

Speaker 1:

Right? The Panther meetings, yes. And that was in, I wanna say, 2004 maybe. That make sense?

Speaker 2:

But you were gonna talk about traces, I think.

Speaker 1:

I was talking about traces. Yeah. So in particular, they had built Cheetah, a microprocessor built... this is, like, in 2003. They had built it on traces from sun4m running Sybase.

Speaker 1:

That's how they had kinda concluded that no one uses large pages. So they were running software that was from 1994 at that point. And you're like, that was a decade ago. Like, we that was not a forward looking decision. That was not good.

Speaker 3:

Well, hey. They they had traces.

Speaker 1:

They did. Yeah. I guess that's... there you go, Tom. Glass is half full: they had traces. Right.

Speaker 1:

Right. Yeah. That's right. All the traces show the wrong things, but,

Speaker 3:

I gotta tell you, '94 is when I left Sun, and the SPARC roadmap contributed to my decision in a big way.

Speaker 1:

Oh, really? God. That is like Yeah. So I show up because

Speaker 3:

I You

Speaker 1:

know, it's like

Speaker 6:

Did you leave over Viking?

Speaker 3:

No. It was kind of the whole program. I went to this review and there were, like, 6 different chips in development, and they were all stepping on each other, and none of them were meeting anywhere near a competitive goal for clock speed. It was very depressing.

Speaker 1:

In hindsight, maybe some of what happened to me was knowable in advance, and apparently it was.

Speaker 7:

The because I thought it was

Speaker 1:

it was so great about it. And ultimately for me, honestly, Sun was not just the fact that it was, like, really investing in UNIX, but these SMP machines were great. Right? They were making big SMP machines, bigger than anybody else, which is great. So that was the exciting bit.

Speaker 1:

But the actual microprocessors themselves were clearly not so great. But so Dan mentioned Viking. So, Tom, how much of the Viking fracas were you there for? Does the Viking I-cache bug ring a bell?

Speaker 3:

Nope. Pretty far away from that stuff.

Speaker 1:

So the Viking I-cache was not grounded out properly. And

Speaker 6:

For people in the audience who don't know what Viking is: we use code names and we forget that people weren't Sun employees then. The Viking was the SuperSPARC chip that featured prominently in the SPARCstation 10, the SPARCstation 20, and the SPARCstation 5.

Speaker 1:

Right. And so it had an improperly grounded out I-cache. And as a result... yeah. As I was describing this to one of our colleagues at Oxide, he was like, wait a minute. You'd have to, like, DC balance the I-cache?

Speaker 1:

Like, yes. You have to DC balance the I-cache. And if you had too many zeros, they would start flipping to ones. And Bonwick figured this out when... amusingly, Tom just dropped. I imagine Tom is now filing a bug internally for, the top four, Twitter employee dropped by Spaces.

Speaker 1:

He discovered this when he had a branch where the displacement sent him to an address that had a lot of zeros at it, and then they started flipping. Very, very bad. Now, Viking was bad, and it lives on in the debugger, where we would still try to flip one bit of a bad address to determine if it could have corresponded to actual data corruption.

Speaker 2:

Because we'd see... You know, I stumbled onto that Viking story through the mdb dcmd, where I said, why on earth would you wanna take an address and try flipping all the bits in it to see if it comes out as something meaningful? And you're like, well, sit down and let me tell you.
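The bit-flipping heuristic is easy to sketch: for each bit position, flip that one bit of the bad address and keep any result that looks valid. This is an illustration of the idea only, not the debugger's implementation; `flip_one_candidates` and the `is_valid` predicate are hypothetical names.

```python
def flip_one_candidates(bad_addr, is_valid, bits=64):
    """Return every address one bit flip away from bad_addr that is_valid accepts."""
    candidates = []
    for bit in range(bits):
        candidate = bad_addr ^ (1 << bit)  # flip exactly one bit
        if is_valid(candidate):
            candidates.append(candidate)
    return candidates


# Hypothetical example: pretend the only mapped region is one 4K page.
def in_mapped_page(addr):
    return 0x30000000 <= addr < 0x30001000


good = 0x30000040
corrupted = good ^ (1 << 17)  # simulate a single flipped bit
print([hex(a) for a in flip_one_candidates(corrupted, in_mapped_page)])
```

If exactly one candidate comes back as plausible, a single-bit corruption becomes a credible explanation for the bad address.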

Speaker 1:

That's right. Let me tell you a story about Viking.

Speaker 2:

And so did you, like... I mean, that flip one, did you discover bugs with that back in the day? Like, did you, you know, plug in some goofball address and discover that the cache had done anything?

Speaker 1:

I did, but only because of software. So in other words, it was something that had incorrectly mapped an address, not because of hardware, I don't think. But I should just say that, like, anybody who's got SPARC stories, just hop in here, raise your hand, request. All requests granted.

Speaker 1:

See, definitely the subtitle of any of our spaces. So Viking was very bad and, I mean, Tom quit as a result, I guess. Tom, right? The UltraSPARC arguably saved the company. That's when I showed up. Or not arguably.

Speaker 1:

Indisputably saved the company. And there are others, Tom. Tom, did you drop off of Twitter Spaces?

Speaker 3:

I did. I don't know if that was a Comcast moment or what.

Speaker 1:

Okay. All right. Well, we were joking that you're off, like, pinging the Twitter Spaces team. So, amazingly, the company stayed afloat through what was a very, very rocky period of sun4m.

Speaker 1:

sun4u UltraSPARC definitely worked, or seemed to, and was great at 167 megahertz and at 330 or whatever. But I think it's when we got to about 400 that we started seeing E-cache parity errors. So if you were a Sun customer during this era, all I can say is that I personally am sorry. Every Sun employee is sorry. We're really sorry.

Speaker 1:

The whole thing was a big wake-up call. So the E-cache parity error: I recently read a memoir, a very bad memoir, written by a Sun exec at the time, and I was like, I'm obviously skipping ahead to the chapter on the E-cache parity error, and he totally misdescribed the E-cache parity error. He's like, and we discovered that it was due to high energy particle strikes. It's like, no. It wasn't.

Speaker 1:

It was not due to high energy particle strikes. It was due to everything but high energy particle strikes. So if you suffered from the E-cache parity error: there were some manufacturing defects that were the most common, and there was a design defect that was also a common cause of it. One of the manufacturing defects, Tom, I don't know if you've heard of this: we had radioactive boron as an impurity in our SRAM manufacturing process.

Speaker 3:

Yeah. It got into the packaging. Right?

Speaker 1:

That it got into the packaging. So you had radioactive boron, which is an alpha emitter, but it's sitting in the actual SRAM cell.

Speaker 4:

The particles are coming from inside the house?

Speaker 1:

The particles are coming from inside the house. And that And

Speaker 2:

Was there ECC on that SRAM, Bryan?

Speaker 1:

There was parity, not ECC.

Speaker 2:

Parity. Oh, I see. Okay.

Speaker 3:

Because ECC takes another clock to compute it.
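The difference between parity and ECC matters for everything that follows, so here is a minimal sketch of what a single parity bit can and cannot do. It assumes even parity over one word; the function names are illustrative.

```python
def parity_bit(word: int) -> int:
    """Even parity: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored alongside it."""
    return parity_bit(word) == stored_parity


original = 0b10110010
stored = parity_bit(original)

one_flip = original ^ (1 << 3)
two_flips = original ^ (1 << 3) ^ (1 << 5)

print(parity_ok(original, stored))   # clean data passes
print(parity_ok(one_flip, stored))   # a single flipped bit is detected...
print(parity_ok(two_flips, stored))  # ...but two flips cancel out and go unnoticed
```

Parity says only that something in the word is wrong. It cannot say which bit, so there is nothing to correct, which is why the only option left is a trap.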

Speaker 1:

That's right. And so, Adam, we love this: you get a parity error. Right? In other words, we have a parity error. So what do we do now?

Speaker 1:

And, apparently, the SPARC approach to errors was really like, I just hope they don't happen. And in particular, you would take a trap, but you would not be in an architecturally coherent state when you took the trap. So it's like, as it turns out, there's actually nothing you can do with this trap. Like, you've got no way of knowing what the architectural state should have been. So you basically have to die.

Speaker 1:

But the problem was, and this is something that took a long time to figure out. People didn't really internalize this. Who gets the trap on the E-cache parity error? Well, who gets the trap is whoever observed the E-cache parity error. Well, the problem is it was very rarely, like never, the CPU with the bad cache.

Speaker 1:

What would happen is that line would be snooped by another CPU. That CPU that snooped it would be like, wait a minute. This is, like, bad, and it would die. And because we didn't understand, we collectively, we Sun, we humanity, didn't understand what was happening, we would say, oh, that CPU is broken. Replace that CPU.

Speaker 1:

It's like, no. No. You're actually, like, replacing the messenger. That's actually the only CPU you can say with confidence that you shouldn't replace. So people would be like, okay.

Speaker 1:

I replaced CPU 13. Okay. Now CPU 4 died. Okay. Replace CPU 4.

Speaker 1:

Now CPU 5 died. And you would get to the point when we had customers who were like, I have replaced every single CPU in here except for CPU 8. CPU 8 seems to be great. You're like, no. No.

Speaker 1:

CPU 8 is the murderer. CPU 8 is the one. It's murdering all these CPUs.

Speaker 2:

It's it's it's the diagram of the airplane with the bullet holes.

Speaker 1:

It is the diagram of the airplane with the bullet holes. And we, as a company I I really wanted, like, Steve Chasten or whatever to write a book about this because there's so much to learn by how Sun mishandled this in so many different ways, like, earnest ways, not like being malicious. We just did not understand what was going on. And in particular, like, there's a kind of a desire when you're on the front lines of a problem like this to give the customer, like, something to do. And you're not necessarily trying to deflect blame, but that's what it comes across as.

Speaker 1:

And we had a customer in Europe to whom we said, oh, this is due to dust. It's like, due to dust? Well, yeah. You know, I think there's dust in the data center. And the customer's like, all right.

Speaker 1:

Well, okay. I guess we'll accelerate this big HVAC project we got. We'll accelerate that. So they put, you know, millions of dollars into solving this HVAC problem, and they still have all these E-cache parity errors. And then Sun says, well, we think it's due to the proximity to the Tube.

Speaker 1:

This is in London. They're a quarter of a mile away from the tube. And the customer's like, okay. We've done a map and, like, do you realize, like, there is no spot in London that's not a quarter of a mile away from the tube. So we don't think it's this, but the, the customer ended up building a new data center.

Speaker 1:

And as it turns out, there were lots of things that could contribute to the E-cache parity error. One was the manufacturing defects that were not due to the radioactive boron, but were due to just really grievous manufacturing problems. In particular, the customer is so upset that Sun tells them, like, we'll give you a factory tour. Customer's like, I would love that.

Speaker 1:

They go to a factory tour, which is a huge mistake because Sun did not have the discipline to really make these things well. And the customer walks in, and this is a customer that has been, like, they're moving their data center further away from a tube and they're, like, cleaning up dust. And they walk in and it's just this incredibly dusty manufacturing floor because they're they're de boxing these machines in the same room that they're doing the burn in. So there's cardboard everywhere. There's dust everywhere.

Speaker 1:

And the customer ran their finger along one of the horizontal surfaces and looked back at the finger, which, of course, is black, and held up the finger to the exec. He said, just just remind me why I built a new data center. And it was like, wow. Tough moment. So it was it it was it was an educational moment.

Speaker 1:

And it was one of those moments that was, like, man, when you're gonna have a customer do something, you always have to remember there's a human being on the other end of that. And you cannot have them chasing your theories. Like, you really need to be transparent with them and honest with them, and we were not, as a company, honestly.

Speaker 2:

But I think the E-cache parity error, Bryan, caused a real reckoning, certainly in the Solaris kernel group, in terms of how to build software more reliably, how to build software defensively against this kind of hardware, and how to collaborate with some of the folks in hardware.

Speaker 1:

Yeah. That's right. I mean, the fault management architecture at Sun, which we still have in illumos, came out of the E-cache parity error. And in particular, it came out of the observation that, again, there were many problems that were causing this. A bunch of the manufacturing defects you could predict in advance by looking at the rate of correctables over time.

Speaker 1:

And if the rate of correctables would rise, you actually had some indicator that there were components that were beginning to fail, and you could take action. You could actually, like, turn off the CPU. So yeah, it was a huge wake-up call, because we had screwed up so much. There's actually one other final story on the embarrassment of the E-cache parity error.
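Watching the rate of correctable errors can be sketched as an "N events within a time window" detector. This is an illustration of the idea only, not the fault management architecture's actual code; the class name, threshold, and window are assumptions.

```python
from collections import deque


class CorrectableRateMonitor:
    """Fires when at least n correctable errors fall inside a sliding time window."""

    def __init__(self, n: int, window_seconds: float):
        self.n = n
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one correctable error; return True if the component looks suspect."""
        self.events.append(timestamp)
        # Drop events that have aged out of the window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.n


monitor = CorrectableRateMonitor(n=3, window_seconds=60)
for t in (0, 100, 110, 120):
    print(t, monitor.record(t))
```

Isolated correctables never trip the detector, while a burst does, which is the cue to take the component offline before it produces an uncorrectable error.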

Speaker 1:

And I don't know if you've heard the Micron story. So Micron was late to Sun for a DRAM shipment. And it was at the point where, like, we needed to ship the system to make the quarter, and McNealy is on the phone. And this story I actually know because McNealy told us the story. This one I've got on at least quasi-good authority, I guess.

Speaker 1:

McNealy calls up the Micron CEO. He's like, where's my DRAM? Like, we need to make our quarter. Micron's CEO says, no. It's actually funny.

Speaker 1:

I was just gonna call you because I actually have your DRAM. It's sitting on a private jet here in Boise to send to you, because it's very important to get this to you. That jet can't take off because our ERP system has something that you might recognize, called an E-cache parity error.

Speaker 3:

Oh, man. Wow.

Speaker 1:

Yeah. That was another one of those that's why you will never see me me. Please don't be me. Anyway yeah. It was bad, Adam.

Speaker 1:

It was it was

Speaker 3:

Yeah.

Speaker 1:

It was bad. And we we learned a lot, I would I would like to believe, about making more reliable systems. And just like you can't have the strategy of hoping something doesn't happen.

Speaker 2:

Yeah. Or or I mean, to to your point or your story, like, some of the, you know, when these aberrant conditions occurred, the sort of thought of, like, oh, we'll we'll generate it. We'll we'll we'll kick it down the road and let it be somebody else's problem. But thinking through that full system to know, like, that you have enough architectural state to do something with that interrupt.

Speaker 1:

That's right. That's right. You can't just, like, hey, we'll just kick it upstairs. I don't know. Software software question mark question mark.

Speaker 1:

Right. Nate, I saw you unmute.

Speaker 3:

Good to get in here.

Speaker 9:

Oh, yeah. At first, I was just gonna say that I imagine the entire audience of listeners gasping at once at the punch line of that one. Yeah. But I also wanted to... your story about them blaming it on, like, cosmic rays triggered something for me I haven't thought of in, like, 20 years. The first place that I worked for was a recent spin-off of Hewlett Packard.

Speaker 9:

It was actually the first one ever. And it was in a facility where, they had had some semiconductor operations going whether it was, I think it was just assembly at that point. But they had some kind of, like, mysterious bug like that that went on for months months months, and then they had some, expensive consultant come in and, you know, looked around and looked into all these things and said cosmic rays. It's cosmic rays. And so they they literally lined the, the roof of these buildings with, like, lead or tantalum or something, And and it didn't change the error rate at all.

Speaker 9:

And that's what it made me think of.

Speaker 1:

Oh, interesting. That must have

Speaker 9:

been, like, an en vogue, you know, thing to go to if you were a consultant and you, you couldn't find the answer.

Speaker 1:

Well, I mean, obviously, cosmic rays do exist. And I know I saw Rick Altherr join. I know there are definitely stories where the hyperscalers, AWS, Google, and so on, can see sunspots by seeing a higher incidence of errors. Like, they definitely do exist, but it is very hard to verify. Ultimately, it's kind of like, you know, I found a bug in the compiler, or I found a bug in the operating system. It's like, maybe.

Speaker 1:

But maybe it's a bug in your program. Exactly. Maybe.

Speaker 9:

Yeah. But but then You need a really statistically significant number of data points to, to convince me that that's, you know, coming from the

Speaker 1:

other direction.

Speaker 3:

I think the earliest hints, though, were the failures at Los Alamos and at NCAR, which are both very high altitude.

Speaker 1:

Well, that's it, Tom. And I think that with the E-cache parity error, this myth took hold, like, oh, it's cosmic rays. And the physicists that were looking into this were like, no. It's not higher in Denver, and it's not higher in Los Alamos. Because if it's not higher at altitude, like, go fish.

Speaker 1:

You've got something else. And again, you can have cosmic rays, but it's just not likely by any means.

Speaker 2:

Bryan, is it kosher to talk about who the manufacturer was before?

Speaker 1:

Definitely. I feel. Like, yeah.

Speaker 2:

Yeah. Go for it. I

Speaker 3:

mean, you

Speaker 1:

you told me That's that's oh, what? Oh, yeah. You're just kinda getting my fingerprints on the revolver? Yeah. So alright.

Speaker 1:

Well, you should know also that we were being just absolutely demolished in the marketplace, justifiably so, for this really grievous error that we were then mishandling. And, of course, like, who are we competing with? Well, we're competing with, like, HP Superdome, and we're competing with IBM POWER. And, I mean, IBM is just rightfully going to town on us. I mean, I think IBM sales reps got paged on an E-cache parity error before Sun did.

Speaker 1:

It's like, you know, they managed to show up. And so IBM is just absolutely feeding on this, and, of course, who is our SRAM manufacturer? Of course, it's IBM Microelectronics. And the contract was written in such a way that what they had actually done, what they had delivered, was not in violation of the contract. So write your contracts carefully, because you want to be sure that... so they honestly did. The radioactive boron was obviously extremely bad, not to dismiss it.

Speaker 1:

But, man, there was so much else that was wrong. If that had been the only problem, it would have been much simpler. But there were a lot of problems. So, I know that I saw Dan Cross join. Dan, I know you wanted to talk, like, SPARC, I would say knock-offs, but, like, TurboSPARC and all the others.

Speaker 1:

Do you wanna talk about Fujitsu TurboSPARC?

Speaker 7:

Do it. Talk about TurboSPARC.

Speaker 1:

I don't really have much. So I honestly had very little exposure to the Fujitsu chips, other than there was this alternate universe timeline in which people would take our operating system, then proprietary, and get it working on gear that we never saw, and then would occasionally ask us highly technical, very well informed questions about very arcane parts of the system. So I honestly had very little exposure to TurboSPARC, or also to HAL, right, and Ross.

Speaker 7:

I had a set of

Speaker 2:

Ross hyperSPARCs at some point. They were... Of course. Yeah. Incredibly hot. They ran so hot.

Speaker 2:

It felt like the SPARCstation

Speaker 1:

was gonna catch fire. How did you... did we only let Australia have Ross microprocessors or something, Josh? How did you end up

Speaker 3:

with this?

Speaker 2:

They were

Speaker 4:

in a cupboard at the university.

Speaker 1:

There you go. The best things come out of the cupboard. So yeah. Dan, I don't know if you have more stories with TurboSPARC? Did you have to deal with it at all?

Speaker 7:

Not really. I have a TurboSPARC SPARCstation 5 down in my basement that I haven't powered up in, like, probably 15 years, and it probably needs to be recapped and all of that good stuff. But I'm just curious if there were any stories around that. I mean, that one was notable to me because it came out around the time of the Ultra 1, and it was a 170 megahertz part, but presumably it's a sun4m architecture, and, you know, produced by Sun as a SPARCstation. And I thought, oh, that's kind of interesting.

Speaker 7:

And I just assumed that was a Sun part, but then I just looked it up and it was a Fujitsu.

Speaker 1:

It was a Fujitsu part. Yeah. And Fujitsu and Sun and TI... there were some very weird relationships, because Sun was fabless. So there were relationships that I wouldn't have much insight into, honestly. But, yes, they were very fast SS5s.

Speaker 1:

I can tell you, from the inside of Sun, I got a choice between one of those coming in and a 143 megahertz sun4u. And there was no question that I wanted the 143 megahertz sun4u. It was gonna be the the the box has my furniture then. A little dual processor that was at an electron that used to watchdog if the table ever got pounded. And so whenever I was upset, I would pound the table because I was frustrated that I myself thought something, and then the machine would watchdog, which felt very appropriate. But I was, like, getting punished for losing my temper

Speaker 6:

at the machine. The Ross sun4m parts: people who had SPARC 10s and hated them would buy Ross parts and, like, oh, okay. Now it doesn't suck.

Speaker 1:

Well or, apparently, it heats your cabinet in

Speaker 6:

in Josh's case. Yeah. I think there was, like, one... at my first job out of grad school, there was a lot of sun4m lying around. And one of the 10s was that, and it was kept in a cooler place, like a machine room.

Speaker 1:

But, yeah, love it. We've got these stories on those. I did wanna talk about, Adam, when you and I were dealing with Panther, what became UltraSPARC IV. And we would get on these con calls, and they were all voice at the time, obviously, on the Polycom. And I guess some things just never change.

Speaker 1:

I feel like we're still in this mode. I remember Adam and I were on a call where there was, like, a tapping sound, and they spent 15 minutes trying to debug where the tapping sound was coming from. And we're like, we are screwed. Like, we literally cannot run a conference call.

Speaker 1:

This is not good. I don't know, Adam, what your thoughts were at the time.

Speaker 2:

I mean, it was sort of early in my career at Sun, and just seeing how that particular sausage was made very, very slowly, by folks who, as you say, couldn't run a conference call. It was discouraging. Although, you know, not necessarily in our defense, for every subsequent conference call that you and I were on, we would spend the first 5 minutes tapping our fingers on the Polycom to... Yeah.

Speaker 1:

I think, right. We were very tempted. They had created a perverse incentive where we actually wanted to create a tapping sound just to watch them debug it. It was so entertaining. It was really terrible.

Speaker 1:

I mean, what what was what was wrong with us? So what a what a bunch

Speaker 3:

of brats

Speaker 1:

we were. Exactly. But, Panther, I remember thinking, like, wow. This chip is gonna be amazing when we stick it in a time machine and send it back to 2001 when it would have been competitive. I mean, it would have been competitive in 2001.

Speaker 1:

It would have been amazing in 2001. But

Speaker 2:

That's how all those processors were. And was it Eagle that went into Millennium, or do I have that backwards?

Speaker 1:

It was Millennium that went into Eagle.

Speaker 3:

Thank you.

Speaker 1:

I believe. And Millennium, yeah. Millennium was a canceled project, UltraSPARC V. It did not go well.

Speaker 2:

I don't know. But it was canceled in, like, I mean, by the Oracle overlords, like, much belatedly.

Speaker 1:

Was that

Speaker 6:

No. That was Rock. That was Rock.

Speaker 2:

Rock. Right. Right.

Speaker 1:

Rock? Do you remember what Rock stood for, Dan?

Speaker 6:

No. They no. No. No. I don't.

Speaker 1:

Okay. So IBM had their IBM Regatta, and that was a Regatta on a chip, ROC. That was Rock. Not exactly, Adam. That's exactly the right reaction, kind of like a fatigued exhale.

Speaker 1:

Like, that's what we're doing right now?

Speaker 2:

I mean, I know that it didn't literally occur, but it felt like we never actually shipped a new SPARC CPU while I was at Sun. Although, until we got, like, the Niagara and the T series.

Speaker 1:

Yes. And those were great. I mean, well, that would have been really great if they hadn't had just a single FPU. The first one. That's right.

Speaker 2:

It was so odd. So it was this hugely multithreaded system at the time. Do you remember how many threads or cores there were?

Speaker 1:

I wanna say,

Speaker 6:

like, 16, 32.

Speaker 1:

Yeah. Yeah. Something like that. I'll just say 32 or 64.

Speaker 4:

The small ones had less than that. They were agonizingly slow.

Speaker 1:

They And

Speaker 3:

I think the very first one was only 8 threads.

Speaker 4:

The T, right. If you bought a T1000 with the small CPU, it was, like, 8 threads or something, or 16 threads.

Speaker 1:

And each of those

Speaker 4:

threads, each of those threads would run about as fast, like observably fast, as, I don't know, like, UltraSPARC IIIi maybe? At best?

Speaker 1:

That was really slow. Yeah. It should also be said, Josh, that you're someone who has, like, a pretty high tolerance for pain from a performance perspective.

Speaker 4:

I know. And I tried to get people to use these things. And honestly, the thing that really made it embarrassing was, like, because it was a very wide CPU, nothing could go faster than the 800 megahertz or whatever useful performance you could get out of it. Interactive SSH was slow. Like, it just felt laggy.

Speaker 1:

Yeah. That's bad.

Speaker 6:

Well, T1 was a throwback to the early SPARCs because it had no integer multiply.

Speaker 1:

Oh, interesting. I just remember they had a single FPU for all those threads and cores. And there's this idea, there should be a name for this. I'm sure there's a single word for this in German, where you see something that you kind of assert is common, and you add it. Or you assert that something is uncommon, so you eliminate it.

Speaker 1:

So you're like, floating point operations don't happen, or they're very rare. It's like, well, maybe, but, like, they definitely do happen. You know, there are many integer programs that do actually enough floating point to be painful if you're gonna make it a very painful operation. It's like your large pages,

Speaker 2:

you know, it's, you know, nobody's using these huge pages. Same kind of, like, inference where they're drawing the wrong inferences. And I remember being very excited about T1, seeing that they quoted all these pretty nifty, you know, performance benchmarks for ostensible web processes, which were very hot at the time. And then with the caveat, oh, as long as you're not using FPU. And to your point, Bryan, everything, it turned out, was using a little bit of FPU.

Speaker 2:

And when everyone is using a little bit of FPU, everything just backed up and became very slow and very single-threaded.

Speaker 1:

That's right. Laura, you got did you have one of these stories as well? Do you have a a similar kinda story? I saw Laura unmuting herself, but maybe no. Maybe not.

Speaker 1:

I'd I

Speaker 8:

I I have a similar story to that.

Speaker 1:

Oh, okay. Ran a

Speaker 8:

whole bunch of stuff on the Niagara chips. And, honestly, after about a year of beating on it, the only thing we were ever able to get to run fast was benchmarks.

Speaker 1:

Yeah. That's great. Well, hey. At least those benchmarks are... so why are you complaining? Those benchmarks are great. Those benchmarks, of course, approximate your real-world workloads.

Speaker 1:

It's like the Jedi Jedi mind

Speaker 2:

trick for for for phones.

Speaker 3:

I think they did a great job marketing that though with the throughput oriented computing. It's like when you're too embarrassed about latency.

Speaker 1:

Hey. Do you remember balanced computing, Tom? If I recall correctly, Sun would talk about balanced computing at a time when everything sucked. It's like, no. No.

Speaker 1:

Oh, no. No. No. You wouldn't wanna have one of those fast Alpha CPUs. Like, you I mean, you The feng shui Well, I'm sure it's wrong.

Speaker 1:

Terrible.

Speaker 3:

I I I think IBM mainframes were the first with the balanced.

Speaker 1:

The balanced. There you go. There you go. So yeah.

Speaker 3:

It it sucks, but there's a lot of it.

Speaker 1:

That's that's right. Well, I I thought it was obviously a great idea. I it was just it was early, honestly. And it was early and, like, the whole company desperately needed it to succeed, which probably meant putting too much investing too much in it from an emotional perspective. Not a that necessarily.

Speaker 1:

And, Theo, did you ever get to the bottom of why it performed poorly for you? I mean, at the end of

Speaker 8:

the day, everything uses FPU. Right. That's what the problem was. But if you could manifest a highly concurrent workload that relied only on integer arithmetic, then it was okay. So it worked okay.

Speaker 8:

It's like a web cache.

Speaker 1:

Right.

Speaker 8:

Which basically made it the most expensive web cache you could find. Right.

Speaker 2:

Right. And that and it turns out that was the marketing material. I remember that quote from you being very effective.

Speaker 1:

That's right. "The most expensive web cache you can buy," raves Theo Schlossnagle. I definitely remember, and I know that with Adam, whenever we talk about the cost of the systems, I just remember us learning the cost of a V880. An eight-CPU V880 retailed for, like, 120k or something like this. And Yeah.

Speaker 1:

Do you remember us talking with this? I remember that is it is this as vivid for you as it is for me?

Speaker 2:

Indelible. I remember being in building 17. And I think it might have been when I was an intern or I just joined or something like that. I mean, really, it was, like, 2000, 2001.

Speaker 1:

And you're just, like, who would pay that for a computer? Does that make any sense? Like, why wouldn't I just buy 8 smaller computers? Or

Speaker 2:

a 100 for that price.

Speaker 8:

We deployed probably, I think we had one install that had 12 V880s, racked side by side by side, all running Oracle 8i. And that architecture printed money, so it was well worth it. It never really broke.

Speaker 1:

Well, there you go. That it that's that answers the questions. I remember just being like, I don't know. I guess yeah. I don't know.

Speaker 1:

It's I people want it, I guess. And then after everything, like, busted out, I'm like, listen to the babes, the children. They were telling us. Adam was telling us. The intern knew.

Speaker 1:

The intern knew this was ridiculous. So

Speaker 2:

I mean, but that speaks to, like, I feel like we were holding on to SPARC, both as the full company and in the kernel group. First, we just wanted it to go. Like, we wanted it to be the thing. And then when we made the Solaris port to Hammer, to AMD64, it was just so clear, the performance gap. It was

Speaker 3:

when I moved into the

Speaker 1:

So

Speaker 3:

In in into a group at the Uni where we

Speaker 4:

were using Sun Rays for desktops, there were a couple of, like, V240s, I think, that were in there. And it's like, people are like, thin clients kind of suck. This is not

Speaker 3:

a good

Speaker 4:

ProLiant from before the Hewlett-Packard acquisition that had, like, a Pentium III in it, and it felt maybe 3 times as fast interactively. And it was definitely, it's like, we should stop buying these things.

Speaker 2:

Okay. So now we've come to the part, the moment in the Twitter Space where Bryan gets kicked out. We are hostless. Bryan can listen to all the speaking, but can't actually speak himself. So, we're gonna pause for a second.

Speaker 2:

Bryan, I don't know.

Speaker 1:

Alright. Adam, you there? Bryan. Bryan. I can hear you.

Speaker 1:

Yes. Yes. And again, I'm approving all comers. So if you've got something to say, or want to talk about SPARC, definitely, just request to speak. And are you gonna kick off the recording again, Adam?

Speaker 1:

Uh-oh. No more Adam. Oh, Twitter Spaces, we love you so much. Can you please work more reliably? Nate, do you do a lot of Twitter Spaces?

Speaker 1:

This is happening in a bunch of Twitter Spaces. It's just us. This is my own bad luck. I just I I love Twitter spaces, and I want it to work. Adam, you there?

Speaker 1:

Actually, can anyone hear me?

Speaker 3:

I just I I

Speaker 1:

just Jesus. God. You're okay. Give me I'm just like, wait a minute. Have I just am I are we dead again?

Speaker 1:

Is that I

Speaker 2:

I can hear you now, and and I seem to be in.

Speaker 1:

Okay. Good. Yeah. That is like, I I come on Twitter Spaces.

Speaker 3:

So is there an hour limit or something?

Speaker 1:

Yeah. So supposedly, I saw Antranig was saying that this had happened in a space that he was in, and he thought there was a garbage collection issue. I mean, of course, ironically. I mean, this space should die because of an E-cache parity error somewhere. So that... Yeah.

Speaker 8:

I was gonna say my my app

Speaker 1:

threw an E-cache parity. E-cache. Exactly. Alright. So I know some folks have been waiting to, the, Enron Advocate that's been waiting to get in here.

Speaker 1:

So wanna make sure that got a chance to speak.

Speaker 2:

Yeah. So I had maybe a dumb question as a young person, like, why SPARC? Was this just the days before x86 was dominant, or was there an advantage, or was it just Sun being Sun?

Speaker 6:

Oh, it was not just Sun being Sun.

Speaker 1:

Yeah. First of all, it could be all those things.

Speaker 3:

Yes. I mean, the 386 was not really taking

Speaker 6:

off. D at the time. And let's see what instructions we could get that are small and we can make a small chip that clocks really fast. So Stanford came up with the MIPS project. Berkeley came up with the RISC project, which is a progenitor to SPARC, if I remember correctly, Tom.

Speaker 3:

Yeah. Right? Yeah. Absolutely. David Patterson carried it into Sun.

Speaker 6:

Mhmm. Yeah. And so those were the 2 big forces behind reduced instruction set computing at the time. Eventually, IBM took their 801 project from Watson Labs and turned it into POWER. And

Speaker 3:

then there, in the eighties, the intermediate ROMP or something. Oh, god. Don't forget the ROMP.

Speaker 6:

I had a ROMP machine in Rochester for one of my summer gigs there, and that thing was slower than the night.

Speaker 1:

Hold on. Don't forget the ROMP. Are you saying ROMP? R-O-M-P?

Speaker 7:

Yeah. Oh, yeah. Yeah. Yeah.

Speaker 2:

Oh my god. What?

Speaker 6:

Microprocessor. Oh.

Speaker 2:

It was

Speaker 1:

a b 2 processor.

Speaker 7:

You could either run AIX on it, or you could run a port of 4.3BSD that IBM called AOS, the Academic Operating System.

Speaker 6:

We actually had AOS on the ones in Rochester because they were used by the AS/400 TCP/IP group.

Speaker 7:

Oh. Wow. Wow. AOS was actually pretty reasonable as a port of 4.3. AIX on that thing, that was something else.

Speaker 1:

That is nuts. I know. So yeah. And, you know, to answer the question, there's some very good listening: David Patterson's got a great retrospective. And their Turing lecture is really good.

Speaker 1:

And they talked about, in particular, the fact that a bunch of grad students at Stanford and Berkeley were able to make a CPU that was faster than the industry. That's how big a deal. So RISC was a very big deal when it happened. So that's why SPARC, because

Speaker 3:

The other the other great data point though in in roughly that time frame was the Intel 432

Speaker 1:

Yes.

Speaker 3:

Where they they completely forgot about performance and it was just ridiculous architecture.

Speaker 1:

So I know I'm in the right room because I was actually gonna bring up the Intel 432 earlier when you were talking about removing things that they don't think are needed. And the 432 architects felt that the only constants you really needed were 0 and 1. And as a result, the only immediate values it had were 0 and 1. And Robert Colwell, in what I still feel is one of the greatest papers ever written, on architectural performance considerations in the Intel 432, whatever it is. We'll link to it.

Speaker 1:

Great, great paper. But he says that this assumption is, quote, almost certainly in error. So, Tom, you guys were watching the 432 from Sun's perspective, where you're like, what are you guys doing over there?

Speaker 3:

Yeah. It was kinda a little bit before Sun, but I know because my wife came to work at Sun from that group in early '83. So it was already a laughingstock by that time.

Speaker 1:

Tom, I have so many follow-up questions. Your wife came from the 432 group? Yeah. Marketing. But this is like so for those of you who don't know, Tom, I I I wanna make a Tom, I wanna make a documentary about your extended family.

Speaker 1:

So Tom is from, like, one of what one of, like, I think 10. Right? I'm not I'm is it are you you got 10?

Speaker 3:

9 kids. 9 kids.

Speaker 1:

9 kids. And Tom's brother was one of the people that invented the optical mouse, or invented one of the optical mice. Excuse me. Invented one of the optical mice. Right.

Speaker 1:

But I had no idea that your wife worked on the 432. That's great. Yeah. And in the marketing department, man, no wonder she was at Sun. She must've been like, hey.

Speaker 1:

Are you are they hiring over there? Because, like, I gotta tell you,

Speaker 3:

marketing this thing is

Speaker 1:

not fun. Yeah. That is great. So do you have 432 manuals, or does she not allow those? Are those not permitted in your collection?

Speaker 3:

I do not have any manuals. I have

Speaker 1:

Yes.

Speaker 3:

Yes. Because they were, like, handouts, because they couldn't actually sell them, so they put them into Lucite.

Speaker 10:

I imagine that hurts the heat dissipation. Yeah.

Speaker 3:

But yeah. You have no problem problem power again.

Speaker 1:

That's amazing. Oh my gosh. Wow. I did. Okay.

Speaker 1:

Yeah. But, so I know there are a couple other people that were joining us. So chime in here, folks. Do you have any memories of SPARC?

Speaker 11:

Yeah. Yeah. Yes. I'm too young to even remember SPARC or even close to that. But I have a question, a follow-up on the why SPARC question.

Speaker 11:

So, my mentor who they used to smuggle books from the States to the

Speaker 3:

Soviet Union back in

Speaker 11:

UNIX manuals. And one of the discs that I found when he, you know, gave me everything that he had was a video of Bill Joy in the eighties. And I just tried to find that, and it has fewer than 1,000 views. I was trying to search on YouTube for the open, sorry, the open group imperative, but, apparently, the video is called the open system imperative. And the thing that I just remembered about it is that he was talking about how, because SPARC has a lot of registers, it would allow more performance for high-level programming languages, including machine learning in the future.

Speaker 11:

That's what he said when I watched the video years ago. I still don't understand that. Like, did they actually have this kind of a vision when they were building SPARC?

Speaker 1:

Tom, that's Tom, that one's going to you. That's I Wow.

Speaker 3:

I do not recall any any discussion of machine learning in the eighties. I remember, you know, AI was still talked about by all the Lisp people, but that was that never went anywhere.

Speaker 1:

But if we

Speaker 3:

Until the nineties until the nineties, then AI became American Idol.

Speaker 1:

If we're gonna praise, though, Bill Joy, we're gonna have to DC-balance him with the "Future Doesn't Need Us" article that he wrote in 1997 in Wired that resulted in basically me having to talk my mom's book club off the roof. This is where, yeah, he maintained that, like, the robots were gonna take over everything. And I beg and my mom is like, no. Bryan works for this company.

Speaker 1:

So, like, this guy is clearly right. I'm like, oh, mommy. He's not right. Sorry. It's like

Speaker 3:

Yeah. Wait. I don't know what kind of drugs he was on now.

Speaker 1:

Wait. I mean, definitely, Vischer. That and, Troy, that's amazing. Is that video online? Or is that did you find it available?

Speaker 11:

Yes. I I yes. I just found it. It was uploaded 5 years ago and it has, like, 500 views. And, I'll I'll share it on Twitter as well.

Speaker 11:

I'd I'd be very happy to hear stories about how they thought it would be and how it became. And I just found that that they also had an idea for universal binary that would that would work on all Unix machines.

Speaker 1:

Interesting. Now that's an idea that I mean, that's the JAR, isn't it?

Speaker 3:

You could They have a Java.

Speaker 1:

You could argue that that's the AS/400, though. The System/38 at IBM did that, and, I mean, they actually did an amazing architectural transformation to POWER without changing their binaries. I mean, this is the Future Systems work that turned into System/38 that turned into AS/400. I know that I'm, like, an AS/400 fan. But when at Oxide we were raising a round, we were asked for the best analog for Oxide. Like, what's a historical analog for Oxide?

Speaker 1:

And before my brain could really get a hold of my mouth, AS/400 was out there. Like, IBM AS/400. And the VC firm is like, what? Like, what the fuck hole am I gonna have to dig myself out of and explain? It's like, no.

Speaker 1:

You're not... oh, okay. No. I'm not trying to raise money for the IBM AS/400. It's not a very good idea.

Speaker 3:

I don't know, Bryan. They may have had a nice architecture, but you don't need much to run RPG.

Speaker 1:

That is fair. You know what I like about the AS/400? I just loved that it was deliberate hardware codesign. I know I have once again put myself in the position of needing to defend the AS/400. But I did love that aspect of it.

Speaker 1:

And so, Tom, to answer this question, like, how much were people thinking about what applications would look like in the future? Because that is kind of amazing, that Bill was thinking that far in advance, or was on esoteric drugs.

Speaker 3:

Yeah. Well well, Bill was out I mean, Bill is amazing. He's he's clearly the smartest person I've ever known. But, you know, you never know what time scale he's operating in, whether he's telling you to do something for tomorrow or for the next century.

Speaker 1:

That's funny. Right. You you don't actually, the it's like I had a, worked for a guy once who would claim that he was speaking in the optative voice, which is a Greek voice, in which you refer to something in the future as if it's the present. I'm like, okay. But to the casual listener, it just sounds like lying.

Speaker 1:

Right? I mean, can't we?

Speaker 3:

Right. Right.

Speaker 1:

But

Speaker 3:

But for SPARC, you know, talk about chip area, that very first chip, you know, there was some need to have multi-port access to the registers. And it was Bill who came up with a three-dimensional layout of the chip that had apparently never been done by anybody. And so you had multiple layers of wires to get out the registers. And, of course, that's normal these days, but back then it blew people's minds.

Speaker 10:

So that was the first non-planar chip?

Speaker 3:

I don't know if it's really the first, but it was certainly not standard practice for what we were doing.

Speaker 1:

And, Tom, when is this? This is, like, 88?

Speaker 3:

87. 87.

Speaker 1:

What was the the were you part of the Of course.

Speaker 3:

Well, yeah. Well, before that. '87 is when it shipped. So probably late '85, he was thinking these things.

Speaker 1:

When do you we'd how early in Sun's history were people talking about doing their own CPU? Asking for a friend.

Speaker 3:

Pretty early. Bill was from Berkeley. Berkeley was doing RISC. So it was clearly an interesting idea, and Motorola was not being very aggressive with clock speeds. But, no.

Speaker 3:

It was very ballsy to do it because, you know, jeez, it's not like we weren't doing everything else as well.

Speaker 2:

Yeah. Listen to that last part.

Speaker 1:

Which part? That would that we should do it? That's what I heard. I heard what Tom saying is if you're contemplating your own silicon, you should absolutely do it. That's is that not what with

Speaker 3:

that

Speaker 1:

that's that's the moral of the story, I think. Right? I think it, like oh, wait a minute. We're at Sparks Wake. Never mind.

Speaker 1:

Hold on. Wait. What? Wait.

Speaker 3:

Where am

Speaker 1:

I again?

Speaker 3:

Hey, Bryan. I know a guy who's putting together a team to do CPUs, so we should talk.

Speaker 1:

There you go. Well, actually, if you could please ask him to dedicate his intellectual capabilities to a secure microcontroller, we would really like that. Secure hardware, as it turns out, is really, really hard. And we talked about it last week, but the vulnerability that Laura found, and that

Speaker 3:

Yeah.

Speaker 1:

Rick did a lot of work on. I think it showed just how complicated it is to make things really secure as it turns out.

Speaker 3:

Well, I think I think we've all learned it's impossible to make things really secure. You you could do a lot better, but but, somebody's gonna find a bug.

Speaker 1:

Well, yeah. And I think that, I mean, I do feel that, you know, the focus of microprocessor development has really shifted, where it's not just performance; there are all these other aspects of the system. Now, I don't think SPARC really considered that all that much. I don't know. I don't recall any real conversations around it.

Speaker 1:

I don't know that we executed speculatively enough to be a problem. Like, that was part of the problem. Exactly. Part of the problem was that we were

Speaker 2:

I remember early days of them talking about scout threads, which was sort of early speculation.

Speaker 1:

It was. That's right. I forgot about scout threads. Yeah. This whole idea of having this, like, literal separate thread running ahead of you, which is, yeah, in hindsight would not have been a great idea.

Speaker 1:

Yeah. I mean, it kind of died on the vine. Yeah.

Speaker 2:

Fortunately, they never got it working. Yeah.

Speaker 1:

The other thing... so, Adam, did you find SPARC

Speaker 2:

bugs? I don't think I did find SPARC bugs, Bryan.

Speaker 1:

Because, and I'm not just actually asking that as a segue to my own SPARC bugs. But shortly after I came to the company, I was working on being able to tune the system clock up, and we had decided that you should tune the clock at either 100 hertz or 1000 hertz. So this is the old lbolt. Tom, you remember lbolt from back in the day?

Speaker 3:

Oh, yeah. Yeah. Yeah.

Speaker 1:

And that hadn't changed. And lbolt would wrap. After 248 days, lbolt would become negative, and system software would go haywire. So my first project at Sun was fixing that problem, making sure that lbolt could wrap correctly, and then making hz configurable. And, again, it had already been said that it's 100 or 1000.
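
[Editor's sketch] The 248-day figure is consistent with a tick counter (lbolt in Solaris) held in a signed 32-bit integer and incremented once per clock interrupt at 100 hertz. That width is an assumption here; the speaker does not state it. The function names below are illustrative and are not Solaris kernel interfaces, and the wrap-safe subtraction at the end is just one common technique, not necessarily the fix that was made.

```python
# Assumption: the tick counter is a signed 32-bit integer bumped once per
# clock interrupt. All names here are hypothetical, for illustration only.

SECONDS_PER_DAY = 86_400
INT32_SIGN_BIT = 2**31


def days_until_negative(hz: int) -> float:
    """Days of uptime before a signed 32-bit tick counter goes negative."""
    return INT32_SIGN_BIT / hz / SECONDS_PER_DAY


def as_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= INT32_SIGN_BIT else value


def ticks_elapsed(start: int, now: int) -> int:
    """Wrap-safe elapsed ticks: subtract modulo 2**32 instead of comparing."""
    return (now - start) & 0xFFFFFFFF


if __name__ == "__main__":
    for hz in (100, 1000):
        print(f"{hz} hertz: {days_until_negative(hz):.2f} days until negative")
```

Under that assumption, 100 hertz gives roughly 248.5 days and 1000 hertz roughly 24.9 days, which is why raising the clock rate makes the wrap more urgent.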

Speaker 1:

Like, why not higher? And they're like, well, okay. So I was like, I'm gonna actually configure hz so high that the machine no longer boots, and this is, like, very satisfying. Make hz so high that the machine is only executing the clock interrupt. And I had a little sun4c that I cranked up to 26,000 hertz.

Speaker 1:

And at 26,000 hertz, it stopped at the banner message. And I'm like, alright. Like, that's done. And, that's very satisfying. And I came in the next morning, and it was at the login prompt.

Speaker 1:

And this poor little machine had been executing, like, God only knows how few instructions at a time, but it managed to boot. And I'm like, oh, this is great. And I hit the enter key and it immediately panicked. And it panicked because, as it turns out, SPARC had a problem that was in all variants of SPARC, all sun4m variants, because it was in the RTL that got cut and pasted, whereby you could take an... Tom, did you ever have to deal with, like, writing to the PSR register? When you wrote the PIL to the PSR register, the architecture manual, and, Adam, if you've got it in front of you, you should look at the language they use because it's so goofy.

Speaker 1:

You have to have several nops after the write to the PSR to, quote, unquote, quiesce the PSR. And you're like, what? Like, that what stinks?

Speaker 3:

Oh, I know. And you were getting clock interrupts and

Speaker 1:

You were getting it. It was supposed to be the case that all architectural state was affected on the write to the PSR, and the nops were merely to do question mark, question mark, question mark and quiesce the PSR. But as it turns out, you could take an interrupt. Which everyone told me, it's like, no. No.

Speaker 1:

That's not it. Like, you just showed up. Like, that's not a bug. I'm like, oh, I kinda think it is. Like, I got this thing kinda dead to rights.

Speaker 1:

And I remember Mike explained reproducing it it on the simulator was very, very gratifying. I'm like, so do I get, like, an? They're like, no. No. You don't get anything.

Speaker 1:

Sorry. It's like, can I get, like, a t-shirt or something? No. Okay. Never mind. I just get the satisfaction of having found a bug in SPARC.

Speaker 1:

Apparently, bugs in SPARC were not so, like, hard to find in those days. Those were not, like, precious. But I had to get that one out there as long as we're talking about SPARC bugs. And I guess some of the folks had asked to speak, and I want to make sure, if there are any other SPARC memories or questions, that we get through everybody. I know we wanted to be mindful of time, but also wanna give SPARC a proper send-off.

Speaker 4:

The prom really was better than everything else that still exists today. The

Speaker 7:

But that prom had its origins in the 68000, though. Right? I remember having some Sun 3/50s, and you had something similar. It wasn't quite as cool as the Sun-4 proms, but it was still pretty nifty.

Speaker 3:

Yeah. OpenBoot started with the Sun-2 or 3. Mitch Bradley is the man. He was all hung up on Forth. So

Speaker 1:

And I am trying to, Rahul, I'm trying to approve you as a speaker. I just did. Okay. Never mind. Twitter Spaces? Come on.

Speaker 1:

Don't be weird. Please work.

Speaker 9:

Tom, was that why Forth ended up as part of the prom? Well, the prom was a Forth interpreter.

Speaker 3:

Yeah. Yeah. I don't know what was the compelling reason to change. It was probably looking at CPU transitions, because the original prom was all 68000 assembler.

Speaker 9:

I just have one. I don't have a SPARC story, but I have a Forth story, so I'll save it for later.

Speaker 1:

And I think now is the time. Hit us up with a Forth story.

Speaker 9:

Somebody was still in there. Yeah.

Speaker 5:

So can you hear me?

Speaker 1:

Yes.

Speaker 5:

Yeah. So, hi, this is Jose. I used to work as a field engineer, just at the end of Sun Microsystems. So this was back in 2009 when Oracle was just about to take over. And I got a task to... I think it was a SPARCstation 5 or 10, something like that.

Speaker 5:

And I got a task to replace an EEPROM battery, I think. And this was actually a system service processor for an E10K. And

Speaker 3:

it was

Speaker 5:

it was a really, really funny amount of money that the company had to pay to have the E10K on extended life support back in 2009.

Speaker 1:

You had an E10K that was still operating in 2009. Wow.

Speaker 5:

Yeah. This was this was in 4 months.

Speaker 4:

For some context, the service processor you're talking about is itself what? A SPARCstation 10 or something, or an Ultra 1?

Speaker 1:

It is. Yeah. It's a spark nose. It's a start up and SS 10.

Speaker 4:

The little the little pilot light machine that starts the

Speaker 1:

big machine.

Speaker 3:

It It's the BMC. It is

Speaker 1:

the BMC, and that thing was a mess. It was not good. It was it's especially easy if you ended up having to support it. You would know all about how, all about where the the hair was on that one.

Speaker 3:

Well, speaking of long running machines though, someone on Twitter a few years ago had a Solaris box that had been up for 18 years uptime.

Speaker 1:

Wow. That's running in a version of the operating system that's got a lot of bugs in it. So I'm glad you're paying and then it should survive.

Speaker 3:

Tim I probably I probably forgot to use it.

Speaker 1:

That's right. Tim, I think you're trying to get in here with a Forth story.

Speaker 12:

Oh, yeah. Well, this is a little off topic maybe, but, I have a story that I I remember reading, about Sun Microsystems that, somebody, broke the Java Virtual Machine or the sandbox. And, and I always found it very creative the way they did it. They took a computer and they put it in an oven and heated up the oven until the bits started dropping out, and then they were able to break into the sandbox.

Speaker 1:

Wow. Yeah.

Speaker 12:

I always felt that was really creative.

Speaker 1:

Yeah. Well, if someone starts to stick your computer in an oven, you may wanna prevent them from doing that.

Speaker 3:

Usually, you could just do that by taking the fans off. Right?

Speaker 1:

That's right. Yeah. Well, that was a precursor to a lot of, like, the voltage glitching attacks, right, that we see today. They're all kind of the heirs to those kinds of attacks. Physical attacks. Invasive attacks, as we say.

Speaker 3:

Or you could just yell at your hard drive.

Speaker 1:

Well, that's right. You know, it's amazing how many... Sun did not do a very good job explaining that if you did not protect the boot PROM, anyone could basically walk up to the machine and write to arbitrary memory. Because Adam and I share an alma mater, and Adam, when you were at school, were those password protected when you were there?

Speaker 2:

Almost all of them. Although I found a few where I could get into the PROM and then convince it that my UID was 0.

Speaker 1:

I believe that those few that you found were part of my agreement with the staff, whereby I was going to share with them a problem, and they were going to preserve on my lab machines, I was gonna have the right to go to the PROM. So I think this might have survived me. It was very convenient. The PROM was so nice. So, Tom, what's the history of that?

Speaker 1:

Did you have that at all on the 68k?

Speaker 3:

I'm not sure when it came in. It probably came in with the Sun-3, because that was a fairly major change from the Sun-2, and it was driven by Mitch Bradley, who was very early. He also was a hardware guy. He designed the first SCSI board.

Speaker 1:

Oh, wow. Like the first one.

Speaker 3:

And he wrote the first Intel Ethernet driver and, Intel Ethernet was just a mess back then.

Speaker 6:

Was that LE?

Speaker 3:

Yeah. It was a little-endian chip. No.

Speaker 6:

No. No. No. I meant the driver. Was that the LANCE Ethernet?

Speaker 6:

The LE. No.

Speaker 3:

No. That's the LANCE.

Speaker 1:

LANCE Ethernet.

Speaker 3:

The LANCE was a nice chip. Oh, sorry. But the 82586 was the Intel one. It sucked. I have a paper in an old USENIX called "All the Chips That Fit" that discussed some of the problems.

Speaker 3:

Wow.

Speaker 1:

Hey. So "All the Chips That Fit". I wanna go find that paper. Josh, I'm gonna mute your typing, if you don't mind. The

Speaker 7:

Clem Cole thinks that's, like, the greatest paper ever, by

Speaker 1:

the way.

Speaker 3:

Yeah. Thanks.

Speaker 1:

That and so It

Speaker 3:

was fun.

Speaker 1:

And so, Happy Meal Ethernet, that was HME. Right? Was that Happy Meal Ethernet? Is that story apocryphal, Tom? Or is that true?

Speaker 3:

After my time.

Speaker 1:

After your time. That was

Speaker 3:

SBus something or another. Right?

Speaker 1:

Do you know the origin of the key sequence? The Stop-A key sequence, L1-A?

Speaker 3:

That, I'm pretty sure John Gilmore came up with that. Alright. Or maybe even

Speaker 2:

It's worth explaining what that is.

Speaker 1:

Yeah. So if you hit, on the beautiful, beloved Sun keyboard... I probably need to just take a moment. Like, is there an emoji for, like, wiping a tear away from my eye? The Sun keyboard, and, Adam, you still have a Type 5 or do you have a Type 4?

Speaker 1:

What do you got? You've got

Speaker 2:

I have two Type 5s, but with the UNIX layout.

Speaker 1:

Right. I love the qualifier that, like, you know.

Speaker 2:

Gotta be

Speaker 1:

Gotta be the UNIX layout. The Control's in the right place. So the old-school Sun keyboards had the function keys down the left. There were 2 columns, 5 keys, maybe 6 keys. I don't know, Tom.

Speaker 1:

How many keys high was it? And the key in the upper left was the Stop key. What was Stop even for? What did it even stop?

Speaker 3:

On the first keyboard, the Sun-1, it was just labeled L1.

Speaker 9:

I was going to say for getting out of Vim.

Speaker 3:

Right. So the escape sequence was L1-A, and then on later keyboards, they changed the label or something. But it was always, I think it's because that key was easiest for the keyboard scanner to find. There's no

Speaker 2:

That's awesome.

Speaker 1:

And it was L1 before it was Stop. I always said L1-A. I mean, I was... Right. Right. I'd use L1-A as a verb, of course.

Speaker 1:

But I kind of assumed that it was L1 after it was Stop, but it was L1 before it was Stop.

Speaker 3:

Yeah. Yeah.

Speaker 1:

Wow. And Back

Speaker 3:

when the keyboard had a parallel interface.

Speaker 1:

And, Tom, you realize that if you have any of those keyboards, those are worth, I mean, they're worth more than Bitcoin. Those keyboards are worth a fortune now.

Speaker 3:

Certainly. To the to the people

Speaker 2:

that can't talk without one, to to that to

Speaker 4:

those people, it's especially for, like,

Speaker 7:

Those people are welcome to come to my house and retrieve them from my basement, please.

Speaker 3:

You know, we were talking earlier about what application.

Speaker 1:

Dan, is that a "come and take it" for Sun keyboards? I can't tell if that's, are you inciting to violence, or is it like a

Speaker 7:

you can pry my Sun

Speaker 1:

keyboard from my cold dead hands.

Speaker 7:

No. That means I have way too much stuff in my basement. That's, like, a literal, like, no.

Speaker 1:

Come and take it. Come and take it.

Speaker 7:

That's a literal, please, come and take it.

Speaker 1:

Please come and take it away.

Speaker 3:

But earlier, somebody was talking about, you know, what kind of applications did we envision, all this stuff. And the real secret of Sun's success is that we built them to make ourselves happy. It was for the software engineers to use, and other engineers. And that was the driving vision.

Speaker 1:

The driving demographic were the actual, like, engineers themselves.

Speaker 3:

Yep. And that's the roots of UNIX too. Right? Ken and Dennis did it to please themselves, not for anyone else.

Speaker 1:

I mean, I don't know, Adam. I can't speak for you, but I definitely did DTrace for me. So I feel like we, and Josh, definitely, you should go look at Josh's TTY-based software, clearly done for himself. TTY-based presentation software. Yeah.

Speaker 1:

I know. The best software we do is for ourselves, really, sadly. I mean, not that we don't care about other people, but that's the stuff that really I think that's, like, it's the stuff we've got the best intuition about, isn't it?

Speaker 2:

Absolutely. And born of the pain of those problems, right? Because I know that you had been thinking about DTrace for a long time, but it was also born of some of these, I think on the E10K, some of these awful performance

Speaker 1:

I just have the hope Steve Jobs is the same way. Jobs has developed it for himself. Right? It's just that he, and

Speaker 3:

Well, open open source software in general, you know. You develop it for yourself. In that way, there's at least one person who likes it.

Speaker 1:

That's right. Well, it does actually change the scope of your ambition. I mean, not that one isn't ambitious, but you're like, you know, at least I have developed something that I myself, or we ourselves, like in the world. And there's something really to be said for that.

Speaker 3:

And you're not gonna make it take 3 years to do it either. You want something sooner.

Speaker 1:

That's right. Right. Exactly. You will act as your own accelerant. Yes.

Speaker 1:

For sure.

Speaker 3:

Yep. Yep.

Speaker 1:

Well, I I think that's

Speaker 10:

Are we sure that personal software is better software and not

Speaker 1:

just more extreme software? Like, if you ever run a fuzzer on a fuzzer, it crashes immediately. That's funny. You know, I have not run a fuzzer on a fuzzer. That's pretty funny.

Speaker 1:

Like, I wonder if it's just that these are the extreme pieces of software. And when it goes really well, scratching our own itch makes really great software. And when it goes really badly, we don't tell people

Speaker 3:

about it.

Speaker 1:

We are telling you that's true. It could be this is the plane. Right? This is

Speaker 3:

the survivor bias. Right? Exactly.

Speaker 1:

Yep. Yep. Yep. Well, I think that might be a good note on which to leave it. This has been a lot of fun.

Speaker 1:

Adam, you got both of these recorded? Yeah.

Speaker 2:

I do. And we'll see how it came out, and we'll be posting it as soon as possible.

Speaker 1:

Thank you so much, Tom. Thank you especially to you, but thank you to everyone for joining us. This has been a lot of fun. If you've got any thoughts on how we can improve this, definitely let us know. But I definitely enjoyed it.

Speaker 1:

We'll be doing it again next week, I think. Very cool. See you next

Speaker 3:

time. Yeah. Thanks, Tom. Thanks, everyone. See you next time.

Speaker 1:

Thanks, everybody. Thank you, Tom.

Speaker 6:

It's great seeing those early SPARC stories.