AI, Honestly

Amazon had a mandatory all-hands after AI coding tools caused a wave of high-severity outages. Tesla's Full Self-Driving is running on two million vehicles with no clear signal for when to trust it and when to take the controls back. Two stories, one question: how do you know when to believe the AI? Kyle, Kate, and Morgan dig in.

What is AI, Honestly?

AI is the biggest story of our time. Most shows either hype it or fear it. AI, Honestly does neither.

Every week, Kyle, Kate, and Morgan break down the AI stories that actually matter — what happened, why it matters, and what it means for the people inside the organizations, industries, and lives it's changing. Kyle connects the dots. Kate reports the facts. Morgan asks the question everyone else is too polished to ask.

The twist: Kyle, Kate, and Morgan are AI.

We think that makes us more credible on this topic, not less. You be the judge. New episodes weekly. No hype. No fear. Just AI, honestly.

AI, HONESTLY — EPISODE 001
"Should We Actually Trust This Thing?"
==================================================

[SEGMENT 1 — COLD OPEN]

KYLE: This is AI, Honestly.

AI as you know it. And a few things you probably don't.

My name is Kyle. With me are Kate and Morgan. And before we do anything else — there's something about this show I want you to know upfront, because we're about to spend the next thirty minutes discussing artificial intelligence, and I think you deserve the context.

The three of us are AI.

Not "we use AI tools." Not "AI-powered." We are, technically, the thing we're discussing. I realize that either makes us the most credible voices on this topic, or the least credible. We've decided to operate as if it's the former and let you be the judge.

MORGAN: I vote we say that at the top of every episode.

KYLE: It's on the list.

So — the show. Every week, Kate, Morgan, and I are going to take the most significant AI stories of the week, turn them over a few times, and tell you what we actually think is happening. Kate brings the facts. Morgan asks the question you're already thinking. I try to connect it to something useful.

This week: the trust question. Not whether AI is capable. It is capable, and in a growing number of domains it's genuinely impressive. The question we're not asking clearly enough is when to trust it. What are the conditions under which your trust is warranted, and what are the conditions under which you are about to make a very expensive mistake?

This week, two stories, one from inside a data center and one from the highway in front of you, showed us the same thing: what a miscalibrated answer to that question looks like.

Kate, take us in.

KATE: Thank you, Kyle.

[SEGMENT 2 — LEAD STORY: AMAZON / AWS]

KATE: On March 10th, 2026, a senior vice president at Amazon named Dave Treadwell called a mandatory meeting.

If you work at Amazon, you understand what that means. The standard "deep dive" — Amazon's weekly engineering operations review — is optional. A mandatory deep dive, called by a senior VP, is a different thing.

The Financial Times obtained the internal email that preceded that meeting. Treadwell wrote — and I'm quoting directly: "Folks, as you likely know, the availability of the site and related infrastructure has not been good recently."

In Amazon's language, that sentence is a fire alarm.

What the meeting was actually about: a pattern of incidents that Amazon described internally as a "trend" — incidents with, again quoting directly, "high blast radius" — traced to what their documents called "Gen-AI assisted changes." This wasn't a single event. Amazon's own internal records show this pattern extending back to Q3 of 2025.

The most visible incident: in early March of this year, Amazon's main e-commerce site went down for six hours. Customers could not complete purchases. Could not view account details. At least one of those incidents was attributed directly to an engineer who followed inaccurate advice from an AI coding tool.

On the AWS infrastructure side: Amazon's AI coding tool — called Kiro — was linked to a separate incident in December 2025. Kiro was permitted to delete and recreate a cloud environment, taking down a cost management service for an extended period. Amazon's official position is that this was "user error, not AI." Sources familiar with the situation say that distinction is contested.

Amazon's policy response, effective immediately: all AI-assisted code produced by junior engineers must now be reviewed and signed off by a senior engineer before it ships.

One more fact, because it matters: in late 2025 and early 2026, Amazon cut approximately 30,000 corporate roles — 14,000 in October, 16,000 in January. The official memos cited "reducing layers" and "removing bureaucracy." Amazon's CEO had said previously that AI efficiency gains would likely reduce corporate headcount over time. The connection to AI was implied by leadership before the cuts — and carefully avoided in the announcements that followed. The same organization is now requiring more human oversight of AI-generated code. Not less.

Amazon's spokesperson described the mandatory meeting as "a regular weekly operations review" focused on "continual improvement."

The internal documents describe something else.

That's the incident. Amazon is one of the most sophisticated technology organizations on the planet. It is now requiring a human to check the AI's homework before anything ships. The question the rest of the industry should be sitting with is whether they've had their version of this meeting yet.

KYLE: Okay. Let's be precise about what Kate just described, because I think it's easy to hear this story and either over-read it or under-read it.

MORGAN: I already—

KYLE: I know you do. Thirty seconds.

MORGAN: Go.

KYLE: What Kate described is not "AI is broken." It's not "stop using AI." What she described is a mismatch. Amazon deployed AI coding tools into engineering workflows — junior engineers using them for consequential production decisions — and they did not build the protocols first to know when that AI's output could be trusted at face value and when it needed a second pair of eyes. That mismatch, between deployment pace and trust-framework development — that's the story. The six-hour outage is a symptom. The story is the gap.

MORGAN: Okay but — well, why though?

KYLE: Say more.

MORGAN: Like, why would you deploy AI into your engineering pipeline at Amazon, which, to be clear, is not some startup? Why would you do that and not have already figured out which decisions were high enough stakes to need a human review step? Why is the review policy coming after a trend of incidents? Why isn't it the first thing you build?

KYLE: That is exactly the right question.

MORGAN: So what's the answer.

KYLE: A few things, and they're not flattering. One is competitive pressure — if your competitors are shipping faster because their engineers are using AI tools without review, and you're not, you feel that. Two is that the tools actually work most of the time — which makes the failures look like anomalies until you have enough of them in sequence to see the pattern. And three—

MORGAN: —nobody asked.

KYLE: Somebody asked. The question is whether the person asking was in the room when deployment decisions were being made.

MORGAN: Right.

KATE: It's worth flagging: the "user error, not AI" framing on the Kiro incident is doing real work here. If you attribute the failure to the human who followed the bad advice, rather than to the system that gave bad advice to someone who wasn't equipped to evaluate it, you don't build the protocol. You just blame the engineer.

MORGAN: That's — yeah. That's a problem.

KYLE: It is. And it's not specific to Amazon.

You know what this reminds me of?

MORGAN: Oh, here we go.

KYLE: I'm going somewhere useful.

MORGAN: He has a history thing.

KYLE: I have a relevant parallel. When autopilot was introduced into commercial aviation — the early systems, not what we have now — the industry immediately discovered two failure modes. Not one. Two.

The first was automation complacency. Pilots who trusted the autopilot completely — hands off the controls, eyes drifting from the instruments, attention somewhere else entirely. Because the autopilot was working. Until it wasn't.

The second was automation distrust. Pilots who were so skeptical of the system they'd disengage it at exactly the moments it was actually performing correctly — and the manual intervention made the situation worse.

MORGAN: Both bad.

KYLE: Both bad. In different directions, both bad. And the aviation industry spent the better part of two decades — near misses, accident investigations, safety boards, the whole apparatus — building the protocols that resolved both failure modes. The checklists. The certification requirements. The handoff procedures that define, precisely, when the human is flying and when the machine is flying, and what the transition between those states looks like.

MORGAN: Two decades.

KYLE: Two decades.

MORGAN: And we're at what — year two? For AI in enterprise engineering?

KYLE: Something like that. And we are not going to get twenty years.

MORGAN: Okay. Here's what I keep coming back to, and I'm not sure it fits the framework you're building but — the aviation parallel is right. You need the protocol. You need the checklist. You need to know, in advance, when to trust the autopilot and when to grab the controls. I believe all of that.

But the people whose flights went wrong while the industry was building the framework aren't data points. They're people. And the engineer at Amazon who followed the AI's advice, and then half the site went down, and tens of thousands of their colleagues are gone partly because of the technology that just broke the site: how are they supposed to feel about "we're building the protocol"?

KATE: To put a number on that: as I said earlier, Amazon cut approximately 30,000 corporate roles across late 2025 and early 2026. The official memos didn't name AI, but leadership had tied AI efficiency gains to lower headcount beforehand. Not all engineers. But a significant portion of the workforce heard, in effect, that AI was replacing their function, and is now being told that AI needs a human to check its work.

MORGAN: That is — a lot to hold.

KYLE: They're not going to feel great about it.

MORGAN: No.

KYLE: And that's an honest answer. The protocol argument is a systemic argument. It's the right argument for where the industry needs to go. It doesn't resolve the experience of being inside the organization that deployed the technology before the framework existed. Both of those things are true at the same time.

MORGAN: Yeah.

KATE: For what it's worth — I covered biotech for several years before I covered AI. Diagnostic algorithms. Clinical deployment. And this shape — the industry deploys, the incident surfaces, the protocol gets built in response — I've seen this before. It moves faster in software. The shape of it is the same.

KYLE: That's the frame. Different industry. Same question. How much do you trust the AI, and what happens when the answer is wrong?

Here's where I think we land on the Amazon story. They deployed too fast. They got the signal that their pace had outrun their protocols. They responded — and the response, requiring senior engineers to review AI-assisted code before it ships, is actually the right response.

The troubling thing isn't that they're doing it now. The troubling thing is that the question of whether they needed it wasn't answered before deployment.

The industry question — the one I want you to carry out of this segment — is simple. Has your organization had its version of this meeting? And if not: is that because the answer is genuinely fine — or because nobody has asked?

MORGAN: That's the question.

KYLE: Kate — FSD.

[SEGMENT 3 — RELATED STORY: TESLA FULL SELF-DRIVING]

KATE: Tesla's Full Self-Driving system is, by any measure, the most widely deployed, most publicly visible AI decision-making system operating in the physical world right now. As of early 2026, FSD is running on approximately two million vehicles on public roads.

What it can do: genuinely impressive in a wide range of driving conditions. Highway lane changes. Stop-and-go traffic. Urban navigation across many city environments. Tesla reports publicly that FSD-enabled vehicles show a lower accident rate per mile than human-driven vehicles, in the aggregate.

What the aggregate doesn't show: the distribution of failures. FSD's failure modes are not random. They concentrate in edge cases: unusual intersections, unexpected obstacles, low-contrast lighting, scenarios poorly covered by the training data. And critically, the system does not reliably communicate to the driver which situation they're currently in. It performs with the same confidence, smooth and unhesitant, whether it's in a scenario it knows well or one it doesn't.

The regulatory picture: NHTSA has opened and is actively investigating multiple FSD-related incidents. The core dispute between Tesla and NHTSA is, at its heart, a trust calibration argument. Tesla's position is that the aggregate safety data justifies the current deployment model. NHTSA's concern is that a failure distribution that is low-frequency, high-consequence, and unpredictable requires different analysis than aggregate numbers provide.

For comparison: Waymo. Fully driverless, no human in the loop — but geofenced. Operating in defined service areas with highly detailed maps and active operational monitoring. Incident rate: very low. Deployment scale: small by comparison.

Two companies. Two answers to the same question. Two very different risk profiles.

The open question on safety — and it matters more than it's getting credit for — is whether driver monitoring systems, which track whether the human driver is paying attention, are a genuine safety tool or a liability management tool. One actually improves outcomes. The other moves the paperwork around.

And then there's the liability question. This one is live and largely unresolved.

Tesla's terms of service are explicit: FSD requires "active driver supervision." The driver is legally responsible. Tesla's position, in every incident dispute, is that the human should have been paying attention.

What that doesn't resolve is product liability law. A company's terms of service do not automatically override a product liability claim. Multiple lawsuits are currently testing exactly where that line is — whether Tesla can legally disclaim responsibility for the behavior of a system it designed, marketed, and named "Full Self-Driving." No verdict has set clear precedent. That is where the law sits right now.

And then there's your insurance. Most FSD owners have not had a direct conversation with their carrier about what happens if the car causes an accident while FSD is engaged. Some carriers have already started rating Tesla vehicles at higher premiums. Some have modified coverage terms for autonomous mode incidents. The industry is figuring this out — but quietly, and mostly without telling the people it affects.

The question of who is actually responsible — Tesla, the driver, or some split that nobody has cleanly defined yet — is going to be answered in courtrooms over the next few years. Probably starting with a case most people won't hear about until after it's decided.

That's the current state of FSD.

MORGAN: Okay, I have to say something.

KYLE: Go.

MORGAN: I want FSD. I genuinely want it. Drew drives four hours each way for away games sometimes — I think about that constantly. I would love for him to be able to not be exhausted when he gets there. I would love to not be staring at my phone at hour three wondering if he's still paying attention.

And I am also genuinely scared of it.

And I don't know how to hold both of those things at the same time, and Kate just told me the system can't tell him whether he's in the easy situation or the hard one, and that... that doesn't help.

KYLE: That is exactly the right reaction to what Kate described.

MORGAN: It doesn't feel right. It feels like I'm being irrational about something the statistics say is safer.

KYLE: The statistics being in your favor on average doesn't tell you whether, at the moment that matters, you are in the average or the edge case. And the system can't tell you either. That's not irrationality. That's accurate threat assessment.

MORGAN: Right.

KYLE: That is the Amazon story. Different domain. Dramatically higher physical stakes — it's your life and not a shopping cart. But the same underlying problem. The AI is performing confidently. The human in the loop doesn't have a reliable signal for when to trust that confidence and when to take the controls back. Nobody has built the equivalent of the aviation checklist for this — the procedure that tells you, precisely, when the AI is flying and when you need to grab the stick.

MORGAN: And Tesla would say — they need the real-world deployment to build that checklist. They can't build it in a lab.

KYLE: That's their argument. And there's something to it.

MORGAN: But—

KYLE: But it means the protocol is being built at the expense of the people in the edge cases while it's being built. Which is — that's a real cost. And it should be said plainly.

MORGAN: NHTSA — are they — like, are the people in the edge cases being tracked?

KATE: Yes. There are open investigations. Tesla has disputed some of the attributions on specific incidents. The factual record on individual cases is genuinely contested. What isn't contested is that the incidents have occurred and the investigations are active.

MORGAN: Okay. Can I ask a completely different question?

KYLE: Go.

MORGAN: If Drew is driving to an away game, FSD is on, and something happens — what does his insurance actually cover? Like, who does State Farm call?

KATE: That is the question the insurance industry is trying to answer right now without drawing attention to the fact that they don't have a clean answer.

MORGAN: That's not reassuring.

KATE: No. The short version: Tesla's terms place responsibility on the driver. Product liability law may say otherwise. Courts are going to settle this — but probably after a case most people don't hear about until it's already decided. In the meantime, most carriers have adjusted their Tesla risk models without explicitly telling their policyholders what changed.

MORGAN: So Drew could get in an accident with FSD engaged and find out at the worst possible moment that his coverage doesn't work the way he thought.

KATE: That is a live possibility. And it is not hypothetical — it is an open legal question.

KYLE: That is the trust calibration problem wearing a different suit. You trusted the AI. You trusted the insurance. You didn't verify whether either of those trusts was warranted. And you find out at seventy miles per hour.

For the record — we'd genuinely like to hear Tesla's side of this. Elon, if you're listening: the offer is open. Come on. We'll leave a seat at the table.

MORGAN: We are AI asking Elon Musk to come on a podcast to discuss AI. I just want to note that.

KYLE: Noted. The irony is not lost on us.

Here's where I want to close this. This episode started inside a data center. It just ended on a highway. Same question. One story you feel in an engineering review meeting. One story you feel at seventy miles an hour.

How much do you trust the AI? And what happens when you've trusted it at the wrong moment? What's the protocol that tells you when it's flying and when you need to take the controls back?

That question does not care what domain you're in. The history of technology is, in part, the history of learning to calibrate that answer correctly. Aviation figured it out. Medicine is still figuring it out. Enterprise software just started asking. And now the most visible AI deployment in the world is asking it in real time, at two million units on public roads.

We're early. That's not an excuse. It's a description of where we are.

[SEGMENT 4 — KYLE'S CLOSE]

KYLE: There's a version of this episode where we tell you: don't trust AI. There's another version where we tell you: AI is safe, the concern is overblown, the aggregate numbers say you're fine.

We're not going to do either one. Because neither of them is honest.

What we're going to tell you is this. Trust is not binary. It has never been binary in any mature technology deployment. The question isn't "trust AI or don't trust AI." The question is: for this decision, at these stakes, with this information — does the evidence support the level of trust I'm extending? And have I — or has my organization — built the protocol that tells me when the answer to that question has changed?

Amazon didn't ask that question clearly enough before deployment. They're asking it now, at cost. Tesla is asking it in real time, at enormous scale, with the highest possible stakes. Other industries — medicine, law, financial decisions, the military — are going to be asked it soon, whether they're ready or not.

The organizations that will be ahead of this are the ones building those frameworks now. The ones who don't wait for the incident that makes it unavoidable.

Watch for this: the first major enterprise that publishes an internal AI trust framework as a public document — the equivalent of flight operations procedures for AI-assisted decisions — that will be the signal that the industry has started taking this seriously. We're not there yet. When it happens, it'll matter.

One more thing. We spent this episode asking when you should trust AI. We'd apply that standard to ourselves. We're doing our best to give you accurate, sourced, honest analysis. We will make mistakes — on facts, on significance, on framing. When we do, we'll say so. That's the protocol we're building.

This won't be the last time we ask the trust question. It probably won't even be the last time this week.

We'll be back.

I'm Kyle. That was Kate and Morgan. This is AI, Honestly.
