The AI Briefing is your 5-minute daily intelligence report on AI in the workplace. Designed for busy corporate leaders, we distill the latest news, emerging agentic tools, and strategic insights into a quick, actionable briefing. No fluff, no jargon overload—just the AI knowledge you need to lead confidently in an automated world.
Today we're going to have a quick discussion
about uptime and how businesses are
leveraging Claude, Codex, you
name it, in their organisations because the other day,
Saturday, Sunday, whatever day it was, over the
weekend, there was quite a severe outage at Anthropic.
Not for the first time, not for the
last time, I'm sure.
Now, the comparison I have in mind is, of
course, AWS or Azure. You know, occasionally cloud
services drop out and organisations grind to
a halt because they depend so deeply on those
platforms to deliver, either internally or
externally, the software or
the information they're providing.
But when Anthropic drops offline, which happens more
often than I think Anthropic would like to
admit, what actually happens? I asked the question the other
day as well, which is, you know,
is the outage because of a spike in compute demand, or
is it because their SREs really aren't that
good and something goes wrong on the other end?
I am curious because obviously if Netflix went offline,
their bottom line would drop out because no
one would use it and they don't move somewhere else. So, you know,
with these organisations depending on Anthropic,
when it drops offline, what happens to the
mission-critical systems running on Anthropic or,
you know, on ChatGPT or whatever?
From a real application integration standpoint,
when things go offline at Anthropic, how
many people actually notice, or do they use different services?
Like, do you use a configurable back end
that allows you to flip seamlessly between
running a model in Anthropic and Bedrock or
Foundry or whatever?
Like, how do you ensure the uptime and
stability of your business-critical application if it
depends on an LLM for its execution today?
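To make that "configurable back end" idea a bit more concrete, here is a minimal sketch of priority-order failover in Python. It assumes each provider can be wrapped in a plain function that takes a prompt and returns text. The names `complete_with_failover`, `call_primary` and `call_secondary` are hypothetical stand-ins, not any vendor's real SDK calls.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed for a request."""


# A provider is a name plus a function that takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in priority order; return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code would catch the SDK's specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real SDK calls (assumptions for illustration only).
def call_primary(prompt: str) -> str:
    raise ConnectionError("primary provider is unavailable")


def call_secondary(prompt: str) -> str:
    return f"[secondary] answer to: {prompt}"


if __name__ == "__main__":
    name, text = complete_with_failover(
        "Summarise today's incidents",
        [("primary", call_primary), ("secondary", call_secondary)],
    )
    print(name, text)
```

The application only ever calls `complete_with_failover`, so the order of providers becomes configuration rather than code. The catch is that different models can answer the same prompt differently, so the fallback path needs testing too.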
Obviously, you've got services like Cursor, which,
you know, I believe leverage both their own
models plus some from Anthropic and elsewhere.
As people build on top of these frontier
models, how do you build it to
make sure that your stuff doesn't go offline?
There are obviously ways.
There are many ways to be able to
like, you know, load balance or flip between
different models if you need to.
But the fact is that it's a
bit like doing multi-cloud. Like, you wouldn't
necessarily do multi-cloud unless, of course, you
really had a business reason to do it.
Now, if your business depends very heavily on
an LLM to be able to provide insight,
do you just suck it up when it goes offline?
Do these organisations just suck it up?
Or do they have, you know, different ways
of being able to load balance across the
available services, while still providing the same outcome
to the business?
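One way to picture load balancing across the available services is a round-robin that takes a provider out of rotation for a while after it fails. The sketch below is only an illustration under that assumption. `LLMBalancer`, `cooldown_s` and the stand-in provider functions are hypothetical names, and a production setup would more likely use real health checks and the vendors' own SDK error types.

```python
import itertools
import time
from typing import Callable


class LLMBalancer:
    """Round-robin across providers, skipping any that failed recently."""

    def __init__(self, providers: dict[str, Callable[[str], str]], cooldown_s: float = 30.0):
        self.providers = providers
        self.cooldown_s = cooldown_s
        # Provider name -> monotonic time after which it may be retried.
        self._down_until: dict[str, float] = {}
        # Endless round-robin over the provider names.
        self._order = itertools.cycle(list(providers))

    def _is_healthy(self, name: str) -> bool:
        return time.monotonic() >= self._down_until.get(name, float("-inf"))

    def complete(self, prompt: str) -> tuple[str, str]:
        # At most one full pass over the ring, so the call cannot loop forever.
        for _ in range(len(self.providers)):
            name = next(self._order)
            if not self._is_healthy(name):
                continue
            try:
                return name, self.providers[name](prompt)
            except Exception:
                # Take the provider out of rotation for the cooldown period.
                self._down_until[name] = time.monotonic() + self.cooldown_s
        raise RuntimeError("no healthy LLM provider available")


if __name__ == "__main__":
    # Hypothetical stand-ins for two real provider calls.
    def provider_a(prompt: str) -> str:
        raise TimeoutError("provider A is down")

    def provider_b(prompt: str) -> str:
        return f"[B] {prompt}"

    balancer = LLMBalancer({"a": provider_a, "b": provider_b})
    print(balancer.complete("Draft the weekly summary"))
```

Here requests spread across the providers while they are healthy, and a failing one is skipped until its cooldown expires. Whether the business really gets "the same outcome" still depends on how closely the models behave on your prompts.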
There are, like I said, there are answers
to this question.
I'm just sort of posing it as a
more general thought that hopefully people
can opine on because as you as a
business start to depend more and more on
LLMs, you need to also consider what happens
when they are not available.
If you'd like to know more, if you'd
like to come and ask me some questions, feel free.
My website is conceptcloud.com.
My name is Tom.
This is the AI Briefing.
Thank you very much for joining me.