WEBVTT

NOTE
This file was generated by Descript 

00:00:00.250 --> 00:00:04.899
Raphaël: Hey, and welcome back
to our series on ECS methodology.

00:00:05.399 --> 00:00:05.970
Last time,

00:00:05.970 --> 00:00:09.870
we talked about communicating with users.
This time, we're going to talk about

00:00:09.870 --> 00:00:13.920
tools that you can use to monitor your
systems and hopefully prevent crashes.

00:00:14.420 --> 00:00:18.590
You will do everything you can to make
sure that your system doesn't crash,

00:00:18.619 --> 00:00:21.170
but eventually it probably will.

00:00:21.669 --> 00:00:26.590
You can't fully guarantee the
stability of a complex system.

00:00:27.090 --> 00:00:31.170
But you can take steps to make sure
you quickly identify problems, recover

00:00:31.170 --> 00:00:33.960
from them and learn from your outages.

00:00:34.460 --> 00:00:38.430
Previously, we talked about Kubernetes,
and if you chose to go down that

00:00:38.430 --> 00:00:42.759
path, your managed Kubernetes provider
probably gives you something along

00:00:42.759 --> 00:00:44.619
the lines of the Kubernetes dashboard.

00:00:45.119 --> 00:00:48.539
Using it, you can quickly understand
if you have any services that are

00:00:48.539 --> 00:00:51.089
down or are experiencing some issues.

00:00:51.589 --> 00:00:54.649
You should probably also have done
things like configure liveness probes

00:00:54.649 --> 00:00:58.069
to make sure that your services
are restarted if they do go down.

00:00:58.569 --> 00:01:00.699
Regardless of how you
deploy your services,

00:01:00.819 --> 00:01:03.850
most platforms have something
roughly equivalent to that.

00:01:04.350 --> 00:01:07.020
Even so, you want to make
sure that you're alerted to any

00:01:07.020 --> 00:01:10.650
failures and that you have an
easy path to diagnosing issues.

00:01:11.150 --> 00:01:15.110
For this reason, we strongly recommend
using observability tools like

00:01:15.140 --> 00:01:19.610
New Relic or Datadog. We've also been
trying out the BetterStack tools.

00:01:20.110 --> 00:01:23.410
They will give you the ability
to be notified if anything

00:01:23.410 --> 00:01:24.790
isn't running correctly.

00:01:25.290 --> 00:01:29.310
Lately we've been using BetterStack
because we like the simplicity of their

00:01:29.310 --> 00:01:31.530
tools and we like their billing model.

00:01:32.030 --> 00:01:36.290
But we've also used the other two,
which are bigger and more complicated,

00:01:36.320 --> 00:01:38.180
but also really powerful.

00:01:38.390 --> 00:01:41.930
We also find that at least some
of them have more complicated

00:01:41.930 --> 00:01:46.310
billing models, which generally
leads to less predictable charges.

00:01:46.810 --> 00:01:50.800
Datadog, for example, charges in part
based on the number of hosts, which

00:01:50.829 --> 00:01:54.339
doesn't necessarily correlate with
the value we derive from the tool.

00:01:54.940 --> 00:01:57.370
But that might not be
the case for your system.

00:01:57.870 --> 00:02:01.500
If you found this helpful and you want to
learn more about building software, check

00:02:01.500 --> 00:02:03.359
out the free PDF that we put together.

00:02:03.859 --> 00:02:05.029
Also, drop a comment

00:02:05.029 --> 00:02:07.849
if you have any thoughts about
observability and monitoring.

00:02:08.349 --> 00:02:08.859
Follow us

00:02:08.859 --> 00:02:11.619
if you want to keep up with
this series. We have a lot more

00:02:11.619 --> 00:02:12.729
that we want to share with you.

00:02:12.969 --> 00:02:17.649
And we're constantly updating our process
and learning from our experiences,

00:02:17.889 --> 00:02:19.869
our partners and our community.

00:02:20.369 --> 00:02:23.820
If you think we could work together,
we'd love to chat with you and see if

00:02:23.820 --> 00:02:26.099
we can help with your next project.

00:02:26.639 --> 00:02:28.229
Thanks, and see you next time.