Smart Metals Podcast

The Evolution from Data Science to Data Engineering in Industry 4.0

In this debut episode of the Smart Metals Podcasting Show, hosts Luke van Enkhuizen and Denis delve into the world of Industry 4.0, focusing on the critical shift from data science to data engineering within industrial environments. 

Denis shares his journey from a data scientist to a data engineer, sparked by the challenge of accessing and preparing data for analysis in the metal industry. He highlights the insufficiency of current systems to support the data analytics needed for Industry 4.0 and argues for a separate digital infrastructure for control and analytics. 

The discussion also covers the limitations of traditional ERP systems for real-time data analysis and the potential of a unified namespace to streamline data integration across various operational levels. 

The episode concludes with insights into establishing a robust digital infrastructure to support data-driven operations and the necessary mindset for companies embarking on their digital transformation journey.

00:00 Welcome to the Smart Metals Podcasting Show!
00:56 Diving Into the World of Data Engineering and Science
01:12 The Shift from Data Science to Data Engineering
03:42 Challenges in Industrial Data Management
06:52 Rethinking Data Integration for Industry 4.0
09:04 Learning from Past Mistakes: A New Approach to Data
15:05 The Unified Namespace: A Game Changer for Data Analytics
19:02 Concluding Thoughts and Future Directions

What is Smart Metals Podcast?

"Smart Metals Podcast," hosted by Luke van Enkhuizen and Denis Gontcharov, offers a clear and practical look into the metals industry's journey through digital transformation, Industry 4.0, and the integration of the Unified Namespace. Listeners can expect in-depth discussions that break down these complex topics into understandable segments, actionable insights, and real-world applications. Luke and Denis bring their expertise to the table, guiding you through the evolving digital landscape with advice on leveraging technology for streamlined operations. Each episode aims to empower metal industry professionals with the knowledge needed to confidently adopt digital innovations and understand the impact of the Unified Namespace in creating a more connected and efficient production environment. Join us to navigate the future of the metals industry with clarity and confidence.

Luke: Hello, and welcome to the
Smart Metals Podcasting Show,

a new show hosted in Europe by
your hosts, Luke van Enkhuizen and Denis Gontcharov.

This is a new and somewhat experimental
show where we're going to be diving

into the world of the unified namespace,
digital transformation, the industrial

internet of things, and much more,
exploring the practical, daily

use cases of the real Industry 4.0 stack.

Is that right, Denis?

Denis: Sounds good.

Luke: All right.

So if you're listening in and you're
coming from all around the world, you'll probably

hear a nice mix of all kinds of funny
accents, dialects, and even misspellings.

We are very sorry for that, but I hope
the message is going to be clear because

we want to give you the practical use
cases and the practical explanations of

what it can really mean for your company
to successfully digitally transform.

Denis: And that's what it's all about.

Getting it to work in the field.

Luke: All right.

So that being said, for today's first
topic, we will be diving into Denis's

career as a data engineer and data
scientist, what he has seen in

the field, and why it motivated him
to get started in this field.

Denis: Sure.

Yeah, let's take off.

You mentioned two terms, data
science and data engineering.

And I recall that at the height of the
data science hype wave in 2018,

everyone was going into data science.

It was called the sexiest
job of the 21st century.

But what I very quickly noticed, working
with industrial data, was that

it was unbelievably hard
to even get the data.

We didn't need data scientists.

We needed data engineers.

So I eventually transformed
into a data engineer.

Luke: Right.

Okay.

And so what drove you into that field?

What was the motivation for you
to pick this in the first place?

Denis: Essentially my background.

So I studied metallurgy, or materials
engineering, even though I've worked

in fields that were different from it,
for instance in pharmaceuticals.

I always felt that data only
truly spoke to me when I had the

domain knowledge to understand it.

This always drove me to fields like
aluminium manufacturing, aluminium

smelting, and metals in general.

Luke: Okay.

And so what did you see along the way that
has motivated you to look for better

ways to digitize and automate systems?

Denis: A pure feeling of necessity.

I quickly realized that the way
we were approaching data science

in industry was not sustainable,
in the sense that we were spending

80 percent of the project's
time on just getting the data right.

And this didn't make sense to me.

This felt wrong.

I felt we needed a proper approach for
data integration and management that would

enable us to do efficient data science.

This kind of forced me to change my career
focus from being more of a data scientist

to someone who works as a plumber and
really gets down and dirty with the data

and builds a proper data infrastructure.

Luke: Right.

So a big challenge was that your
expectation might have been that

the data was already there, but
it really wasn't useful yet.

And in order to even get started
with your science on it,

you need to do a lot of cleanup
and a lot of work around this.

Denis: That's exactly the point.

We are in no way ready to do serious
data science in this industry.

Luke: Okay.

Well, that's a good
statement to start with.

So why exactly is that?

What's really wrong with the data?

Denis: So essentially, the data is there.

The systems work, but they
were designed to control the

process, to steer the plant.

And that they do really well.

What we now want to do in data science
is combine data from all of these sources

and then analyze it at the same time.

And our current systems just
were not designed to provide

us with this functionality.

So the challenge essentially becomes
how do we get the data neatly in

one place so that we can use it.

Luke: All right, can we maybe shed some light
on the traditional structures?

How are things done in most companies
that maybe face this issue right now?

Can you describe a bit the typical
structure that maybe isn't really working

in the way you want to see it working?

Denis: Sure.

So the good thing about manufacturing
is that essentially nearly every plant

in the world looks the same, right?

And I'm referring to the automation
pyramid, where at the very bottom,

you have data coming in from your
sensors and your PLCs, your HMI, right?

That data is then sent to the
next layer, called SCADA, level two,

which collects and organizes the data.

Then, on the third layer, you have the
MES, your manufacturing execution

system, which handles order planning
and the steering of machines.

And then the layer on top of that,
you have the ERP systems that I think

I should ask you more how they work,
but the way I see it is that they

translate the customer's order into a
production order and do the planning.

Luke: Yeah.

Of course the ERP has gone
really broad and wide in a lot of

cases, but in essence
it's a development of material

resource planning that then included
things like customer relationships, invoicing,

and mostly financial transactions
as well, coming into play there.

Yeah, that
sums it right up.

So of course there are many versions
of it, but at its core it's the

core business: sales orders and
everything around that.

Denis: Right, so coming back
to your question, what's

the problem with this stack?

It works great for control.

Data travels up and down the stack.

You want, for example, to
produce a certain order.

That information travels down from your
ERP to the MES, all the way down to your

SCADA system that activates the machine.

And the feedback goes all the way back up.

Now this works fine if you
don't have too much data,

or if your data is well specified, but it
becomes very cumbersome if you have

to add new data, if you have
to extend the data that you send,

or if you just have a lot of data.

And what we saw is that if we try to do
it the same way as in the past, namely

by funneling this data all the way
up and down the automation stack,

this requires a lot of hands on the
keyboard and takes a really long time.

Luke: Okay.

So it takes a lot of time.

It's not giving you
real-time insights.

So let's
dive a bit into it.

Practically speaking,
what is the main shift then?

What
do they need to do differently

to get the results they need?

Denis: So I think the most important
philosophy that became popular in recent

years is that we have to separate the two:
analytics from control.

We cannot burden our control
systems with the data analysis part.

We really need to have a separate
digital world that we will use for data

collection and processing at scale.

That way we can keep our
existing digital infrastructure

for control in place, but then
extract data from it efficiently,

without making tiresome point-to-point
connections, just funneling

all this data into one new place.

Luke: Okay.

And how do we do that?

Denis: So essentially the idea
of pumping all of the data into one

new place is not entirely new.

People have been trying to get it done
over the last, I think, two or

three decades, even in Germany.

But what made this easy is the use of
modern technologies like MQTT.

MQTT is an open protocol that
allows you to communicate lots of data

over Wi-Fi very efficiently.

And that makes it possible to read data
that looks very different from all of

these four levels, from level one all
the way up to the ERP on level four.
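
To make the MQTT point concrete: MQTT organizes data into slash-separated topics, and a consumer can subscribe with a filter that uses `+` (exactly one level) or `#` (everything below). The sketch below reimplements that matching rule in plain Python so it runs without a broker. The topic names (`acme/smelter/...`) are hypothetical and not from the episode, and the function is a simplified illustration, not a full implementation of the MQTT specification (for example, it ignores the special handling of topics that start with `$`).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter.
            # It matches the parent level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter is longer than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" matches any single level.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics, one per level of the automation pyramid.
topics = [
    "acme/smelter/line1/plc/furnace/temperature",
    "acme/smelter/line1/scada/furnace/alarm",
    "acme/smelter/line1/mes/order/status",
    "acme/smelter/erp/sales_order",
]

# One subscription can read data from every level at once.
everything = [t for t in topics if topic_matches("acme/smelter/#", t)]
line1_only = [t for t in topics if topic_matches("acme/smelter/line1/#", t)]
furnace_any_source = [
    t for t in topics if topic_matches("acme/smelter/line1/+/furnace/+", t)
]
```

The design point is that an analytics consumer only needs to know the topic structure, not the system that produced each value.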

Luke: Four.

Okay, okay.

And okay, this sounds really good.

So say somebody is listening right now
and they recognize the frustration that

the siloed data is hard to work with.

They need a lot of manual activities,
and they've tried all kinds of things like

point-to-point and bringing it up into the
stack, but are having issues there as well,

perhaps with costs, timelines, et cetera.

So what is the first thing a
company should do to really get going?

Should they seek out and buy
software, or should they build a team?

What do you think is the
first way to get started?

And how did you do it actually
in your previous business cases?

Denis: In our previous business cases,
let's start with the unsuccessful ones.

Luke: Okay.

That's a good one.

Denis: The unsuccessful ones were the
naive approach of:

hey, let's just look at what
other industries are doing.

So they hired people from telecom,
people from, let's say, big companies

like Nike that have different data.

I think what worked
for them will not necessarily

work for manufacturing.

In essence, what makes manufacturing
different is just the vast volume

of data at very high frequency.

And then I really mean the data
from SCADA, from your machines,

all the temperatures, all your
PLCs that communicate really fast.

You don't really have that in a large
enterprise like Nike, where data is

mostly based on sales and orders.

So I think we applied things that
worked there and just tried to do

the same thing in manufacturing.

And it didn't work.

Luke: So it's like hiring the big
consulting firms that give

you a generic plan,

but that doesn't seem to fit
the specific metal industry.

Denis: That's exactly it.

So they provide an approach that
says: hey, just build a data lake and

make connections to this data lake.

Just get all your data to the cloud in
a data lake and it will solve all your

problems, because there you have the AI
and stuff.

Luke: Mm-hmm.

Denis: And this worked fine for the
upper levels of the stack: like a

database, you can connect your MES
system and you will get the data there.

But connecting your actual
machines to the cloud proved to be

very difficult, if not impossible.

Luke: That's also typical jargon, right?

Industry 4.0 means letting the machines talk to
each other and connect to the cloud.

That's not really, I think, what is meant.

Maybe in the end you will be able
to access some things from the

cloud, but I don't think that
is the real plan there, right?

Denis: Exactly.

It's a very technology-focused approach
instead of a "what's my problem?" approach.

Yeah.

To come back to your question of what is
necessary, I think it's always instructive

to start from a business problem, and
for this you need people with a real

understanding of your business.

Luke: Right.

Okay.

So that's not how to do it.

It's
not a good idea to just take a data

lake, follow buzzwords,
and just connect everything.

So what is it then?

What is the best way to approach a
business case in your experience?

Denis: I think we also have to
just be a bit more patient.

There was this gold rush five years ago.

We grabbed our shovels, went outside
and started digging, not realizing that

digging gold is actually more difficult.

I think we really need to focus,
in this next wave of Industry 4.0,

on first establishing a proper data
infrastructure, as boring as it sounds.

But we need the proper tools to process
the big amounts of data efficiently.

We cannot do it with
the tools we have today.

And this is a big challenge
for companies, because

it's essentially about investing.

You're investing in a digital
infrastructure, which in itself

will not generate revenue.

It will enable you to get revenue
in the future, but,

I don't know how you see it,
it's a very hard sell to management.

Luke: Yeah, okay, okay.

I think that I
agree with you on that.

And I think the most crucial piece of
information that you need at any time

in your operations is to know exactly
what's going on at this moment in time,

and how you did on any measure
that is relevant to your company.

As in, were we productive?

Did we do everything right?

And what happened exactly where?

I think those are really important
questions to answer for your entire

factory stack, and it's almost impossible.

Doing this with point-to-point
or, you know, pushing everything

upwards into the ERP and then the cloud
seems very cumbersome.

Denis: But I would like
to know your opinion.

It seems that every level of the
stack tries to be that central point

that all data flows to.

Why did the manufacturers of ERP
systems decide

that all data should flow there?

Luke: Well, I think it's
the highest level on the pyramid, right?

So it's always been said that
your ERP system is kind of like the

nervous system behind your company.

That's how it's been sold.

And I think,
when there wasn't a really good

solution out there to do
big data, it was fair.

It was better than nothing.

It was a fair positioning.

They were saying: okay, yes, if you're
going to centralize information, then do

it at the place where you already have
the most metadata, your customers, your

everything. But I think it reached
a point where there's a saturation,

where there is this invisible
ceiling, as we sometimes discussed,

where basically there is a point
where the ERP just starts falling apart.

Especially when it comes to
more real-time data, or more

historical data, or combining various
sources and not leaving out crucial

details that you might have missed
or forgotten about later.

I think that's where it starts to get
very hard to put everything into the ERP.

So that's my experience with that.

Denis: Yeah.

To me, as an outsider, it sounds like
we're trying to make the ERP do tricks

that it was never designed to do.

Luke: Yeah, exactly.

That's the thing.

That's the point.

It was never designed for that in the first place.

It's a system of record.

It's not a system of experience, and
it's not a system that is meant to

be dynamically altered or event-based,
or all these terms that you will

probably learn about in this show.

It's not meant, I think,
to be more than a system of record

for your primary sales order data,
the transactional data, the financial

data. But there is a limit, I think,
to how far you can go

into production with this.

Denis: Yeah, absolutely.

I think this is a good point
to answer the question:

well, if it's not the
ERP, if it's not the MES,

then where should data go?

And I think that's where we reach
the topic of the unified namespace.

Luke: Yeah.

A universal solution that is standing
not in the stack, but next to it,

connecting like a spider in the web,
with an open structure, to all layers

in a way that is the same everywhere.

Am I right in this?

Denis: You're exactly right.

And I think the phrase next to it
is super, super important because

this highlights that it's not
part of your automation stack.

We do not want it to steer your process.

It really illustrates that we want to
separate the world of data analytics from

the existing world of process control.

Luke: Yes, exactly.

I think that's also the core message
here: to look for the next

stage in your digital transformation
journey is to look for something that

you can apply in a universal way,
but still next to what you already have.

So you don't have to replace your entire
stack, because first of all, that doesn't

make any sense anymore after you reach a
certain maturity. It works already.

So why change it?

But you do want to get more data,
more reporting, more insight.

You need to measure things, because
the only way you can improve

things is if you measure them.

So there is a point where
business intelligence is

getting really important for you.

Then it doesn't make sense anymore
to push everything into the ERP

or rely solely on the business
intelligence tool and its system.

You need a more thoughtful
approach that is more architectural.

Correct?

Denis: Yeah, I fully agree.

You also mentioned that it's a
network; indeed, it is a central point

with which all systems communicate.

And this illustrates why it's superior
to point-to-point integrations.

Let's take an example. Let's say
you don't have a unified namespace.

You are point-to-point integrating something
from SCADA on level 2, via

level 3, the MES, to your ERP on level 4.

Now imagine that one day you decide to
change your MES system.

That means you will
also have to update all the connections

that run through it.

If now, for instance,
you had a unified namespace to which

you just connect those layers,
the idea of this

unified namespace network is that all
the actors in this network don't have to

know about each other, because they only
communicate with the unified namespace.

So if you send information from
SCADA to the MES, SCADA will

communicate to the unified namespace,

and the unified namespace will communicate
this data on to your ERP or your MES.
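
The decoupling Denis describes can be sketched with a toy in-memory hub. This is only an illustration under stated assumptions: the `UnifiedNamespace` class, the topic name, and the handler functions are all hypothetical, and a real deployment would typically sit on a message broker rather than a Python object. What it shows is that the SCADA side keeps publishing to the same topic while the MES behind it is swapped out.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class UnifiedNamespace:
    """Toy in-memory hub: systems publish to topics and subscribe to topics."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].remove(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Deliver to whoever is subscribed right now; the publisher never
        # learns who the consumers are.
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)


TOPIC = "plant/line1/furnace/temperature"  # hypothetical topic name

uns = UnifiedNamespace()
old_mes_inbox, new_mes_inbox, erp_inbox = [], [], []


def old_mes_handler(topic: str, payload: dict) -> None:
    old_mes_inbox.append(payload)


def new_mes_handler(topic: str, payload: dict) -> None:
    new_mes_inbox.append(payload)


def erp_handler(topic: str, payload: dict) -> None:
    erp_inbox.append(payload)


def scada_report_temperature(value_c: float) -> None:
    # SCADA only knows the hub and a topic, not the MES or the ERP.
    uns.publish(TOPIC, {"value_c": value_c})


uns.subscribe(TOPIC, old_mes_handler)
uns.subscribe(TOPIC, erp_handler)
scada_report_temperature(712.5)

# Replace the MES: only the subscriptions change, SCADA code is untouched.
uns.unsubscribe(TOPIC, old_mes_handler)
uns.subscribe(TOPIC, new_mes_handler)
scada_report_temperature(715.0)
```

In a point-to-point setup, the same MES replacement would mean rewriting every integration that ran through the old MES.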

Luke: All right.

Okay.

So I think there are a lot of
topics here that we can explore

further in future episodes.

We started with the core
explanation of why it doesn't work.

And I think, to add to that, one very
crucial element that you mentioned here

is indeed that if you build point-to-point,
the points have to stay the same.

The endpoints have to stay exactly the same.

If one of the source systems
gets updated, changed, migrated, or

replaced, you need to redo everything
in regards to that integration.

I've been doing system integrations
for seven or eight years now,

and it's always been the same thing.

One piece of software gets updated,
there is an issue with local

network access, or there's something
wrong with an attachment, and

the whole thing just breaks down.

And indeed, for some business
processes, you do need point-to-point.

I'm not saying it's completely obsolete,
but if you want to get a whole lot of

data from a whole lot of places, and for
each data set you need to

individually build that point-to-point,

that's going to be a very
big hassle to get done.
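
A rough way to see why this becomes a hassle is to count integrations. The sketch below assumes the worst case, in which every system exchanges data directly with every other system; real plants rarely connect everything to everything, so the numbers are an upper bound for illustration and not figures from the episode. The function names are hypothetical.

```python
def point_to_point_links(n_systems: int) -> int:
    """Links needed if every system is wired directly to every other one."""
    return n_systems * (n_systems - 1) // 2


def hub_links(n_systems: int) -> int:
    """Links needed if every system connects only to one central hub."""
    return n_systems


# Compare both approaches for a few plant sizes.
for n in (4, 10, 20):
    print(f"{n} systems: {point_to_point_links(n)} point-to-point links, "
          f"{hub_links(n)} hub connections")
```

The point-to-point count grows roughly with the square of the number of systems, while the hub count grows linearly, and each of those direct links is something that can break when one endpoint changes.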

Denis: Yes, build point-to-point
for control.

But for data transfer, just get
it all into the unified namespace

as quickly as possible, with
as few detours as possible.

Luke: Right.

So this is, I
think, what we can conclude with.

The conclusion of this first
episode is that we want to challenge

you with the idea that, okay, for controls
and day-to-day operations,

point-to-point will probably win.

But when it comes to your next stage,
when you break through the invisible

ceiling, as I would say, when you really
want to grow further, optimize further,

measure everything in a company, and get
more data-as-a-product services,

so you want to get specific
reports or certifications on

the sort of products that you make,
then you will

reach a point where that no longer
holds up. And that's where you

need to do something more
architecturally correct for data

engineering, I'd say, and through
data engineering, be able to do

your own data science on your system.

Right?

Denis: Absolutely.

The approach that got
us here today was great.

We had a lot of benefits, but it will not
get us to where we have to be tomorrow.

Luke: What got you from A to B will
not necessarily get you to C, right?

Denis: Absolutely.

Luke: Yeah, I think that's a great
one to end this one on.

Denis, thank you so
much for this first episode.

I hope you also enjoyed it as a listener.

And I'll see you around
for the next episode.

Thank you, bye bye.