Cybertraps Podcast

This episode is a part of a special series of interviews conducted at the INCH360 Cybersecurity Conference in Spokane, Washington. Visit their website to learn more about INCH360 and their mission. 

Host Jethro D. Jones interviews data scientist Michael Segaline. Michael shares his journey into AI and machine learning, discusses the importance of data cleaning and analytics, and explains why understanding the limitations and proper use of AI is crucial. The conversation highlights the real-world impact of data science in cybersecurity and the value of expertise in an evolving tech landscape.

We’re thrilled to be sponsored by IXL and Renaissance. 

IXL’s comprehensive teaching and learning platform for math, language arts, science, and social studies is accelerating achievement in 95 of the top 100 U.S. school districts. Loved by teachers and backed by independent research from Johns Hopkins University, IXL can help you do the following and more:
  • Simplify and streamline technology
  • Save teachers’ time
  • Reliably meet Tier 1 standards
  • Improve student performance on state assessments
🚀 Ready to see why leading districts trust IXL for their educational needs? Visit IXL.com/BE today to learn more about how IXL can elevate your school or district.

Learn more about Renaissance:
 
As a global leader in education technology operating in more than 110 countries, Renaissance is committed to providing educators with insights and resources to accelerate growth and help all students build a strong foundation for success. We believe that technology can unlock a more effective learning experience, ensure that students get the personalized teaching they need to thrive, and help educators and administrators to truly, fully, See Every Student. Learn more at renaissance.com.


What is Cybertraps Podcast?

We explore the risks arising from the use and misuse of digital devices and electronic communication tools. We interview experts in the fields of cybersafety, cybersecurity, privacy, parenting, and technology and share the wisdom of these experts with you!

Welcome to the Cybertraps Podcast.

I am your host, Jethro Jones.

We're here at the INCH360 Conference.

I've got Mike Segaline here.

Mike, welcome.

Why don't you tell us a little bit about who you are and what you do, and what brought you to INCH360?

Thanks for having me on, Jethro.

This is awesome.

Yes, I am a data scientist.

What I do is I am a chemist for data.

I go out into different repositories and I get data.

It doesn't really matter how I get the data, whether it's scraping or pulling from a database.

Either way, once I get the data, then I have to clean the data, structure it, engineer it.

Then I run it through machine learning models to get predictions and classifications based on whatever the data is and whatever problem I want to solve.

Usually, when it comes to the world of cyber, I created a machine learning model called a random forest, which is really very powerful.

It's more powerful than the large language models when it comes to basic prediction or classification.

In fact, its steroided-out brother, XGBoost, which is just decision trees on steroids, is very good for prediction.

So I was able to predict, or I should say classify, with a very high accuracy percentage, whether or not something was malware based on the con logs of the security information and event management (SIEM) system.
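The kind of classifier described here can be sketched in a few lines. What follows is a toy illustration only, not the model from the interview: the feature names (bytes sent, duration, distinct ports), the data, and all function names are made up, and each "tree" is a single split (a depth-one tree) to keep the example short. It keeps the two core random forest ideas, bootstrap sampling of rows and a random subset of features per tree, with a majority vote at prediction time. In practice one would more likely use a library such as scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter

# Synthetic, made-up log features: (bytes_sent, duration_s, distinct_ports).
ROWS = [
    (120, 2, 1), (300, 5, 2), (250, 3, 1), (180, 4, 3),                 # benign
    (9000, 60, 40), (12000, 90, 55), (8000, 75, 35), (15000, 120, 70),  # malware
]
LABELS = ["benign"] * 4 + ["malware"] * 4


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        m = majority(labels)
        return (features[0], float("inf"), m, m)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train depth-one trees on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        features = rng.sample(range(n_features), k)  # random feature subset
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features)
        )
    return forest


def predict(forest, row):
    """Classify one row by majority vote across all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


forest = fit_forest(ROWS, LABELS)
print(predict(forest, (20000, 200, 100)))
print(predict(forest, (100, 1, 1)))
```

No single tree has to be right, because the vote across many trees trained on different slices of the data carries the decision.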

Fascinating.

Cool.

Yeah.

Yeah,

that's great.

So what brought you to this conference today?

The key word: AI.

Yeah, so a data scientist is a person who creates AI.

And my hope usually is to come here and find other data scientists.

So I met one guy today.

He was up there talking: Curtis Shelton.

Legit.

Yeah.

Just talked to him.

He was blowing my mind.

Good stuff.

Pretty damn legit.

We agree on everything so far because he's a data scientist.

Yeah.

Even though he explains things a different way.

One of the hard things that we have in industry is trying to communicate value to people and communicate what it is we actually do.

So years ago, when I got out of the Marines, I said, I want to be a guy who creates AI.

So I looked that up online.

What's the name of a person who creates AI?

And it kept populating the same answer, which was data scientist.

I'm like, what the hell is a data scientist?

It's a mix. You have to become a statistician and a computer scientist.

I'm like, shit, man.

That sounds like a really hard uphill battle.

That's a hell of a barrier to entry, but I sucked it up. One master's degree in data analytics later, I'm a data scientist, and it has been a really unique world, going into anywhere that's applicable.

But it's cool that a lot of these guests keep going back to machine learning.

And that's what I liked about Curtis is he called it machine learning.

And he said he hates calling it AI because he knows AI is not real.

And AI is the marketing term that clicked.

Yes.

It's the marketing term that clicked.

It's the repackaging of statistics.

Yeah.

To make it look sexy, because it courts attention

and it creates grabbing committees of equally clueless people trying to regulate it somehow.

Yeah.

It's cool though.

It's interesting to see what people talk about and the local tech scene.

Some of the companies that are here are developers of software.

A lot of them, I call 'em the RAG queens.

They play with retrieval-augmented generation; they play with the RAG database in large language models.

And Curtis was talking about that, and somebody else was talking about that: the danger of loading your data into these large language models.

Your data, especially your clean data, is a digital asset class.

That's what people are not talking about, and this is something that's gonna be a serious thing in the future.

It's gonna change from people talking about AI to people talking about data.

Because AI doesn't run on hopes and dreams.

It runs on data.

And your ability to take that data, engineer it, clean it into something that's usable, and convert it into data products is gonna be what separates you from the other industry people.

I mean, it'll put you leaps and bounds ahead.

And especially as people create more stuff with AI.

Having clean, human-generated stuff as opposed to AI-generated stuff is just a more important thing.

Because if you're training on the AI stuff, then that, to me is a problem.

Yeah.

Absolute analytics beats AI.

Absolutely.

Yeah.

Absolute analytics beats AI, absolutely, every time. For example, I have to deal with voting data all the time, and I get daily voting data in from the county.

I have to run it through a data cleaning script in order to get the exact absolute analytics. I want the exact count; I want the top 20 precincts.

So for example, let's say I write code and I get the absolute top 20 precincts. Then I take that same cleaned data set of county voting data, and it could be totally cleaned, upload it into a large language model, and ask it for the top 20 precincts, and it won't give me the top 20 precincts that are accurate.
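The "absolute analytics" described here are deterministic: the same cleaned input always yields the same exact counts. A minimal sketch of a cleaning step followed by an exact top-N count might look like the following. The column names (`precinct`, `ballots`), the sample rows, and the cleaning rules are all hypothetical; the actual county export format and the speaker's script are not described in the interview.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export: the column names and rows are made up for illustration.
RAW = """precinct,ballots
 1001 ,250
1002,310
1001,40
,15
1003,n/a
1004,310
"""


def clean_rows(text):
    """Yield (precinct, ballots) pairs, skipping rows that cannot be counted."""
    for row in csv.DictReader(io.StringIO(text)):
        precinct = (row.get("precinct") or "").strip()
        ballots = (row.get("ballots") or "").strip()
        if not precinct or not ballots.isdigit():
            continue  # drop rows with a missing precinct or a non-numeric count
        yield precinct, int(ballots)


def top_precincts(text, n=20):
    """Return the n precincts with the most ballots as (precinct, total) pairs."""
    totals = Counter()
    for precinct, ballots in clean_rows(text):
        totals[precinct] += ballots
    # Sort by total descending, then by precinct id so ties come out in a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


for precinct, total in top_precincts(RAW, n=2):
    print(precinct, total)
```

Every number in the result can be traced back to specific input rows, which is the property that a ranking produced by a language model does not guarantee.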

That's something that people need to talk about: what are the limitations of these systems?

What are the capabilities and limitations? And what I'm happy about seeing over time is that the charlatans have slowly weeded themselves out, and the real experts are starting to come to the surface. You can tell who's an expert not by what they say, but by what they never say, what they never talk about.

So the charlatans will never talk about data cleaning

Yeah.

and data governance in day to data.

So that was one of the things that I really appreciated about Curtis: he stood out as somebody who really knew what they were talking about and really understood what the issues were.

And then I talked to him for like 10 minutes after our interview, analyzing a specific problem that I was having, and he laid out perfectly what I was trying to grasp at but couldn't quite get.

And he gave me the pathway and I was like, hallelujah.

This guy saved me probably hours and hours of trying to figure this out on my own.

Because he just had the answer right off the top of his head, 'cause he's done this a million times.

It was awesome.

Yeah, he's legit.

Well, he knows statistics.

He knows enough to be dangerous.

And that's what separates the pros from the amateurs in the game: your ability to know statistics.

'Cause everything's based on inferential statistics.

Yeah.

Yeah.

So understanding is huge.

Yeah.

Well, thank you for being part of the Cybertraps Podcast.

How would you like people to connect with you and learn more from what you're doing?

Awesome.

Thanks for having me on.

You can find me as Data Mining Mike anywhere online. Google Data Mining Mike, or check me out on YouTube at Data Mining Mike.

Be sure to like, subscribe, and smash that bell for more data-driven updates.

You can find me on LinkedIn as Michael Segaline; everything points back to me.

Data Mining Mike, trying to own that.

There you go.

Data Mining Mike.

Here we go.

Thank you, Mike.

Appreciate it.