I talk with Coleman Stavish and Julianna Ianni from Proscia about data-driven pathology. Coleman is the co-founder and CTO of Proscia and Julianna is the VP of AI Research & Development. We discussed the importance of quality control systems in an ML pipeline, model generalizability, and how the regulatory process affects ML development.
Learn how to build a mission-driven machine learning company from the innovators and entrepreneurs who are leading the way. A weekly show about the intersection of ML and business – particularly startups. We discuss the challenges and best practices for working with data, mitigating bias, dealing with regulatory processes, collaborating across disciplines, recruiting and onboarding, maximizing impact, and more.
Heather Couture: Welcome to Impact AI, the podcast for startups who want to create a better future through the use of machine learning. I'm your host, Heather Couture.
Today I'm joined by Coleman Stavish and Julianna Ianni from Proscia to talk about data-driven pathology. Coleman is the co-founder and CTO of Proscia, and Julianna is the vice president of AI research and development. Welcome to the show.
Julianna Ianni: Thanks, it's great to be here.
Coleman Stavish: Thanks for having us.
Heather Couture: Coleman, could you share a bit about your background and how that led you to create Proscia?
Coleman Stavish: Sure.
You know, I think for a lot of people who have the opportunity to found or work on a startup company, the opportunity is at the intersection of your interests, the people you know, and the willingness to work on something new. For me, I developed an interest in computer programming at a young age and started working on iPhone app development in the early days of the iPhone, before and just after the App Store came out. That led me to pursue a computer science degree at the University of Pittsburgh.

During my time there, I got a call from a longtime friend of mine, David West, who I've actually known since kindergarten, so we go way back. It's kind of funny. He gave me a call, and we were talking about this project he was working on at Johns Hopkins in the field of pathology. There was an opportunity to build software for pathology, and I had no idea what pathology was. But I read about it, was immediately interested, and saw a high potential there. The rest is history. I really never looked back, and I haven't lost interest since.
Heather Couture: Julianna, what about you? What path led you to Proscia?
Julianna Ianni: So, very different path. I had been interested in health tech and biotech in general since high school, and I pursued a degree in biomedical engineering. During that time I did an internship in biomedical informatics and was really interested in all the things that you could do with data. But that internship also made me realize that the place with the most data, as I could see it, was in medical imaging. So I became really interested in MRI and pursued my PhD, also in biomedical engineering, but focused on MRI. I was building algorithms both on the acquisition side of things and on the image reconstruction side.

As part of that, I became really interested in machine learning and started incorporating it into my work. I was really interested in deep learning and seeing it take off in just about every other field, but it was a little bit slower in MRI, at least on the academic side. So I was looking around saying, why is no one doing this? I also really wanted to get closer to the patient side of things; I was not patient, no pun intended, in seeing the impact of my work. So I realized that I needed to hop over to industry, and Proscia was a really good fit for me: someplace where I could still pursue medical imaging but realize some of the impact of that work. I started at Proscia five years ago and have been building and leading the AI team since.
Heather Couture: So what does Proscia do, and why is this important in improving outcomes for cancer patients?
Coleman Stavish: So Heather, in a sentence, I would say Proscia builds software to improve the way pathology is practiced. But what does that actually mean? I think we have to look at pathology first and then connect that to the patient outcome question. Pathology is a crucial part of diagnostic medicine, as well as drug development and preclinical research. Through studying tissue and disease, pathology tells us who needs to be treated for what diseases, especially cancer, which treatments are the most appropriate for a particular patient, and whether those treatments were successful for that patient. Better accuracy in diagnosis means less overdiagnosis and less underdiagnosis, which typically leads to better patient outcomes and quality of life.

On the other side, in a research and preclinical setting, pathology is a crucial part of the drug development pipeline. It's helping pharmaceutical companies develop new treatments while assessing their safety and efficacy. In that arena, our software is helping research laboratories, as well as diagnostic laboratories, transition from analog pathology to a new discipline, which is digital pathology: a more data-driven pathology, using digital images rather than glass slides. With that shift to digital imaging, there's a whole lot more data at the fingertips of scientists who are working on new treatments, as well as diagnostic pathologists who are looking to diagnose a particular patient's disease. Some of that, of course, is through the use of machine learning and other novel technology that can be built on top of that foundation now that we're in a digital medium.
Heather Couture: So how does machine learning play into your technology? What role does it play?
Coleman Stavish: So we first thought about machine learning playing a role in analyzing images of tissue captured from pathology slides. You have a glass pathology slide containing human or animal tissue, and it's been digitized with a whole slide scanner. Once it's in image form, can we process the image and apply some sort of machine learning model or pattern recognition to recognize the patterns of a particular disease or cancer? To this day, that's still at the core of what we do. But we've recently broadened into applying that same kind of technology to other problems within digital pathology, mainly to improve the general usability and workflow of what is really a radically new technology for laboratory medicine. As labs are converting from a century or more of operating in an analog fashion into a digital medium, we've found some other problems that can be solved through machine learning as well, in addition to recognizing a particular disease, for example.
Heather Couture: So one of the products you've developed is an automated quality control system for whole slide images. What kind of challenges had you encountered with these whole slide images that made this product necessary?
Julianna Ianni: Yeah, so we're seeing that many labs, both on the clinical side and on the research side, are having quality issues impact their work. About five to ten percent of slides have some sort of quality issue in most labs. That can be anything from air bubbles, to tissue that's missing from the slide or cut off from the slide, to areas that are blurry or have tissue folds: some things that may impact what happens to the slide down the line, and some things that may not.

A lot of labs are having trouble dealing with this, and they've taken different approaches. Some labs have high enough volume that they can have someone who's responsible for manually performing quality control on the slides. But even someone who's really experienced at that may spend an hour to perform that quality control on just 20 to 25 slides, and many labs can't afford to have someone doing that. Then it takes until down the line, when there's either a pathologist trying to diagnose or read the image, or someone trying to perform downstream image analysis on it, before you realize there's a problem and that the slide needs to be rescanned or re-prepped. A quality control tool can reduce the number of slides that make it to that stage.
Heather Couture: Does the quality control system itself use machine learning?
Julianna Ianni: Yes, yeah. We try not to take a hammer approach to everything. Some of the applications we have use just traditional computer vision, but it's a combination of that and machine learning that drives the application.
Heather Couture: So it's based on some sort of training set with different quality defects annotated, and trained either with machine learning or with more traditional, simpler methods perhaps.
Julianna Ianni: Yeah, exactly.
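To make the traditional computer vision side concrete, here is a minimal sketch of one classic focus check, the variance-of-Laplacian measure, applied to a single tile of a whole slide image. This is an illustrative example, not Proscia's actual pipeline; the threshold value and the `read_tile` helper are assumptions.

```python
import cv2
import numpy as np

def tile_is_blurry(tile_bgr: np.ndarray, threshold: float = 100.0) -> bool:
    """Flag a tile as out of focus using the variance of the Laplacian.

    Sharp tissue produces strong edges, so the Laplacian response varies
    widely; a blurry tile gives a flat response with low variance. The
    threshold is illustrative and would need tuning per scanner and
    magnification.
    """
    gray = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2GRAY)
    focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()
    return focus_measure < threshold

# Usage sketch: tile the slide, score each tile, report the blurry fraction.
# `read_tile` is a hypothetical helper; real pipelines often use OpenSlide.
# blurry = [tile_is_blurry(read_tile(slide, x, y)) for x, y in tile_grid]
# print(f"{100 * sum(blurry) / len(blurry):.1f}% of tiles flagged as blurry")
```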
Heather Couture: So how does this quality control system influence the downstream use of machine learning? For example, if you had no quality control on these images and you were applying machine learning for some other purpose, what would be the problem?
Julianna Ianni: Yeah. So, as you could probably guess, that's actually something we experience quite a bit: there are issues with a slide that's being used for some downstream image analysis or another AI application, and those issues do have an effect. There's been some research on that, and we've done some of our own work trying to assess how those issues come into play. For example, we definitely found that out-of-focus regions, where the images are a little bit blurry, can affect some of these downstream analyses.

There are also more insidious issues with some of the slides. For example, you'll often find slides that have been annotated with pen ink. That's something that can be quite common in some settings, and if you're trying to train a diagnostic model, it can really bias the model. It turns out that ink is something you might use to annotate a tumor, but it can really trick your model into thinking, oh, that pen ink region has something to do with the tumor; maybe anything that has pen ink on it is a tumor. So it can really have an effect in some cases.
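One simple way to guard against this kind of shortcut learning, sketched below under the assumption of H&E-stained slides, is to detect ink-colored pixels and drop contaminated tiles before training. The HSV bounds and the one percent cutoff are illustrative assumptions, not a description of Proscia's method.

```python
import cv2
import numpy as np

def pen_ink_fraction(tile_bgr: np.ndarray) -> float:
    """Estimate the fraction of a tile covered by marker ink.

    H&E tissue sits in a pink/purple color range, while marker ink
    (commonly blue, green, or black) is either far more saturated in
    other hues or much darker. The bounds below are illustrative and
    would need per-lab tuning.
    """
    hsv = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Blue/green marker: hue roughly 35-130 on OpenCV's 0-179 scale, saturated.
    marker = (h > 35) & (h < 130) & (s > 80)
    # Black marker: very dark pixels regardless of hue.
    black = v < 50
    return float((marker | black).mean())

# Drop ink-contaminated tiles before they reach the training set.
# clean_tiles = [t for t in tiles if pen_ink_fraction(t) < 0.01]
```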
Heather Couture: I imagine that different labs might also have a different prevalence of different defects, and that might complicate things further. For example, slides from a particular lab are more likely to be out of focus, or more likely to have those pen markings, and you don't want the model to pick up on those defects instead of the characteristic that you're trying to train on.
Julianna Ianni: Yeah, absolutely. There's lots of variation between labs. You may even find that one lab has more experienced techs, and so there are better quality slides, or that some labs have someone cleaning their slides before scanning. There can be all those kinds of variations.
Coleman Stavish: With the automated quality control product, from a research and development standpoint, I think it was one of those happy accidents to some extent. Someone in product management said, hey, it would actually be really good if we could have a product that could detect these quality defects. And Julianna, you and your team had already built some prototypes for this, just as pre-processing work for other R&D we were doing. So it was one of those happy accidents where we found another, more general use for the technology than we maybe originally envisioned.
Heather Couture: So your dermatopathology solution uses machine learning to categorize skin disease. Beyond quality control, one of the other challenges you see with whole slide images is the variation from different scanners, different staining intensities over time, that type of thing. So with this particular dermatopathology model, how do you ensure that you're robust to variations across scanners, labs, patient populations, and any other types of variation that arise?
Julianna Ianni: Yeah, great question. So our dermatopathology solution, which is called DermAI, and which I do have to note is currently for research use only, takes images of skin biopsies and helps ensure that cases get routed to the best pathologists to diagnose each case. It also flags suspected melanoma cases to help in prioritizing those cases.

How we ensure this system generalizes to all these variations that are so prevalent is really a layered approach. Part of how we address variation is the data; that's the first line of defense. We like to train our models with data from more than one scanner and more than one lab to account for some of those variations. Another thing that we do is ensure that, in any way possible, we improve in-distribution performance, since it's correlated with out-of-distribution performance. And finally, there are methods that give a focus to that in training and development: methods specifically aimed at improving robustness to those variations and improving generalization performance during training. There are a few different ways that we're doing that. My colleague Young is presenting at the Medical Imaging Meets NeurIPS workshop on that topic on December 2nd, so check that out if you're interested.
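One widely used family of training-time techniques for this kind of robustness is aggressive color augmentation, which keeps a model from latching onto any single lab's staining profile. The sketch below uses torchvision with illustrative jitter strengths; it is a generic example, not necessarily the method Julianna's team presented.

```python
from torchvision import transforms

# Color jitter makes the model see many plausible stain renditions of the
# same tissue, mimicking stain and scanner variation between labs.
# The magnitudes here are illustrative assumptions, not tuned values.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.3, hue=0.05),
    transforms.ToTensor(),
])

# Applied per tile during training, e.g. inside a Dataset's __getitem__:
# tensor = train_transform(pil_tile)
```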
Heather Couture: How does the regulatory process in the US, and perhaps abroad, affect the way you develop machine learning models? I understand that not all of your AI products have gone through the regulatory process at this point, but maybe it factors into the way that you're thinking about the future.
Julianna Ianni: Yeah, it does come into play. A lot of it is stuff that we're already doing to ensure that we have robust technology that really survives in the wild, but it's something that we have to keep in mind from the very beginning when thinking about developing an AI system. In Europe, the IVDR is now in effect, and I think that's something most companies are still learning how to navigate. But there are still some commonalities between all of these paths. It's about following the proper development process to develop systems that are both robust and well tested, and that means not just from a software perspective, but from an algorithm perspective as well.

One of the heaviest impacts on development for us, just to give you an example, has been areas where we find a great level of disagreement in the ground truth data. That will come out when you test, and we have to account for that disagreement during development. But it's one of those things that you want to account for anyway. So yeah, lots of factors.
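One common way to account for annotator disagreement, sketched here as a generic example rather than Proscia's approach, is to train against the empirical distribution of pathologist votes instead of forcing a single majority label:

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, annotations: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against the distribution of annotator votes.

    `annotations` holds one class index per annotator for each example,
    shape (batch, n_annotators). The target becomes the fraction of
    pathologists who chose each class, so ambiguous cases contribute
    softer gradients than a forced majority vote would.
    """
    n_classes = logits.shape[1]
    votes = F.one_hot(annotations, n_classes).float().mean(dim=1)  # (batch, n_classes)
    return -(votes * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Example: three pathologists label a four-class problem for two images.
logits = torch.randn(2, 4)
annotations = torch.tensor([[0, 0, 1], [2, 2, 2]])  # disagreement on image 0
loss = soft_label_loss(logits, annotations)
```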
Coleman Stavish: So there's another aspect of the regulatory process that comes after the development and validation phases, once you've actually deployed the technology, the device or product, somewhere. The regulatory term for it is post-market surveillance, and that applies to any medical device, whether it's software or AI or hardware. It's basically monitoring performance. Was there a degradation in performance? Was there an event, like a negative event, that may have affected the user or the patient whose data was being analyzed?

I think that puts another burden on developers of this technology, and I think it's a good burden; it makes sense. But it also requires thinking through not just how we are going to validate, but how we are going to keep tabs on the different deployments and ensure that we're not seeing performance degrade as the data or the conditions within the laboratory change. Things do drift in terms of how a lab is preparing their samples, and other things can change. If the model hasn't seen that, then it may not react well in all cases. So it's critical that performance be monitored and documented, and there may actually be additional development or tooling required to really manage that well.
Julianna Ianni: Yeah, definitely. That's it.
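A minimal sketch of what such post-market monitoring can look like in code: the population stability index (PSI) compares the distribution of model scores at a deployment site against the distribution seen at validation time. The score range, bin count, and alert threshold below are conventional assumptions, not regulatory requirements or Proscia's tooling.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, observed: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between validation-time and live score distributions.

    Assumes scores lie in [0, 1]. A common rule of thumb reads PSI < 0.1
    as stable, 0.1-0.25 as worth investigating, and > 0.25 as a
    significant shift.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    o_frac = np.histogram(observed, bins=edges)[0] / len(observed)
    e_frac = np.clip(e_frac, 1e-6, None)  # guard against empty bins / log(0)
    o_frac = np.clip(o_frac, 1e-6, None)
    return float(np.sum((o_frac - e_frac) * np.log(o_frac / e_frac)))

# validation_scores: model outputs on the held-out validation set at release.
# live_scores: outputs collected from one deployment over, say, a month.
# if population_stability_index(validation_scores, live_scores) > 0.25:
#     flag the site for review of sample prep, scanner settings, and rescans.
```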
Heather Couture: So the technology you're developing is meant to help the pathology workflow, studying real patient samples in the lab. How do you ensure that what you decide to develop will really fit in with the clinical workflow and provide the right kind of assistance to doctors and patients?
Coleman Stavish: I think it's a great question, Heather. There are two main aspects to it, as I see it. One is, from an input and output perspective, is the model, or the product that's using machine learning in some way, answering the right question? Is it producing an output that is valuable and is going to have some positive impact on either a pathologist's ability to diagnose a case, or a laboratory technician's ability to complete their work in an efficient manner, or any number of other business or diagnostic challenges that a lab may face? So there's that basic question: are you solving the right problem? That's one.

But the other thing is, let's say you've solved the right problem on paper. How is that solution being applied in the laboratory? Does it have the right hooks into the existing workflow? Is it usable? Is it something that a pathologist can access and get through their day with, without having to make a big detour that actually adds time and effort?

As an example, when I first got involved in pathology, I had to learn a lot about the field, and I read so many papers that described these incredible applications of image analysis and machine learning technology to do things like identify disease and predict patient outcomes. And yet, when I first set foot inside a pathology lab and got a tour, I didn't see any of that technology, not even on a trial basis. There were many reasons for this. One is that I was a little bit naive; I later realized, well, just because it's in a paper doesn't mean it's ready for use on patients, of course. But there's also a bigger reason, I think. No matter how accurate or how valuable the information produced by the model is, if it's not actually introduced in the right way into the overall workflow, it's not going to be put into routine use.

Labs are very busy operations. They don't necessarily have time to do a lot of exploratory things. They are tightly optimized machines, and if you're going to introduce new technology, it really has to fit within that existing status quo. So for a company, or a group of people, developing technology and hoping to actually see it put into practice, there needs to be a deep understanding of that current status quo, and a deep respect, really, for the practitioners: not just the pathologists, but anyone who's working in that environment. You're asking them to change something, and you really want to have them make the smallest change possible that's going to have the biggest impact for them. It requires that respect and understanding of how things are currently done, and that gives you the visibility to say, well, I think this is where we can put the technology in.

And it's not just a machine learning model with a graphical interface on top of it. There's more to the deployment methodology: we need a software solution that can incorporate one or more different machine learning modules and present those results in the right place, at the right time, to the right person. That's why, as a company at Proscia, we invest a lot in not just the AI aspect, or the analytic aspect, but also in what we call the platform, which is able to drive that workflow and, for the base case, just operate the digital laboratory. That is the vehicle in which we can introduce, thoughtfully we hope, the novel technology that can have some positive impact on top of that.

We've been really fortunate to have colleagues at Proscia who have spent long careers working in laboratory medicine, both as physicians and as other lab staff. And we've had just fantastic feedback over the years from partners and customers who, in the early days, took a chance on our new products, actually used them, and then provided really constructive feedback that we've been able to iterate on. So we're very thankful and fortunate to have had that opportunity.
Heather Couture: Is there any advice you could offer to other leaders of AI-powered startups?
Julianna Ianni: Yeah, I think one thing I would say is: prepare to iterate. The solution that you build is probably not going to be the final destination, the final solution, and I think the fast pace of this field demands constant innovation. Iterating is really what's going to get you a product that your users can actually use, or, if it's internally focused AI, the same for your internal users.

I'd also say to invest heavily in your team. There's really nothing that replaces having good, very skilled people working for you and building these AI products. That's been one of the keys to our success. And then, echoing a lot of what Coleman said: if you're a user-facing AI company, constantly get feedback from your users at every stage of development, and even post-development. I think there's really nothing more valuable.
Heather Couture: What about you, Coleman? Any advice to add?
Coleman Stavish: So I would echo everything Julianna said, and I would add something on the startup angle specifically. There's a lot of investment money going after AI-oriented startups, and that's been great for the field. But something we've learned ourselves is how to balance the investor pitch about AI and its potential with near- and immediate-term, smaller successes that build you a road to that more ambitious future. If you've articulated a vision for what you want your AI to do in five years, that's great; you need to have that vision. But it can't be, it really shouldn't be, in my opinion, a binary outcome of we did or didn't achieve that five-year vision. There has to be, ideally, a series of milestones, each one within reach. Each one might be ambitious, but each is credibly within your reach with your current resources at that time.

That's something startups really need to be critical about. They have to critically assess what they're capable of, push themselves, but also make sure that there are intermediate wins. Whether that's a research publication that demonstrates the technology has been de-risked in some way, whether that's early adopters who are using the technology on a trial basis, or whether that's first revenue, which is a sign of something that can be grown, that there is a market for the technology. Depending on the industry, there could be different intermediate milestones that are smaller wins. You have to celebrate each of them when you have the opportunity, and make sure that you can build that path to the five-year vision, rather than just kind of hoping it's all going to work out in five years.
Julianna Ianni: Yeah, I love that: small wins along the way.
Heather Couture: Finally, where do you see the impact of Proscia in three to five years?
Coleman Stavish: So from my view, Proscia is not developing new treatments for cancer, and we're not personally diagnosing patients and choosing their treatment plans either. So I really hope our impact will be measured instead in the following two ways.

On the research and drug development side, I hope that, as a company, we can build the software platforms that house pathology data for pharma and life sciences organizations who are developing new drugs and putting them through trials. I hope that has some positive impact in terms of the number of new therapeutics that can go through that pipeline and make it to the clinic. In the grand scheme of things, I think Proscia will play a small role in that; there are so many other things that go into drug development besides what we do at Proscia. But I hope we can play a role there, at least in being a broad platform that can house all that data, enable a faster research process, and potentially enable people to do experiments that they may not have been able to conceive of before having all that data in one place.

On the diagnostic side, we hope to see impact over a five-year period for the pathologists who are working every day reviewing patient samples: that they could be using our software to help them go through that process and have a better ergonomic experience than they currently have on a microscope; that they could have the ability to diagnose cases remotely, maybe assisting patients in far-flung areas of the world that may not have access to subspecialty pathology expertise; and also, to the extent that we can introduce helpful technology, that we can perhaps improve the quality of diagnosis. Maybe that has a marginal positive impact on each patient's outcome; maybe it means someone gets the right diagnosis a little bit faster. In aggregate, I think that could have a really big impact.

I think we have to focus on enabling the practitioners to work to the top of their license, and enabling the scientists within the drug development field to work to the top of their license as well. I see Proscia as an enabling technology, pushing the existing brilliant minds to have a bit more impact themselves. So that's how I see it.
Heather Couture: That's great to hear.
This has been great.
Your team at Proscia is doing some really interesting work for pathology.
I expect that the insights you shared
will be valuable to other AI companies.
Where can people find out
more about you online?
Coleman Stavish: So our website is proscia.com. That's P-R-O-S-C-I-A dot com. From there, we have lots of information about the company, as well as some of the scientific publications that Julianna mentioned.
Heather Couture: Great.
I'll link to that in the show notes.
All right, everyone, thanks for listening.
I'm Heather Couture, and I hope you join me again next time for Impact AI.