Proteomics in Proximity

Welcome to the Olink® Proteomics in Proximity podcast! 
 
 
Below are some useful resources from this episode: 
 
Published study of primary focus
Koprulu M, Carrasco-Zanini J, Wheeler E, Lockhart S, Kerrison ND, Wareham NJ, Pietzner M, Langenberg C. Proteogenomic links to human metabolic diseases. Nat Metab. 2023 Mar;5(3):516-528. doi: 10.1038/s42255-023-00753-7. Epub 2023 Feb 23. Erratum in: Nat Metab. 2023 Mar 19. PMID: 36823471; PMCID: PMC7614946. https://pubmed.ncbi.nlm.nih.gov/36823471/
 
Laboratory, first author, and corresponding author of the study
·         Public Health University Research Institute (PHURI), a multidisciplinary research center to drive personalized healthcare: https://www.qmul.ac.uk/phuri/about/
·         Mine Koprulu (first author), PhD student, University of Cambridge: https://www.linkedin.com/in/mine-koprulu-497659b9/  
·         Dr. Claudia Langenberg (corresponding author); Director of PHURI, Queen Mary, University of London; Professor of Computational Medicine, Berlin Institute of Health at Charité: https://www.qmul.ac.uk/phuri/our-people/professor-claudia-langenberg/ 
 
Olink tools and software
·         Olink® Explore 3072, the platform that measured proteins in this study with a next-generation sequencing (NGS) readout: https://olink.com/products-services/explore/
 
UK Biobank Pharma Proteomics Project (UKB-PPP), one of the world’s largest scientific studies of blood protein biomarkers conducted to date, https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/uk-biobank-launches-one-of-the-largest-scientific-studies 
 
Genotype-Tissue Expression (GTEx) project, a biobank and open-access database to study tissue-specific gene expression and regulation: https://www.gtexportal.org/home/
 
European Prospective Investigation into Cancer (EPIC)-Norfolk study, a prospective cohort of middle-aged individuals from Eastern England: https://www.epic-norfolk.org.uk/ 
 
Genome Aggregation Database (gnomAD), the largest publicly available collection of population variation from harmonized exome and genome sequencing data: https://gnomad.broadinstitute.org/ 
 
 
Would you like to subscribe to the podcast on your favorite player or app? You can do so here: 
Apple Podcasts: https://apple.co/3T0YbSm 
Spotify Podcasts: https://open.spotify.com/show/2sZ2wxO... 
Google Podcasts: https://podcasts.google.com/feed/aHR0... 
Amazon Music: https://music.amazon.com/podcasts/d97... 
Podcast Addict: https://podcastaddict.com/podcast/409... 
Deezer: https://www.deezer.com/show/5178787 
Player FM: https://player.fm/series/series-3396598 
 
In case you were wondering, Proteomics in Proximity refers to the principle underlying Olink technology called the Proximity Extension Assay (PEA). More information about the assay and how it works can be found here: https://bit.ly/3Rt7YiY 
 
For any questions regarding Olink Proteomics, please email us at info@olink.com or visit our website: https://www.olink.com/

WHAT IS PROTEOMICS IN PROXIMITY?
Proteomics in Proximity discusses the intersection of proteomics with genomics for drug target discovery, the application of proteomics to reveal disease biomarkers, and current trends in using proteomics to unlock biological mechanisms. Co-hosted by Olink's Cindy Lawley and Sarantis Chlamydas.


Welcome
to the Proteomics in Proximity Podcast

where your co-hosts, Cindy Lawley
and Sarantis Chlamydas, from Olink Proteomics

talk about the intersection of proteomics
with genomics for drug target

discovery, the application of proteomics
to reveal disease biomarkers,

and current trends in using proteomics
to unlock biological mechanisms.

Here we have your hosts, Cindy and Sarantis.

Hello, everyone.

Thanks so much for joining
Proteomics in Proximity.

I'm one of your co-hosts, Cindy Lawley,
and I have with me my other co-host.

Hello, everybody.

I'm Sarantis Chlamydas, happy to be here.

Very good.

So today we are joined by Mine Koprulu
and Claudia Langenberg.

We are talking today about
a wonderful paper that came out in Nature

Metabolism in March [2023] called "Proteo-
genomic links to human metabolic diseases."

Very exciting paper. By Claudia
Langenberg and Mine Koprulu's

standards,
a modest sample size but amazing findings,

so I think it's really enabling.
We'll dig into that,

but first, let's introduce our guests.

Sarantis,
do you want to do the honors?

Absolutely, we'll start with Mine.
Mine is a PhD

student on a Gates Cambridge scholarship,
supervised by Dr.

Langenberg.

Before her PhD, she received a Bachelor
of Science in Human Genetics at UCL [University College London]

and then subsequently she completed

a Master of Philosophy in Genomic
Medicine at the University of Cambridge

working in genomics data and

proteomics data and dealing with population studies

and the UK Biobank data with us.

And we are really looking forward to hearing
about your project and the paper about proteogenomics.

Yeah,

and digging into how you got
where you are.

Anything you want to add to that, Mine,
that you want people to know going in?

No, I think that's a very clear
introduction and it's lovely

to be here and I'm looking forward
to sharing a bit more about

both our journey
and our recent publication.

Awesome, fantastic.

So then I have the privilege and the

opportunity to introduce
Claudia Langenberg.

Now, Claudia is probably someone
who needs no introduction, but

I think the impact

that she's had in the publication
space is huge.

Last I saw - and I think this was in

2021 - she had over
300 peer-reviewed publications at her

very young age, and I think

over 40,000 citations at that point.

And that's already several years old.

So this is
a world-renowned

scholar that is working with Mine,

And of course, many of her publications
are in

journals
like Nature, JAMA, The Lancet,

very, very

prestigious and very impactful,
working with massive datasets.

So I think of her as a large scale
population epidemiologist,

such a big word,
but it integrates many of the omics

and I think she keeps an eye
on the technology advances to be able to

build systematic approaches
to understanding human biology.

So Claudia, I think now you're
at Queen Mary, University of London.

I think when I met you originally,
you were at the Berlin Institute

of Health and Charité.

So I'd love for you to just tell us
a little bit

about your position now
and your affiliations, if you don't mind.

Well, thank you so much
for the extremely kind introduction.

And I think the most complimentary thing
was about the young age,

which is not only a compliment
but a lie,

so this
I think, puts everything in perspective.

And yes, I should also say, for the work
that we've done,

Mine is a student
at the University of Cambridge,

where I've had a dual affiliation
and a long career.

And prior to me arriving
at the Berlin Institute of Health -

so I'm originally from Germany
and trained as a doctor in Germany -

I came to the UK and I was in London
originally where I did my Ph.D.

in epidemiology and a Master's in hygiene.

And after that I went to Cambridge

to focus more specifically on genomics
and then later other omics.

And so yeah, it's a huge privilege
to be here

and it's an even bigger privilege
to work with talented people like Mine

and others in our team.

So it's a real team effort
and a lot of fun there to do that.

So it does seem like you have fun.

And I'll tell you what, whenever
I talk to Mine and when I've seen her

present,
she's always got a big smile on her face.

So I would love to dig into your journey
to end up in Claudia's lab,

Mine.

Do you mind giving us a little background
and how you got interested in science?

I mean, feel free to start
wherever you want.

Yeah, sure.

I've always been very interested
in science, and especially biology.

But especially I always wanted to do
something with my career, which improves

life, supports and contributes
to the society in a meaningful way.

And as I've said, given
I was quite interested in biology,

human genetics
especially interested me

because a lot of the sciences
are quite established, whereas,

you know, the
human genome is very recent compared

to many other scientific fields
and there's so much unknown.

And I think that is especially
what interested me

as well as its potential to, you know,
improve the current healthcare system

and the clinical translational potential
of the human genetics findings.

So that was my inspiration
to go into that.

And unlike some other people,
I think I've been in that quite

narrow specific path ever since then.

So I did my bachelor's at UCL

as Sarantis mentioned - strictly
in human genetics.

And after that I did my master's
at the University of Cambridge

in genomic medicine, which is where I got
introduced to bioinformatics.

So I did my Master's project
at the Sanger Institute, working on UK Biobank

as well as whole genome sequencing data
for population isolates

that were studied

in that lab at the time.
And after that I moved back home,

which is Turkey for me, to work on rare
disease genetics.

So I spent two years back in Turkey
at a

university
researching rare disease genetics.

So that was more studying
whole exome sequencing data

from consanguineous marriages

or consanguineous parents having
children with disabilities, trying to better

pinpoint the exact genetic changes
that caused such severe, rare disorders.

And in the meantime, while I was back
home, I was thinking about my PhD

and what I might want to pursue,
and multi-omics integration -

so better understanding the different "omic" layers
and how that can actually improve

our understanding of genomic studies
and their clinical translation

was what interested me.

So I got in touch with Claudia

and we had a brief interview
and that was the beginning

of such a beautiful collaboration,
at least on my part.

And now, moving on a
little bit to your paper -

What were you able to understand about the data,
about this EPIC-Norfolk study?

The EPIC-Norfolk cohort. What are the characteristics
of this cohort and what makes it so special?

Sure, the EPIC-Norfolk study
is a population cohort

that was first established
in the beginning of the 1990s.

It has approximately 30,000

participants that were followed up

with phenotypic research in terms of different

behavioral and sort of health
characteristics for several years,

as well as linkage to their health records.

And we had proteomics samples

measured from 3,000
EPIC-Norfolk participants.

But Claudia, if you have anything more
to add on the EPIC-Norfolk study ...

I think I can say one of the opportunities
this study has offered

and I think the beauty
of such prospective cohorts as EPIC-Norfolk,

and UK Biobank
and so on, things that were set up

in the past is that we benefit from the samples
that were stored at baseline

in liquid nitrogen.

And those samples are so valuable

because of anything you can do in the future -
now you measure 3,000 [proteins],

then you measure 5,000, then ten
thousand and more proteins.

So any sample we use
now we won't be able to use in the future.

So we're always very cautious

of that and hence
cautious with our sample use.

So it's really important that proteomic
technologies now use such a small sample.

In the past you would have used more sample
to measure three proteins.

So to be able to [measure] thousands

in such a small amount of blood
is absolutely amazing.

And to have the opportunity - you know,

the PIs of those
cohorts had the foresight of setting this up:

Nick Wareham in Cambridge and before that
Kay-Tee Khaw, and others who let us study

these and other cohorts that we had
the privilege of using the samples from.

It's really,
really amazing to be able to do that.

That's one thing.

But the other thing
that's important is as time

elapses, these cohorts
become even more valuable because,

you know, sadly people have events,
they develop diseases, they die.

So if you then look backwards,
because you have samples stored,

it provides the opportunity to have such
an efficient design of doing it:

you don't have to measure
everyone, but you can have a kind of a

very nifty design in choosing
people who have developed a disease, i.e.

new-onset disease, where
the sample was stored before they had it.

So you avoid this
kind of reverse causation by which disease

impacts your proteome and hence
you can't really just associate.

This is exactly what we did
in this context.

Mine's study was one of the projects
that we did in the context of that design.

But you can also look at

different diseases
based on the people who developed them

and then study them using that baseline sample,
and then you use a big control cohort

of people who serve as the controls
for many different diseases.

It's called a nested case-cohort study.

And it's a really beautiful design,
but that can only be used

if you had the foresight of setting up
a large prospective cohort like that.
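The nested case-cohort logic Claudia describes can be sketched in a few lines of code. This is a hypothetical illustration only - the participant fields and selection rules are simplified assumptions, not the study's actual pipeline:

```python
# Illustrative sketch of a nested case-cohort selection.
# Field names and rules are hypothetical simplifications.
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    developed_disease: bool       # new-onset disease after baseline
    baseline_sample_stored: bool  # e.g., plasma stored in liquid nitrogen


def nested_case_cohort(cohort: list[Participant], subcohort_ids: set[str]):
    """Cases: participants whose disease onset came AFTER their baseline
    sample was stored (avoiding reverse causation, since the sample
    predates disease). Controls: a random subcohort that can be reused
    as the comparison group across many different diseases."""
    cases = [p for p in cohort
             if p.developed_disease and p.baseline_sample_stored]
    controls = [p for p in cohort
                if p.pid in subcohort_ids and p.baseline_sample_stored]
    return cases, controls
```

Reusing one random subcohort as the control group for many diseases is what makes the design so sample-efficient, which matters when baseline samples are finite.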

So we were very grateful
to all the participants.

of EPIC-Norfolk and the PIs, of course,
who enabled the use of that,

and also, coming to the cost
of some of these kind of very,

you know,

informative molecular technologies
such as yours:

Of course they're not cheap

and hence that means
if you can use a design that minimizes

the number of samples
you have to use for a given purpose,

of course
that's incredibly useful for us. So

and I think I just

wanted to comment on this,
this liquid nitrogen aspect, this ability,

I think this foresight -
I work with a lot of groups

that do a lot of different things
with large population health studies.

There are very few cohorts
that had that much

attention
to reducing pre-analytical variation

and tracking it
as is documented in this cohort.

So that's exciting to see.

I think it's maybe less important
for proteomics

than metabolomics as an example,
but you work in all of that.

And so yeah,
I just wanted to underscore that point.

Yeah, sorry, Sarantis, please go ahead.

Okay.

I would like Mine
to summarize the take-home messages

of your paper

and what's so exciting
about it.

Yeah.

First, just to also
clarify a bit [about] the paper,

I was the first author,

but of course it was a team effort
and everyone contributed a lot

making this work possible,
including the EPIC-Norfolk participants

of course.

Just to summarize the paper,

I think I could just briefly say
we looked at, sort of, systematically

linking genetic variants, blood protein
levels as well as disease risk data,

to be able to pinpoint causal genes
and proteins that underlie diseases.

So just to give a bit of background: since
the enablement

of genotyping technology,
a lot of studies have been conducted

associating genetic variants
with different disease risks or

disease susceptibility.

And today, 200,000 genetic variant-disease

associations have been established
and are publicly available.

Looking at such numbers and figures,

we would assume we know
the biological basis for all diseases.

However, we all know that's not at all
the case, and I think that's

where proteomics and different
sorts of omic layers come into play:

in helping us better understand
the disease mechanisms of

actually what's happening
underlying the diseases in our body.

So in this study
we had samples from 3,000 individuals,

measuring approximately
3,000 blood protein levels.

So we first looked at the genetic
regulation of the cis regions.

So the cis regions are the protein-
encoding regions

and sort of flanking regions around
the gene for the protein target itself,

because of our moderate sample
size, as we've already mentioned.
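As a rough illustration of the cis definition Mine describes - the gene body for the protein target plus a flanking window - here is a minimal sketch. The ±500 kb flank is an assumed value for illustration, not necessarily the window used in the paper:

```python
# Hedged sketch of a cis-window test. The 500 kb flank is an assumption
# chosen for illustration; actual pQTL studies vary in window size.
CIS_FLANK_BP = 500_000


def is_cis_variant(variant_pos: int, variant_chrom: str,
                   gene_start: int, gene_end: int, gene_chrom: str) -> bool:
    """A pQTL variant is 'cis' if it lies on the same chromosome within
    the protein-encoding gene body or its flanking window; anything
    else would be considered 'trans'."""
    if variant_chrom != gene_chrom:
        return False
    return gene_start - CIS_FLANK_BP <= variant_pos <= gene_end + CIS_FLANK_BP
```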

So we looked at the cis genetic regulation
of different blood

protein levels,
some of which had never been targeted

before, given the recent
Olink platform [Olink Explore 3072]

and then we have used that knowledge,
that sort of proteogenomic knowledge,

to better understand

causal genes or proteins that
underlie diseases in a systematic manner.

So we have first
looked at shared genetic regulation

for different disease outcomes and blood
protein level regulation,

pQTLs as we call them -
protein quantitative trait loci.

And we identified 224 targets

that regulate

approximately 500 different traits.

And we also refined the causal genes
or proteins

for 40% of the previously established
genetic risk loci,

which sort of highlighted
that even moderately sized

proteogenomic studies can contribute
to our uncovering of novel biology

for established risk loci that were
published in the literature.

And finally, we looked at the convergence

of the pQTL
studies with

the rare variant gene burden analyses -

so, on one hand, comparing the loss
of function of the genes with that sort

of genetic regulation of the proteins.

And so I wanted to circle back on this:
the disease associations

that were enabled by genetics. I think

the ability to make those associations

in small sample sizes was like
low hanging fruit in the early days

of that sort of GWAS era,
which I think of maybe early 2000s, right?

2005, 2006

that's when those things started
really ramping up discoveries

and then they became harder and harder
to make those associations

as those really strong,

strong associations
were discovered and documented

and then larger and larger populations
were needed to make those associations.

I think what you're saying

is that these other layers are helping you

to do more
with those more modest populations.

And Claudia already made the point
about the costs of layering

on metabolomics or proteomics
or these additional omics.

Can you say something about what

what that enables and what you think
will happen in the future there?

Yeah, of course.

So basically, as I mentioned
and as you've mentioned already, thousands

of genetic variant-
disease associations are being made

and those were very valuable
and contributed to our understanding

of diseases to some extent.

However, the majority of those associations
fell into noncoding regions of the genome,

meaning it was quite difficult
to actually interpret which causal gene

or protein was acting in
causing the disease.

Hence, it was quite difficult
to understand the pathways

and the mechanisms.

And, well, if
we don't have a functional target,

it is difficult to actually build
more effective and safer

therapies or repurpose existing ones.

So in that sense, having the additional layer
of the proteomics data actually

helps us to pinpoint a functional entity

that plays a role in the disease.

And so,

as I said, there are

certain statistical methods
that help us better understand

the shared genetic regulation
of both the disease risk

and the blood protein levels
or abundance of certain proteins.

And when we see strong statistical evidence
for a shared genetic

regulation within a large region
of many candidate genes,

we can actually refine
that to a single candidate gene or protein

for particular diseases
in a systematic way

which can then be used,
as I mentioned, for intervention

or more targeted,
safer therapies.

So in that sense, even moderately sized
pQTL studies

can really help us better understand
the disease and pathways that are involved

and also build more effective therapies.

I think we're on a path.

I think this is a path for discovery
that is, I think just going to ramp up.

I think very similarly to what we've seen,
those discoveries enabled by genetics.

I'm really excited about that.

I always think about

these different diseases
that might share pathways in common,

almost like a Venn diagram
and understanding the complexities of

of why certain proteins or protein

pathways are related to, like,
I don't know, caspase,

which I think is showing up
in breast cancer and asthma, as

an example. I think
those diseases are so disparate to me.

I wonder if you
think that proteins or pathways

that are showing up as causal
and I think you make a good point about

why causality is so important
for therapies.

Do you expect those same proteins
to be causal in other diseases?

Like what do
you think about that?

You're much more versed
in seeing the data.

So I think that is sort of the beauty
of our study design

that we sort of approach
with different layers

of biological data
starting from the genome

all the way to the phenome
in a hypothesis-free manner.

So doing that and looking at the data
systematically without any sort of prior

hypothesis
allows us to see what the data tells us.

So as you mentioned, certainly in looking
at what we have discovered in our paper,

genetically anchored
proteogenomic studies - so the proteomics -

can help us discover
molecular hubs,

or associations of proteins
with diseases that we wouldn't

have predicted otherwise from just
the general literature or prior knowledge.

And that is actually quite interesting
to follow up

because, yeah, the data uncovers
something that was unexpected.

But likewise we can also see
quite specific protein disease pairs,

which can also allow us
to have quite specific therapies

for diseases that potentially were
not intervened on before.

But just one more thing to add
is that we

currently do not have a very complete
picture

of the proteome, so we're only able
to do that for the proteins

we are able to target.

So for future directions
you asked about,

having a more complete idea
about the full landscape of the proteome

would be ideal to better understand
those molecular hubs

that we have just talked about.

But so maybe
I can add one more thing to this,

because as Mine's work and
other proteogenomic work

that we previously did
has shown, exactly the beauty is that, as

you scale up the number of proteins
and as you scale up

the number of diseases,
you can look at this kind of proteo-

genomic approach that doesn't require
that each of these layers is measured

in the same sample size.

You can utilize the power that
you have across the biggest genomic studies

for diseases that you have, in combination with
a proteogenomic study.

And that is the beauty of looking exactly
like you say, Cindy,

at the overlap of a specific gene to protein
across any disease that you can look at

because the community is sharing
data openly or the summary statistics

for each of these diseases is shared
openly for most diseases. Cancer is

lagging behind sadly,
a little bit in terms of the openness.

But for many diseases that is there,
it enables us to do exactly

that: to draw a whole map
from gene to protein to disease

and is that link coincidental

or is it possibly causal,
with a shared genetic signal?

And that's such a good way
of prioritizing what you then use for

experimental work downstream

because of course
this is a computational approach

and not a proof,
but it's such a good and data-driven way

of prioritizing

links between
diseases, links for where we have already,

you know, specific drug targets, new ones,
potentially adverse effects.

So it's a really versatile way
of looking at all of that.

And then just maybe to add one thing
again, as you said, is

you can kind of ramp this up
as you increase the number of proteins

or as you increase the density

of your genomic array
or the coverage of rare variants

by sequencing,

and so on. Of course
an important part of really making

this more useful
is increasing the phenotypic spectrum.

And that really is only possible
if we move away from the diseases we all

study opportunistically to diseases
that you can't really easily measure

or nobody's interested in, but which are still
really important for patients. A lot

less headway has been made in terms
of understanding their genetics and

releasing the summary statistics
for those studies, and that's really where

huge studies come in that have electronic
health record data from GPs [general practitioners],

from hospitals, from death certificates,
and bring all of this together.

And that's why, for example, UK Biobank
but also FinnGen

and many other endeavors around the world
can really help us

to not just increase the molecular side
but the phenotypic side.

And that is so important to also
link diseases which have not really been

so much in the center of attention
but need to be.

Can you give an example of one of those?

Are we thinking rare disease here?

So I think it can be

across the frequency spectrum,
it can be across specialties.

I think the disease
examples that I would choose are

ones that possibly are not as easily

diagnosed, or are not severe enough
to always require

hospitalization,
because I think most people around

the world have tried to get
their hospitalization data -

ICD-coded, that is relatively easy -
but the data, for example, in the UK

that comes from England, from primary care
records, is harder to map;

there's kind of a bit more diversity in terms
of systems, the data structures and so on.

So diseases that are predominantly managed
and diagnosed in

primary care are where
we need to move towards.

Oh, that's so interesting.

Yeah. I have a bit of a

technical question here.

How do you handle the selection

of false positives
and how do you define

these in your analysis
compared to,

for example, other technologies
that you applied before?

You're

asking about false positives? Yes.

So in our analysis, we try to be careful

given we're working
with data and diseases, as you mentioned.

So in terms of our analysis,
we always use quite a rigorous

threshold to report
what is statistically significant.

So in terms of our statistical threshold,
we always correct the genome-wide

significance threshold for the number of proteins
that we are including in the study,

to sort of minimize the error when including
thousands of targets, as in this case.

So we usually go for quite
a rigorous statistical threshold

as well as good
QC before we feed in the data,

of course.
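The correction Mine describes is essentially a Bonferroni-style adjustment: the genome-wide significance threshold is further divided by the number of protein targets tested. A minimal sketch, assuming the conventional 5×10⁻⁸ genome-wide threshold (the study's exact procedure may differ):

```python
# Sketch of a Bonferroni-style multiple-testing correction across protein
# targets. The 5e-8 genome-wide threshold is the conventional GWAS value,
# assumed here for illustration.
GENOME_WIDE_ALPHA = 5e-8


def study_wide_threshold(n_proteins: int) -> float:
    """Divide the genome-wide threshold by the number of proteins tested."""
    return GENOME_WIDE_ALPHA / n_proteins


def is_significant(p_value: float, n_proteins: int) -> bool:
    """Report a pQTL only if it passes the study-wide threshold."""
    return p_value < study_wide_threshold(n_proteins)


# With ~3,000 proteins measured, the study-wide threshold is ~1.7e-11.
threshold = study_wide_threshold(3000)
```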

So that's great.

So some of these pQTLs

may fall in non-coding regions.

Do you have any idea about the type of these regions?
Maybe there are promoters there -

I don't know the answer.
Do you have an idea, and can you map these regions?

So we mapped

all the variants in terms of what
they're predicted to be in the genome.

So we look at the proportion of protein
altering variants, those variants

that fall into the protein
coding region itself and alter

the shape or the structure of
the protein.

And the percentage
that fall into non-coding regions.

However, there's quite limited knowledge
about the functional characterization

of the non-coding variants themselves,
beyond what's acquired from particular groups studying

particular genes in more sort of cellular
or functional models.

So in terms of our work so far
we have only computationally annotated

the predicted consequence of the variants.

However, of course, further follow-up
analyses are needed on what exactly those

non-coding variants do in terms
of cis regulation of the protein itself.

That's great.

So I think from a lot of studies
in the literature,

as you mentioned, for different
proteins from different

mouse knockouts,

you can have an idea
about how they are regulated.

Is there any plan to follow up, like more
mouse knockouts

in the future to follow
some of these targets or collaborations

in this respect?

If you can share this?

I mean, we of course always welcome

any sort of collaborations
on the functional characterization.

The idea of doing these studies
is not to put findings in a long Excel sheet,

as we have done so far

in terms of our supplementary table,
but rather genuinely understand

the biological mechanisms
that underlie diseases

and how that can contribute
to clinical translation

of some of the findings
in the sort of longer run.

So, with that, an open call
to anyone who finds any of

the targets
that we highlight interesting.

We are of course

very keen to functionally characterize
and better understand the mechanisms.

I think
in the paper you mentioned something about -

I was just looking for the reference,
something about cellular models

and the history of
demonstrating functional associations

or understanding
a function around cellular models

and how this approach

can be a complementary way
to add an understanding of function.

Do you mind talking a little bit
about that,

like where this fits
into our traditional approaches?

Like where it might fit in
in understanding

what we're learning from single cell
or spatial

work that some of those technologies
are really advancing right now?

I'm curious your thoughts on those.

Of course.

So what we do in terms of our team
is more bioinformatics.

So what we work with is the data and

sort of predictions
and better understanding the computational

and statistical conclusions
we can draw from the data.

However, as we all know
and as you've highlighted,

there are complementary ways
in better understanding these.

And basically we are very interested
to see whether

what we observe computationally
and statistically actually

translates into biology,
starting from cellular models,

building up to animal models
and sort of human biology.

So what we're observing is almost

quite isolated,

looking at singular
sort of targets and singular

genetic variants. It will be quite interesting
to see whether our conclusions hold, or what

sort of variations from
our conclusions we see in the different models.

So I definitely agree
that they're complementary

and there are ways of integrating that
knowledge to build a more comprehensive

or sort of holistic understanding
of the biological mechanisms.

Oh, I love it.

I love the background
you have both in an understanding

of epidemiology
and your mad bioinformatics skills.

Where does machine learning
fit into all of this?

Is it something you've -

yeah, I'll just stop with the question.

No, of course, machine
learning and artificial intelligence

are certainly fields
that are very quickly developing

and are grabbing a lot of attention
at the moment. So what

we use are considered basic machine
learning models themselves.

But in terms of employing machine
learning or artificial intelligence for

sort of proteogenomics or these
more biological studies, at the moment

it is rather challenging
because, as we've all spoken about, this large

high-throughput biological data
has only recently been generated,

and it is so recent
that we're spending a lot of time

better understanding what it means
and what it actually is.

And frankly, the biological data
the way they are actually violates

a lot of assumptions that machine
learning and artificial intelligence

models make.

So I think in the future,
they can certainly be very useful,

but we certainly need to be cautious
and better understand

them
before feeding them into the models,

I would personally say.
And maybe I can add something to this.

So I'm currently sitting in
what's called the DERI.

So when I came to QMUL, or Queen
Mary University of London last September,

I had this opportunity to set up this
new institute for precision healthcare.

And it's kind of quite unique
because it's a cross functional institute.

So it really draws in a very broad
range of expertise and so [there are] big plans

and developments for a new life sciences
building, and that's all quite a way away.

So we have to wait for this.

So we need an interim space, and we're very
lucky - and it's actually not a coincidence -

that where
I'm sitting right now is called the DERI.

It's the only other cross-functional
institute that QMUL has, and DERI stands

for the Digital Environment Research
Institute, which is focused on AI.

And so it's not just in healthcare, it's
AI across a broad range of applications.

You know, from games: we're sitting on
the second floor with the games people,

which means my kids think I have the best job
in the world.

Actually, what they don't know

when they have conversations
about board games is that

I have not heard of a single one of them,
so I can't really add anything.

But anyway, having said that, this is the
environment in which it can flourish.

So we focus on health
and on healthcare data,

and DERI has health
as part of its remit as well.

So it's really important
to bring this together.

And, you know,
moving away from

this pitch on
why it's important, the concrete thing

that we're already doing concerns
the molecular data, which are so complex

that you need efficient
and unbiased ways of data reduction.

That, at the moment, is the main use
for machine learning in our

arena of work.

So where we try to, let's say,

predict diseases,
how are you going to prioritize?

Nobody wants to go into the clinic
and measure 3,000 proteins

to predict a disease.

And what is the incremental value
of measuring more proteins

once you have a core set of the ten
most important ones?

So those are all questions
that machine learning can help us to

to address more systematically,
and that's incredibly useful.

So it's a steep learning curve for us.

It's not our bread and butter.

We're not the methodological experts
who develop these methods,

but being in an institute
like this one, for example, means

that we're only a very short way removed

from the people who do develop them,
and that at the right time

we can employ them, can test whether they are
useful, whether they could be biased.

And so that's really,
yeah, a great opportunity, I think.

Well,
I think the people who are developing

those methods aren't going to have
the biological know-how to know

when they've overtrained those methods
and are going off on a spurious tangent,

so the context, I think, is essential.

I think Mine made that point.

Yeah, exactly.

Exactly.
It's a team effort, right?

You need the people who have
the samples and the clinical knowledge.

You need the people who have
the bioinformatics and

computational expertise, and you need the,
you know, method developers.

So it's beautiful.

Nobody can do it all.

So, you know, I think it's great.

It's CQ, right?

It's collaboration quotient, right?

You have your IQ, your EQ,
and your CQ. From my experience,

once you
develop your collaboration quotient,

which you're teaching your students
and your postdocs and all of that

if they don't already have it,

Claudia, once you have it,
you can't go back.

Once you experience
the joy and the ability to work together

cross-functionally, it's really
remarkable what can be accomplished.

Well, coming back to the question that

Sarantis was asking,
and I think it's a very good one,

and Mine has said so:

the problem that we have encountered
is, you know, that Mine

has come up
with these beautiful candidates.

And there are two things
that are important about what

she said

so well. One is that using humans
as the model organism

is a great way of prioritization.
That's number one.

But the second one is that, you know,
if you come from a Bayesian framework,

you really want to find out where
your greatest chance of success lies.

How do you increase your prior
probability of success?

And given that the experimental procedure
is expensive, lengthy,

and can go wrong in so many ways,

it is really a great opportunity

to prioritize
on the basis of human proteomics,

as Mine has said.
The problem is: how do you get people

who have the experimental setup
to come to our results and take them up?

Because that's not as easy as we thought.

We think, "Oh, here!"
We're so excited to see it.

We go to the world expert in this
and bring them our example, and

it's hard.

It's hard to motivate people
to step out of what they're already doing

and focus on your finding.

It's hard if you have somebody
who's focused their whole life

on a given pet protein to say,
"How about this one?"

So we need help,
I think, to try and learn

how we engage the relevant
clinicians and molecular biologists

and other people to also use our
results, or at least,

you know,
complement what they're already doing,

in order to be able to follow some of it
up, because we do depend

on the functional validation
and the expertise of those people.

So how do we reach out?
How can you help us

to reach out to the relevant people
to really make it worthwhile?

What are the titles of those people?

I mean, what are their jobs?

You know, like,
I know there's more than one, but

I think of translational,

certainly translational scientists
or implementation scientists,

or maybe those are pieces of it.

But, you know, who are the ones
that need to

listen to this?

So I think it depends on the stage
of translation

that you're at. For really early
experimental work,

it's a very different set of people

than it would be for work

where you are ready to maybe,
possibly, already consider an initial trial;

then you would need a relevant clinician
or the head of that clinical department

to enable that kind of setup within,
let's say, a hospital.

So I think it really depends
on what level you're talking.

The initial level, I think, is
the experimental follow-up

and validation, because what we do is,
yeah, statistics and probabilities, with error,

but it's still an association,
and we are in no doubt that we do not prove causality.

And so that kind of functional
follow-up is crucial.

And those are the people at the first stage.

I think actually you don't know
the cause of some of these diseases;

you don't know
the cause or the effect, right?

You don't know whether what you see
at the protein level is a cause

or an effect of the disease.
It could be both. One last comment

from me: I really like when
you see the co-localization with eQTLs
from different tissues

Does what we see in the plasma proteome

reflect what happens in the tissues?

And do you have any comment

on how you see these correlate,
and what the contribution

of a single tissue is
to the plasma proteome?

Do you have some examples,
some ideas on that?

And a shout-out to gnomAD and GTEx,

I will say. Fantastic resources. Yes.

No, we are very grateful
for all the publicly available resources,

ranging from the eQTL studies
all the way to the GWAS summary statistics,

which have made our work possible as well.
In terms of the eQTL overlap,

that is rather challenging
in our field, in the sense that

we see, for example, in our study,

that only approximately 4% of the pQTLs
that we see

are in close linkage disequilibrium

with an eQTL.

And in certain examples,
we see beautiful overlap, right?

Like, we find an example
with a disease known to

act in a particular tissue,
and we see co-localization,

quite strong co-localization,
with the tissue of interest,

where the story becomes quite beautiful
and everything makes sense.

However, there is also the flip side of the coin,

where we see in some examples
quite limited overlap

between the pQTL and eQTL data.

And although our current paper
hasn't really focused on that,

I think that is something that we
as a community need to better understand

where there is overlap
and where there is a lack of it,

and what the underlying
reasons might be.

I think this is a great point.

But I want to mention the really interesting
link to type 2 diabetes

that you found here, which we want
someone to follow up on, as Claudia mentioned.

Do you want to just summarize
that pathway really quickly?

Of course,
one of the examples that we have noticed,

or have highlighted in our paper,
is the gastrin-releasing

peptide, or GRP for short,
and its link with type 2 diabetes.

So what we saw in our paper is that,

beautifully, different layers
of biological data,

so evidence from mouse studies,
evidence from human data, as we have been talking

about, and different sorts of animal models,
all overlapped with the same conclusion.

So what we identified
was that higher levels of GRP in human

plasma were co-localizing

with lower risk of type 2 diabetes.

And when we integrated

the different sorts of

intermediate layers,
which are body fat and fat

distribution traits,
so where we actually accumulate

fat in our body,
as well as the overall fat,

what we observed
was that higher levels of GRP

led to less fat
accumulation overall

in our body, leading to lower
risk of type 2 diabetes.

And as I mentioned,
there were previous studies

published
in which human recombinant GRP

actually led to reduced
food intake and weight loss.

So we have highlighted in our paper
that GRP could be a new candidate,

or a potential therapy, for type
2 diabetes, by decreasing or

lowering the fat accumulation
in our body.

It's actually a hot topic,
weight loss.

A lot of companies now
are trying to follow this

kind of weight-loss strategy.

I think that weight-loss
drugs are really interesting,

and I am looking forward
to people following up

on this.

And I love the use of

the word "beautiful," right?

But it's beautiful
because these layers all agree.

It's giving us a preponderance of evidence
that gives us a lot of confidence,

confidence
in the other results that you see, too.

Right.

So the fact that you've built
this systematic map

of these potential causalities.

Yeah.

This corroborates the approach, I think,
which is

certainly beautiful.

At this point,

I would like to thank Mine and Claudia.

We could just stay here talking
like this for hours or days about this

amazing paper and amazing data.

I mean, we learned so much today,

and I would like to invite you, Mine and Claudia,
to add anything for our audience

from your perspective, and to share where
you see this going in the future.

That would be great.
I'd love to hear your thoughts.

Yeah, anything you'd like to add?

We'd love to hear it.

No, I think, as I mentioned,
the increasingly large-scale,

high-throughput generation
of additional layers of biological

data is frankly very exciting,
and I'm very excited to be in a field

where we are better understanding
their translational potential.

And with that,

I would also like to thank

my colleagues,
who have made all the work that we do

possible, as well as the participants
of the EPIC-Norfolk study

and the past and present team members
who have made this study possible.

Anything more from you, Claudia?

Yeah, I think,
as I said already, looking forward,

it would be very valuable
to increase the kind of phenotypic

space and the diseases we can look at.

Also moving forward,
I certainly think the beauty

of these large-scale,
population-based studies is one thing.

That's certainly one of the reasons
why I love sitting here in East London,

just a short distance away
from one of the largest hospitals:

because the next step really is how
we enable clinical proteomics, i.e.

proteomics in patient studies;
how do we design that well,

and in a way that enables flexibility
across different research questions?

So that's kind of what I want to focus on
and which I think is very exciting.

And for that, we would need technologies
that are ready

to move to that
standard relatively quickly.

So I think that's an exciting new area,
and doing it in a way

that is affordable even within the context
of a national health system.

So it's exciting.
It's really an amazing time,

working together with talented

people like Mine
and other people in our team.

As one of my mentors used to say:

if it's not fun, it's not epidemiology.

I love that.

Well, you

certainly make it look fun,
that's for sure.

Now, thank you so much for the opportunity
to talk today.

It was a pleasure to have you guys.

All right.

Thanks, everyone.

Thank you.

Thank you for listening to the Proteomics
in Proximity

podcast brought to you by Olink
Proteomics. To contact the hosts

or for further information,
simply email info@olink.com