Proteomics in Proximity

The UK Biobank, which began in 2006, has enrolled half a million individuals and is "investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease". Proteomics can greatly increase what can be discovered using this valuable resource. #ASHG22

Show Notes

Show Notes for PiP Ep 02 “Empower Genomics with Proteomics”

For more information about the UK Biobank, an Olink to Science blog post called “Genetic Regulation of the Human Plasma Proteome in the UK Biobank” is available here. The preprint publication itself is available here on bioRxiv

If you’d like to see a great 15 minute presentation on what the goals are for the UK Biobank Pharma Proteomics Project, Dr. Chris Whelan (Biogen) presented this YouTube video at one of the UK Biobank’s scientific meetings that is worth watching.

The paper by Folkersen et al. discussed in this episode, “Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals”, was published in Nature Metabolism in late 2020 and is available here.

If you would like to contact Dale, Cindy or Sarantis feel free to email us at info@olink.com and if you would like to learn more about our backgrounds, Cindy’s LinkedIn is here, Sarantis’ is here, and Dale’s is here.

In case you were wondering, Proteomics in Proximity refers to the principle underlying Olink Proteomics assay technology called the Proximity Extension Assay (PEA), and more information about the assay and how it works can be found here.

What is Proteomics in Proximity?

Proteomics in Proximity discusses the intersection of proteomics with genomics for drug target discovery, the application of proteomics to reveal disease biomarkers, and current trends in using proteomics to unlock biological mechanisms. Co-hosted by Olink's Dale Yuzuki, Cindy Lawley and Sarantis Chlamydas.

Welcome to the Proteomics in Proximity podcast,
where your co-hosts Dale Yuzuki,

Cindy Lawley and Sarantis Chlamydas
from Olink Proteomics

talk about the intersection of proteomics
with genomics for drug target discovery,

the application of proteomics
to reveal disease biomarkers and current

trends in using proteomics
to unlock biological mechanisms.

Here we have your hosts, Dale,
Cindy and Sarantis.

Thank you for joining us on the Proteomics
in Proximity Podcast.

I'm your host Dale Yuzuki with my co-hosts
Cindy Lawley

and my other co-host, Sarantis Chlamydas.

Great.

This morning we are talking about Empower
Genomics with proteomics.

And Cindy,
I'd like to ask you the question.

If I'm involved in genomics, why should I add proteomics?

Well, I'm so happy that you asked me about that, Dale.

So, as you know, I tell this story
quite a bit, and so I'm delighted

to do it in the context with both of you,
because you add so much to this.

But you know, we've got a big project
going with the UK Biobank,

the UK

Biobank,
of course, being one of the largest

nationally associated population biobanks

in the world with clinical, genetic

and now, on a subset of over
54,000 samples,

proteomic data.

It's an exciting time,
I think, to demonstrate

in large populations the value of layering
proteomics onto genomics.

And of course, across
many cohorts, we've invested

a lot in genomics
as the costs have gone down.

Now to back up a little bit, the UK
Biobank tell me a little bit about it.

Sure.

So, yeah, as the name implies,
it's based in the UK,

it's affiliated with,
you know, the UK has,

you know, one of the largest single payer
health care systems in the world.

And having a population based biobank

primarily
of northern European descent or ancestry,

but certainly representing Asian
as well as African

and African diaspora descent, as well as
Pakistani descent.

There's quite a nice subset of diversity

in the Biobank,
but primarily it's northern European.

You know, it was
started, what, I think 20 years ago.

I actually should know
that off the top of my head.

But it was started with the promise
of being able to characterize the value

of longitudinal information
to health care.

And I think Eric
Topol says that it takes about 20 years

on average to move
something from discovery to the clinic.

He uses the example of the stethoscope

and moving the stethoscope into the clinic
took 17 years.

That seems like a pretty simple mechanism,
right?

Listening to your heart.

But yet it took a long time for it
to be demonstrated and approved

and to get into the clinic
and routine use.

So by longitudinal, then you mean
I think they recruit what, half a million

individuals and longitudinal means
what they follow them over time.

It means that
they're able to call them back.

So they're able

they have medical access to their,
you know, clinical data over time.

They understand
over time what are, you know, what are

the outcomes within this population.

And I think that's incredibly valuable
that consenting has changed over time.

So the ability to actually call back
wasn't

initially
in many of these biobanks. Right.

And so, I think of FinnGen as one of those, another biobank

that's based in Finland,
a population health biobank as well.

They really did sort of lead the way
with some of the ability to share data

and protect it at the same time,
you know, primarily focused on genetics

and I would say UK Biobank as well
has led the way in these abilities

to work with both private
and public partnerships.

So being able to work with pharma
and in this case

with the proteomic data,
this initial set of proteomic data

that was initially started with ten pharma
partners.

The model was interestingly

based upon
the exome sequencing consortium.

So a group of pharma partners
came together in order

to get the participants within the UK
Biobank exome sequenced.

There's of course also I think it's
150,000 individuals in the UK Biobank

that are also whole genome sequenced,
which is pretty phenomenal, right?

That's a huge number.

So a. Massive management.

And yeah, of all of those dollars and time

and the potential for building
analysis tools with such large datasets.

Right. That,
you know, goes without saying.

I remember I think it was 2018,
I was at ASHG,

which is the American Society
of Human Genetics,

and I was blown away by all of the talks
that were referencing the UK

Biobank data,
building tools, having discoveries.

Right.

I'm excited to see that same

evolution

in the discussion

with crowdsourcing
these data with proteomics.

So to get back to your original question,
Dale, what's the value?

I think Karsten Suhre would say that
when you have genetic data

and you have disease,
that there's a certain power

you have to detect the relationship
between the two.

And in some cases,
we have smoking guns like BRCA. Right.

So we're able to see that there's
a lot of penetrance for a variant

that shows up and has a lot of influence
on predisposition to disease.

But for most diseases, it's death
by a thousand cuts, right?

Small amounts of influence.

Cindy, this UK
Biobank sounds so interesting.

What can you tell me more about it?

Yeah. So.

So the UK Biobank itself is a longitudinal collection of data

I think it started in the mid-2000s.

So around 2006 they targeted an age group,
sort of middle age group

and they followed them over time

and so it's over
half a million individuals within the UK.

Yeah, quite an undertaking to enroll
all those participants.

And what I will never forget
is when I attended ASHG

the American Society of Human Genetics
in 2018

the number of talks,
I remember searching on UKBB

as just a short
acronym, and the number of talks

talking about using the UKB
data, particularly genetic data

for validating clinical findings,

there were a lot of talks.

So it's always been high on Illumina's

radar and it's very high
on all of those sequencing

technology innovators' radars

like Thermo Fisher, of course,
and all of those library

prep companies that have different
methods of library prep.

Have they already sequenced everybody?

So they've got a whole genome sequence
on, I think it's about 150,000 individuals

that was primarily
led by deCODE genetics.

As I understand it,

the publication is pretty recent actually.

So it's a

there's a lot to dig into.

It's a pretty phenomenal dataset.

The bulk of the sequencing for most
samples, I believe is exome sequencing.

That's my understanding.

So I think it's over
450,000 individuals.

So I think for some they've got both.

And since it's a single payer system

with the electronic records of NHS,

that means they can drill down
into exactly right

their whole exome sequence and whatever condition they may have

And this is an ongoing thing.

Is that right? Cancer, diabetes?

That's right.

And the ability to actually return results
to those patients,

I think has evolved over time. Right.

Because that costs money

to set expectations,
make sure that we're communicating,

you know, in a way that's best practices.

So I think that the UK
Biobank has spearheaded

a lot of our understanding about

best practices there as well.

And as far as.... Go ahead.

Oh, I was just going to say so
back to your original question

about what's the value of layering
proteomics onto genomics.

This is a great example where an enormous
amount of investment has gone

into collecting genetic information
for this very valuable population,

with advances in diagnostics,
in guiding cancer treatment.

So a lot of these advances
have made it to the clinic already,

which is pretty phenomenal.

And that's been driven,
you know, globally, it's exciting

where proteomics fits in
or the way that I think of it.

I had a conversation with Karsten Suhre,
who's at Weill Cornell in Qatar

and in New York, and
I had an aha moment with him.

He essentially would say that
an intermediate phenotype like proteomics

acts to magnify the effect
between genetics and disease.

So, of course,

we've been looking for these associations
between genetics and disease

since
we've been collecting genetic information.

And some of those

links are hard to see because

we need so many samples
to be able to see them.

And so as we've increased the numbers
of samples like in the UK Biobank,

we're able to make these associations
more clearly.

And I'll say, you know, we wished early on
we hoped for smoking

guns for a lot of diseases
and we did see a few of them.

Right.

So there's certainly PCSK9
for familial hypercholesterolemia.

There are some standard ones, BRCA for breast cancer.

There are some examples where
we have a lot of penetrance or a lot of,

you know, a lot of effect on someone's
likelihood of getting a disease

from single variants
or single loci or single genes.

But for most diseases like Type 2 diabetes, cardiovascular disease

this is a death by a thousand cuts, meaning lots of variants

give a little tiny effect
in changing our risk.

And so that's
where having the ability

to amplify
or to put a magnifying glass on those

relationships between genetics
and disease is incredibly useful.

So I think, you know, a

ton of work has been done in proteomics
and cardiovascular disease.

And I think many advances have happened there.

Now getting back to the UK
Biobank.

You mentioned before that recently
they were working with Olink

then to look at the proteome of tens
of thousands of individuals.

Yeah. Yeah.

I would say it's not just the UK Biobank,
but 13 pharma partners.

Right.

So it certainly required consent
and partnership with the UK Biobank,

just like the exome sequencing

consortium was done in collaboration
with the UK Biobank.

But the access to the technology,
just like with the exome

sequencing consortium, access
to the technology was spearheaded

by pharma partners that were very keen

to build a structure for a more,
I like to say, a systematic approach

to therapeutic target discovery,
not only biomarker discovery,

which is sort of traditional proteomics,
but to therapeutic target discovery,

which I think is enabled by genetics,
proteomics as well as clinical data.

So we're getting back to this idea
of empowering genomics

with proteomics, right?

What can you tell me about that?

Yeah.

So I think there's this,

you know, I immediately think of that
Karsten Suhre magnifying glass, right.

But the UK Biobank’s

initial findings
which are in a preprint that came out

in June, middle of June, that’s on bioRxiv.

Their initial paper
really just was scratching

the surface of what's possible
with this enormous dataset.

So their first paper
was about 1500 proteins.

So that's
Olink's first product

on the Explore platform, which has the NGS readout.

So they used that first tranche of proteins
across 54,000 samples,

and really the bulk of

those data look at correlations
between gene regions,

you know,
the genotypes in those gene regions,

and protein levels.

So really just looking
what are the correlations?

What's the list of all the possible
relationships between genetic regions

and protein levels
that might be elucidated

and examined
further in this beautiful dataset?

You know, I'll speculate on what
I think they're going to be doing next,

and my guess is that
they're very deep into doing this within

these companies, which is to
then do Mendelian randomization,

which is a statistical approach to kind of

determine which of these relationships,
which of these correlations

between gene regions
and protein levels,

you know, when you bring
in the clinical data on disease,

which of these hold up as being unlikely
to be happening by chance alone?

So now you sort of have the ones that
are likely just, you know, coincidence.

I mean, they might still be important,
but I think pharma

would like to have
ten great targets, over 400

unsure

targets, because
that's a lot of rabbit holes to go down.

So if they can narrow it down, then

I think there's a lot of excitement
around being able

to have some quick wins with proteomics,
genomics and clinical data.

Well, to back up just a little bit,

you're talking about
Mendelian randomization, you're talking

about genomic data in terms of a whole
exome of 10,000 people or 50,000 people,

and now you're talking about 1500
proteins.

Can you walk me through that a little bit?

Yeah, sure.

So when you're looking
at the genetic data and here we've got,

you know, some whole genome sequencing
data as well as exome sequencing data.

So you can imagine you have a list of ways
or places in the genome almost like

like geographic locations,
almost GPS coordinates

on the chromosomes where we know
they vary across the samples.

So those variable regions, we’ll call them SNPs, you know,

that's the term that we use
for the simplest kind of variation,

just single base pair variation,
but we'll just call them SNPs

because there can be other kinds
of variation that are captured there too.

But if you look at the variant, it's
just a single variant within the genome.

You can look at

the representation of what people's
genotype

is at that location,
and you can look at every single protein

in that 1500 protein list and see,

do we have a significant correlation
between the genotype

and the protein level?

So that's sort of the first step.

That's a lot of tests, right? Comparisons.

Cindy, so I understand these SNPs can also be outside of the gene.

Right would be also make it

So they could be regulatory regions. Absolutely. It’s not necessarily falling into the gene

Is there any threshold to what they are checking, what region they check in the gene?

Yeah.

So there's both a statistical threshold
that they accept as a standard.

But also when you're doing so many tests,
you have to correct

for multiple tests, right?

Because the more tests you do,
the more you're increasing your chances

of seeing a false positive.
So adjusting for that

is something that, you know,

we go through peer review to make sure
we have best practices and agreement on.

I mean, these statistical
associations are massive.

I mean, in a given single individual's
whole genome, you're looking at maybe

four million SNPs? Yeah, right.

You have 4 million SNPs and then you’ve got

1500 proteins you're associating
those with, if I understand you correctly.
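
As a rough sketch of the multiple-testing adjustment being discussed here, the simplest approach is a Bonferroni correction, which divides the acceptable false-positive rate by the number of tests. The 0.05 rate and the round counts below are assumptions for illustration only; the conversation does not say which correction the preprint authors actually applied.

```python
# Bonferroni correction: a minimal sketch with illustrative round numbers.
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_snps = 4_000_000   # rough per-genome figure quoted in the conversation
n_proteins = 1_500   # approximate size of the first protein panel
n_tests = n_snps * n_proteins  # one test per SNP-protein pair

threshold = bonferroni_threshold(0.05, n_tests)  # 0.05 is an assumed rate
print(f"{n_tests:,} tests -> per-test p-value cutoff {threshold:.2e}")
```

With billions of SNP-protein pairs, the per-test cutoff becomes many orders of magnitude stricter than 0.05, which is why the number of tests has to be reported alongside any list of significant associations.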

Yeah.

And then you multiply this times
what they did, 54,000 individuals.

They did 54,000. So I mean.

And they discovered,
you know, about 10,000,

I think it was around 10,200 relationships
between gene regions and protein levels.

Right. That's a massive number.

So many of those
could just be coincidence, right?

Just correlations, not causation. Right.

We're all familiar with
that phrase.

So 85% of those
relationships were novel.

Now, the relationship.

Cindy, could they be both in cis and trans, both of them?

Correlation or.

What's that?

Sarantis. Cis and trans would be like this
correlations could be there.

So yeah.

So these correlations can be what we call
cis or they can be in trans.

So cis just is getting back to Dale’s question

about whether or actually
was your question Sarantis, about whether

these variants, these SNPs are inside genes

or are they outside genes?

And if they're in genes or in close
proximity

to the genes,
that code for the protein itself.

Right.

So you've got a variant
that's in a gene coding for a protein.

If you see a correlation
that's significant between

those two, we call that a “cis-pQTL”, and that’s a feel-good

measure that says, oh,
we must be measuring the right protein,

if this is real.
And there are ways to press on it,

to check it and validate it, of course,
with orthogonal data, but that's it.

So people often talk about cis-pQTL discoveries being verification

of having measured
the right protein

because of course our assay is not a Mass Spec method.

We're using antibodies as hooks.

We're using two antibodies
as a hook to hook

a protein out of solution.

And we have little single stranded
oligos attached to them.

So those oligos can then hybridize,
we can extend and amplify

that up just like any old library
prep for sequencing.

And then we count those oligos

as a proxy for the original level
of the proteins in the sample.

And so when you're doing an affinity
method, right, a hooking method

to pull it out, not only is it
great for low abundant proteins,

that's one of the things we add value to
with mass spec folks.

They like us because they can look
at areas of the proteome

that they couldn't see easily with mass spec
without tons of sample

and a lot of control
of variability.

So it's
a nice method from that perspective,

but it's a little bit indirect
because we're

pulling out the protein
and converting it to DNA signal.

So making sure
we have a way to normalize those data

and, you know, just like with mass
spec in any proteomics experiment,

to manage variability

from batch to batch, you know,
these are important aspects that

proteomics scientists are much better
prepared to describe or explain than I am.

I have come to appreciate it.

Now, Cindy,
something that you touched upon,

right, was the sort of drug discovery
dimension of this.

But even before we go in that direction,
you also mentioned something

in terms of SNPs and genes,

the majority of GWAS, thousands, right.

3500 GWAS studies or

many, many, many.

Oftentimes these SNPs that are associated with risk

often
are in gene deserts. Right.

There's no function.

That's right.

Can you comment on that? Right.

So these pQTLs. Right.

are SNPs, but aren’t they just random, so to speak random places in the genome?

Yeah, so good question.

So I'm going to reference a paper
by Lasse Folkersen and Anders Mälarstig.

Now the two of them,
along with collaborators,

there's a long list of authors
that I won't list.

Brilliant,
obviously across multiple cohorts,

they have their milestone paper
within a study

called the SCALLOP study,
which is really a cohort of cohorts.

They were doing what the UK
Biobank wants to do.

This is me putting words in the UK Biobank’s mouth.

But I think that the Folkersen et al. milestone

publication, is a powerful precursor
to what the UK Biobank

has the possibility to do. In Folkersen et al.

they looked at just 90 proteins
I say just although at the time in 2020

that was a lot of multiplex proteins of course

they looked at primarily
cardiovascular, what we broadly

categorize as cardiovascular proteins
and they did the same kind of study.

So on 30,000 samples,
they looked at 90 proteins

with genetic, clinical and proteomic data.

They did the correlations just like the UK
Biobank has done in their preprint.

The 90 proteins resulted in 450.

Yea, a little over 450 pQTLs

Some of those are cis-pQTLs, as Sarantis hinted at.

Right.

So 88% of the proteins had cis-pQTLs identified there

That's, like I said,
a feel-good method,

something we can kind of point to,
to say this looks like it's,

you know, increasing our confidence

that we're measuring the right protein,
although there are good biological reasons

why you might not see a cis-pQTL. But the remaining

were trans-pQTLs.

Essentially, discovery of trans-pQTLs is incredibly important

to understand protein-protein interactions.

So I may have taken a bit of a meandering
way to get back to your question, Dale,

about these relationships but trans-pQTLs

and figuring out where

those are coding, you know, what proteins

are they coding for, what gene regions
are they associated with?

It's not a trivial matter.

And so I've had discussions with Lasse
as well as Anderson

or sorry as well as Anders around this

this challenge.

And so just to define trans-pQTLs… so as a reminder cis-pQTLs

are where the gene variant is either in or near

the gene that codes for the protein
that you're measuring.

So the relationship between them,
the correlation is between

the gene region and the protein itself that’s a cis-pQTL

So say you have a particular protein

you're measuring, we'll say

TNF-alpha, or alpha-TNF and there’s a SNP

Then that codes for alpha TNF

is in the same chromosome within,
I don't know.

A couple hundred pairs
of a. Million base pairs.

So yeah.

When you're.

In the general in the general region
and so there could be

those million base pairs
a lot of other genes, but nonetheless.

Right.

You're saying that that particular SNP
was controlling alpha TNF.

It suggests that. I think they might not.

Yeah, they may not say it quite
so strongly simply because there's you.

Know it's association.

Yeah. Yeah exactly. It's just.

statistical calculation. Got it.

and so with a trans-pQTL, what that is, is you’ve got a variant

you know,
you might have a gene coding for a protein

and that gene might be on chromosome nine.

But you might have the pQTL on chromosome 19.

You know, you might have it

on a completely different chromosome,
a correlation with that same protein.
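
The cis versus trans distinction described above can be sketched as a simple coordinate check. This is only an illustration: the one-million-base-pair window echoes the rough figure mentioned in the conversation and is an assumption, since each study sets its own cutoff, and the `Locus` type and the coordinates are hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # assumed window; studies choose their own cutoff


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # base-pair coordinate


def classify_pqtl(variant: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the variant lies near the protein's own gene, else 'trans'."""
    same_chromosome = variant.chromosome == gene.chromosome
    if same_chromosome and abs(variant.position - gene.position) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates, chosen only to exercise both branches.
gene = Locus("9", 5_000_000)
print(classify_pqtl(Locus("9", 5_400_000), gene))   # same chromosome, 400 kb away
print(classify_pqtl(Locus("19", 5_400_000), gene))  # different chromosome
```

A variant on the same chromosome but far outside the window is also treated as trans in this sketch.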

So the sort of Occam's Razor,
you know, the easiest

the most straightforward possibility

is that there's a relationship
between those two proteins, right?

That there’s protein-protein interaction going on there.

And in fact, the STRING database is a publicly available database

that records and collects and is curated around protein-protein interactions.

And so what the team would do,
you know, in asking them

how they dig
into each of these relationships,

what they would do is
report the closest

gene to the location
that's in trans with this protein.

They report the closest gene
geographically.

And then they also report

because they do kind of a deep dive into surrounding genes, as you say Dale.

There could be, you know, surrounding
genes that that might be implicated.

They look at those other surrounding genes
and they say, you know,

what's the shortest pathway
back to that protein?
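
The idea of finding the shortest pathway from a gene near the trans variant back to the measured protein can be sketched as a breadth-first search over an interaction network. This is a generic illustration, not the authors' pipeline: the network below is a hypothetical toy graph with placeholder names, not data from STRING or from the paper.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest node list from start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical interaction network; names are placeholders, not real genes.
interactions = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "PROTEIN_X"],
    "GENE_C": ["GENE_A", "GENE_D"],
    "GENE_D": ["GENE_C", "PROTEIN_X"],
    "PROTEIN_X": ["GENE_B", "GENE_D"],
}
print(shortest_path(interactions, "GENE_A", "PROTEIN_X"))
```

Running the same search from each candidate gene around the variant gives one way to rank which neighbor has the most direct route to the protein.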

And that is a fascinating conversation
because once you put together

a pathway analysis like that
and we talk about different diseases,

now you've got some pathways
in, say, Alzheimer's disease

and you’ve got some pathways in, say, schizophrenia.

I'm just picking
two neurological diseases.

And now if you can imagine a Venn diagram
of the pathways those two have in common,

and that is

an opportunity for us to understand
the mechanistic biology

that's in common between those two
neurological diseases, if any.

You know, I'm just picking those
out of the air.

If we can return back to that Folkersen
landmark paper.

Mm hmm.

So, if I understand correctly,
there were 90 proteins.

How many tens of thousands of samples?

30,000 samples. Just over.

Okay, so 30,000 samples times 90 proteins.

And they also had like whole genome data
on those 30,000 individuals.

Is that. Right?
They had genetic data that you could.

So I don't know that.

Remember, this is a cohort of cohorts.

So I think they had
GWAS data or genotyping data,

you know, array data on some of those
and sequencing data on others.

I wouldn't want to represent that,
but my guess is that they

they had variation, genetic data that they had in common.

Right.

Because you can convert

a whole genome sequencing dataset to a list of variants

Understood. And yeah. Right.

So they had all the genetic data
of 30,000 individuals.

They looked at these 90 proteins

and then you mentioned that they're able
to connect it then to disease.

Yeah. So you do the same thing.

This looked at relationships
between genetic,

you know, state and protein levels.

So you look for all those correlations.

In this paper, they found 450 pQTLs that exceeded

their significance threshold.

And you could, you know,
as you touched on before there, you know,

that's why we have peer review
to make sure

that we're held accountable
for the number of tests that we're doing,

that, you know,
we're really trying to be

as transparent as possible
and publishing these data.

And by the way, it was published
in Nature Metabolism in 2020.

So once you

you see all the correlations, imagine
you have this list of correlations.

You can layer those clinical data then in.

So now you know the disease information

and you can look
at these different sets of data.

So genetics, proteomics and disease

and you can sample from these
and determine how often

with the relationships
between three of these units,

how often would that happen
by chance alone?

If it would happen by chance alone?

Quite often.

Then we let that fall away.

If it seems quite unusual
to see these relationships,

then those are the ones that we elevate
to potential causality.
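
The question "how often would that happen by chance alone?" can be illustrated with a permutation test. This is a generic sketch on synthetic data, not Mendelian randomization as performed in the paper: it shuffles protein levels against genotypes and counts how often a shuffled correlation is at least as strong as the observed one. The sample size, effect size and seeds are all assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of shuffles whose |correlation| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic data: genotype dosage (0, 1, 2) and a protein level that tracks it.
data_rng = random.Random(42)
genotypes = [data_rng.choice([0, 1, 2]) for _ in range(200)]
protein = [0.8 * g + data_rng.gauss(0, 1) for g in genotypes]
print(permutation_p_value(genotypes, protein))
```

A small value means the observed relationship is rarely matched by shuffled data, which is the sense in which a finding is "unlikely to be happening by chance alone".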

And so in this paper they elevated
from the 450 correlations.

They elevated 25
that they suggest appear causal.

And some of those examples,
I think, are validated.

Or, I know, are validated clinical targets
for existing therapies, super exciting

because then it's like, oh, looks like
we're on the right track, right?

And then of course some novel findings.

So they report 14 validated
clinical targets,

known clinical targets like CASP-8 in breast cancer was one of them that

I can think of.

So CASP-8 is something known already before to be involved in breast cancer?

That’s right. And then they rediscovered it?

Yea, CASP-8 is a known therapeutic target in breast cancer.

I see.

And then 11 of those were novel,
so they were not able to see any evidence

of 11 of their findings that elevated
again to causality, potential causality.

And those are the exciting ones
for new programs, potentially.

And then

they reported
18 potential repurposing opportunities.

So that's super exciting to me
because if you've got an existing drug

for one indication, say tocilizumab for rheumatoid arthritis

and you
have the possibility of then

using that in a different indication,
that would be a repurposing opportunity.

So for example in eczema.

I guess it

doesn't make sense to think about
using an anti rheumatoid arthritis drug.

Right.

That's on market to treat eczema.

That's just I mean.

There's one in clinical trials.

I mean, coming back to the cohorts, Cindy,
I think also

the fact that these cohorts are
from different geographical places

increases the possibility to illuminate, for example, biases on SNPs.

Right.

Did you have any discussion with the authors about that?

Do they ever consider
that the bias, geographical bias

may influence their data?

Can you comment on this?

Yeah, it's a great question.

So they primarily

represent northern European populations.

There was some representation
of Asian populations in there,

but not a lot.

And I'm trying to remember, I don't think
there was any African diaspora

in this milestone

paper in the subset of samples
that they had in this milestone paper.

So that's a
you know, it's a blessing and a curse.

Right. For them.

It eases the analysis to your point
for the opportunity to make discoveries

because of diversity
within the ancestry of our genomes.

It’s a “miss”, right?

And an enormous potential future
opportunity, which I think

is very exciting and very important
for equity in health care.

I mean, essential.

So we
have to start somewhere, though, right?

So we start with the populations
that we have.

It's fascinating thinking about

the 90 proteins,
all the different things that were discovered.

Right, these 25 drug targets for.

That explains
the pharma interest in the UK Biobank

by doing the extrapolation.

Have you done the extrapolation?

How many drug targets do they expect? Yeah.

So with this, you know it’s around five and a half percent of the pQTLs

discovered in Folkersen
et al, converted

to, you know, potentially causal. Interesting.

So if we applied that same percentage,
which is lofty, right, that's

a lot of proteins,

and these 90 in Folkersen et al were well

studied, you know, considering across
30,000 samples, so, you know,

I would expect maybe four, four
and a half percent to maybe

5% converting.

In this initial set of proteins, I think to be a little conservative,

you know, not trying to be
too bullish, but even with that, we're

talking about potentially listing off
causal markers to examine, to investigate

potentially causal markers of,
you know, around 500.

So that.

Five hundred potential drug targets.

Potential therapeutic targets.
That's right.

And to be fair, some of these might
show up as potential therapeutic targets

that would never be considered if they're
in signaling pathways, for example.

So it's up to pharma.

And certainly people that

are more versed in

clinical trials and potential,
you know, pathways

for these and implications of side effects
to then up score and down score these.

But the exciting aspect of this

is to have a systematic approach
by which to do that,

to actually make that list of 500
and then up score some and start programs.

Because we like to say

that clinical trials are twice
as likely to be successful.

If you go into that trial
with genetic information

that's certainly, you know, been published
and we like to say that

adding proteomic data,
I'd really love to see

what that means for our potential

for improving our ability
to be successful in clinical trials.

And these 13, I guess if you take those 500 targets (or potential drug targets)

divided by 13 different pharma partners,
that's like, what, 35 apiece.

Yeah, that's right.
That's a lot of programs.

That's a lot of programs.

I mean, that's going to be
a wealth of data for them.

Now, I understand why they would
invest in such a project.

What is the next step

then in the UK Biobank project
and how can people find out more about it?

Good question. Yeah.

So what I fully expect,
and I know of at least eight

abstracts that have been submitted
for ASHG this year.

Now ASHG, the American Society
of Human Genetics,

as I mentioned earlier, will be

in Los Angeles in October.

And so I know that those pharma partners
and the researchers within those pharma

partners are submitting abstracts
to present there

and I'm sure some of them will get
oral presentations.

Many of them will get poster
presentations.

But I will be keeping a close eye on that
and I will absolutely be there.

And I think we should do a podcast
episode.

There you go.

You will have a post-ASHG

"this is what I got out of it."

That would be great.

And maybe drag a few guests on
if we can.

That's great. Yeah, that'd be great. Yeah.

And as far as what I think is next, I think
they're going to be digging into

these correlations, 85% of them novel.

So roughly 8000 novel relationships
between genetic regions

and protein levels.

They're going to be looking into
which of those appear

causal within certain diseases.

Do you know when that will be available?

Publicly-available data?

How can scientists have access to
these?

Is it an easy process or a difficult process
to get access to that?

Yeah.

So as you probably know, Sarantis,
but our listeners may not know, the UK

Biobank data, through a data use agreement,

are broadly available.

So this is one of the reasons there's
so much use of those data as validation

data and for discoveries with very clever

informatics scientists and biologists to

think of

creative ways to use such a large dataset.

The proteomics data,
the first set of proteomics data,

so the first 1500 proteins, the subject of the June bioRxiv paper:

it has been stated that those data

will be publicly available
by the end of the year.

So I expect, you know, by October at ASHG
we'll know better

the timing for that.

Yeah, those pharma partners,
of course have had access to those data

as they should, which is why
they were able to publish that,

that paper so quickly.

And so the next tranche of data
for the full 3000 proteins.

And can I just say, you know,

you see what's possible
with 90 proteins in Folkersen et al.

Imagine what's possible,
you know, with

3000 proteins and 54,000 individuals.

That's a lot of power to detect
relationships between proteins and

and many proteins
that really just haven't had assays

for examining them.

So just such an opportunity
for discovery.

We touched upon yeah,
we touched upon the enormous investment

made to date to collect these 500,000 samples.

That's right.

And to follow up
and all those like genetics.

Yeah.

The whole genome,
whole exome data on all these individuals

and then now overlaying empowering
the genomics with the proteomics.

It's as if we're a part of something
that is the next

big thing in genetics is proteomics.

I think it's you know,
and when you think about the

the central dogma of biology, right?

You've got DNA.

RNA, we've done
a great job of looking at DNA.

RNA has been our proxy for biology
for a long time because

it was available
to look at with sequencing technologies.

In fact, you and I, Dale, I think, have talked about how RNA-Seq

and the ability to do what we call “digital gene expression” sold

many of those initial instruments

that were,
you know, next generation sequencing instruments.

But now we have
this ability to measure

proteins directly
in a very scalable way.

And I am excited, as you

know, about this capability,

but it's really the researchers
and what they can do with it

that will tell us the true
potential of this. Super.

Well, thank you, Cindy,

for sharing your thoughts
on empowering genomics with proteomics.

And we'll see you soon.

That was great.

Thank you very much.

Thank you for listening to the Proteomics in Proximity podcast brought to you

by Olink Proteomics. To contact the hosts or for further information

simply email: info@olink.com.