The UK Biobank, which began recruiting in 2006, has enrolled half a million individuals and is "investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease". Adding proteomics can greatly increase what can be discovered using this valuable resource. #ASHG22
Proteomics in Proximity discusses the intersection of proteomics with genomics for drug target discovery, the application of proteomics to reveal disease biomarkers, and current trends in using proteomics to unlock biological mechanisms. Co-hosted by Olink's Dale Yuzuki, Cindy Lawley and Sarantis Chlamydas.
Welcome to the Proteomics in Proximity podcast,
where your co-hosts Dale Yuzuki,
Cindy Lawley and Sarantis Chlamydas
from Olink Proteomics
talk about the intersection of proteomics
with genomics for drug target discovery,
the application of proteomics
to reveal disease biomarkers and current
trends in using proteomics
to unlock biological mechanisms.
Here are your hosts, Dale,
Cindy and Sarantis.
Thank you for joining us on the Proteomics
in Proximity podcast.
I'm your host, Dale Yuzuki, with my co-host
Cindy Lawley
and my other co-host, Sarantis Chlamydas.
Great.
This morning we are talking about empowering
genomics with proteomics.
And Cindy,
I'd like to ask you the question.
If I'm involved in genomics, why should I add proteomics?
Well, I'm so happy that you asked me about that, Dale.
So, as you know, I tell this story
quite a bit, and so I'm delighted
to do it in conversation with both of you,
because you add so much to this.
But you know, we've got a big project
going with the UK Biobank,
the UK Biobank,
of course, being one of the largest
national population biobanks
in the world, with clinical and genetic data,
and now, on a subset of over
54,000 samples,
proteomic data.
It's an exciting time,
I think, to demonstrate
in large populations the value of layering
proteomics onto genomics.
And of course, across
many cohorts, we've invested
a lot in genomics
as the costs have gone down.
Now, to back up a little bit: the UK
Biobank, tell me a little bit about it.
Sure.
So, yeah, as the name implies,
it's based in the UK,
and it's affiliated with,
you know, the UK has
one of the largest single-payer
health care systems in the world.
It's a population-based biobank,
primarily
of northern European descent or ancestry,
but certainly representing Asian
as well as African
and African diaspora descent, as well as
Pakistani descent.
There's quite a nice subset of diversity
in the Biobank,
but primarily it's northern European.
You know, it was
started, what, I think 20 years ago?
I actually should know
that off the top of my head.
But it was started with the promise
of being able to characterize the value
of longitudinal information
to health care.
And I think Eric
Topol says that it takes about 20 years
on average to move
something from discovery to the clinic.
He uses the example of the stethoscope
and moving the stethoscope into the clinic
took 17 years.
That seems like a pretty simple mechanism,
right?
Listening to your heart.
But yet it took a long time for it
to be demonstrated and approved
and to get into the clinic
and routine use.
So by longitudinal, then, you mean...
I think they recruited, what, half a million
individuals, and longitudinal means
they follow them over time?
It means that
they're able to call them back.
So they have access to their
medical and clinical data over time.
They understand
over time what the outcomes are
within this population.
And I think it's incredibly valuable
that consenting has changed over time.
The ability to actually call back
wasn't
initially
in many of these biobanks. Right.
And so I think of FinnGen, another biobank
that's based in Finland,
a population health biobank as well.
They really did sort of lead the way
with the ability to share data
and protect it at the same time,
you know, primarily focused on genetics.
And I would say UK Biobank as well
has led the way in these abilities
to work with both private
and public partnerships.
So being able to work with pharma,
and in this case
with the proteomic data:
this initial set of proteomic data
was started with ten pharma
partners.
The model was, interestingly,
based upon
the Exome Sequencing Consortium.
So a group of pharma partners
came together in order
to get the participants within the UK
Biobank exome sequenced.
There's of course also I think it's
150,000 individuals in the UK Biobank
that are also whole genome sequenced,
which is pretty phenomenal, right?
That's a huge number.
So, a massive management effort,
yeah, of all of those dollars and time,
and the potential for building
analysis tools with such large datasets.
Right. That,
you know, goes without saying.
I remember, I think it was 2018,
I was at ASHG,
which is the American Society
of Human Genetics,
and I was blown away by all of the talks
that were referencing the UK
Biobank data,
building tools, having discoveries.
Right.
I'm excited to see that same
evolution
in the discussion,
with crowdsourcing
these data, with proteomics.
So to get back to your original question,
Dale, what's the value?
I think Karsten Suhre would say that
when you have genetic data
and you have disease,
that there's a certain power
you have to detect the relationship
between the two.
And in some cases,
we have smoking guns like BRCA. Right.
So we're able to see that there's
high penetrance for a variant that
shows up and has a lot of influence
on predisposition to disease.
But for most diseases, it's death
by a thousand cuts, right?
Small amounts of influence.
Cindy, this UK
Biobank sounds so interesting.
What can you tell me more about it?
Yeah. So
the UK Biobank itself is a longitudinal collection of data.
I think it started in the mid-2000s.
So around 2006 they targeted an age group,
sort of a middle-aged group,
and they followed them over time,
and so it's over
half a million individuals within the UK.
Yeah, quite an undertaking to enroll
all those participants.
And what I will never forget
is when I attended ASHG,
the American Society of Human Genetics,
in 2018,
the number of talks.
I remember searching on UKBB,
just the short
acronym, and the number of talks
about using the UK
Biobank data, particularly genetic data,
for validating clinical findings:
there were a lot of talks.
So it's always been high on Illumina's
radar, and it's very high
on all of those sequencing
technology innovators' radars,
like Thermo Fisher, of course,
and all of those library
prep companies that have different
methods of library prep.
Have they already sequenced everybody?
So they've got whole genome sequences
on, I think it's about 150,000 individuals,
and that was primarily
led by deCODE Genetics,
as I understand it.
The publication is pretty recent, actually.
So
there's a lot to dig into.
It's a pretty phenomenal dataset.
The bulk of the sequencing for most
samples, I believe, is exome sequencing.
That's my understanding.
So I think it's
over 450,000 individuals.
So I think for some they've got both.
And since it's a single-payer system
with the electronic records of the NHS,
that means they can drill down
into exactly, right,
their whole exome sequence and whatever condition they may have.
And this is an ongoing thing.
Is that right? Cancer, diabetes?
That's right.
And the ability to actually return results
to those patients
has evolved over time, I think. Right?
Because it costs money
to set expectations and
make sure that we're communicating,
you know, in a way that follows best practices.
So I think that the UK
Biobank has spearheaded
a lot of our understanding about
best practices there as well.
And as far as... Go ahead.
Oh, I was just going to say, so
back to your original question
about what's the value of layering
proteomics onto genomics.
This is a great example where an enormous
amount of investment has gone
into collecting genetic information
for this very valuable population,
with advances in diagnostics
and in guiding cancer treatment.
So a lot of these advances
have made it to the clinic already,
which is pretty phenomenal.
And that's been driven,
you know, globally. It's exciting
where proteomics fits in,
or the way that I think of it.
I had a conversation with Karsten Suhre,
who's at Weill Cornell in Qatar
and in New York, and
I had an aha moment with him.
He essentially would say that
an intermediate phenotype like proteomics
acts to magnify the effect
between genetics and disease.
So, of course,
we've been looking for these associations
between genetics and disease
since
we've been collecting genetic information.
And some of those
links are hard to see because
we need so many samples
to be able to see them.
And so as we've increased the numbers
of samples like in the UK Biobank,
we're able to make these associations
more clearly.
And I'll say, you know, early on
we hoped for smoking
guns for a lot of diseases,
and we did see a few of them.
Right.
So there's certainly PCSK9
for familial hypercholesterolemia.
There are some standard ones, BRCA for breast cancer.
There are some examples where
we have a lot of penetrance, or a lot of,
you know, a lot of effect on someone's
likelihood of getting a disease
from single variants
or single loci or single genes.
But for most diseases, like type 2 diabetes or cardiovascular disease,
it's death by a thousand cuts, meaning lots of variants
each give a tiny effect
in changing our risk.
And so that's
where having the ability
to amplify,
or to put a magnifying glass on
those relationships between genetics
and disease, is incredibly useful.
So I think, you know, a
ton of work has been done in proteomics
and cardiovascular disease,
and many advances have happened there.
Now, getting back to the UK
Biobank.
You mentioned before that recently
they were working with Olink
to look at the proteome of tens
of thousands of individuals.
Yeah. Yeah.
I would say it's not just the UK Biobank,
but 13 pharma partners.
Right.
So it certainly required consent
and partnership with the UK Biobank,
just like the Exome Sequencing
Consortium was done in collaboration
with the UK Biobank.
But access to the technology,
just like with the Exome
Sequencing Consortium,
was spearheaded
by pharma partners that were very keen
to build a structure for, as
I like to say, a systematic approach
to therapeutic target discovery,
not only biomarker discovery,
which is sort of traditional proteomics,
but therapeutic target discovery,
which I think is enabled by genetics,
proteomics, as well as clinical data.
So we're getting back to this idea
of empowering genomics
with proteomics, right?
What can you tell me about that?
Yeah.
So I think there's this,
you know, I immediately think of that
Karsten Suhre magnifying glass, right.
But the UK Biobank's
initial findings
are in a preprint that came out
in the middle of June on bioRxiv.
Their initial paper
really was just scratching
the surface of what's possible
with this enormous dataset.
So their first paper
covered about 1,500 proteins,
Olink's first product
on the Explore platform, which has the NGS readout.
So they used that first tranche of proteins
across 54,000 samples,
and the bulk of
those data look at correlations
between gene regions,
you know,
the genotypes in those gene regions,
and protein levels.
So really just looking at
what the correlations are.
What's the list of all the possible
relationships between genetic regions
and protein levels
that might be elucidated
and examined
further in this beautiful dataset?
And I'll speculate on what
I think they're going to be doing next,
and my guess is that
they're very deep into this within
these companies: to
then do Mendelian randomization,
which is a statistical approach to
determine which of these relationships,
which of these correlations
between gene regions
and protein levels,
when you bring
in the clinical data on disease,
hold up as being unlikely
to be happening by chance alone.
So now you sort of separate out the ones that
are likely just, you know, coincidence.
I mean, they might still be important,
but I think pharma
would like to have
ten great targets over 400
unsure
targets, because
that's a lot of rabbit holes to go down.
So if they can narrow it down,
then I think there's a lot of excitement
around being able
to have some quick wins with proteomics,
genomics and clinical data.
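Mendelian randomization treats a variant that shifts a protein's level as a natural experiment for whether that protein affects disease. A minimal sketch of the two most basic estimators, the Wald ratio for a single variant and the inverse-variance-weighted (IVW) combination across several, assuming you already have per-variant effect sizes (betas) and standard errors from the protein and disease association analyses. The numbers are invented; real analyses add sensitivity checks (for example for pleiotropy) that this sketch omits.

```python
import math

def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> tuple[float, float]:
    """Causal estimate of a protein (exposure) on a disease (outcome) from one variant.

    beta_exposure: variant -> protein effect; beta_outcome: variant -> disease effect.
    The standard error is the common first-order approximation, which ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

def ivw(instruments: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Inverse-variance-weighted estimate over (beta_exposure, beta_outcome, se_outcome) triples."""
    num = sum(bx * by / sy**2 for bx, by, sy in instruments)
    den = sum(bx**2 / sy**2 for bx, _, sy in instruments)
    return num / den, math.sqrt(1 / den)

# Hypothetical summary statistics for two variants near one protein's gene
variants = [(0.5, 0.10, 0.02), (0.25, 0.05, 0.02)]
print(wald_ratio(*variants[0]))  # (0.2, 0.04)
print(ivw(variants))
```

Both hypothetical variants imply the same ratio, so the IVW estimate agrees with each Wald ratio; with real data, disagreement between variants is itself a warning sign.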
Well, to back up just a little bit,
you're talking about
Mendelian randomization, you're talking
about genomic data in terms of whole
exomes of 10,000 or 50,000 people,
and now you're talking about 1,500
proteins.
Can you walk me through that a little bit?
Yeah, sure.
So when you're looking
at the genetic data, and here we've got,
you know, some whole genome sequencing
data as well as exome sequencing data,
you can imagine you have a list of
places in the genome, almost like
geographic locations,
almost GPS coordinates
on the chromosomes, where we know
they vary across the samples.
We'll call those variable regions SNPs, you know,
that's the term that we use
for the simplest kind of variation,
just single base pair variation,
but we'll just call them all SNPs,
even though there can be other kinds
of variation that are captured there too.
If you look at a single variant within the genome,
you can look at
the representation of what people's
genotypes
are at that location,
and you can look at every single protein
in that 1,500-protein list and ask,
do we have a significant correlation
between the genotype
and the protein level?
So that's sort of the first step.
That's a lot of tests, right? A lot of comparisons.
Cindy, so I understand these SNPs can also be outside of the gene?
Right, they could also be outside it.
So they could be in regulatory regions. Absolutely. They're not necessarily falling into the gene.
Is there any threshold for what they are checking, what region of the gene they check?
Yeah.
So there's a statistical threshold
that they accept as a standard.
But also when you're doing so many tests,
you have to correct
for multiple testing, right?
Because the more tests you do,
the more you increase your chances
of seeing a false positive.
So adjusting for that
is something that, you know,
we go through peer review to make sure
we have best practices and agreement on.
I mean, these statistical
association analyses are massive.
In a given single individual's
whole genome, you're looking at maybe
four million SNPs? Yeah, right.
You have 4 million SNPs, and then you've got
1,500 proteins you're associating
those with, if I understand you correctly.
Yeah.
And then you multiply this by
what they did, 54,000 individuals.
They did 54,000. So I mean...
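A back-of-the-envelope calculation shows why the multiple-testing correction matters at this scale. This sketch uses the round numbers from the conversation (about 4 million SNPs and 1,500 proteins) and a Bonferroni correction, the simplest and most conservative option; it is not necessarily the threshold the UK Biobank analysis used. The 54,000 individuals do not multiply the number of tests: they add statistical power to each one.

```python
n_snps = 4_000_000   # rough count of tested variants, from the conversation
n_proteins = 1_500   # first Olink Explore tranche
n_tests = n_snps * n_proteins

alpha = 0.05
# If no SNP-protein pair were truly associated, this many would still pass p < 0.05
expected_false_positives = n_tests * alpha
# Bonferroni: spread the 5% family-wise error rate across every test
bonferroni_threshold = alpha / n_tests

print(f"{n_tests:,} tests")
print(f"~{expected_false_positives:,.0f} chance hits at p < 0.05")
print(f"Bonferroni threshold: p < {bonferroni_threshold:.1e}")
```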
And they discovered,
you know, about 10,000,
I think it was around 10,200 relationships
between gene regions and protein levels.
Right. That's a massive number.
So many of those
could just be coincidence, right?
Just correlation, not causation. Right.
We're all familiar with
that phrase.
And 85% of those
relationships were novel.
Now, the relationships...
Cindy, could they be both in cis and in trans?
The correlations, or...?
What's that, Sarantis?
Cis and trans: could both kinds
of correlations be there?
So yeah.
So these correlations can be what we call
cis, or they can be in trans.
So cis is getting back to Dale's question,
or actually
it was your question, Sarantis, about whether
these variants, these SNPs, are inside genes
or outside genes,
and whether they're in, or in close
proximity
to, the genes
that code for the protein itself.
Right.
So you've got a variant
that's in a gene coding for a protein.
If you see a correlation
that's significant between
those two, we call that a "cis-pQTL", and that's a feel-good
measure that says, oh,
we must be measuring the right protein,
if this is real.
And there are ways to press on it,
to check it and validate it, of course,
with orthogonal data.
So people often talk about cis-pQTL discoveries as verification
of having measured
the right protein,
because of course our assay is not a mass spec method.
We're using antibodies as hooks.
We're using two antibodies
as hooks to pull
a protein out of solution.
And we have little single-stranded
oligos attached to them.
When both antibodies bind the same protein,
those oligos can then hybridize,
and we can extend and amplify
that up just like any old library
prep for sequencing.
And then we count those oligos
as a proxy for the original level
of the proteins in the sample.
And so when you're doing an affinity
method, right, a hooking method
to pull it out, it's
great for low-abundance proteins;
that's one of the ways we add value
for mass spec folks.
They like us because they can look
at areas of the proteome that
they couldn't see easily with mass spec
without tons of sample
and a lot of control
of variability.
So it's
a nice method from that perspective,
but it's a little bit indirect,
because
we're pulling out the protein
and converting it to a DNA signal.
So making sure
we have a way to normalize those data
and, you know, just like with mass
spec or any proteomics experiment,
to manage variability
from batch to batch:
these are important aspects that
proteomics scientists are much better
prepared to describe or explain than I am.
I have come to appreciate it.
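The counting and normalization ideas above can be sketched in a deliberately simplified way: express each protein assay's sequencing count relative to an internal control measured in the same sample, work on a log2 scale, then center each plate so that batch-level shifts cancel. This is an illustrative assumption loosely in the spirit of ratio-to-control normalization, not Olink's actual NPX algorithm, and the counts are invented.

```python
import math
import statistics

def log2_ratio_to_control(assay_counts: list[int], control_counts: list[int]) -> list[float]:
    """Per-sample log2(assay count / internal control count).

    Dividing by a control read in the same sample cancels differences in
    how deeply each sample happened to be sequenced.
    """
    return [math.log2(a / c) for a, c in zip(assay_counts, control_counts)]

def center_plate(values: list[float]) -> list[float]:
    """Subtract the plate median so plates run at different times share a baseline."""
    med = statistics.median(values)
    return [v - med for v in values]

# Invented counts for one protein on two plates; plate 2 reads systematically
# high (a batch effect), which plate centering removes
plate1 = center_plate(log2_ratio_to_control([800, 400, 1600], [100, 100, 200]))
plate2 = center_plate(log2_ratio_to_control([1600, 800, 3200], [100, 100, 200]))
print(plate1)  # [0.0, -1.0, 0.0]
print(plate2)  # [0.0, -1.0, 0.0]
```

Median-centering assumes each plate holds a comparable mix of samples; real studies randomize samples across plates and use bridging or control samples for exactly this reason.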
Now, Cindy,
something that you touched upon,
right, was the drug discovery
dimension of this.
But even before we go in that direction,
you also mentioned something
in terms of SNPs and genes.
The majority of GWAS, thousands, right,
3,500 GWAS studies or
many, many more:
oftentimes the SNPs that are associated with risk
are in gene deserts. Right?
There's no known function there.
That's right.
Can you comment on that? Right.
So these pQTLs, right,
are SNPs, but aren't they just, so to speak, random places in the genome?
Yeah, so good question.
So I'm going to reference a paper
by Lasse Folkersen and Anders Mälarstig.
Now the two of them,
along with collaborators,
there's a long list of authors
that I won't list,
brilliant,
obviously, across multiple cohorts,
have their milestone paper
within a study
called SCALLOP,
which is really a cohort of cohorts.
They were doing what the UK
Biobank wants to do.
This is me putting words in the UK Biobank's mouth,
but I think that the Folkersen et al. milestone
publication is a powerful precursor
to what the UK Biobank
has the possibility to do. In Folkersen et al.
they looked at just 90 proteins.
I say "just", although at the time, in 2020,
that was a lot of multiplexed proteins, of course.
They looked at primarily
cardiovascular, what we broadly
categorize as cardiovascular proteins,
and they did the same kind of study.
So on 30,000 samples,
they looked at 90 proteins
with genetic, clinical and proteomic data.
They did the correlations just like the UK
Biobank has done in their preprint.
The 90 proteins resulted in about 450,
yeah, a little over 450 pQTLs.
Some of those are cis-pQTLs, as Sarantis hinted at.
Right.
So 88% of the proteins had cis-pQTLs identified there.
That's, like I said,
a feel-good method, or
something we can kind of point to
to say this looks like it's,
you know, increasing our confidence
that we're measuring the right protein,
although there are good biological reasons
why you might not see a cis-pQTL. But the remaining
trans-pQTLs are
essential: discovering trans-pQTLs is incredibly important
to understanding protein-protein interactions.
So I may have taken a bit of a meandering
way to get back to your question, Dale,
about these relationships, but trans-pQTLs,
and figuring out
what proteins those regions code for
and what gene regions
they're associated with,
is not a trivial matter.
And so I've had discussions with Lasse
as well as Anders around
this challenge.
And so, just to define trans-pQTLs... As a reminder, cis-pQTLs
are where the gene variant is either in or near
the gene that codes for the protein
that you're measuring.
So the relationship,
the correlation, is between
the gene region and the protein itself; that's a cis-pQTL.
So say you have a particular protein
you're measuring, we'll say
TNF-alpha, and there's a SNP
in the gene that codes for TNF-alpha,
or on the same chromosome within,
I don't know,
a couple hundred...
A million base pairs?
So yeah.
In the general region.
And so there could be,
within those million base pairs,
a lot of other genes, but nonetheless.
Right.
You're saying that that particular SNP
was controlling TNF-alpha.
It suggests that. I think they might not...
Yeah, they may not say it quite
so strongly, simply because, you
know, it's an association.
Yeah. Yeah, exactly. It's just a
statistical calculation. Got it.
And so with a trans-pQTL, what you've got is a variant...
you know,
you might have a gene coding for a protein,
and that gene might be on chromosome 9,
but you might have the pQTL on chromosome 19.
You know, you might have it
on a completely different chromosome,
with a correlation with that same protein.
So the sort of Occam's razor,
you know, the simplest,
most straightforward possibility,
is that there's a relationship
between those two proteins, right?
That there's a protein-protein interaction going on there.
And in fact, the STRING database is a publicly available database
that records and collects and is curated around protein-protein interactions.
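The cis/trans distinction comes down to genomic position. A minimal sketch, assuming a window of plus or minus 1 Mb around the start of the protein's coding gene (echoing the "million base pairs" above; published studies choose their own windows). The coordinates are hypothetical, not the real positions of TNF or any other gene.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # assumed +/- window around the gene; studies vary

@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int  # base-pair position

def classify_pqtl(variant: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Label a variant-protein association 'cis' if the variant lies within
    `window` bp of the protein's own gene on the same chromosome, else 'trans'."""
    if variant.chrom == gene.chrom and abs(variant.pos - gene.pos) <= window:
        return "cis"
    return "trans"

gene = Locus("9", 5_000_000)                        # hypothetical gene on chromosome 9
print(classify_pqtl(Locus("9", 5_400_000), gene))   # cis: same chromosome, 400 kb away
print(classify_pqtl(Locus("9", 8_000_000), gene))   # trans: same chromosome, 3 Mb away
print(classify_pqtl(Locus("19", 5_000_000), gene))  # trans: different chromosome
```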
And so, when I asked them how they
dig
into each of these relationships,
what they do is
report the closest
gene to the location
that's in trans with this protein,
the closest gene
geographically.
And then, because they do kind of a deep dive into surrounding genes, as you say, Dale,
there could be, you know, surrounding
genes that might be implicated.
They look at those other surrounding genes
and they ask, you know,
what's the shortest pathway
back to that protein?
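The "shortest pathway back to that protein" step can be illustrated as a breadth-first search over a protein-protein interaction graph, such as one assembled from STRING interactions. A minimal sketch on a tiny made-up graph: the node names are placeholders, and a real analysis would likely weight edges by interaction confidence, which this unweighted search ignores.

```python
from collections import deque
from typing import Optional

def shortest_path(graph: dict[str, set[str]], start: str, target: str) -> Optional[list[str]]:
    """Breadth-first search: return the fewest-hop path from start to target, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):  # sorted for reproducible output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical undirected interaction graph (placeholder names, not real proteins)
edges = [("GENE_NEAR_PQTL", "P1"), ("P1", "P2"), ("P2", "TARGET"),
         ("GENE_NEAR_PQTL", "P3"), ("P3", "TARGET"), ("P4", "P5")]
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(shortest_path(graph, "GENE_NEAR_PQTL", "TARGET"))  # ['GENE_NEAR_PQTL', 'P3', 'TARGET']
print(shortest_path(graph, "GENE_NEAR_PQTL", "P5"))      # None
```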
And that is a fascinating conversation,
because once you put together
a pathway analysis like that
and we talk about different diseases,
now you've got some pathways
in, say, Alzheimer's disease
and you've got some pathways in, say, schizophrenia.
I'm just picking
two neurological diseases.
And now imagine a Venn diagram
of the pathways those two have in common.
That is
an opportunity for us to understand
the mechanistic biology
that's in common between those two
neurological diseases, if any.
You know, I'm just picking those
out of the air.
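At its core, the Venn-diagram idea is a set intersection. A minimal sketch with invented pathway labels (placeholders, not findings for Alzheimer's disease or schizophrenia), adding the Jaccard index as one common way to summarize how much two sets overlap.

```python
def pathway_overlap(a: set[str], b: set[str]) -> tuple[set[str], float]:
    """Return the shared pathways and the Jaccard index |A & B| / |A | B|."""
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard

# Placeholder pathway sets implicated for two hypothetical diseases
disease_1 = {"pathway_A", "pathway_B", "pathway_C"}
disease_2 = {"pathway_B", "pathway_C", "pathway_D", "pathway_E"}
shared, jaccard = pathway_overlap(disease_1, disease_2)
print(sorted(shared), round(jaccard, 2))  # ['pathway_B', 'pathway_C'] 0.4
```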
If we can return to that Folkersen
landmark paper...
Mm-hmm.
So, if I understand correctly,
there were 90 proteins.
How many tens of thousands of samples?
30,000 samples. Just over.
Okay, so 30,000 samples times 90 proteins.
And they also had whole genome data
on those 30,000 individuals.
Is that right?
They had genetic data.
So I don't know that.
Remember, this is a cohort of cohorts.
So I think they had
GWAS data, or genotyping data,
you know, array data on some of those
and sequencing data on others.
I wouldn't want to misrepresent that,
but my guess is that they
had genetic variation data in common.
Right.
Because you can convert
a whole genome sequencing dataset to a list of variants.
Understood. And yeah. Right.
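Pooling a cohort of cohorts means reducing each cohort's genetic data (array genotypes or sequencing calls) to a common list of variants. A minimal sketch, assuming each variant is keyed by chromosome, position, reference allele and alternate allele, and that every cohort already uses the same reference genome build; real harmonization also handles strand flips and swapped alleles, which this skips. The call sets are hypothetical.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)

def shared_variants(*cohorts: set[Variant]) -> set[Variant]:
    """Variants present in every cohort's call set."""
    if not cohorts:
        return set()
    common = set(cohorts[0])
    for calls in cohorts[1:]:
        common &= calls
    return common

# Hypothetical call sets: an array-genotyped cohort and a sequenced cohort
array_cohort = {("1", 1000, "A", "G"), ("2", 5000, "C", "T"), ("9", 7000, "G", "A")}
sequenced_cohort = {("1", 1000, "A", "G"), ("9", 7000, "G", "A"), ("9", 7100, "T", "C")}
print(sorted(shared_variants(array_cohort, sequenced_cohort)))
# [('1', 1000, 'A', 'G'), ('9', 7000, 'G', 'A')]
```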
So they had all the genetic data
of 30,000 individuals.
They looked at these 90 proteins
and then you mentioned that they're able
to connect it to disease.
Yeah. So you do the same thing.
This looked at relationships
between genetic,
you know, state and protein levels.
So you look for all those correlations.
In this paper, they found 450 pQTLs that exceeded
their significance threshold.
And, you know,
as you touched on before,
that's why we have peer review,
to make sure
that we're held accountable
for the number of tests that we're doing,
that, you know,
we're really trying to be
as transparent as possible
in publishing these data.
And by the way, it was published
in Nature Metabolism in 2020.
So once you
see all the correlations, imagine
you have this list of correlations.
You can then layer the clinical data in.
So now you know the disease information,
and you can look
at these different sets of data,
genetics, proteomics and disease,
and you can sample from these
and determine how often
the relationships
between these three would
happen by chance alone.
If they would happen by chance alone
quite often,
then we let that fall away.
If it seems quite unusual
to see these relationships,
then those are the ones that we elevate
to potential causality.
And so in this paper, from the 450
relationships, or correlations,
they elevated 25
that they suggest appear causal.
And some of those examples,
I think, are validated;
I know some are validated clinical targets
for existing therapies. Super exciting,
because then it's like, oh, looks like
we're on the right track, right?
And then, of course, some novel findings.
So they report 14 validated
clinical targets,
known clinical targets, like CASP-8 in breast cancer, which was one of them that
I can think of.
So CASP-8 was already known to be involved in breast cancer?
That's right. And then they rediscovered it?
Yeah, CASP-8 is a known therapeutic target in breast cancer.
I see.
I see.
And then 11 of those were novel,
so they were not able to find any prior evidence
for 11 of their findings that they elevated,
again, to causality, potential causality.
And those are the exciting ones
for new programs, potentially.
And then
they reported
18 potential repurposing opportunities.
So that's super exciting to me,
because if you've got an existing drug
for one indication, say tocilizumab for rheumatoid arthritis,
and you have
the possibility of then
using it in a different indication,
that would be a repurposing opportunity.
So, for example, in eczema.
I guess it
doesn't make sense, does it, to think about
using an anti-rheumatoid arthritis drug
Right.
that's on the market to treat eczema?
That's just, I mean...
There's one in clinical trials.
I mean, coming back to the cohorts, Cindy,
I think also
the fact that these cohorts are
from different geographical places
increases the possibility to illuminate, for example, biases in SNPs.
Right.
Did you have any discussion with the authors about that?
Did they ever consider
that geographical bias
may influence their data?
Can you comment on this?
Yeah, it's a great question.
So they primarily
represent northern European populations.
There was some representation
of Asian populations in there,
but not a lot.
And I'm trying to remember; I don't think
there was any African diaspora
in the subset of samples
that they had in this milestone paper.
So that's,
you know, it's a blessing and a curse
for them.
It eases the analysis, to your point,
but for the opportunity to make discoveries
because of diversity
within the ancestry of our genomes,
it's a "miss", right?
And an enormous potential future
opportunity, which I think
is very exciting and very important
for equity in health care.
I mean, essential.
So we
have to start somewhere, though, right?
We start with the populations
that we have.
It's fascinating thinking about
the 90 proteins,
all the different things that were discovered.
Right, these 25 drug targets.
That explains
the pharma interest in the UK Biobank.
By doing the extrapolation...
Have you done the extrapolation?
How many drug targets do they expect? Yeah.
So with this, you know, it's around five and a half percent of the pQTLs
discovered in Folkersen
et al. that converted
to, you know, potentially causal. Interesting.
So if we applied that same percentage,
which is lofty, right, because this
is a lot of proteins, and
these 90 in Folkersen et al. were well
studied, you know, considering across
30,000 samples, so, you know, I would
expect maybe four, four
and a half percent to maybe
5% converting
in this initial set of proteins, to be a little conservative,
you know, not trying to be
too bullish. But even with that, we're
talking about potentially listing off
causal markers to examine, to investigate,
potentially causal markers of,
you know, around 500.
So that's...
Five hundred potential drug targets?
Potential therapeutic targets.
That's right.
And to be fair, some of these might
show up as potential therapeutic targets
that would never be considered if they're
in signaling pathways, for example.
So it's up to pharma,
and certainly people that
are more versed in
clinical trials and potential,
you know, pathways
for these and implications of side effects,
to then score these up and down.
But the exciting aspect of this
is to have a systematic approach
by which to do that,
to actually make that list of 500
and then score some up and start programs.
Because we like to say
that clinical trials are twice
as likely to be successful
if you go into that trial
with genetic information;
that's certainly, you know, been published.
And with adding proteomic data,
I'd really love to see
what that means for our potential
for improving our ability
to be successful in clinical trials.
And these 13, I guess: if you take those 500 targets (or potential drug targets)
divided by 13 different pharma partners,
that's like, what, 38 apiece.
Yeah, that's right.
That's a lot of programs.
That's a lot of programs.
I mean, that's going to be
a wealth of data for them.
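The extrapolation in this exchange is simple arithmetic, reproduced here with the figures quoted in the conversation: 25 potentially causal findings out of roughly 450 pQTLs in Folkersen et al., the more conservative 4.5 to 5% conversion range suggested above, roughly 10,200 associations in the UK Biobank preprint, and 13 pharma partners. These are the speakers' rough figures, not predictions from either paper.

```python
folkersen_pqtls = 450      # "a little over 450"
folkersen_causal = 25      # elevated as potentially causal
ukb_associations = 10_200  # "around 10,200" in the UK Biobank preprint
pharma_partners = 13

observed_rate = folkersen_causal / folkersen_pqtls  # about 5.6%
low_rate, high_rate = 0.045, 0.05                   # conservative range from the discussion
targets_low = ukb_associations * low_rate           # about 459
targets_high = ukb_associations * high_rate         # about 510

print(f"Folkersen et al. conversion: {observed_rate:.1%}")
print(f"Extrapolated targets: {targets_low:.0f} to {targets_high:.0f}")
print(f"Per partner, at 500 targets: {500 / pharma_partners:.1f}")
```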
Now I understand why they would
invest in such a project.
What is the next step,
then, in the UK Biobank project,
and how can people find out more about it?
Good question. Yeah.
So here's what I fully expect:
I know of at least eight
abstracts that have been submitted
for ASHG this year.
Now ASHG, the American Society
of Human Genetics,
as I mentioned earlier, will be
in Los Angeles in October.
And so I know that those pharma partners
and the researchers within those pharma
partners are submitting abstracts
to present there,
and I'm sure some of them will get
oral presentations.
Many of them will get poster
presentations.
But I will be keeping a close eye on that,
and I will absolutely be there.
And I think we should do a podcast
episode.
There you go.
You'll have a post-ASHG
"This is what I got out of it."
That would be great.
And maybe drag a few guests on,
if we can.
That's great. Yeah, that'd be great. Yeah.
And as far as what I think is next, I think
they're going to be digging into
these correlations, 85% of them novel,
so roughly 8,700 novel relationships
between genetic regions
and protein levels.
They're going to be looking into
which of those appear
causal within certain diseases.
Do you know when that will be available,
as publicly available data?
How can scientists get access to
these?
Is it an easy process or a difficult process
to get access to that?
Yeah.
So as you probably know, Sarantis,
but our listeners may not know, the UK
Biobank data, through a data use agreement,
are broadly available.
So this is one of the reasons there's
so much use of those data as validation
data and for discoveries, with very clever
informatics scientists and biologists
thinking of
creative ways to use such a large dataset.
The proteomics data,
the first set of proteomics data,
so the first 1,500 proteins, the subject of the June bioRxiv paper:
it has been stated
that those data will be publicly available
by the end of the year.
So I expect, you know, by October at ASHG
we'll know the timing
for that better.
Yeah, those pharma partners
of course have had access to those data,
as they should, which is why
they were able to publish
that paper so quickly.
And so the next tranche of data
is for the full 3,000 proteins.
And can I just say, you know,
you see what's possible
with 90 proteins in Folkersen et al.
Imagine what's possible,
you know, with
3,000 proteins and 54,000 individuals.
That's a lot of power to detect
relationships between proteins,
and for many proteins
that really just haven't had assays
for examining them.
Just such an opportunity
for discovery.
We touched upon, yeah,
we touched upon the enormous investment
made to date to collect these 500,000 samples.
That's right.
And to follow up,
and all the genetics.
Yeah.
The whole genome,
whole exome data on all these individuals,
and now overlaying, empowering
the genomics with the proteomics.
It's as if we're a part of something:
the next
big thing in genetics is proteomics.
I think it's, you know,
when you think about the
central dogma of biology, right?
You've got DNA,
RNA. We've done
a great job of looking at DNA.
RNA has been our proxy for biology
for a long time, because it was
available
to look at with sequencing technologies.
In fact, you and I, Dale, I think, have talked about how RNA-Seq
and the ability to do what we call "digital gene expression" sold
many of those initial
next-generation sequencing instruments.
But now we have
this ability to measure
proteins directly in a
very scalable way.
And I am excited, as you
know, about this capability,
but it's really the researchers
and what they can do with it
that will tell us the true
potential of this. Super.
Well, thank you, Cindy,
for sharing your thoughts
on empowering genomics with proteomics.
And we'll see you soon.
That was great.
Thank you very much.
Thank you for listening to the Proteomics in Proximity podcast brought to you
by Olink Proteomics. To contact the hosts or for further information
simply email: info@olink.com.