1
00:00:06,458 --> 00:00:10,500
Welcome to the Proteomics and Proximity podcast,
where your co-hosts Dale Yuzuki,

2
00:00:10,500 --> 00:00:13,916
Cindy Lawley and Sarantis Chlamydas
from Olink Proteomics

3
00:00:14,250 --> 00:00:18,458
talk about the intersection of proteomics
with genomics for drug target discovery,

4
00:00:18,708 --> 00:00:22,750
the application of proteomics
to reveal disease biomarkers and current

5
00:00:22,750 --> 00:00:26,875
trends in using proteomics
to unlock biological mechanisms.

6
00:00:27,166 --> 00:00:30,583
Here we have your hosts, Dale,
Cindy and Sarantis.

7
00:00:30,958 --> 00:00:35,375
Thank you for joining us on the Proteomics
and Proximity Podcast.

8
00:00:35,708 --> 00:00:38,583
I'm your host Dale Yuzuki with my co-hosts
Cindy Lawley

9
00:00:38,916 --> 00:00:41,458
and my other co-host, Sarantis Chlamydas.

10
00:00:42,166 --> 00:00:42,625
Great.

11
00:00:42,625 --> 00:00:47,500
This morning we are talking about empowering
genomics with proteomics.

12
00:00:48,000 --> 00:00:50,625
And Cindy,
I'd like to ask you the question.

13
00:00:50,875 --> 00:00:54,708
If I'm involved in genomics, why should I add proteomics?

14
00:00:55,583 --> 00:00:58,208
Well, I'm so happy that you asked me about that, Dale.

15
00:00:58,500 --> 00:01:03,666
So, as you know, I tell this story
quite a bit, and so I'm delighted

16
00:01:03,833 --> 00:01:08,875
to do it in the context with both of you,
because you add so much to this.

17
00:01:08,875 --> 00:01:12,958
But you know, we've got a big project
going with the UK Biobank,

18
00:01:14,541 --> 00:01:15,166
the UK

19
00:01:15,166 --> 00:01:18,250
Biobank,
of course, being one of the largest

20
00:01:18,833 --> 00:01:21,833
nationally associated population biobanks

21
00:01:21,833 --> 00:01:25,208
in the world with clinical, genetic

22
00:01:25,541 --> 00:01:29,583
and now, on a subset of over
54,000 samples

23
00:01:30,250 --> 00:01:32,750
of proteomic data.

24
00:01:33,041 --> 00:01:35,666
It's an exciting time,
I think, to demonstrate

25
00:01:35,666 --> 00:01:40,708
in large populations the value of layering
proteomics onto genomics.

26
00:01:40,708 --> 00:01:44,041
And of course we've, across
many cohorts, invested

27
00:01:44,041 --> 00:01:49,041
a lot in genomics
as the costs have gone down.

28
00:01:49,541 --> 00:01:53,750
Now to back up a little bit, the UK
Biobank tell me a little bit about it.

29
00:01:54,625 --> 00:01:55,458
Sure.

30
00:01:55,500 --> 00:02:00,875
So, yeah, as the name implies,
it's based in the UK,

31
00:02:01,250 --> 00:02:05,000
it's affiliated with,
you know, the UK has,

32
00:02:05,000 --> 00:02:08,625
you know, one of the largest single payer
health care systems in the world.

33
00:02:09,083 --> 00:02:12,208
And having a population based biobank

34
00:02:12,583 --> 00:02:16,250
primarily
of northern European descent or ancestry,

35
00:02:16,250 --> 00:02:20,125
but certainly representing Asian
as well as African

36
00:02:20,500 --> 00:02:24,083
and African diaspora descent, as well as
Pakistani descent.

37
00:02:24,083 --> 00:02:26,708
There's quite a nice subset of diversity

38
00:02:26,708 --> 00:02:29,875
in the Biobank,
but primarily it's northern European.

39
00:02:30,625 --> 00:02:34,208
You know, it was
started, what, I think 20 years ago.

40
00:02:34,208 --> 00:02:36,000
I actually should know
that off the top of my head.

41
00:02:36,000 --> 00:02:42,166
But it was started with the promise
of being able to characterize the value

42
00:02:42,583 --> 00:02:46,083
of longitudinal information
to health care.

43
00:02:46,666 --> 00:02:51,125
And I think Eric
Topol says that it takes about 20 years

44
00:02:51,125 --> 00:02:55,041
on average to move
something from discovery to the clinic.

45
00:02:55,875 --> 00:02:59,958
He uses the example of the stethoscope

46
00:03:00,750 --> 00:03:04,166
and moving the stethoscope into the clinic
took 17 years.

47
00:03:04,333 --> 00:03:08,000
That seems like a pretty simple mechanism,
right?

48
00:03:08,000 --> 00:03:09,708
Listening to your heart.

49
00:03:09,708 --> 00:03:13,333
But yet it took a long time for it
to be demonstrated and approved

50
00:03:13,875 --> 00:03:16,041
and to get into the clinic
and routine use.

51
00:03:16,583 --> 00:03:21,125
So by longitudinal, then you mean
I think they recruit what, half a million

52
00:03:21,125 --> 00:03:25,041
individuals, and longitudinal means,
what, they follow them over time?

53
00:03:25,375 --> 00:03:27,625
It means that
they're able to call them back.

54
00:03:27,625 --> 00:03:28,375
So they're able

55
00:03:28,375 --> 00:03:32,583
they have medical access to their,
you know, clinical data over time.

56
00:03:32,583 --> 00:03:38,208
They understand
over time what are, you know, what are

57
00:03:39,833 --> 00:03:42,291
the outcomes within this population.

58
00:03:42,583 --> 00:03:47,166
And I think that's incredibly valuable
that consenting has changed over time.

59
00:03:47,166 --> 00:03:50,250
So the ability to actually call back
wasn't

60
00:03:50,958 --> 00:03:53,750
initially
in many of these biobanks. Right.

61
00:03:53,750 --> 00:03:58,416
And so, I think of FinnGen as one of those, another biobank

62
00:03:58,416 --> 00:04:02,500
that's based in Finland,
a population health biobank as well.

63
00:04:03,166 --> 00:04:07,791
They really did sort of lead the way
with some of the ability to share data

64
00:04:08,333 --> 00:04:12,541
and protect it at the same time,
you know, primarily focused on genetics

65
00:04:12,875 --> 00:04:17,708
and I would say UK Biobank as well
has led the way in these abilities

66
00:04:17,708 --> 00:04:22,041
to work with both private
and public partnerships.

67
00:04:22,041 --> 00:04:26,125
So being able to work with pharma
and in this case

68
00:04:26,125 --> 00:04:29,708
with the proteomic data,
this initial set of proteomic data

69
00:04:30,208 --> 00:04:35,250
that was initially started with ten pharma
partners.

70
00:04:35,750 --> 00:04:38,416
The model was interestingly

71
00:04:38,416 --> 00:04:41,291
based upon
the exome sequencing consortium.

72
00:04:41,916 --> 00:04:46,583
So a group of pharma partners
came together in order to get some

73
00:04:47,000 --> 00:04:51,958
to get the participants within the UK
Biobank exome sequenced.

74
00:04:52,500 --> 00:04:58,250
There's of course also I think it's
150,000 individuals in the UK Biobank

75
00:04:58,250 --> 00:05:01,875
that are also whole genome sequenced,
which is pretty phenomenal, right?

76
00:05:02,166 --> 00:05:04,083
That's a huge number.

77
00:05:04,083 --> 00:05:05,666
So, a massive management.

78
00:05:05,666 --> 00:05:10,166
And yeah, of all of those dollars and time

79
00:05:10,583 --> 00:05:15,500
and the potential for building
analysis tools with such large datasets.

80
00:05:15,500 --> 00:05:18,416
Right. That,
you know, goes without saying.

81
00:05:18,416 --> 00:05:21,791
I remember I think it was 2018,
I was at ASHG,

82
00:05:21,791 --> 00:05:24,208
which is the American Society
of Human Genetics,

83
00:05:24,750 --> 00:05:29,208
and I was blown away by all of the talks
that were referencing the UK

84
00:05:29,250 --> 00:05:32,958
Biobank data,
building tools, having discoveries.

85
00:05:32,958 --> 00:05:33,416
Right.

86
00:05:33,500 --> 00:05:37,083
I'm excited to see that same

87
00:05:38,041 --> 00:05:40,666
evolution in

88
00:05:40,666 --> 00:05:43,250
in the discussion

89
00:05:43,250 --> 00:05:47,250
with crowdsourcing
these data with proteomics.

90
00:05:48,125 --> 00:05:51,583
So to get back to your original question,
Dale, what's the value?

91
00:05:52,208 --> 00:05:56,541
I think Karsten Suhre would say that
when you have genetic data

92
00:05:56,666 --> 00:05:59,750
and you have disease,
that there's a certain power

93
00:05:59,750 --> 00:06:02,333
you have to detect the relationship
between the two.

94
00:06:03,041 --> 00:06:06,750
And in some cases,
we have smoking guns like BRCA. Right.

95
00:06:06,750 --> 00:06:11,000
So we're able to see that there's
a lot of penetrance for a variant that

96
00:06:11,291 --> 00:06:15,041
shows up and has a lot of influence
on predisposition to disease.

97
00:06:15,875 --> 00:06:19,583
But for most diseases, it's death
by a thousand cuts, right?

98
00:06:19,583 --> 00:06:24,125
Small amounts of influence.

99
00:06:24,625 --> 00:06:27,208
Cindy, this UK
Biobank sounds so interesting.

100
00:06:27,208 --> 00:06:29,541
What can you tell me more about it?

101
00:06:29,541 --> 00:06:30,208
Yeah. So.

102
00:06:30,208 --> 00:06:35,708
So the UK Biobank itself is a longitudinal collection of data

103
00:06:35,750 --> 00:06:38,875
I think it started in the mid-2000s.

104
00:06:38,875 --> 00:06:45,083
So around 2006 they targeted an age group,
sort of middle age group

105
00:06:45,416 --> 00:06:47,208
and they followed them over time

106
00:06:47,208 --> 00:06:51,083
and so it's over
half a million individuals within the UK.

107
00:06:51,166 --> 00:06:56,333
Yeah, quite an undertaking to enroll
all those participants.

108
00:06:56,333 --> 00:07:01,041
And what I will never forget
is when I attended ASHG

109
00:07:01,666 --> 00:07:06,708
the American Society of Human Genetics
in 2018

110
00:07:07,416 --> 00:07:11,208
the number of talks
I remember searching on UKBB

111
00:07:11,375 --> 00:07:15,166
as just a short
acronym, and the number of talks

112
00:07:15,166 --> 00:07:19,583
talking about using the UKB
data, particularly genetic data

113
00:07:20,291 --> 00:07:22,708
for validating clinical findings

114
00:07:22,708 --> 00:07:25,791
was... there were a lot of talks.

115
00:07:25,791 --> 00:07:29,291
So it's always been high on Illumina's

116
00:07:29,291 --> 00:07:33,083
radar and it's very high
on all of those sequencing

117
00:07:33,083 --> 00:07:37,458
technology innovators' radars,

118
00:07:38,500 --> 00:07:41,208
like Thermo Fisher, of course,
and all of those library

119
00:07:41,208 --> 00:07:44,750
prep companies that have different
methods of library prep.

120
00:07:45,000 --> 00:07:47,666
Have they already sequenced everybody?

121
00:07:47,666 --> 00:07:52,416
So they've got a whole genome sequence
on, I think it's about 150,000 individuals

122
00:07:52,416 --> 00:07:56,500
that was primarily
led by deCODE genetics.

123
00:07:56,500 --> 00:07:58,583
As I understand it,

124
00:07:58,583 --> 00:08:01,750
the publication is pretty recent actually.

125
00:08:01,791 --> 00:08:03,833
So it's a

126
00:08:04,291 --> 00:08:06,208
there's a lot to dig into.

127
00:08:06,208 --> 00:08:09,666
It's a pretty phenomenal dataset.

128
00:08:10,000 --> 00:08:16,083
The bulk of the sequencing for most
samples, I believe is exome sequencing.

129
00:08:16,083 --> 00:08:17,000
That's my understanding.

130
00:08:17,000 --> 00:08:20,541
So I think it's over
450,000 individuals.

131
00:08:20,541 --> 00:08:24,125
So I think for some they've got both.

132
00:08:24,208 --> 00:08:26,541
And since it's a single payer system

133
00:08:26,541 --> 00:08:29,625
with the electronic records of NHS,

134
00:08:29,958 --> 00:08:33,708
that means they can drill down
into exactly, right,

135
00:08:33,708 --> 00:08:38,750
their whole exome sequence and whatever condition they may have.

136
00:08:38,750 --> 00:08:40,083
And this is an ongoing thing.

137
00:08:40,083 --> 00:08:42,541
Is that right? Cancer, diabetes?

138
00:08:42,833 --> 00:08:43,375
That's right.

139
00:08:43,375 --> 00:08:46,500
And the ability to actually return results
to those patients,

140
00:08:46,500 --> 00:08:48,583
I think has evolved over time. Right.

141
00:08:48,583 --> 00:08:49,666
Because that costs money

142
00:08:49,666 --> 00:08:53,750
to set expectations,
make sure that we're communicating,

143
00:08:55,166 --> 00:08:57,250
you know, in a way that's best practices.

144
00:08:57,250 --> 00:09:00,625
So I think that the UK
Biobank has spearheaded

145
00:09:00,625 --> 00:09:03,666
a lot of our understanding about

146
00:09:03,666 --> 00:09:06,666
best practices there as well.

147
00:09:06,833 --> 00:09:08,666
And as far as.... Go ahead.

148
00:09:08,666 --> 00:09:11,166
Oh, I was just going to say so
back to your original question

149
00:09:11,166 --> 00:09:14,333
about what's the value of layering
proteomics onto genomics.

150
00:09:14,750 --> 00:09:20,000
This is a great example where an enormous
amount of investment has gone

151
00:09:20,000 --> 00:09:24,083
into collecting genetic information
for this very valuable population,

152
00:09:24,708 --> 00:09:30,541
with advances in diagnostics,
in guiding cancer treatment.

153
00:09:30,541 --> 00:09:33,958
So a lot of these advances
have made it to the clinic already,

154
00:09:34,000 --> 00:09:36,041
which is pretty phenomenal.

155
00:09:36,041 --> 00:09:39,416
And that's been driven,
you know, globally. It's exciting

156
00:09:39,791 --> 00:09:42,416
where proteomics fits in
or the way that I think of it.

157
00:09:42,416 --> 00:09:43,291
I was

158
00:09:44,083 --> 00:09:48,958
I had a conversation with Karsten Suhre,
who's at Weill Cornell in Qatar

159
00:09:48,958 --> 00:09:54,041
and in New York, and he really...
I had an aha moment with him.

160
00:09:54,041 --> 00:09:58,916
He essentially would say that
an intermediate phenotype like proteomics

161
00:09:59,625 --> 00:10:03,625
acts to magnify the effect
between genetics and disease.

162
00:10:03,625 --> 00:10:04,166
So, of course,

163
00:10:04,166 --> 00:10:07,916
we've been looking for these associations
between genetics and disease

164
00:10:09,000 --> 00:10:10,958
since
we've been collecting genetic information.

165
00:10:10,958 --> 00:10:13,041
And some of those

166
00:10:14,125 --> 00:10:16,166
links are hard to see because

167
00:10:16,166 --> 00:10:19,375
we need so many samples
to be able to see them.

168
00:10:19,791 --> 00:10:23,416
And so as we've increased the numbers
of samples like in the UK Biobank,

169
00:10:23,666 --> 00:10:27,208
we're able to make these associations
more clearly.

170
00:10:27,958 --> 00:10:32,583
And I'll say, you know, we wished early on
we hoped for smoking

171
00:10:32,583 --> 00:10:36,291
guns for a lot of diseases
and we did see a few of them.

172
00:10:36,291 --> 00:10:36,625
Right.

173
00:10:36,625 --> 00:10:41,833
So there's certainly PCSK9
for familial hypercholesterolemia.

174
00:10:41,833 --> 00:10:44,750
There are some standard ones, BRCA for breast cancer.

175
00:10:45,625 --> 00:10:50,666
There are some examples where
we have a lot of penetrance or a lot of,

176
00:10:50,666 --> 00:10:55,208
you know, a lot of effect on someone's
likelihood of getting a disease

177
00:10:55,875 --> 00:11:01,083
from single variants
or single loci or single genes.

178
00:11:01,083 --> 00:11:04,708
But for most diseases like Type 2 diabetes, cardiovascular disease

179
00:11:04,708 --> 00:11:08,458
this is a death by a thousand cuts, meaning lots of variants

180
00:11:08,458 --> 00:11:12,291
give a little tiny effect
in changing our risk.

181
00:11:12,291 --> 00:11:16,083
And so that's
where having the ability

182
00:11:16,083 --> 00:11:19,250
to amplify
or to put a magnifying glass on those

183
00:11:19,625 --> 00:11:24,208
relationships between genetics
and disease is incredibly useful.

184
00:11:24,208 --> 00:11:25,333
So I think, you know, a

185
00:11:25,333 --> 00:11:29,375
ton of work has been done in proteomics
and cardiovascular disease.

186
00:11:29,375 --> 00:11:32,250
And I think many advances have happened there.

187
00:11:32,250 --> 00:11:35,541
Now getting back to the UK
Biobank.

188
00:11:35,541 --> 00:11:38,833
You mentioned before that recently
they were working with Olink

189
00:11:39,000 --> 00:11:42,916
then to look at the proteome of tens
of thousands of individuals.

190
00:11:42,916 --> 00:11:43,916
Yeah. Yeah.

191
00:11:43,916 --> 00:11:47,875
I would say it's not just the UK Biobank,
but 13 pharma partners.

192
00:11:47,875 --> 00:11:48,166
Right.

193
00:11:48,166 --> 00:11:53,875
So, so it certainly required consent
and partnership with the UK Biobank,

194
00:11:54,208 --> 00:11:55,875
just like the exome sequencing

195
00:11:55,875 --> 00:11:59,458
consortium was done in collaboration
with the UK Biobank.

196
00:11:59,833 --> 00:12:03,625
But the access to the technology,
just like with the exome

197
00:12:03,625 --> 00:12:07,541
sequencing consortium, access
to the technology was spearheaded

198
00:12:07,541 --> 00:12:10,250
by pharma partners that were very keen

199
00:12:10,791 --> 00:12:15,750
to build a structure for a more,
I like to say, a systematic approach

200
00:12:16,083 --> 00:12:19,791
to therapeutic target discovery,
not only biomarker discovery,

201
00:12:19,791 --> 00:12:23,541
which is sort of traditional proteomics,
but to therapeutic target discovery,

202
00:12:23,541 --> 00:12:28,291
which I think is enabled by genetics,
proteomics as well as clinical data.

203
00:12:29,000 --> 00:12:31,750
So we're getting back to this idea
of empowering genomics

204
00:12:31,750 --> 00:12:33,541
with proteomics, right?

205
00:12:33,541 --> 00:12:35,625
What can you tell me about that?

206
00:12:35,625 --> 00:12:35,958
Yeah.

207
00:12:35,958 --> 00:12:37,166
So I think there's this,

208
00:12:37,166 --> 00:12:41,083
you know, I immediately think of that
Karsten Suhre magnifying glass, right?

209
00:12:41,791 --> 00:12:44,291
But the, the UK Biobank’s

210
00:12:44,333 --> 00:12:48,125
initial findings
which are in a preprint that came out

211
00:12:48,125 --> 00:12:52,041
in June, middle of June, that’s on bioRxiv.

212
00:12:52,375 --> 00:12:56,250
Their initial paper
really just was scratching

213
00:12:56,250 --> 00:12:59,875
the surface of what's possible
with this enormous dataset.

214
00:12:59,875 --> 00:13:04,000
So their first paper
was about 1500 proteins.

215
00:13:04,000 --> 00:13:08,000
So our first product,
Olink's first product

216
00:13:08,000 --> 00:13:10,791
on the Explore platform that has the NGS readout.

217
00:13:11,375 --> 00:13:16,375
So they used that first tranche of proteins
across 54,000 samples

218
00:13:16,916 --> 00:13:19,583
and, really, the bulk of

219
00:13:19,583 --> 00:13:24,416
those data are to look at correlations
between gene regions,

220
00:13:24,958 --> 00:13:27,208
you know,
and the genotypes in those gene regions,

221
00:13:27,541 --> 00:13:28,666
and protein levels.

222
00:13:28,666 --> 00:13:30,750
So really just looking
what are the correlations?

223
00:13:30,750 --> 00:13:34,708
What's the list of all the possible
relationships between genetic regions

224
00:13:35,291 --> 00:13:38,458
and protein levels
that might be elucidated

225
00:13:38,666 --> 00:13:41,916
and examined
further in this beautiful dataset?

226
00:13:42,916 --> 00:13:43,541
What I

227
00:13:43,541 --> 00:13:46,791
you know, and I'll speculate on what
I think they're going to be doing next

228
00:13:46,791 --> 00:13:51,291
and what my guess is that
they're very deep into doing this within

229
00:13:52,041 --> 00:13:56,125
these companies is to
then do Mendelian randomization,

230
00:13:56,125 --> 00:13:58,375
which is a statistical approach to kind of

231
00:13:58,916 --> 00:14:02,458
determine which of these relationships,
which of these correlations

232
00:14:02,458 --> 00:14:05,875
between gene regions and
protein levels.

233
00:14:06,416 --> 00:14:10,250
You know, when you bring
in the clinical data on disease,

234
00:14:10,625 --> 00:14:15,500
which of these hold up as being unlikely
to be happening by chance alone?

235
00:14:15,958 --> 00:14:21,500
So now you sort of have the the ones that
are likely just, you know, coincidence.

236
00:14:21,500 --> 00:14:24,833
I mean, they might still be important,
but let's just... I think pharma

237
00:14:24,833 --> 00:14:29,750
would like to have
ten great targets over 400

238
00:14:30,750 --> 00:14:31,375
unsure

239
00:14:31,375 --> 00:14:34,375
targets, because
that's a lot of rabbit holes to go down.

240
00:14:34,375 --> 00:14:37,041
So if they can narrow it down, then

241
00:14:37,500 --> 00:14:42,041
then I think there's a lot of excitement
around being able

242
00:14:42,041 --> 00:14:46,958
to have some quick wins with proteomics,
genomics and clinical data.

243
00:14:47,208 --> 00:14:49,208
Well, to back up just a little bit,

244
00:14:50,000 --> 00:14:53,125
you're talking about
Mendelian randomization, you're talking

245
00:14:53,125 --> 00:14:59,125
about genomic data in terms of a whole
exome of 10,000 people or 50,000 people,

246
00:14:59,791 --> 00:15:02,625
and now you're talking about 1500
proteins.

247
00:15:02,875 --> 00:15:04,875
Can you walk me through that a little bit?

248
00:15:04,875 --> 00:15:05,583
Yeah, sure.

249
00:15:05,583 --> 00:15:10,166
So when you're looking
at the genetic data and here we've got,

250
00:15:11,416 --> 00:15:14,458
you know, some whole genome sequencing
data as well as exome sequencing data.

251
00:15:14,458 --> 00:15:18,750
So you can imagine you have a list of ways
or places in the genome almost like

252
00:15:19,583 --> 00:15:24,458
geographic locations,
almost GPS coordinates

253
00:15:24,500 --> 00:15:28,833
on the chromosomes where we know
they vary across the samples.

254
00:15:29,125 --> 00:15:32,791
So those variable regions, we’ll call them SNPs, you know,

255
00:15:33,000 --> 00:15:37,000
that's the term that we use
for the simplest kind of variation,

256
00:15:37,000 --> 00:15:39,750
just single base pair variation,
but we'll just call them SNPs

257
00:15:40,041 --> 00:15:43,458
because there can be other kinds
of variation that are captured there too.

258
00:15:43,791 --> 00:15:48,208
But if you look at the variants, it's
just a single variant within the genome.

259
00:15:48,666 --> 00:15:50,125
You can look at the

260
00:15:51,166 --> 00:15:53,958
representation of what people's
genotype

261
00:15:53,958 --> 00:15:58,625
is at that location,
and you can look at every single protein

262
00:15:58,625 --> 00:16:01,916
in that 1500 protein list and see,

263
00:16:01,916 --> 00:16:06,208
do we have a significant correlation
between the genotype

264
00:16:06,791 --> 00:16:08,000
and the protein level?

265
00:16:08,000 --> 00:16:09,500
So that's sort of the first step.

266
00:16:09,500 --> 00:16:12,250
That's a lot of tests, right? Comparisons.

267
00:16:12,833 --> 00:16:16,500
Cindy, so I understand these SNPs can also be outside of the gene.

268
00:16:16,500 --> 00:16:17,750
Right? Would that also make it...

269
00:16:17,750 --> 00:16:21,375
So they could be regulatory regions. Absolutely. It’s not necessarily falling into the gene.

270
00:16:21,625 --> 00:16:25,250
Is there any threshold to what they are checking, what region they check in the gene?

271
00:16:25,625 --> 00:16:26,041
Yeah.

272
00:16:26,041 --> 00:16:31,583
So there's both a statistical threshold
that they accept as a standard.

273
00:16:31,583 --> 00:16:35,833
But also when you're doing so many tests,
you have to correct

274
00:16:35,833 --> 00:16:37,083
for multiple tests, right?

275
00:16:37,083 --> 00:16:40,375
Because the more tests you do,
the more you're increasing your chances

276
00:16:40,375 --> 00:16:43,083
of seeing a false positive.
So adjusting for that

277
00:16:44,333 --> 00:16:45,750
is something that, you know,

278
00:16:45,750 --> 00:16:50,166
we go through peer review to make sure
we have best practices and agreement on.

279
00:16:50,375 --> 00:16:53,541
I mean, these statistical
associations are massive.

280
00:16:53,541 --> 00:16:57,666
I mean, in a given single individual's
whole genome, you're looking at maybe

281
00:16:57,666 --> 00:16:59,916
four million SNPs? Yeah, right.

282
00:17:00,291 --> 00:17:02,958
You have 4 million SNPs and then you’ve got

283
00:17:03,333 --> 00:17:07,375
1500 proteins you're associating
those with, if I understand you correctly.

284
00:17:07,375 --> 00:17:07,791
Yeah.

285
00:17:07,791 --> 00:17:12,583
And then you multiply this times
what they did, 54,000 individuals.

286
00:17:12,791 --> 00:17:15,333
They did 54,000. So I mean.

287
00:17:15,625 --> 00:17:18,625
And they discovered,
you know, about 10,000,

288
00:17:18,625 --> 00:17:24,166
I think it was around 10,200 relationships
between gene regions and protein levels.

289
00:17:24,166 --> 00:17:26,083
Right. That's a massive number.

290
00:17:26,083 --> 00:17:29,541
So those, many of those
could just be coincidence, right?

291
00:17:29,541 --> 00:17:31,625
Just correlations, not causation. Right.

292
00:17:31,625 --> 00:17:35,208
We're all familiar with that
phrase.

293
00:17:35,541 --> 00:17:40,541
So 85% of those
relationships were novel.

294
00:17:41,916 --> 00:17:43,708
Now, the relationship.

295
00:17:43,708 --> 00:17:46,666
Cindy, could they be both in cis and trans, both of them?

296
00:17:46,666 --> 00:17:48,291
Correlation or.

297
00:17:49,208 --> 00:17:49,916
What's that?

298
00:17:49,916 --> 00:17:53,666
Sarantis? Cis and trans, would these
correlations be there?

299
00:17:53,833 --> 00:17:54,833
So yeah.

300
00:17:54,833 --> 00:17:59,125
So these correlations can be what we call
cis or they can be in trans.

301
00:17:59,666 --> 00:18:02,583
So cis just is getting back to Dale’s question

302
00:18:02,583 --> 00:18:06,666
about whether... or actually, it
was your question, Sarantis, about whether

303
00:18:06,666 --> 00:18:10,750
these variants, these SNPs are inside genes

304
00:18:11,375 --> 00:18:13,500
or are they outside genes?

305
00:18:14,375 --> 00:18:17,458
And if they're in genes or in close
proximity

306
00:18:17,458 --> 00:18:20,458
to the genes,
that code for the protein itself.

307
00:18:21,041 --> 00:18:21,416
Right.

308
00:18:21,416 --> 00:18:26,958
So you've got a variant
that's in a gene coding for a protein.

309
00:18:27,375 --> 00:18:30,416
If you see a correlation
that's significant between

310
00:18:30,416 --> 00:18:34,375
those two, we call that a “cis-pQTL” and that’s a feel

311
00:18:34,375 --> 00:18:38,250
good measure that says, oh,
we must be measuring the right protein.

312
00:18:38,250 --> 00:18:42,250
If this is real,
then, and there are ways to press on it,

313
00:18:42,250 --> 00:18:46,500
to check it and validate it, of course,
with orthogonal data, but that's it.

314
00:18:46,791 --> 00:18:52,166
So people often talk about cis-pQTL discoveries being verification

315
00:18:52,166 --> 00:18:54,791
then of having measured
the right protein

316
00:18:55,125 --> 00:18:59,541
because of course our assay is not a mass spec method.

317
00:18:59,541 --> 00:19:02,083
We're using antibodies as hooks.

318
00:19:02,083 --> 00:19:05,708
We're using two antibodies
as a hook to hook

319
00:19:05,708 --> 00:19:09,208
a protein out of solution.

320
00:19:09,958 --> 00:19:13,375
And we have little single stranded
oligos attached to them.

321
00:19:13,375 --> 00:19:18,458
So those oligos can then hybridize,
we can extend and amplify

322
00:19:18,458 --> 00:19:22,500
that up just like any old library
prep for sequencing.

323
00:19:22,500 --> 00:19:24,333
And then we count those oligos

324
00:19:24,333 --> 00:19:27,958
as a proxy for the original level
of the proteins in the sample.

325
00:19:28,500 --> 00:19:32,208
And so when you're doing an affinity
method, right, a hooking method

326
00:19:32,250 --> 00:19:36,083
to pull it out, not only is it
great for low-abundance proteins,

327
00:19:36,083 --> 00:19:39,416
that's one of the things we add value to
with mass spec folks.

328
00:19:39,416 --> 00:19:42,583
They like us because they can look
at areas of the proteome that

329
00:19:42,583 --> 00:19:46,458
they couldn't see easily with mass spec,
without tons of sample

330
00:19:46,458 --> 00:19:49,375
and a lot of control
of variability.

331
00:19:49,916 --> 00:19:53,750
So it's
a nice method from that perspective,

332
00:19:54,166 --> 00:19:57,125
but it's a little bit indirect
because

333
00:19:57,125 --> 00:20:00,291
we're pulling out the protein
and converting it to DNA signal.

334
00:20:00,625 --> 00:20:03,500
So making sure
we have a way to normalize those data

335
00:20:04,041 --> 00:20:07,916
and, in the end, you know, just like with mass
spec in any proteomics experiment

336
00:20:07,916 --> 00:20:10,208
to manage variability

337
00:20:11,125 --> 00:20:15,291
from batch to batch, you know,
these are important aspects that

338
00:20:15,291 --> 00:20:20,000
proteomics scientists are much better
prepared to describe or explain than I am.

339
00:20:20,250 --> 00:20:23,000
I have come to appreciate it.

340
00:20:23,000 --> 00:20:25,541
Now, Cindy,
something that you touched upon,

341
00:20:25,583 --> 00:20:29,666
right, was the sort of drug discovery
dimension of this.

342
00:20:30,000 --> 00:20:33,916
But even before we go in that direction,
you also mentioned something

343
00:20:33,916 --> 00:20:36,291
in terms of SNPs and genes,

344
00:20:37,208 --> 00:20:40,041
the majority of GWAS, thousands, right.

345
00:20:40,041 --> 00:20:42,291
3500 GWAS studies or

346
00:20:43,250 --> 00:20:45,208
many, many, many.

347
00:20:45,208 --> 00:20:48,500
Oftentimes these SNPs that are associated with risk

348
00:20:48,625 --> 00:20:51,958
often
are in gene deserts. Right?

349
00:20:51,958 --> 00:20:53,583
There's no function.

350
00:20:53,583 --> 00:20:54,416
That's right.

351
00:20:54,416 --> 00:20:56,000
What can you comment on that? Right.

352
00:20:56,000 --> 00:20:57,958
So these pQTLs, right,

353
00:20:57,958 --> 00:21:03,625
are SNPs, but aren’t they just random, so to speak random places in the genome?

354
00:21:04,041 --> 00:21:05,333
Yeah, so good question.

355
00:21:05,333 --> 00:21:10,791
So I'm going to reference a paper
by Lasse Folkersen and Anders Mälarstig.

356
00:21:11,291 --> 00:21:13,791
Now the two of them,
along with collaborators,

357
00:21:13,791 --> 00:21:16,958
there's a long list of authors
that I won't list.

358
00:21:17,291 --> 00:21:20,750
Brilliant,
obviously across multiple cohorts,

359
00:21:21,166 --> 00:21:25,041
they have their milestone paper
within a study

360
00:21:25,041 --> 00:21:28,083
called the SCALLOP study,
which is really a cohort of cohorts.

361
00:21:29,541 --> 00:21:33,916
They were doing what the UK
Biobank wants to do.

362
00:21:34,250 --> 00:21:36,791
This is me putting words in the UK Biobank’s mouth.

363
00:21:36,833 --> 00:21:41,041
But I think that the Folkersen et al. milestone

364
00:21:41,041 --> 00:21:45,500
publication, is a powerful precursor
to what the UK Biobank

365
00:21:46,500 --> 00:21:48,083
is, 

366
00:21:48,083 --> 00:21:52,000
has the possibility to do. In Folkersen et al.

367
00:21:52,041 --> 00:21:56,250
they looked at just 90 proteins.
I say “just”, although at the time in 2020

368
00:21:56,250 --> 00:21:58,708
that was a lot of multiplex proteins of course

369
00:21:59,291 --> 00:22:03,291
they looked at cardio, primarily
cardiovascular, what we broadly

370
00:22:03,291 --> 00:22:08,166
categorize as cardiovascular proteins
and they did the same kind of study.

371
00:22:08,166 --> 00:22:11,875
So on 30,000 samples,
they looked at 90 proteins

372
00:22:12,041 --> 00:22:15,291
with genetic, clinical and proteomic data.

373
00:22:16,333 --> 00:22:20,166
They did the correlations just like the UK
Biobank has done in their preprint.

374
00:22:20,875 --> 00:22:24,208
The 90 proteins resulted in 450.

375
00:22:24,208 --> 00:22:27,041
Yeah, a little over 450 pQTLs.

376
00:22:27,791 --> 00:22:31,375
Some of those are cis-pQTLs, as Sarantis hinted at.

377
00:22:31,375 --> 00:22:31,708
Right.

378
00:22:31,708 --> 00:22:36,166
So 88% of the proteins had cis-pQTLs identified there

379
00:22:36,416 --> 00:22:39,083
That's like I said,
a feel good method, a method

380
00:22:39,958 --> 00:22:44,375
or something we can kind of point to,
to say this, this, this looks like it's,

381
00:22:45,000 --> 00:22:46,500
you know, increasing our confidence

382
00:22:46,500 --> 00:22:50,125
that we're measuring the right protein,
although there are good biological reasons

383
00:22:50,125 --> 00:22:54,833
why you might not see a cis-pQTL, but the remaining

384
00:22:56,208 --> 00:22:58,666
trans-pQTLs were

385
00:22:59,458 --> 00:23:04,375
essentially, discovery of trans-pQTLs is incredibly important

386
00:23:04,708 --> 00:23:06,833
to understand protein-protein interactions.

387
00:23:06,833 --> 00:23:11,625
So I may have taken a bit of a meandering
way to get back to your question, Dale,

388
00:23:11,625 --> 00:23:15,458
about these relationships but trans-pQTLs

389
00:23:16,541 --> 00:23:18,500
and figuring out where

390
00:23:18,500 --> 00:23:22,375
those are coding, you know, what proteins

391
00:23:22,375 --> 00:23:26,041
are they coding for, what gene regions
are they associated with?

392
00:23:26,125 --> 00:23:29,125
It's not a trivial matter.

393
00:23:29,125 --> 00:23:32,541
And so I've had discussions with Lasse
as well as Anderson

394
00:23:32,916 --> 00:23:35,666
or sorry as well as Anders around this

395
00:23:36,958 --> 00:23:38,000
this challenge.

396
00:23:38,000 --> 00:23:42,208
And so just to define trans-pQTLs… so as a reminder cis-pQTLs

397
00:23:42,208 --> 00:23:47,291
are where the gene variant is either in or near

398
00:23:47,666 --> 00:23:52,041
the gene that codes for the protein
that you're measuring.

399
00:23:52,041 --> 00:23:55,375
So the relationship between them,
the correlation is between

400
00:23:55,375 --> 00:23:59,958
the gene region and the protein itself. That’s a cis-pQTL.

401
00:23:59,958 --> 00:24:03,083
So, say you have a particular protein

402
00:24:03,083 --> 00:24:04,750
you're measuring, we'll say

403
00:24:05,708 --> 00:24:09,916
TNF-alpha, or alpha-TNF and there’s a SNP

404
00:24:10,166 --> 00:24:12,625
and the gene that codes for alpha-TNF

405
00:24:13,000 --> 00:24:16,791
is on the same chromosome, within,
I don't know.

406
00:24:17,125 --> 00:24:19,666
a couple hundred base pairs?
A million base pairs.

407
00:24:19,666 --> 00:24:20,333
So yeah.

408
00:24:20,333 --> 00:24:20,750
When you're.

409
00:24:20,750 --> 00:24:25,583
in the general region,
and so there could be

410
00:24:25,583 --> 00:24:30,291
in those million base pairs
a lot of other genes, but nonetheless.

411
00:24:30,333 --> 00:24:30,708
Right.

412
00:24:30,708 --> 00:24:34,250
You're saying that that particular SNP
was controlling alpha TNF.

413
00:24:34,458 --> 00:24:36,916
It suggests that. I think they might not.

414
00:24:36,958 --> 00:24:41,166
Yeah, they may not say it quite
so strongly simply because there's, you

415
00:24:41,166 --> 00:24:42,791
know, it's association.

416
00:24:42,791 --> 00:24:44,208
Yeah. Yeah exactly. It's just.

417
00:24:44,208 --> 00:24:45,833
statistical calculation. Got it.

418
00:24:45,833 --> 00:24:49,583
and so with a trans-pQTL, what that is, is you’ve got a variant

419
00:24:49,875 --> 00:24:52,833
you know,
you might have a gene coding for a protein

420
00:24:53,666 --> 00:24:55,750
and that gene might be on chromosome nine.

421
00:24:56,416 --> 00:25:00,833
But you might have the pQTL on chromosome 19.

422
00:25:01,291 --> 00:25:02,000
You know, you might have it

423
00:25:02,000 --> 00:25:05,708
on a completely different chromosome,
a correlation with that same protein.

424
00:25:06,291 --> 00:25:10,041
So the sort of Occam's Razor,
you know, the easiest

425
00:25:10,041 --> 00:25:11,958
the most straightforward possibility

426
00:25:11,958 --> 00:25:15,291
is that there's a relationship
between those two proteins, right?

427
00:25:15,291 --> 00:25:17,500
That there’s protein-protein interaction going on there.

428
00:25:17,500 --> 00:25:21,708
And in fact, the STRING database is a publicly available database

429
00:25:21,750 --> 00:25:27,208
that records and collects and is curated around protein-protein interactions.

430
00:25:27,208 --> 00:25:30,833
And so what the team would do,
you know, in asking them how they

431
00:25:31,708 --> 00:25:34,250
how do they dig
into each of these relationships?

432
00:25:34,791 --> 00:25:38,458
And what they would do is look,
they report the closest

433
00:25:38,458 --> 00:25:42,208
gene to the location
that's in trans with this protein.

434
00:25:42,833 --> 00:25:46,125
They report the closest gene
geographically.

435
00:25:46,125 --> 00:25:47,666
And then they also report

436
00:25:47,666 --> 00:25:51,625
because they do kind of a deep dive into surrounding genes, as you say Dale.

437
00:25:51,625 --> 00:25:55,625
There could be, you know, surrounding
genes that might be implicated.

438
00:25:55,625 --> 00:25:58,916
They look at those other surrounding genes
and they say, you know,

439
00:25:58,916 --> 00:26:02,666
what's the shortest pathway
back to that protein?

440
00:26:02,916 --> 00:26:07,791
And that is a fascinating conversation
because once you put together

441
00:26:07,791 --> 00:26:12,541
a pathway analysis like that
and we talk about different diseases,

442
00:26:12,875 --> 00:26:17,083
now you've got some pathways
in, say, Alzheimer's disease

443
00:26:17,958 --> 00:26:20,208
and you’ve got some pathways in, say, schizophrenia.

444
00:26:20,208 --> 00:26:22,250
I'm just picking
two neurological diseases.

445
00:26:22,625 --> 00:26:27,416
And now if you can imagine a Venn diagram
of the pathways those two have in common,

446
00:26:28,416 --> 00:26:29,666
and that is

447
00:26:29,666 --> 00:26:33,666
an opportunity for us to understand
the mechanistic biology

448
00:26:34,000 --> 00:26:37,916
that's in common between those two
neurological diseases, if any.

449
00:26:37,916 --> 00:26:41,000
You know, I'm just picking those
out of the air.

450
00:26:41,166 --> 00:26:45,083
If we can return back to that Folkersen
landmark paper.

451
00:26:45,291 --> 00:26:46,166
Mm hmm.

452
00:26:46,708 --> 00:26:49,416
So, if I understand correctly,
there were 90 proteins.

453
00:26:49,708 --> 00:26:52,500
How many tens of thousands of samples?

454
00:26:52,500 --> 00:26:53,750
30,000 samples. Just over.

455
00:26:53,750 --> 00:26:56,916
Okay, so 30,000 samples times 90 proteins.

456
00:26:56,916 --> 00:27:00,958
And they also had like whole genome data
on those 30,000 individuals.

457
00:27:00,958 --> 00:27:03,000
Is that right?
They had genetic data that you could.

458
00:27:03,000 --> 00:27:04,791
So I don't know that.

459
00:27:04,791 --> 00:27:07,250
Remember, this is a cohort of cohorts.

460
00:27:07,250 --> 00:27:10,875
So I think they had
GWAS data or genotyping data,

461
00:27:10,875 --> 00:27:15,125
you know, array data on some of those
and sequencing data on others.

462
00:27:15,125 --> 00:27:18,500
I wouldn't want to represent that,
but my guess is that they

463
00:27:19,000 --> 00:27:22,041
they had variation, genetic data that they had in common.

464
00:27:22,041 --> 00:27:22,333
Right.

465
00:27:22,333 --> 00:27:23,291
Because you can convert

466
00:27:23,291 --> 00:27:27,166
a whole genome sequencing dataset to a list of variants

467
00:27:27,166 --> 00:27:28,958
Understood. And yeah. Right.

468
00:27:28,958 --> 00:27:32,541
So they had all the genetic data
of 30,000 individuals.

469
00:27:32,541 --> 00:27:34,541
They looked at these 90 proteins

470
00:27:35,000 --> 00:27:38,458
and then you mentioned that they're able
to connect it then to disease.

471
00:27:38,666 --> 00:27:41,250
Yeah. So you do the same thing.

472
00:27:41,250 --> 00:27:44,916
This looked at relationships
between genetic,

473
00:27:45,333 --> 00:27:47,375
you know, state and protein levels.

474
00:27:47,375 --> 00:27:49,708
So you look for all those correlations.

475
00:27:49,708 --> 00:27:54,166
In this paper, they found 450 pQTLs that exceeded

476
00:27:54,166 --> 00:27:55,916
their significance threshold.

477
00:27:55,916 --> 00:27:59,375
And you could, you know,
as you touched on before there, you know,

478
00:27:59,375 --> 00:28:01,750
that's why we have peer review
to make sure that we're not

479
00:28:02,625 --> 00:28:06,208
that we're held accountable
for the number of tests that we're doing,

480
00:28:06,208 --> 00:28:08,375
that we're you know,
we're really trying to be

481
00:28:08,375 --> 00:28:12,458
as transparent as possible
in publishing these data.

482
00:28:12,458 --> 00:28:16,041
And by the way, it was published
in Nature Metabolism in 2020.

483
00:28:16,666 --> 00:28:17,666
So once you

484
00:28:17,666 --> 00:28:21,416
you see all the correlations, imagine
you have this list of correlations.

485
00:28:21,958 --> 00:28:24,958
You can layer those clinical data then in.

486
00:28:24,958 --> 00:28:27,416
So now you know the disease information

487
00:28:28,000 --> 00:28:30,833
and you can look
at these different sets of data.

488
00:28:30,833 --> 00:28:33,750
So genetics, proteomics and disease

489
00:28:34,333 --> 00:28:37,916
and you can sample from these
and determine how often

490
00:28:37,916 --> 00:28:40,833
with the relationships
between three of these units,

491
00:28:41,250 --> 00:28:43,916
how often would that happen
by chance alone?

492
00:28:43,916 --> 00:28:45,333
If it would happen by chance alone?

493
00:28:45,333 --> 00:28:46,250
Quite often.

494
00:28:46,250 --> 00:28:48,500
Then we let that fall away.

495
00:28:49,166 --> 00:28:52,708
If it seems quite unusual
to see these relationships,

496
00:28:52,708 --> 00:28:56,750
then those are the ones that we elevate
to potential causality.

497
00:28:57,125 --> 00:29:01,583
And so in this paper they elevated
from the 450 relationships correlations.

498
00:29:01,583 --> 00:29:06,208
They elevated 25
that they suggest appear causal.

499
00:29:06,791 --> 00:29:10,916
And some of those examples
I think, are validated.

500
00:29:10,916 --> 00:29:17,500
All I know are validated clinical targets
for existing therapies, super exciting

501
00:29:17,500 --> 00:29:20,500
because then it's like, oh, looks like
we're on the right track, right?

502
00:29:21,000 --> 00:29:22,583
And then of course some novel findings.

503
00:29:22,583 --> 00:29:27,000
So they report 14 validated
clinical targets,

504
00:29:27,000 --> 00:29:30,750
known clinical targets like CASP-8 in breast cancer was one of them that

505
00:29:31,791 --> 00:29:32,833
I can think of.

506
00:29:32,833 --> 00:29:36,625
So CASP-8 is something known already before to be involved in breast cancer?

507
00:29:36,625 --> 00:29:39,458
That’s right. And then they rediscovered it?

508
00:29:39,458 --> 00:29:43,083
Yeah, CASP-8 is a known therapeutic target in breast cancer.

509
00:29:43,083 --> 00:29:44,041
I see.

510
00:29:44,041 --> 00:29:48,416
And then 11 of those were novel,
so they were not able to see any evidence

511
00:29:48,416 --> 00:29:53,541
of 11 of their findings that elevated
again to causality, potential causality.

512
00:29:54,000 --> 00:29:57,750
And those are the exciting ones
for new programs, potentially,

513
00:29:58,375 --> 00:30:00,833
and then and then 18

514
00:30:00,875 --> 00:30:04,666
they reported
18 potential repurposing opportunities.

515
00:30:04,666 --> 00:30:08,750
So that's super exciting to me
because if you've got an existing drug

516
00:30:08,750 --> 00:30:13,083
for one indication, say tocilizumab for rheumatoid arthritis

517
00:30:13,500 --> 00:30:16,041
and, yeah, you
have the possibility of then

518
00:30:16,333 --> 00:30:20,500
using that in a different indication,
that would be a repurposing opportunity.

519
00:30:20,500 --> 00:30:22,166
So for example in eczema.

520
00:30:23,125 --> 00:30:23,875
I guess it

521
00:30:23,875 --> 00:30:30,083
doesn't make sense to think about
using an anti rheumatoid arthritis drug.

522
00:30:30,166 --> 00:30:30,458
Right.

523
00:30:30,458 --> 00:30:32,583
That's on the market to treat eczema.

524
00:30:32,583 --> 00:30:35,541
That's just I mean.

525
00:30:35,541 --> 00:30:38,125
There's one in clinical trials.

526
00:30:38,125 --> 00:30:41,250
I mean, coming back to the cohorts, Cindy,
I think also

527
00:30:41,750 --> 00:30:45,500
the fact that these cohorts are
from different geographical places

528
00:30:46,333 --> 00:30:50,375
increase the possibility to illuminate, for example, biases on SNPs.

529
00:30:50,375 --> 00:30:51,083
Right.

530
00:30:51,375 --> 00:30:54,416
Did you have any discussion with the authors about that?

531
00:30:54,750 --> 00:30:57,416
Do they ever consider
that the bias, geographical bias

532
00:30:57,583 --> 00:30:59,666
may influence their data?

533
00:30:59,666 --> 00:31:01,041
Can you comment on this?

534
00:31:01,041 --> 00:31:02,708
Yeah, it's a great question.

535
00:31:02,708 --> 00:31:05,791
So they primarily

536
00:31:05,791 --> 00:31:08,458
represent northern European populations.

537
00:31:08,916 --> 00:31:13,250
There was some representation
of Asian populations in there,

538
00:31:14,458 --> 00:31:16,500
but not a lot.

539
00:31:16,500 --> 00:31:20,791
And I'm trying to remember, I don't think
there was any African diaspora

540
00:31:21,208 --> 00:31:22,166
in this milestone

541
00:31:22,166 --> 00:31:25,416
paper in the subset of samples
that they had in this milestone paper.

542
00:31:25,916 --> 00:31:29,416
So that's,
you know, it's a blessing and a curse.

543
00:31:29,416 --> 00:31:30,333
Right. For them.

544
00:31:30,333 --> 00:31:35,708
It eases the analysis. To your point,
for the opportunity to make discoveries

545
00:31:35,708 --> 00:31:40,041
because of diversity
within the ancestry of our genomes.

546
00:31:40,041 --> 00:31:41,333
It’s a “miss”, right?

547
00:31:41,333 --> 00:31:45,291
And an enormous potential future
opportunity, which I think

548
00:31:45,833 --> 00:31:50,125
is very exciting and very important
for equity in health care.

549
00:31:50,333 --> 00:31:51,541
I mean, essential.

550
00:31:53,000 --> 00:31:54,958
So we
have to start somewhere, though, right?

551
00:31:54,958 --> 00:31:59,250
So we start with the populations
that we have.

552
00:31:59,250 --> 00:32:01,333
It's fascinating thinking about

553
00:32:02,291 --> 00:32:05,583
the 90 proteins,
all the different things that they discovered.

554
00:32:05,583 --> 00:32:08,958
Right, these 25 drug targets for.

555
00:32:08,958 --> 00:32:12,083
That explains
why the pharma interest in the UK Biobank

556
00:32:12,583 --> 00:32:15,208
by doing the extrapolation.

557
00:32:15,208 --> 00:32:16,708
Have you done the extrapolation?

558
00:32:16,708 --> 00:32:19,791
How many drug targets they expect? Yeah.

559
00:32:19,791 --> 00:32:23,500
So with this, you know it’s around five and a half percent of the pQTLs

560
00:32:23,541 --> 00:32:26,666
discovered in Folkersen
et al, converted

561
00:32:26,666 --> 00:32:29,750
to, you know, potentially causal. Interesting.

562
00:32:29,958 --> 00:32:34,250
So if we applied that same percentage
which is lofty, right, that's

563
00:32:35,333 --> 00:32:37,333
is a lot of proteins and,

564
00:32:37,333 --> 00:32:42,291
and these 90 in Folkersen et al were well

565
00:32:43,291 --> 00:32:47,458
studied you know considering across
30,000 samples so you know I would

566
00:32:47,541 --> 00:32:51,500
I would expect maybe four, four
and a half percent to maybe

567
00:32:51,500 --> 00:32:54,208
5% converting.

568
00:32:54,541 --> 00:32:58,166
In this initial set of proteins, I think to be a little conservative,

569
00:32:58,166 --> 00:33:01,458
you know, not trying to be
too bullish, but even with that, we're

570
00:33:01,458 --> 00:33:07,750
talking about potentially listing off
causal markers to examine, to investigate

571
00:33:08,208 --> 00:33:11,500
potentially causal markers of,
you know, around 500.

572
00:33:12,250 --> 00:33:13,666
So that.

573
00:33:13,666 --> 00:33:17,166
Five hundred potential drug targets.

574
00:33:17,166 --> 00:33:19,291
Potential therapeutic targets.
That's right.

575
00:33:19,333 --> 00:33:23,666
And to be fair, some of these might
show up as potential therapeutic targets

576
00:33:24,041 --> 00:33:28,041
that would never be considered if they're
in signaling pathways, for example.

577
00:33:28,250 --> 00:33:30,000
So it's up to pharma.

578
00:33:30,000 --> 00:33:31,791
And certainly people that are

579
00:33:32,750 --> 00:33:33,833
are more versed in

580
00:33:33,833 --> 00:33:38,625
clinical trials and potential,
you know, pathways

581
00:33:38,625 --> 00:33:43,041
for these and implications of side effects
to then up score and down score these.

582
00:33:43,500 --> 00:33:46,041
But the exciting aspect of this

583
00:33:46,041 --> 00:33:49,125
is to have a systematic approach
by which to do that,

584
00:33:49,250 --> 00:33:53,500
to actually make that list of 500
and then up score some and start programs.

585
00:33:53,500 --> 00:33:55,500
Because we like to say

586
00:33:55,500 --> 00:34:00,416
that clinical trials are twice
as likely to be successful.

587
00:34:00,875 --> 00:34:03,791
If you go into that trial
with genetic information

588
00:34:03,791 --> 00:34:06,958
that's certainly, you know, been published
and we like to say that

589
00:34:08,083 --> 00:34:11,208
adding proteomic data,
I'd really love to see

590
00:34:12,208 --> 00:34:15,458
what that means for our potential for

591
00:34:15,791 --> 00:34:18,958
for improving our ability
to be successful in clinical trials.

592
00:34:19,250 --> 00:34:24,583
And these 13, I guess if you take those 500 targets (or potential drug targets)

593
00:34:24,583 --> 00:34:28,583
divided by 13 different pharma partners,
that's like, what, 35 apiece.

594
00:34:28,791 --> 00:34:30,916
Yeah, that's right.
That's a lot of programs.

595
00:34:30,958 --> 00:34:32,375
That's a lot of programs.

596
00:34:32,375 --> 00:34:35,458
I mean, that's going to be
a wealth of data for them.

597
00:34:35,458 --> 00:34:39,958
Now, I understand why they would
invest in such a project

598
00:34:41,041 --> 00:34:42,958
when what is the next step

599
00:34:42,958 --> 00:34:47,458
then in the UK Biobank project
and how can people find out more about it?

600
00:34:47,750 --> 00:34:48,708
Good question. Yeah.

601
00:34:48,708 --> 00:34:54,041
So what I fully expect,
and I know of at least eight

602
00:34:54,041 --> 00:34:57,791
abstracts that have been submitted
for ASHG this year.

603
00:34:57,791 --> 00:35:00,833
Now ASHG, the American Society
of Human Genetics,

604
00:35:00,833 --> 00:35:03,000
as I mentioned earlier, will be

605
00:35:04,041 --> 00:35:06,208
in Los Angeles in October.

606
00:35:06,208 --> 00:35:11,000
And so I know that those pharma partners
and the researchers within those pharma

607
00:35:11,000 --> 00:35:14,958
partners are submitting abstracts
to present there

608
00:35:15,291 --> 00:35:19,375
and I'm sure some of them will get oral
presentations.

609
00:35:19,375 --> 00:35:21,500
Many of them will get poster
presentations.

610
00:35:21,500 --> 00:35:25,666
But I will be keeping a close eye on that
and I will absolutely be there.

611
00:35:25,666 --> 00:35:28,041
And I think we should do a podcast
episode.

612
00:35:28,208 --> 00:35:29,000
There you go.

613
00:35:29,000 --> 00:35:32,750
You will have a post ASHG.

614
00:35:32,750 --> 00:35:34,500
This is what I got out of it.

615
00:35:34,500 --> 00:35:35,750
That would be great.

616
00:35:35,750 --> 00:35:39,875
And maybe drag a few guests on
if we can.

617
00:35:39,875 --> 00:35:42,458
That's great. Yeah, that'd be great. Yeah.

618
00:35:42,750 --> 00:35:46,416
And as far as what I think is next, I think
they're going to be digging into

619
00:35:46,916 --> 00:35:50,250
these correlations, 85% of them novel.

620
00:35:50,250 --> 00:35:53,916
So roughly 8000 novel relationships
between genetic regions

621
00:35:53,916 --> 00:35:55,333
and protein levels.

622
00:35:55,333 --> 00:35:58,291
They're going to be looking into
which of those appear

623
00:35:58,291 --> 00:36:00,500
causal within certain diseases.

624
00:36:01,291 --> 00:36:03,125
Do you know when that will be available?

625
00:36:03,125 --> 00:36:05,166
Publicly-available data?

626
00:36:05,583 --> 00:36:08,541
How can scientists have access to
these?

627
00:36:08,541 --> 00:36:11,291
Is it an easy process or a difficult process
to get access to that?

628
00:36:11,750 --> 00:36:12,041
Yeah.

629
00:36:12,041 --> 00:36:16,250
So, as you probably know, Sarantis,
but our listeners may not know, the UK

630
00:36:16,250 --> 00:36:19,583
Biobank data through a data use agreement

631
00:36:20,541 --> 00:36:22,916
is broadly available.

632
00:36:22,916 --> 00:36:28,000
So this is one of the reasons there's
so much use of those data as validation

633
00:36:28,000 --> 00:36:31,541
data and for discoveries with very clever

634
00:36:31,958 --> 00:36:34,875
informatics scientists and biologists to

635
00:36:35,833 --> 00:36:36,208
think of

636
00:36:36,208 --> 00:36:38,833
creative ways to use such a large dataset,

637
00:36:39,458 --> 00:36:43,041
the proteomics data,
the first set of proteomics data.

638
00:36:43,041 --> 00:36:48,208
So the first 1500 proteins, the subject of the June bioRxiv paper

639
00:36:48,583 --> 00:36:50,958
It has been stated that those data

640
00:36:51,208 --> 00:36:54,041
will be publicly available
by the end of the year.

641
00:36:54,958 --> 00:36:59,166
So I expect, you know, by October at ASHG
we'll know better.

642
00:36:59,166 --> 00:37:01,416
The timing for that,

643
00:37:01,416 --> 00:37:05,041
yeah, those pharma partners,
of course have had access to those data

644
00:37:05,041 --> 00:37:08,375
as they should, which is why
they were able to publish that,

645
00:37:09,125 --> 00:37:11,416
that paper so quickly.

646
00:37:11,416 --> 00:37:16,375
And so the next tranche of data
for the full 3000 proteins.

647
00:37:16,375 --> 00:37:18,083
And can I just say, you know,

648
00:37:18,083 --> 00:37:22,041
you see what's possible
with 90 proteins in Folkersen et al.

649
00:37:22,041 --> 00:37:26,125
Imagine what's possible,
you know, with 30,000

650
00:37:26,125 --> 00:37:30,125
proteins,
3000 proteins and 54,000 individuals.

651
00:37:30,125 --> 00:37:33,833
That's a lot of power to detect
relationships between proteins and

652
00:37:34,166 --> 00:37:37,000
and many proteins
that really just haven't had assays

653
00:37:38,333 --> 00:37:39,875
for examining them.

654
00:37:39,875 --> 00:37:42,125
So just such an opportunity
for discovery.

655
00:37:42,958 --> 00:37:47,000
We touched upon yeah,
we touched upon the enormous investment

656
00:37:47,000 --> 00:37:51,000
made to-date to collect these 500,000 samples

657
00:37:51,000 --> 00:37:51,458
That's right.

658
00:37:51,458 --> 00:37:54,125
And to follow up
and all those like genetics.

659
00:37:54,125 --> 00:37:54,666
Yeah.

660
00:37:54,666 --> 00:37:57,791
The whole genome,
whole exome data on all these individuals

661
00:37:58,208 --> 00:38:03,208
and then now overlaying empowering
the genomics with the proteomics.

662
00:38:03,208 --> 00:38:06,875
It's as if we're a part of something
that is the next

663
00:38:06,875 --> 00:38:09,916
big thing in genetics is proteomics.

664
00:38:09,916 --> 00:38:12,416
I think it's you know,
and when you think about the

665
00:38:13,458 --> 00:38:15,500
the central dogma of biology, right?

666
00:38:15,500 --> 00:38:16,583
You've got DNA.

667
00:38:16,583 --> 00:38:19,708
RNA, we've done
a great job of looking at DNA.

668
00:38:19,708 --> 00:38:23,458
RNA has been our proxy for biology
for a long time because it was

669
00:38:23,458 --> 00:38:28,583
it was available
to look at with sequencing technologies.

670
00:38:28,583 --> 00:38:33,041
In fact you and I Dale, I think have talked about how the RNA-Seq

671
00:38:33,083 --> 00:38:36,250
and the ability to do what we call “digital gene expression” sold

672
00:38:36,250 --> 00:38:38,375
many of those initial instruments

673
00:38:39,333 --> 00:38:42,291
that were,
you know, next generation sequencing instruments.

674
00:38:42,958 --> 00:38:45,875
But now we have
this ability to measure

675
00:38:46,250 --> 00:38:49,875
proteins directly
in a very scalable way.

676
00:38:50,250 --> 00:38:53,125
And I am excited, as you

677
00:38:53,125 --> 00:38:55,208
know, about this capability,

678
00:38:56,250 --> 00:39:00,333
but it's really the researchers
and what they can do with it

679
00:39:00,333 --> 00:39:05,083
that will tell us the true
potential of this. Super.

680
00:39:05,333 --> 00:39:06,666
Well, thank you, Cindy,

681
00:39:06,666 --> 00:39:10,041
for sharing your thoughts
on empowering genomics with proteomics.

682
00:39:10,458 --> 00:39:12,208
And we'll see you soon.

683
00:39:12,208 --> 00:39:12,958
That was great.

684
00:39:12,958 --> 00:39:15,083
Thank you very much.

685
00:39:19,041 --> 00:39:22,833
Thank you for listening to the Proteomics in Proximity podcast brought to you

686
00:39:22,833 --> 00:39:26,833
by Olink Proteomics. To contact the hosts or for further information

687
00:39:27,083 --> 00:39:31,750
simply email: info@olink.com.