How AI is Built dives into the different building blocks necessary to develop AI applications: how they work, how you can get started, and how you can master them. Build on the breakthroughs of others. Follow along, as Nicolay learns from the best data engineers, ML engineers, solution architects, and tech founders.
Nicolay Gerold: Search systems come with their own set of challenges. The data is often too large to be stored on a single node, we often need to handle 10,000 to 50,000 queries per second, the indices are very slow to build, but we still want to search the fresh data. So we have a bunch of different bottlenecks, or trade-offs: cost, latency, scale, freshness of data, high throughput. So how can we solve this?
Milvus is the open source vector database that gives you the necessary levers to pick your trade-offs and solve the bottlenecks you care about in your application. You want low cost? Place more of your data in object storage. You want higher throughput? Add GPU acceleration. You care about fresh data? Create a buffer that stores the new data, query from it, and use your main database for the older data, and basically just combine the results.
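To make that fresh-data idea concrete, here is a minimal, hypothetical sketch (plain NumPy, not the Milvus API): new vectors go into a small brute-force buffer, the bulk of the data sits in an already indexed store, and a query simply merges the top results from both paths.

```python
import numpy as np

def cosine_top_k(query, vectors, ids, k):
    # Brute-force cosine similarity over a (small) set of vectors.
    if len(vectors) == 0:
        return []
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = v @ q
    top = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in top]

class FreshnessAwareSearch:
    """Hypothetical two-path search: a brute-force buffer for fresh
    vectors plus a stand-in for the main, already indexed data."""

    def __init__(self, dim):
        self.buffer_vecs, self.buffer_ids = np.empty((0, dim)), []
        self.main_vecs, self.main_ids = np.empty((0, dim)), []

    def insert(self, vec, id_):
        # Fresh writes only touch the buffer, so they are searchable immediately.
        self.buffer_vecs = np.vstack([self.buffer_vecs, vec])
        self.buffer_ids.append(id_)

    def search(self, query, k=5):
        fresh = cosine_top_k(query, self.buffer_vecs, self.buffer_ids, k)
        older = cosine_top_k(query, self.main_vecs, self.main_ids, k)
        # Combine both result lists and keep the global top-k.
        return sorted(fresh + older, key=lambda x: -x[1])[:k]
```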
Today we are continuing with our series on search. And today on How AI is Built, we are talking to Charles Xie, who is the founder and CEO of Zilliz, the company behind Milvus. Charles previously worked at Oracle as a founding engineer of the 12c cloud database. We talk about the bottlenecks you will face in your vector database and how Milvus tries to solve them through multi-tier storage and GPU acceleration, and we also get a glimpse at the future: self-learning indices.
Charles Xie: I think a lot of organizations, from smaller startups to large enterprises, are all looking for a vector database solution, but their requirements are a little bit different. For early-stage startups, for smaller groups, they care about ease of use. Basically, they want to get the application up and running in the least amount of time. But enterprises care about performance. They care about scalability. They care about the maintenance of the system, how easy it is to maintain at a large scale, and also security and compliance, and how you integrate with the ecosystem to take the data from upstream.
Nicolay Gerold: And in terms of scale, can you tease a little bit what is the largest vector database, in terms of dimensions plus the number of vectors, that you've scaled up to today?
Charles Xie: Yeah. So the largest ones are at the 100 billion scale. That's pretty much Internet scale. On the Internet, we probably have a thousand billion, several thousand billion vectors if you want to index the whole Internet. So we saw some of the largest deployments in that range. And we also saw that there are a lot of companies doing vector similarity search at the billion scale, the 10 billion scale. And when it comes to dimensions, what we found is that three, four years ago they started with 512 dimensions, and now they are in between 1,000 and 2,000 dimensions, depending on the application they are building.
Nicolay Gerold: And what are the primary
bottlenecks in your opinion, when a vector
database has to scale up to that size?
Charles Xie: So the biggest challenge is that you have to build a scalable system from the bottom up. I want to take one step back and give you an analogy. If we look at the traditional relational database ecosystem 20, 30 years ago, we had PostgreSQL, we had MySQL, and both of them were single-instance. They were very easy to use, and they got a lot of popularity. But when the data volume of organizations grew, people needed solutions that could accommodate a larger volume of data. That's why we got big data and a lot of solutions to scale your database. In the very beginning there came a very easy idea: you add a sharding schema on the application layer. So there are a lot of solutions on top of MySQL and Postgres that basically partition the data into shards, and then you can do load balancing on top of it on the application layer.
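To picture that application-layer sharding, here is a minimal sketch with hypothetical names (not tied to any particular MySQL or Postgres middleware): rows are routed by hashing the key, so the same key always lands on the same shard.

```python
import hashlib

class ShardedStore:
    """Hypothetical application-layer sharding over N backends (dicts here)."""

    def __init__(self, shards):
        self.shards = shards

    def _shard_for(self, key):
        # Hash-based routing: deterministic shard choice per key.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def write(self, key, value):
        self._shard_for(key)[key] = value

    def read(self, key):
        return self._shard_for(key).get(key)

# Three "databases" represented as plain dicts.
store = ShardedStore([{}, {}, {}])
store.write("user:42", {"name": "Ada"})
print(store.read("user:42"))
```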
But this solution has some drawbacks. First of all, most of these solutions are single-write and multiple-read. Because you only get a single node, a single instance, for data modification, that is going to be your bottleneck. And if you want to remove the bottleneck, you have to build a system, you have to build an algorithm, to support distributed consistency, and there is going to be a lot of challenge in that. So we have a Raft protocol implementation to support distributed consistency. And also, when you are trying to build a scalable system, you add another layer of complexity in networking and communication, which will further complicate the system. You have to consider how you do load balancing, how you do data replication, how you do failover and data recovery, but all in a distributed environment. In the vector database space, all this complexity also applies. So we have to handle all this distributed data consistency. And what makes things more challenging is that vector data is actually very large. If you look at a single entry, an embedding vector is actually pretty much bigger than a typical entry in a traditional relational database system. As I mentioned, typical embedding vectors could be 1,000 or even 2,000 dimensions. So transmitting all this data in a distributed environment can be a challenge. And if you want to have this data consistency without sacrificing performance, there is going to be a lot of challenge there.
Nicolay Gerold: Yeah, and we moved a lot into the distributed systems space already. For those who aren't familiar with all the data engineering stuff, Raft is basically an algorithm to reach consensus between different nodes. It's easier to explain with a transactional database: if you have multiple transactional leaders distributed across the globe, and you write a new entry for a bank transaction into one of them, the other nodes have to be kept up to date with the node that has been written to, and you basically have to reach consensus. And you also have to consider the second part, the networking: the network might fail or might take longer to communicate between the different nodes, which is the additional complexity in distributed systems.
Nice.
And especially for Raft, can you go a little bit into the indexing part? Especially because we don't really have the lightweight writes you typically see in transactional databases, but rather heavy vectors. So how are the heavy vectors basically distributed, and how does the consensus mechanism work exactly between the different nodes?
Charles Xie: So basically every vector database system may take a different approach, a slightly different approach or a totally different approach, to resolve this problem. But for the vector database system we build, the open source Milvus, we support different kinds of data consistency. By default it's going to be eventual consistency, but if you really want to have strong consistency, we can also support that, with a slight loss of performance. It's going to be a little bit slower, but we do support different consistency levels. And we think that's important. A lot of companies are building vector database solutions for different scenarios, and some of them care a lot about performance and want real-time data visibility; for those cases eventual consistency may be a good solution. And some of them, for example, are building real-time fraud detection for financial institutions, and they care a lot about stronger data consistency, and we can also support that.
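As a rough illustration of how that choice surfaces to a user, here is a sketch using the pymilvus client. It assumes a Milvus instance at localhost:19530, and parameter support can vary by client version, so treat it as indicative rather than authoritative.

```python
import random
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Eventual consistency favors write latency and throughput; "Strong" makes
# reads wait until they can see the latest writes (e.g. for fraud detection).
client.create_collection(
    collection_name="events",
    dimension=128,
    consistency_level="Eventually",  # Strong | Session | Bounded | Eventually
)

vec = [random.random() for _ in range(128)]
client.insert(collection_name="events", data=[{"id": 1, "vector": vec}])

hits = client.search(collection_name="events", data=[vec], limit=3)
print(hits)
```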
Nicolay Gerold: Yeah. And you build on top of HNSW as the index. HNSW, especially at large scale, takes a while to re-index when new vectors are inserted into the database. How do you actually support these more real-time scenarios, where you basically have a read in close proximity to a write?
Charles Xie: Yeah, so first of all, all the data we ingest into the system goes through a distributed logging system. At the moment we are using either Kafka or Apache Pulsar for this purpose. So every single piece of data, we put into a distributed write-ahead log, and we're trying to use this distributed WAL (write-ahead log) to support all these different kinds of data consistency. And other than that, as you mentioned, for real-time data visibility, to make sure that the data is visible in real time, searchable in real time, without a sacrifice of performance, we have two access paths for the data. One is for the fresh data. For fresh data, we first put it into a buffer, and for the data in this buffer we don't have to build an index; we just do a brute-force search, to make sure that the data is going to be accessible and searchable in real time. But when the data in this buffer grows and reaches a certain threshold, it triggers the index building on the backend. The system starts building the index, and it could be HNSW, it could be IVF_PQ, it could be other indexes behind the scenes. This is similar to the strategy adopted by a lot of traditional big data or database systems. Cassandra, for example, has its LSM tree structure: they put data in a buffer and then they build this hierarchical merge tree to gradually build or rebuild the indexes.
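A toy version of that write path (not Milvus internals, just the pattern described): every write lands in a growing buffer that is searched brute force, and once the buffer crosses a threshold it is sealed and handed off for an index build.

```python
import numpy as np

class GrowingSegment:
    """Hypothetical sketch of the buffer-then-index pattern."""

    def __init__(self, seal_threshold=10_000):
        self.seal_threshold = seal_threshold
        self.buffer = []          # fresh vectors, searched brute force
        self.sealed = []          # stand-ins for built indexes (HNSW, IVF_PQ, ...)

    def insert(self, vec):
        self.buffer.append(vec)
        if len(self.buffer) >= self.seal_threshold:
            self._seal_and_index()

    def _seal_and_index(self):
        data = np.array(self.buffer)
        self.buffer = []
        # A real system would kick off an asynchronous HNSW/IVF build here;
        # we just keep the raw matrix as a placeholder "index".
        self.sealed.append(data)

    def search(self, query, k=5):
        parts = []
        if self.buffer:
            parts.append(np.array(self.buffer))   # brute force over fresh data
        parts.extend(self.sealed)                 # would normally use the ANN index
        if not parts:
            return []
        all_vecs = np.vstack(parts)
        dists = np.linalg.norm(all_vecs - query, axis=1)
        return np.argsort(dists)[:k]
```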
Nicolay Gerold: Yeah, and I think what you've seen, especially in the last year, with something like LanceDB, is that a bunch of companies are starting to adopt a hybrid strategy of in-memory indices and disk-based indices. At Zilliz, is there something similar? And if yes, how do you actually decide which data goes into which index?
Charles Xie: We have been doing this hybrid approach for quite a while, since the year 2019, but we actually take it one step further. So it's not only about memory versus local disk. We have built a hierarchical storage with four layers of data storage. At the bottom, we can use object storage, S3 storage, to store the data. Then on top of it, we can use local disk, for example NVMe disks, to support more efficient data access. On top of that we have memory, and for some user scenarios we can even cache the data in on-chip memory. For example, three years ago we started a collaboration with NVIDIA, so we are using GPUs to accelerate vector similarity search. If you are using a GPU, we can cache some amount of data in the GPU memory. So if you take a top-down perspective, you will see GPU memory as a high-speed cache, then you have the main memory, then you have the local disk, and then you have a distributed object store. As you walk down the hierarchy, you can put more and more data, but the latency may increase. Basically, you are doing a trade-off between the volume of data you can put into the system and the performance.
If you want higher performance, you may want to build a system with more memory and more high-speed local disk. But if you care about cost efficiency, if you are building pretty much an offline analysis application and performance doesn't matter too much, you can definitely use cheaper machines with less memory and less local disk but a larger object store. So basically we are giving our customers, our users, the opportunity to configure their own vector database system in a way that lets them make a trade-off between performance, consistency, accuracy, and also cost efficiency.
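To make the trade-off tangible, here is a purely illustrative sizing sketch. The latency and cost figures are made up for illustration, not Milvus defaults; the point is just how shifting data between tiers moves cost against expected latency.

```python
# Illustrative, made-up figures: rough per-tier latency and storage cost.
TIERS = {
    "gpu_memory":   {"latency_ms": 0.1,  "usd_per_gb_month": 25.0},
    "ram":          {"latency_ms": 0.5,  "usd_per_gb_month": 5.0},
    "local_nvme":   {"latency_ms": 2.0,  "usd_per_gb_month": 0.5},
    "object_store": {"latency_ms": 30.0, "usd_per_gb_month": 0.02},
}

def profile(placement, total_gb):
    """placement: fraction of the data per tier, e.g. {'ram': 0.1, 'object_store': 0.9}."""
    cost = sum(TIERS[t]["usd_per_gb_month"] * frac * total_gb
               for t, frac in placement.items())
    # Expected latency if queries hit tiers in proportion to data placement.
    latency = sum(TIERS[t]["latency_ms"] * frac for t, frac in placement.items())
    return round(cost, 2), round(latency, 2)

# Latency-optimized vs. cost-optimized layout for 1 TB of vectors.
print(profile({"ram": 0.5, "local_nvme": 0.5}, 1000))           # fast, expensive
print(profile({"local_nvme": 0.1, "object_store": 0.9}, 1000))  # cheap, slower
```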
Nicolay Gerold: That's really interesting. And when it comes to the GPU, are you doing GPU-optimized indexes as well for the search, or are you only using the GPU to basically do faster batch processing of the vectors?
Charles Xie: The idea is to use the computation power of the GPU to accelerate the index building part, but also the search part. For both parts we have been working with NVIDIA on GPU-friendly, GPU-customized indexing algorithms. For example, there's a library called RAFT which can support high-performance index building and also index serving on GPU; it is part of NVIDIA's newer GPU vector search library (cuVS). And you also have to optimize the data transmission between memory and the GPU, because most likely that's going to be the bottleneck. The bandwidth between local memory and the GPU is going to be limited, so we have to implement a lot of algorithms to do data prefetching and data caching, to minimize the data transmission between CPU and GPU.
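The prefetching point can be sketched without any GPU library: while one batch is being "searched", a background thread stages the next one, so the transfer cost overlaps with compute. This is a conceptual sketch only, with time.sleep standing in for copy and kernel time.

```python
import queue
import threading
import time

def load_batch(i):
    time.sleep(0.05)            # stands in for the host-to-device copy
    return f"batch-{i}"

def search_batch(batch):
    time.sleep(0.05)            # stands in for the GPU search kernel
    return f"results for {batch}"

def producer(n_batches, q):
    for i in range(n_batches):
        q.put(load_batch(i))    # prefetch while the "GPU" is busy searching
    q.put(None)                 # sentinel: no more batches

q = queue.Queue(maxsize=2)      # small queue = bounded prefetch buffer
threading.Thread(target=producer, args=(4, q), daemon=True).start()

while (batch := q.get()) is not None:
    print(search_batch(batch))  # copy of batch i+1 overlaps with search of batch i
```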
Nicolay Gerold: Yeah, and it's not even just the indexes: the data itself can be stored in multiple different tiers, not just CPU and GPU memory, but the different layers of storage as well.
Charles Xie: Let me give you some more detail about it. As I mentioned, a GPU is limited in memory, but it has a lot of computation capacity. That's why having this hierarchical data storage is going to be important. Basically, we probably cache or store most of the data on your local disk or in your object store, we probably store around one tenth of the data in local memory, and on top of that we probably store one out of a hundred in GPU memory. You can accommodate the scalability of the data, support a massive amount of data, but you can also achieve super high performance, because the GPU only stores a very small percentage of the data. But you have to make a lot of modifications to your indexing algorithm. Basically, it's going to be a hierarchical indexing algorithm. On the GPU, you just do a fast probe to find out, for this particular search, which partition or which sector of data, which cluster of vector data, we should search. Then in local memory you cache more information and can support more granularity, and the real vector search happens at the disk level. So you have a hierarchical index to make sure that you can make the best use of GPU, CPU, and local disk.
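A stripped-down illustration of that hierarchical probe (not the actual Milvus index): coarse cluster centroids play the role of the small, GPU-resident layer, and the exact distance computation only touches the vectors of the few selected clusters, standing in for the disk level.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clusters = 64, 32
vectors = rng.normal(size=(20_000, dim))

# "GPU tier": only the coarse centroids need to live here.
centroids = rng.normal(size=(n_clusters, dim))   # stand-in for trained k-means centroids
# Assign each vector to its nearest centroid (||v-c||^2 = ||v||^2 - 2 v.c + ||c||^2).
assignments = np.argmin((centroids**2).sum(axis=1) - 2 * vectors @ centroids.T, axis=1)

def search(query, nprobe=4, k=5):
    # 1) Fast coarse probe over the centroids (tiny, cache/GPU friendly).
    coarse = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    # 2) "Disk tier": exact scan limited to the selected clusters only.
    candidate_ids = np.where(np.isin(assignments, coarse))[0]
    dists = np.linalg.norm(vectors[candidate_ids] - query, axis=1)
    return candidate_ids[np.argsort(dists)[:k]]

print(search(rng.normal(size=dim)))
```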
Nicolay Gerold: At which scale of data have you actually noticed that the GPU brings a real performance benefit? I assume it will give a benefit already at small scales of data, but you will probably not see a large performance boost there.
Charles Xie: Yeah, I would say it's not so much at which scale; it's more about what you want from your application perspective. Basically, what we observe is that GPU acceleration is good for scenarios where you need high throughput. So you need thousands of QPS, you want to have 10,000 or even 50,000 queries per second. That's where the GPU is going to shine. You basically send a lot of data to the GPU processor for batch processing, and you get all the results in a batch.
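Why batching helps: a single matrix-matrix product over a batch of queries amortizes memory traffic and keeps the processor busy, which is the same reason GPUs shine at high QPS. A small NumPy sketch with made-up sizes:

```python
import time
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(100_000, 128)).astype(np.float32)       # database vectors
queries = rng.normal(size=(512, 128)).astype(np.float32)      # a batch of queries

# One query at a time: many small matrix-vector products.
t0 = time.perf_counter()
for q in queries:
    _ = db @ q
one_by_one = time.perf_counter() - t0

# Batched: a single matrix-matrix product over the whole query batch.
t0 = time.perf_counter()
_ = db @ queries.T
batched = time.perf_counter() - t0

print(f"one-by-one: {one_by_one:.3f}s, batched: {batched:.3f}s")
```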
Nicolay Gerold: New embeddings, especially ColBERT and ColPali, but also Matryoshka embeddings: what challenges do these new types of embeddings actually present for vector similarity search systems, and also the new types of calculations you are actually doing with these embeddings?
Charles Xie: So this is definitely an emerging technology in the vector search space. In the past five years, ten years, we saw a lot of applications about just search, and when we are talking about similarity, a lot of people are defaulting to cosine similarity. At the moment we are in the process of supporting all the different kinds of new vector search operators in our vector database. We are going to support ColBERT, we are going to support a sparse index, and we are also going to support re-ranking and custom scoring functions and things like that.
Nicolay Gerold: For the calculations within ColBERT, for the late interactions, what kind of new indexes, or what kind of optimizations to the existing ones, are actually necessary to perform those efficiently at large scale?
Charles Xie: So first of all, compared to cosine distance, ColBERT is going to be more computationally intensive, which will have an impact on the scale of the distributed system, so you may need a larger cluster to support it. And ColBERT also means that you are transmitting more data; a single entry is made of a higher volume of data. We are doing tons of optimization to accommodate the computation and also the data transmission.
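For context on why late interaction is heavier, here is the ColBERT-style MaxSim scoring in a few lines of NumPy: every query token is compared against every document token, so both the compute and the per-document payload grow with the token count. Shapes here are illustrative.

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: for each query token take its best
    match among the document tokens, then sum over the query tokens."""
    sim = query_tokens @ doc_tokens.T        # (q_tokens, d_tokens) similarities
    return float(sim.max(axis=1).sum())      # max over doc tokens, sum over query

rng = np.random.default_rng(0)
norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)

q = norm(rng.normal(size=(32, 128)))                          # 32 query token embeddings
docs = [norm(rng.normal(size=(180, 128))) for _ in range(3)]  # ~180 token embeddings per doc

print([maxsim_score(q, d) for d in docs])   # one vector per token, not per document
```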
Nicolay Gerold: Yep. And have you actually already played around with additional optimizations? Especially in research there is a lot of talk, for example, about pruning, removing the less relevant tokens, quantization of course, but also caching a few of the embeddings, especially the more important ones. What are you actually considering beyond those to optimize the ColBERT calculations?
Charles Xie: We are supporting all these approaches, but this is still ongoing work, and we are not sure which approach works best for us, because, again, Milvus is a distributed system and a lot of things are going to be much more complicated in a distributed environment. We're definitely exploring all these approaches, but we haven't decided yet.
Nicolay Gerold: On the sparse indexes, as you mentioned, is this among others SPLADE? Because that would be really interesting: how are you handling the expansion at query time? That's always something that really bugged me about SPLADE, that it's really hard to implement the expansion at query time in a way that actually makes it really useful.
Charles Xie: Yeah, so we also think this sparse index is very important, because it basically adds another way for us to search the data. You can combine it with the similarity search on dense vectors, and you get a higher recall.
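One simple way to combine a sparse (SPLADE or BM25-style) result list with a dense one is reciprocal rank fusion. This generic sketch is not the Milvus API, just the fusion idea being pointed at:

```python
def reciprocal_rank_fusion(result_lists, k=60, top_n=5):
    """result_lists: iterable of ranked id lists, best hit first.
    Classic RRF: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

dense_hits = ["d7", "d2", "d9", "d4"]    # from dense vector similarity search
sparse_hits = ["d2", "d1", "d7", "d8"]   # from a sparse / lexical index

print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```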
Nicolay Gerold: Yeah.
And are there any emerging technologies
or research areas that you are
particularly excited about at the moment?
Charles Xie: So there are a few technologies. First of all, integrating and supporting the more traditional data retrieval algorithms in vector search, for example BM25. We see the importance of supporting BM25, so integrating BM25 with vector similarity search is one thing we are working on. The other thing we are trying to do is accommodate a lot more kinds of data. When we talk about embeddings, a lot of people are just talking about text and images, but in the domain of text and knowledge bases, we found that there is a lot of data stored in more traditional structures. And we found out that we can actually convert all this knowledge-base data into graph embeddings. There are a lot of graph embedding algorithms, so if you pick one of these algorithms to transform your knowledge base or your graph data into graph embeddings, and you put these embeddings into your vector similarity search, you can actually achieve superior accuracy. We actually did some experiments, and we found that you can achieve even higher accuracy on data retrieval than graph RAG, for example.
There are also going to be more and more indexing algorithms, and for each indexing algorithm there are a lot of parameters, like segmentation size and how you cache the data; there are tons of parameters to fine-tune. That makes us think about autonomous driving for vector indexes. So basically we are trying to use machine learning algorithms to provide tuning on top of vector indexes, so that the customer, the user, doesn't have to care about the configuration and the parameters of the different indexes anymore. It just works out of the box. And lastly, we have had a lot of advancement in indexing algorithms, from HNSW to IVF_PQ. But we found out that there could be self-learned indexes. They could be powered by AI algorithms but highly customized for your data distribution. For example, HNSW is basically a graph-based algorithm, but every single customer may have a different data workload. So what if we could use machine learning, or some algorithm, to customize the HNSW algorithm, or the graph-based algorithm, just for that data distribution, for that kind of workload? This will help people achieve higher performance and also lower cost. So I think those are the things that are emerging in the next few years.
Nicolay Gerold: And by self-learning algorithm, or adapting index, do you actually mean that the index sees the queries, and which query types are performed frequently, and then basically makes those faster by adapting the index? Or is it also based on the data structure and data distribution, so that it reorganizes itself over time to basically organize similar items closer together?
Charles Xie: Yeah, it could be both. You are absolutely right; we want to achieve both. Let me give you one example. We know that eventually you have to build several indexes. If you have a large volume of vector data, you have to segment your data and then build several indexes, and how you are going to segment your data is going to be a big challenge, because you want to maintain data locality. If you know the pattern with which you are going to retrieve your data, you can build the indexes without sacrificing data locality; you basically get better data locality. And also, as you mentioned, there is some data that you probably want to retrieve with a higher frequency, and for that data you can definitely put it somewhere in the storage hierarchy that has a higher affinity to the GPU or CPU; you basically store it in GPU memory or local memory and things like that.
Nicolay Gerold: Yeah. And in HNSW, you have a lot of different things you could basically adapt: the choice of connections per node, like the number of connections per node, the number of layers you have; you could probably also add new entry points into the data. Are you also exploring, for example with reinforcement learning, custom distance functions?
Charles Xie: So this is a very interesting idea. We had a discussion about that, I think one year ago. But unfortunately, at the moment, we don't have the engineering bandwidth to explore it. But that's a very interesting idea.
Nicolay Gerold: Yeah, it could be really interesting. And if people really want to start building stuff and check out Zilliz, where can they do that? And also, what's on the horizon? What can you tease, or what should people really look out for?
Charles Xie: I think, as it is, we have been building a vector database system for high performance, for scalability, and also without sacrificing cost efficiency. So for those users who want super high performance, who want a very high QPS and very low latency, who have tons of data, a million vectors and above, and who also want to achieve a good TCO, total cost of ownership, I think we are the solution to go with.
Nicolay Gerold: And if people
want to follow along with
you where can they do that?
Charles Xie: So they can reach out to us in the open source community. Milvus is open source on GitHub, and it is one of the most popular vector database systems in the world. So just search Milvus on GitHub. And we also have a Discord channel to hold all these discussions about vector databases and about Milvus.
Nicolay Gerold: So what can we take away?
I think there are two very different sets of learnings here.
One for like the developer who's working
on the actual databases and one for
more of the user of the databases.
So let's maybe start with the user, which
is the group, which I identify with more.
And I think what really is obvious
through the episode is how much effort
you actually should spend on figuring
out the requirements of the system
you're building so that you actually
sit down, look at the use case.
And figure out, okay, what am I
optimizing for or what do I have to
optimize for based on what the user
needs and also the use case requires.
And this can be very different things.
I think the three main aspects I'm looking at are, for one, always cost.
The second one is latency.
And the last one is basically in,
in search databases, relevance or
relevancy, like how relevant or good
do my search results have to be.
And you have a lot of different
levers to pull in your system to
actually have an impact on that.
So if I'm just optimizing for cost, I would place more of my data in a cold form of storage.
But this also means I'm
trading off the latency part.
It doesn't mean I'm getting a
worse relevancy, but I will have a
worse latency just because of the
network cost I will be encountering.
And on the other hand, when I have more of an e-commerce use case, where I'm in the domain of 100 milliseconds in which I have to return the results to the user, I would have to optimize for latency. So I would have to place more of the data in RAM, or at most on local SSD, because I wouldn't be able to do the round trips over the network before I have to return the results, especially when I have to move large amounts of data, which I have to in search systems.
And the last one is the relevancy.
And here, this could mean I have a use case where relevancy is of the utmost importance. This could also mean that latency doesn't matter at all.
So when I'm looking, for example, at
report generation, which is something that
is typically done more async, so the day
prior, the night prior, and then I serve it to the user, I really can take my time.
And can focus more on getting
a really high relevance.
So doing, for example, an exhaustive
search over the entire vector
database because I just don't really
care about the latency I will have.
The other thing would be I
could even add more components.
So, for example, a re-ranker,
which is something very hard to
do when you're actually working
in a low latency environment.
And when I have a use case where I actually need high relevance, but also very low latency under high load, then I would have to trade off the cost massively and probably go for GPU acceleration or just add more nodes to handle all those QPS.
And I think being explicit about this
in the very beginning, what are you
optimizing for is what will get you
the good results when you're actually
building a system down the road.
And the other set of learnings, which is more for the database engineer, or however you want to call it, who is building the databases: there are a lot of different interesting things they've built that have emerged over the years of building databases. One is the dual path of search: you have a write path, which goes to a buffer that is only indexed once you hit a certain threshold of data, and a read path that basically queries the buffer, but also the already indexed data.
And I think that's very similar
to Cassandra's LSM tree.
Also, it looks a lot like a cache-aside pattern, and you can basically achieve real-time searchability.
So basically the data is fresh without really sacrificing the performance, because you couldn't search it in real time if you had to build the index first. And there's also the hierarchical storage element: the fastest tier is the GPU memory, then you have the RAM, then you have the local SSD, and then you have the object storage.
And in each one of those, you
have a different bottleneck.
So in GPU memory, the bottleneck
is CPU to GPU transfer.
In RAM, the bottleneck
is memory bandwidth.
In local SSD, you have IO operations.
In object store, like S3,
you have the network latency.
So you have to solve a lot of
problems to actually make this work.
But you basically can do a refined
search and go basically through
the different types of data.
And he also mentioned that different amounts of data tend to be stored on the different types of storage. So, for example, the GPU holds 1% at the maximum, probably even less, and RAM around 10%. Then for disk and object store, I think this is where the main trade-off is done at the moment: whether data sits on local SSD or in object storage. And the ratio, I think, was really interesting; I think he even mentioned roughly 89 percent is on local SSD.
The architecture basically allows
them to maintain the high performance
while still handling massive data
sets in a cost efficient way.
And this is again going into like
the trade off direction, like
what are you trading off for?
And as a systems engineer, database
engineer, you basically have to
build the wheels so the user is
actually able to make his decisions
on what he's optimizing for.
But yeah, I think this was
a very interesting episode.
We will be going even deeper
into Milvus and its architecture
in an upcoming episode.
I'm not sure whether I will post it next
week or at a point later in this season.
But to stay up to date and to
catch the next episode, subscribe.
Also, let me know what you think, post
it in the comments, whether you're on
YouTube or just send me a message on
LinkedIn, Twitter, BlueSky, wherever.
And yeah, I will talk to you soon.
Have a good one.