How AI Is Built

Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.
Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.

"LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."

“We're big believers in the composable data systems vision."

"You can insert data into LanceDB using Panda's data frames...to sort of really large 'embed the internet' kind of workflows."

"We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."

"LanceDB offers up to 1,000 times faster performance than Parquet."


Chang She:
LanceDB:
Nicolay Gerold:


00:00 Introduction to Multimodal Embeddings
00:26 Challenges in Storage and Serving
02:51 LanceDB: The Solution for Multimodal Data
04:25 Interview with Chang She: Origins and Vision
10:37 Technical Deep Dive: LanceDB and Rust
18:11 Innovations in Data Storage Formats
19:00 Optimizing Performance in Lakehouse Ecosystems
21:22 Future Use Cases for LanceDB
26:04 Building Effective Recommendation Systems
32:10 Exciting Applications and Future Directions

What is How AI Is Built?

How AI is Built dives into the different building blocks necessary to develop AI applications: how they work, how you can get started, and how you can master them. Build on the breakthroughs of others. Follow along, as Nicolay learns from the best data engineers, ML engineers, solution architects, and tech founders.

Nicolay Gerold: So we are starting with a little bit on multimodal embeddings. A bunch of my audio files got corrupted last week, so we're going to kick it off with a new episode and publish two episodes this week. We will be talking a lot about how to fine-tune embedding models and how to use multimodal AI in the enterprise.

But multimodal embeddings especially come with a lot of pain, and a large portion of that is because of storage and serving.

Because in the enterprise you often have different access patterns to embeddings, and you often have to go with multiple storage solutions at the same time to store the different parts that make up a multimodal embedding. You typically store the source data as well, so you have the blobs in some form of blob storage, in a bucket on GCP or AWS. At the same time you have the embeddings either in a transactional database, if your dataset is small, or in a vector database, if you have to serve them for a more real-time or near-real-time application. And then you also have the metadata, which is often stored in something more structured, most likely a transactional database like Postgres.

So you have a lot of different storage solutions, which makes it very hard to work with multimodal data.

And in AI you have a set of operations all of that has to support. For example, you want to access data randomly, so you can actually run your training and do a train/test split. At the same time, it has to support filtering, because during training, for example, I want to do stratified sampling. I don't want to split completely randomly, because there might be some patterns in my data I have to consider. Or in time-series data, when I'm doing anomaly detection, I want to split based on the datetime column to only run the predictions on future data points and train on past ones. So it isn't enough to have completely random access; I also need to be able to filter based on a certain column. At the same time, as the dataset evolves, I want to be able to version it. And in the best case, I also want to be able to use the metadata where it lives, next to the raw data. Lance is trying to build the solution for that.

At the moment they're really optimized for large-scale storage, so really going to 10 billion, 50 billion vectors plus. And how they do that is by building a solution on top of cheap blob storage, on top of S3 or GCS buckets. This means the storage is very, very cheap. They do that by separating compute and storage, and by building their own open table format, which is Lance.

For those of you who don't know: an open table format basically is a file format plus metadata. There are some open table formats which are more for transactional or BI (business intelligence) workloads, like Hudi, Iceberg, but also Delta Lake. They basically store the raw data in Parquet and add metadata in JSON files to allow ACID transactions, versioning, and stuff like that.

And Lance has built their own solution for that, but building on top of their own file format, which allows for additional query types and access patterns, so, for example, similarity search on disk-based indices.
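To make that concrete, here is a minimal sketch, assuming the open source `lance` Python package (pylance): one dataset holding metadata, an embedding column, and references to the raw blobs, queried with a column-projected, time-filtered scan. The paths, column names, and cutoff date are invented for the example.

```python
import lance
import pyarrow as pa

# One table holds the metadata, the embedding, and a pointer to the raw asset.
table = pa.table({
    "id": [1, 2, 3],
    "created_at": ["2024-01-01", "2024-02-01", "2024-03-01"],
    "image_uri": ["s3://bucket/a.jpg", "s3://bucket/b.jpg", "s3://bucket/c.jpg"],
    "embedding": pa.array([[0.1] * 4, [0.2] * 4, [0.3] * 4],
                          type=pa.list_(pa.float32(), 4)),
})
lance.write_dataset(table, "demo.lance", mode="overwrite")

ds = lance.dataset("demo.lance")

# Time-based split: train on the past, evaluate on the future, all from one store.
train = ds.to_table(filter="created_at < '2024-03-01'", columns=["id", "embedding"])
test = ds.to_table(filter="created_at >= '2024-03-01'", columns=["id", "embedding"])
print(train.num_rows, test.num_rows)
```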

We will be interviewing Chang She, who is the CEO of LanceDB, but also one of the original creators and maintainers of the pandas library. We'll be diving into how Lance works, what use cases stand out to him, what Lance is used for, how you can use Lance for your own projects, and also how to build retrieval, search, RAG, and more.

Chang She: LanceDB is the database
for AI, so we help our users who are

generally AI teams looking to build
applications to manage their data to do

a performant billion scale vector search
and also deliver better performance

for training, fine tuning, and the
whole sort of data infrastructure

needs throughout their AI life cycle.

Nicolay Gerold: Yeah, great.

And can you tell me your story,
how you got actually into building

LanceDB and the file format Lance?

What was the problem you've actually
seen in your day to day life that

convinced you to go into the space?

Chang She: It's very interesting, because so much has happened over the last year. We started LanceDB about two years ago, working on a related problem, but in a very different form.

We started because I was at Tubi TV, a streaming company, and my co-founder Lei was at Cruise, and what we saw was that there was so much pain for ML teams when they were working with unstructured data for multimodal AI. These are things like images, videos, audio tracks, PDFs, point clouds, you name it.

So in the new era for machine learning and AI, the data doesn't really fit well into a table, and all of the data infrastructure was not optimized for that. Our teams just had a terrible time dealing with that in production. So we wanted to create a new generation of data infrastructure that makes their lives a lot easier.

And this is really informed by my experience in open source. I've been building data tools for machine learning engineers and data scientists, and I've been working with pandas for almost two decades at this point; I was one of the original creators of the pandas library. This is my passion: I love creating a tool and seeing the users get a 10x boost in productivity, or just gain superpowers. That's what makes me happy. And so we wanted to do that for this new generation of multimodal AI applications.

Nicolay Gerold: Yeah.

And that's especially the big trend I'm seeing at the moment in the data space as well. There are so many libraries popping up which are streamlining so much of the process and which are actually focused on improving performance. And most of them are actually based on Rust, like you are. What was the decision to actually jump onto the Rust train?

Chang She: Yeah, so our Rust-pill moment was the end of 2022. We had spent a good portion of 2022 in C++ writing the core Lance columnar format, and this was used by our early customers, mostly in autonomous vehicles, to manage petabyte-scale vision datasets. And we had more than one frustration with C++: it was very easy to write unsafe code that creates segfaults in production when you run it, it's really difficult to wrestle with CMake, and overall it was just difficult to move very quickly and be very agile.

And at Christmas of 2022, we were just having a hackathon project for a customer demo, and we said, hey, let's learn Rust. So we wrote that, and we were just amazed by how productive we were, how easy the tooling ecosystem was, and how confident it made us feel because the code that we wrote was so safe. And as a Christmas present to ourselves, we decided to say, hey, let's make a big bet on Rust, ditch all of our C++ code, and rewrite it all in Rust. And so we did that in the new year, and it took us about, I think, three weeks to rewrite roughly five months of C++ development.

Yeah.

And that was as learners, as beginners in Rust. And I think that's not a unique story to us; I've heard from a lot of friends who are rewriting things in Rust that they were able to just move a lot faster because of the language.

Nicolay Gerold: Do you actually see more companies which are implementing AI, and not really developing the infrastructure or the libraries for it, switching to Rust as well? Or do you see Rust rather as a support language for something like Python, where a lot of the libraries are actually implemented in Rust and you use it from Python just for the ease of use?

Chang She: I think in terms of
number of users, we will still

see the largest usage in Python.

Just because the Python ecosystem for
AI is so rich and has so much tradition.

But we're starting to see a lot of the AI application or AI infrastructure companies use Rust under the hood for their systems and services. And as that happens, a richer Rust ecosystem is also coming up, so that a lot of these systems can now talk to each other directly in Rust rather than having to use gRPC or HTTP protocols or having to go through the Python layer.

Nicolay Gerold: Yeah, and now jumping back into LanceDB and Lance, what are the major components for the developer when he's interacting with LanceDB and when he's actually using the library for basically data insertion, but also vector search?

Chang She: Yeah, so LanceDB is built around our core Lance format. Essentially, when users are interacting with LanceDB, they are interacting with either the Python, the Rust, or the JavaScript APIs, and they're talking to a query layer that then wraps around the Lance format. And we tried to build it so that it feels very familiar to users who are used to the usual data tools, especially in Python.

For example, you can insert data into LanceDB using pandas DataFrames or Polars DataFrames. You can plug Lance into, let's say, a Spark cluster and you can write massive amounts of data very quickly. We have one user that tells us they have a 10-trillion-token dataset that they're writing into Lance from a massive Spark cluster. So it goes from small local experiments with pandas to really large "embed the internet" kind of workflows. If you're used to the traditional Python tool chain for working with data, it's very easy for you to work with LanceDB.
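As a rough illustration of that workflow, here is a minimal sketch using the LanceDB Python client; the database path, table name, and columns are made up for this example, so treat it as a starting point rather than a canonical recipe.

```python
import lancedb
import pandas as pd

# Local (embedded) database; the path is just a directory on disk.
db = lancedb.connect("./lancedb-demo")

# Insert a pandas DataFrame directly; a Polars DataFrame or a list of dicts works the same way.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["red running shoes", "blue denim jacket", "wireless headphones"],
    "vector": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],  # pre-computed embeddings
})
table = db.create_table("products", data=df, mode="overwrite")

# Append more rows later.
table.add(pd.DataFrame({"id": [4], "text": ["leather backpack"], "vector": [[0.4, 0.6]]}))

# Vector search: nearest neighbours to a query embedding.
results = table.search([0.85, 0.15]).limit(2).to_pandas()
print(results[["id", "text", "_distance"]])
```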

And I think one of the things that I personally feel pretty passionate about in developing LanceDB is that we're big believers in the composable data systems vision. My old co-founder from my previous startup, Wes McKinney, really made this popular when he was at Voltron Data. And the idea was: hey, the database is going to get blown up, and in each layer we're going to have really great open source projects that make it easy to talk to each other, and you can now easily create very customized data systems by just picking and choosing the right projects and putting them together in the right package.

So we follow that philosophy. For our internal query engine, for example, we use DataFusion. We use Apache Arrow as the in-memory API into the rest of the ecosystem, and so automatically we're getting compatibility with dozens of tools that have integrated with Arrow already.

Nicolay Gerold: If I'm looking at databases, often you have a pretty fixed process. So first I have data modeling and defining a schema, then I'm setting up the database, then I'm actually inserting the data according to that schema, and then I can start querying. What do these components look like in LanceDB?

Chang She: Yeah. So under the hood, the schema is an Arrow schema; everything in the interaction is Arrow. But one of the things that we've integrated into LanceDB that I think is really helpful for our users is Pydantic. Arrow is great for infrastructure, but as a user-facing tool the API is not necessarily intuitive all the time, to put it lightly.

So what we do is we actually allow you to create the data model using a Pydantic model, and we take care of the conversion from Pydantic to Arrow. So when you create a table, it's very easy in Python to say: I don't really know what the Arrow types and all that are, but it's very easy for me to just create a class that inherits from a Pydantic base model. I can use that as the schema, and then I can also just pass Pydantic object instances directly into LanceDB and it knows to translate that into the database schema.

Additionally, what this really allows us to do is: we've got this embedding registry functionality that allows you to say, hey, this column, or this Pydantic field, is the vector column, and it's going to be generated using OpenAI or sentence-transformers out of this source attribute in the same Pydantic schema. So once you do that, when you create the table you no longer need to do embedding generation yourself, either when you insert data or when you query the database.

So now, instead of a simple vector search tool, we can give LanceDB as a more generalized search backend for our users. They can just throw in text or throw in images, they can add text embedding functions, image embedding functions, or multimodal text-image embedding functions, and it's very easy for them to say: I want to query vectors, or I want to query keywords, and they don't have to deal with most of the details under the hood.
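Here is a short sketch of what that Pydantic-plus-embedding-registry pattern looks like in the Python client, following the pattern in LanceDB's docs. The model name, table name, and fields are placeholders for this example, and the exact registry keys may differ between versions, so double-check against the documentation for the release you're on.

```python
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

# Pick an embedding function from the registry; it runs both on insert and on query.
embedder = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5")

class Document(LanceModel):
    text: str = embedder.SourceField()                          # source attribute to embed
    vector: Vector(embedder.ndims()) = embedder.VectorField()   # generated vector column

db = lancedb.connect("./lancedb-demo")
table = db.create_table("documents", schema=Document, mode="overwrite")

# No manual embedding generation: just insert the source text.
table.add([{"text": "LanceDB stores raw data, metadata, and vectors together."},
           {"text": "Parquet is a columnar file format for analytics."}])

# Query with plain text; the same embedding function is applied to the query.
hits = table.search("where do I keep embeddings and blobs in one place?").limit(1).to_pydantic(Document)
print(hits[0].text)
```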

Nicolay Gerold: Yeah.

And what other indexes do you support in Lance? Because Postgres is particularly known for having a shit ton of support for different data types. How is it looking for Lance? Do you also have geographical data types which you can query, or datetimes?

Chang She: So datetime, we definitely have support for that. We generally support most of the Arrow data types, and we have rich semantic types on top of that. So for example, we have a semantic type for vector, where the storage is a fixed-size list. We have semantic types for tensors and images, and we're making one for video as well. I think the type ecosystem is very rich in Lance.

And what's interesting, I think, since you bring up the type ecosystem, is that Lance is a little bit different from other vector databases in that we let users actually store the raw data, and especially the training data. We have some of our largest customers storing petabytes of image data for training in the same dataset where they're storing the vectors for serving. And so this gives them that single source of truth, so they don't have to keep two datasets in sync with each other and things like that.

Nicolay Gerold: Yeah, and you have on your website the phrase which I really love: up to 1,000 times faster than Parquet. What are the main optimizations you actually do under the hood to get that performance benefit?

Chang She: Yeah. So, all benchmarks are wrong, but some are useful, right? The overall context on that 1,000 times claim is that for machine learning and AI, a lot of the time, dealing with the data means you need very fast random access. This is useful for indexing, when you do shuffling or sampling and things like that. And so when you want to fetch a small number of rows scattered throughout your dataset, that's when we can get up to a thousand times the speed of Parquet.
And the reason why we were able to achieve that with the format is because we lay out the data differently from Parquet. The IO plan for the dataset is also different, and we've updated a lot of the access assumptions based on more modern storage technologies, right? Parquet was designed 10 years ago and storage technologies were very different then. Especially, cloud-based object storage wasn't nearly as prevalent, for example, and NVMe SSDs weren't really around, or weren't nearly as fast. So by doing all three of those things, we can achieve very meaningful performance gains.
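To picture the random-access pattern he describes, here is a tiny sketch with the open source `lance` package: fetching a handful of scattered rows by index, the operation that row-group-oriented Parquet readers handle poorly. The dataset path, column names, and indices are invented for the example.

```python
import lance
import numpy as np

ds = lance.dataset("training-data.lance")  # hypothetical dataset written earlier

# Sample a small, scattered set of row indices, e.g. for a shuffled training batch.
rng = np.random.default_rng(seed=42)
indices = rng.choice(ds.count_rows(), size=32, replace=False).tolist()

# Point lookups: Lance fetches just these rows instead of scanning whole row groups.
batch = ds.take(indices, columns=["image_uri", "embedding", "label"])
print(batch.num_rows)
```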

Nicolay Gerold: Yeah, and when I'm looking especially at the lakehouse ecosystem as well, they are also trying to abstract away the complete storage element from the user and do all of the different automatic optimizations under the hood. And especially in Delta tables, I think what's prevalent now, or what's coming, is liquid clustering, I think it's called, where they try to group together the similar elements which are likely to be retrieved together. Have you considered such optimizations as well?

Chang She: Yeah. So we've added stats-based pruning in the format already. Typically the use cases for these types of things are just much faster filtering under the hood. And on top of that, because we actually support fast random access, it's easy to then attach an index for that, so that increases filtering performance even more, especially if your filters are very selective. And I know that Delta under the hood uses Parquet for storage, same with Iceberg and then Hudi. In those types of use cases, Lance is going to be much faster because the file format is different under the hood.
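For reference, attaching such an index to a filter column in the LanceDB Python client looks roughly like the sketch below. The table reuses the earlier DataFrame example, the filtered column is arbitrary, and the available index types vary by version, so treat this as an illustration rather than the definitive API.

```python
import lancedb

db = lancedb.connect("./lancedb-demo")
table = db.open_table("products")  # table from the earlier DataFrame example

# A scalar index on a metadata column speeds up selective filters.
table.create_scalar_index("id")

# Combine a filter with vector search; prefilter applies the condition before the ANN step.
results = (
    table.search([0.85, 0.15])
    .where("id > 2", prefilter=True)
    .limit(5)
    .to_pandas()
)
print(results)
```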

Nicolay Gerold: Yeah. And looking at Delta tables, Iceberg, and Hudi, what would be use cases where you would go with more of the older formats over Lance?

Chang She: I think, for now, if you have just very traditional BI use cases, where the vast majority of your workload is scanning one column for filtering and then scanning another column to compute some aggregate, and these are just tabular data columns, then there's probably not a compelling reason to switch. And if you're already in those ecosystems, there's not a compelling reason to switch out for those use cases. So I would say that Lance is as good in performance as Parquet and the systems built on Parquet for those types of use cases, but Lance is much more optimized if you're working with embeddings, if you're working with images and large blob data.

Nicolay Gerold: Yeah, and what we already touched upon is the multimodal part. And this is particularly interesting, especially as we get into more and more of the multimodal models we are seeing right now. What use cases do you think LanceDB will enable in the future through this multimodal storage and the associated embeddings, which are all stored in one place together?

Chang She: So I think we're seeing a lot of these use cases in a nascent form already. I think one is being able to quickly train models for multimodal use cases. For example, you get in your car and you can point to one of the windows and say, roll down that window, or turn on the radio and play something I like. Or with AR and mixed reality, with the different attempts at glasses and vision, I think there's a lot of room for those types of applications.

Now, the interesting thing is, this is where I think multimodal models, and especially vision models, are very different from large language models: because in production, the language that you see is pretty similar in distribution to the language that LLMs are trained on, right? But for vision models, that's not necessarily the case. Large vision models by default are trained on, I don't know, stock images or something like that, or COCO or ImageNet or whatnot.

But in production, there's no control over that. If you're using them at a chip manufacturer, they're all going to be top-down images of your board. If you're using them for autonomous vehicles, they're all going to be street scenes, and you're going to be picking out very specific classes of things. And so I think for multimodal AI, there's going to be a much bigger need to at least fine-tune your own model, if not completely retrain it. And so the need for data storage and data management becomes much bigger.

And I think along with that, for AI in production, a lot of the success factors are going to be the same as classical ML. It makes me feel really weird to say classical ML is two years old, or maybe one and a half years old, but here we are. And I think the biggest thing that I learned from Tubi was that machine learning, and generally AI success, depends on the speed at which you can iterate and experiment. And so having something like Lance gives you a lot of advantages.

So for example, one of our users told us: hey, I've got something like a hundred terabytes of image and video data in S3. When we want to run an experiment, we have to download all of that data, create a new feature, add the new feature data as a new column to a new copy, and then upload it back into S3 as a new copy. This makes it incredibly expensive, and it also takes incredibly long to run every new experiment. Instead, with Lance, you can keep that same copy; you don't have to download anything. You just write new feature data as new columns directly onto S3, and you can roll back in time, so you can discard experiments that have completed and that you won't roll out into production.

I think use cases like these are what I'm really looking forward to helping our customers and users do in 2024 and 2025.
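A rough sketch of that pattern with the `lance` Python package is below: merge a new feature column into an existing dataset in place, then use the dataset's version history to roll back. The URI and column names are placeholders, and the merge and versioning helpers have evolved across releases, so verify the calls against the pylance docs for your version.

```python
import lance
import pyarrow as pa

uri = "s3://my-bucket/experiments.lance"  # hypothetical dataset location
ds = lance.dataset(uri)

# Compute a new feature offline, keyed by the existing "id" column, and merge it in
# as a new column without rewriting the raw image/video data.
new_feature = pa.table({
    "id": ds.to_table(columns=["id"])["id"],
    "clip_score": pa.array([0.0] * ds.count_rows(), type=pa.float32()),
})
ds.merge(new_feature, left_on="id", right_on="id")

# Every write creates a new version; list them and roll back if the experiment is discarded.
ds = lance.dataset(uri)  # re-open to pick up the new version
for v in ds.versions():
    print(v["version"], v["timestamp"])
previous = lance.dataset(uri, version=ds.version - 1)  # time travel to the prior version
print(previous.schema)
```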

Nicolay Gerold: A few use cases are listed on your platform: RAG, which is the obvious one, let's say, and search. And one I'm particularly interested in is the recommendation system use case, because I think in the near future we will see more and more people realizing that most of the stuff we are doing with LLMs at the moment is a recommendation system. Especially in RAG, we're recommending the right pieces of content. But also with the trend of shifting computation to inference time, where we generate multiple different options as answers, as texts, you in the end have to put a recommendation system on top of that to decide which answer is the most relevant. I think we will see more and more of that. And I'm really curious: how would you set up the typical recommendation system data model, having items, having users, and then having the analytics? How would you set this up in LanceDB?

Chang She: Yeah, this is very interesting. So let's talk about what's similar, right? The problem is you want production-quality retrieval that is tailored towards your target. The way that Recsys solves it is: first, you have a first-pass recall step that uses a bunch of different retrieval techniques to get relevant data, then you put them all together and pass them through a re-ranking model to do fine-grained re-ranking. And then you present those results to the user, usually in either some sort of feed format or some grid format, depending on whether you're talking about news or streaming or e-commerce or things like that.

And then what's really interesting about Recsys is, of course, you can take that user feedback and fine-tune your system. You can change your recaller models, you can change your re-ranker models, and you can add additional business logic rules to change the rankings in a non-machine-learning context.

Now, with RAG, so far it looks much simpler than that, right? We just say there's a recaller step. Typically, most people are just using vector search, so a single recaller, there's no real re-ranking, and we just feed the results into that LLM query.

Now, when people go into production, what they're looking for is: okay, vector search gets me maybe 60 or 70 percent of the actual results that I want from my knowledge base. How do I do better? I need to get to maybe 90 percent in order for my application to feel production quality to my users. So now we have a lot more experiments in production around: okay, instead of just vector search, you can now add keyword search, you can do SQL queries, or even add graph databases or something like that. And so now you have multiple recall results, and you need a way to combine them and re-rank them. And of course, I think the new term is now rank fusion; it's basically just re-ranking, right? And then you're presenting the results to the LLM.

So it's really starting to look very similar. What's missing is that feedback cycle of: okay, can I take user feedback and fine-tune the embedding model, for example, can I fine-tune the re-ranking model? And in LanceDB we're working on closing that loop in a more automated fashion. A lot of our users today are doing that, and we want to make that easier. We recently released a much more complete workflow for hybrid search and re-ranking.
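For context, a hybrid query in the LanceDB Python client looks roughly like the sketch below: a full-text index plus vector search, with a reranker fusing the two result lists. The table reuses the earlier Pydantic example; which rerankers ship with your lancedb version (RRF, linear combination, cross-encoders, and so on) is worth checking in the docs.

```python
import lancedb
from lancedb.rerankers import RRFReranker

db = lancedb.connect("./lancedb-demo")
table = db.open_table("documents")  # assumes the table from the earlier example exists

# Build a full-text (BM25-style) index on the text column for the keyword recaller.
table.create_fts_index("text", replace=True)

# Hybrid query: keyword recall plus vector recall, fused by reciprocal rank fusion.
results = (
    table.search("how do I keep vectors and raw data together?", query_type="hybrid")
    .rerank(reranker=RRFReranker())
    .limit(5)
    .to_pandas()
)
print(results["text"].tolist())
```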

So I think soon we'll have this sort of complete story, and it'll look very similar.

Now there are, I think, some important differences, right? Because when you talked about Recsys earlier, you mentioned, hey, we have users and items and analytics and we're recommending the items to the users. The issue with LLMs is you're not recommending results to a user, you're actually recommending results to a query. So even if the queries are issued by the same user, different results are going to be relevant for different queries. And I've been discussing this with a friend of mine, Brian Bishop, so any insight that sounds interesting is probably coming from him and not me; it's only secondhand.

Nicolay Gerold: so

Chang She: Yeah.

So I think the systems won't look exactly the same, right? You won't have this really simple users/items dichotomy, and it's going to be much more sophisticated. But at a high level, it does look very similar.

Nicolay Gerold: Yeah, and I think what's already pretty pronounced in recommendation systems is that you always have a secondary storage, at least for the embeddings, because they tend to get quite large. Would you envision a future where you also do user-based or content-based filtering, or even collaborative filtering? How would you integrate this into Lance if you wanted to enable this more?

Chang She: So with collaborative filtering, essentially what you end up doing is generating the embeddings for users and items that way. And if we're generating item embeddings using OpenAI or some embedding model, you probably won't need something like collaborative filtering or these traditional Recsys techniques. But you still will have item embeddings. Now, it's not clear to me what the user embeddings will look like.

But I could envision a future where the way that we recall results and the way that we re-rank them can differ based on the user. And typically this would involve some sort of feedback mechanism that allows us to estimate a distribution for given users based on what types of questions they tend to ask, what types of responses are more relevant to them, and things like that. We haven't done all the work to test that out yet, so this is all speculation at the moment.

Nicolay Gerold: Yeah.

And can you tell me, what is the most surprising, interesting, or unexpected way Lance is used in production that you've seen so far?

Chang She: So there are a couple that, in retrospect, are not surprising, but when they were presented to me they were very novel and made me say, huh, I didn't think of that.

One was: there are a couple of desktop applications built with Electron that now use LanceDB, and they have local vector storage, and some of them even have a local LLM, or make a ChatGPT call or something like that. That was very interesting; when we set out to make the data format and the vector DB, we hadn't thought about desktop applications in a long time, but it seemed very useful and it was great. I think there were lots of rough edges, especially around Windows distributions, that we've ironed out, and there are still some issues that we probably still need to fix. For those folks who've built on top of LanceDB: thanks for your patience.

So that's one class of applications that's been really interesting. Another type is that we've had a couple of newer machine learning frameworks that are now building on top of the Lance format. And a particular one was very interesting.

There's a framework called SapientML out of Fujitsu in Japan.
And when I talked with those guys.

Their goal is to use Lance, not just
for the data storage, but when they

train models and when they create new
model artifacts, they actually want to

store the models in Lance DB as well.

And what's interesting for them is
one Lance DB is automatically version.

So as you accumulate more
artifacts, it's very easy for them

to manage the different versions.

And that's one.

Two is.

They can keep the the models with
their metadata as part of Lance and

be able to work with that easily.

And then number three is, once they've stored it in Lance, and they're still trying to figure this part out, you can create an embedding out of the metadata of that model, and now you have a model database that is semantically searchable. So their users can come into the system and say, I want to solve this classification problem using these kinds of data, and they'll be able to essentially recommend the right models to their users, using Lance as that search backend.

And the interesting thing that we're still trying to figure out with them is what the right thing to embed is to make these model artifacts searchable. If it's a description, how do you make sure their users are actually putting in useful descriptions? If it's something else, like a code artifact, what are the relevant things that will support semantic search like that? So that was also very surprising and very interesting. And I think this is something that makes me think LanceDB might be really useful for agents in the future as well.

So if you have agents that start to collect lots of different capabilities, you don't want to say, okay, my agent knows how to do a hundred things, I'm going to send them all to ChatGPT as function calls in the context and get something back. You want to pick the most relevant few and send those to ChatGPT. And this is where I think LanceDB can come in as a semantically searchable function registry for agents, if you will.
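As a rough sketch of that idea, not a built-in LanceDB feature but a pattern you can build on top of it: store each tool's name and description in a table with an embedding function attached, then search with the user's request and hand only the top few tool definitions to the LLM. The tool names, model choice, and table name below are all made up for illustration.

```python
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

embedder = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5")

class Tool(LanceModel):
    name: str
    description: str = embedder.SourceField()                  # the text that gets embedded
    vector: Vector(embedder.ndims()) = embedder.VectorField()

db = lancedb.connect("./lancedb-demo")
registry = db.create_table("agent_tools", schema=Tool, mode="overwrite")
registry.add([
    {"name": "get_weather", "description": "Return the current weather for a city."},
    {"name": "search_flights", "description": "Find flights between two airports on a date."},
    {"name": "create_invoice", "description": "Generate a PDF invoice for a customer order."},
])

# Pick the few most relevant capabilities for this request instead of sending all of them.
user_request = "Book me a trip from Berlin to Lisbon next Friday."
relevant = registry.search(user_request).limit(2).to_pydantic(Tool)
print([t.name for t in relevant])  # these tool definitions go into the LLM call
```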

Nicolay Gerold: Yeah, like we've seen in the Voyager paper, where they're also embedding the actions the agents learned.

Nice.

So what tool would you love to see being built on top of or next to Lance, which would make it even more powerful?

Chang She: That's really interesting. Yeah, I think what I'm looking forward to immediately is bigger integrations into the data ecosystem. We want to enable our users to not have to think about that whole data preparation pipeline and ecosystem and all those tasks, and to be able to think: okay, I can just throw data into LanceDB, and then on the other end I'm going to be able to get search results. Small to large data systems being connected at a native level, that's what really excites me: to get more data in.

And going back to that agentic workflow, I'd love to see that: we now have some users that have a very domain-specific multi-agent workflow that they're trying to push into production. So I think it would be pretty amazing to work with them to build these semantically searchable function registries and capabilities into LanceDB. RAG, we've done a lot of that in the past year, and I think everybody's probably sick of chatbots by now. I think multimodal and agents are the next frontier, and so I'm really looking forward to seeing what our users will build on top of LanceDB in those areas.

Nicolay Gerold: Nice.

So what's next for LanceDB?

Chang She: So I think the next thing for LanceDB is to continue to optimize our scale, performance, and ease of use for users, and to start to deliver on a lot of the topics that we talked about today: how to make it easy for folks to create a Recsys and RAG, how to make it easier for people to embed the internet and quickly ingest that, index that, and make it searchable. A lot of that is driven by the fact that, hey, this year AI teams and individual developers are really trying to seriously build generative applications in production. And so how do we serve the needs that are different in production, that weren't really a problem last year when it was mostly demos and POCs?

Nicolay Gerold: Yeah.

Nice.

So where can people follow
along with you and with LanceDB?

Chang She: Yeah, absolutely. I'd love to chat with folks on Twitter or GitHub. It's the same handle I've had for 25 years, I think, which is changiskhan.

And we have a discord community
for LanceDB and it's pretty active.

And there are lots of really
interesting applications and use

cases that you can see in there.

Yeah.

And I'd love to talk shop on Discord or on Twitter, or, if you just want to ping me privately, I'm chang@lancedb.com.

Nicolay Gerold: So what can we take away when we want to take stuff into production? I will be fanboying a little bit, because I love LanceDB and have it in production in two separate projects, so I can't recommend it highly enough. First, it's open, and I think they built a solution in a space where there wasn't a good alternative.

There are two main areas where I use Lance. One is large-scale embedding and data storage.

So basically keeping everything together when I have a lot of embeddings, a lot of raw data, but also the associated metadata. This can be, for example, image data plus bounding boxes, which are often stored as more structured information. But it can also be, for a recommendation system, images plus text plus the embeddings to do a recommendation or a search on top of, and then basically just keeping a caching layer in front of it to keep the latency at the level I actually need for the application.

And I think that's what they actually built in their hosted offering of LanceDB: basically a solution which combines the scalability of LanceDB, the open source project, on top of S3 or whatever cloud storage they are using, plus a really smart caching solution, which has a good way of invalidating the cache and also filling the cache with the data that you need at low latency. In the end it's for you to figure out what data is important and what data will be requested by the user the most, so that it's worth keeping it low latency.

I built my caching layer in Redis, because it has instant updates to the index, which is something I learned fairly recently, but I think you can rebuild that stack pretty fast.

The second area where I use Lance is embedded.

I think he went into a few of the use cases, for example VR, AR, or vehicles, where you would actually use something like Lance as a vector store. The advantage of using Lance is that you don't have to waste memory. When you are in a resource-constrained environment, you want most of the RAM you're using to go into some form of business logic or some important processes that are happening on the end device, and not into storing vectors and making search very, very fast. And I think Lance is a good solution for that.

I also use it for use cases where I'm not running a real-time search, but rather where documents or some form of information come in, I'm processing them live, and I'm not really storing them in any way. So I'm basically just creating a Lance table with the embeddings and the additional information and just querying on top of that. This also makes it very efficient, because I can just use a Lambda, attach a little bit of file storage, and not waste my memory on keeping the vectors, the embeddings, in memory. And this is, I think, what makes Lance very special: you can actually decide where to put your resources in the end.
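For that ephemeral, serverless-style use case, a minimal sketch might look like the following: build a throwaway table from whatever just arrived, query it, and let it disappear with the function's temporary storage. The handler shape and paths are invented; only the LanceDB calls mirror the documented Python API.

```python
import lancedb

def handle_batch(incoming_docs, query_embedding):
    """incoming_docs: list of {"text": str, "vector": list[float]} computed upstream."""
    # /tmp is the writable scratch space in most serverless runtimes (hypothetical path).
    db = lancedb.connect("/tmp/scratch-lancedb")

    # Throwaway table: overwritten on every invocation, nothing persisted long term.
    table = db.create_table("incoming", data=incoming_docs, mode="overwrite")

    # Query straight off the attached file storage instead of holding an index in RAM.
    return table.search(query_embedding).limit(5).to_list()

# Example invocation with toy data.
docs = [{"text": "invoice #123", "vector": [0.1, 0.9]},
        {"text": "contract draft", "vector": [0.8, 0.2]}]
print(handle_batch(docs, query_embedding=[0.75, 0.25]))
```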

I think: use Lance, try out the tutorials they have on the website, they have some really good stuff, and check also their blog posts and their documentation. I think it's one of the better documentations out there explaining the different components behind Lance. And follow Chang.

And if you want to see some of the
things I built with Lance, let me know.

Give me a ping on LinkedIn, on Twitter,
or somewhere, and I might make it open.

I'm not entirely sure.

I might have to do some adjustments.

Otherwise, we will be continuing on multimodal AI on Thursday, so in the next episode if you're catching this later, where we will be talking about fine-tuning multimodal embedding models with Michael Günther, who is a researcher at Jina AI. I'm really excited for that. So give it a follow wherever you're listening, on YouTube, Spotify, Apple, or wherever else, and I would love to hear your feedback on this episode, especially what stuff we could improve. The critical feedback is always more welcome, because it helps us improve the podcast. And yeah, I will see you in the next episode and hopefully talk to you soon.