Welcome to the Location Insights Podcast
Where we bring you information, news, and insights from partners, clients, and data scientists about location marketing, retail media, POI, data analysis, and other interesting topics.
Hi, and welcome to the Location Insights Podcast. I'm your host, Kevin, the Global Marketing Manager at unerry Inc.
unerry is a location analytics platform and data insights company based in Tokyo, Japan.
And with me today is Jason. He's the CEO of SafeGraph. Hi, Jason. Great to have you on the show. How are you doing today?
I'm doing great. Thanks so much for having me. I just want to first say it's an honor to join this podcast and speak with a Japanese audience.
Japan has an incredibly advanced spatial ecosystem, and I personally have always admired the precision and innovation that Japanese companies bring to this space.
I'm really grateful for the opportunity to share our perspective today.
Excellent. That sounds like a great way to start off.
So, to take a quick step back, maybe introduce yourself and your background.
So you're based in the U.S., right?
Yes.
And SafeGraph is based out of the U.S.
So how did you come into the POI market? Maybe tell us a little bit about that.
Yeah.
So I'll take a quick step back.
So my name's Jason Richmond.
I'm the CEO of SafeGraph.
For those who don't know, SafeGraph is a pure-play data company that builds high-precision data about physical places globally.
And we basically use our data to power analytics and applications for some of the world's top innovators.
I'm based here in Miami, Florida.
I was previously in New York, but the SafeGraph team is a remote first startup.
And the way that I kind of found myself in this business, it wasn't a linear path.
So I'll kind of start from the very beginning.
I basically spent the better part of my career at the intersection of data and technology,
helping startups commercialize complex B2B products and scale them into real businesses.
I've been at SafeGraph now for eight years.
But prior to SafeGraph, I worked at a company called Metamarkets,
which was a SaaS analytics startup that catered to the advertising technology ecosystem.
So for those who don't know, advertising technology works a lot like high-frequency trading in stock markets.
Ads are getting bought and sold in near real time using algorithms.
And so there's a huge amount of data flowing through the advertising ecosystem.
And Metamarkets was a tool that allowed these different technology platforms to visualize, process, and store vast amounts of programmatic advertising data.
And so I was leading sales at Metamarkets up until the exit to Snap Inc., the company behind Snapchat, back in 2017.
And then through that process, I ended up meeting Auren Hoffman, who was the original founder of SafeGraph.
Now, Auren was a serial entrepreneur who also founded a company called LiveRamp.
And so I got really excited when he reached out to me as an investor in my prior business and basically recruited me to come join SafeGraph as one of the first commercial hires.
And so I started out as one of the first sales hires, eventually worked my way up into revenue leadership.
And then last year, I stepped into the role of CEO.
And so I've been in the CEO seat for the last 14 months or so.
But across everything I've done, the through line is pretty simple.
I love building companies and I enjoy the crazy, chaotic, fast-paced startups.
Awesome. Awesome.
That's a great way, I think, to start off with the introduction of yourself.
Now, for SafeGraph, you're all about POIs, right?
So maybe just for listeners, if you're not familiar, in simple terms, what is your definition of what POI data is?
Yeah, that's a great question.
So POI data describes a specific location that someone like you or I may visit.
It may tell you what the place is, where it is, and any key metadata about the location.
So an example could be your local restaurant, gas station, a Starbucks, a mall, a strip mall, even a doctor's office or an EV charging location.
It's basically anything where people interact with the physical environment.
You know, POI data is the backbone of almost every single location-based application, from navigation to retail analytics platforms and fintech data enrichment APIs.
And so POIs are a really, really exciting space to be in.
You know, what originally pulled me into the location data industry, to answer your earlier question, it wasn't some long-term plan, right?
I didn't go to school for GIS.
I actually went to undergrad business school and studied finance.
But for me, it was really about just intellectual curiosity.
When I started reading up on the location data industry, I was really fascinated by all the use cases that require accurate POI or location data, things such as retail, CPG, financial services, and mapping.
And then I had previously worked in software sales.
And so when I heard about a data as a service or data only business, I got really excited.
And then fast forward, I started talking to actual customers in the space.
And through those conversations, I really appreciated the opportunity in front of me when I realized that there was a real problem around, you know, building accurate data at scale.
And I recognized that SafeGraph was in a unique position to become a core player in this ecosystem.
Awesome.
I think that's a great introduction for listeners that are new to POI and want to have a better grasp in terms of what it is.
So what are some of the typical use cases that a company would use POI data for?
Yeah, so when you're a data business, naturally you can be horizontal because you're a data ingredient or utility.
Now, we have definitely decided to focus our ICP, or ideal customer profile, here at SafeGraph, but I will run through a couple of typical examples that we see for commercial applications.
So, back to that advertising or ad tech use case: there are companies within programmatic advertising that build audience segments, and audience segments allow advertisers to more efficiently target and reach their consumers with ads at the right time.
And so there are audiences out there that are built off of location data.
These are companies working with cell phone GPS data coming off SDKs, and these raw location pings are tied to mobile advertising IDs, or MAIDs.
And these audience builders need a really good way to enrich these raw location signals so they understand the behavior of consumers for these audience segments.
And so SafeGraph comes in by providing a very accurate map of the physical world by way of licensing POIs, and more often than not, in addition to POIs, we're also providing polygon data.
We call that our geometry product, which gives really precise tenant-level shapes or boundaries of POIs so that companies can do more accurate attribution of location data to places.
And then, in return, they can build more accurate audience products for their advertisers and advertising technology partners.
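To make "attribution of location data to places" concrete, here is a minimal Python sketch of one common technique, a ray-casting point-in-polygon test. It assumes tenant-level polygons given as (longitude, latitude) vertex lists treated as planar coordinates, and the POI IDs are hypothetical; it is an illustration, not how SafeGraph's geometry product or its customers actually implement attribution.

```python
from typing import Optional

Point = tuple[float, float]    # (longitude, latitude), treated as planar x/y
Polygon = list[Point]          # vertices in order; the last edge closes back to the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: inside if a ray to the right crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge intersects that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attribute_ping(pt: Point, pois: dict[str, Polygon]) -> Optional[str]:
    """Return the ID of the first POI polygon containing the ping, or None."""
    for poi_id, poly in pois.items():
        if point_in_polygon(pt, poly):
            return poi_id
    return None


# Two hypothetical side-by-side tenants in a strip mall, drawn as unit squares.
pois = {
    "cafe-001": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "pharmacy-002": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(attribute_ping((0.5, 0.5), pois))  # cafe-001
print(attribute_ping((1.5, 0.5), pois))  # pharmacy-002
print(attribute_ping((3.0, 0.5), pois))  # None
```

A production pipeline would typically add a spatial index so each ping is tested only against nearby polygons, plus an explicit rule for pings that land exactly on a wall shared by two tenants.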
So that's one example.
Another example might be working with a large multinational retailer who's trying to expand its brick and mortar footprint.
Right, so it's a classic real estate or retail site selection use case.
You know, you're going into a new market; maybe you're looking around Tokyo to open up a new Lawson's, and you want to understand whether this is a good location for that Lawson's.
And so you basically want to look at a catchment area around a prospective site, and it's really important to have full spatial awareness of all the types of places co-located within that catchment area.
Now, those might be competitor locations.
They might be other types of retail generators.
It might be transit stops or schools.
But POI data is a core ingredient or input into these site selection models and workflows for these data-mature retailers and QSR chains.
And then, last but not least, another common use case might be in the financial services sector.
You have private equity funds who are trying to acquire companies that have a physical-world or brick-and-mortar presence.
And so naturally they use POI data for target identification, doing pre-diligence, and looking at TAM analysis or white space analysis.
Again, these are funds that are specifically going after businesses that have that physical-world presence, but the POI data is typically layered on with other types of data sets to inform some type of investment decision.
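For the site selection use case above, the core catchment step can be sketched as a radius query: count the POIs of each category within some distance of a prospective site. This is a simplified illustration that assumes a plain circular catchment, great-circle (haversine) distance, and hypothetical POI records near Tokyo Station; real site selection models often use drive-time or walk-time areas instead of a fixed radius.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def catchment_profile(site: tuple[float, float], pois: list[dict], radius_m: float) -> Counter:
    """Count POIs by category within radius_m of a prospective site given as (lat, lon)."""
    lat, lon = site
    return Counter(
        p["category"]
        for p in pois
        if haversine_m(lat, lon, p["lat"], p["lon"]) <= radius_m
    )


# Hypothetical POIs around a candidate site near Tokyo Station.
site = (35.6812, 139.7671)
pois = [
    {"name": "Competitor A", "category": "convenience_store", "lat": 35.6820, "lon": 139.7671},
    {"name": "Station exit", "category": "transit_stop", "lat": 35.6812, "lon": 139.7680},
    {"name": "School B", "category": "school", "lat": 35.6900, "lon": 139.7671},  # roughly 1 km away
]
print(catchment_profile(site, pois, radius_m=500))
```

With a 500 m radius, the competitor and the transit stop fall inside the catchment while the school does not.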
Very, very interesting.
And from what I understand, SafeGraph actually has POI data not only from the United States, where you're based, but also globally.
So countries all over the world.
So roughly how many countries do you cover, and how do you get the data from so many different places?
Yeah.
So before I get into the specifics there, I think it's important to step back and highlight that SafeGraph is very unique in this space, in that we are one of the only true pure-play data companies.
What I mean by that is, unlike some of our peers, we only license a flat file of POI data.
We don't provide any software; we don't do insights or analytics.
We are just focused on being this high-quality data ingredient so that our customers can rely on facts about the physical world and build their derivative software, analytics, and services on top of our data.
Now, that maniacal focus allows us to scale the global POI product with really high degrees of quality.
So the core SafeGraph product today is called Places.
And to answer your earlier question, we maintain a global database of over 80 million POI across 200-plus countries and territories.
Again, as I said earlier, these places may include things like a Starbucks, a Lawson's, and a McDonald's.
It could also include an Amazon warehouse, a Tesla Supercharger, a park, a school, or your local mom-and-pop grocer.
All of this data today is provided to our partners as a flat file, typically through cloud storage on S3, Snowflake, or the equivalent.
And then we make sure that we're updating this data on a monthly cadence so that our customers and partners are getting the freshest data possible for their downstream analytics and applications.
In addition to obviously providing the location, you know, "here is a Starbucks on 555 Main Street," SafeGraph has also built a very robust library of rich metadata, or attributes, about these locations.
So we offer very flexible licensing configurations, where somebody can say: okay, I know the name, the address, the category, and the phone number, but I would also like a building footprint, or the square footage, or the hours of operation; or, I'm looking at restaurants and I want to understand the types of amenities or even the cuisine that is offered at a location.
And so it's not just about getting the rows correct, but it's also about bringing in really rich, accurate metadata about these locations to inform various use cases for the customers.
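Because the data is delivered as a monthly flat file, one thing a downstream team might do is compare consecutive snapshots to see which places were added, removed, or changed. The sketch below assumes each row carries a stable identifier in a hypothetical `place_id` column and has already been loaded as a dict (for example with `csv.DictReader`); the column names are illustrative, not SafeGraph's actual schema.

```python
def diff_snapshots(prev: list[dict], curr: list[dict], key: str = "place_id") -> dict[str, list[str]]:
    """Compare two monthly snapshots keyed by a stable per-place identifier."""
    prev_by_id = {row[key]: row for row in prev}
    curr_by_id = {row[key]: row for row in curr}
    added = sorted(curr_by_id.keys() - prev_by_id.keys())    # new this month
    removed = sorted(prev_by_id.keys() - curr_by_id.keys())  # e.g. closed or deduplicated
    changed = sorted(                                        # attributes edited, such as hours
        pid
        for pid in prev_by_id.keys() & curr_by_id.keys()
        if prev_by_id[pid] != curr_by_id[pid]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical rows from two consecutive monthly files.
last_month = [
    {"place_id": "a", "name": "Cafe X", "open_hours": "08:00-18:00"},
    {"place_id": "b", "name": "Old Diner", "open_hours": "07:00-15:00"},
]
this_month = [
    {"place_id": "a", "name": "Cafe X", "open_hours": "07:00-19:00"},
    {"place_id": "c", "name": "New Ramen", "open_hours": "11:00-22:00"},
]
print(diff_snapshots(last_month, this_month))
# {'added': ['c'], 'removed': ['b'], 'changed': ['a']}
```

A removed row does not by itself prove a closure; it only says the place is absent from the newer file.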
Yeah, that's really interesting, and a huge number of POIs there. You said over 8 million... 80 million. 80 million, well, yeah, that's kind of... So if you break it down by continent, what continents is that covering?
Yeah, so for anyone who's listening, if you go to our doc site at docs.safegraph.com, we provide the most transparent technical documentation around the SafeGraph Places product.
And on there, you can see a stack-ranked list of all the geos that we support and the various coverages across these geographies.
But we, you know, sort of look at it in terms of tiers.
So we're a US-based company.
We have very strong footprints in North America, Western Europe, and parts of APAC, such as Australia and Japan, for example, and then in LATAM, we have a pretty big database in Brazil.
We may not have as many rows of data as a headline number compared to some of our peers,
but that is actually by design, because more data doesn't always mean good data, right?
What we do is we're a little bit more systematic about how we go about building data
because we're constantly thinking about recall versus precision.
And what we want to ensure is that we're not introducing false positives, duplicates, or permanently closed locations that ultimately would end up in our customers' pipelines and products and create sort of data cleanup challenges for the customer, right?
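The recall-versus-precision trade-off mentioned here can be stated as a quick calculation against a hand-verified ground-truth sample. In this hypothetical audit, a data set delivers 10 rows of which 8 are real, open places, while the ground truth contains 12 real places in the same area, so precision is 8/10 and recall is 8/12. The IDs are made up for illustration.

```python
def precision_recall(predicted_ids, true_ids):
    """Precision = share of delivered rows that are real; recall = share of real places delivered."""
    predicted, truth = set(predicted_ids), set(true_ids)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical audit: 10 delivered rows, 8 of them real; 12 real places in the area.
delivered = [f"p{i}" for i in range(10)]                                   # p8, p9 are false positives
ground_truth = [f"p{i}" for i in range(8)] + [f"q{i}" for i in range(4)]   # q0 to q3 were missed
precision, recall = precision_recall(delivered, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.80 recall=0.67
```

Adding rows from weaker sources tends to raise recall while lowering precision, which is the balance being described.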
So that's how we design the product.
But we also do strategic data sourcing.
So somebody can approach us and tell us, hey, I really want to do a little bit deeper R&D in this specific market.
And we have very scalable infrastructure that I'll highlight a little bit later in this podcast that allows us to go and build up those countries at scale for specific use cases.
I think the key takeaway for the audience to remember is that, as a data company, our value proposition to our partners is that we just want to curate, validate, and transform places data so that our customers can really save time and money and focus on the derivative, which is building out their analytics, their models, and their applications for their customers.
So the analogy I like to say to people is I'm not building a cake,
but I'm providing really high-quality flour and sugar so that people can go build their own cake.
Right, right. Yeah, I think that's a great analogy for the type of data and the data set itself that your company provides.
You touched on this a little bit, but maybe you can give an overview.
So how do you keep the POI data fresh, right?
There are new stores opening, and some stores aren't doing well.
They shut down, they close, you know, restaurants, especially coffee shops.
There's a very fast turnover rate.
So how do you update all these POIs?
Yeah, it's a really hard problem, right?
Which is why we decided to only focus on this problem and just be a pure-play data provider, because it requires that level of attention.
So, as you correctly called out, places are constantly changing.
To be clear, I don't think anybody out there who's in this space is going to have a data set that is 100 percent perfect, because data is a moving target, right?
Stores are constantly opening, stores are closing, stores are rebranding, open hours are changing, right? Cuisine types on the menu might be changing.
So, to sum it up, it is a hard problem.
Now, we have a phenomenal engineering and product team here at SafeGraph that I've been working with for many years.
And we have a very robust data sourcing and pipeline process.
And so it starts with sourcing, or gathering data.
So we go out and we crawl the web: every single week, we crawl thousands and thousands of public websites, where we bring in different pieces of metadata about a place, right?
And really, we do it from thousands of sources because we don't want to be reliant on one single source of data.
Instead, we want to bring in as much data as possible and cross-reference different components of data from different sources to basically understand: is this place open, or is it a real place, right?
Once we bring in that data from thousands of sources, we then run some preliminary ETL on that data, right?
So we're cleaning and parsing the data from input sources, and we're enriching some of the input data, right? We have these data classification models where we go in and tag: this is a pharmacy, or this is a gas station.
And then we push these inputs downstream into our data pipeline for further enrichment and cleaning, right?
But really, what's happening is there's all the sourcing and data coming in, and then there's just a lot of data conflation that happens to determine whether or not a place actually exists in the physical world.
And, you know, it's really important that we do these things because, for many enterprises out there, bad POI data specifically can cost 10 to 100 times more than high-quality data costs to license, right?
Imagine, for example, you use POI data in a navigation system, like an AV system, and that data is incorrect, and you are routing people to the wrong location, right?
Well, that person could potentially be put into an unsafe condition, or they could churn from your app and no longer be a customer because of a poor customer experience.
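The cross-referencing step described above, where several independent sources are compared to decide whether a place is open, could be approximated with a simple voting rule. This is a deliberately simplified, rule-based stand-in for the ML conflation SafeGraph describes; the `SourceObservation` type, the 180-day freshness window, and the two-source minimum are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceObservation:
    source: str       # hypothetical source name, e.g. a crawled brand site
    says_open: bool   # whether this source indicated the place is operating
    seen_on: date     # when this observation was crawled


def corroborate_open(observations: list[SourceObservation], today: date,
                     max_age_days: int = 180, min_sources: int = 2) -> bool:
    """Treat a place as open only if enough recent, independent sources agree."""
    recent = [o for o in observations if (today - o.seen_on).days <= max_age_days]
    # Keep only the newest observation per source so one site cannot vote twice.
    latest: dict[str, SourceObservation] = {}
    for o in sorted(recent, key=lambda o: o.seen_on):
        latest[o.source] = o
    votes_open = sum(o.says_open for o in latest.values())
    votes_closed = len(latest) - votes_open
    return votes_open >= min_sources and votes_open > votes_closed


today = date(2024, 6, 1)
observations = [
    SourceObservation("brand_site", True, date(2024, 5, 20)),
    SourceObservation("review_site", True, date(2024, 5, 28)),
    SourceObservation("directory", False, date(2024, 5, 30)),
    SourceObservation("old_listing", True, date(2022, 1, 1)),  # too stale, ignored
]
print(corroborate_open(observations, today))  # True
```

Requiring agreement across sources is what keeps a single stale or wrong listing from deciding the outcome on its own.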
And so I could go on and on about, you know, the implications of bad data, but this is really why we spend so much time on keeping this data fresh and up to date.
Yeah, definitely. It's a huge challenge considering the scale and the number of POIs, but also the, you know, sort of ad hoc frequency at which stores may open or close, depending on how their business goes.
So, you touched a little bit on it, but do you want to add anything on SafeGraph's POI verification model or process, or any kind of metrics that you use when validating?
Yeah, so it is like a multi-layer process, to be clear.
So I touched on, you know, the sourcing process where we do a lot of web crawling.
There's also anomaly detection, entity resolution, classification, deduplication, as well as AI and human QA efforts.
But essentially, our verification model blends machine learning, rules-based QA, and human oversight to eliminate false positives and maximize accuracy.
So, for example, take one of our columns: we have category information about a POI.
So, we'll tell you this is the NAICS code of a location, right?
And then here's the top level and the subcategory of the location.
It's really important for a lot of our customers that we don't mislabel places, right?
An example would be advertising technology companies that use our data as a blacklist or suppression list to make sure they're not geo-targeting sensitive locations, right?
So the stakes are really high.
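To show why mislabeled categories matter for the suppression-list use case, here is a minimal sketch that flags POIs whose NAICS code falls under a sensitive prefix. The prefix list (622 for hospitals, 8131 for religious organizations) and the POI records are illustrative assumptions, not a recommendation or anything SafeGraph prescribes. If a hospital were mislabeled with a restaurant code, it would silently drop out of the list.

```python
# Illustrative prefixes only: NAICS 622 = hospitals, 8131 = religious organizations.
SENSITIVE_NAICS_PREFIXES = ("622", "8131")


def is_suppressed(naics_code: str, prefixes: tuple[str, ...] = SENSITIVE_NAICS_PREFIXES) -> bool:
    """True if the POI's NAICS code falls under any sensitive prefix."""
    return naics_code.startswith(prefixes)  # str.startswith accepts a tuple of prefixes


def build_suppression_list(pois: list[dict]) -> list[str]:
    """Collect IDs of POIs that ad targeting should exclude."""
    return [p["id"] for p in pois if is_suppressed(p["naics_code"])]


pois = [
    {"id": "h1", "naics_code": "622110"},  # general medical and surgical hospital
    {"id": "r1", "naics_code": "722513"},  # limited-service restaurant
    {"id": "c1", "naics_code": "813110"},  # religious organization
]
print(build_suppression_list(pois))  # ['h1', 'c1']
```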
And so to do best-in-class industry categorization on POIs, we use multilingual LLM-based models, right?
And then we do this AI agent post-processing to fix errors when we do make a mistake, before it goes to the customer, right?
That's just one example of verification specific to categories.
The other thing too is conflation, right?
As I said earlier, we don't rely on a single source; we take in thousands of sources, then use an ML model to identify groups of input POIs that correspond to the same place, and then merge those different sources together, right?
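Grouping input POIs that correspond to the same place is a form of entity resolution. Where SafeGraph uses an ML model, the sketch below swaps in a plain heuristic (records within 50 m of each other whose names have a similarity of at least 0.8 via Python's `difflib.SequenceMatcher`) and merges matches with union-find. The thresholds, field names, and sample records are hypothetical.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (mean Earth radius 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def same_place(a: dict, b: dict, max_dist_m: float = 50, min_name_sim: float = 0.8) -> bool:
    """Heuristic match (stand-in for an ML matcher): nearby and similarly named."""
    close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m
    sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return close and sim >= min_name_sim


def conflate(records: list[dict]) -> list[list[str]]:
    """Group source records that refer to the same place, using union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_place(records[i], records[j]):
                parent[find(i)] = find(j)  # union the two groups

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec["source_id"])
    return sorted(sorted(g) for g in groups.values())


records = [
    {"source_id": "web-1", "name": "Blue Bottle Coffee", "lat": 35.6595, "lon": 139.7005},
    {"source_id": "dir-7", "name": "Blue Bottle Coffee Shibuya", "lat": 35.6596, "lon": 139.7006},
    {"source_id": "web-9", "name": "Shibuya Pharmacy", "lat": 35.6600, "lon": 139.7010},
]
print(conflate(records))  # [['dir-7', 'web-1'], ['web-9']]
```

The pairwise loop is O(n²); at the scale of tens of millions of rows, a real system would first block candidates by geography and would learn the matching rule rather than hard-code it.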
We also have built out address parsing capabilities.
So we have a suite of NLP models for parsing addresses into constituent components as we're building out this data.
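As a contrast to the NLP models described here, the sketch below parses one rigid US address format with a regular expression. It assumes inputs shaped like `555 Main Street, Springfield, IL 62701` and returns `None` for anything else, which illustrates why a learned parser is needed for multilingual, real-world addresses. The field names are hypothetical.

```python
import re

# One rigid format: "<number> <street>, <city>, <two-letter region> <ZIP or ZIP+4>".
ADDRESS_RE = re.compile(
    r"^\s*(?P<house_number>\d+[A-Za-z]?)\s+"
    r"(?P<street>[^,]+?)\s*,\s*"
    r"(?P<city>[^,]+?)\s*,\s*"
    r"(?P<region>[A-Z]{2})\s+"
    r"(?P<postal_code>\d{5}(?:-\d{4})?)\s*$"
)


def parse_us_address(raw: str):
    """Split an address into named components, or return None if it doesn't fit the pattern."""
    m = ADDRESS_RE.match(raw)
    return m.groupdict() if m else None


print(parse_us_address("555 Main Street, Springfield, IL 62701"))
# {'house_number': '555', 'street': 'Main Street', 'city': 'Springfield', 'region': 'IL', 'postal_code': '62701'}
print(parse_us_address("Marunouchi, Chiyoda-ku, Tokyo"))  # None
```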
And then, last but not least, we have something internally called a reality model, which is an upstream ML model for scoring raw POI data, and a post-conflation model for identifying any fake POIs that have made their way through the pipeline.
And then it allows us to go in and course-correct before we give that data out to our partners, right?
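A scoring model like the reality model could, in its simplest form, be a logistic function over features of a raw POI. The features, weights, bias, and 0.5 threshold below are all hypothetical and hand-set for illustration; SafeGraph has not described its model's inputs here, and a real model would learn its parameters from labeled examples.

```python
import math

# Hypothetical, hand-set parameters; a real model would learn these from labeled POIs.
WEIGHTS = {"num_sources": 0.9, "has_website": 0.7, "days_since_seen": -0.01}
BIAS = -1.5


def reality_score(features: dict[str, float]) -> float:
    """Logistic score in (0, 1): higher means more likely to be a real, operating place."""
    z = BIAS + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def filter_likely_real(pois: list[dict], threshold: float = 0.5) -> list[str]:
    """Keep IDs of POIs whose score clears the threshold."""
    return [p["id"] for p in pois if reality_score(p["features"]) >= threshold]


pois = [
    # Seen on four sources, has a website, last seen 10 days ago.
    {"id": "p1", "features": {"num_sources": 4, "has_website": 1, "days_since_seen": 10}},
    # Single source, no website, not seen for over a year.
    {"id": "p2", "features": {"num_sources": 1, "has_website": 0, "days_since_seen": 400}},
]
print(filter_likely_real(pois))  # ['p1']
```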
And so really the takeaway here is, you know, we don't just crawl data.
Like, you know, we're not a crawl company.
We are a true data product, right?
So there's so much work that goes into the product every single month before it's ready to go and delivered to our partners on that monthly cadence.
Yeah, yeah.
I think that's a great sort of high-level overview in terms of what is involved with
handling this data, checking it, verifying it.
There's a lot that goes into it.
Exactly. And I'll say this, like, we've always been very sophisticated. You kind of have to be when you're just focused on this data problem. But I'll say that the tailwinds of AI have given us even greater leverage in technologies to continue pushing the pace of innovation here at SafeGraph, right, and within this specific industry.
And so we've definitely been dabbling. You know, we've always been using machine learning and neural networks and large language models, as I mentioned earlier.
But we definitely have made an effort internally to build out agentic workflows.
So we've created a couple of AI agents ourselves that sit in different parts of our pipeline from crawl to classification to verification.
And so, again, we feel like we're at the forefront here of this space in terms of delivering the best product possible for our partners.
And again, it all comes back to data veracity, and we take that very seriously.
Yeah, yeah, I think that's a great way to describe it.
I think one of the standouts, something SafeGraph has that your peers don't have,
you touched on a little bit at the beginning, I think, but I wanted to sort of highlight it,