Location Insights

Welcome to the Location Insights Podcast
This episode features an interview with Jason Richman, CEO of SafeGraph, discussing the fascinating world of Location Data and POIs (Points of Interest).

SafeGraph website: https://www.safegraph.com
SafeGraph POI Data Documentation: https://docs.safegraph.com

Topics covered include:
  • Jason's background and experience leading up to the location data industry.
  • Anatomy of POI Data:
    • Simple definition of POI Data and its common business uses.
    • How SafeGraph collects POI data globally.
    • Strategies for keeping POI data FRESH (Up-to-date).
    • SafeGraph’s POI Verification process.
  • Current changes and trends observed in the POI data industry.
Have questions or would like to be on the show? Drop us an email: global@unerry.co.jp

Creators and Guests

Host
Kevin
Global Marketing Manager at unerry
Guest
Jason
Jason Richman is the CEO of SafeGraph

What is Location Insights?

Welcome to the Location Insights Podcast
Where we bring you information, news, and insights from Partners, Clients, and Data Scientists about: Location marketing, Retail Media, POI, Data Analysis, and other interesting topics.

Hi, and welcome to Location Insights Podcast. I'm your host, Kevin, the Global Marketing Manager at unerry Inc.

unerry is a location analytics platform and data insights company based in Tokyo, Japan.

And with me today is Jason. He's the CEO of SafeGraph. Hi, Jason. Great to have you on the show. How are you doing today?

I'm doing great. Thanks so much for having me. I just want to first say it's an honor to join this podcast and speak with a Japanese audience.

Japan has an incredibly advanced spatial ecosystem, and I personally have always admired the precision and innovation that Japanese companies bring to this space.

I'm really grateful for the opportunity to share what our perspective is today.

Excellent. That sounds like a great way to start off.

So a quick step back, maybe introduce yourself and your background.

So you're based in the U.S., right?

Yes.

And SafeGraph is based out of the U.S.

So how did you come into the POI market and maybe a little bit about?

Yeah.

So I'll take a quick step back.

So my name's Jason Richman.

I'm the CEO of SafeGraph.

For those who don't know, SafeGraph is a pure-play data company that builds high-precision data about physical places globally.

And we basically use our data to power analytics and applications for some of the world's top innovators.

I'm based here in Miami, Florida.

I was previously in New York, but the SafeGraph team is a remote first startup.

And the way that I kind of found myself in this business, it wasn't a linear path.

So I'll kind of start from the very beginning.

I basically spent the better part of my career at the intersection of data and technology, helping startups commercialize complex B2B products and scale them into real businesses.

I've been at SafeGraph now for eight years.

But prior to SafeGraph, I worked at a company called Metamarkets, which was a SaaS analytics startup that catered to the advertising technology ecosystem.

So for those who don't know, advertising technology is very similar to how high-frequency trading works in stock markets.

Ads are getting bought and sold in near real time using algorithms.

And so there's a huge amount of data flowing through the advertising ecosystem.

And Metamarkets was a tool that allowed these different technology platforms to visualize, process, and store vast amounts of programmatic advertising data.

And so I was leading sales at Metamarkets up until the exit to Snap Inc., which owns Snapchat, back in 2017.

And then through that process, I ended up meeting Auren Hoffman, who was the original founder of SafeGraph.

Now, Auren was a serial entrepreneur who also founded a company called LiveRamp.

And so I got really excited when he reached out to me as an investor of my prior business and basically recruited me to come join SafeGraph as one of the first commercial hires.

And so I started out as one of the first sales hires, eventually worked my way up into revenue leadership.

And then last year, I stepped into the role of CEO.

And so I've been in the CEO seat for the last 14 months or so.

But across everything I've done, the through line is pretty simple.

I love building companies and I enjoy the crazy, chaotic, fast-paced startups.

Awesome. Awesome.

That's a great way, I think, to start off with the introduction of yourself.

Now, for SafeGraph, you're all about POIs, right?

So maybe just for listeners, if you're not familiar, in simple terms, what is your definition of what POI data is?

Yeah, that's a great question.

So POI data describes a specific location that someone like you or I may visit.

It may tell you what the place is, where it is, and any key metadata about the location.

So an example could be your local restaurant, gas station, a Starbucks, a mall, a strip mall, even a doctor's office or an EV charging location.

It's basically anything where people interact with the physical environment.

You know, POI data is the backbone of almost every single location-based application,

from navigation to retail analytics platforms and fintech data enrichment APIs.

And so POIs are a really, really exciting space to be in.

You know, what originally pulled me into the location data industry, to answer your earlier question, it wasn't some long-term plan, right?

I didn't go to school for GIS.

I actually went to undergrad business school and studied finance.

But for me, it was really about just intellectual curiosity.

When I started reading up on the location data industry, I was really fascinated by all the use cases that require accurate POI or location data, things such as retail, CPG, financial services, and mapping.

And then I had previously worked in software sales.

And so when I heard about a data as a service or data only business, I got really excited.

And then fast forward, I started talking to actual customers in the space.

And through that conversation, I really appreciated the opportunity in front of me when I realized that there was a real problem around, you know, building accurate data at scale.

And I recognized that SafeGraph was in a unique position to become a core player in this ecosystem.

Awesome.

I think that's a great introduction for listeners that are new to POI and want to have a better grasp in terms of what it is.

So what are some of the typical use cases that a company would use POI data for?

Yeah, so when you're a data business, naturally you can be horizontal because you're a data ingredient or utility.

Now, we have definitely decided to focus our ICP, or ideal customer profile, here at SafeGraph, but I will run through a couple of typical examples that we see for commercial applications.

So back to that advertising or ad tech use case, there are companies within programmatic advertising that build audience segments, and audience segments allow advertisers to more efficiently target and reach their consumers with ads at the right time.

And so there are audiences out there that are built off of location data.

These are companies working with cell phone GPS data coming off SDKs, and these raw location pings are tied to advertising IDs, or MAIDs.

And these audience builders need a really good way to enrich these raw location signals so they understand the behavior of consumers for these audience segments.

And so SafeGraph comes in by providing a very accurate map of the physical world by way of licensing POIs, and then, more often than just giving POIs, we're also giving polygon data.

We call that our geometry product, which gives really precise tenant level shapes or boundaries of POIs so that companies can do more accurate attribution of location data to places.

And then in return, build more accurate audience products for their advertisers and advertising technology partners.
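The attribution step described here, deciding which POI a raw location ping falls inside, can be illustrated with a point-in-polygon test. The following is only a minimal sketch: the footprint names and coordinates are made up, and the ray-casting test is a generic textbook technique, not SafeGraph's geometry product or any customer's actual method.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a horizontal ray to the right of the point and
    toggle 'inside' each time it crosses a polygon edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attribute_ping(ping: Point, poi_polygons: Dict[str, List[Point]]) -> List[str]:
    """Return the names of all POI polygons that contain the ping."""
    return [name for name, poly in poi_polygons.items()
            if point_in_polygon(ping, poly)]


# Hypothetical tenant-level footprints for two neighbors in a strip mall.
POI_POLYGONS = {
    "coffee_shop": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "pharmacy": [(10.0, 0.0), (25.0, 0.0), (25.0, 10.0), (10.0, 10.0)],
}

print(attribute_ping((5.0, 5.0), POI_POLYGONS))
print(attribute_ping((15.0, 5.0), POI_POLYGONS))
```

With only a center point per POI, both pings would have to be assigned by nearest distance; tenant-level shapes let each ping be matched to the unit it actually landed in.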

So that's one example.

Another example might be working with a large multinational retailer who's trying to expand its brick and mortar footprint.

Right, so it's a classic real estate or retail site selection use case. You know, you're going into a new market, maybe you're looking around Tokyo to open up a new Lawson's, and you want to understand if this is a good location for a Lawson's. And so you basically want to look at a catchment area around a prospective site, and it's really important to have full spatial awareness of all the types of places co-located within that catchment area. Now, those might be competitor locations. They might be other types of retail generators. It might be transit stops or schools. But POI data is a core ingredient or input into these site selection models and workflows for these data-mature retailers and QSR chains.

And then last but not least, another common use case might be in the financial services sector. You have private equity funds who are trying to acquire companies that have a physical-world or brick-and-mortar presence.

And so naturally they use POI data for target identification, doing pre-diligence, looking at sort of TAM analysis or white space analysis.

Again, these are funds that are specifically going after businesses that have that physical-world presence, but the POI data is typically layered on with other types of data sets to inform some type of investment decision.
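Going back to the site selection example earlier in this answer, the idea of looking at a catchment area around a prospective site can be sketched as counting POIs by category within a radius. This is a simplified illustration under stated assumptions: the coordinates, categories, and 1 km radius are invented, a circular radius stands in for a real catchment (which might be drive-time based), and the field names are hypothetical rather than taken from any SafeGraph schema.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def catchment_summary(site, pois, radius_km):
    """Count POIs by category that fall within radius_km of the site."""
    lat, lon = site
    return Counter(
        p["category"] for p in pois
        if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km
    )


# Made-up prospective site and nearby POIs (illustrative values only).
SITE = (35.6812, 139.7671)
POIS = [
    {"category": "convenience_store", "lat": 35.6832, "lon": 139.7671},
    {"category": "convenience_store", "lat": 35.6812, "lon": 139.7721},
    {"category": "transit_stop", "lat": 35.6782, "lon": 139.7671},
    {"category": "school", "lat": 35.7312, "lon": 139.7671},  # about 5.5 km away
]

summary = catchment_summary(SITE, POIS, radius_km=1.0)
print(dict(summary))
```

The per-category counts (competitors, transit stops, schools, and so on) are the kind of features that would then feed a site selection model alongside other data sets.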

Very, very interesting.

And from what I understand, SafeGraph actually has POI data not only from the United States, where you're based, but also, like, globally.

So countries all over the world.

So, like, around how many countries do you cover, and how do you get the data from so many different places?

Yeah.

So before I get into the specifics there, I think it's important to step back and highlight that SafeGraph is very unique in this space in that we are one of the only true pure-play data companies. And what I mean by that is, unlike some of our peers, we only license a flat file of POI data. We don't provide any software. We don't do insights or analytics. We are just focused on being this high-quality data ingredient so that our customers can rely on facts about the physical world and build their sort of derivative software, analytics, and services on top of the data.

Now, that maniacal focus allows us to really scale the global POI product with really high degrees of quality. So the core SafeGraph product today is called Places. And to answer your earlier question, we maintain a global database of over 80 million POIs across 200-plus countries and territories.

Again, as I said earlier, these places may include things like a Starbucks, a Lawson's, and a McDonald's. It could also include an Amazon warehouse, a Tesla Supercharger, a park, a school, or your local mom-and-pop grocer. All of this data today is provided to our partners as a flat file, typically through cloud storage on S3, Snowflake, or the equivalent. And then we make sure that we're updating this data on a monthly cadence so that our customers and partners are getting the freshest data possible for their downstream analytics and applications.

In addition to providing, obviously, the location, you know, "here is a Starbucks on 555 Main Street," SafeGraph has also built a very robust library of rich metadata or attributes about these locations. So we offer very flexible licensing configurations where somebody can say, okay, I know the name, the address, the category, and the phone number; I also would like a building footprint, or the square footage, or the hours of operation, or I'm looking at restaurants and I want to understand the types of amenities or even the cuisine that is offered at a location. And so it's not just about getting the rows correct, but it's also about bringing in really rich, accurate metadata about these locations to inform various use cases for the customers.

Yeah, that's really interesting, and a huge, huge number of POIs there. You said over 8 million?

80 million.

80 million. Well, yeah. So, like, what continents, if you sort of break it down by continents, is that covering?

Yeah, so anyone who's listening, if you go to our doc site at docs.safegraph.com, we provide the most transparent technical documentation around the SafeGraph Places product. And on there, you can see a stack-ranked list of all the geos that we support and the various coverages across these geographies. But we, you know, sort of look at it in terms of tiers. So we're a US-based company. We have very strong footprints in, you know, North America, Western Europe, parts of APAC, such as Australia and Japan, for example, and in LATAM, you know, we have a pretty big database in Brazil.

We may not have as many rows of data as a headline number compared to some of our peers, but that is actually by design, because more data doesn't necessarily always mean good data, right?

What we do is we're a little bit more systematic in how we go about building data, because we're constantly thinking about recall versus precision.

And what we want to ensure is that we're not introducing false positives, duplicates, or permanently closed locations that ultimately would end up in our customers' pipelines and products and create sort of data cleanup challenges for the customer, right?
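The recall-versus-precision trade-off mentioned here can be made concrete with a small calculation. This is a toy sketch with invented place IDs: one dataset delivers every real place plus some duplicates and closed locations, while the other is smaller but clean.

```python
def precision_recall(delivered: set, ground_truth: set):
    """Precision: share of delivered rows that are real, open places.
    Recall: share of real, open places that were delivered."""
    true_positives = len(delivered & ground_truth)
    precision = true_positives / len(delivered) if delivered else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Five places that really exist and are open (hypothetical IDs).
ground_truth = {"a", "b", "c", "d", "e"}

# A large-headline-number dataset: all five, plus a duplicate, a closed
# store, and a fake listing.
big_dataset = {"a", "b", "c", "d", "e", "dup_a", "closed_x", "fake_y"}

# A curated dataset: misses one real place but contains no bad rows.
curated_dataset = {"a", "b", "c", "d"}

print(precision_recall(big_dataset, ground_truth))      # (0.625, 1.0)
print(precision_recall(curated_dataset, ground_truth))  # (1.0, 0.8)
```

The larger dataset wins on recall, but three of its eight rows would need to be cleaned up downstream; the curated one trades a little recall for rows that can be used as delivered.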

So that's how we design the product.

But we also do strategic data sourcing.

So somebody can approach us and tell us, hey, I really want to do a little bit deeper R&D in this specific market.

And we have very scalable infrastructure that I'll highlight a little bit later in this podcast that allows us to go and build up those countries at scale for specific use cases.

I think the key takeaway for the audience to remember is that as a data company, our value proposition to our partners is we just want to curate, validate, and transform places data so that our customers can really save time and money and focus on the derivative, which is building out their analytics, their models, and their applications for their customers.

So the analogy I like to say to people is I'm not building a cake, but I'm providing really high-quality flour and sugar so that people can go build their own cake.

Right, right. Yeah, I think that's a great analogy for the type of data and the data set itself that your company provides.

So I think you touched on it a little bit, but maybe you can sort of give an overview.

So how do you keep the POI data fresh, right?

There are new stores opening, and some stores aren't doing well.

They shut down, they close. You know, restaurants, especially coffee shops.

There's a very fast turnover rate.

So how do you update all these POIs?

Yeah, it's a really hard problem, right?

Which is why we decided to only focus on this problem and just be a pure-play data provider, because it requires that level of attention. So as you correctly called out, places are constantly changing. To be clear, I don't think anybody out there who's in this space is going to have a data set that is a hundred percent perfect, because data is a moving target, right? Stores are constantly opening, stores are closing, stores are rebranding, open hours are changing, right? Cuisine types on the menu might be changing. So to sum it up, it is a hard problem.

Now, we have a phenomenal engineering and product team here at SafeGraph that I've been working with for many years. And we have a very robust data sourcing and pipeline process. And so it starts with sourcing or gathering data. So we go out and we crawl the web. We go out and we crawl, every single week, thousands and thousands of public websites where we bring in different pieces of metadata about a place, right? And really, we do it from thousands of sources because we don't want to be reliant on one single source of data. Instead, we want to bring in as much data as possible and cross-reference different components of data from different sources to basically understand: is this place open, or is it a real place, right?

Once we bring in that data from thousands of sources, we then run some preliminary sort of ETL on that data, right? So we're cleaning, we're parsing the data from input sources, we're enriching some of the input data, right? We have these data classification models where we go in and we tag "this is a pharmacy" or "this is a gas station." And then we push these inputs downstream into our data pipeline for further sort of enrichment and cleaning, right? But really what's happening is there's all the sourcing and data coming in, and then there's just a lot of data conflation that happens to determine whether or not a place actually exists in the physical world.

And, you know, it's really important that we do these things, because for many enterprises out there, bad POI data specifically can cost 10 to 100 times more than high-quality data costs to license, right? Imagine, for example, you use POI data in a navigation system, like an AV system, right? And that data is incorrect and you are routing people to the wrong location, right? Well, that person could potentially be put into an unsafe condition, or they can churn from your app and no longer be a customer because of a poor customer experience.

And so I could go on and on around, you know, the implications of bad data, but this is really why we spend so much time on keeping this data fresh and up to date.

Yeah, definitely. It's a huge challenge considering the scale and the number of POIs, but also, you know, the sort of ad hoc frequency that stores may open or close, depending on how their business goes. So you touched a little bit on it, but do you want to add anything on SafeGraph's POI verification model or process, or any kind of metrics that you use when validating?

Yeah, so it is like a multi-layer process, to be clear.

So I touched on, you know, the sourcing process where we do a lot of web crawling.

There's also anomaly detection, entity resolution, classification, deduplication, as well as AI and human QA efforts.

But essentially, our verification model blends machine learning, rules-based QA, and human oversight to eliminate false positives and maximize accuracy.

So, for example, our customers, if you take one of our columns, we have category information about a POI.

So, we'll tell you this is the NAICS code of a location, right?

And then here's the top level and the subcategory of the location.

It's really important for a lot of our customers that we don't mislabel places, right?

An example would be advertising technology companies that use our data as a blacklist or suppression list to make sure they're not geo-targeting sensitive locations, right?
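The suppression-list use just described can be sketched as a filter on category codes. This is an illustrative sketch only: the `naics_code` field name, the sample rows, and the choice of which NAICS prefixes count as "sensitive" are all assumptions made for the example, not a statement of what any ad tech company or SafeGraph actually uses.

```python
# Assumed "sensitive" NAICS prefixes for this example:
# 622 = hospitals, 6111 = elementary and secondary schools,
# 8131 = religious organizations.
SENSITIVE_NAICS_PREFIXES = ("622", "6111", "8131")


def split_targetable(pois, sensitive_prefixes=SENSITIVE_NAICS_PREFIXES):
    """Split POIs into (targetable, suppressed) by NAICS prefix.

    POIs with a missing code are suppressed too, on the assumption that
    an unlabeled place is safer to skip than to target by mistake.
    """
    targetable, suppressed = [], []
    for poi in pois:
        code = str(poi.get("naics_code") or "")
        if not code or code.startswith(sensitive_prefixes):
            suppressed.append(poi)
        else:
            targetable.append(poi)
    return targetable, suppressed


# Hypothetical rows; names and codes are for illustration.
POIS = [
    {"name": "Corner Coffee", "naics_code": "722515"},
    {"name": "City General Hospital", "naics_code": "622110"},
    {"name": "Lincoln Elementary", "naics_code": "611110"},
    {"name": "Unlabeled Place", "naics_code": None},
]

targetable, suppressed = split_targetable(POIS)
print([p["name"] for p in targetable])
print([p["name"] for p in suppressed])
```

The sketch also shows why mislabeling matters: a hospital wrongly tagged with a restaurant code would pass straight through the filter.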

So the stakes are really high.

And so to do best-in-class industry categorization on POIs, we do multilingual LLM-based models, right?

And then we do this AI agent post-processing to fix errors when we do make a mistake before it goes to the customer, right?

That's just one example of verification specific to categories.

The other thing too is conflation, right?

As I said earlier, not relying on a single source, but taking in thousands of sources and then using an ML model for identifying groups of input POIs that correspond to the same place and then merging those different sources together, right?
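The conflation idea described here, grouping input records that refer to the same place and merging them, can be sketched with a simple rule. To be clear about the assumptions: the interview says SafeGraph uses an ML model for this step, whereas the sketch below swaps in a plain heuristic (same normalized name within 50 meters) purely to show the shape of the problem. All records, field names, and thresholds are made up.

```python
import math
import re


def normalize_name(name):
    """Lowercase and collapse punctuation so "JOE'S COFFEE" matches "Joe's Coffee"."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def distance_m(a, b):
    """Approximate distance in meters (equirectangular; fine at short range)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a["lat"], a["lon"], b["lat"], b["lon"]))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000.0 * math.hypot(x, y)


def conflate(records, max_distance_m=50.0):
    """Greedy grouping: a record joins the first cluster whose first member
    has the same normalized name and lies within max_distance_m."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            rep = cluster[0]
            if (normalize_name(rec["name"]) == normalize_name(rep["name"])
                    and distance_m(rec, rep) <= max_distance_m):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def merge(cluster):
    """Keep the first non-missing value per field and remember every source."""
    merged = {}
    for rec in cluster:
        for key, value in rec.items():
            if key != "source" and value is not None and key not in merged:
                merged[key] = value
    merged["sources"] = sorted(rec["source"] for rec in cluster)
    return merged


# Hypothetical crawled inputs from three different sources.
RECORDS = [
    {"source": "site_a", "name": "Joe's Coffee", "lat": 35.0000, "lon": 139.0000, "phone": None},
    {"source": "site_b", "name": "JOE'S COFFEE", "lat": 35.0001, "lon": 139.0001, "phone": "555-0100"},
    {"source": "site_c", "name": "Main St Pharmacy", "lat": 35.0001, "lon": 139.0001, "phone": None},
]

clusters = conflate(RECORDS)
places = [merge(c) for c in clusters]
print(places)
```

Here the two coffee shop records collapse into one place that picks up the phone number from the second source, which is the benefit of not relying on a single source. Exact-name matching breaks quickly on real data (abbreviations, translations, rebrands), which is presumably why a learned model is used in practice.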

We also have built out address parsing capabilities.

So we have a suite of NLP models for parsing addresses into constituent components, right?

Or we're building out this data.

And then last but not least, we have something internally called a reality model, which is an upstream ML model for scoring raw POI data and a post-conflation model for identifying any fake POIs that have sort of made their way through the pipeline.

And then it allows us to kind of go in and course correct before we give that data out to our partners, right?

And so really the takeaway here is, you know, we don't just crawl data.

Like, you know, we're not a crawl company.

We are a true data product, right?

So there's so much work that goes into the product every single month before it's ready to go and it's delivered to our partners on that monthly cadence.

Yeah, yeah.

I think that's a great sort of high-level overview in terms of what is involved with

handling this data, checking it, verifying it.

There's a lot that goes into it.

Exactly. And I'll say this, like, we've always been very sophisticated. You kind of have to be when you're just focused on this data problem. But I'll say that the tailwinds of AI have given us even greater leverage in technologies to continue pushing the pace of innovation here at SafeGraph, right, and within this specific industry.

And so we've definitely been dabbling. You know, we've always been using machine learning and neural networks and large language models, as I mentioned earlier.

But we definitely have made an effort internally to build out an agentic workflow.

So we've created a couple of AI agents ourselves that sit in different parts of our pipeline from crawl to classification to verification.

And so, again, we feel like we're at the forefront here of this space in terms of delivering the best product possible for our partners.

And again, it all comes back to data veracity, and we take that very seriously.

Yeah, yeah, I think that's a great way to describe it.

I think one of the kind of standouts, or things that SafeGraph has that your peers don't have,

you touched on a little bit at the beginning, I think, but I wanted to sort of highlight it,