Data Dialogues

Fraudsters across the globe were particularly active in 2020 as the U.S. government distributed more than $1 trillion in economic stimulus for struggling families and businesses as a result of the COVID-19 pandemic. While this opportunity for fraud might have been new, the underlying challenges we’re facing, like synthetic identities, are the same. So how can businesses use data and analytics to drive insights and mitigate these growing threats? The answer: smart data.

Aparna Sheth, product leader for Equifax’s Identity and Fraud Solutions Group, and Cori Shen, leader of Equifax’s Identity and Fraud Data Science Team, delve into how data scientists can intelligently and efficiently assemble multi-source data for the right insights.

Show Notes

In this episode of Data Dialogues, we explain how smart data can help organizations combat the growing digital threat of identity fraud. Aparna Sheth, product leader for Equifax’s Identity and Fraud Solutions Group, interviews Cori Shen, who leads a data science team responsible for data and machine learning and AI-driven product innovations to solve identity and fraud challenges. 

Jump ahead to these highlights:

0:40 - Cori’s role and team responsibilities

0:54 - Consumers shift from digital-first to digital-only business environment

1:37 - Fraud has multiplied

2:18 - New fraud opportunities emerge during unprecedented economic conditions

3:36 - How to use data and analytics to solve fraud

4:41 - How smart data works

8:40 - Role of digital signals and bureau data

10:00 - Explaining graph networks

10:58 - How to make the insights actionable and examples

14:38 - Our smart data approach


Podcast Transcription

Aparna:
Welcome to Data Dialogues.  Today, we are discussing how smart data can help organizations fight the evolving challenges of identity fraud. My name is Aparna Sheth. I'm a product leader here at Equifax in our identity and fraud solutions group. And I'm so happy to have Cori Shen here with me, who leads our data science team. Hi, Cori, would you like to share more about what you do?

Cori:
Sure. Thanks, Aparna. Happy to be here too. And I'm glad that we can discuss this topic together. I'm Cori Shen. I lead our identity and fraud data science team for Equifax.

Aparna:
Alright. So speaking of identity and fraud, 2020 has been quite a year. COVID accelerated digital transformation across the board. We saw a stark paradigm shift take place last year, where we went from a “digital first” to “digital only” business environment. And this was of course brought on by abrupt shelter in place orders.

Cori:
That's right, Aparna. I totally agree with you. You know, consumers were forced to do everything online from buying groceries to ordering food. And of course they're doing all their financial transactions online. You know, last year 80% of my groceries were done through a mobile app.

Aparna:
Oh wow. Yeah, I know. And we saw during this pandemic that not only did new fraud schemes emerge, but existing types of fraud also multiplied. Right?

Cori:
That's absolutely true. You probably saw this report coming from the Federal Trade Commission, right?  The report shows they have received about, I think 275,000 fraud complaints last year. And also when we track the fraud trends in our own data, we see that the authorized user abuse risk in 2020 went up by over 23% compared to 2019 and 2018.

Aparna:
Wow. The other factor, of course, was the unprecedented unemployment rates and economic downturn. And to combat that, as we all know, Congress passed trillion plus dollars of stimulus relief packages to help struggling families and boost the economy. We saw new fraud schemes in March exploiting PPP, which is the Paycheck Protection Program, as well as the expanded unemployment insurance program. So as millions of Americans were applying for help, we had these international and national criminal rings that were working relentlessly to steal these funds, using sophisticated methods of identity theft.

Cori:
That's right, Aparna. You know, with all the relief money that went to the market in 2020, I think it really made fraudsters go all out on it. As a matter of fact, these fraud schemes might be new, but the underlying fraud challenges are the same ones, like synthetic ID and compromised ID, which have been around for years. And I think that's why now more than ever, we need something better in identity and fraud prevention.

Aparna:
I couldn't agree more. So let's talk about how we can use data and analytics to solve this, right? There is just so much data out there. Not just related to our credit file, but also every digital interaction that we make as individuals, be it on social media or when we shop online. So how do we sort through these billions of interactions and use analytics to really drive those insights that can be used to mitigate these growing challenges?

Cori:
This is a great question.  Because if we look at today's digital paradigm, managing big data from multiple sources is no longer a challenge. What matters most is how to make sense of big data and how to intelligently and efficiently assemble multi-source data for the right insights. And we will call it smart data because we want data to talk, and we want data to be able to offer recommendations.

Aparna:
I love it. Smart data. I mean, it sounds fantastic, right? But it's easier said than done, isn't it? Let's take synthetic identities for example. We know that many of these have been in the system for a while and they look like legitimate people. Very often their identity information is complete, and it matches what systems have. As a matter of fact, sometimes they even have a matched social media profile. That's why these fake identities look like real people and can be used to create fake businesses and defraud the system of millions of dollars in PPP or unemployment claims. Right? So even if we do identity verification matches from multiple sources, we may not be able to catch them. So what should we do?

Cori:
Ah, what should we do? This is exactly the right question. I totally agree with you. If we're just talking about matching identities from multiple sources, it is not smart data. Smart data has two components: insights and connections. We think a really effective way to build smart data is to connect the useful insights from a graph network perspective. Let me take synthetic ID detection as an example. Here is how you can build it. First, build useful insights from multiple sources. You want to search for the abnormal signals throughout an identity's lifecycle. To do so, you will need the consumer activity data from multiple sources and from multiple systems. For example, the consumer applies for credit cards or loans. The consumer checks their credit online. They enroll in or log in to an online system. They're making payments. They're making purchases from e-commerce sites. All these different data points are consumer activity data.

We all know that we cannot listen to what fraudsters say. But we need to watch what they do. Because fraudsters will give you a fake ID and tell you, Hey, everything's good. Everything matched. And I want to borrow $50,000. But when you get the power of the consumer activity data, what you can do is look closely into their activities. And then, you will find out a lot of secrets about them. And here are some examples. All the synthetic ID outliers appear at an early stage. You will see some synthetic IDs apply for mortgages and shop for luxury cars. However, when you look at the activity pattern for a regular legit consumer at the earlier stage, you will often see they only apply for cell phones, apartments, internet service, credit cards. These types of starter programs. Another example: sometimes synthetic ID can be a very patient game. This means that, you know, fraudsters can wait for a couple of years to build their credit history before they take action. However, the interesting thing about their activity is that once they start taking actions, they do it super fast to an extreme extent. So this means that when you explore the trended activities, you see these IDs can be dormant for a while. And then all of a sudden you see a huge spike in their activities. These huge spikes are usually something like they are desperately shopping around for money all over the place. For example, they try to get as many credit cards or loans as possible from lots of different institutions. Also, they will act extremely anxiously in monitoring their credit. It's like they're doing this every day while they're shopping for money.
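
Editor's sketch: the "dormant for a while, then a huge spike" pattern Cori describes could be checked with a simple rule over monthly activity counts. This is only an illustration under stated assumptions, not Equifax's method. The function name `dormant_then_spike`, the thresholds, and the sample counts are all hypothetical.

```python
from typing import Sequence


def dormant_then_spike(
    monthly_counts: Sequence[int],
    dormant_months: int = 6,   # hypothetical: how long "dormant" must last
    dormant_max: int = 1,      # hypothetical: max events per month while dormant
    spike_min: int = 5,        # hypothetical: events in one month that count as a spike
) -> bool:
    """Return True if a quiet stretch is immediately followed by a burst of activity."""
    for i in range(dormant_months, len(monthly_counts)):
        quiet_window = monthly_counts[i - dormant_months:i]
        if max(quiet_window) <= dormant_max and monthly_counts[i] >= spike_min:
            return True
    return False


# Made-up monthly counts of credit applications plus credit checks.
steady = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
sleeper = [0, 0, 1, 0, 0, 0, 0, 0, 7, 9, 12, 8]

print(dormant_then_spike(steady))   # no month reaches the spike threshold
print(dormant_then_spike(sleeper))  # quiet stretch, then a burst
```

A real system would tune these thresholds on labeled data and would treat a hit as one signal among many, not as proof of fraud.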

Aparna:
This is very interesting. Thanks for sharing these secrets on consumer behavior anomalies, and how they can be used in synthetic identity detection. So what about the digital signals and bureau data? I would think they are also very useful in identifying synthetic identities, right?

Cori:
Digital signals are definitely powerful and critical. Here's another example about synthetic ID. To establish and maintain a synthetic ID, fraudsters like to manipulate identities via online channels. They like to change addresses and alter names online or from their mobile phones. At that time, you may see the same device linked to many different IDs for name and address change requests. You may also see that the IP geolocation is far away from both the existing addresses they're using and the new addresses they requested. Speaking of bureau data, it is also really helpful when you use it to explore risk signals like piggybacking credit using authorized user abuse schemes.
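
Editor's sketch: the two digital signals mentioned here, one device tied to many identities and an IP location far from every known address, can be expressed in a few lines. The helper names, the `max_ids` and `threshold_km` cutoffs, and the sample data are assumptions made for illustration; the distance uses the standard haversine formula.

```python
import math
from collections import defaultdict


def shared_devices(requests, max_ids=3):
    """Return devices used by more than max_ids distinct identities.

    requests: iterable of (device_id, identity_id) pairs.
    max_ids is a hypothetical cutoff, not a recommended value.
    """
    ids_by_device = defaultdict(set)
    for device_id, identity_id in requests:
        ids_by_device[device_id].add(identity_id)
    return {d: ids for d, ids in ids_by_device.items() if len(ids) > max_ids}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def ip_far_from_addresses(ip_loc, address_locs, threshold_km=500.0):
    """True if the IP location is farther than threshold_km from every address."""
    return bool(address_locs) and all(
        haversine_km(*ip_loc, *addr) > threshold_km for addr in address_locs
    )


# Made-up change requests: device "d1" is used by four identities.
requests = [("d1", "id1"), ("d1", "id2"), ("d1", "id3"), ("d1", "id4"), ("d2", "id5")]
atlanta, new_york = (33.749, -84.388), (40.7128, -74.0060)

print(shared_devices(requests))
print(ip_far_from_addresses(new_york, [atlanta, (33.9, -84.5)]))
```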

Aparna:
Ah, I see. So it's really neat to see how we can derive so many different insights to look for anomalies and then use those for synthetic identity detection. So Cori, earlier you mentioned that one recommended way to assemble smart data is to connect these insights in a graph network. I'm aware of link analysis, which is a very effective tool used in fraud review. Can you tell me a little bit more about graph networks? Is it the same tool you use in your lab?

Cori:
It's a little bit different from visualization. So what I'm saying is that what we do in our innovation lab is not to run one or two graphs. In order to find the true meaning of the connections, we need to build graph networks on very large scale data, like billions of transactions.

Aparna:
Wow. Building graphs on billions of records. I mean, it's impressive, but there is a lot of information. So how could you make sense of these connections so that the outcome from it can be actionable? You know, versus something that's too abstract and which cannot be easily explained?

Cori:
This is a really good point because it is very important for our smart data to not only be predictive, but also prescriptive. So because of that, let me explain to you how you can make sense of the connections when we are processing billions of transactions. And then you can come up with the actionable recommendations through our work. So basically this is a machine learning capability you can build with a graph database on a scalable and distributed system such as Google Cloud. So this is what you're gonna do. So first you can link all the identities from billions of transactions based on address, phone number, email and device. So what I mean by linking is, for example, a group of family members can be linked and connected to each other, as they might be living in the same place, using the same address. However, two strangers probably cannot be connected directly because there's no reason they will live together or they will use the same email accounts. So by doing this linking, you are connecting the identities. So now you're going to have millions of groups, right? Some groups connect more people and some groups connect fewer people. So next, you can then assign these synthetic ID insights… the ones we just talked about earlier, remember? The authorized user abuse risk, consumer activity pattern outliers, or the high risk digital signals. You can assign them to the identities in each group. So this way, by connecting the identities, now you're indeed connecting the insights.
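
Editor's sketch: the linking step described here amounts to finding connected components, where two identities land in the same group if they share an address, phone number, email, or device, directly or through a chain of other identities. The small in-memory version below uses union-find as a stand-in; the episode describes a graph database on a distributed system, which this does not attempt to reproduce. The function `link_identities`, the record layout, and the sample records are hypothetical.

```python
from collections import defaultdict

LINK_KEYS = ("address", "phone", "email", "device")  # attributes named in the episode


def link_identities(records):
    """Group identities that share any linking attribute, directly or transitively.

    records: dict of identity_id -> dict of attribute name -> value (hypothetical layout).
    Returns a list of sets of identity IDs.
    """
    parent = {identity: identity for identity in records}

    def find(x):
        # Follow parent pointers to the group root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # (attribute, value) -> first identity seen with that value
    for identity, attrs in records.items():
        for key in LINK_KEYS:
            value = attrs.get(key)
            if value is None:
                continue
            owner = first_owner.setdefault((key, value), identity)
            union(owner, identity)

    groups = defaultdict(set)
    for identity in records:
        groups[find(identity)].add(identity)
    return list(groups.values())


# Made-up records: A and B share an address, B and C share a device, D is unconnected.
records = {
    "A": {"address": "1 Main St", "phone": "555-0001"},
    "B": {"address": "1 Main St", "device": "dev-9"},
    "C": {"device": "dev-9", "email": "c@example.com"},
    "D": {"address": "9 Elm St"},
}
groups = link_identities(records)
print(sorted(sorted(g) for g in groups))
```

Once groups exist, per-identity insights such as an activity spike or an authorized user abuse flag can be attached to each member, which is what connects the insights across the group.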

Aparna:
Let me see if I've got this. So the first thing this tool does is link people and tie them together in a group based on some PII, right? And then you layer in the synthetic identity insights that we discussed earlier to detect anomalies, which will possibly indicate synthetic identities? Is that correct?

Cori:
That is absolutely right. So to put it into a more concrete perspective: for example, when you have this connection, you may find a group with a hundred people in it. And in this group, you can see some strangers are connected to each other, and you can also see some people in this group showing one or multiple synthetic ID risks, those synthetic ID insights. And what that means to you is that this group is indeed a synthetic ID crime ring.
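
Editor's sketch: one simple way to turn "a large linked group in which several members show synthetic ID risks" into a review rule is to flag groups by size and by the share of members carrying at least one risk insight. The function `flag_possible_rings`, both thresholds, and the sample data are hypothetical, and a flagged group is a candidate for fraud review, not a confirmed ring.

```python
def flag_possible_rings(groups, insights, min_size=5, min_risky_share=0.3):
    """Return (group, risky_members) pairs for groups worth a closer look.

    groups: iterable of sets of identity IDs.
    insights: dict of identity_id -> set of risk labels (missing or empty means none).
    min_size and min_risky_share are hypothetical cutoffs, not recommended values.
    """
    flagged = []
    for group in groups:
        risky = {member for member in group if insights.get(member)}
        if len(group) >= min_size and len(risky) / len(group) >= min_risky_share:
            flagged.append((group, risky))
    return flagged


# Made-up groups and insights.
groups = [{"a", "b", "c", "d", "e"}, {"x", "y"}]
insights = {
    "a": {"authorized_user_abuse"},
    "b": {"activity_spike"},
    "x": {"activity_spike"},
}

for group, risky in flag_possible_rings(groups, insights):
    print(sorted(group), "risky members:", sorted(risky))
```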

Aparna:
Now I can see the whole picture of how we build smart data for this use case. So you derive insights, then you connect those insights to discover a synthetic identity fraud ring. And once we do that, we can take actions, right? We can conduct fraud reviews, we can do step up authentication against these synthetic IDs and stop them from stealing money, right?

Cori:
That's absolutely right. So this is our smart data approach. We assemble data insights differently for very effective fraud detection. And moreover, the outcome from the connected insights, like you just mentioned, is indeed actionable.

Aparna:
Thanks Cori. This really helps me, and I'm sure our listeners as well, understand how our smart data strategies can be used to mitigate identity and fraud threats. So to summarize, the amount of digital transactions and data has been growing exponentially since the pandemic began last year. So it is really critical that we use this data and assemble it in a way that makes the data talk, as in smart data, to derive actionable insights. And we have seen that organizations that have recognized this and have been data-driven and proactive are very successful in combating identity and fraud challenges in this post-pandemic era. Thank you so much, Cori, for discussing this topic with me today. It was a lot of fun.

What is Data Dialogues?

A podcast where innovative business leaders discuss data: how to think about it, how to use it and how it can help us all make better business decisions every day. As they tell their stories of trials and triumphs, you’ll gain key insights to leverage in your own day-to-day operations.