Your data is being tracked. But where is it going, what are they doing with it, and how are they getting it? Liberty and Scott are investigating just how dangerous the data economy really is.
We face many overwhelming challenges in America today: systemic racism, data privacy, and political misinformation. These are big problems, and there are a lot of opinions and ideas on how to fix them. Scholars and industry experts often disagree on how to find solutions. So how can we find the right way to move forward? We let the data speak for itself. Join hosts Liberty Vittert and Scott Tranter as they gather data and get the facts about today's most pressing problems to find out: are solutions even possible? They'll investigate with MIT professors dedicated to researching these issues, and talk with the people on the ground encountering these problems every day, to find the best solutions to America's biggest problems.
Data Nation is a production of MIT's Institute for Data, Systems, and Society, with Voxtopica.
Speaker 1 (00:04):
Welcome to Data Nation. I'm Munther Dahleh, the director of MIT's Institute for Data, Systems, and Society. Today on Data Nation, Liberty and Scott are looking into the truth behind data privacy.
Speaker 2 (00:23):
Your rights are being violated, and most people don't even know it. Your data is used by governments, corporations, charities, and many others to manipulate, convince, sway, and sell you in every way imaginable. The newest, biggest, baddest guy on the block, the most influential industry today, is data. So it begs the question: what kind of toll does that take on us as a society?
Speaker 3 (00:54):
Data in itself isn't bad, but sometimes even the best intentions cause poor outcomes, and in some cases people can intentionally use data to cause damage. Racial discrimination takes place not just in the real world, but online too. Facebook's advertising tools allowed advertisers to discriminate based on race and exclude users from seeing their ads. Back in October 2016, a group of ProPublica journalists wanted to expose this, so they bought a housing-related ad and intentionally excluded certain users from seeing it, including Hispanics, Asian Americans, and African Americans. It was as simple as clicking a box. All the journalists had to do was choose which races they wanted to see the ad, showing a preference for certain groups and excluding others: a very clear violation of the Fair Housing Act, and Facebook pledged to fix it.
Speaker 2 (01:41):
The data economy is a vast world of people buying and selling data. There are tons of moving parts and a lot of things that are completely unknown about it; it's sort of this dark, shady world. And inevitably, that means a lot of people are really worried about it.
Speaker 3 (01:59):
And so I think the real question is, should we be worried about it? What exactly are the problems when it comes to how data is being used? If there's anyone who knows, it's Kevin Roose. Kevin is an award-winning technology columnist for the New York Times. Before joining the Times, he was a writer at New York Magazine and a host and executive producer of Real Future, a documentary TV series about technology and innovation. So when it comes to data, it's used differently by corporations, governments, and individuals. But how is it actually being used, and is that use a net positive? Do the positive cases outweigh the negative ones?
Speaker 4 (02:35):
I think it's really important that consumers understand how their data is being used and what is being collected. And I also think it's important that on the regulatory side, we have something like a national privacy bill to prevent the worst exploitation. I think you really have to kind of break it down into constituent parts, right? An algorithm that is designed to show toaster ads to people is, I would argue, substantially different than one that is being used to hijack people's attention, to show them things that are likely to produce outrage and mistrust. The same machine learning techniques are used for both. They're used to keep you glued to your TikTok feed, they're also used to sell you things. And so I think we really need to be specific about what problems we're trying to solve and then tailor our intended solutions to those specific problems.
Speaker 2 (03:30):
Scott and I were just talking about that ProPublica investigation, where they exposed how Facebook's ad targeting let housing ads be aimed at people based upon race, in violation of the Fair Housing Act. Do you see this in everyday life, in many other cases? Have you seen this kind of very specifically targeted advertising?
Speaker 4 (03:57):
So if you're advertising, say, menstrual products, you don't necessarily want to pay to show those to men. You might want to just say, I'm going to target this product at women between the ages of 18 and 35, or something like that. And Facebook will let you do that, and it will let you do that not just for gender or age but, until somewhat recently, by ethnic affinity group. And so Facebook got into some hot water when it turned out that people were in fact using their advertising tools to advertise jobs, houses, and various other categories for which federal regulation prohibits racial segmenting.
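To make the mechanics concrete, here's a minimal Python sketch of what a demographic targeting spec with exclusions might look like. The field names and the is_user_eligible helper are invented for illustration, not Facebook's actual advertising API; the point is simply that excluding an entire group is just another parameter in the payload.

```python
# Hypothetical targeting spec; field names are illustrative,
# NOT Facebook's real advertising API.
targeting_spec = {
    "age_min": 18,
    "age_max": 35,
    "genders": ["female"],                 # only show the ad to these groups
    "excluded_affinity_groups": [          # the "click a box" exclusions
        "hispanic", "asian_american", "african_american",
    ],
}

def is_user_eligible(user: dict, spec: dict) -> bool:
    """Return True if this user would be shown the ad under the spec."""
    if not (spec["age_min"] <= user["age"] <= spec["age_max"]):
        return False
    if user["gender"] not in spec["genders"]:
        return False
    # Exclusion silently removes whole groups from the audience, which is
    # what made discriminatory housing and job ads possible.
    if user.get("affinity_group") in spec["excluded_affinity_groups"]:
        return False
    return True

print(is_user_eligible(
    {"age": 30, "gender": "female", "affinity_group": "hispanic"},
    targeting_spec))  # False: excluded despite matching age and gender
```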
Speaker 2 (04:49):
I'm pausing this interview to make a quick editor's note for our listeners. We recorded this interview with Kevin before the Supreme Court overturned Roe v. Wade, and this next question that I ask Kevin is about how people's location data can be used against them. I actually used the example of location data being used to specifically target women seeking abortions. At the time of the recording, this question was a purely speculative, worst-case scenario that I couldn't even really imagine happening. But now it's the unfortunate reality for many women, especially women living in states where, after the overturn, trigger laws automatically went into effect and criminalized getting an abortion. So as we continue this conversation on data privacy and how data can be used against you, keep in mind that this is now a very real and present threat. It seems like some of this stuff has been done unknowingly, right?
(05:50):
The algorithmic targeting, where people have done this without meaning to do anything bad, including some of the bias that creeps into algorithms and the data. But what about when things are intended to be bad? I remember there was a New York Times investigation that looked at how your location data is not actually private: they could track someone and figure out who this woman was based upon her location data, even when it was supposed to be completely anonymous. So do you have the potential for something to happen that's truly meant to be bad? For example, trying to use cell phone location data to figure out who goes to abortion clinics, then selling that data so anti-abortion protesters can go stand outside Mary Smith's house because they know she went to an abortion clinic. Do you have that risk with this type of data?
Speaker 4 (06:42):
Absolutely. And so I think that's why it's important that even though the intentions of people in the tech industry who are building these products may be benign, if the outcomes are malign, if the outcomes are bad, I believe that we still need to hold them responsible for that. I mean, just having good intentions doesn't matter if you're building some surveillance dragnet that's going to be used to arrest and prosecute women seeking abortions. And I think once you have some data showing that that is happening or might reasonably happen, then you have a moral imperative to shut that down.
Speaker 3 (07:27):
So the problems with the data economy are definitely there. It seems that we all have to live with the consequences.
Speaker 2 (07:33):
Right. And while most of the time we don't feel burdened by being targeted by an algorithm, there are times that it can become a psychological stressor for some people. There are many women who believe that Instagram has preyed on their maternal instincts. One example is a woman who, when her son was born last year, posted a photo of him every day on Instagram, and immediately following that, her Explore page was populated with images of babies with severe health conditions: cleft lip and palate, missing limbs. And she believes that Instagram intentionally preyed on her vulnerability as a new parent. Other people share the same feeling that Instagram preys on your insecurities. Nicole Gill, the co-founder of the advocacy group Accountable Tech, believes that Instagram was damaging to her mental health in the postpartum period, constantly showing her posts about how to lose baby weight. I think the real question is: how did they actually get this data?
(08:35):
Is it bought? Is it given out by Instagram? Where did it come from in the first place? And people claim their phones are listening to them; is that really happening? The person who knows the ins and outs of the data economy is Dean Eckles. Dean is a professor at MIT, affiliated with both the Sloan School and MIT IDSS, and was previously a scientist at Facebook and Nokia. I want to start with what I believe is the first question on people's minds, and that's: are our phones, our technology, really listening to us to collect data and give targeted ads? If I pick up my significant other's phone and say, puppy, get a puppy, cute puppy, buy a puppy, is he going to magically get some puppy ads? Do our phones really have the capability of doing that?
Speaker 5 (09:27):
I think, just given how leaky Facebook has been recently, that if that were true, we would've heard about it from internal information. And then second, it's just technically difficult. One of the things that's nice about a mobile phone is that we carry it around with us. That means it's battery powered, so if you start using the mic all the time, your phone is going to die pretty quickly. So for apps that a lot of people are using, it's not really technically feasible. Now on the other hand, if you've got a Google Home or an Alexa or something similar that's just plugged into the wall and is potentially listening, there's really no technical reason it couldn't have that kind of process running.
Speaker 2 (10:07):
I think a lot of people want to know how their data is actually being acquired. What data are algorithms taking from the everyday user and how is it being bought and sold?
Speaker 5 (10:21):
Some of the way you framed that reflects how a lot of people think about this: the idea that social media companies are actually sort of taking data. I'd say one way of describing it is that everyone is actually giving them data for free. And by everyone, I don't just mean users, I actually mean other companies. So part of the way that these advertising giants know things about individual people is not that they're trying to buy this data or collect this data in some particular way; it's that advertisers are just constantly giving them that data for free. Maybe in a really superficial way: you looked at some shoes on a retailer's website, and then those shoes are following you around.
(11:00):
So that's a case where it's really just that advertisers are constantly uploading data about people and their behaviors, online and offline, to these advertising platforms. So a lot of what's happening is that the data is flowing into the platforms for free. Facebook isn't really buying your data or selling your data; everyone is giving it to them for free, in order to be able to run more targeted ads, ads that are targeted based on the data that those advertisers have.
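To make that flow concrete, here's a minimal Python sketch of an advertiser reporting a browsing event to an ad platform. The endpoint, field names, and report_view_event helper are hypothetical placeholders, not any real platform's API; real sites typically do this with a JavaScript tag or tracking pixel rather than a server-side call.

```python
# Sketch of an advertiser "giving data away for free": a retailer reports
# a browsing event to an ad platform so it can retarget the visitor later.
# The endpoint and field names are hypothetical, not a real API.
import requests

def report_view_event(pixel_id: str, visitor_id: str, product_id: str) -> None:
    event = {
        "pixel_id": pixel_id,        # identifies which advertiser is reporting
        "event_type": "ViewContent",
        "visitor_id": visitor_id,    # cookie/device ID tying the event to you
        "product_id": product_id,    # e.g., the shoes you just looked at
    }
    requests.post("https://ads.example.com/v1/events", json=event, timeout=5)

# After this, "sneaker-42" can follow visitor "cookie-abc" around the web.
report_view_event("pixel-123", "cookie-abc", "sneaker-42")
```

Multiply that one call across every page of every site running a platform's tag, and the profiles Dean describes assemble themselves without the platform buying anything.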
Speaker 3 (11:29):
So I like to point this out a lot. Thirty years ago we had the White Pages, where once a quarter a magical book would be dropped on your doorstep with a list of people's names, addresses, and phone numbers. Today, the amount of data the industry has is astounding, and it looks like, as time goes on, more data is available. So it's more than just knowing someone's name and address, or whether someone is college educated or not. We can get things like purchase habits and credit scores, and they're accessible to basically any average person who wants to spend a little bit of money to get that data. So is this prevalence of data going to be the norm in the next 20 years, or will there be a backlash where industry pulls back and anonymizes a lot of this data?
Speaker 5 (12:10):
Yeah, I think that's a great question that puts this in its broader historical context. And of course, just because something's been happening for a while doesn't mean that we shouldn't be worried about some of the privacy implications of it. So I think while there's reason for scrutiny of these big advertising giants like Google and Facebook and Amazon, a lot of the scrutiny could also be more generally on this space of data brokers. There are a lot of ways that data is flowing around, often involving companies whose names you haven't necessarily heard. This is happening a little bit more behind the scenes.
(12:42):
And so that's actually an area where I'd like to see more of the regulatory scrutiny focused. Maybe it's not as exciting for congresspeople to be yelling at executives of companies that no one has heard of, as opposed to yelling at a Google exec, right? But from a regulatory perspective, that's where I think a lot of the scrutiny should be: these cases where there's not really much of a direct relationship between the consumer and whoever ended up with the data. The data is flowing around in pretty opaque ways.
Speaker 2 (13:14):
With all this data flowing around, a lot of people are worried that their location data isn't anonymous and can easily be tracked; there was a New York Times article about exactly that. And the government is already using people's location data as evidence against them in crimes. But do you think that the government could take on this Big Brother, 1984, really scary type of approach and track everybody?
Speaker 5 (13:42):
Yeah, I think location data is definitely an example of sensitive data, and of data where someone can say, oh, this is somehow aggregated, or this is somehow anonymized, or this is only linked to some other sort of identifier that's hard to tie to a person. But we know that if data is high dimensional, if there are a lot of individual bits of data, then everyone is kind of unique.
Speaker 2 (14:04):
So it's like a fingerprint, you'd say.
Speaker 5 (14:06):
A statistician would say, right, that high-dimensional space is a lonely place. If there are a lot of characteristics describing an individual, then they're probably not really near any other individuals in that space. And it may be easy, with side data, other data that you have from another source, to figure out who that individual is, even if the data is in some way ostensibly anonymous. So I think that is a concern. And one of the really exciting areas of theory moving into practice right now is techniques to guarantee that data that is released or transferred, or statistics that are reported about data, don't reveal too much about individuals, that they are privacy preserving. So I'm thinking of techniques like differential privacy, which is being used by the US Census Bureau as well as Apple and Google and Facebook.
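To make both halves of that answer concrete, here's a toy Python sketch: first a linkage attack that re-identifies an "anonymous" record by joining it with side data, then a differentially private count that releases noisy statistics instead of raw rows. All the records are invented, and dp_count is the textbook Laplace mechanism, not the specific machinery the Census Bureau, Apple, Google, or Facebook deploy.

```python
# Toy re-identification via side data, plus a differentially private release.
import random

# An "anonymized" dataset: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "02139", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "63130", "birth_year": 1984, "sex": "F", "diagnosis": "flu"},
]

# Side data from another source, e.g., a public voter roll.
voter_roll = [{"name": "Mary Smith", "zip": "02139", "birth_year": 1984, "sex": "F"}]

# Linkage attack: in "lonely" high-dimensional space, a handful of
# attributes is often unique, so a simple join re-identifies the record.
for person in voter_roll:
    matches = [r for r in anonymized
               if (r["zip"], r["birth_year"], r["sex"])
               == (person["zip"], person["birth_year"], person["sex"])]
    if len(matches) == 1:
        print(person["name"], "->", matches[0]["diagnosis"])

# Differentially private alternative: publish only a count, with Laplace
# noise of scale 1/epsilon (a count query has sensitivity 1).
def dp_count(records, predicate, epsilon=1.0):
    true_count = sum(predicate(r) for r in records)
    # The difference of two exponentials is a Laplace(0, 1/epsilon) draw.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

print("noisy asthma count:", dp_count(anonymized, lambda r: r["diagnosis"] == "asthma"))
```

The noisy count still supports aggregate analysis, but no single record, Mary Smith's included, can be confidently inferred from it.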
Speaker 3 (15:00):
So we know that data can be easily accessed, given away, and bought, and that data collection isn't new. But like Dean said, just because something can be done doesn't mean it should be done. So how much of our personal data is actually private? What is the government allowed to access? I mean, I'm sure there's lots of data floating around that could be used as evidence in serious crimes, but what's the balance here?
Speaker 2 (15:20):
That's a good question, because Dean mentioned that it's likely our phones aren't listening to us. That doesn't mean they can't, but they likely aren't. But the Amazon Alexa does listen, and it records. I guess the question is: is that always a bad thing? In the case of the Victor Collins murder trial, Alexa really could be a key witness. If you remember, the body of Victor Collins was found floating in a hot tub at his friend James Andrew Bates's home, and Bates was later charged with his murder after an Echo device was found on the Bates property. Prosecutors in the murder trial requested that Amazon provide all the recordings and the data from the device, but Amazon refused to comply. And there's a similar story in the case of Silvia Galva, a Florida woman who was murdered. Adam Crespo, her boyfriend, was charged with the murder, and Crespo's attorney believes that recordings from their home Amazon Echo could have witnessed the crime and provided essential evidence to exonerate his client.
(16:26):
So it begs the question: can, and more importantly, should, data from these recording devices be used for good? How do you balance the privacy of the individual with the power of Alexa's data? Kevin Roose is very familiar with the privacy side of the data economy. We always blame the social media companies for everything, but on the flip side, we've all heard about cell phone data being used to catch somebody in a crime, and it's actually the big tech companies that are kind of protecting us. Or maybe they're not; maybe you're going to tell me they're not. They're protecting us against the Big Brother government. So is the only thing protecting us, in some capacity, from this police state really the big tech companies?
Speaker 4 (17:11):
I don't know that they're the only thing protecting us. Things like privacy laws could also protect us. But I do think that the platforms, thus far, the big ones, have been relatively good at turning down frivolous or unwarranted data requests from governments, both in the US and abroad. There are lots of countries where an authoritarian regime might want to get its hands on, say, a list of users who had liked LGBT content or something like that. And that's something the platforms have, to my understanding, been reasonably strong about: turning down those requests, making governments get warrants and things like that before they are able to access user data.
(17:57):
So yeah, I mean, I don't think they're doing that out of sheer altruism. I think they also have a vested interest in not just handing over all their data to whatever government decides to ask for it. But I do think that it has been largely up to the tech companies to fight this sort of surveillance, this quest for more control of data. I mean, the example of that is Apple's long, protracted battle with the FBI over access to iPhones in the cases of murder suspects and things like that. And they have actually fought fairly hard over the years for their users' right to privacy.
Speaker 2 (18:37):
Kevin, should people be doing anything to protect their privacy and data? And if so, what advice do you have for them?
Speaker 4 (18:46):
I generally am not all that worried about the privacy of, say, my individual transactions on the internet. I don't think targeted ads are all that appealing, but I also get a lot of untargeted ads, and they're not so great either. And I don't get creeped out when something's like, oh, you bought a toaster, so maybe you want some toast; that just makes sense to me. I do think that most people have what's called privacy through obscurity. No one really cares about 99.99% of the data that's out there. But if you become interesting for some reason, whether to a government if you're a dissident, or because you experience a moment of viral fame, or because you suddenly come under investigation for something, there is just a wealth of data out there.
(19:38):
And so I think most people should be conscious of their data footprint and know, for example, the basics of how online advertising works, and maybe take some steps to preserve their privacy. I turn off all the app tracking stuff on my iPhone, I use encrypted messaging services. Those are reasonable steps. But even so, I'm not under any illusion that I live... If someone really wanted to find out what brand of toaster I own, or maybe even what my phone number is, it's not that hard to do.
Speaker 3 (20:10):
So Kevin, I've heard that you have a really great story about privacy and how private our data actually is.
Speaker 4 (20:16):
I will say that a few years ago, for a story, I was curious about hacking and what my own data trail out there was like. And so I invited a group of world-class hackers to hack me. I just said, "Spend the next two weeks hacking me in any way you want. Figure out whatever you want, no limits, and then I'll do a story about it." And so they went to town, and the stuff that actually gave away my privacy was not what I thought. I was sure that they were going to find some shadowy data broker and find a file on the dark net that had my address in it. But really, it was that I had posted a photo of my dog on social media, and if you zoomed in really close on the dog photo, you could see his tag, and on that tag was my address. So I...
Speaker 2 (21:08):
I love that.
Speaker 4 (21:09):
It was actually not the thing I thought was risky that ended up giving away my address. And I think that's probably true for a lot of people: your biggest exposure is not where you think it is.
Speaker 2 (21:26):
The data economy can be very useful. I love it when Amazon reminds me to buy dog food because otherwise Henry wouldn't have dog food that week. But it has a dangerous and really insidious side and our government is not stepping in.
Speaker 3 (21:43):
So even when current government regulations hold tech companies accountable, it really doesn't affect them all that much. Google was fined $170 million for violating children's privacy on YouTube. They were accused of illegally collecting data from children who watched child-related videos and selling it to companies for targeted advertisements. So $170 million sounds like a lot, but it amounts to only about two days of Google's post-tax earnings.
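That comparison is easy to sanity-check with back-of-the-envelope arithmetic, assuming roughly $34.3 billion in annual post-tax earnings, which is about what Alphabet reported as net income for 2019, the year of the fine:

```python
# Rough check of the "two days of earnings" claim; the income figure is
# an assumption (~Alphabet's reported 2019 net income).
annual_net_income = 34.3e9   # dollars per year, post tax
fine = 170e6                 # the YouTube children's-privacy fine

daily_earnings = annual_net_income / 365      # roughly $94 million per day
print(f"{fine / daily_earnings:.1f} days")    # ~1.8 days, i.e., about two
```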
Speaker 2 (22:07):
It's not just the big tech companies that we should be worried about. It's really these sneaky little apps where we don't know who's behind them or what they're doing with our information. I recently saw an ad for palm reading on Instagram, which shows you what I'm looking at. It instructed me to take a close-up picture of my entire hand, including my fingers, so that I could get this detailed, intricate palm reading for free. But there's no such thing as a free palm reading. I'm actually giving up my Instagram username, which is my full name, and my fingerprints. The terms and conditions of this app stated that they could keep and use them in perpetuity.
Speaker 3 (22:49):
So there are countless other examples of your personal data being used for worrisome purposes. But here's the crux of the matter: an unimaginable number of laws and rules have been put in place to protect your rights as a citizen of the United States, but when it comes to protecting your data, there are virtually none. So what is it we can do, Liberty?
Speaker 2 (23:07):
I think it's clear from what Dean and Kevin said that it's really time for our government to start regulating the data economy. We as citizens need to urge lawmakers to make laws to regulate big tech. And in the meantime, I think you should just take a scroll through your phone and delete some of those sneaky little apps. It might be time for me to delete that random boat-tracking app I downloaded. Thanks so much for listening to this episode of Data Nation. This podcast is brought to you by MIT's Institute for Data, Systems, and Society. And if you want to learn more about what IDSS does, please follow us @MITIDSS on Twitter or visit our website at idss.mit.edu.