Data Privacy Detective

Science and knowledge advance through information gathered, organized, and analyzed. It is only through databases about people that social scientists, public health experts, and academics can study matters important to us all. As never before, vast pools of personal data sit in data lakes controlled by Facebook, Google, Amazon, Acxiom, and other companies. Our personal data becomes information held by others. To what extent can we trust those who hold our personal information not to misuse it or share it in ways we don’t want? And what will persuade us to entrust our information to databases that could improve the lives of this and future generations, rather than serve undesirable and harmful purposes?

Dr. Cody Buntain, Assistant Professor at the New Jersey Institute of Technology’s College of Computing and an affiliate of New York University’s Center for Social Media and Politics, discusses in this podcast how privacy and academic research intersect.

Facebook, Google, and other holders of vast stores of personal information face daunting privacy challenges. They must guard against unintended consequences of sharing data. They generally will neither share database access with academic researchers nor sell it to them. However, they will consider collaborative agreements that give academics access to information for study purposes. Such access can be designed to limit the identification of individuals through various techniques, including encryption, anonymization, pseudonymization, and “noise” (efforts to prevent users from identifying the individuals who contributed to a database).
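To make one of those techniques concrete, here is a minimal Python sketch of pseudonymization via keyed hashing. It is illustrative only: the key, field names, and record are hypothetical assumptions, not any platform’s actual pipeline.

```python
import hmac
import hashlib

# Hypothetical key known only to the data provider, never to researchers.
SECRET_KEY = b"provider-held-secret"

def pseudonymize(user_id: str) -> str:
    """Replace a real identifier with a stable pseudonym (HMAC-SHA256).

    The same user_id always maps to the same pseudonym, so researchers
    can link one person's records together without learning who that
    person is.
    """
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"user_id": "alice@example.com", "posts_per_day": 14}
shared_record = {
    "pseudonym": pseudonymize(record["user_id"]),  # identifier removed
    "posts_per_day": record["posts_per_day"],      # study variable kept
}
print(shared_record)
```

Re-identification remains possible if the key leaks or if the remaining fields are distinctive enough, which is why pseudonymization is typically combined with the other safeguards above.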

“Differential privacy” is one approach to assuring both privacy protection and database access for legitimate purposes. Wikipedia describes it as “a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.” The concept rests on the premise that it is the group’s information that is being measured and analyzed; any one individual’s particular circumstances are irrelevant to the study. By eliminating the need for access to each individual’s identity, a provider using differential privacy seeks to assure data contributors that their privacy is respected, while providing the researcher a statistically valid sample of a population. Differentially private databases and algorithms are designed to resist attacks aimed at tracing data back to individuals. While not foolproof, these efforts aim to reassure contributors that their personal information will be used only for legitimate study purposes, not to identify them and expose information they prefer to keep private.
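As a worked illustration of the idea (not drawn from the episode itself), here is a minimal Python sketch of the Laplace mechanism, the classic way to answer a counting query with differential privacy. The dataset size, epsilon value, and use of numpy are assumptions for the example.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing any one
    person changes the true answer by at most 1. Adding Laplace noise
    with scale 1/epsilon therefore satisfies epsilon-differential
    privacy for this query.
    """
    rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical query: how many people in the dataset share some attribute?
true_count = 1_204                        # never released directly
print(dp_count(true_count, epsilon=0.5))  # researchers see only this noisy value
```

A smaller epsilon means more noise and stronger privacy; because each released answer consumes some of the “privacy budget,” differentially private systems track how many queries they answer.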

“Data donation” is an alternative: individuals supply their own data directly to researchers for analysis. Some success has been achieved by paying people for their data or by letting a research organization collect data under agreement with a group of participants. Both approaches have limits to the protection they offer, and each can introduce selection bias: someone engaged in an illicit or unsavory activity will be reluctant to share information with any third party.

We leave “data traces” through our daily activity and use of digital technology. Information about us becomes 0’s and 1’s that are beyond erasure. There can be false positives and negatives: algorithms can create mismatches, for example a mistaken report from Twitter and Reddit identifying someone as a Russian disinformation agent.

If you have ideas for more interviews or stories, please email info@thedataprivacydetective.com.

What is Data Privacy Detective?

The internet in its blooming evolution makes personal data big business – for government, the private sector, and denizens of the dark alike. The Data Privacy Detective explores how governments balance the interests of personal privacy with competing needs for public security, public health, and other communal goods. It scans the globe for champions, villains, protectors, and invaders of personal privacy, and for the tools and technology used by individuals, businesses, and governments in the great competition between personal privacy and societal good order.

We’ll discuss how to guard our privacy by safeguarding the personal data we want to protect. We’ll aim to limit the access others can gain to our sensitive personal data while enjoying the convenience and power of smartphones, Facebook, Google, eBay, PayPal, and thousands of devices and sites. We’ll explore how sinister forces seek to penetrate defenses to access data we don’t want them to have. We’ll discover how companies providing us services and devices collect, use, and try to exploit or safeguard our personal data.

And we’ll keep up to date on how governments regulate personal data, including how they themselves create, use, and disclose it in efforts to advance public goals in ways that vary dramatically from country to country. For the public good and personal privacy can be at odds. On one hand, governments access personal data to deter terrorist incidents, theft, fraud, and other criminal activity, collect and analyze health data to prevent and control disease, and use data in other ways most people readily accept. On the other hand, many governments view personal privacy as a fundamental human right, with government as guardian of each citizen’s right to privacy. How authorities regulate data privacy is an ongoing balancing of public and individual interests. We’ll report on statutes, regulations, international agreements, and court decisions that tip the balance in favor of one or more of the competing interests. And we’ll explore innovative efforts to transcend government control through blockchain and other technology.

If you have ideas for interviews or stories, please email info@thedataprivacydetective.com.