
A Brief Introduction to the Importance of Differential Privacy
Why are companies adopting Differential Privacy Algorithms?
In the 21st century, privacy has become a great luxury. Big tech companies like Facebook, Google, Apple, and Amazon increasingly reach into our personal and social interactions to collect vast amounts of data on us every day, and news of privacy violations in cyberspace has become routine. But did you know privacy can be quantified? Today, we can devise privacy-preserving strategies and protect ourselves from attackers too. Enter differential privacy (DP). It is a mathematical technique that attempts to maximize the accuracy of statistical queries on a database while minimizing the chances of identifying its individual records. Cynthia Dwork and her collaborators introduced it in 2006 at Microsoft Research.
IBM describes this technique as employing mathematical noise to preserve individuals’ privacy and confidentiality while still allowing population statistics to be observed. The concept extends naturally to machine learning, where models can be protected against privacy attacks while maintaining overall accuracy. In recent years, differential privacy has emerged as the de facto standard privacy notion for research in privacy-preserving data analysis and publishing. It is also important to note that the method does not rely on encryption.
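To make the noise idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. The dataset, predicate, and epsilon value below are illustrative, not taken from any particular library:

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Return a noisy count of records satisfying `predicate`.

    A counting query changes by at most 1 when one record is added
    or removed (sensitivity = 1), so Laplace noise with scale
    1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Illustrative data: ages of survey participants.
ages = [23, 35, 47, 52, 61, 29, 44, 38]

# Noisy answer to "how many participants are over 40?"
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))
```

Smaller epsilon means more noise and stronger privacy; repeated runs return slightly different answers, yet each stays close to the true count of 4.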
Data is produced across almost every industry vertical, at a rapid rate and in large amounts. But while data is required to run businesses, privacy has become a necessary attribute that demands emphasis. Privacy can be defined as the capacity of a person or group to seclude themselves, or information about themselves, and share it selectively. While there are several ways to protect privacy, DP has evolved into an advanced cybersecurity concept that advocates claim can protect personal data far better than traditional methods. Its algorithms guarantee that an attacker can learn virtually nothing more about an individual than they would learn if that person’s record were absent from the dataset. In simpler words, differential privacy guarantees that an algorithm’s behavior remains essentially unaffected when any single data subject is included in or removed from the analysis.
This ensures that the output of a differentially private analysis on a dataset containing a particular data subject will be similar to the output of the same analysis performed after excluding that data subject. This gives a formal guarantee that individual-level information about participants in the database is not leaked. Further, thanks to the introduction of noise, the analysis’s output is an approximation, not the exact result that would be obtained from the raw dataset.
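This guarantee has a standard formalization. A randomized mechanism M is epsilon-differentially private if, for every pair of datasets D and D' that differ in a single record, and for every possible set of outputs S:

```latex
\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S]
```

The privacy parameter epsilon quantifies the worst-case privacy loss: the smaller it is, the closer the two output distributions are, and the stronger the protection for any individual.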
Over the past year, Microsoft and Harvard have worked to build an open solution that uses differential privacy to keep data private while empowering researchers across disciplines to gain insights that can advance human knowledge rapidly. Meanwhile, the technique is already being used by Apple, Uber, the US Census Bureau, and other organizations. In June 2016, Apple announced that it would begin to use differentially private algorithms to collect behavioral statistics from iPhones. Besides causing a huge spike in interest in differential privacy, this announcement showed that the technique can help major organizations get new value from data they previously did not touch due to privacy concerns. Apple started leveraging the technique with iOS 10.

IBM has been working on its own open-source version, and in June this year the tech giant published its latest release, v0.3. The IBM Differential Privacy Library boasts a suite of tools for machine learning and data analytics tasks, all with built-in privacy guarantees. According to Naoise Holohan, a research staff member on IBM Research Europe’s privacy and security team, the library is unique in giving scientists and developers access to lightweight, user-friendly tools for data analytics and machine learning in a familiar environment; in fact, most tasks can be run with only a single line of code.

Google, through its RAPPOR project, has implemented a feature in Chrome that collects behavioral statistics from browsers using a differentially private randomized-response algorithm.
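As a rough illustration of that “single line of code” claim, here is a sketch using diffprivlib’s tools module. The dataset, epsilon, and bounds are illustrative, and the call assumes the `bounds` keyword documented for recent releases; check the library’s documentation for the current API:

```python
import numpy as np
from diffprivlib import tools as dp_tools  # pip install diffprivlib

# Illustrative data: incomes of survey respondents.
incomes = np.array([41_000, 52_500, 38_200, 61_700, 45_900])

# The differentially private analogue of np.mean, in one line.
# `bounds` clips the data so the noise can be calibrated correctly.
private_mean = dp_tools.mean(incomes, epsilon=1.0, bounds=(30_000, 70_000))
print(private_mean)
```

The `bounds` argument matters: without a known data range, the library cannot compute the query’s sensitivity and must guess it, which weakens the guarantee in practice.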
It has been observed through extensive theoretical research that differential privacy shows promise in enabling research data to be shared in a wide variety of settings. It can be used for count queries, histograms, cumulative distribution functions, linear regression, statistical and machine learning techniques involving clustering and classification, synthetic data generation, and other statistical disclosure limitation techniques. DP also guarantees resilience to post-processing: manipulating the output of a differentially private algorithm cannot weaken its privacy guarantee.
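Here is a minimal sketch of one of those tasks, a differentially private histogram built with the Laplace mechanism, with the post-processing property illustrated at the end (the data and epsilon are illustrative):

```python
import numpy as np

def dp_histogram(values, bins, epsilon):
    """Differentially private histogram via the Laplace mechanism.

    Each record affects exactly one bin count by 1, so adding
    Laplace noise of scale 1/epsilon to every bin gives
    epsilon-differential privacy for the whole histogram.
    """
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + np.random.laplace(scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges

ages = np.random.randint(18, 90, size=1_000)  # illustrative data
noisy_counts, edges = dp_histogram(ages, bins=8, epsilon=0.5)

# Post-processing: clamping negative counts and rounding uses only
# the already-private output, so it cannot weaken the guarantee.
cleaned = np.clip(np.round(noisy_counts), 0, None)
print(cleaned)
```

The final two lines are exactly the post-processing resilience described above: any analyst can clean up, rescale, or re-plot the noisy counts without spending any additional privacy budget.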
NIST Challenge
At present, the National Institute of Standards and Technology (NIST) is launching the Differential Privacy Temporal Map Challenge, intended to crowdsource new ways of handling personally identifiable information (PII) in public safety datasets. NIST is offering a total of US$276,000 in prize money across three categories.
The Better Meter Stick for Differential Privacy Contest will award a total prize purse of US$29,000 for winning submissions that propose novel scoring metrics for assessing the quality of differentially private algorithms on temporal map data. The Temporal Map Algorithms contest will award a total prize purse of US$147,000 over a series of three sprints for developing algorithms that preserve the utility of temporal and spatial map datasets while guaranteeing privacy. The Open Source and Development Contest will award a total prize purse of US$100,000 to teams leading in the sprints, to increase their algorithms’ utility and usability for open-source audiences.
The challenge is accepting submissions through January 5, 2021, and winners will be announced on February 4, 2021.