Hours:
20 hours (5 credits)
Room:
Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor
Short Abstract:
The course aims to familiarize PhD students with the concept of privacy, to explain its relevance when handling users’ data, and to teach practical techniques for anonymizing and releasing data. Privacy protection has ethical [1], legal [2,3] and economic [4] implications that must be accounted for when developing a system that processes, transmits or handles personal and user-generated data. Indeed, privacy plays a prominent role in ensuring cybersecurity in many digital environments, ranging from customers’ data and click logs [5], to medical and genomic data [6,7], to geospatial information [8].
After a brief introduction to the General Data Protection Regulation (GDPR) [2], the main European legal framework regulating privacy, the course will introduce privacy techniques from three major areas: microdata protection, differential privacy, and geomasking.
Microdata are data concerning single individuals. Data used in biological scenarios and data generated by sensors typically fall within this category. Their use comes with severe risks of re-identification and record linkage [9]. The course will equip PhD students with statistical techniques to operate securely on this type of data [10]. Furthermore, it will introduce the main theoretical frameworks for handling this type of information, such as k-Anonymity [11], l-Diversity [12], and t-Closeness [13].
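As a rough illustration of two of the frameworks named above, the following sketch measures k-Anonymity and (distinct) l-Diversity on a small table. The records, column names and helper functions are made up for the example and use only the Python standard library; they are not course material.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous.

    k is the size of the smallest group of records that share the same
    combination of quasi-identifier values (0 for an empty table).
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0


# Hypothetical, already generalized records: age band and ZIP prefix are the
# quasi-identifiers, diagnosis is the sensitive attribute.
records = [
    {"age": "30-39", "zip": "561**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "561**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "561**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "561**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "561**", "diagnosis": "flu"},
]

k = k_anonymity(records, ["age", "zip"])                         # 2
l = distinct_l_diversity(records, ["age", "zip"], "diagnosis")   # 2
print(f"k = {k}, l = {l}")
```

Here the smallest equivalence class holds two records and every class contains two distinct diagnoses, so the toy table is 2-anonymous and 2-diverse.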
The second module covers Differential Privacy [14]. Differential Privacy is considered the de facto standard for releasing privatized data, a paramount task in digital communications and data publication at large. During the second part of the course, the PhD students will learn the basic Differential Privacy mechanisms [14], which are the building blocks of more advanced solutions, and will be introduced to real-world solutions developed by major IT companies, such as Google’s RAPPOR [15], Apple’s Private CMS [16] and Microsoft’s LDP [17].
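To make the idea of a basic mechanism concrete, here is a minimal sketch of two textbook building blocks: the Laplace mechanism for a numeric query and binary randomized response for local reports. Function names, parameters and data are illustrative assumptions, and only the standard library is used; this is not the implementation of RAPPOR or of any other deployed system.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to a query answer."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise


def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_proportion(reports, epsilon):
    """Debias the mean of randomized-response reports."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = pi * (2p - 1) + (1 - p), solved for the true proportion pi.
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Laplace mechanism on a counting query (sensitivity 1).
noisy_count = laplace_mechanism(true_value=120, sensitivity=1, epsilon=0.5, rng=rng)

# Randomized response on a made-up population where 30% hold the sensitive bit.
true_bits = [1] * 3000 + [0] * 7000
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
estimate = estimate_proportion(reports, epsilon=1.0)
print(noisy_count, estimate)
```

Smaller values of epsilon give stronger privacy but noisier answers: the Laplace scale grows as 1/epsilon, and the randomized-response estimate needs more reports to reach the same accuracy.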
The final module concerns geographical data. Due to its volume and sensitivity, this class of data presents additional vulnerabilities and requires proper strategies to be handled securely. To this end, the students will be introduced to the major geomasking approaches, including statistical solutions [8] and Metric Differential Privacy [18].
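One mechanism commonly associated with Metric Differential Privacy on locations is the planar Laplace mechanism, which displaces a point by a random distance in a random direction. The sketch below assumes the coordinates are already projected onto a plane and expressed in metres, so epsilon is in units of 1/metre; the function name and the example values are hypothetical.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Perturb a planar point (metres) with planar Laplace noise.

    The noise density is proportional to exp(-epsilon * distance), so the
    direction is uniform and the radius follows a Gamma distribution with
    shape 2 and scale 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0, 2 * math.pi)
    radius = rng.gammavariate(2, 1 / epsilon)
    return x + radius * math.cos(theta), y + radius * math.sin(theta)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Made-up projected coordinates; with epsilon = 0.01 per metre the expected
# displacement is 2 / epsilon = 200 metres.
masked_x, masked_y = planar_laplace(1000.0, 2000.0, epsilon=0.01, rng=rng)
print(masked_x, masked_y)
```

For latitude and longitude input, the point would first need to be projected to a metric coordinate system and the masked point projected back; that step is left out here.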
Each module will be followed by a hands-on laboratory where the students can learn how to implement in practice the techniques discussed during the lectures. The implementation will be done in Python, using the major data science packages, such as NumPy, pandas and SciPy. The students will be able to apply privacy protection approaches to the data they use in their research, or to synthetic or publicly available datasets.
Course Contents in brief:
- Microdata Protection:
  - Definition of the concept of microdata and related aspects, such as Personally Identifiable Information (PII), identifiers, quasi-identifiers, and sensitive attributes.
  - Analysis of the main microdata protection techniques, including local suppression, recoding, resampling, the Post-RAndomization Method (PRAM), and micro-aggregation.
  - Introduction to the main microdata anonymization frameworks, such as k-Anonymity, l-Diversity, and t-Closeness.
- Differential Privacy:
  - Introduction to the concept of Differential Privacy, with its use cases, application scenarios and limitations.
  - Introduction to the main differentially private mechanisms, including the randomization mechanism, the Laplace mechanism, and the exponential mechanism.
  - Introduction to some real-world applications of Differential Privacy, such as Google’s RAPPOR, Apple’s Private Count Mean Sketch, and Microsoft’s LDP.
- Biosciences:
  - Understanding the critical issues and additional privacy challenges that arise when handling medical and biological data, including genomic information.
  - Application of privacy-preserving techniques to protect microdata in the medical and biological domains.
  - Design of approaches, based on differential privacy, to produce aggregated statistics on biological data that can be safely released (i.e., published).
- Automation Engineering:
  - Identification of the privacy risks derived from handling data generated by sensors.
  - Introduction to the privacy risks associated with geospatial information (reverse geocoding) and to the main approaches for handling geographical data in a privacy-preserving manner (geomasking and Metric Differential Privacy).
  - Application of the techniques learned during the lectures to protect sensor data and user-generated information.
- Telecommunications:
  - Analysis of the approaches studied in the theoretical part of the course that are explicitly designed for telecommunication data, such as Microsoft’s LDP.
  - Application of privacy-preserving techniques to telecommunication tasks, such as traffic pattern analysis and resource allocation.
  - Implementation of privacy-preserving solutions specifically tailored to telecommunication and stream data.
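Among the microdata protection techniques listed above, micro-aggregation lends itself to a short sketch: values are sorted, grouped into blocks of at least k records, and each value is replaced by its block mean. The version below is a simplified univariate variant with fixed-size blocks, written with the standard library only; the function name and the sample ages are assumptions made for illustration.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours.

    Values are sorted, split into consecutive blocks of size k (the last
    block absorbs the remainder), and returned in their original order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(len(values) // k, 1)
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        block = order[start:end]
        mean = sum(values[i] for i in block) / len(block)
        for i in block:
            masked[i] = mean
    return masked


# Made-up ages; with k = 3 every released value is shared by at least 3 records.
ages = [21, 35, 23, 60, 58, 37, 62]
masked_ages = microaggregate(ages, k=3)
print(masked_ages)
```

Larger values of k hide individuals in bigger groups at the cost of more information loss, which is the same trade-off the k-Anonymity framework makes explicit.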
Schedule:
- 12/5/2025 – 14:30-18:30
- 13/5/2025 – 9:00-13:00
- 14/5/2025 – 9:00-13:00
- 15/5/2025 – 9:00-13:00
- 16/5/2025 – 9:00-13:00