LLM Privacy Project - Protecting Privacy in AI

Welcome to the LLM Privacy Project

The LLM Privacy Project is a collaborative research initiative aimed at benchmarking privacy protection in Large Language Models (LLMs). This project is supported by the Office of the Privacy Commissioner of Canada and led by esteemed researchers from the University of Ottawa.

Our Goal

The Office of the Privacy Commissioner of Canada is interested in robust privacy guarantees for machine learning algorithms and large language models (LLMs). In response to this need, our research focuses on analyzing differential privacy mechanisms within the framework of stochastic gradient descent. By rigorously evaluating the privacy properties of these methods, we aim to ensure that enhanced privacy does not come at the cost of model performance.

About the Project

This work integrates differential privacy into machine learning, particularly supervised learning, by developing a mathematical framework to derive privacy bounds and to analyze the convergence, stability, and statistical properties of private estimators. The research bridges theory and practice, fine-tuning models that balance data privacy with utility and providing a robust foundation for future privacy-preserving machine learning applications.

Explore Our Resources

2023-2024 Edition

The project uncovered inconsistencies in differential privacy definitions and parameters, conducted experiments showing that pre-processing offers stronger privacy while post-processing better preserves data utility, and explored combining differential privacy with k-anonymity. It also provided practical guidelines for aligning these approaches with legal data protection frameworks, supported by a student survey and policy engagement.
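The pre- versus post-processing trade-off mentioned above can be illustrated with a toy example (not drawn from the project's own experiments): noising every record before analysis protects each record more strongly, while noising only the final statistic preserves far more utility, because an aggregate such as a mean has much lower sensitivity than an individual value. The dataset, ε, and sensitivity values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 65, size=1000).astype(float)  # toy dataset of ages
epsilon = 1.0
sensitivity = 47.0  # width of the value range bounds one record's influence

# Pre-processing: add Laplace noise to every record before any analysis.
# Each record is protected individually, but the noise is large.
pre = ages + rng.laplace(0.0, sensitivity / epsilon, size=ages.shape)
mean_pre = pre.mean()

# Post-processing: compute the statistic first, then noise the single output.
# The mean's sensitivity is only range/n, so far less noise is needed.
mean_post = ages.mean() + rng.laplace(0.0, (sensitivity / len(ages)) / epsilon)

print(f"true mean {ages.mean():.2f}, pre {mean_pre:.2f}, post {mean_post:.2f}")
```

The post-processed estimate lands much closer to the true mean, which is the utility advantage the project's experiments point to; the pre-processed version perturbs each record before it ever leaves the dataset, which is the stronger (local-model-style) privacy guarantee.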

Learn more →

2024-2025 Edition

This project addresses privacy challenges in large language models by developing robust standards that balance advanced AI capabilities with protecting sensitive data. It refines methods like differential privacy to ensure both data utility and confidentiality, while aligning these techniques with current legal frameworks and engaging in policy and educational outreach.

Learn more →

Course Information

"Personal information comprises the most sensitive and intimate details of one’s life." Data must be released in a manner that minimizes the risk of re-identification while preserving quality. This course provides the theoretical foundations and statistical methodologies for ensuring privacy guarantees while maintaining data utility.

Topics include:

Learn more →

Experiments & Code

Access tutorials, experiments, and code samples designed to give you a practical understanding of implementing privacy mechanisms. We run experiments comparing differentially private stochastic gradient descent (DP-SGD) against a traditional noise-addition baseline in which noise is applied to the model's outputs or parameters only after training. Synthetic datasets and standard performance metrics are used to assess convergence rates, model utility, and privacy guarantees.
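A minimal sketch of the two approaches being compared, using plain NumPy and a synthetic logistic-regression task (the hyperparameters and data are illustrative assumptions, not the project's actual configuration): DP-SGD clips each per-example gradient and adds Gaussian noise at every step, while the baseline trains non-privately and perturbs only the final parameters.

```python
import numpy as np

def dp_sgd(X, y, epochs=100, lr=0.1, clip=1.0, noise_mult=1.0, batch=32, seed=0):
    """DP-SGD sketch: clip per-example gradients, add Gaussian noise each step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.choice(len(X), size=batch, replace=False)
        p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
        grads = (p - y[idx])[:, None] * X[idx]          # per-example gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)   # clip each L2 norm to `clip`
        noisy_sum = grads.sum(0) + rng.normal(0.0, noise_mult * clip, size=w.shape)
        w -= lr * noisy_sum / batch
    return w

def output_perturbation(X, y, epochs=200, lr=0.1, noise_scale=0.1, seed=0):
    """Baseline sketch: train non-privately, then noise the final parameters."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(X)                # full-batch gradient step
    return w + rng.normal(0.0, noise_scale, size=w.shape)

# Synthetic data: binary labels from a known linear rule.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

for name, w in [("DP-SGD", dp_sgd(X, y)),
                ("output perturbation", output_perturbation(X, y))]:
    acc = np.mean((X @ w > 0) == y)
    print(f"{name}: training accuracy {acc:.2f}")
```

Comparing the two accuracies under matched noise budgets is the kind of utility-versus-privacy measurement described above; a full experiment would also track the privacy accounting (ε, δ) for each method, which this sketch omits.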

Learn more →

Downloads

Get all available reports, research papers, and supplemental materials to deepen your exploration of privacy in AI technologies.

Learn more →