About me
My recent research interests are: 1) enabling Gemini to power breakthrough applications through tool use and RL, and 2) improving methods for privacy-preserving machine learning.
I am a Research Scientist at Google Research. I earned my Ph.D. from Sun Yat-sen University in 2024, where I had the privilege of being supervised by Prof. Tie-Yan Liu and Prof. Jian Yin. During my Ph.D., I was a member of the Joint PhD program between Microsoft Research Asia and Sun Yat-sen University. I received a Microsoft Research PhD Fellowship in 2021. I earned my Bachelor’s degree in Computer Science from Sun Yat-sen University in 2019.
News
[09/18/2025] Our papers Scaling Embedding Layers in Language Models and Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries have been accepted to NeurIPS 2025!
[08/21/2025] I will be serving as an Area Chair for ICLR 2026.
[07/07/2025] Our work URANIA: Differentially Private Insights into AI Use has been accepted to COLM 2025!
[06/14/2025] Check out our new work Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries on arXiv.
[05/19/2025] We’ve just uploaded a new version of our paper, Scaling Embedding Layers in Language Models, to arXiv.
Recent Publications
Please see Google Scholar for an up-to-date list.
Gemini 2.5: Pushing the Frontier With Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini team
Technical Report
Scaling Embedding Layers in Language Models, [community implementation]
Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang
NeurIPS 2025
Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries
Haoxiang Wang, Zinan Lin, Da Yu, Huishuai Zhang
NeurIPS 2025
URANIA: Differentially Private Insights into AI Use
Daogao Liu, Edith Cohen, Badih Ghazi, Peter Kairouz, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi, Adam Sealfon, Da Yu, Chiyuan Zhang
COLM 2025
Selective Pre-training for Private Fine-tuning, [code]
Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang
TMLR 2024
Privacy-Preserving Instructions for Aligning Large Language Models, [code], [exp configs], [poster]
Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu
ICML 2024
Differentially Private Synthetic Data via Foundation Model APIs 2: Text, [code]
Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, and Sergey Yekhanin
ICML 2024 (Spotlight)
Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent, [code]
Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang
TMLR 2023
Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
Jiyan He*, Xuechen Li*, Da Yu*, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian
ICLR 2023
Differentially Private Fine-tuning of Language Models, [code]
Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
ICLR 2022
Large Scale Private Learning via Low-rank Reparametrization, [code]
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
ICML 2021
Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning, [code]
Da Yu*, Huishuai Zhang*, Wei Chen, Tie-Yan Liu
ICLR 2021
Availability Attacks Create Shortcuts, [code]
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
KDD 2022 (Research Track)
How Does Data Augmentation Affect Privacy in Machine Learning?, [code]
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
AAAI 2021
Academic Service
I am an Area Chair for ICLR 2026. I have served as a reviewer for ICML (2022–2025), NeurIPS (2022–2025), and ICLR (2023–2025), and have been recognized as a top reviewer several times.