Publications

Please visit the PRADA Lab publication page for the full list of our lab's publications.

(★ my Master/PhD/Intern/Visiting students)

2026

Conference Papers

[NDSS] Revisiting Differentially Private Hyper-parameter Tuning [Link] Abstract▼
We investigate the application of differential privacy in hyper-parameter tuning, a process involving selecting the best run from several candidates. Unlike many private learning algorithms, including the prevalent DP-SGD, the privacy implications of selecting the best run are often overlooked. While recent works propose a generic private selection solution for the tuning process, an open question persists: is such a privacy upper bound tight? This paper provides both empirical and theoretical examinations of this question. Initially, we provide studies affirming that the current privacy analysis for private selection is indeed tight in general. However, when we specifically study the hyper-parameter tuning problem in a white-box setting, such tightness no longer holds. This is first demonstrated by applying a privacy audit to the tuning process. Our findings underscore a substantial gap between the current theoretical privacy bound and the empirical privacy leakage derived even under strong audit setups. This gap motivates our subsequent theoretical investigations, which provide an improved privacy upper bound for private hyper-parameter tuning due to its distinct properties. Our improved bound leads to better utility. Our analysis also demonstrates broader applicability compared to prior analyses, which are limited to specific parameter configurations. Overall, we contribute to a better understanding of how privacy degrades due to selection.

Zihang Xiang^★, Tianhao Wang, Cheng-Long Wang^★, Di Wang
Network and Distributed System Security Symposium (NDSS 2026)

[ICSE] Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning [Link] Abstract▼
While Code Language Models (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and code summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including dataset deduplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following research question: Can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning—a post hoc modification approach that removes specific information from trained models without requiring full retraining. Specifically, we first quantify the memorization risks of sensitive data within CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorization samples by identifying and selecting vulnerable elements as unlearning targets. We investigate two widely-used gradient ascent-based unlearning approaches: the vanilla method and the constraint-based method, and introduce an advanced variant, termed CodeEraser, which selectively unlearns sensitive memorization elements in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs, i.e., CodeParrot, CodeGen-Mono and Qwen2.5-Coder, validate the effectiveness and efficiency of CodeEraser in erasing targeted sensitive memorization while maintaining model utility.

Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
International Conference on Software Engineering (ICSE 2026)

2025

Conference Papers

[CoLM] Towards User-level Private Reinforcement Learning with Human Feedback [Link] Abstract▼
Reinforcement Learning with Human Feedback (RLHF) has emerged as an influential technique, enabling the alignment of large language models (LLMs) with human preferences. However, how to protect user preference privacy has become a crucial issue, as LLMs tend to remember users' preferences. Most previous work has focused on using differential privacy (DP) to protect the privacy of individual data. However, these works have concentrated primarily on item-level privacy protection and perform unsatisfactorily for user-level privacy, which is more common in RLHF. This study proposes a novel framework, AUP-RLHF, which integrates user-level label DP into RLHF. We first show that the classical randomized response algorithm, which achieves acceptable performance under item-level privacy, leads to suboptimal utility in user-level settings. We then establish a lower bound for user-level label DP-RLHF and develop the AUP-RLHF algorithm, which guarantees user-level privacy and achieves an improved estimation error. Experimental results show that AUP-RLHF outperforms existing baseline methods in sentiment generation and summarization tasks, achieving a better privacy-utility trade-off.

Jiaming Zhang^★, Mingxi Lei^★, Meng Ding, Mengdi Li, Zihang Xiang, Difei Xu, Jinhui Xu, Di Wang
Conference on Language Modeling (CoLM 2025)
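
For readers unfamiliar with the item-level baseline mentioned in the abstract above, the following is a minimal Python sketch of classical randomized response on binary preference labels, together with the standard debiasing step. It is not the AUP-RLHF algorithm; the function names, the 70% preference rate, and the choice of epsilon = 1.0 are illustrative assumptions.

```python
import math
import random


def randomized_response(label, epsilon, rng):
    """Report the true binary label with probability e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < keep_prob else 1 - label


def debiased_mean(noisy_labels, epsilon):
    """Unbiased estimate of the true mean label from privatized labels."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    noisy_mean = sum(noisy_labels) / len(noisy_labels)
    # E[noisy] = (2p - 1) * true_mean + (1 - p); invert that affine map.
    return (noisy_mean - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


# Hypothetical data: 70% of 20,000 comparisons prefer response A (label 1).
rng = random.Random(0)
epsilon = 1.0
true_labels = [1] * 14000 + [0] * 6000
noisy_labels = [randomized_response(y, epsilon, rng) for y in true_labels]
estimate = debiased_mean(noisy_labels, epsilon)
print(f"estimated preference rate: {estimate:.3f}")
```

Each label is privatized independently here, which gives an item-level guarantee. A user who contributes many labels receives a weaker guarantee under composition, which is the gap between item-level and user-level privacy that the abstract refers to.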

[CoLM] Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory [Link] Abstract▼
Fine-tuning large pre-trained LLMs generally demands extensive GPU memory. Traditional first-order optimizers like SGD encounter substantial difficulties due to increased memory requirements from storing activations and gradients during both the forward and backward phases as the model size expands. Alternatively, zeroth-order (ZO) techniques can compute gradients using just forward operations, eliminating the need to store activations. Furthermore, by leveraging CPU capabilities, it's feasible to enhance both the memory and processing power available to a single GPU. We propose a novel framework, ZO2 (Zeroth-Order Offloading), for efficient zeroth-order fine-tuning of LLMs with only limited GPU memory. Our framework dynamically shifts model parameters between the CPU and GPU as required, optimizing computation flow and maximizing GPU usage by minimizing downtime. This integration of parameter adjustments with ZO's double forward operations reduces unnecessary data movement, enhancing the fine-tuning efficacy. Additionally, our framework supports an innovative low-bit precision approach in AMP mode to streamline data exchanges between the CPU and GPU. Employing this approach allows us to fine-tune extraordinarily large models, such as the OPT-175B with 175 billion parameters, on a mere 18GB GPU. Moreover, our framework achieves these results with almost no additional time overhead and absolutely no accuracy loss compared to standard zeroth-order methods.

Liangyu Wang^★, Jie Ren, Hang Xu, Junxiao Wang, Huanyi Xie, David E. Keyes, Di Wang
Conference on Language Modeling (CoLM 2025)
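
As a rough illustration of the zeroth-order idea described in the abstract above, the sketch below estimates a gradient from two forward evaluations along a random direction and uses it for plain SGD. It only shows the generic two-point estimator; the CPU-GPU offloading and low-bit transfer that ZO2 contributes are not modeled, and the toy quadratic loss, step size, and function names are assumptions made for the example.

```python
import random


def zo_gradient_estimate(loss_fn, params, eps=1e-3, seed=0):
    """Two-point zeroth-order estimate: two forward passes, no backpropagation."""
    rng = random.Random(seed)
    # Random Gaussian perturbation direction z, reproducible from the seed.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Finite-difference estimate of the directional derivative along z.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]


def zo_sgd(loss_fn, params, lr=0.05, steps=500, eps=1e-3):
    """Run SGD using only zeroth-order gradient estimates."""
    params = list(params)
    for step in range(steps):
        grad = zo_gradient_estimate(loss_fn, params, eps=eps, seed=step)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def quadratic_loss(w):
    """Toy objective with its minimum at (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


final = zo_sgd(quadratic_loss, [0.0, 0.0])
print("final parameters:", final)
```

Because the direction is regenerated from a seed, the perturbation never has to be stored alongside the parameters, and no activations are kept for a backward pass. This is why forward-only methods of this kind have a small memory footprint.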

[ICCV] Semi-supervised Concept Bottleneck Models [Link] Abstract▼
Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features, an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model). Our SSCBM is suitable for practical situations where annotated data is scarce. By leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level, we effectively solve these issues. We propose a strategy to generate pseudo labels and an alignment loss. Experiments demonstrate that our SSCBM is both effective and efficient. With only 10% labeled data, our model's concept and task accuracy on average across four datasets is only 2.44% and 3.93% lower, respectively, compared to the best baseline in the fully supervised learning setting.

Lijie Hu^★*, Tianhao Huang^★*, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, Di Wang
International Conference on Computer Vision (ICCV 2025)

[ECML-PKDD] Stable Vision Concept Transformers for Medical Diagnosis [Link] Abstract▼
Transparency in medicine has spurred interest in explainable AI (XAI), with Concept Bottleneck Models (CBMs) aiming to constrain latent spaces to human-understandable concepts. Despite their promise, CBMs rely exclusively on concept features for predictions, neglecting intrinsic image features and leading to performance degradation and unstable explanations under input perturbations. Addressing these limitations, we introduce the Vision Concept Transformer (VCT) and its advanced variant, the Stable Vision Concept Transformer (SVCT). SVCT integrates a conceptual layer within a vision transformer (ViT) backbone, enhancing decision-making by fusing conceptual and image features while ensuring model faithfulness through Denoised Diffusion Smoothing. This approach not only bridges the utility gap between original and concept-based models but also provides stable explanations upon encountering perturbations. Our experiments across four medical datasets demonstrate that both VCT and SVCT maintain high accuracy alongside interpretability compared to existing models. Notably, SVCT consistently offers faithful explanations even when faced with input variations, fulfilling critical requirements for applications in the medical field. Thus, our proposed models represent significant advancements towards more reliable and interpretable AI systems in healthcare.

Lijie Hu^★*, Songning Lai^★*, Yuan Hua^★*, Shu Yang^★, Jingfeng Zhang, Di Wang
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025)

[ECML-PKDD] Differentially Private Sparse Linear Regression with Heavy-tailed Responses [Link] Abstract▼
As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of $\tilde{O}\biggl( (s^{*})^{\frac{1}{2}} \cdot \biggl(\frac{\log d}{n}\biggr)^{\frac{\zeta}{1 + \zeta}} + (s^{*})^{\frac{1 + 2\zeta}{2 + 2\zeta}} \cdot \biggl(\frac{\log^2 d}{n \varepsilon}\biggr)^{\frac{\zeta}{1 + \zeta}} \biggr)$ under the $(\varepsilon, \delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $\zeta \in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves $\tilde{O}\Bigl(\frac{(s^*)^{3/2} \log d}{n \varepsilon}\Bigr)$. Compared to the first result, this bound is independent of the tail parameter $\zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.

Xizhi Tian^★, Meng Ding^★, Youming Tao^★, Zihang Xiang^★, Di Wang
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025)
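
The two building blocks named in the abstract above, the Huber loss and iterative hard thresholding, can be sketched without any privacy mechanism as follows. This is a non-private illustration only: the privatization steps of DP-IHT-H are not described in the abstract and are not reproduced here, and the synthetic data, the outlier model, and all parameter values are assumptions made for the example.

```python
import random


def huber_grad(residual, tau):
    """Derivative of the Huber loss: identity inside [-tau, tau], clipped outside."""
    return max(-tau, min(tau, residual))


def hard_threshold(w, s):
    """Keep the s largest-magnitude coordinates of w and zero out the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:s])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


def iht_huber(X, y, s, tau=1.0, lr=0.1, steps=300):
    """Non-private iterative hard thresholding on the average Huber loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            residual = sum(a * b for a, b in zip(xi, w)) - yi
            g = huber_grad(residual, tau)
            for j in range(d):
                grad[j] += g * xi[j] / n
        # Gradient step followed by projection onto s-sparse vectors.
        w = hard_threshold([wj - lr * gj for wj, gj in zip(w, grad)], s)
    return w


# Hypothetical sparse regression problem with occasional large outliers in y.
rng = random.Random(1)
true_w = [2.0, 0.0, 0.0, -1.5, 0.0]
X, y = [], []
for _ in range(300):
    xi = [rng.gauss(0.0, 1.0) for _ in true_w]
    noise = rng.gauss(0.0, 0.1)
    if rng.random() < 0.05:
        noise += rng.choice([-50.0, 50.0])
    X.append(xi)
    y.append(sum(a * b for a, b in zip(xi, true_w)) + noise)

w_hat = iht_huber(X, y, s=2)
print("estimate:", [round(v, 3) for v in w_hat])
```

Clipping the residual bounds the influence of any single response on the gradient, which is what makes the Huber loss a natural fit for heavy-tailed responses. The thresholding step keeps the iterate sparse in every round.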

[ACL] COMPKE: Complex Question Answering under Knowledge Editing [Link] Abstract▼
Knowledge editing, which efficiently modifies the knowledge in large language models, has gathered great attention. Current benchmarks primarily use multi-hop question answering to assess and analyze newly injected or updated knowledge. However, we argue that these benchmarks fail to effectively evaluate how well the updated models apply this knowledge in real-life scenarios, particularly when questions require complex reasoning involving one-to-many relationships or multi-step logical intersections. To fill in this gap, we introduce a new benchmark, COMPKE: Complex Question Answering under Knowledge Editing, which includes 11,924 complex questions that reflect real-life situations. We perform a comprehensive evaluation of four different knowledge editing methods in COMPKE, and our results show that the performance of these methods varies between different models. For example, MeLLo achieves an accuracy of 39.47 on GPT-4o-mini but drops significantly to 3.83 on Qwen2.5-3B. We further analyze the reasons behind these results from both methodological and model perspectives. Our dataset will be publicly available on GitHub.

Keyuan Cheng^★, Zijian Kan, Zhuoran Zhang^★, Muhammad Asif Ali, Lijie Hu^★, Di Wang
Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)

[ACL] Can Language Models Be Used for Code Migration? [Link] Abstract▼
Large language models (LLMs) have demonstrated remarkable proficiency in handling a wide range of tasks within the software engineering domain, but their ability to perform code migration (adapting code to different environments) remains underexplored. In this work, we propose a novel benchmark, CodeMEnv: Code Migration Across Environment, designed to evaluate LLMs' performance in handling code migration tasks. The benchmark comprises 922 data points across 19 Python and Java packages, offering three tasks to systematically evaluate code migration: identifying version-incompatible functions, determining function changes, and adapting code to target environments. Experimental evaluation of CodeMEnv across seven LLMs revealed an average pass@1 rate of 26.50%, with GPT-4o performing best at 43.84%. We highlight our key findings as follows: (i) LLMs are more familiar with newer function versions, making them better at migrating legacy code, and (ii) LLMs sometimes exhibit a logical inconsistency, identifying function changes that are irrelevant to the target migration environment.

Keyuan Cheng^★, Xudong Shen, Yihao Yang, Tengyue Wang, Yang Cao, Muhammad Asif Ali, Hanbin Wang, Lijie Hu^★, Di Wang
Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)

[ACL] Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements [Link] Abstract▼
With the increasing integration of large language models (LLMs) into real-world applications such as finance, e-commerce, and recommendation systems, their susceptibility to misinformation and adversarial manipulation poses significant risks. Existing fraud detection benchmarks primarily focus on single-turn classification tasks, failing to capture the dynamic nature of real-world fraud attempts. To address this gap, we introduce Fraud-R1, a challenging bilingual benchmark designed to assess LLMs' ability to resist fraud and phishing attacks across five key fraud categories: Fraudulent Services, Impersonation, Phishing Scams, Fake Job Postings, and Online Relationships, covering subclasses. Our dataset comprises manually curated fraud cases from social media, news, phishing scam records, and prior fraud datasets.

Shu Yang^★, Shenzhe Zhu, Zeyu Wu, Keyu Wang^★, Junchi Yao^★, Junchao Wu, Lijie Hu^★, Mengdi Li^★, Derek F. Wong, Di Wang
Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)

[ACL] Understanding the Repeat Curse in Large Language Models from a Feature Perspective [Link] Abstract▼
Large language models (LLMs) have made remarkable progress in various domains, yet they often suffer from repetitive text generation, a phenomenon we refer to as the "Repeat Curse". While previous studies have proposed decoding strategies to mitigate repetition, the underlying mechanism behind this issue remains insufficiently explored. In this work, we investigate the root causes of repetition in LLMs through the lens of mechanistic interpretability. Inspired by recent advances in Sparse Autoencoders (SAEs), which enable monosemantic feature extraction, we propose a novel approach, "Duplicatus Charm", to induce and analyze the Repeat Curse. Our method systematically identifies "Repetition Features", the key model activations responsible for generating repetitive outputs. First, we locate the layers most involved in repetition through logit analysis. Next, we extract and stimulate relevant features using SAE-based activation manipulation. To validate our approach, we construct a repetition dataset covering token and paragraph level repetitions and introduce an evaluation pipeline to quantify the influence of identified repetition features.

Junchi Yao^★, Shu Yang^★, Lijie Hu^★, Mengdi Li^★, Di Wang
Annual Meeting of the Association for Computational Linguistics (ACL 2025 Findings)

[UAI] Nearly Optimal Differentially Private ReLU Regression [Link] Abstract▼
In this paper, we investigate one of the most fundamental non-convex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression heavily rely on stringent assumptions, such as constant-bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data can be i.i.d. sampled from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and there is some public data, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk in $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirement of $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing attack argument technique, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.

Meng Ding^★, Mingxi Lei, Shaowei Wang, Tianhang Zheng, Di Wang, Jinhui Xu
Conference on Uncertainty in Artificial Intelligence (UAI 2025)

[ICML] Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing [Link] Abstract▼
The locate-then-edit paradigm has shown significant promise for knowledge editing (KE) in Large Language Models (LLMs). While previous methods perform well on single-hop fact recall tasks, they consistently struggle with multi-hop factual recall tasks involving newly edited knowledge. In this paper, leveraging tools in mechanistic interpretability, we first identify that in multi-hop tasks, LLMs tend to retrieve knowledge with implicit subject information from deeper MLP layers, unlike single-hop tasks, which rely on shallow layers. This distinction explains the poor performance of current methods in multi-hop queries, as they primarily focus on editing shallow layers with single-hop edit prompts, leaving deeper layers unchanged. To address this, we propose IFMET, a novel locate-then-edit KE approach designed to edit both shallow and deep MLP layers. Beyond single-hop editing prompts, IFMET further incorporates multi-hop editing prompts to locate and modify knowledge across different stages of reasoning. Experimental results demonstrate that IFMET significantly improves performance on multi-hop factual recall tasks, overcoming the limitations of previous locate-then-edit methods.

Zhuoran Zhang^★, Yongxiang Li, Zijian Kan, Keyuan Cheng, Lijie Hu, Di Wang
International Conference on Machine Learning (ICML 2025)

[ICML] Editable Concept Bottleneck Models [Link] Abstract▼
Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on cases where the data, including concepts, are clean. In many scenarios, we need to remove or insert training data or concepts in trained CBMs for various reasons, such as privacy concerns, data mislabelling, spurious concepts, and concept annotation errors. Thus, the challenge of deriving efficient editable CBMs without retraining from scratch persists, particularly in large-scale applications. To address these challenges, we propose Editable Concept Bottleneck Models (ECBMs). Specifically, ECBMs support three different levels of data removal: concept-label-level, concept-level, and data-level. ECBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for re-training. Experimental results demonstrate the efficiency and effectiveness of our ECBMs, affirming their adaptability within the realm of CBMs.

Lijie Hu^★*, Chenyang Ren^★*, Zhengyu Hu^★*, Hongbin Lin, Cheng-Long Wang, Zhen Tan, Weimin Lyu, Jingfeng Zhang, Hui Xiong, Di Wang
International Conference on Machine Learning (ICML 2025)

[NAACL] Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning [Link] Abstract▼
Transformer-based language models have achieved significant success; however, their internal mechanisms remain largely opaque due to the complexity of non-linear interactions and high-dimensional operations. While previous studies have demonstrated that these models implicitly embed reasoning trees, humans typically employ various distinct logical reasoning mechanisms to complete the same task. It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks. In this paper, we aim to address this question by investigating the mechanistic interpretability of language models, particularly in the context of multi-step reasoning tasks. Specifically, we employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process, allowing us to map the reasoning paths adopted by the model. We apply this methodology to the GPT-2 model on a prediction task (IOI) and demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.

Lin Zhang^★, Lijie Hu^★, Di Wang
Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025 Findings)

[WWW] LUSTER: Link Prediction Utilizing Shared-Latent Space Representation in Multi-Layer Networks [Link] Abstract▼

Ruohan Yang, Muhammad Asif Ali^★, Huan Wang^★, Junyang Chen, Di Wang
The ACM Web Conference (WWW 2025)

[IJCAI] Mitigating Sample Imbalance in Anomaly Detection within Dynamic Graphs [Link] Abstract▼
In dynamic graphs, detecting anomalous nodes faces challenges due to sample imbalance, stemming from the scarcity of anomalous samples and feature representation bias. Existing methods often use unsupervised or semi-supervised learning to extract anomalous samples from unlabeled data, but struggle to obtain enough anomalous instances due to their low occurrence. Moreover, GNN-based approaches often prioritize normal samples, neglecting rare anomalies. To address these issues, we propose the Anomaly Balance Network (ABNet), designed to alleviate sample imbalance and enhance anomaly detection. ABNet includes three key components: a feature extractor that compares node features across time points to avoid bias, an anomaly augmenter that amplifies anomaly details and generates diverse anomalous samples, and an anomaly detector using meta-learning to adapt to graph evolution. Experimental results show that ABNet outperforms existing methods on three real-world datasets, effectively addressing sample imbalance.

Yifan Hong, Muhammad Asif Ali, Huan Wang^★, Junyang Chen, Di Wang
International Joint Conference on Artificial Intelligence (IJCAI 2025)

[USENIX] Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness [Link] Abstract▼
Growing concerns over data privacy and security emphasize the importance of machine unlearning to remove targeted data's influence from trained models. Approximate unlearning adopts relaxed criteria to enable more resource-efficient unlearning but makes the unlearning process less transparent. Techniques like Membership Inference Attacks (MIAs) have been employed to evaluate whether a sample has been successfully unlearned, providing an external assessment. However, fully maximizing MIAs' capabilities, such as online attacks, requires significant computing resources to train shadow models for each new query, typically exceeding the cost of retraining and making it impractical. To address this, we propose the Interpolated Approximate Measurement (IAM), which efficiently estimates membership scores by interpolating the model's generalization-fitting behavior gap on the queried sample. Even with a single pre-trained shadow model, these scores yield strong MIA performance on exactly unlearned samples and robustly measure unlearning completeness for approximately unlearned samples. We then apply IAM to recent approximate unlearning algorithms, revealing the risk of over-unlearning and under-unlearning. We underscore the need for more powerful safeguards against these risks in approximate unlearning.

Cheng-Long Wang^★, Qi Li^★, Zihang Xiang^★, Yinzhi Cao, Di Wang
USENIX Security Symposium (USENIX Security 2025)

[USENIX] Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling [Link] Abstract▼
In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like differential privacy have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to significant interest in the shuffle model. However, the computation tasks in the shuffle model are limited to statistical estimation, making it inapplicable to real-world scenarios in which each user requires a personalized output. This paper introduces a novel paradigm termed Private Individual Computation (PIC), expanding the shuffle model to support a broader range of permutation-equivariant computations. PIC enables personalized outputs while preserving privacy, and enjoys privacy amplification through shuffling. We propose a concrete protocol that realizes PIC. By using one-time public keys, our protocol enables users to receive their outputs without compromising anonymity, which is essential for privacy amplification. Additionally, we present an optimal randomizer, the Minkowski Response, designed for the PIC model to enhance utility. We formally prove the security and privacy properties of the PIC protocol. Theoretical analysis and empirical evaluations demonstrate PIC's capability in handling non-statistical computation tasks, and the efficacy of PIC and the Minkowski randomizer in achieving superior utility compared to existing solutions.

Shaowei Wang, Changyu Dong, Xiangfu Song, Jin Li, Zhili Zhou, Di Wang, Han Wu
USENIX Security Symposium (USENIX Security 2025)

[USENIX] Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run [Link] Abstract▼
Auditing algorithms' privacy typically involves simulating a game-based protocol that determines which of two adjacent datasets was the original input. Traditional approaches require thousands of such simulations, leading to significant computational overhead. Recent methods propose single-run auditing of the target algorithm to address this, substantially reducing computational cost. However, these methods' general applicability and tightness in producing empirical privacy guarantees remain uncertain. This work studies such problems in detail. Our contributions are twofold: First, we introduce a unifying framework for privacy audits based on information-theoretic principles, modeling the audit as a bit transmission problem in a noisy channel. This formulation allows us to derive fundamental limits and develop an audit approach that yields tight privacy lower bounds for various DP protocols. Second, leveraging this framework, we demystify the method of privacy audit by one run, identifying the conditions under which single-run audits are feasible or infeasible. Our analysis provides general guidelines for conducting privacy audits and offers deeper insights into the privacy audit. Finally, through experiments, we demonstrate that our approach produces tighter privacy lower bounds on common differentially private mechanisms while requiring significantly fewer observations. We also provide a case study illustrating that our method successfully detects privacy violations in flawed implementations of private algorithms.

Zihang Xiang^★, Tianhao Wang, Di Wang
USENIX Security Symposium (USENIX Security 2025)

[AAAI] Privacy-Preserving Low-Rank Adaptation against Membership Inference Attacks for Latent Diffusion Models [Link] Abstract▼
Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images by minimizing the adaptation loss. However, LoRA-adapted LDMs are vulnerable to membership inference (MI) attacks that can judge whether a particular data point belongs to the private dataset, thus leading to privacy leakage. To defend against MI attacks, we first propose a straightforward solution: Membership-Privacy-preserving LoRA (MP-LoRA). MP-LoRA is formulated as a min-max optimization problem where a proxy attack model is trained by maximizing its MI gain while the LDM is adapted by minimizing the sum of the adaptation loss and the MI gain of the proxy attack model. However, we empirically find that MP-LoRA suffers from unstable optimization, and our theoretical analysis suggests that the potential reason is the unconstrained local smoothness, which impedes the privacy-preserving adaptation. To mitigate this issue, we further propose a Stable Membership-Privacy-preserving LoRA (SMP-LoRA) that adapts the LDM by minimizing the ratio of the adaptation loss to the MI gain. Besides, we theoretically prove that the local smoothness of SMP-LoRA can be constrained by the gradient norm, leading to improved convergence. Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images.

Zihao Luo, Xilie Xu, Feng Liu, Yun Sing Koh, Di Wang, Jingfeng Zhang
Annual AAAI Conference on Artificial Intelligence (AAAI 2025)

[AAAI] Fair Text-to-Image Diffusion via Fair Mapping [Link] Abstract▼
In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model by controlling the prompt to achieve fair image generation. One key advantage of our approach is its high efficiency. It only requires updating an additional linear network with few parameters at a low computational cost. By developing a linear network that maps conditioning embeddings into a debiased space, we enable the generation of relatively balanced demographic results based on the specified text condition. With comprehensive experiments on face image generation, we show that our method significantly improves image generation fairness with almost the same image quality compared to conventional diffusion models when prompted with descriptions related to humans. By effectively addressing the issue of implicit language bias, our method produces more fair and diverse image outputs.

Jia Li^★*, Lijie Hu^*, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, Di Wang
Annual AAAI Conference on Artificial Intelligence (AAAI 2025)
Selected as an oral paper

[AAAI] Improved Rates of Differentially Private Nonconvex-Strongly-Concave Minimax Optimization [Link] Abstract▼
In this paper, we study the problem of (finite sum) minimax optimization in the Differential Privacy (DP) model. Unlike most of the previous studies on the (strongly) convex-concave settings or loss functions satisfying the Polyak-Łojasiewicz condition, here we mainly focus on the nonconvex-strongly-concave one, which encapsulates many models in deep learning such as deep AUC maximization. Specifically, we first analyze a DP version of Stochastic Gradient Descent Ascent (SGDA) and show that it is possible to get an $(\epsilon,\delta)$-DP estimator whose $l_2$-norm of the gradient for the empirical risk function is upper bounded by $\tilde{O}(\frac{d^{1/4}}{({n\epsilon})^{1/2}})$, where $d$ is the model dimension and $n$ is the sample size. We then propose a new method with less gradient noise variance and improve the upper bound to $\tilde{O}(\frac{d^{1/3}}{(n\epsilon)^{2/3}})$, which matches the best-known result for DP Empirical Risk Minimization with non-convex loss. We also discuss several lower bounds of private minimax optimization. Finally, experiments on AUC maximization, generative adversarial networks and temporal difference learning with real-world data support our theoretical analysis.

Ruijia Zhang^★*, Mingxi Lei^*, Meng Ding, Zihang Xiang, Jinhui Xu, Di Wang
Annual AAAI Conference on Artificial Intelligence (AAAI 2025)

[COLING] MQA-KEAL: Multi-hop Question Answering under Knowledge Editing for Arabic Language [Link] Abstract▼
Large Language Models (LLMs) have demonstrated significant capabilities across numerous application domains. A key challenge is to keep these models updated with the latest available information, which limits the true potential of these models for end applications. Although there have been numerous attempts at LLM Knowledge Editing (KE), i.e., editing the LLM's prior knowledge and in turn testing it via Multi-hop Question Answering (MQA), these studies have so far focused primarily on the English language. To bridge this gap, in this paper we propose Multi-hop Question Answering under Knowledge Editing for Arabic Language (MQA-KEAL). MQA-KEAL stores knowledge edits as structured knowledge units in an external memory. To solve a multi-hop question, it first uses task decomposition to break the question into smaller sub-problems. Then, for each sub-problem, it iteratively queries the external memory and/or the target LLM to generate the final response. In addition, we contribute MQUAKE-AR (an Arabic translation of the English benchmark MQUAKE), as well as a new benchmark, MQA-AEVAL, for rigorous performance evaluation of MQA under KE for the Arabic language. Experimental evaluation reveals that MQA-KEAL outperforms the baseline models by a significant margin.

Muhammad Asif Ali^★, Nawal Daftardar^★, Mutayyaba Waheed, Jianbin Qin, Di Wang
International Conference on Computational Linguistics (COLING 2025)

Journal Papers

[TMLR] Beyond ordinary Lipschitz constraints: Differentially Private optimization with TNC [Link] Abstract▼
We study Stochastic Convex Optimization in the Differential Privacy model (DP-SCO). Unlike previous studies, here we assume the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $\theta>1$, where the Lipschitz constant of the loss could be extremely large or even unbounded, but the $\ell_2$-norm gradient of the loss has bounded $k$-th moment with $k\geq 2$. For the Lipschitz case with $\theta\geq 2$, we first propose an $(\epsilon, \delta)$-DP algorithm whose utility bound is $\tilde{O}\left(\left(\tilde{r}_{2k}(\frac{1}{\sqrt{n}}+(\frac{\sqrt{d}}{n\epsilon}))^\frac{k-1}{k}\right)^\frac{\theta}{\theta-1}\right)$ in high probability, where $n$ is the sample size, $d$ is the model dimension, and $\tilde{r}_{2k}$ is a term that only depends on the $2k$-th moment of the gradient. It is notable that such an upper bound is independent of the Lipschitz constant. We then extend to the case where $\theta\geq \bar{\theta}> 1$ for some known constant $\bar{\theta}$. Moreover, when the privacy budget $\epsilon$ is small enough, we show an upper bound of $\tilde{O}\left(\left(\tilde{r}_{k}(\frac{1}{\sqrt{n}}+(\frac{\sqrt{d}}{n\epsilon}))^\frac{k-1}{k}\right)^\frac{\theta}{\theta-1}\right)$ even if the loss function is not Lipschitz. For the lower bound, we show that for any $\theta\geq 2$, the private minimax rate for $\rho$-zero Concentrated Differential Privacy is lower bounded by $\Omega\left(\left(\tilde{r}_{k}(\frac{1}{\sqrt{n}}+(\frac{\sqrt{d}}{n\sqrt{\rho}}))^\frac{k-1}{k}\right)^\frac{\theta}{\theta-1}\right)$.

Difei Xu^★, Meng Ding, Zihang Xiang, Jinhui Xu, Di Wang.
Revision, Transactions on Machine Learning Research

[ToN] Byzantine-Resilient Federated Learning under Heterogeneity and Heavy Tails [Link] Abstract▼
Byzantine resilience is essential in federated learning (FL) to safeguard model training from malicious or faulty participants. However, existing Byzantine-resilient methods struggle when faced with heavy-tailed gradient noise, a common challenge in heterogeneous environments. In this work, we propose a Byzantine-resilient FL framework specifically designed to handle both heterogeneity and heavy-tailed noise. Our approach builds on robust distributed stochastic heavy-ball optimization, incorporating update normalization and gradient/momentum clipping to mitigate the effects of heavy-tailed noise. We establish the first high-probability convergence guarantees for Byzantine-resilient FL under these conditions, showing that our algorithms achieve optimal Byzantine resilience and align with known lower bounds. Additionally, we introduce an efficient variant of the nearest neighbor mixing technique, leveraging random projections to significantly reduce computational costs in high-dimensional settings. Through rigorous theoretical analysis and extensive empirical evaluations, we demonstrate that our methods outperform existing approaches in robustness against both Byzantine failures and heavy-tailed noise.

Youming Tao, Zuyuan Zhang, Di Wang, Dongxiao Yu, Xiuzhen Cheng, Falko Dressler.
Revision, IEEE Transactions on Networking

[ToN] Wireless Aware Energy Efficient Federated Learning over Mobile Devices via Algorithm and Hardware Co-Design [Link] Abstract▼
Energy efficiency is essential for federated learning (FL) over mobile devices and its potential prosperous applications. Different from existing communication efficient FL research efforts, which regard communication energy consumption as the bottleneck, we have observed that with ever increasing wireless transmission speed (e.g., Wi-Fi 5 or 5G), the energy consumption of wireless communications for model updates in FL is significantly reduced and sometimes is smaller than that of local on-device training. Motivated by such observations, in this paper, we propose a high-speed wireless communications inspired energy efficient federated learning over mobile devices (EEFL), whose goal is to reduce the overall energy consumption (computing + communication). In particular, we design a novel energy-aware adaptive local update policy for mobile devices by jointly considering FL performance and energy saving of high-speed wireless transmissions. Furthermore, given the device’s local update policy in each FL global round, we advance the dynamic voltage and frequency scaling (DVFS) strategy to minimize local training’s energy consumption by keeping GPU and CPU working at appropriate frequencies without triggering thermal throttling. Extensive experimental results with various learning models, datasets, and wireless transmission environments demonstrate the proposed EEFL’s superiority over the peer designs in terms of energy efficiency.

Rui Chen, Qiyu Wan, Yixin Liu^*, Xinyue Zhang, Xiaoqi Qin, Yanzhao Hou, Di Wang, Xin Fu, Miao Pan
Revision, IEEE/ACM Transactions On Networking

[TACL] RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [Link] Abstract▼

Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Ziyang Luo, Di Wang, Min Yang, Lidia S. Chao, Derek F. Wong
Transactions of the Association for Computational Linguistics

[JECE] Machine Learning and Soil Spectra Enable Rapid Identification of Multiple Heavy Metals at the Continental Scale [Link] Abstract▼
Heavy metals (HMs) detection in soil is a prerequisite for soil contamination control, but traditional chemical analysis methods are limited by efficiency and cost, thereby hindering its large-scale detection. To address this issue, this study presents the first attempt to build an accurate continental-scale classification model for HMs in soil via random forest (RF) algorithm. The Land Use and Cover Area frame Statistical Survey dataset that provides soil visible-to-near-infrared spectra and HM data was used for RF training/testing. Soil spectral data preprocessing and RF hyperparameter optimization were used to construct optimal RF model, and the Youden index was used to select the optimal cut-off point. The values of area under the curve obtained by five optimal RF classifiers were 0.90, 0.89, 0.88, 0.84, and 0.89 for As, Cr, Ni, Pb, and Cu respectively. The feature importance analysis revealed distinct bands that correspond to each HM. The application of the trained RF classifiers across U.S. suggested cities in the southeastern U.S. were more prone to HM contamination compared to the northwestern U.S. Overall, the developed RF classifier could effectively identify regions with potentially high and low contents of HM contamination at the continental-scale. This capability can greatly assist in risk assessment and the design of contamination mitigation strategies.

Chongchong Qi, Tao Hu, Mengting Wu, Ping Zhang, Sybil Derrible, Di Wang, Yong Sik Ok, Zhang Lin.
Journal of Environmental Chemical Engineering

[TIT] Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach [Link] Abstract▼
Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies empirically demonstrated that it suffers from robust overfitting, i.e., long-running AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveals a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the one obtained without AT, and thus causes robust overfitting. Based on our theoretical results, we further design a method named Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs achieve robustness comparable to that of their finite-width counterparts, which in turn justifies our theoretical findings.

Shaopeng Fu^★, Di Wang
IEEE Transactions on Information Theory

[TIFS] Side-channel Attacks and New Principles in the Shuffle Model of Differential Privacy [Link] Abstract▼
The shuffle model employs a shuffler to anonymize and permute user messages, thereby enhancing privacy/utility trade-offs compared to the local model. Ideally, it assumes perfect message anonymity protection against adversaries, allowing each user to hide among a large population. However, in contexts like mobile/edge networks or in scenarios where the shuffler is curious, this assumption is frequently unrealistic. In this study, we demonstrate the vulnerability of the shuffle model to communication side-channel attacks, which substantially compromise privacy amplification via shuffling. We categorize side-channel information in the shuffle model into three types: (i) in-out information, revealing the victim user’s participation and timing, (ii) message-cardinality information, indicating the victim’s message count, and (iii) message-length information, disclosing the victim’s message length(s). Numerical results indicate these attacks increase privacy loss by 200% to 4100%, revealing secret value with probability more than 90%. After theoretically analyzing the remaining privacy amplification effects, we suggest several countermeasures and principles to alleviate degradation caused by these attacks: (a) appending padding bits to each message to counter message-length attacks, (b) maximizing query parallelization to elude in-out attacks and increase the population for privacy amplification, and (c) sending dummy messages to exchange communication costs for improved privacy amplification effects. The newly proposed paradigms and principles significantly save privacy budget in comparison to current models under attacks.

Shaowei Wang, Jin Li, Changyu Dong, Jin Li, Zhili Zhou, Di Wang, Zikai Wen
IEEE Transactions on Information Forensics & Security

[TMLR] Faithful Interpretation for Graph Neural Networks [Link] Abstract▼
Currently, attention mechanisms have garnered increasing attention in Graph Neural Networks (GNNs), such as Graph Attention Networks (GATs) and Graph Transformers (GTs). This is due to not only the commendable boost in performance they offer but also their capacity to provide a more lucid rationale for model behaviors, which are often viewed as inscrutable. However, Attention-based GNNs have demonstrated instability in interpretability when subjected to various sources of perturbations during both training and testing phases, including factors like additional edges or nodes. In this paper, we propose a solution to this problem by introducing a novel notion called Faithful Graph Attention-based Interpretation (FGAI). In particular, FGAI has four crucial properties in terms of stability and sensitivity to interpretation and the final output distribution. Built upon this notion, we propose an efficient methodology for obtaining FGAI, which can be viewed as an ad hoc modification to the canonical Attention-based GNNs. To validate our proposed solution, we introduce two novel metrics tailored for graph interpretation assessment. Experimental results demonstrate that FGAI exhibits superior stability and preserves the interpretability of attention under various forms of perturbations and randomness, which makes FGAI a more faithful and reliable explanation tool.

Lijie Hu^★*, Tianhao Huang^★*, Lu Yu, Wanyu Lin, Tianhang Zheng, Di Wang
Transactions on Machine Learning Research

[TKDE] EPM: Evolutionary Perception Method for Anomaly Detection in Noisy Dynamic Graphs [Link] Abstract▼
As interactions among entities expand across diverse domains such as social networks, transaction networks, and IP-to-IP networks, the significance of anomaly detection in dynamic graphs has markedly increased, playing a crucial role in mitigating potential risks. Unfortunately, existing anomaly detection methods tend to assume noise-free graph structures and ignore the interference of structural noises, such as spurious and missing nodes and edges. To address this issue, we propose an Evolutionary Perception Method (EPM) for identifying anomalous nodes, which can resist the interference of structural noises and adapt to noisy dynamic graphs. It primarily consists of a dynamic fitter and a filtering reviser. The dynamic fitter characterizes the interaction dynamics of nodes that removes and generates links at each period as multiple superposition states, utilizing various link prediction algorithms to fit evolutionary mechanisms. Additionally, the filtering reviser designs evolutionary entropies to quantify the evolutionary uncertainty in multiple superposition states, further reconstructing the Kalman filter to optimize these entropies. Finally, our method analyzes the fluctuations in evolutionary entropies to discover anomalous nodes. Extensive experiments have demonstrated that EPM outperforms state-of-the-art methods in detecting anomalous nodes in noisy dynamic graphs.

Huan Wang^★, Junyang Chen, Yirui Wu, Victor C. M. Leung, Di Wang
IEEE Transactions on Knowledge and Data Engineering

[TKDE] Towards Stable and Explainable Attention Mechanisms [Link] Abstract▼
Currently, the attention mechanism has become a standard fixture in most state-of-the-art natural language processing (NLP) models, not only due to the outstanding performance it could gain but also due to the plausible innate explanations it provides for the behaviors of neural architectures, which are notoriously difficult to analyze. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embedding vectors, which impedes it from becoming a faithful explanation tool. Thus, a natural question is whether we can find some substitute for the current attention that is more stable and could keep the most important characteristics of explanation and prediction of attention. In this paper, to resolve the problem, we provide a rigorous definition of such an alternative, namely SEAT. Specifically, a SEAT should have the following three properties: (1) Its prediction distribution is enforced to be close to the distribution based on the vanilla attention; (2) Its top-$k$ indices have large overlaps with those of the vanilla attention; (3) It is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. To further improve the interpretability stability against perturbations, based on SEAT we provide another definition called SEAT++. Then we propose a method to get a SEAT++, which could be considered an ad hoc modification for canonical attention. Finally, through intensive experiments on various datasets, we compare our SEAT and SEAT++ with other baseline methods using RNN, BiLSTM, and BERT architectures via six different evaluation metrics for model interpretation, stability, and accuracy. Results show that SEAT and SEAT++ are more stable against different perturbations and randomness while also keeping the explainability of attention, which indicates they provide more faithful explanations. Moreover, compared with vanilla attention, there is almost no utility (accuracy) degradation for SEAT and SEAT++.

Lijie Hu^★*, Xinhai Wang^★*, Yixin Liu^*, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang
IEEE Transactions on Knowledge and Data Engineering

[TIFS] FedMUA: Exploring the Vulnerabilities of Federated Learning to Malicious Unlearning Attacks [Link] Abstract▼
Recently, the practical needs of “the right to be forgotten” in federated learning gave birth to a paradigm known as federated unlearning, which enables the server to forget personal data upon the client’s removal request. Existing studies on federated unlearning have primarily focused on efficiently eliminating the influence of requested data from the client’s model without retraining from scratch. However, they have rarely questioned the reliability of the global model in light of the discrepancy between its prediction performance before and after unlearning. To bridge this gap, we take the first step by introducing a novel malicious unlearning attack dubbed FedMUA, aiming to unveil potential vulnerabilities emerging from federated learning during the unlearning process. Specifically, clients may act as attackers by crafting malicious unlearning requests to manipulate the prediction behavior of the global model. The crux of FedMUA is to mislead the global model into unlearning more information associated with the influential samples for the target sample than anticipated, thus inducing adverse effects on target samples from other clients. To achieve this, we design a novel two-step method, known as Influential Sample Identification and Malicious Unlearning Generation, to identify and subsequently generate malicious feature unlearning requests within the influential samples. By doing so, we can significantly alter the predictions pertaining to the target sample by initiating the malicious feature unlearning requests, deliberately manipulating the model to the user's detriment. Additionally, we design a new defense mechanism that is highly resilient against malicious unlearning attacks. Extensive experiments on three realistic datasets reveal that FedMUA effectively induces misclassification on target samples and can achieve an 80% attack success rate by triggering only 0.3% malicious unlearning requests.

Jian Chen, Zehui Lin, Wanyu Lin, Wenlong Shi, Xiaoyan Yin, Di Wang
IEEE Transactions on Information Forensics and Security

[Neural Computation] Generalization Guarantees of Gradient Descent for Shallow Neural Networks [Link] Abstract▼
Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work \cite{lei2022stability,richards2021stability} by conducting a comprehensive stability and generalization analysis of GD for multi-layer NNs. For two-layer NNs, our results are established under general network scaling parameters, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of our general findings, we derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate $O(1/\sqrt{n})$. Moreover, we demonstrate that as the scaling parameter increases or the network complexity decreases, less over-parameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of $O(1/n)$ for GD in both two-layer and three-layer NNs.

Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou.
Neural Computation

[TCS] Private Least Absolute Deviations with Heavy-tailed Data [Link] Abstract▼
We study the problem of Differentially Private Stochastic Convex Optimization (DPSCO) with heavy-tailed data. Specifically, we focus on the problem of Least Absolute Deviations, i.e., $\ell_1$-norm linear regression, in the $\epsilon$-DP model. While most previous work focuses on the case where the loss function is Lipschitz, in this paper we only need to assume the variates have bounded moments. Firstly, we study the case where the $\ell_2$ norm of data has a bounded second-order moment. We propose an algorithm that is based on the exponential mechanism and show that it is possible to achieve an upper bound of $\tilde{O}(\sqrt{\frac{d}{n\epsilon}})$ (with high probability). Next, we relax the assumption to bounded $\theta$-th order moment with some $\theta\in (1, 2)$ and show that it is possible to achieve an upper bound of $\tilde{O}(({\frac{d}{n\epsilon}})^\frac{\theta-1}{\theta})$. Our algorithms can also be extended to more relaxed cases where only each coordinate of the data has bounded moments, and we can get an upper bound of $\tilde{O}({\frac{d}{\sqrt{n\epsilon}}})$ and $\tilde{O}({\frac{d}{({n\epsilon})^\frac{\theta-1}{\theta}}})$ in the second and $\theta$-th moment case respectively.

Di Wang, and Jinhui Xu
Theoretical Computer Science

[IPM] TAAD: Time-varying adversarial anomaly detection in dynamic graphs [Link] Abstract▼
The timely detection of anomalous nodes that can cause significant harm is essential in real-world networks. One challenge for anomaly detection in dynamic graphs is the identification of abnormal nodes at newly emerged moments. Unfortunately, existing methods tend to learn nontransferable features from historical moments that do not generalize well to newly emerged moments. In response to this challenge, we propose Time-varying Adversarial Anomaly Detection (TAAD), a generalizable model to learn transferable features from historical moments, which can transfer prior anomaly knowledge to newly emerged moments. It comprises four components: the feature extractor, the anomaly detector, the time-varying discriminator and the score generator. The time-varying discriminator cooperates with the feature extractor to conduct adversarial training, which decreases the distributional differences in the feature representations of nodes between historical and newly emerged moments to learn transferable features. The score generator measures the distributional differences of feature representations between normal and abnormal nodes, and further learns discriminable features. Extensive experiments conducted on four different datasets show that the proposed TAAD outperforms state-of-the-art methods.

Guanghua Liu, Jia Zhang, Peng Lv, Chenlong Wang, Huan Wang, Di Wang
Information Processing & Management

[TBD] A Multi-classification Division-aggregation Framework for Fake News Detection [Link] Abstract▼
Nowadays, as human activities are shifting to social media, fake news detection has become a crucial problem. Existing methods ignore the classification difference in online news and cannot take full advantage of multi-classification knowledge. For example, when coping with a post “A mouse is frightened by a cat,” a model that learns “computer” knowledge tends to misunderstand “mouse” and give a fake label, but a model that learns “animal” knowledge tends to give a true label. Therefore, this research proposes a multi-classification division-aggregation framework to detect fake news, named CKA, which innovatively learns classification knowledge during training stages and aggregates it during prediction stages. It consists of three main components: a news characterizer, an ensemble coordinator, and a truth predictor. The news characterizer is responsible for extracting news features and obtaining news classifications. Cooperating with the news characterizer, the ensemble coordinator generates classification-specific models for the maximum reservation of classification knowledge during the training stage, where each classification-specific model maximizes the detection performance of fake news on corresponding news classifications. Further, to aggregate the classification knowledge during the prediction stage, the truth predictor uses the truth discovery technology to aggregate the predictions from different classification-specific models based on reliability evaluation of classification-specific models. Extensive experiments prove that our proposed CKA outperforms state-of-the-art baselines in fake news detection.

Wen Zhang, Haitao Fu, Lionel Z. Wang, Huan Wang^★, Zhiguo Gong, Pan Zhou, and Di Wang
IEEE Transactions on Big Data

2024

Conference Papers

[IEEE S&P] Preserving Node-level Privacy in Graph Neural Networks [Link] Abstract▼
Differential privacy (DP) has seen immense applications in learning on tabular, image, and sequential data where instance-level privacy is concerned. In learning on graphs, contrastingly, works on node-level privacy are highly sparse. Challenges arise as existing DP protocols hardly apply to the message-passing mechanism in Graph Neural Networks (GNNs). In this study, we propose a solution that specifically addresses the issue of node-level privacy. Our protocol consists of two main components: 1) a sampling routine called HeterPoisson, which employs a specialized node sampling strategy and a series of tailored operations to generate a batch of sub-graphs with desired properties, and 2) a randomization routine that utilizes symmetric multivariate Laplace (SML) noise instead of the commonly used Gaussian noise. Our privacy accounting shows this particular combination provides a non-trivial privacy guarantee. In addition, our protocol enables GNN learning with good performance, as demonstrated by experiments on five real-world datasets; compared with existing baselines, our method shows significant advantages, especially in the high privacy regime. Experimentally, we also 1) perform membership inference attacks against our protocol and 2) apply privacy audit techniques to confirm our protocol's privacy integrity. In the sequel, we present a study on a seemingly appealing approach (USENIX'23) that protects node-level privacy via differentially private node/instance embeddings. Unfortunately, such work has fundamental privacy flaws, which are identified through a thorough case study. More importantly, we prove an impossibility result of achieving both (strong) privacy and (acceptable) utility through private instance embedding. The implication is that such an approach has intrinsic utility barriers when enforcing differential privacy.

Zihang Xiang^★, Tianhao Wang, Di Wang
The 45th IEEE Symposium on Security and Privacy (IEEE S&P 2024).

[VLDB] Privacy Amplification via Shuffling: Unified, Simplified, and Tightened Abstract▼
The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address this issue, we propose the variation-ratio reduction as a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols. It leverages two new parameterizations: the total variation bounds of local messages and the probability ratio bounds of blanket messages, to determine indistinguishability levels. Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design, where our bounds are exactly tight. Additionally, variation-ratio reduction complements parallel composition in the shuffle model, yielding enhanced privacy accounting for popular sampling-based randomizers employed in statistical queries (e.g., range queries, marginal queries, and frequent itemset mining). Empirical findings demonstrate that our numerical amplification bounds surpass existing ones, conserving up to $30\%$ of the budget for single-message protocols, $75\%$ for multi-message ones, and a striking $75\%$-$95\%$ for parallel composition. Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n=10^8$ users.

Shaowei Wang, Yun Peng, Jin Li, Zikai Wen, Zhipeng Li, Shiyu Yu, Di Wang, and Wei Yang
International Conference on Very Large Data Bases (VLDB 2024)

[VLDB] Communication Efficient and Provable Federated Unlearning Abstract▼
We investigate the problem of federated unlearning, which aims to erase the influence of requested clients or data samples on the global model trained via federated learning (FL). This problem arises from the increasing demand for the right to be forgotten as well as various privacy threats in FL. We propose a novel exact federated unlearning framework that satisfies two crucial desiderata: communication efficiency and exact unlearning provability. To the best of our knowledge, our work is the first to address both aspects in a unified manner. We first introduce a rigorous definition of exact federated unlearning, which ensures that the unlearned model is statistically indistinguishable from the one trained without the removed data. We then identify the key property to enable rapid exact federated unlearning: total variation (TV) stability, which captures the sensitivity of the model parameters to small changes in the dataset. Based on this insight, we design a TV-stable FL algorithm called FATS, which adapts the classical FedAvg algorithm for TV Stability and incorporates local SGD with periodic averaging to reduce the number of communication rounds. We also devise efficient unlearning algorithms for FATS under two scenarios: client-level and sample-level unlearning. We provide theoretical guarantees for our learning and unlearning algorithms, showing that they achieve exact federated unlearning with reasonable convergence rates for both the original and unlearned models. We empirically evaluate our framework on 6 benchmark datasets, and demonstrate its superiority over state-of-the-art baselines in terms of accuracy, communication cost, computation cost, and unlearning effectiveness.

Youming Tao^★*, Chenglong Wang^★*, Miao Pan, Dongxiao Yu, Xiuzhen Cheng, and Di Wang
International Conference on Very Large Data Bases (VLDB 2024)

[NeurIPS] Revisiting Differentially Private ReLU Regression [Link] Abstract▼
As one of the most fundamental non-convex learning problems, ReLU regression under differential privacy (DP) constraints, especially in high-dimensional settings, remains a challenging area in privacy-preserving machine learning. Existing results are limited to the assumptions of bounded norm $ \|\mathbf{x}\|_2 \leq 1$, which becomes stringent and strong with increasing data dimensionality. In this work, we revisit the problem of DP ReLU regression in overparameterized regimes. We propose two innovative algorithms, DP-GLMtron and DP-TAGLMtron, that outperform the conventional DPSGD. DP-GLMtron is based on a generalized linear model perceptron approach, integrating adaptive clipping and Gaussian mechanism for enhanced privacy. To overcome the constraints of small privacy budgets in DP-GLMtron, represented by $\widetilde{O}(\sqrt{1/N})$ where $N$ is the sample size, we introduce DP-TAGLMtron, which utilizes a tree aggregation protocol to balance privacy and utility effectively, showing that DP-TAGLMtron achieves comparable performance with only an additional factor of $O(\log N)$ in the utility upper bound. Moreover, our theoretical analysis extends beyond Gaussian-like data distributions to settings with eigenvalue decay, showing how data distribution impacts learning in high dimensions. Notably, our findings suggest that the utility bound could be independent of the dimension $d$, even when $d \gg N$. Experiments on synthetic and real-world datasets also validate our results.

Meng Ding, Mingxi Lei, Liyang Zhu, Shaowei Wang, Di Wang, Jinhui Xu.
The Conference on Neural Information Processing Systems (NeurIPS 2024)

[NeurIPS] Truthful High Dimensional Sparse Linear Regression [Link] Abstract▼
We study the problem of fitting the high dimensional sparse linear regression model with sub-Gaussian covariates and responses, where the data are provided by strategic or self-interested agents (individuals) who prioritize their privacy of data disclosure. In contrast to the classical setting, our focus is on designing mechanisms that can effectively incentivize most agents to truthfully report their data while preserving the privacy of individual reports. Simultaneously, we seek an estimator which should be close to the underlying parameter. We attempt to solve the problem by deriving a novel private estimator that has a closed-form expression. Based on the estimator, we propose a mechanism which has the following properties via some appropriate design of the computation and payment scheme: (1) the mechanism is $(o(1), O(n^{-\Omega({1})}))$-jointly differentially private, where $n$ is the number of agents; (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data; (3) the output could achieve an error of $o(1)$ to the underlying parameter; (4) it is individually rational for a $(1-o(1))$ fraction of agents in the mechanism; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. To the best of our knowledge, this is the first study on designing truthful (and privacy-preserving) mechanisms for high dimensional sparse linear regression.

Liyang Zhu, Amina Manseur, Meng Ding, Jinyan Liu, Jinhui Xu, Di Wang.
The Conference on Neural Information Processing Systems (NeurIPS 2024)

[NeurIPS] Perplexity-aware Correction for Robust Alignment with Noisy Preferences [Link] Abstract▼
Alignment techniques are critical in ensuring that large language models (LLMs) output helpful and harmless content by enforcing the LLM-generated content to align with preferences. However, the existence of noisy preferences (NPs), where the responses are mistakenly labelled as chosen or rejected, could deteriorate the alignment, thus making the LLMs generate useless and even malicious content. Existing methods mitigate the issue of NPs from the loss perspective by adjusting the alignment loss based on a clean validation dataset. Orthogonal to these loss-oriented methods, we propose perplexity-aware correction (PerpCorrect) from the data perspective for robust alignment which detects and corrects NPs based on the differences between the perplexity of the chosen and rejected responses (dubbed as PPLDiff). Intuitively, a higher PPLDiff indicates a higher probability of the NP because a rejected/chosen response which is mistakenly labelled as chosen/rejected is less preferable to be generated by an aligned LLM, thus having a higher/lower perplexity. PerpCorrect works in three steps: (1) PerpCorrect aligns a surrogate LLM using the clean validation data to make the PPLDiff able to distinguish clean preferences (CPs) and NPs. (2) PerpCorrect further aligns the surrogate LLM by incorporating the reliably clean training data whose PPLDiff is extremely small and reliably noisy training data whose PPLDiff is extremely large after correction to boost the discriminatory power. (3) Detecting and correcting NPs according to the PPLDiff obtained by the aligned surrogate LLM to obtain a denoised training dataset for robust alignment. Comprehensive experiments validate that our proposed PerpCorrect can achieve state-of-the-art alignment performance under NPs. Notably, PerpCorrect demonstrates practical utility by requiring only a modest number of validation data and being compatible with various alignment techniques.

Keyi Kong, Xilie Xu, Di Wang, Jingfeng Zhang, Mohan Kankanhalli.
The Conference on Neural Information Processing Systems (NeurIPS 2024)

[NeurIPS] Towards Multi-dimensional Explanation Alignment for Medical Classification [Link] Abstract▼
The lack of interpretability in the field of medical image analysis has significant ethical and legal implications. Existing interpretable methods in this domain encounter several challenges, including dependency on specific models, difficulties in understanding and visualization, and issues related to efficiency. To address these limitations, we propose a novel framework called Med-MICN (Medical Multi-dimensional Interpretable Concept Network). Med-MICN provides interpretability alignment for various angles, including neural symbolic reasoning, concept semantics, and saliency maps, which are superior to current interpretable methods. Its advantages include high prediction accuracy, interpretability across multiple dimensions, and automation through an end-to-end concept labeling process that reduces the need for extensive human training effort when working with new datasets. To demonstrate the effectiveness and interpretability of Med-MICN, we apply it to four benchmark datasets and compare it with baselines. The results clearly demonstrate the superior performance and interpretability of our Med-MICN.

Lijie Hu, Songning Lai, Wenshuo Chen, Hongru Xiao, Hongbin Lin, Lu Yu, Jingfeng Zhang, Di Wang.
The Conference on Neural Information Processing Systems (NeurIPS 2024)

[ICML] Improving Interpretation Faithfulness for Vision Transformers [Link] Abstract▼
Vision Transformers (ViTs) have achieved state-of-the-art performance for various vision tasks. One reason behind the success lies in their ability to provide plausible innate explanations for the behavior of neural architectures. However, ViTs suffer from issues with explanation faithfulness, as their focal points are fragile to adversarial attacks and can be easily changed with even slight perturbations on the input image. In this paper, we propose a rigorous approach to mitigate these issues by introducing Faithful ViTs (FViTs). Briefly speaking, an FViT should have the following two properties: (1) The top-k indices of its self-attention vector should remain mostly unchanged under input perturbation, indicating stable explanations; (2) The prediction distribution should be robust to perturbations. To achieve this, we propose a new method called Denoised Diffusion Smoothing (DDS), which adopts randomized smoothing and diffusion-based denoising. We theoretically prove that processing ViTs directly with DDS can turn them into FViTs. We also show that Gaussian noise is nearly optimal for both ℓ2 and ℓ∞-norm cases. Finally, we demonstrate the effectiveness of our approach through comprehensive experiments and evaluations. Specifically, we compare our FViTs with other baselines through visual interpretation and robustness accuracy under adversarial attacks. Results show that FViTs are more robust against adversarial attacks while maintaining the explainability of attention, indicating higher faithfulness.

Lijie Hu^★*, Yixin Liu^*, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang
The 41st International Conference on Machine Learning (ICML 2024)
Selected as a spotlight paper

[ICML] Understanding Forgetting in Continual Learning with Linear Regression [Link] Abstract▼
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both under-parameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both under-parameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.

Meng Ding, Kaiyi Ji, Di Wang, and Jinhui Xu
The 41st International Conference on Machine Learning (ICML 2024)

[ICML] Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization [Link] Abstract▼
The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, Markovian sampling, Continuous state-action spaces, the performance of the Last iterate, and Global optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (i.e., covers the MMCLG criteria). We establish global convergence sample complexity bounds of $O(\epsilon^{-3})$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.

Mudit Gaur, Amrit Bedi, Di Wang, Vaneet Aggarwal
The 41st International Conference on Machine Learning (ICML 2024)
Selected as a spotlight paper

[ICLR] Faithful Vision-Language Interpretation via Concept Bottleneck Models [Link] Abstract▼
The demand for transparency in healthcare and finance has led to interpretable machine learning (IML) models, notably the concept bottleneck models (CBMs), valued for their potential in performance and insights into deep neural networks. However, CBMs' reliance on manually annotated data poses challenges. Label-free CBMs have emerged to address this, but they remain unstable, affecting their faithfulness as explanatory tools. To address this inherent instability issue, we introduce a formal definition for an alternative concept called the Faithful Vision-Language Concept (FVLC) models. We present a methodology for constructing an FVLC that satisfies four critical properties. Our extensive experimentation, conducted on four benchmark datasets using Label-free CBM model architectures, demonstrates that our FVLC outperforms other baselines in terms of stability against input and concept set perturbations. Our approach incurs minimal accuracy degradation compared to the vanilla CBM, making it a promising solution for reliable and faithful model interpretation.

Songning Lai^★*, Lijie Hu^★*, Junxiao Wang, Laure Berti-Equille, and Di Wang
The 12th International Conference on Learning Representations (ICLR 2024)

[ICLR] Improved Analysis of Sparse Linear Regression in Local Differential Privacy Model [Link] Abstract▼
In this paper, we revisit the problem of sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially local models has focused on obtaining the lower bounds for the case where the underlying parameter is $1$-sparse, and extending such bounds to the more general $k$-sparse case has proven to be challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$ non-interactive LDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the space. We propose an innovative NLDP algorithm, the very first of its kind for the problem. As a remarkable outcome, this algorithm also yields a novel and highly efficient estimator as a valuable by-product. Our algorithm achieves an upper bound of $\tilde{O}({\frac{d\sqrt{k}}{\sqrt{n}\epsilon}})$ for the estimation error when the data is sub-Gaussian, which can be further improved by a factor of $O(\sqrt{d})$ if the server has additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega({\frac{\sqrt{dk}}{\sqrt{n}\epsilon}})$. As for the upper bound, we rectify a previous method and show that it is possible to achieve a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$. Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.

Liyang Zhu^★*, Meng Ding^★*, Vaneet Aggarwal, Jinhui Xu, and Di Wang
The 12th International Conference on Learning Representations (ICLR 2024)

[ICLR] Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach [Link] Abstract▼
Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies empirically demonstrated that it suffers from robust overfitting, i.e., long-term AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveals a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the one obtained without AT, and thus causes robust overfitting. Based on our theoretical results, we further design a method named Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs achieve robustness comparable to that of their finite-width counterparts, which in turn justifies our theoretical findings.

Shaopeng Fu^★ and Di Wang
The 12th International Conference on Learning Representations (ICLR 2024)

[ICLR] An LLM can Fool Itself: A Prompt-Based Adversarial Attack [Link] Abstract▼
The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's adversarial robustness via a prompt-based adversarial attack (PromptAttack). PromptAttack converts adversarial textual attacks into an attack prompt that can cause the victim LLM to output the adversarial sample to fool itself. The attack prompt is composed of three important components: (1) original input (OI) including the original sample and its ground-truth label, (2) attack objective (AO) illustrating a task description of generating a new sample that can fool itself without changing the semantic meaning, and (3) attack guidance (AG) containing the perturbation instructions to guide the LLM on how to complete the task by perturbing the original sample at character, word, and sentence levels, respectively. Besides, we use a fidelity filter to ensure that PromptAttack maintains the original semantic meanings of the adversarial examples. Further, we enhance the attack power of PromptAttack by ensembling adversarial examples at different perturbation levels. Comprehensive empirical results using Llama2 and GPT-3.5 validate that PromptAttack consistently yields a much higher attack success rate compared to AdvGLUE and AdvGLUE++. Interesting findings include that a simple emoji can easily mislead GPT-3.5 to make wrong predictions.

Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, and Mohan Kankanhalli
The 12th International Conference on Learning Representations (ICLR 2024)

[CoLM] Multi-hop Question Answering under Temporal Knowledge Editing [Link] Abstract▼
Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA). Unlike previous methods, TEMPLE-MQA first constructs a time-aware graph (TAG) to store edit knowledge in a structured manner. Then, through our proposed inference path, structural retrieval, and joint reasoning stages, TEMPLE-MQA effectively discerns temporal contexts within the question query. Experiments on benchmark datasets demonstrate that TEMPLE-MQA significantly outperforms baseline models. Additionally, we contribute a new dataset, namely TKEMQA, which serves as the inaugural benchmark tailored specifically for MQA with temporal scopes.

Keyuan Cheng^★*, Gang Lin^★*, Haoyang Fei^★*, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali^★, Lijie Hu^★, Di Wang.
The 1st Conference on Language Modeling (COLM 2024)

[CoLM] Model Autophagy Analysis to Explicate Self-consumption within Human-AI Interactions [Link] Abstract▼
The increasing significance of large models and their multi-modal variants in societal information processing has ignited debates on social safety and ethics. However, there exists a paucity of comprehensive analysis for (i) the interactions between human and artificial intelligence systems, and (ii) understanding and addressing the associated limitations. To bridge this gap, we propose Model AutOphagy ANALysis (MONAL) for self-consumption explanation. MONAL employs two distinct autophagous loops (referred to as “self-consumption loops”) to elucidate the suppression of human-generated information in the exchange between human and AI systems. Through comprehensive experiments on diverse datasets, we evaluate the capacities of generated models as both creators and disseminators of information. Our key findings reveal (i) A progressive prevalence of model-generated synthetic information over time within training datasets compared to human-generated information; (ii) The discernible tendency of large models, when acting as information transmitters across multiple iterations, to selectively modify or prioritize specific contents; and (iii) The potential for a reduction in the diversity of socially or human-generated information, leading to bottlenecks in the performance enhancement of large models and confining them to local optima.

Shu Yang^★*, Muhammad Asif Ali^★*, Lu Yu^★, Lijie Hu, and Di Wang
The 1st Conference on Language Modeling (COLM 2024)

[EMNLP] Dissecting Fine-Tuning Unlearning in Large Language Models [Link] Abstract▼
Fine-tuning-based unlearning methods prevail for erasing targeted harmful, sensitive, or copyrighted information within large language models while preserving overall capabilities. However, the true effectiveness of the methods is unclear. In this paper, we delve into the limitations of fine-tuning-based unlearning through activation patching and parameter restoration experiments. Our findings reveal that these methods alter the model's knowledge retrieval process, rather than genuinely erasing the problematic knowledge embedded in the model parameters. Furthermore, behavioral tests demonstrate that the unlearning mechanisms inevitably impact the global behavior of the models, affecting unrelated knowledge or capabilities. Our work advocates the development of more resilient unlearning techniques for truly erasing knowledge.

Yihuai Hong, Yuelin Zou, Lijie Hu^★, Ziqian Zeng, Di Wang, Haiqin Yang.
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
Selected as an oral paper

[EMNLP] Private Language Models via Truncated Laplacian Mechanism [Link] Abstract▼
Recently it has been shown that deep learning models for NLP tasks are prone to attacks that can even reconstruct the verbatim training texts. To prevent privacy leakage, researchers have investigated word-level perturbations, relying on the formal guarantees of differential privacy (DP) in the embedding space. However, many existing approaches either achieve unsatisfactory performance in the high privacy regime when using the Laplacian or Gaussian mechanism, or resort to weaker relaxations of DP that are inferior to the canonical DP in terms of privacy strength. This raises the question of whether a new method for private word embedding can be designed to overcome these limitations. In this paper, we propose a novel private embedding method called the high dimensional truncated Laplacian mechanism. Specifically, we introduce a non-trivial extension of the truncated Laplacian mechanism, which was previously only investigated in one-dimensional space cases. Theoretically, we show that our method has a lower variance compared to the previous private word embedding methods. To further validate its effectiveness, we conduct comprehensive experiments on private embedding and downstream tasks using three datasets. Remarkably, even in the high privacy regime, our approach only incurs a slight decrease in utility compared to the non-private scenario.

Tianhao Huang^★*, Tao Yang^★*, Ivan Habernal, Lijie Hu^★, Di Wang
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
Selected as an oral paper

[ACL] Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality [Link] Abstract▼
Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically in the workflow served by a commercial LLM. This dataset comprises multimodal instruction manuals, conversations, XR responses, and vision question answering. Last, we present several prevailing open-resource LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both AI and HCI communities.

Jiahuan Pei, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Irene Viola, Pablo Cesar
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Findings).

[EACL] Antonym vs Synonym Distinction using InterlaCed Encoder NETworks (ICE-NET) [Link] Abstract▼
Antonyms vs synonyms distinction is a core challenge in lexico-semantic analysis and automated lexical resource construction. These pairs share a similar distributional context which makes it harder to distinguish them. Leading research in this regard attempts to capture the properties of the relation pairs, i.e., symmetry, transitivity, and trans-transitivity. However, the inability of existing research to appropriately model the relation-specific properties limits their end performance. In this paper, we propose InterlaCed Encoder NETworks (i.e., ICE-NET) for antonym vs synonym distinction, that aim to capture and model the relation-specific properties of the antonym and synonym pairs in order to perform the classification task in a performance-enhanced manner. Experimental evaluation using the benchmark datasets shows that ICE-NET outperforms the existing research by a relative score of up to 1.8% in F1-measure.

Muhammad Asif Ali^★, Yan Hu^★, Jianbin Qin, and Di Wang
The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024 Findings)

[EACL] Differentially Private Natural Language Models: Recent Advances and Future Directions [Link] Abstract▼
Recent developments in deep learning have led to great success in various natural language processing (NLP) tasks. However, these applications may involve data that contain sensitive information. Therefore, how to achieve good performance while also protecting the privacy of sensitive data is a crucial challenge in NLP. To preserve privacy, Differential Privacy (DP), which can prevent reconstruction attacks and protect against potential side knowledge, is becoming a de facto technique for private data analysis. In recent years, NLP in DP models (DP-NLP) has been studied from different perspectives, which deserves a comprehensive review. In this paper, we provide the first systematic review of recent advances on DP deep learning models in NLP. In particular, we first discuss some differences and additional challenges of DP-NLP compared with the standard DP deep learning. Then we investigate some existing work on DP-NLP and present its recent developments from two aspects: gradient perturbation based methods and embedding vector perturbation based methods. We also discuss some challenges and future directions of this topic.

Lijie Hu^★, Ivan Habernal, Lei Shen, and Di Wang
The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024 Findings)

Journal Papers

[IANDC] Truthful and Privacy-preserving Generalized Linear Models [Link] Abstract▼
In this paper we study estimating Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested and are concerned about their privacy when reporting data. Compared with the classical setting, here we aim to design mechanisms that can both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while their outputs should also be close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed and only have finite fourth moments. First, motivated by the stationary condition of the maximizer of the likelihood function, we derive a novel private and closed form estimator. Based on the estimator, we propose a mechanism which has the following properties via some appropriate design of the computation and payment scheme for several canonical models such as linear regression, logistic regression and Poisson regression: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output could achieve an error of $o(1)$ to the underlying parameter; (4) it is individually rational for a $(1-o(1))$ fraction of agents in the mechanism; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. In the second part, we consider the linear regression model under a more general setting where both covariates and responses are heavy-tailed and only have finite fourth moments. By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme which have similar properties as in the sub-Gaussian case.

Yuan Qiu^★, Jinyan Liu, and Di Wang
Information and Computation

[JMLR] Faster Rates of Private Stochastic Convex Optimization [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) and provide excess population risks for some special classes of functions that are faster than the previous results of general convex and strongly convex functions. In the first part of the paper, we study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $\theta>1$. Specifically, we first show that under some mild assumptions on the loss functions, there is an algorithm whose output could achieve an upper bound of $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})^\frac{\theta}{\theta-1})$ and $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively, when $\theta\geq 2$, where $n$ is the sample size and $d$ is the dimension of the space. Then we address the inefficiency issue, improve the upper bounds by $\text{Poly}(\log n)$ factors and extend to the case where $\theta\geq \bar{\theta}>1$ for some known $\bar{\theta}$. Next we show that the excess population risk of population functions satisfying TNC with parameter $\theta\geq 2$ is always lower bounded by $\Omega((\frac{d}{n\epsilon})^\frac{\theta}{\theta-1})$ and $\Omega((\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively, which matches our upper bounds. In the second part, we focus on a special case where the population risk function is strongly convex. Unlike the previous studies, here we assume the loss function is non-negative and the optimal value of population risk is sufficiently small.
With these additional assumptions, we propose a new method whose output could achieve an upper bound of $O(\frac{d\log(1/\delta)}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ and $O(\frac{d^2}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ for any $\tau> 1$ in $(\epsilon,\delta)$-DP and $\epsilon$-DP model respectively if the sample size $n$ is sufficiently large. These results circumvent their corresponding lower bounds in \cite{feldman2020private} for general strongly convex functions. Finally, we conduct experiments of our new methods on real world data. Experimental results also provide new insights into established theories.

Jinyan Su^★, Lijie Hu^★, and Di Wang
Journal of Machine Learning Research

[TMC] Private Over-the-Air Federated Learning at Band-Limited Edge [Link] Abstract▼
We investigate over-the-air federated learning (OTA-FL) that exploits over-the-air computing (AirComp) to integrate communication and computation seamlessly for FL. Privacy presents a serious obstacle for OTA-FL, as it can be compromised by maliciously manipulating channel state information (CSI). Moreover, the limited band at the edge hinders OTA-FL from training large-scale models. It remains open how to enable a multitude of devices with constrained resources and sensitive data to collaboratively train a global model at the band-limited edge. To tackle this, we design a novel algorithm PROBE building upon a lightweight over-the-air gradient aggregation rule PB-O-GAR. Specifically, PB-O-GAR combines a random sparsification-like dimension reduction with Gaussian perturbation to provide rigorous privacy and band-adapted communication. It elaborately calibrates the transmission signal according to devices’ perceived CSI for heterogeneous power constraints accommodation and CSI attack resilience. We show that by utilizing the common randomness, which deviates from the conventional FL, random sparsification-like dimension reduction can augment privacy in addition to the intrinsic privacy amplification effect of AirComp. We establish near-optimal convergence rates and explicit trade-offs among privacy, communication and utility for PROBE. Finally, extensive experiments on benchmark datasets are conducted to validate our theoretical findings and showcase the superiority of PROBE in realistic settings.

Youming Tao, Shuzhen Chen, Congwei Zhang, Di Wang, Dongxiao Yu, Xiuzhen Cheng, and Falko Dressler
IEEE Transactions on Mobile Computing

[TKDD] Fair Single Index Model [Link] Abstract▼
Single-index models (SIMs) have been widely used in various applications due to their simplicity and interpretability. However, despite the potential for SIMs to result in discriminatory outcomes based on sensitive attributes like gender, race, or ethnicity, the issue of fairness has not been thoroughly examined in recent studies on the topic. This paper aims to address these fairness concerns by proposing methods for building fair SIMs. Specifically, based on the definition of equal opportunity, we first provide a fairness definition for SIM. Next, we develop a unified fair SIM model and propose an efficient method to solve the fair SIM. Theoretically, we also show that our output is consistent in fairness. Finally, we conduct comprehensive experimental studies over 7 benchmark datasets and demonstrate that our fair SIM outperforms the other 8 baseline methods.

Yidong Wang^★*, Meng Ding^★*, Jinhui Xu and Di Wang
ACM Transactions on Knowledge Discovery from Data

[TMLR] Persistent Local Homology in Graph Learning [Link] Abstract▼
In this study, we introduce Persistent Local Homology (PLH) for graphs, a novel method that synergizes persistent homology with local homology to analyze graph structures. We begin by mathematically formalizing PLH, defining it as the application of persistent homology to annular local subgraphs. This foundation paves the way for the development of a computational pipeline, specifically tailored for PLH, which we explore in various graph learning contexts. Despite its utility, a complexity analysis reveals potential computational bottlenecks in PLH application. To address this, we propose Reduced PLH (rPLH), an efficient variant designed to significantly lower computational complexity. Experimental evaluations with rPLH demonstrate its capability to retain the effectiveness of the original PLH while substantially reducing computational demands. The practical utility of PLH and rPLH is further corroborated through comprehensive experiments on both synthetic and real-world datasets, highlighting their broad applicability and potential in diverse analytical scenarios.

Minghua Wang, Yan Hu, Ziyun Huang, Di Wang, and Jinhui Xu
Transactions on Machine Learning Research

[TCSS] Multitask Asynchronous Meta-learning for Few-shot Anomalous Node Detection in Dynamic Networks [Link] Abstract▼
Few-shot anomalous node detection in dynamic networks has been extensively investigated. In this few-shot scenario, the detection of these anomalous nodes is particularly challenging due to the continuously evolving network topology and data distribution over time, which is known as concept drift. Concept drift refers to the phenomenon where the underlying concepts or patterns in the data generation process change over time, leading to varying data distributions across different periods. Due to these changes in data distribution, the patterns learned during training may become invalid under the new data distribution. Existing models primarily aim to enhance the representation of evolving node attributes and relationships to mitigate the impact of concept drift in few-shot scenarios. However, the scarcity of anomalous samples further limits the model’s ability to learn new patterns, thereby reducing its effectiveness in addressing concept drift in few-shot scenarios. To address this challenge, we propose the Multitask Asynchronous Meta-learning Framework (MAMF), which aims to mitigate bias induced by concept drift in few-shot anomalous node detection. Our framework consists of four main components: a feature extractor, an anomaly simulator, an asynchronous learner, and a type detector. The feature extractor captures the relative variations of each node in an evolving graph stream. The anomaly simulator uses generative adversarial models to learn anomaly distributions and generate samples at different time intervals. The asynchronous learner samples from various time distributions to create meta-tasks for anomalous node detection, allowing it to adapt to changes between these distributions. To aid in few-shot anomalous node detection, the type detector is used for anomaly type recognition.
Our framework achieves AUC improvements of 5.12%, 6.87%, and 1.91% over the best existing methods on Wikipedia, Reddit, and Mooc datasets, respectively, demonstrating its effectiveness and robustness in adapting to concept drift and detecting anomalous nodes.

Yifan Hong, Lionel Z. WANG, Chuanqi Shi, Junyang Chen, Xiaomei Wei, Huan Wang, Di Wang
IEEE Transactions on Computational Social Systems

[NL] Near-perfect Coverage Manifold Estimation in Cellular Networks via conditional GAN [Link] Abstract▼
This paper presents a conditional generative adversarial network (cGAN) that translates base station location (BSL) information of any Region-of-Interest (RoI) to location-dependent coverage probability values within a subset of that region, called the region-of-evaluation (RoE). We train our network utilizing the BSL data of India, the USA, Germany, and Brazil. In comparison to the state-of-the-art convolutional neural networks (CNNs), our model improves the prediction error (L1 difference between the coverage manifold generated by the network under consideration and that generated via simulation) by two orders of magnitude. Moreover, the cGAN-generated coverage manifolds appear to be almost visually indistinguishable from the ground truth.

Washim Uddin Mondal, Veni Goyal, Goutam Das, Satish V. Ukkusuri, Di Wang, Mohamed-Slim Alouini, and Vaneet Aggarwal
IEEE Networking Letters

2023

Conference Papers

[USENIX] Inductive Graph Unlearning [Link] [Code] Abstract▼
As a way to implement the "right to be forgotten" in machine learning, machine unlearning aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, GraphEraser has been proposed. However, a critical issue is that GraphEraser is specifically designed for the transductive graph setting, where the graph is static and attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the Guided Inductive Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. Experimental results demonstrate the efficiency and practicality of GUIDE for inductive graph unlearning. In particular, GUIDE achieves faster implementation compared to GraphEraser and outperforms the existing state-of-the-art methods on inductive graph learning tasks.

Cheng-Long Wang^★, Mengdi Huai, and Di Wang
The 32nd USENIX Security Symposium (USENIX 2023)

[IEEE S&P] A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction [Link] [Code] Abstract▼
We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements upon DP-SGD are heuristic, especially in the applications of private deep learning. In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is highly underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For the data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation. Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. 
For example, in computer vision applications, with $(\epsilon=8, \delta=10^{-5})$ DP guarantee, we successfully train Resnet20 on CIFAR10 with accuracy 74.5%; for natural language processing, with $(\epsilon=4, \delta=10^{-5})$, we successfully train recurrent neural network on IMDb data with accuracy 77.5%.

Hanshen Xiao^*, Zihang Xiang^*★, Di Wang, and Srini Devadas (* equal contribution)
The 44th IEEE Symposium on Security and Privacy (IEEE S&P 2023)

[SIGMOD] On Practical Differentially Private and Byzantine-resilient Federated Learning [Link] [Code] Abstract▼
Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own track, solutions that consider both remain sparse. This is due to difficulties in reconciling privacy-preserving and Byzantine-resilient algorithms. In this work, we propose a solution to such a two-fold issue. We use our version of differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis on the interplay between DP and Byzantine resilience has been ignored, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its seemingly detrimental impact on the Byzantine aggregation. In contrast, we leverage the random noise to construct a first-stage aggregation that effectively rejects many existing Byzantine attacks. Moreover, based on another property of our DP variant, we form a second-stage aggregation which provides a final sound filtering. Our protocol follows the principle of co-designing both DP and Byzantine resilience. We provide both theoretical proof and empirical experiments to show our protocol is effective: retaining high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with the previous work, our protocol 1) achieves significantly higher accuracy even in a high privacy regime; 2) works well even when up to 90% distributive workers are Byzantine.

Zihang Xiang^★, Tianhao Wang, Wanyu Lin, and Di Wang
International Conference on Management of Data (SIGMOD 2023)

[NeurIPS] On Private and Robust Bandits [Link] Abstract▼
We study private and robust multi-armed bandits (MABs), where the agent receives Huber's contaminated heavy-tailed rewards and meanwhile needs to ensure differential privacy. We consider both the finite $k$-th raw moment and the finite $k$-th central moment settings for heavy-tailed rewards distributions with $k\ge 2$. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to privacy budget, contamination level, and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation sub-routine \texttt{PRM} that essentially relies on reward truncation and the Laplace mechanism. For the above two different heavy-tailed settings, we give corresponding schemes of \texttt{PRM}, which enable us to achieve nearly-optimal regrets. Moreover, our two proposed truncation-based or histogram-based \texttt{PRM} schemes achieve the optimal trade-off between estimation accuracy, privacy and robustness. Finally, we support our theoretical results and show the effectiveness of our algorithms with experimental studies.

Yulian Wu^★*, Xingyu Zhou^*, Youming Tao and Di Wang
2023 Conference on Neural Information Processing Systems (NeurIPS 2023)

[ICML] Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards [Link] Abstract▼
In this paper we study the problem of (finite horizon tabular) Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP). Compared with the previous studies for private reinforcement learning that typically assume rewards are sampled from some bounded or sub-Gaussian distributions to ensure DP, we consider the setting where reward distributions have only finite $(1+v)$-th moments with some $v\in (0, 1]$. By resorting to robust mean estimators for rewards, we first propose two frameworks for heavy-tailed MDPs, i.e., one is for value iteration and another is for policy optimization. Under each framework, we consider both joint differential privacy (JDP) and local differential privacy (LDP) models. Based on our frameworks, we provide regret upper bounds for both JDP and LDP cases, and show that the moment of distributions and privacy budget have significant impact on regrets. Finally, we establish a lower bound of regret minimization for heavy-tailed MDPs in JDP model by reducing it to the instance-independent lower bound of heavy-tailed multi-armed bandits in DP model. We also show the lower bound for the problem in LDP by adopting some private minimax methods. Our results reveal that there are fundamental differences between the problem of private RL with sub-Gaussian and that with heavy-tailed rewards.

Yulian Wu^★, Xingyu Zhou, Sayak Ray Chowdhury and Di Wang
The 40th International Conference on Machine Learning (ICML 2023)

[AISTATS] Privacy-preserving Sparse Generalized Eigenvalue Problem [Link] Abstract▼
In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA) and sliced inverse regression (SIR). Existing techniques for GEP all fail to consider the protection of sensitive information in the training set. Models learned by such methods can implicitly memorize the details of sensitive information. To address this issue, we provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a $\rho$-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an $\ell_2$-norm estimation error of $\tilde{O}(\frac{d}{n}+\frac{d}{n^2\rho})$ (under some mild assumptions), where $d$ is the dimension and $n$ is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where $d\gg n$, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of $\tilde{O}(\frac{s\log d}{n}+\frac{s\log d}{n^2\rho})$ for various statistical models, where $s$ is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of $\text{Poly}(\log n)$ by providing the lower bounds of PCA and SIR under statistical setting and in the CDP model. Finally, to give a separation between $\epsilon$-DP and $\rho$-CDP for GEP, we also provide the lower bound $\Omega(\frac{d}{n}+\frac{d^2}{n^2\epsilon^2})$ and $\Omega(\frac{s\log d}{n}+\frac{s^2\log^2 d}{n^2\epsilon^2})$ of private minimax risk for PCA, under the statistical setting and $\epsilon$-DP model, in the low and high dimensional sparse cases, respectively.
Experiments on both synthetic and real-world data also support our theoretical analysis.

Lijie Hu^*★, Zihang Xiang^*★, Jiabin Liu, and Di Wang (* equal contribution)
The 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

[AAAI] SEAT: Stable and Explainable Attention [Link] Abstract▼
Currently, the attention mechanism has become a standard fixture in most state-of-the-art natural language processing (NLP) models, not only due to the outstanding performance it could gain, but also due to the plausible innate explanation it provides for the behaviors of neural architectures, which are notoriously difficult to analyze. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embedding vectors, which impedes it from becoming a faithful explanation tool. Thus, a natural question is whether we can find some substitute of the current attention which is more stable and could keep the most important characteristics on explanation and prediction of attention. In this paper, to resolve the problem, we provide a first rigorous definition of such an alternative, namely SEAT (Stable and Explainable Attention). Specifically, a SEAT should have the following three properties: (1) Its prediction distribution is enforced to be close to the distribution based on the vanilla attention; (2) Its top-$k$ indices have large overlaps with those of the vanilla attention; (3) It is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Moreover, we propose a method to get a SEAT, which could be considered as an ad hoc modification for the canonical attention. Finally, through intensive experiments on various datasets, we compare our SEAT with other baseline methods using RNN, BiLSTM and BERT architectures via six different evaluation metrics for model interpretation, stability and accuracy. Results show that SEAT is more stable against different perturbations and randomness while also keeping the explainability of attention, which indicates it is a more faithful explanation.
Moreover, compared with vanilla attention, there is almost no utility (accuracy) degradation for SEAT.

Lijie Hu^*★, Yixin Liu^*, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang (* equal contribution)
The 37th AAAI Conference on Artificial Intelligence (AAAI 2023)
Selected as an Oral paper

[UAI] Differentially Private Stochastic Convex Optimization in (Non)-Euclidean Space Revisited [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) in Euclidean and general $\ell_p^d$ spaces. Specifically, we focus on three settings that are still far from well understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean space; (2) unconstrained DP-SCO in $\ell_p^d$ space; (3) DP-SCO with heavy-tailed data over a constrained and bounded set in $\ell_p^d$ space. For problem (1), for both convex and strongly convex loss functions, we propose methods whose outputs could achieve (expected) excess population risks that are only dependent on the Gaussian width of the constraint set rather than the dimension of the space. Moreover, we also show the bound for strongly convex functions is optimal up to a logarithmic factor. For problems (2) and (3), we propose several novel algorithms and provide the first theoretical results for both cases.

Jinyan Su^★, Changhong Zhao and Di Wang
The 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)

[MobiSys] High-Speed Wireless Communications Inspired Energy Efficient Federated Learning over Mobile Devices [Link] Abstract▼
Energy efficiency is essential for federated learning (FL) over mobile devices and its potential prosperous applications. Different from existing communication efficient FL research efforts, which regard communication energy consumption as the bottleneck, we have observed that with ever increasing wireless transmission speed (e.g., Wi-Fi 5 or 5G), the energy consumption of wireless communications for model updates in FL is significantly reduced and sometimes is smaller than that of local on-device training. Motivated by such observations, in this paper, we propose a high-speed wireless communications inspired energy efficient federated learning over mobile devices (EEFL), whose goal is to reduce the overall energy consumption (computing + communication). In particular, we design a novel energy-aware adaptive local update policy for mobile devices by jointly considering FL performance and energy saving of high-speed wireless transmissions. Furthermore, given the device's local update policy in each FL global round, we advance the dynamic voltage and frequency scaling (DVFS) strategy to minimize local training's energy consumption by keeping GPU and CPU working at appropriate frequencies without triggering thermal throttling. Extensive experimental results with various learning models, datasets, and wireless transmission environments demonstrate the proposed EEFL's superiority over the peer designs in terms of energy efficiency.

Rui Chen, Qiyu Wan, Xinyue Zhang, Xiaoqi Qin, Di Wang, Xin Fu, and Miao Pan
The 21st ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2023)

[EMNLP] GRI: Graph-based Relative Isomorphism of Word Embedding Spaces [Link] Abstract▼
Automated construction of bi-lingual dictionaries using monolingual embedding spaces is a core challenge in machine translation. The end performance of these dictionaries relies on the geometric similarity of individual spaces, i.e., their degree of isomorphism. Existing attempts aimed at controlling the relative isomorphism of different spaces fail to incorporate the impact of lexically different but semantically related words in the training objective. To address this, we propose GRI that combines the distributional training objectives with attentive graph convolutions to unanimously consider the impact of lexical variations of semantically similar words required to define/compute the relative isomorphism of multiple spaces. Experimental evaluation shows that GRI outperforms the existing research by improving the average P@1 by a relative score of up to 63.6%.

Muhammad Asif Ali^★, Yan Hu^★, Jianbin Qin, and Di Wang
Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)

[EMNLP] DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text [Link] Abstract▼
With the rapid progress of large language models (LLMs) and the huge amount of text they generate, it becomes impractical to manually distinguish whether a text is machine-generated. The growing use of LLMs in social media and education prompts us to develop methods to detect machine-generated text, preventing malicious use such as plagiarism, misinformation, and propaganda. In this paper, we introduce two novel zero-shot methods for detecting machine-generated text by leveraging the log rank information. One is called DetectLLM-LRR, which is fast and efficient, and the other is called DetectLLM-NPR, which is more accurate, but slower due to the need for perturbations. Our experiments on three datasets and seven language models show that our proposed methods improve over the state of the art by 3.9 and 1.75 AUROC points absolute. Moreover, DetectLLM-NPR needs fewer perturbations than previous work to achieve the same level of performance, which makes it more practical for real-world use. We also investigate the efficiency–performance trade-off based on users' preference on these two measures and provide intuition for using them in practice effectively.

Jinyan Su, Terry Yue Zhuo, Di Wang, and Preslav Nakov
Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)

[ECAI] Finite Sample Guarantees of Differentially Private Expectation Maximization Algorithm [Link] Abstract▼
(Gradient) Expectation Maximization (EM) is a widely used algorithm for estimating the maximum likelihood of mixture models or incomplete data problems. A major challenge facing this popular technique is how to effectively preserve the privacy of sensitive data. Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM. However, unlike in the non-private case, existing techniques are not yet able to provide finite sample statistical guarantees. To address this issue, we propose in this paper the first DP version of Gradient EM algorithm with statistical guarantees. Specifically, we first propose a new mechanism for privately estimating the mean of a heavy-tailed distribution, which significantly improves a previous result in \cite{wangicml2020}, and it could be extended to the local DP model, which has not been studied before. Next, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM) and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. For the other two models, we provide the first result on finite sample statistical guarantees. Our theory is supported by thorough numerical experiments on both real-world data and synthetic data.

Di Wang^*, Jiahao Ding^*, Lijie Hu, Zejun Xie, Miao Pan, and Jinhui Xu
The 26th European Conference on Artificial Intelligence (ECAI 2023)

[ArabicNLP] GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings [Link] Abstract▼
Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces. Existing attempts aimed at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI that combines the distributional training objectives with multiple isomorphism losses guided by the graph attention network. GARI considers the impact of semantic variations of words in order to define the relative isomorphism of the embedding spaces. Experimental evaluation using the Arabic language data set shows that GARI outperforms the existing research by improving the average P@1 by a relative score of up to 40.95% and 76.80% for in-domain and domain mismatch settings, respectively.

Muhammad Asif Ali^★, Maha Alshmrani^★, Jianbin Qin, Yan Hu^★, and Di Wang
The First Arabic Natural Language Processing Conference (ArabicNLP 2023)

Journal Papers

[JMLR] Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data [Link] Abstract▼
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of the public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is $O(p \alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs (under some mild assumptions). The sample complexity of the public and private data for the algorithm to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is $O(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, if $\alpha$ is not too small (i.e., $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the non-linear regression problem and show similar results for it under the multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real world datasets.
To our best knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLMs and non-linear regression in the NLDP model with public unlabeled data.

Di Wang^*, Lijie Hu^*★, Huanyu Zhang, Marco Gaboardi, and Jinhui Xu (* equal contribution)
Journal of Machine Learning Research, Volume 24, 132 (2023), Pages 1-57

[TIT] Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax Rates, Covariate Quantization, and Uniform Recovery [Link] Abstract▼
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our major standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. Besides, we study compressed sensing where both covariate (i.e., sensing vector) and response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy near optimal error bound. Moreover, by the concentration inequality of product process and covering argument, we establish near minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.

Junren Chen^★, Michael Kwok Po NG, and Di Wang
IEEE Transactions on Information Theory
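
As a rough illustration of the truncate, dither, and uniformly quantize pipeline named in the abstract above, the following Python sketch applies it to scalar samples. It is only a sketch under stated assumptions: the function names, the truncation level `tau`, and the resolution `delta` are hypothetical choices for this example and are not taken from the paper, and the mean estimate is a toy task rather than any of the paper's estimators. The one property it relies on is standard: with a dither drawn uniformly from $[-\Delta/2, \Delta/2]$, the quantizer output is an unbiased version of its (truncated) input.

```python
import math
import random


def truncate(x, tau):
    """Clip x to [-tau, tau] to tame heavy tails (tau is an assumed parameter)."""
    return max(-tau, min(tau, x))


def dithered_quantize(x, delta, rng):
    """Uniform quantizer with resolution delta, preceded by a uniform dither.

    The dither u ~ Uniform[-delta/2, delta/2] is added before rounding x + u
    to the grid {delta * (k + 1/2) : k integer}.
    """
    u = rng.uniform(-delta / 2.0, delta / 2.0)
    return delta * (math.floor((x + u) / delta) + 0.5)


def quantized_mean(samples, tau, delta, seed=0):
    """Average the quantized samples; only quantized values are ever used."""
    rng = random.Random(seed)
    quantized = [dithered_quantize(truncate(x, tau), delta, rng) for x in samples]
    return sum(quantized) / len(quantized)
```

For instance, `quantized_mean([0.3] * 20000, tau=5.0, delta=1.0)` averages values that are each either -0.5 or 0.5, yet the average lands near 0.3 because the dither removes the rounding bias.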

[TIT] High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization [Link] Abstract▼
In this paper, we consider several fundamental machine learning and (sparse) statistical estimation problems from i.i.d. samples of a sub-Gaussian or heavy-tailed distribution. Instead of having the full knowledge of the samples, we focus on the scenario where each entry of the samples is quantized to one or two bits, which plays a critical role in signal processing or machine learning applications. Specifically, we give the first study on the problems of (sparse) covariance matrix estimation, sparse linear regression and low-rank matrix completion. For covariance matrix estimation, compared with the previous results, we extend to the heavy-tailed case where the underlying distribution has bounded fourth order moments. Moreover, we extend to the high dimensional sparse (and heavy-tailed) cases by providing several new estimators which consist of four steps: Truncating, Dithering, Quantizing and Thresholding. For sparse linear model and low-rank matrix completion, we propose new quadratic objective functions for one-bit quantized samples based on our previous estimators. For all of our estimators we also derive estimation errors. In particular, in the sub-Gaussian case, all of our estimators could achieve near optimal rates in the unquantized cases. Finally, extensive experiments on synthetic and real-world data have been carried out to support our theories.

Junren Chen^★, Cheng-Long Wang^★, Michael Kwok Po NG, and Di Wang
IEEE Transactions on Information Theory, Volume 69, 8 (2023), Pages 5151-5187

[Science Advances] PPML-Omics: a Privacy-Preserving federated Machine Learning System Protects Patients’ Privacy from Omic Data [Link] Abstract▼
Modern machine learning models towards various tasks with omic data analysis give rise to threats of privacy leakage of patients involved in those datasets. Despite the advances in different privacy technologies, existing methods tend to introduce too much noise, which hampers model accuracy and usefulness. Here, we built a secure and privacy-preserving machine learning (PPML) system by combining federated learning (FL), differential privacy (DP) and shuffling mechanism. We applied this system to analyze data from three sequencing technologies, and addressed the privacy concern in three major tasks of omic data, namely cancer classification with bulk RNA-seq, clustering with single-cell RNA-seq, and the integration of spatial gene expression and tumour morphology with spatial transcriptomics, under three representative deep learning models. We also examined privacy breaches in depth through privacy attack experiments and demonstrated that our PPML-Omics system could protect patients’ privacy. In each of these applications, PPML-Omics was able to outperform state-of-the-art systems under the same level of privacy guarantee, demonstrating the versatility of the system in simultaneously balancing the privacy-preserving capability and utility in omic data analysis. Furthermore, we gave the theoretical proof of the privacy-preserving capability of PPML-Omics, suggesting the first mathematically guaranteed model with robust and generalizable empirical performance.

Juexiao Zhou^*, Siyuan Chen^*, Yulian Wu^*★, Haoyang Li, Bin Zhang, Longxi Zhou, Yan Hu, Zihang Xiang, Zhongxiao Li, Ningning Chen, Wenkai Han, Di Wang, and Xin Gao (* equal contribution)
Science Advances

[TKDE] Nearly Optimal Rates of Privacy-preserving Sparse Generalized Eigenvalue Problem [Link] Abstract▼
In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA) and sliced inverse regression (SIR). We provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a $\rho$-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an $\ell_2$-norm estimation error of $\tilde{O}(\frac{d}{n}+\frac{d}{n^2\rho})$ (under some mild assumptions), where $d$ is the dimension and $n$ is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where $d\gg n$, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of $\tilde{O}(\frac{s\log d}{n}+\frac{s\log d}{n^2\rho})$ for various statistical models, where $s$ is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of $\text{Poly}(\log n)$ by providing the lower bounds of PCA and SIR under the statistical setting and in the CDP model. In addition, to give a separation between $\epsilon$-DP and $\rho$-CDP for GEP, we also provide the lower bounds $\Omega(\frac{d}{n}+\frac{d^2}{n^2\epsilon^2})$ and $\Omega(\frac{s\log d}{n}+\frac{s^2\log^2 d}{n^2\epsilon^2})$ of private minimax risk for PCA, under the statistical setting and $\epsilon$-DP model, in the low and high dimensional sparse cases respectively. Finally, extensive experiments on both synthetic and real-world data support our theoretical analysis.

Lijie Hu^*★, Zihang Xiang^*★, Jiabin Liu, and Di Wang (* equal contribution)
IEEE Transactions on Knowledge and Data Engineering

[JCSS] PAC Learning Halfspaces in Non-interactive Local Differential Privacy Model with Public Unlabeled Data Abstract▼
In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods could also be used for other private PAC learning problems.

Jinyan Su^★, Jinhui Xu, and Di Wang
Journal of Computer and System Sciences

[CBM] Personalized and Privacy-preserving Federated Heterogeneous Medical Image Analysis with PPPML-HMI [Link] Abstract▼
Heterogeneous data is endemic due to the use of diverse models and settings of devices by hospitals in the field of medical imaging. However, there are few open-source frameworks for federated heterogeneous medical image analysis with personalization and privacy protection without the demand to modify the existing model structures or to share any private data. Here, we proposed PPPML-HMI, a novel open-source learning paradigm for personalized and privacy-preserving federated heterogeneous medical image analysis. To the best of our knowledge, personalization and privacy protection were discussed simultaneously for the first time under the federated scenario by integrating the PerFedAvg algorithm and designing the novel cyclic secure aggregation with the homomorphic encryption algorithm. To show the utility of PPPML-HMI, we applied it to a simulated classification task, namely the classification of healthy people and patients from the RAD-ChestCT Dataset, and one real-world segmentation task, namely the segmentation of lung infections from COVID-19 CT scans. Meanwhile, we applied the improved deep leakage from gradients to simulate adversarial attacks and showed the strong privacy-preserving capability of PPPML-HMI. By applying PPPML-HMI to both tasks with different neural networks, a varied number of users, and sample sizes, we demonstrated the strong generalizability of PPPML-HMI in privacy-preserving federated learning on heterogeneous medical images.

Juexiao Zhou^*, Longxi Zhou^*, Di Wang, Xiaopeng Xu, Haoyang Li, Yuetan Chu, Wenkai Han, and Xin Gao
Computers in Biology and Medicine

[TCS] Gradient Complexity and Non-stationary Views of Differentially Private Empirical Risk Minimization Abstract▼
In this paper we consider the Differentially Private Empirical Risk Minimization (DP-ERM) problem with either convex or non-convex loss functions. For DP-ERM with smooth (strongly) convex loss functions with or without (non)-smooth regularization, we propose several methods that achieve (near) optimal expected excess risks (i.e., utility bounds) with less gradient complexity compared to the previous methods. For DP-ERM with smooth convex loss functions in high dimensions, we give an algorithm which achieves an upper bound with less gradient complexity than previous ones. For DP-ERM with non-convex loss functions, we consider the problem in both low and high dimensional spaces. For the low dimensional case with non-smooth regularizer, we generalize an existing work by using the $\ell_2$ norm of the projected gradient to measure the utility. We also extend the error bound measurement, for the first time, from empirical risk to population risk by using the expected $\ell_2$ norm of the gradient. For the high dimensional case, we first show that by measuring the utility with the Frank-Wolfe gap, it is possible to bound the utility by the Gaussian Width of the constraint set, instead of the dimensionality $p$ of the underlying space. We then demonstrate that the advantages of this result can be achieved by measuring the $\ell_2$ norm of the projected gradient. We finally show that the utility of some special non-convex loss functions can be reduced to a level (i.e., depending only on $\log p$) similar to that of convex loss functions.

Di Wang, and Jinhui Xu
Theoretical Computer Science

2022

Conference Papers

[PODS] High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data [Link] Abstract▼
As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distribution or irregular data in the low dimensional space case. To better understand the challenges arising from irregular data distribution, in this paper we provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$-norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get an error bound of $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$ in the $\epsilon$-DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$ in the $(\epsilon, \delta)$-DP model. In the second part of the paper, we study DP-SCO for sparse learning with heavy-tailed data. We first revisit the sparse linear regression and propose a truncated DP-IHT method whose output could achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity (i.e., $\ell_0$-norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$, which is also near optimal up to a factor of $\tilde{O}(\sqrt{s^*})$, if the loss function is smooth and strongly convex.

Lijie Hu^★, Shuo Ni^★, Hanshen Xiao, and Di Wang
The 41st ACM Symposium on Principles of Database Systems (PODS 2022)
Invited to The ACM Transactions on Database Systems special issue on Best of PODS 2022
ACM CCS 2021 Workshop on Privacy Preserving Machine Learning

[WINE] Truthful Generalized Linear Models [Link] Abstract▼
In this paper we study estimating Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested and are concerned about their privacy when reporting data. Compared with the classical setting, here we aim to design mechanisms that can both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while their outputs should also be close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed with only finite fourth moments. First, motivated by the stationary condition of the maximizer of the likelihood function, we derive a novel private and closed form estimator. Based on the estimator, we propose a mechanism which has the following properties via some appropriate design of the computation and payment scheme for several canonical models such as linear regression, logistic regression and Poisson regression: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output could achieve an error of $o(1)$ to the underlying parameter; (4) it is individually rational for a $(1-o(1))$ fraction of agents in the mechanism; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. In the second part, we consider the linear regression model under a more general setting where both covariates and responses are heavy-tailed and only have finite fourth moments. By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme which have similar properties as in the sub-Gaussian case.

Yuan Qiu^★, Jinyan Liu, and Di Wang
The 18th Conference on Web and Internet Economics (WINE 2022)

[ALT] Faster Rates of Private Stochastic Convex Optimization [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) and provide excess population risks for some special classes of functions that are faster than the previous results of general convex and strongly convex functions. In the first part of the paper, we study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $\theta>1$. Specifically, we first show that under some mild assumptions on the loss functions, there is an algorithm whose output could achieve an upper bound of $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})^\frac{\theta}{\theta-1}) $ and $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively, when $\theta\geq 2$, where $n$ is the sample size and $d$ is the dimension of the space. Then we address the inefficiency issue, improve the upper bounds by $\text{Poly}(\log n)$ factors and extend to the case where $\theta\geq \bar{\theta}>1$ for some known $\bar{\theta}$. Next we show that the excess population risk of population functions satisfying TNC with parameter $\theta\geq 2$ is always lower bounded by $\Omega((\frac{d}{n\epsilon})^\frac{\theta}{\theta-1}) $ and $\Omega((\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively, which matches our upper bounds. In the second part, we focus on a special case where the population risk function is strongly convex. Unlike the previous studies, here we assume the loss function is non-negative and the optimal value of population risk is sufficiently small. With these additional assumptions, we propose a new method whose output could achieve an upper bound of $O(\frac{d\log(1/\delta)}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ and $O(\frac{d^2}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ for any $\tau> 1$ in $(\epsilon,\delta)$-DP and $\epsilon$-DP model respectively if the sample size $n$ is sufficiently large. These results circumvent their corresponding lower bounds in (Feldman et al., 2020) for general strongly convex functions. Finally, we conduct experiments of our new methods on real world data. Experimental results also provide new insights into established theories.

Jinyan Su^★, Lijie Hu^★, and Di Wang
The 33rd International Conference on Algorithmic Learning Theory (ALT 2022)

[AISTATS] Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits [Link] Abstract▼
In this paper we study the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike the previous results which need to assume bounded reward distributions, here we mainly focus on the case where the reward distribution of each arm only has a $(1+v)$-th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we show that the instance-dependent regret bound of our improved algorithm is optimal by showing its lower bound. In the second part of the paper, we study the problem in the $\epsilon$-LDP model. We propose an algorithm which could be seen as a locally private and robust version of the SE algorithm, and show it could achieve (near) optimal rates for both instance-dependent and instance-independent regrets. All of the above results also reveal the differences between the problem of private MAB with bounded rewards and heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be used for other related problems. Finally, experimental results also support our theoretical analysis and show the effectiveness of our algorithms.

Youming Tao^*★, Yulian Wu^*★, Peng Zhao, and Di Wang (* equal contribution)
The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022)
Selected as an Oral paper (Acceptance Rate: 44/1685=2.6%)
ACM CCS 2021 Workshop on Privacy Preserving Machine Learning
ICML 2022 Workshop on Responsible Decision Making in Dynamic Environments (Selected as Contributed Talk)

[AISTATS] On Facility Location Problem in Local Differential Privacy Model [Link] Abstract▼
This paper studies the facility location problem, where the input includes a metric space $(V, d)$ and a facility cost $f_i$ for each facility. We study the uncapacitated facility location (UFL) problem under the constraints imposed by local differential privacy (LDP). Recently, Gupta et al. (2009) and Esencayi et al. (2019) proposed lower and upper bounds for the UFL problem in the central differential privacy (DP) model, where a curator first collects all data before it is processed. In this paper, we focus on the LDP model, where we protect a client's participation in the facility location instance. Under the HST metric, we show that there is a non-interactive $\epsilon$-LDP algorithm achieving an $O(n^{1/4}/\epsilon^2)$-approximation ratio, where $n$ is the size of the metric. On the negative side, we show a lower bound of $\Omega(n^{1/4}/\sqrt{\epsilon})$ on the approximation ratio for any non-interactive $\epsilon$-LDP algorithm. Thus, our results are tight up to a factor of $\epsilon$. Moreover, unlike previous results, our results generalize to non-uniform facility costs.

[alphabetic order] Vincent Cohen-Addad, Yunus Esencayi, Chenglin Fan, Marco Gaboardi, Shi Li, and Di Wang
The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022)

[IJCAI] Private Stochastic Convex Optimization and Sparse Learning with Heavy-tailed Data Revisited [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) with heavy-tailed data, where the gradient of the loss function has bounded moments. Instead of the case where the loss function is Lipschitz or each coordinate of the gradient has bounded second moment studied previously, we consider a relaxed scenario where each coordinate of the gradient only has bounded $(1+v)$-th moment with some $v\in (0, 1]$. First, we start from one-dimensional private mean estimation for heavy-tailed distributions. We propose a novel robust and private mean estimator which is optimal. Based on its idea, we then extend to the general $d$-dimensional space and apply it to DP-SCO with general convex and strongly convex loss functions. We also provide lower bounds for these two classes of loss under our setting and show that our upper bounds are optimal up to a factor of $O(\text{Poly}(d))$. To address the high dimensionality issue, we also study DP-SCO with heavy-tailed gradient under some sparsity constraint (DP sparse learning). We propose a new method and show it is also optimal up to a factor of $O(s^*)$, where $s^*$ is the underlying sparsity of the constraint.

Youming Tao^★, Yulian Wu^★, Xiuzhen Cheng, and Di Wang
The 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022)
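
To make the idea of a robust and private one-dimensional mean estimate concrete, here is a minimal Python sketch of a textbook baseline: clip each sample to $[-\tau, \tau]$, average, and add Laplace noise. This is not the estimator proposed in the paper above; the function name and the parameters `tau`, `epsilon`, and `seed` are hypothetical and chosen for illustration. The noise scale $2\tau/(n\epsilon)$ follows from the fact that replacing one sample changes the clipped mean by at most $2\tau/n$, which gives $\epsilon$-DP under the replace-one notion of neighboring datasets.

```python
import random


def clipped_private_mean(samples, tau, epsilon, seed=0):
    """Clipped mean plus Laplace noise (a baseline sketch, not the paper's method).

    tau     -- assumed clipping level; bounds the influence of heavy-tailed samples
    epsilon -- privacy parameter of the Laplace mechanism
    """
    rng = random.Random(seed)
    n = len(samples)
    clipped = [max(-tau, min(tau, x)) for x in samples]
    mean = sum(clipped) / n

    # Sensitivity of the clipped mean under replacement of one sample is 2*tau/n.
    scale = 2.0 * tau / (n * epsilon)

    # A Laplace(0, scale) draw as the difference of two exponentials with mean `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return mean + noise
```

A smaller `tau` lowers the noise but increases the clipping bias, which is the trade-off that heavy-tailed analyses of this kind have to balance.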

[ACML] On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data [Link] Abstract▼
In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue along this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning, and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods could also be used for other private PAC learning problems.

Jinyan Su^★, Jinhui Xu, and Di Wang
The 14th Asian Conference on Machine Learning (ACML 2022)
Best Paper Award

[ISIT] Differentially Private $\ell_1$-norm Linear Regression with Heavy-tailed Data [Link] Abstract▼
We consider Differentially Private Stochastic Convex Optimization (DP-SCO) with heavy-tailed data. Specifically, we focus on the $\ell_1$ regression in the $\epsilon$-DP model. While most of the previous work focuses on the case where the loss function is Lipschitz, here we only need to assume the variates have bounded moments. First, we study the case where the $\ell_2$ norm of data has bounded second order moment. We propose an algorithm which is based on the exponential mechanism and show that it is possible to achieve an upper bound of $O(\sqrt{\frac{d}{n\epsilon}})$ (with high probability). Next, we relax the assumption to bounded $\theta$-th order moment with some $\theta\in (1, 2)$ and we show that it is possible to achieve an upper bound of $O(({\frac{d}{n\epsilon}})^\frac{\theta-1}{\theta})$. Our algorithms can also be extended to more relaxed cases where only each coordinate of the data has bounded moments.

Di Wang and Jinhui Xu
2022 IEEE International Symposium on Information Theory (ISIT 2022)

2021

Conference Papers

[ALT] Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data [Link] Abstract▼
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to process some additional public but unlabeled data. We first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions), if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. The sample complexity of both public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm), is $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ if $\alpha$ is not too small (i.e., $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of convex GLM with no public data. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the practicality of our algorithms through experiments on both synthetic and real world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and practical algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.

Di Wang^*, Huanyu Zhang^*, Marco Gaboardi and Jinhui Xu (* equal contribution)
The 32nd International Conference on Algorithmic Learning Theory (ALT 2021)
NeurIPS 2019 Workshop on Privacy in Machine Learning

[IJCAI] Differentially Private Pairwise Learning Revisited [Link] Abstract▼
Instead of learning with pointwise loss functions, learning with pairwise loss functions (pairwise learning) has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. However, most of the existing algorithms for pairwise learning fail to take into consideration the privacy issue in their design. To address this issue, previous work studied pairwise learning in the Differential Privacy (DP) model. However, their utilities (population errors) are far from optimal. To address the sub-optimal utility issue, in this paper, we propose new $(\epsilon, \delta)$ or $\epsilon$-DP algorithms for pairwise learning. Specifically, when the loss functions are Lipschitz, smooth and strongly convex, we show that the output of our algorithm achieves an expected population error of $O(\frac{1}{n}+\frac{d\log\frac{1}{\delta}}{n^2\epsilon^2})$ and $O(\frac{1}{n}+\frac{d^2}{n^2\epsilon^2})$ for $(\epsilon, \delta)$-DP and $\epsilon$-DP, respectively, where $n$ is the sample size and $d$ is the dimension of the underlying space. Moreover, for the general convex case, the output of our algorithm achieves an expected population error of $O(\frac{1}{\sqrt{n}}+\frac{\sqrt{d \log\frac{1}{\delta}}}{n\epsilon})$ and $O(\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})$ for $(\epsilon, \delta)$-DP and $\epsilon$-DP, respectively. It is also notable that these upper bounds are optimal (i.e., match the lower bounds). We also conduct extensive experiments on real-world datasets to evaluate the proposed algorithms; the experimental results support our theoretical analysis and show the superiority of our algorithms.

Zhiyu Xue^*★, Shaoyang Yang^*★, Mengdi Huai and Di Wang (* equal contribution)
The 30th International Joint Conference on Artificial Intelligence (IJCAI 2021)

Journal Papers

[TIT] On Sparse Linear Regression in the Local Differential Privacy Model [Link] Abstract▼
In this paper, we study the sparse linear regression problem in the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both non-interactive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely to be achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDP-IHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DP-ERM with sparsity constraints and sparse regression with non-linear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case, we show that the optimal rate of the estimation error can be made to depend only logarithmically on $p$ (i.e., $\log p$) in the local model, where an upper bound is obtained by a label-privacy version of LDP-IHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.

Di Wang and Jinhui Xu
IEEE Transactions on Information Theory, Volume 67, no. 2, Pages 1182-1200, Feb. 2021

[TCS] Inferring Ground Truth From Crowdsourced Data Under Local Attribute Differential Privacy [Link] Abstract▼
The problem of ground truth inference under the local differential privacy (LDP) model has recently been studied. However, this problem is still not well understood and even some basic questions have not been solved yet. First, it is still unknown what the average error of the private estimators to the underlying ground truth is. Second, we do not know whether we can infer the ability of each user under the LDP model and what the estimation error w.r.t. the underlying users' abilities is. Finally, previous work only shows through experiments that their methods have better performance than the private majority voting algorithm. However, there is still no theoretical result that shows this superiority formally or mathematically. In this paper, we partially solve these problems by studying the ground truth inference problem under the local attribute differential privacy (LADP) model, and propose a new algorithm called the private Dawid-Skene method, which is motivated by the classical Dawid-Skene method. Specifically, we first provide the estimation errors for both the ability of users and the ground truth under some assumptions of the problem if the algorithm starts with some appropriate initial vector. Moreover, we propose an explicit instance and show that the estimation error of the ground truth achieved by the private majority voting algorithm is always greater than the error achieved by our method. To the best of our knowledge, this is the first result showing the explicit estimation errors for both the ability of users and the ground truth for the problem. Also, this paper is the first result on theoretically comparing with the private majority voting algorithm.

Di Wang and Jinhui Xu
Theoretical Computer Science, Volume 865, 14 April 2021, Pages 85-98

[TCS] Differentially Private High Dimensional Sparse Covariance Matrix Estimation [Link] Abstract▼
In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimensions. We propose a new method, called DP-Thresholding, to achieve a non-trivial $\ell_2$-norm based error bound, which is significantly better than the existing ones from adding noise directly to the empirical covariance matrix. We also extend the $\ell_2$-norm based error bound to a general $\ell_w$-norm based one for any $1\leq w\leq \infty$, and show that they share the same upper bound asymptotically. Our approach can be easily extended to local differential privacy. Experiments on the synthetic datasets show consistent results with our theoretical claims.

Di Wang and Jinhui Xu
Theoretical Computer Science, Volume 865, 14 April 2021, Pages 119-130

2020

Conference Papers

[ICML] On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data [Link] Abstract▼
In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenge, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{n\epsilon^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the expected excess population risk to $\tilde{O}(\frac{ d^2}{ n\epsilon^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{n\epsilon^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(n\epsilon^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, with high probability. Experiments on both synthetic and real-world datasets suggest that our algorithms can effectively deal with the challenges caused by data irregularity.

Di Wang^*, Hanshen Xiao^*, Srini Devadas and Jinhui Xu (* equal contribution)
The 37th International Conference on Machine Learning (ICML 2020)

[AAAI] Scalable Estimating Stochastic Linear Combination of Non-linear Regressions [Link] Abstract▼
In this paper, we study the problem of estimating a stochastic linear combination of non-linear regressions, which has a close connection with many machine learning and statistical models such as non-linear regressions, Single Index Models, Multi-index Models, Varying Coefficient Index Models and Two-layer Neural Networks. Specifically, we first show that under some mild assumptions, if the variates are multivariate Gaussian, then there is an algorithm whose outputs have $\ell_2$-norm estimation error of $O(\sqrt{\frac{p}{n}})$ with high probability, where $p$ is the dimensionality and $n$ is the size of samples. Moreover, we extend to the bounded sub-Gaussian case by using the zero-bias transformation, which could be seen as a generalization of the classical Stein's lemma. We show that with some additional assumptions there is an algorithm whose outputs have $\ell_\infty$-norm estimation error of $O(\frac{1}{\sqrt{p}}+\sqrt{\frac{p}{n}})$ with high probability. Finally, for both Gaussian and sub-Gaussian cases we propose a scalable algorithm based on a sub-sampling method and show that when the sub-sample size is large enough the estimation errors will be almost the same as the previous ones. Experimental results for both Gaussian and sub-Gaussian cases support our theoretical results. To the best of our knowledge, this is the first paper to study and provide theoretical guarantees for the stochastic linear combination of non-linear regressions model.

Di Wang^*, Xiangyu Guo^*, Chaowen Guan, Shi Li and Jinhui Xu (* equal contribution)
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

[AAAI] Pairwise Learning with Differential Privacy Guarantees [Link] Abstract▼
Pairwise learning has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. Many machine learning tasks can be categorized as pairwise learning, such as AUC maximization and metric learning. Existing techniques for pairwise learning all fail to take into consideration a critical issue in their design, i.e., the protection of sensitive information in the training set. Models learned by such algorithms can implicitly memorize the details of sensitive information, which offers opportunity for malicious parties to infer it from the learned models. To address this challenging issue, in this paper, we propose several differentially private pairwise learning algorithms for both online and offline settings. Specifically, for the online setting, we first introduce a differentially private algorithm (called OnPairStrC) for strongly convex loss functions. Then, we extend this algorithm to general convex loss functions and give another differentially private algorithm (called OnPairC). For the offline setting, we also present two differentially private algorithms (called OffPairStrC and OffPairC) for strongly and general convex loss functions, respectively. These proposed algorithms can not only learn the model effectively from the data but also provide strong privacy protection guarantee for sensitive information in the training set. Extensive experiments on real-world datasets are conducted to evaluate the proposed algorithms and the experimental results support our theoretical analysis.

Mengdi Huai^*, Di Wang^*, Chenglin Miao, Jinhui Xu and Aidong Zhang (* equal contribution)
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

[AAAI] Towards Interpretation of Pairwise Learning [Link] Abstract▼
Recently, increasing attention has been paid to an important family of learning problems called pairwise learning, in which the associated loss functions depend on pairs of instances. Despite the tremendous success of pairwise learning in many real-world applications, the lack of transparency behind the learned pairwise models makes it difficult for users to understand how particular decisions are made by these models, which further impedes users from trusting the predicted results. To tackle this problem, in this paper, we study feature importance scoring as a specific approach to the problem of interpreting the predictions of black-box pairwise models. Specifically, we first propose a novel adaptive Shapley-value-based interpretation method, based on which a vector of importance scores associated with the underlying features of a testing instance pair can be adaptively calculated with the consideration of feature correlations, and these scores can be used to indicate which features make key contributions to the final prediction. Considering that Shapley-value-based methods are usually computationally challenging, we further propose a novel robust approximation interpretation method for pairwise models. This method is not only much more efficient but also robust to data noise. To the best of our knowledge, we are the first to investigate how to enable interpretation in pairwise learning. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods.

Mengdi Huai, Di Wang, Chenglin Miao and Aidong Zhang
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

[ECML-PKDD] Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method [Link] Abstract▼
It has been shown recently that many non-convex objective/loss functions in machine learning and deep learning are strict saddle. This means that finding a second-order stationary point (i.e., an approximate local minimum), and thus escaping saddle points, is sufficient for such functions to obtain a classifier with good generalization performance. Existing algorithms for escaping saddle points, however, all fail to take into consideration a critical issue in their designs, that is, the protection of sensitive information in the training set. Models learned by such algorithms can often implicitly memorize the details of sensitive information, and thus offer opportunities for malicious parties to infer it from the learned models. In this paper, we investigate the problem of privately escaping saddle points and finding a second-order stationary point of the empirical risk of a non-convex loss function. The previous result on this problem is mainly of theoretical importance and has several issues (e.g., high sample complexity and non-scalability) which hinder its applicability, especially on big data. To deal with these issues, we propose in this paper a new method called Differentially Private Trust Region, and show that it outputs a second-order stationary point with high probability and with lower sample complexity than the existing one. Moreover, we also provide a stochastic version of our method (along with some theoretical guarantees) to make it faster and more scalable. Experiments on benchmark datasets suggest that our methods are indeed more efficient and practical than the previous one.

Di Wang and Jinhui Xu
The 2020 European Conference on Machine Learning (ECML-PKDD 2020)

[BIBM] Global Interpretation for Patient Similarity Learning [Link] Abstract▼
As an important family of learning problems, pairwise learning has received much attention in recent years. Since pairwise learning involves pairs of instances in its loss function, it is more capable of modeling the relative relationships between instances compared with traditional pointwise learning (e.g., classification). In practice, many machine learning and data mining tasks can be categorized as pairwise learning. Although pairwise learning has achieved tremendous success in many real-world applications, the lack of transparency behind the behavior of the learned pairwise model impedes users from trusting the predicted results, which hampers its further applications in the real world. To tackle this problem, in this paper, we investigate how to enable interpretation in pairwise learning and propose a global interpretation method for pairwise learning. Based on the proposed global interpretation method, we can identify a minimal sufficient subset of data features that are sufficient in themselves to justify the global predictions made by the pairwise model. The identified minimal sufficient feature subset can help us better understand the overall behaviors of the learned pairwise model across different subpopulations of instance pairs. To the best of our knowledge, this is the first work that provides global interpretation for pairwise learning. We also conduct extensive experiments on real-world datasets to evaluate the performance of the proposed global interpretation method.

Mengdi Huai, Chenglin Miao, Jinduo Liu, Di Wang, Jingyuan Chou and Aidong Zhang
The 2020 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2020)
Selected as Regular Paper (Acceptance Rate: 19.4%)

Journal Papers

[JMLR] Empirical Risk Minimization in the Non-interactive Local Model of Differential Privacy [Link] Abstract▼
In this paper, we study the Empirical Risk Minimization (ERM) problem in the non-interactive Local Differential Privacy (LDP) model. We first show that if the loss function is $(\infty, T)$-smooth, by using the Bernstein polynomial approximation we can avoid a dependency of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{-p}$). This answers a question from (Smith et al., 2017). Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound of these algorithms is asymptotically the same as the original one. With some additional assumptions, we also give an algorithm which is more efficient for the server. Based on different types of polynomial approximations, we propose (efficient) non-interactive locally differentially private algorithms for learning the set of $k$-way marginal queries and the set of smooth queries. Moreover, we study the case of $1$-Lipschitz generalized linear convex loss functions and show that there is an $(\epsilon, \delta)$-LDP algorithm whose sample complexity for achieving error $\alpha$ is only linear in the dimensionality $p$ and quasi-polynomial in other terms. To prove this, we first show that the conclusion holds for the hinge loss function. Then, we extend the result to any $1$-Lipschitz generalized linear convex loss function by showing that every such function can be approximated by a linear combination of hinge loss functions and some linear functions. Our results use a polynomial of inner product approximation technique. Then we apply our technique to the Euclidean median problem and show that its sample complexity needs only to be quasi-polynomial in $p$, which is the first result with a sub-exponential sample complexity in $p$ for non-generalized linear loss functions.

Di Wang, Marco Gaboardi, Adam Smith and Jinhui Xu
Journal of Machine Learning Research, Volume 21, 200 (2020), Pages 1-39

[MLJ] Robust High Dimensional Expectation Maximization Algorithm via Trimmed Hard Thresholding [Link] Abstract▼
In this paper, we study the problem of estimating latent variable models with arbitrarily corrupted samples in the high dimensional case, i.e., $d\gg n$, where the underlying parameter is sparse. Specifically, we propose a method called Trimmed (Gradient) Expectation Maximization, which attaches a gradient trimming step and a hard thresholding step to the Expectation step (E-step) and Maximization step (M-step), respectively. In particular, under some mild assumptions and with an appropriate initialization, we show that the algorithm is corruption-free and converges to the (near) optimal statistical rate geometrically when the fraction of corrupted samples satisfies $\alpha\leq O(\frac{1}{\sqrt{n}})$. Moreover, we apply our general framework to three canonical examples: mixture of Gaussians, mixture of regressions and linear regression with missing covariates. Experiments also support our theoretical analysis.

Di Wang^*, Xiangyu Guo^*, Shi Li and Jinhui Xu (* equal contribution)
Machine Learning, 109, 2283-2311 (2020)

[TCS] Tight Lower Bound of Locally Differentially Private Sparse Covariance Matrix Estimation [Link] Abstract▼
In this paper, we study the sparse covariance matrix estimation problem in the local differential privacy model, and give a lower bound of $\Omega(\frac{s^2\log p}{n\epsilon^2})$ on the $\epsilon$ non-interactive private minimax risk in the metric of squared spectral norm, where $s$ is the row sparsity of the underlying covariance matrix, $n$ is the sample size, and $p$ is the dimensionality of the data. We show that the lower bound is actually tight, as it matches a previous upper bound. Our main technique for achieving this lower bound is a general framework, called General Private Assouad Lemma, which is a considerable generalization of the previous private Assouad lemma and can be used as a general method for bounding the private minimax risk of matrix-related estimation problems.

Di Wang and Jinhui Xu
Theoretical Computer Science, Volume 815, 2 May 2020, Pages 47-59

[TCS] Principal Component Analysis in the Local Differential Privacy Model [Link] Abstract▼
In this paper, we study the Principal Component Analysis (PCA) problem under the (distributed) non-interactive local differential privacy model. For the low dimensional case (i.e., $p \ll n$), we show the optimal rate of $\Theta(\frac{kp}{n\epsilon^2})$ (omitting the eigenvalue terms) for the private minimax risk of the $k$-dimensional PCA using the squared subspace distance as the measurement, where $n$ is the sample size and $\epsilon$ is the privacy parameter. For the high dimensional (i.e., $p\gg n$) row sparse case, we first give a lower bound of $\Omega(\frac{ks\log p }{n\epsilon^2})$ on the private minimax risk, where $s$ is the underlying sparsity parameter. Then we provide an efficient algorithm to achieve the upper bound of $O(\frac{s^2\log p}{n\epsilon^2})$. Experiments on both synthetic and real world datasets confirm our theoretical guarantees.

Di Wang and Jinhui Xu
Theoretical Computer Science, Volume 809, 24 February 2020, Pages 296-312

[Neurocomputing] Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably [Link] Abstract▼
Recently, many machine learning and statistical models such as non-linear regressions, the Single Index, Multi-index, Varying Coefficient Index Models and Two-layer Neural Networks have been shown to reduce to, or be special cases of, a new model called the Stochastic Linear Combination of Non-linear Regressions model. However, due to the high non-convexity of the problem, no previous work has studied how to estimate the model. In this paper, we provide the first study on how to estimate the model efficiently and scalably. Specifically, we first show that with some mild assumptions, if the variate vector $x$ is multivariate Gaussian, then there is an algorithm whose output vectors have $\ell_2$-norm estimation errors of $O(\sqrt{\frac{p}{n}})$ with high probability, where $p$ is the dimension of $x$ and $n$ is the number of samples. The key idea of the proof is based on an observation motivated by Stein's lemma. Then we extend our result to the case where $x$ is bounded and sub-Gaussian using the zero-bias transformation, which could be seen as a generalization of the classic Stein's lemma. We also show that with some additional assumptions there is an algorithm whose output vectors have $\ell_\infty$-norm estimation errors of $O(\frac{1}{\sqrt{p}}+\sqrt{\frac{p}{n}})$ with high probability. We also provide a concrete example to show that there exists some link function which satisfies the previous assumptions. Finally, for both Gaussian and sub-Gaussian cases we propose a faster sub-sampling based algorithm and show that when the sub-sample sizes are large enough, the estimation errors will not be sacrificed by too much. Experiments for both cases support our theoretical results. To the best of our knowledge, this is the first work that studies and provides theoretical guarantees for the stochastic linear combination of non-linear regressions model.

Di Wang^*, Xiangyu Guo^*, Chaowen Guan, Shi Li and Jinhui Xu (* equal contribution)
Neurocomputing, Volume 399, 25 July 2020, Pages 129-140

2019

Conference Papers

[ICML] Differentially Private Empirical Risk Minimization with Non-convex Loss Functions [Link] Abstract▼
We study the problem of Empirical Risk Minimization (ERM) with (smooth) non-convex loss functions under the differential-privacy (DP) model. Existing approaches for this problem mainly adopt gradient norms to measure the error, which in general cannot guarantee the quality of the solution. To address this issue, we first study the expected excess empirical (or population) risk, which was primarily used as the utility to measure the quality for convex loss functions. Specifically, we show that the excess empirical (or population) risk can be upper bounded by $\tilde{O}(\frac{d\log (1/\delta)}{\log n\epsilon^2})$ in the $(\epsilon, \delta)$-DP setting, where $n$ is the data size and $d$ is the dimensionality of the space. The $\frac{1}{\log n}$ term in the empirical risk bound can be further improved to $\frac{1}{n^{\Omega(1)}}$ (when $d$ is a constant) by a highly non-trivial analysis on the time-average error. To obtain more efficient solutions, we also consider the connection between achieving differential privacy and finding an approximate local minimum. In particular, we show that when the size $n$ is large enough, there are $(\epsilon, \delta)$-DP algorithms which can find an approximate local minimum of the empirical risk with high probability in both the constrained and non-constrained settings. These results indicate that one can escape saddle points privately.

Di Wang, Changyou Chen and Jinhui Xu
The 36th International Conference on Machine Learning (ICML 2019)

[ICML] On Sparse Linear Regression in the Local Differential Privacy Model [Link] Abstract▼
In this paper, we study the sparse linear regression problem under the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both non-interactive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDP-IHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DP-ERM with sparsity constraints and sparse regression with non-linear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case, we show that the optimal rate of the error estimation can be made logarithmically depending on $p$ (i.e., $\log p$) in the local model, where an upper bound is obtained by a label-privacy version of LDP-IHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.

Di Wang and Jinhui Xu
The 36th International Conference on Machine Learning (ICML 2019)
Selected as Long Talk (Acceptance Rate: 140/3424=4.1%)
NeurIPS 2018 Workshop on Privacy Preserving Machine Learning

[NeurIPS] Facility Location Problem in Differential Privacy Model Revisited [Link] Abstract▼
In this paper, we study the uncapacitated facility location problem in the model of differential privacy (DP) with uniform facility cost. Specifically, we first show that, under the hierarchically well-separated tree (HST) metrics and the super-set output setting that was introduced in (Gupta et al., 2010), there is an $\epsilon$-DP algorithm that achieves an $O(\frac{1}{\epsilon})$ (expected multiplicative) approximation ratio; this implies an $O(\frac{\log n}{\epsilon})$ approximation ratio for the general metric case, where $n$ is the size of the input metric. These bounds improve the best-known results given by (Gupta et al., 2010). In particular, our approximation ratio for HST-metrics is independent of $n$, and the ratio for general metrics is independent of the aspect ratio of the input metric. On the negative side, we show that the approximation ratio of any $\epsilon$-DP algorithm is lower bounded by $\Omega(\frac{1}{\sqrt{\epsilon}})$, even for instances on HST metrics with uniform facility cost, under the super-set output setting. The lower bound shows that the dependence of the approximation ratio for HST metrics on $\epsilon$ cannot be removed or greatly improved. Our novel methods and techniques for both the upper and lower bounds may find additional applications.

[alphabetic order] Yunus Esencayi, Marco Gaboardi, Shi Li and Di Wang
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2019

[ALT] Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations [Link] Abstract▼
In this paper, we study the Empirical Risk Minimization problem in the non-interactive Local Differential Privacy (LDP) model. First, we show that for the hinge loss function, there is an $(\epsilon, \delta)$-LDP algorithm whose sample complexity for achieving an error of $\alpha$ is only linear in the dimensionality $p$ and quasi-polynomial in other terms. Then, we extend the result to any $1$-Lipschitz generalized linear convex loss function by showing that every such function can be approximated by a linear combination of hinge loss functions and some linear functions. Finally, we apply our technique to the Euclidean median problem and show that its sample complexity needs only to be quasi-polynomial in $p$, which is the first result with a sub-exponential sample complexity in $p$ for non-generalized linear loss functions. Our results are based on a technique, called polynomial of inner product approximation, which may be applicable to other problems.

Di Wang, Adam Smith and Jinhui Xu
The 30th International Conference on Algorithmic Learning Theory (ALT 2019)

[AAAI] Differentially Private Empirical Risk Minimization with Smooth Non-convex Loss Functions: A Non-stationary View [Link] Abstract▼
In this paper, we study the Differentially Private Empirical Risk Minimization (DP-ERM) problem with non-convex loss functions and give several upper bounds for the utility in different settings. We first consider the problem in low-dimensional space. For DP-ERM with a non-smooth regularizer, we generalize an existing work by measuring the utility using the $\ell_2$ norm of the projected gradient. Also, we extend the error bound measurement, for the first time, from empirical risk to population risk by using the expected $\ell_2$ norm of the gradient. We then investigate the problem in high dimensional space, and show that by measuring the utility with the Frank-Wolfe gap, it is possible to bound the utility by the Gaussian Width of the constraint set, instead of the dimensionality $p$ of the underlying space. We further demonstrate that the advantages of this result can be achieved by the measure of the $\ell_2$ norm of the projected gradient. A somewhat surprising discovery is that although the two kinds of measurements are quite different, their induced utility upper bounds are asymptotically the same under some assumptions. We also show that the utility of some special non-convex loss functions can be reduced to a level (i.e., depending only on $\log p$) similar to that of convex loss functions. Finally, we test our proposed algorithms on both synthetic and real world datasets and the experimental results confirm our theoretical analysis.

Di Wang and Jinhui Xu
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)
Selected as Oral Presentation (Acceptance Rate: 460/7095=6.5%)

[IJCAI] Lower Bound of Locally Differentially Private Sparse Covariance Matrix Estimation [Link] Abstract▼
In this paper, we study the sparse covariance matrix estimation problem in the local differential privacy model, and give a non-trivial lower bound on the non-interactive private minimax risk in the metric of squared spectral norm. We show that the lower bound is actually tight, as it matches a previous upper bound. Our main technique for achieving this lower bound is a general framework, called General Private Assouad Lemma, which is a considerable generalization of the previous private Assouad lemma and can be used as a general method for bounding the private minimax risk of matrix-related estimation problems.

Di Wang and Jinhui Xu
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

[IJCAI] Principal Component Analysis in the Local Differential Privacy Model [Link] Abstract▼
In this paper, we study the Principal Component Analysis (PCA) problem under the (distributed) non-interactive local differential privacy model. For the low dimensional case, we show the optimal rate for the private minimax risk of the $k$-dimensional PCA using the squared subspace distance as the measurement. For the high dimensional row sparse case, we first give a lower bound on the private minimax risk. Then we provide an efficient algorithm to achieve a near optimal upper bound. Experiments on both synthetic and real world datasets confirm the theoretical guarantees of our algorithms.

Di Wang and Jinhui Xu
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

[IJCAI] Privacy-aware Synthesizing for Crowdsourced Data [Link] Abstract▼
The prevalence of the World Wide Web enables data collectors to easily share with the public the data they collect from a large crowd of users. Although releasing crowdsourced data on the web helps data analyzers conduct statistical analysis, it may violate crowd users' data privacy, which has been a major obstacle to releasing crowdsourced data. A potential way to address this problem is to employ traditional differential privacy-based mechanisms and perturb the data with some noise before publishing them. However, such noise perturbation mechanisms offer little data utility because crowdsourced datasets contain massive amounts of conflicting data records with large domains. In particular, this degraded utility stems from two factors: first, the originally collected crowdsourced data usually contain conflicting data; second, the noise needed to guarantee differential privacy may be proportional to the number of data records or the domain of the input data, which renders the released crowdsourced data useless. To address the above challenges, we propose a novel privacy-aware synthesizing method for crowdsourced data. In this method, the data collectors first learn the underlying distributions of the crowdsourced data by taking each user's fine-grained reliability degrees into account. Then, a set of candidate synthetics are sampled from the learned distributions. Finally, these sampled candidate synthetics are subjected to privacy tests, and only those candidate synthetics which pass the privacy tests are allowed to be safely released. The proposed method not only provides strong privacy protection for individual users but also generates high-utility synthetic data. The desirable performance of the proposed method is verified via theoretical analysis and extensive experiments conducted on both real-world and synthetic datasets.

Mengdi Huai, Di Wang, Chenglin Miao, Jinhui Xu, Aidong Zhang
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

[CISS] Estimating Sparse Covariance Matrix Under Differential Privacy via Thresholding [Link] Abstract▼
In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimensions. We propose a new method, called DP-Thresholding, to achieve a non-trivial $\ell_2$-norm based error bound, which is significantly better than the existing ones from adding noise directly to the empirical covariance matrix. Experiments on the synthetic datasets show consistent results with our theoretical claims.

Di Wang, Jinhui Xu and Yang He
The 53rd Annual Conference on Information Sciences and Systems (CISS 2019)

Journal Papers

[Neurocomputing] Faster Large Scale Constrained Linear Regression via Two-Step Preconditioning [Link] Abstract▼
In this paper, we study the large scale constrained linear regression problem and propose a two-step preconditioning method, which is based on some recent developments on random projection, sketching techniques and convex optimization methods. Combining the method with (accelerated) mini-batch SGD, we can achieve an approximate solution with a time complexity lower than that of the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.

Di Wang and Jinhui Xu
Neurocomputing, Volume 364, 28 October 2019, Pages 280-296

2018

Conference Papers

[NeurIPS] Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited [Link] Abstract▼
In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions ($p\ll n$), we first show that if the loss function is $(\infty, T)$-smooth, the sample complexity needed to achieve error $\alpha$ no longer has to depend on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{-p}$), which answers a question in (Smith et al. 2017). Our approach is based on polynomial approximation. Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound is asymptotically the same as the original one. With some additional assumptions, we also give an efficient algorithm for the server. In the case of high dimensions ($n\ll p$), we show that if the loss function is a convex generalized linear function, the error can be bounded by using the Gaussian width of the constrained set, instead of $p$, which improves the bound in (Smith et al. 2017). Our techniques can be extended to some related problems, such as $k$-way marginal queries and smooth queries.

Di Wang, Marco Gaboardi and Jinhui Xu
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2018

[AAAI] Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning [Link] Abstract▼
In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithm combines mini-batch SGD with a new method called two-step preconditioning to achieve an $\epsilon$-accuracy solution for the low precision case, and has a lower time complexity than the state-of-the-art techniques. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.

Di Wang and Jinhui Xu
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018)
Selected as Oral Presentation (Acceptance Rate: 411/3800=10.8%)

[GlobalSip] Differentially Private Sparse Inverse Covariance Estimation [Link] Abstract▼
In this paper, we give the first study of the sparse inverse covariance estimation problem under differential privacy. First, we propose an $\epsilon$-differentially private algorithm via output perturbation, which is based on the sensitivity of the optimization problem and the Wishart mechanism. Building on this idea, we propose a general covariance perturbation method: for $\epsilon$-differential privacy we analyze the Laplacian and Wishart mechanisms, and for $(\epsilon,\delta)$-differential privacy we analyze the Gaussian and Wishart mechanisms. Moreover, we extend the covariance perturbation algorithm to the distributed setting and to local differential privacy. Experiments on synthetic and benchmark datasets also support our theoretical analysis.

Di Wang, Mengdi Huai and Jinhui Xu
2018 6th IEEE Global Conference on Signal and Information Processing (2018 GlobalSip)
Selected as Oral Presentation

2017

Conference Papers

[NeurIPS] Differentially Private Empirical Risk Minimization Revisited: Faster and More General [Link] Abstract▼
In this paper, we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss functions with or without (non)-smooth regularization, we give algorithms that achieve either optimal or near-optimal utility bounds with less gradient complexity than previous work. For ERM with smooth convex loss functions in the high-dimensional ($p\gg n$) setting, we give an algorithm that achieves the upper bound with less gradient complexity than previous ones. Finally, we generalize the expected excess empirical risk from convex loss functions to non-convex ones satisfying the Polyak-Lojasiewicz condition and give a tighter upper bound on the utility than the one in (Zhang et al. 2017).

Di Wang, Minwei Ye and Jinhui Xu
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2017