Di Wang's Publications
(★ my Master/PhD/Intern/Visiting students)
2023
Conference Papers
- High-Speed Wireless Communications Inspired Energy Efficient Federated Learning over Mobile Devices. Abstract▼
Energy efficiency is essential for federated learning (FL) over mobile devices and its potential prosperous applications. Different from existing communication efficient FL research efforts, which regard communication energy consumption as the bottleneck, we have observed that with ever increasing wireless transmission speed (e.g., Wi-Fi 5 or 5G), the energy consumption of wireless communications for model updates in FL is significantly reduced and sometimes is smaller than that of local on-device training. Motivated by such observations, in this paper, we propose a high-speed wireless communications inspired energy efficient federated learning
over mobile devices (EEFL), whose goal is to reduce the overall energy consumption (computing + communication). In particular, we design a novel energy-aware adaptive local update policy for mobile devices by jointly considering FL performance and energy saving of high-speed wireless transmissions. Furthermore, given the device's local update policy in each FL global round, we advance the dynamic voltage and frequency scaling (DVFS) strategy to minimize local training's energy consumption by keeping GPU and CPU working at appropriate frequencies without triggering thermal throttling. Extensive experimental results with various learning models, datasets, and wireless transmission environments demonstrate the proposed EEFL's superiority over the peer designs in terms of energy efficiency.
Rui Chen, Qiyu Wan, Xinyue Zhang, Xiaoqi Qin, Di Wang, Xin Fu, and Miao Pan.
The 21st ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2023).
- On Practical Differentially Private and Byzantine-resilient Federated Learning. Abstract▼
Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own track, solutions that consider both remain sparse. This is due to difficulties in reconciling privacy-preserving and Byzantine-resilient algorithms.
In this work, we propose a solution to such a two-fold issue. We use our version of differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis on the interplay between DP and Byzantine resilience has been ignored, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its seemingly detrimental impact on the Byzantine aggregation. In contrast, we leverage the random noise to construct a first-stage aggregation that effectively rejects many existing Byzantine attacks. Moreover, based on another property of our DP variant, we form a second-stage aggregation which provides a final sound filtering. Our protocol follows the principle of co-designing both DP and Byzantine resilience.
We provide both theoretical proof and empirical experiments to show our protocol is effective: retaining high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with the previous work, our protocol 1) achieves significantly higher accuracy even in a high privacy regime; 2) works well even when up to 90% distributive workers are Byzantine.
Zihang Xiang★, Tianhao Wang, Wanyu Lin, and Di Wang.
International Conference on Management of Data (SIGMOD 2023).
-
Privacy-preserving Sparse Generalized Eigenvalue Problem. Abstract▼
In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a
number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA) and sliced inverse regression (SIR). Existing techniques for GEP all fail to consider the protection of sensitive information in the training set. Models learned by such methods can implicitly
memorize the details of sensitive information.
To address this issue, we provide the first study on GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a $\rho$-Concentrated DP (CDP) method namely DP-Rayleigh Flow and show if the initial vector is close enough to the optimal vector, its output has an $\ell_2$-norm estimation error of $\tilde{O}(\frac{d}{n}+\frac{d}{n^2\rho})$ (under some mild assumptions), where $d$ is the dimension and $n$ is the sample size. Next, we discuss how to find such a initial parameter privately. In the high dimensional sparse case where $d\gg n$, we propose the DP-Truncated Rayleigh Flow method whose output could achieve an error of $\tilde{O}(\frac{s\log d}{n}+\frac{s\log d}{n^2\rho})$ for various statistical models, where $s$ is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of $\text{Poly}(\log n)$ by providing the lower bounds of PCA and SIR under statistical setting and in the CDP model. Finally, to give a separation between $\epsilon$-DP and $\rho$-CDP for GEP, we also provide the lower bound $\Omega(\frac{d}{n}+\frac{d^2}{n^2\epsilon^2})$ and $\Omega(\frac{s\log d}{n}+\frac{s^2\log^2 d}{n^2\epsilon^2})$ of private minimax risk for PCA, under the statistical setting and $\epsilon$-DP model, in low and high dimensional sparse case respectively. Experiments on both synthetic and real-world data also support our theoretical analysis.
Lijie Hu*★, Zihang Xiang*★, Jiabin Liu, and Di Wang. (* equal contribution)
The 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023).
-
SEAT: Stable and Explainable Attention. Abstract▼
Currently, attention mechanism becomes a standard fixture in most state-of-the-art natural language processing (NLP) models, not only due to outstanding performance it could gain, but also due to plausible innate explanation for the behaviors of neural architectures it provides, which is notoriously difficult to analyze. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embedding vectors, which impedes it from becoming a faithful explanation tool. Thus, a natural question is whether we can find some substitute of the current attention which is more stable and could keep the most important characteristics on explanation and prediction of attention. In this paper, to resolve the problem, we provide a first rigorous definition of such alternate namely SEAT (Stable and Explainable Attention). Specifically, a SEAT should has the following three properties: (1) Its prediction distribution is enforced to be close to the distribution based on the vanilla attention; (2) Its top-$k$ indices have large overlaps with those of the vanilla attention; (3) It is robust w.r.t perturbations, i.e., any slight perturbation on SEAT will not change the prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Moreover we propose a method to get a SEAT, which could be considered as an ad hoc modification for the canonical attention. Finally, through intensive experiments on various datasets, we compare our SEAT with other baseline methods using RNN, BiLSTM and BERT architectures via six different evaluation metrics for model interpretation, stability and accuracy. Results show that SEAT is more stable against different perturbations and randomness while also keeps the explainability of attention, which indicates it is a more faithful explanation.
Moreover, compared with vanilla attention, there is almost no utility (accuracy) degradation for SEAT.
Lijie Hu*★, Yixin Liu *, Ninghao Liu , Mengdi Huai, Lichao Sun, and Di Wang. (* equal contribution)
The 37th AAAI Conference on Artificial Intelligence (AAAI 2023).
Selected as an Oral paper.
Journal Papers
-
High Dimensional Sparse Statistical Estimation Under One-bit Quantization. [Link] Abstract▼
In this paper, we consider several fundamental machine learning and (sparse) statistical estimation problems from i.i.d samples of a sub-Gaussian or heavy-tailed distribution. Instead of having the full knowledge of the samples, we focus on the scenario where each entry of these the samples is quantized to one or two bits, which plays a critical role in signal processing or machine learning applications. Specifically, we give the first study on the problems of (sparse) covariance matrix estimation, sparse linear regression and low-rank matrix completion. For covariance matrix estimation, compared with the previous results, we extend to the heavy-tailed case where the underlying distribution has bounded fourth order moments. Moreover, we extend to the high dimensional sparse (and heavy-tailed) cases by providing several new estimators which consist of four steps: Truncating, Dithering, Quantizing and Thresholding. For sparse linear model and low-rank matrix completion, we propose new quadratic objective functions for one-bit quantized samples based on our previous estimators. For all of our estimators we also derive estimation errors. Especially, in the sub-Gaussian case, all of our estimators could achieve near optimal rates in the unquantized cases.
Finally, extensive experiments on synthetic and real-world data have been carried out to support our theories.
Junren Chen★, Cheng-Long Wang★, Michael Kwok Po NG, and Di Wang.
Accepted at IEEE Transactions on Information Theory.
- Generalized Linear Models in Non-interactive Local
Differential Privacy with Public Data. [Link] Abstract▼
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model.
Different from its classical setting, our model allows the server to access
some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically,
we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by the Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for
GLMs. Moreover, the sample complexity of the
public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is ${O}(p \alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for
GLMs (under some mild assumptions). The sample complexity of the
public and private data for the algorithm to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is ${O}(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, if $\alpha$ is not too small
({\em i.e.,} $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the non-linear regression problem and show
similar results for it under the multivariate Gaussian and sub-Gaussian cases.
Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real world datasets.
To our best knowledge, this is the first paper showing the existence of efficient and effective
algorithms for GLMs and non-linear regression
in the NLDP model with public unlabeled data.
Di Wang*,
Lijie Hu*★, Huanyu Zhang, Marco Gaboardi, and Jinhui Xu. (* equal contribution)
Minor Revision, Journal of Machine Learning Research.
2022
Conference Papers
-
Truthful Generalized Linear Models. [Link] Abstract▼
In this paper we study estimating Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested and they concern about their privacy when reporting data. Compared with the classical setting, here we aim to design mechanisms that can both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while their outputs should also close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed where they only have the finite fourth moments. First, motivated by the stationary condition of the
maximizer of the likelihood function, we derive a novel private and closed form estimator. Based on the estimator, we propose a mechanism which has the following properties via some appropriate design of the computation and payment scheme for several canonical models such as linear regression, logistic regression and Poisson regression: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output could achieve an error of $o(1)$ to the underlying parameter; (4) it is individually rational for a $(1-o(1))$ fraction of agents in the mechanism ; (5) the payment budget required from the analyst to run the mechanism
is $o(1)$. In the second part, we consider the linear regression model under more general setting where both covariates and responses are heavy-tailed and only have finite fourth moments. By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme which have similar properties as in the sub-Gaussian case.
Yuan Qiu★, Jinyan Liu, and Di Wang.
The 18th Conference on Web and Internet Economics (WINE 2022)
-
On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public
Unlabeled Data. [Link] Abstract▼
In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue along this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning, and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods might could also be used to other private PAC learning problems.
Jinyan Su★, Jinhui Xu, and Di Wang.
The 14th Asian Conference on Machine Learning (ACML 2022)
Best Paper Award.
-
Differentially Private $\ell_1$-norm Linear Regression with Heavy-tailed Data. [Link] Abstract▼
We consider Differentially Private Stochastic Convex Optimization (DP-SCO) with heavy-tailed data. Specifically,
we focus on the $\ell_1$ regression in the $\epsilon$-DP model. While most of the previous work focuses on the case where the loss function is Lipschitz, here we only need to assume the variates has bounded moments. Firstly, we study the case where the $\ell_2$ norm of data has bounded second order moment. We propose an algorithm which is based on the exponential mechanism and show that it is possible to achieve an upper bound of $O(\sqrt{\frac{d}{n\epsilon}})$ (with high probability). Next, we relax the assumption to bounded $\theta$-th order moment with some $\theta\in (0, 1)$ and we show that it is possible to achieve an upper bound of $O(({\frac{d}{n\epsilon}})^\frac{\theta-1}{\theta})$. Our algorithms can also
be extended to more relaxed cases where only each coordinate
of the data has bounded moments.
Di Wang and Jinhui Xu.
2022 IEEE International Symposium on Information Theory (ISIT 2022)
-
Private Stochastic Convex Optimization and Sparse Learning with Heavy-tailed Data Revisited. [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) with heavy-tailed data, where the gradient of the loss function has bounded moments. Instead of the case where the loss function is Lipschitz or each coordinate of the gradient has bounded second moment studied previously, we consider a relaxed scenario where each coordinate of the gradient only has bounded $(1+v)$-th moment with some $v\in (0, 1]$. Firstly, we start from the one dimensional private mean estimation for heavy-tailed distributions. We propose a novel robust and private mean estimator which is optimal. Based on its idea, we then extend to the general $d$-dimensional space and use it to DP-SCO with general convex and strongly convex loss functions. We also provide lower bounds for these two classes of loss under our setting and show that our upper bounds are optimal up to a factor of $O(\text{Poly}(d))$. To address the high dimensionality issue, we also study DP-SCO with heavy-tailed gradient under some sparsity constraint (DP sparse learning). We propose a new method and show it is also optimal up to a factor of $O(s^*)$, where $s^*$ is the underlying sparsity of the constraint.
Youming Tao★, Yulian Wu★, Xiuzhen Cheng, and Di Wang.
The 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022).
-
Faster Rates of Private Stochastic Convex Optimization. [Link] Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) and provide excess population risks for some special classes of functions that are faster than the previous results of general convex and strongly convex functions. In the first part of the paper, we study the case where the population risk function satisfies the Tysbakov Noise Condition (TNC) with some parameter $\theta>1$. Specifically, we first show that under some mild assumptions on the loss functions, there is an algorithm whose output could achieve an upper bound of $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})^\frac{\theta}{\theta-1}) $ and $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively when $\theta\geq 2$, here $n$ is the sample size and $d$ is the dimension of the space. Then we address the inefficiency issue, improve the upper bounds by $\text{Poly}(\log n)$ factors and extend to the case where $\theta\geq \bar{\theta}>1$ for some known $\bar{\theta}$. Next we show that the excess population risk of population functions satisfying TNC with parameter $\theta\geq 2$ is always lower bounded by $\Omega((\frac{d}{n\epsilon})^\frac{\theta}{\theta-1}) $ and $\Omega((\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta-1})$ for $\epsilon$-DP and $(\epsilon, \delta)$-DP, respectively, which matches our upper bounds. In the second part, we focus on a special case where the population risk function is strongly convex. Unlike the previous studies, here we assume the loss function is {\em non-negative} and {\em the optimal value of population risk is sufficiently small}. With these additional assumptions, we propose a new method whose output could achieve an upper bound of $O(\frac{d\log(1/\delta)}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ and $O(\frac{d^2}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ for any $\tau> 1$ in $(\epsilon,\delta)$-DP and $\epsilon$-DP model respectively if the sample size $n$ is sufficiently large. These results circumvent their corresponding lower bounds in \cite{feldman2020private} for general strongly convex functions. Finally, we conduct experiments of our new methods on real world data. Experimental results also provide new insights into established theories.
Jinyan Su★, Lijie Hu★, and Di Wang.
The 33rd International Conference on Algorithmic Learning Theory (ALT 2022)
-
Optimal Rates of (Locally) Differentially Private Heavy-tailed
Multi-Armed Bandits. [Link] Abstract▼
In this paper we study the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike the previous results which need to assume bounded reward distributions, here we mainly focus on the case the reward distribution of each arm only has $(1+v)$-th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we show that the instance-dependent regret bound of our improved algorithm is optimal by showing its lower bound. In the second part of the paper, we study the problem in the $\epsilon$-LDP model. We propose an algorithm which could be seen as locally private and robust version of the SE algorithm, and show it could achieve (near) optimal rates for both instance-dependent and instance-independent regrets. All of the above results can also reveal the differences between the problem of private MAB with bounded rewards and heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might could be used to other related problems. Finally, experimental results also support our theoretical analysis and show the effectiveness of our algorithms.
Youming Tao*★, Yulian Wu*★, Peng Zhao, and Di Wang. (* equal contribution) The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022).
Selected as an Oral paper (Acceptance Rate: 44/1685=2.6%).
ACM CCS 2021 Workshop on Privacy Preserving Machine Learning.
ICML 2022 Workshop on Responsible Decision Making in Dynamic Environments (Selected as Contributed Talk).
-
On Facility Location Problem in Local Differential Privacy Model. Abstract▼
This paper studies the facility location problem, where the given input is combined of (1) a metric space (V, d), (2) facility cost f_i for each facility, We study the uncapacitated facility location (UFL) problem under the constraints imposed by the local differential privacy (LDP). Recently, Gupta et al. (2009) and Esencayi et al. (2019) proposed lower and upper bounds for the UFL problem on the central differential privacy (DP) model where a curator first collects all data before being processed. In this paper, we focus on the LDP model, where we protect a client's participation in the facility location instance. Under the HST metric, we show that there is a non-interactive $\epsilon$-LDP algorithm achieving $O(n^{1/4}/\epsilon^2)$-approximation ratio, where n is the size of the metric. On the negative side, we show a lower bound of $\Omega(n^{1/4}/\sqrt{\epsilon})$ on the approximation ratio for any non-interactive $\epsilon$-LDP algorithm. Thus, our results are tight up to a factor of $\epsilon$. Moreover, unlike previous results, our results generalize for non-uniform facility costs.
[alphabetic order] Vincent Cohen-Addad, Yunus Esencayi, Chenglin Fan, Marco Gaboradi, Shi Li,
and Di Wang.
The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022).
-
High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data. [Link] Abstract▼
As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distribution or irregular data in the low dimensional space case. To better understand the challenges arising from irregular data distribution, in this paper we provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$-norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get an error bound of $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$ in the $\epsilon$-DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. Next, for LASSO, if the data distribution that has bounded fourth-order moments, we improve the bound to $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$ in the $(\epsilon, \delta)$-DP model. In the second part of the paper, we study DP-SCO for sparse learning with heavy-tailed data. We first revisit the sparse linear regression and propose a truncated DP-IHT method whose output could achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity ({\em i.e.,} $\ell_0$-norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$, which is also near optimal up to a factor of $\tilde{O}{(\sqrt{s^*})}$, if the loss function is smooth and strongly convex.
Lijie Hu★, Shuo Ni★, Hanshen Xiao, and Di Wang.
The 41st ACM Symposium on Principles of Database Systems (PODS 2022).
Invited to The ACM Transactions on Database Systems special issue on Best of PODS 2022.
ACM CCS 2021 Workshop on Privacy Preserving Machine Learning.
2021
Conference Papers
- Differentially Private Pairwise Learning Revisited. Abstract▼
Instead of learning with pointwise loss functions, learning with pairwise loss functions (pairwise learning) has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. However, most of the existing algorithms for pairwise learning fail to take into consideration the privacy issue in their design. To address this issue, previous work studied pairwise learning in the Differential Privacy (DP) model. However, their utilities (population errors) are far from optimal. To address the sub-optimal utility issue, in this paper, we proposed new $(\epsilon, \delta)$ or $\epsilon$-DP algorithms for pairwise learning. Specifically, when the loss functions are Lipschitz, smooth and strongly convex, we show that the output of our algorithm achieves an expected population error of $O(\frac{1}{n}+\frac{d\log\frac{1}{\delta}}{n^2\epsilon^2})$ and $O(\frac{1}{n}+\frac{d^2}{n^2\epsilon^2})$ for $(\epsilon, \delta)$-DP and $\epsilon$-DP, respectively, where $n$ is the sample size and $d$ is the dimension of the underlying space.
Moreover, for general convex case, the output of our algorithm achieves an expected population error of $O(\frac{1}{\sqrt{n}}+\frac{\sqrt{d \log\frac{1}{\delta}}}{n\epsilon})$ and $O(\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})$ for $(\epsilon, \delta)$-DP and $\epsilon$-DP, respectively. It is also notable that these upper bounds are {\bf optimal} ({\em i.e.,} match the lower bounds). We also conduct extensive experiments on real-world datasets to evaluate the proposed algorithms, experimental results support our theoretical analysis and show the priority of our algorithms.
Zhiyu Xue*★, Shaoyang Yang*★, Mengdi Huai and Di Wang. (* equal contribution)
The 30th International Joint Conference on Artificial Intelligence (IJCAI 2021).
- Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data [Link] Abstract▼
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model.
Different from its classical setting, our model allows the server to process
some additional public but unlabeled data.
We first show that there is an $(\epsilon, \delta)$-NLDP algorithm for
GLM (under some mild assumptions), if each data record is i.i.d sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm.
The sample complexity of both
public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm), is $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ if $\alpha$ is not too small (i.e., $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of convex GLM with no public data.
We then extend our idea to the non-linear regression problem and show
a similar phenomenon for it.
Finally, we demonstrate the practicality of our algorithms through experiments on both synthethic and real world datasets.
To our best knowledge, this is the first paper showing the existence of efficient and practical
algorithms for GLM and non-linear regression
in the NLDP model with public unlabeled data.
Di Wang*, Huanyu Zhang*, Marco Gaboardi and Jinhui Xu. (* equal contribution)
The 32nd International Conference on Algorithmic Learning Theory (ALT 2021).
NeurIPS 2019 Workshop on Privacy in Machine Learning.
Journal Papers
- Inferring Ground Truth From Crowdsourced Data Under Local Attribute Differential Privacy
[Link] Abstract▼
Recently, the problem of ground truth inference under local differential privacy (LDP) model has been recently studied. However, this problem is still not well understood and even some basic questions have not been solved yet. First, it is still unknown what is the average error of the private estimators to the underlying ground truth. Secondly, we do not known whether we can infer the ability of each user under LDP model and what is the estimation error w.r.t the underlying users ability. Finally, previous work only show that their methods have better performance than the private major voting algorithm through experiments. However, there is still no theoretically result which shows this priority formally or mathematically. In this paper, we partially solve these problems by studying the ground truth inference problem under local attribute differential privacy (LADP) model, and propose a new algorithm called private Dawid-Skene method, which is motivated by the classical Dawid-Skene method. Specifically, we first provide the estimation errors for both ability of users and the ground truth under some assumptions of the problem if the algorithm start with some appropriate initial vector. Moreover, we propose an explicit instance and show that the estimation error of the ground truth achieved by the private major voting algorithm is always greater than the error achieved by our method. To our best knowledge, this is the first result on showing the explicit estimation errors for both ability of users and ground truth for the problem. Also, this paper is the first result on theoretically comparing with the private major voting algorithm.
Di Wang and Jinhui Xu.
Theoretical Computer Science
Volume 865, 14 April 2021, Pages 85-98.
- Differentially Private High Dimensional Sparse Covariance Matrix Estimation [Link] Abstract▼
In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimensions. We propose a new method, called DP-Thresholding, to achieve a non-trivial $\ell_2$-norm based error bound,
which is significantly better than the existing ones from adding noise directly to the empirical covariance matrix. We also extend the $\ell_2$-norm based error bound to a general $\ell_w$-norm based one for any $1\leq w\leq \infty$, and show that they share the same upper bound asymptotically. Our approach can be easily extended to local differential privacy. Experiments on the synthetic datasets show consistent results with our theoretical claims.
Di Wang and Jinhui Xu.
Theoretical Computer Science
Volume 865, 14 April 2021, Pages 119-130.
- On Sparse Linear Regression in the Local Differential Privacy Model [Link] Abstract▼
In this paper, we study the sparse linear regression problem in the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both non-interactive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is
a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDP-IHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DP-ERM with sparsity constraints and sparse regression with non-linear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case,
we show that the optimal rate of the error estimation can be made logarithmically depending on $p$ (i.e., $\log p$) in the local model, where an upper bound is obtained by a label-privacy version of LDP-IHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.
Di Wang and Jinhui Xu.
IEEE Transactions on Information Theory, Volume 67, no. 2, Pages 1182-1200, Feb. 2021.
2020
Conference Papers
-
Global Interpretation for Patient Similarity Learning [Link] Abstract▼
As an important family of learning problems, pairwise learning has received much attention in recent years. Since pairwise learning involves pairs of instances in its loss function, it is more capable of modeling the relative relationships between instances compared with traditional pointwise learning (e.g., classification). In practice, many machine learning and data mining tasks can be categorized as pairwise learning. Although pairwise learning has achieved tremendous success in many real-world applications, the lack of transparency behind the behavior of the learned pairwise model impedes users from trusting the predicted results, which hampers its further applications in the real world. To tackle this problem, in this paper, we investigate how to enable interpretation in pairwise learning and propose a global interpretation method for pairwise learning. Based on the proposed global interpretation method, we can identify a minimal sufficient subset of data features that are sufficient in themselves to justify the global predictions made by the pairwise model. The identified minimal sufficient feature subset can help us better understand the overall behaviors of the learned pairwise model across different subpopulations of instance pairs. To the best of our knowledge, this is the first work that provides global interpretation for pairwise learning. We also conduct extensive experiments on real-world datasets to evaluate the performance of the proposed global interpretation method.
Mengdi Huai, Chenglin Miao, Jinduo Liu, Di Wang, Jingyuan Chou and Aidong Zhang.
The 2020 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2020).
Selected as Regular Paper (Acceptance Rate: 19.4%).
- Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method [Link] Abstract▼
It has been shown recently that many non-convex objective/loss functions in machine learning and deep learning are known to be strict saddle. This means that finding a second-order stationary point (i.e., approximate local minimum) and thus escaping saddle points are sufficient for such functions to obtain a classifier with good generalization performance. Existing algorithms for escaping saddle points, however, all fail to take into consideration a critical issue in their designs, that is, the protection of sensitive information in the training set. Models learned by such algorithms can often implicitly memorize the details of sensitive information, and thus offer opportunities for malicious parties to infer it from the learned models. In this paper, we investigate the problem of privately escaping saddle points and finding a second-order stationary point of the empirical risk of non-convex loss function.
Previous result on this problem is mainly of theoretical importance and has several issues ({e.g., high sample complexity and non-scalable) which hinder its applicability, especially, in big data.
To deal with these issues,
we propose in this paper a new method called Differentially Private Trust Region, and show that it outputs a second-order stationary point with high probability and less sample complexity, compared to the existing one. Moreover, we also provide a stochastic version of our method (along with some theoretical guarantees) to make it faster and more scalable.
Experiments on benchmark datasets suggest that our methods are indeed more efficient and practical than the previous one.
Di Wang and Jinhui Xu.
The 2020 European Conference on Machine Learning (ECML-PKDD 2020).
- On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data [Link] Abstract▼
In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{n\epsilon^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ n\epsilon^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{n\epsilon^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(n\epsilon^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments on both synthetic and real-world datasets suggest that our algorithms can effectively deal with the challenges caused by data irregularity.
Di Wang*, Hanshen Xiao*, Srini Devadas and Jinhui Xu (* equal contribution).
The 37th International Conference on Machine Learning (ICML 2020).
- Scalable Estimating Stochastic Linear Combination of Non-linear Regressions [Link] Abstract▼
In this paper we study the problem of estimating stochastic linear combination of non-linear regressions, which has a closed connection with many machine learning and statistical models such as non-linear regressions, Single Index Models, Multi-index Models, Varying Coefficient Index Models and Two-layer Neural Networks. Specifically, we first show that under some mild assumptions, if the variates are multivariate Gaussian, then there is an algorithm whose outputs have $\ell_2$-norm estimation error of $O(\sqrt{\frac{p}{n}})$ with high probability, where $p$ is the dimensionality and $n$ is the size of samples. Moreover, we extend to the bounded sub-Gaussian case by using the zero-bias transformation, which could be seen as a generalization of the classical Stein's lemma. We show that with some additional assumptions there is an algorithm whose outputs have $\ell_\infty$-norm estimation error of $O(\frac{1}{\sqrt{p}}+\sqrt{\frac{p}{n}})$ with high probability. Finally, for both Gaussian and sub-Gaussian cases we propose a scalable algorithm based on sub-sampling method and show that when the sub-sample size is large enough the estimation errors will be almost the same as previous ones. Experimental results for both Gaussian and sub-Gaussian cases support our theoretical results. To our best knowledge, this is the first paper study and provide theoretical guarantees of the stochastic linear combination of non-linear regressions model.
Di Wang* , Xiangyu Guo*, Chaowen Guan, Shi Li and Jinhui Xu (* equal contribution).
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
- Pairwise Learning with Differential Privacy Guarantees [Link] Abstract▼
Pairwise learning has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. Many machine learning tasks can be categorized as pairwise learning, such as AUC maximization and metric learning. Existing techniques for pairwise learning all fail to take into consideration a critical issue in their design, i.e., the protection of sensitive information in the training set. Models learned by such algorithms can implicitly memorize the details of sensitive information, which offers opportunity for malicious parties to infer it from the learned models. To address this challenging issue, in this paper, we propose several differentially private pairwise learning algorithms for both online and offline settings. Specifically, for the online setting, we first introduce a differentially private algorithm (called OnPairStrC) for strongly convex loss functions. Then, we extend this algorithm to general convex loss functions and give another differentially private algorithm (called OnPairC). For the offline setting, we also present two differentially private algorithms (called OffPairStrC and OffPairC) for strongly and general convex loss functions, respectively. These proposed algorithms can not only learn the model effectively from the data but also provide strong privacy protection guarantee for sensitive information in the training set. Extensive experiments on real-world datasets are conducted to evaluate the proposed algorithms and the experimental results support our theoretical analysis.
Mengdi Huai*, Di Wang*, Chenglin Miao, Jinhui Xu and Aidong Zhang (* equal contribution).
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
- Towards Interpretation of Pairwise Learning [Link] Abstract▼
Recently, there are increasingly more attentions paid to an important family of learning problems called pairwise learning, in which the associated loss functions depend on pairs of instances. Despite the tremendous success of pairwise learning in many real-world applications, the lack of transparency behind the learned pairwise models makes it difficult for users to understand how particular decisions are made by these models, which further impedes users from trusting the predicted results. To tackle this problem, in this paper, we study feature importance scoring as a specific approach to the problem of interpreting the predictions of black-box pairwise models. Specifically, we first propose a novel adaptive Shapley-value-based interpretation method, based on which a vector of importance scores associated with the underlying features of a testing instance pair can be adaptively calculated with the consideration of feature correlations, and these scores can be used to indicate which features make key contributions to the final prediction. Considering that Shapley-value-based methods are usually computationally challenging, we further propose a novel robust approximation interpretation method for pairwise models. This method is not only much more efficient but also robust to data noise. To the best of our knowledge, we are the first to investigate how to enable interpretation in pairwise learning. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods.
Mengdi Huai, Di Wang, Chenglin Miao and Aidong Zhang.
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
Journal Papers
- Empirical Risk Minimization in the Non-interactive Local
Model of Differential Privacy. [Link] Abstract▼
In this paper, we study the Empirical Risk Minimization (ERM) problem in the non-interactive Local Differential Privacy (LDP) model. We first show that if the loss function is $(\infty, T)$-smooth, by using the Bernstein polynomial approximation we can avoid a dependency of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{-p}$).
This answers a question from (Smith et.al., 2017). Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound of these algorithms is asymptotically the same as the original one. With some additional assumptions, we also give an algorithm which is more efficient for the server. Based on different types of polynomial approximations, we propose (efficient) non-interactive locally differential private algorithms for learning the set of k-way marginal queries and the set of smooth queries. Moreover, we study the case of $1$-Lipschitz generalized linear convex loss functions and show that there is an $(\epsilon, \delta)$-LDP algorithm whose sample complexity for achieving error $\alpha$ is only linear in the dimensionality $p$ and quasi-polynomial in other terms. To prove this, we first show that the conclusion holds for the hinge loss function. Then, we extend the result to any $1$-Lipschitz generalized linear convex loss functions by showing that every such a function can be approximated by a linear combination of hinge loss functions and some linear functions. Our results use a polynomial of inner product approximation technique. Then we apply our technique to the Euclidean median problem and show that its sample complexity needs only to be quasi-polynomial in $p$, which is the first result with a sub-exponential sample complexity in $p$ for non-generalized linear loss functions.
Di Wang, Marco Gaboardi, Adam Smith and Jinhui Xu.
Journal of Machine Learning Research, Volume 21, 200 (2020), Pages 1-39.
- Robust High Dimensional Expectation Maximization Algorithm via Trimmed Hard Thresholding. [Link] Abstract▼
In this paper, we study the problem of estimating latent variable models with arbitrarily corrupted samples in the high dimensional case, i.e., $d\gg n$, where the underlying parameter is sparse. Specifically, we propose a method called Trimmed (Gradient) Expectation Maximization which attaches a trimming gradients and hard thresholding step to the Expectation step (E-step) and Maximization step (M-step), respectively. Particularly, under some mild assumptions, with an appropriate initialization, we show that the algorithm is corruption-free and converges to the (near) optimal statistical rate geometrically when the fraction of corruption samples satisfies $\alpha\leq O(\frac{1}{\sqrt{n}})$. Moreover, we implement our general framework to three canonical examples: mixture of Gaussians, mixture of regressions and linear regression with missing covariates. Experiments also support our theoretical analysis.
Di Wang*, Xiangyu Guo*, Shi Li and Jinhui Xu (* equal contribution).
Machine Learning, 109, 2283-2311 (2020).
- Tight Lower Bound of Locally Differentially Private Sparse Covariance Matrix Estimation. [Link] Abstract▼
In this paper, we study the sparse covariance matrix estimation problem in the local differential privacy model, and give a lower bound of $\Omega(\frac{s^2\log p}{n\epsilon^2})$ on the $\epsilon$ non-interactive private minimax risk in the metric of squared spectral norm, where $s$ is the row sparsity of the underlying covariance matrix, $n$ is the sample size, and $p$ is the dimensionality of the data.
We show that the lower bound is actually tight, as it matches a previous upper bound.Our main technique for achieving this lower bound is a general framework, called General Private Assouad Lemma, which is a considerable generalization of the previous private Assouad lemma and can be used as a general method for bounding the private minimax risk of matrix-related estimation problems.
Di Wang and Jinhui Xu.
Theoretical Computer Science,
Volume 815, 2 May 2020, Pages 47-59.
- Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably. [Link] Abstract▼
Recently, many machine learning and statistical models such as non-linear regressions, the Single Index, Multi-index, Varying Coefficient Index Models and Two-layer Neural Networks can be reduced to or be seen as a special case of a new model which is called the \textit{Stochastic Linear Combination of Non-linear Regressions} model. However, due to the high non-convexity of the problem, there is no previous work study how to estimate the model. In this paper, we provide the first study on how to estimate the model efficiently and scalably. Specifically, we first show that with some mild assumptions, if the variate vector $x$ is multivariate Gaussian, then there is an algorithm whose output vectors have $\ell_2$-norm estimation errors of $O(\sqrt{\frac{p}{n}})$ with high probability, where $p$ is the dimension of $x$ and $n$ is the number of samples. The key idea of the proof is based on an observation motived by the Stein's lemma. Then we extend our result to the case where $x$ is bounded and sub-Gaussian using the zero-bias transformation, which could be seen as a generalization of the classic Stein's lemma. We also show that with some additional assumptions there is an algorithm whose output vectors have $\ell_\infty$-norm estimation errors of $O(\frac{1}{\sqrt{p}}+\sqrt{\frac{p}{n}})$ with high probability. We also provide a concrete example to show that there exists some link function which satisfies the previous assumptions. Finally, for both Gaussian and sub-Gaussian cases we propose a faster sub-sampling based algorithm and show that when the sub-sample sizes are large enough then the estimation errors will not be sacrificed by too much. Experiments for both cases support our theoretical results.
To the best of our knowledge, this is the first work that studies and provides theoretical guarantees for the stochastic linear combination of non-linear regressions model.
Di Wang* , Xiangyu Guo* , Chaowen Guan, Shi Li and Jinhui Xu (* equal contribution).
Neurocomputing, Volume 399, 25 July 2020, Pages 129-140.
- Principal Component Analysis in the Local Differential Privacy Model. [Link] Abstract▼
In this paper, we study the Principal Component Analysis (PCA) problem under the (distributed) non-interactive local differential privacy model. For the low dimensional case (i.e., $p \ll n$), we show the optimal rate of $\Theta(\frac{kp}{n\epsilon^2})$ (omitting the eigenvalue terms) for the private minimax risk of the $k$-dimensional PCA using the squared subspace distance as the measurement, where $n$ is the sample size and $\epsilon$ is the privacy parameter. For the high dimensional (i.e., $p\gg n$) row sparse case, we first give a lower bound of $\Omega(\frac{ks\log p }{n\epsilon^2})$ on the private minimax risk, where $s$ is the underlying sparsity parameter. Then we provide an efficient algorithm to achieve the upper bound of $O(\frac{s^2\log p}{n\epsilon^2})$. Experiments on both synthetic and real world datasets confirm our theoretical guarantees.
Di Wang and Jinhui Xu.
Theoretical Computer Science, Volume 809, 24 February 2020, Pages 296-312.
2019
Conference Papers
- Facility Location Problem in Differential Privacy Model Revisited. [Link] Abstract▼
In this paper we study the uncapacitated facility location problem in the
model of differential privacy (DP) with uniform facility
cost. Specifically, we first show that, under the \emph{hierarchically well-separated tree (HST) metrics} and the super-set output setting that was introduced in (Gupta et al 2010), there is an $\epsilon$-DP
algorithm that achieves an $O(\frac{1}{\epsilon})$
(expected multiplicative) approximation ratio; this implies an
$O(\frac{\log n}{\epsilon})$
approximation ratio for the general metric case, where $n$ is the size of the input metric. These bounds improve
the best-known results given by (Gupta et al 2010). In particular, our approximation ratio for HST-metrics is independent of $n$, and the ratio for general metrics is independent of the aspect ratio of the input metric. On the negative side, we show that the approximation ratio of any $\epsilon$-DP algorithm is lower bounded by $\Omega(\frac{1}{\sqrt{\epsilon}})$, even for instances on HST metrics with uniform facility cost, under the super-set output setting. The lower bound shows that the dependence of the approximation ratio for HST metrics on $\epsilon$ can not be removed or greatly improved. Our novel methods and techniques for both the upper and lower bound may find additional applications.
[alphabetic order] Yunus Esencayi, Marco Gaboardi, Shi Li and Di Wang
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2019.
- Lower Bound of Locally Differentially Private Sparse Covariance Matrix Estimation. [Link] Abstract▼
In this paper, we study the sparse covariance matrix
estimation problem in the local differential privacy
model, and give a non-trivial lower bound on the
non-interactive private minimax risk in the metric
of squared spectral norm. We show that the lower
bound is actually tight, as it matches a previous upper
bound. Our main technique for achieving this
lower bound is a general framework, called General
Private Assouad Lemma, which is a considerable
generalization of the previous private Assouad
lemma and can be used as a general method for
bounding the private minimax risk of matrix-related
estimation problems.
Di Wang and Jinhui Xu.
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019).
- Principal Component Analysis in the Local Differential Privacy Model. [Link] Abstract▼
In this paper, we study the Principal Component
Analysis (PCA) problem under the (distributed)
non-interactive local differential privacy model.
For the low dimensional case, we show the optimal
ratefor the private minimax risk of the kdimensional
PCA using the squared subspace distance
as the measurement. For the high dimensional
row sparse case, we first give a lower bound
on the private minimax risk, . Then we provide
an efficient algorithm to achieve a near optimal upper
bound. Experiments on both synthetic and real
world datasets confirm the theoretical guarantees of
our algorithms.
Di Wang and Jinhui Xu .
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019).
- Privacy-aware Synthesizing for Crowdsourced Data. [Link] Abstract▼
The prevalence of the World Wide Web enables the data collectors to easily share their data that are collected from a large crowd of users with the public. Although releasing crowdsourced data on the web brings many benefits to the data analyzers to conduct statistical analysis, it may violate crowd users' data privacy, which has been a major obstacle to release the crowdsourced data. A potential way to address this problem is to employ the traditional differential privacy-based mechanisms and perturb the data with some noise before publishing them. However, such noise perturbation mechanisms offer little data utility because crowdsourced datasets contain massive amounts of conflicting data records with large domains. In particular, this degraded utility results from two respects: firstly, the originally collected crowdsourced data usually contain conflicting data; secondly, the noise needed to guarantee differential privacy may be proportional to the number of data records or the domain of the input data, which renders the released crowdsourced data useless. To address the above challenges, we propose a novel privacy-aware synthesizing method for crowdsourced data. In this method, the data collectors first learn the underlying distributions of the crowdsourced data through taking each user's fine grained reliability degrees into account. Then, a set of candidate synthetics are sampled from the learned distributions. Finally, these sampled candidate synthetics are subjected to privacy tests, and only those candidate synthetics which pass privacy tests are allowed to be safely released. The proposed method not only provides strong privacy protection for individual users but also generates high utility synthetic data. The desirable performance of the proposed method is verified via theoretical analysis and extensive experiments conducted on both real-world and synthetic datasets.
Mengdi Huai, Di Wang, Chenglin Miao, Jinhui Xu, Aidong Zhang.
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019).
- Differentially Private Empirical Risk Minimization with Non-convex Loss Functions. [Link] Abstract▼
We study the problem of Empirical Risk Minimization (ERM) with (smooth) non-convex loss functions under the differential-privacy (DP) model. Existing approaches for this problem mainly adopt gradient norms to measure the error, which in general cannot guarantee the quality of the solution. To address this issue,
we first study the expected excess empirical (or population) risk, which was primarily used as the utility to measure the quality for convex loss functions. Specifically, we show that
the excess empirical (or population) risk can be upper bounded by $\tilde{O}(\frac{d\log (1/\delta)}{\log n\epsilon^2})$ in the $(\epsilon, \delta)$-DP settings, where $n$ is the data size and $d$ is the dimensionality of the space.
The $\frac{1}{\log n}$ term in the empirical risk bound can be further improved to $\frac{1}{n^{\Omega(1)}}$ (when $d$ is a constant) by a highly non-trivial analysis on the time-average error.
To obtain more efficient solutions, we also consider the connection between achieving differential privacy and finding approximate local minimum.
Particularly, we show that when the size $n$ is large enough, there are $(\epsilon, \delta)$-DP algorithms which can find an approximate local minimum of the empirical risk with high probability in both the constrained and non-constrained settings.
These results indicate that one can escape saddle points privately.
Di Wang, Changyou Chen and Jinhui Xu.
The 36th International Conference on Machine Learning (ICML 2019).
- On Sparse Linear Regression in the Local Differential Privacy Model. [Link] Abstract▼
In this paper, we study the sparse linear regression problem under the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both non-interactive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is
a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDP-IHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DP-ERM with sparsity constraints and sparse regression with non-linear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case,
we show that the optimal rate of the error estimation can be made logarithmically depending on $p$ (i.e., $\log p$) in the local model,
where an upper bound is obtained by a label-privacy version of LDP-IHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.
Di Wang and Jinhui Xu.
The 36th International Conference on Machine Learning (ICML 2019).
Selected as Long Talk (Acceptance Rate: 140/3424= 4.1%) .
NeurIPS 2018 Workshop on Privacy Preserving Machine Learning.
- Estimating Sparse Covariance Matrix Under Differential Privacy via Thresholding. [Link] Abstract▼
In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimensions. We propose a new method, called DP-Thresholding, to achieve a non-trivial $\ell_2$-norm based error bound,
which is significantly better than the existing ones from adding noise directly to the empirical covariance matrix. Experiments on the synthetic datasets show consistent results with our theoretical claims.
Di Wang, Jinhui Xu and Yang He.
The 53rd Annual Conference on Information Sciences and Systems (CISS 2019).
- Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations. [Link] Abstract▼
In this paper, we study the Empirical Risk Minimization problem in the non-interactive Local Differential Privacy (LDP) model. First, we show that for the hinge loss function, there is an $(\epsilon, \delta)$-LDP algorithm whose sample complexity for achieving an error of $\alpha$ is only linear in the dimensionality $p$ and quasi-polynomial in other terms. Then, we extend the result to any $1$-Lipschitz generalized linear convex loss functions by showing that every such function can be approximated by a linear combination of hinge loss functions and some linear functions. Finally, we apply our technique to the Euclidean median problem and show that its sample complexity needs only to be quasi-polynomial in $p$, which is the first result with a sub-exponential sample complexity in $p$ for non-generalized linear loss functions. Our results are based on a technique, called polynomial of inner product approximation, which may be applicable to other problems.
Di Wang, Adam Smith and Jinhui Xu.
The 30th International Conference on Algorithmic Learning Theory (ALT 2019).
- Differentially Private Empirical Risk Minimization with Smooth Non-convex Loss Functions: A Non-stationary View. [Link] Abstract▼
In this paper, we study the Differentially Private Empirical Risk Minimization (DP-ERM) problem with non-convex loss functions and give several upper bounds for the utility in different settings. We first consider the problem in low-dimensional space. For DP-ERM with non-smooth regularizer, we generalize an existing work by measuring the utility using $\ell_2$ norm of the projected gradient. Also, we extend the error bound measurement, for the first time, from empirical risk to population risk by using the expected $\ell_2$ norm of the gradient. We then investigate the problem in high dimensional space, and show that by measuring the utility with Frank-Wolfe gap, it is possible to bound the utility by the Gaussian Width of the constraint set, instead of the dimensionality $p$ of the underlying space. We further demonstrate that the advantages of this result can be achieved by the measure of $\ell_2$ norm of the projected gradient. A somewhat surprising discovery is that although the two kinds of measurements are quite different, their induced utility upper bounds are asymptotically the same under some assumptions. We also show that the utility of some special non-convex loss functions can be reduced to a level (i.e., depending only on $\log p$) similar to that of convex loss functions. Finally, we test our proposed algorithms on both synthetic and real world datasets and the experimental results confirm our theoretical analysis.
Di Wang and Jinhui Xu.
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019).
Selected as Oral Presentation (Acceptance Rate: 460/7095=6.5%).
Journal Papers
- Faster Large Scale Constrained Linear Regression via Two-Step Preconditioning. [Link] Abstract▼
In this paper, we study the large scale constrained linear regression problem and propose a two-step preconditioning method, which is based on some recent developments on random projection, sketching techniques and convex optimization methods. Combining the method with (accelerated) mini-batch SGD, we can achieve an approximate solution with a time complexity lower than that of
the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity.
Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.
Di Wang and Jinhui Xu.
Neurocomputing, Volume 364, 28 October 2019, Pages 280-296.
2018
Conference Papers
- Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited. [Link]
Abstract▼
In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions ($p\ll n$), we first show that if the loss function is $(\infty, T)$-smooth, we can avoid a dependence of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{-p}$),
which answers a question in (Smith et al 2017). Our approach is based on polynomial approximation. Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound is asymptotically the same as the original one. With some additional assumptions, we also give an efficient algorithm for the server.
In the case of high dimensions ($n\ll p$),
we show that if the loss function is a convex generalized linear function, the error can be bounded by using the Gaussian width of the constrained set, instead of $p$, which improves the one in
Smith et al.
Our techniques can be extended to some related problems, such as $k$-way marginal queries and smooth queries.
Di Wang, Marco Gaboardi and Jinhui Xu.
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2018.
- Differentially Private Sparse Inverse Covariance Estimation. [Link] Abstract▼
In this paper, we give the first study of sparse inverse covariance estimation problem under differential privacy. Firstly, we propose an $\epsilon$-differentially private algorithm via output perturbation, which is based on the sensitivity of the optimization problem and Wishart mechanism. Based on the idea of that, we propose a general covariance perturbation method, and then for $\epsilon$-differential privacy, we analyze Laplacian and Wishart mechanisms, for $(\epsilon,\delta)$-differential privacy we analyze Gaussian and Wishart mechanisms. Moreover, we extend the covariance perturbation algorithm to distributed setting and local differential privacy. Experiments on synthetic and benchmark datasets are also support our theoretical analysis.
Di Wang, Mengdi Huai and Jinhui Xu.
2018 6th IEEE Global Conference on Signal and Information Processing (2018 GlobalSip).
Selected as Oral Presentation.
- Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning. [Link] Abstract▼
In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization.
Our algorithm combines mini-batch SGD with a new method called two-step preconditioning to achieve an $\epsilon$-accuracy solution for the low precision case, and has a lower time complexity than the state-of-the-art techniques. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity.
Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.
Di Wang and Jinhui Xu.
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018).
Selected as Oral Presentation (Acceptance Rate: 411/3800=10.8%).
2017
Conference Papers
- Differentially Private Empirical Risk Minimization Revisited: Faster and More General. [Link] Abstract▼
In this paper we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss function with or without (non)-smooth regularization, we give algorithms that achieve either optimal or near optimal utility bounds with less gradient complexity compared with previous work. For ERM with smooth convex loss function in high-dimensional ($p\gg n$) setting, we give an algorithm which achieves the upper bound with less gradient complexity than previous ones. At last, we generalize the expected excess empirical risk from convex loss functions to non-convex ones satisfying the Polyak-Lojasiewicz condition and give a tighter upper bound on
the utility than the one in (Zhang et al 2017).
Di Wang, Minwei Ye and Jinhui Xu.
Conference on Neural
Information Processing Systems (NIPS/NeurIPS), 2017.