EmpiricalRiskMinimizationDP.CMS {DPpack}    R Documentation
Privacy-preserving Empirical Risk Minimization for Binary Classification
Description
This class implements differentially private empirical risk minimization (Chaudhuri et al. 2011). Either the output or the objective perturbation method can be used. It is intended to be a framework for building more specific models via inheritance. See LogisticRegressionDP for an example of this type of structure.
Details
To use this class for empirical risk minimization, first use the new method to construct an object of this class with the desired functions and hyperparameters. After constructing the object, the fit method can be applied with a provided dataset and data bounds to fit the model. In fitting, the model stores a vector of coefficients coeff, which satisfy differential privacy. These can be released directly, or used in conjunction with the predict method to privately predict the outcomes of new datapoints.
Note that in order to guarantee differential privacy for empirical risk minimization, certain constraints must be satisfied by the values used to construct the object, as well as by the data used to fit. These conditions depend on the chosen perturbation method. Specifically, the provided loss function must be convex and differentiable with respect to y.hat, and the absolute value of the first derivative of the loss function must be at most 1. If objective perturbation is chosen, the loss function must also be doubly differentiable, and the absolute value of the second derivative of the loss function must be bounded above by a constant c for all possible values of y.hat and y, where y.hat is the predicted label and y is the true label. The regularizer must be 1-strongly convex and differentiable. It also must be doubly differentiable if objective perturbation is chosen. Finally, it is assumed that if x represents a single row of the dataset X, then the l2-norm of x is at most 1 for all x. Note that because of this, a bias term cannot be included without appropriate scaling/preprocessing of the dataset. To ensure privacy, the add.bias argument in the fit and predict methods should only be used in subclasses within this package where appropriate preprocessing is implemented, not in this class.
Public fields
mapXy
Map function of the form mapXy(X, coeff) mapping input data matrix X and coefficient vector or matrix coeff to output labels y.
mapXy.gr
Function representing the gradient of the map function with respect to the values in coeff and of the form mapXy.gr(X, coeff), where X is a matrix and coeff is a matrix or numeric vector.
loss
Loss function of the form loss(y.hat, y), where y.hat and y are matrices.
loss.gr
Function representing the gradient of the loss function with respect to y.hat and of the form loss.gr(y.hat, y), where y.hat and y are matrices.
regularizer
Regularization function of the form regularizer(coeff), where coeff is a vector or matrix.
regularizer.gr
Function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff).
gamma
Nonnegative real number representing the regularization constant.
eps
Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.
perturbation.method
String indicating whether to use the 'output' or the 'objective' perturbation methods (Chaudhuri et al. 2011).
c
Positive real number denoting the upper bound on the absolute value of the second derivative of the loss function, as required to ensure differential privacy for the objective perturbation method.
coeff
Numeric vector of coefficients for the model.
kernel
Value only used in child class svmDP. String indicating which kernel to use for SVM. Must be one of {'linear', 'Gaussian'}. If 'linear' (default), linear SVM is used. If 'Gaussian', uses the sampling function corresponding to the Gaussian (radial) kernel approximation.
D
Value only used in child class svmDP. Nonnegative integer indicating the dimensionality of the transform space approximating the kernel. Higher values of D provide better kernel approximations at the cost of computational efficiency.
sampling
Value only used in child class svmDP. Sampling function of the form sampling(d), where d is the input dimension, returning a (d+1)-dimensional vector of samples corresponding to the Fourier transform of the kernel to be approximated.
phi
Value only used in child class svmDP. Function of the form phi(x, theta), where x is an individual row of the original dataset, and theta is a (d+1)-dimensional vector sampled from the Fourier transform of the kernel to be approximated, where d is the dimension of x. The function returns a numeric scalar corresponding to the pre-filtered value at the given row with the given sampled vector.
kernel.param
Value only used in child class svmDP. Positive real number corresponding to the Gaussian kernel parameter.
prefilter
Value only used in child class svmDP. Matrix of pre-filter values used in converting data into transform space.
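The sampling, phi, D, and prefilter fields fit the random Fourier feature approach to kernel approximation. The sketch below is illustrative only: it assumes the Gaussian kernel is parameterized as exp(-kernel.param * ||x - z||^2), and the functions sampling, phi, and rff.transform, along with the exact scaling, are assumptions rather than svmDP's code. Each sampled theta holds d frequencies drawn from N(0, 2 * kernel.param) followed by a uniform phase, so that inner products of the transformed rows approximate the kernel.

```r
# Hypothetical random Fourier feature sketch for the Gaussian kernel
# k(x, z) = exp(-kernel.param * ||x - z||^2); an assumption, not svmDP's code.
set.seed(42)
kernel.param <- 0.5

# sampling(d): d frequencies ~ N(0, 2 * kernel.param), then one phase ~ U(0, 2*pi)
sampling <- function(d) {
  c(stats::rnorm(d, sd = sqrt(2 * kernel.param)), stats::runif(1, 0, 2 * pi))
}

# phi(x, theta): one scalar feature for row x and sampled vector theta
phi <- function(x, theta) {
  d <- length(x)
  sqrt(2) * cos(sum(x * theta[1:d]) + theta[d + 1])
}

# Map each row of X to D features; prefilter holds one sampled theta per row
rff.transform <- function(X, prefilter) {
  feats <- apply(X, 1, function(x) apply(prefilter, 1, function(theta) phi(x, theta)))
  t(feats) / sqrt(nrow(prefilter))  # n x D matrix
}

d <- 2
D <- 20000
prefilter <- t(replicate(D, sampling(d)))  # D x (d+1) matrix
x <- c(0.3, -0.2)
z <- c(-0.1, 0.4)
Z <- rff.transform(rbind(x, z), prefilter)
approx.k <- sum(Z[1, ] * Z[2, ])                 # approximate kernel value
exact.k <- exp(-kernel.param * sum((x - z)^2))   # exact kernel value
```

Larger D shrinks the gap between approx.k and exact.k, which matches the trade-off described for the D field.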
Methods
Public methods
Method new()
Create a new EmpiricalRiskMinimizationDP.CMS object.
Usage
EmpiricalRiskMinimizationDP.CMS$new( mapXy, loss, regularizer, eps, gamma, perturbation.method = "objective", c = NULL, mapXy.gr = NULL, loss.gr = NULL, regularizer.gr = NULL )
Arguments
mapXy
Map function of the form mapXy(X, coeff) mapping input data matrix X and coefficient vector or matrix coeff to output labels y. Should return a column matrix of predicted labels for each row of X. See mapXy.sigmoid for an example.
loss
Loss function of the form loss(y.hat, y), where y.hat and y are matrices. Should be defined such that it returns a matrix of loss values for each element of y.hat and y. See loss.cross.entropy for an example. It must be convex and differentiable, and the absolute value of the first derivative of the loss function must be at most 1. Additionally, if the objective perturbation method is chosen, it must be doubly differentiable and the absolute value of the second derivative of the loss function must be bounded above by a constant c for all possible values of y.hat and y.
regularizer
String or regularization function. If a string, must be 'l2', indicating to use l2 regularization. If a function, must have the form regularizer(coeff), where coeff is a vector or matrix, and return the value of the regularizer at coeff. See regularizer.l2 for an example. Additionally, in order to ensure differential privacy, the function must be 1-strongly convex and differentiable. If the objective perturbation method is chosen, it must also be doubly differentiable.
eps
Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.
gamma
Nonnegative real number representing the regularization constant.
perturbation.method
String indicating whether to use the 'output' or the 'objective' perturbation methods (Chaudhuri et al. 2011). Defaults to 'objective'.
c
Positive real number denoting the upper bound on the absolute value of the second derivative of the loss function, as required to ensure differential privacy for the objective perturbation method. This input is unnecessary if perturbation.method is 'output', but is required if perturbation.method is 'objective'. Defaults to NULL.
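The bound c is needed because it enters the privacy accounting of objective perturbation. Following our reading of Algorithm 2 in Chaudhuri et al. (2011), the sketch below computes the adjusted budget eps.prime, the extra regularization Delta that is added when eps.prime would otherwise be nonpositive, and the rate beta of the noise vector. The function name objective.perturbation.params is hypothetical, and DPpack's internal bookkeeping may differ.

```r
# Hypothetical sketch of the privacy bookkeeping in objective perturbation
# (our reading of Chaudhuri et al. 2011, Algorithm 2); not DPpack's code.
# n: number of training rows; gamma: regularization constant; c: bound on |loss''|
objective.perturbation.params <- function(eps, n, gamma, c) {
  eps.prime <- eps - log(1 + 2 * c / (n * gamma) + c^2 / (n * gamma)^2)
  Delta <- 0
  if (eps.prime <= 0) {
    # Not enough budget left: add extra l2 regularization and halve eps instead
    Delta <- c / (n * (exp(eps / 4) - 1)) - gamma
    eps.prime <- eps / 2
  }
  # The perturbed objective (not computed here) adds sum(b * coeff) / n and
  # (Delta / 2) * sum(coeff^2) to the regularized empirical risk, where b has
  # density proportional to exp(-beta * ||b||).
  list(eps.prime = eps.prime, Delta = Delta, beta = eps.prime / 2)
}

# Usage: a moderately sized dataset versus a tiny, weakly regularized one
p1 <- objective.perturbation.params(eps = 1, n = 360, gamma = 1, c = 1/4)
p2 <- objective.perturbation.params(eps = 1, n = 10, gamma = 0.001, c = 1/4)
```

With many rows and moderate gamma, the correction term is tiny and no extra regularization is needed; with few rows and small gamma, the sketch falls back to adding Delta.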
mapXy.gr
Optional function representing the gradient of the map function with respect to the values in coeff. If given, must be of the form mapXy.gr(X, coeff), where X is a matrix and coeff is a matrix or numeric vector. Should be defined such that the ith row of the output represents the gradient with respect to the ith coefficient. See mapXy.gr.sigmoid for an example. If not given, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
loss.gr
Optional function representing the gradient of the loss function with respect to y.hat and of the form loss.gr(y.hat, y), where y.hat and y are matrices. Should be defined such that the ith row of the output represents the gradient of the loss function at the ith set of input values. See loss.gr.cross.entropy for an example. If not given, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
regularizer.gr
Optional function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff). Should return a vector. See regularizer.gr.l2 for an example. If regularizer is given as a string, this value is ignored. If not given and regularizer is a function, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
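Because optim uses a supplied gradient without verifying it, it can be worth checking hand-written gradient functions against central finite differences before passing them to new. The sketch below does this for logistic-regression versions of mapXy.gr, loss.gr, and regularizer.gr; sigmoid, dsigmoid, and central.diff are hypothetical helpers defined in base R here. A check like this catches recycling mistakes in mapXy.gr: multiplying the length-nrow(X) vector by t(X) instead of X scales the wrong entries whenever X has more than one column.

```r
# Hypothetical finite-difference checks for user-supplied gradient functions.
sigmoid <- function(z) 1 / (1 + exp(-z))
dsigmoid <- function(z) sigmoid(z) * (1 - sigmoid(z))

mapXy <- function(X, coeff) sigmoid(X %*% coeff)
# Row i of the result holds the derivative of every prediction w.r.t. coeff[i]
mapXy.gr <- function(X, coeff) t(as.numeric(dsigmoid(X %*% coeff)) * X)
loss <- function(y.hat, y) -(y * log(y.hat) + (1 - y) * log(1 - y.hat))
loss.gr <- function(y.hat, y) -y / y.hat + (1 - y) / (1 - y.hat)
regularizer <- function(coeff) sum(coeff^2) / 2
regularizer.gr <- function(coeff) coeff

# Central difference of f(coeff) in coordinate i
central.diff <- function(f, coeff, i, h = 1e-6) {
  e <- replace(numeric(length(coeff)), i, h)
  (f(coeff + e) - f(coeff - e)) / (2 * h)
}

set.seed(1)
X <- matrix(stats::rnorm(10), nrow = 5, ncol = 2)
coeff <- c(0.3, -0.8)

# Numeric p x n Jacobian of mapXy, comparable to mapXy.gr(X, coeff)
num.map.gr <- t(sapply(seq_along(coeff), function(i)
  as.numeric(central.diff(function(b) mapXy(X, b), coeff, i))))
num.reg.gr <- sapply(seq_along(coeff), function(i) central.diff(regularizer, coeff, i))

# The loss is elementwise, so a scalar central difference in y.hat suffices
y.hat <- c(0.1, 0.4, 0.7, 0.95)
h <- 1e-6
num.loss.gr <- (loss(y.hat + h, 1) - loss(y.hat - h, 1)) / (2 * h)

print(max(abs(mapXy.gr(X, coeff) - num.map.gr)))
```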
Returns
A new EmpiricalRiskMinimizationDP.CMS object.
Method fit()
Fit the differentially private empirical risk minimization model. This method runs either the output perturbation or the objective perturbation algorithm (Chaudhuri et al. 2011), depending on the value of perturbation.method used to construct the object, to generate an objective function. A numerical optimization method is then run to find optimal coefficients for fitting the model given the training data and hyperparameters. The built-in optim function using the "BFGS" optimization method is used. If mapXy.gr, loss.gr, and regularizer.gr are all given in the construction of the object, the gradient of the objective function is utilized by optim as well. Otherwise, non-gradient based optimization methods are used. The resulting privacy-preserving coefficients are stored in coeff.
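In the output perturbation case, Chaudhuri et al. (2011, Algorithm 1) fit the regularized model without noise and then add a random vector b whose density is proportional to exp(-beta * ||b||), with beta = n * gamma * eps / 2 for n training rows. The sketch below samples such a vector by drawing a uniformly random direction and a Gamma(d, rate = beta) norm. It sketches the published algorithm under the constraints listed in Details; sample.output.noise is a hypothetical name, and this is not DPpack's internal code.

```r
# Hypothetical sketch of the noise step in output perturbation
# (Chaudhuri et al. 2011, Algorithm 1); not DPpack's internal code.
# b has density proportional to exp(-beta * ||b||_2) with beta = n*gamma*eps/2:
# its direction is uniform on the sphere and its norm is Gamma(d, rate = beta).
sample.output.noise <- function(d, n, gamma, eps) {
  beta <- n * gamma * eps / 2
  direction <- stats::rnorm(d)
  direction <- direction / sqrt(sum(direction^2))  # uniform direction
  radius <- stats::rgamma(1, shape = d, rate = beta)
  radius * direction
}

# Usage: perturb hypothetical non-private coefficients from a 360-row fit
set.seed(123)
coeff.nonprivate <- c(1.5, -0.7)
coeff.private <- coeff.nonprivate +
  sample.output.noise(d = 2, n = 360, gamma = 1, eps = 1)
```

The expected noise norm is d / beta, so it shrinks as the number of rows, the regularization constant, or the privacy budget grows.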
Usage
EmpiricalRiskMinimizationDP.CMS$fit( X, y, upper.bounds, lower.bounds, add.bias = FALSE )
Arguments
X
Dataframe of data to be fit.
y
Vector or matrix of true labels for each row of X.
upper.bounds
Numeric vector of length ncol(X) giving upper bounds on the values in each column of X. The ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X larger than the corresponding upper bound is clipped at the bound.
lower.bounds
Numeric vector of length ncol(X) giving lower bounds on the values in each column of X. The ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X smaller than the corresponding lower bound is clipped at the bound.
add.bias
Boolean indicating whether to add a bias term to X. Defaults to FALSE.
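The bounds allow the data to be preprocessed into the form the privacy analysis needs (every row with l2-norm at most 1). One simple way, assumed here and not necessarily what DPpack does internally, is to clip each column to its bounds and then divide every row by the largest l2-norm that any bounded row could have. The scale factor is computed from the bounds alone, so it does not depend on the private data. The function clip.and.scale is hypothetical.

```r
# Hypothetical preprocessing sketch: clip columns of X to [lower, upper], then
# rescale so every row has l2-norm at most 1 (not necessarily DPpack's scheme)
clip.and.scale <- function(X, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  for (j in seq_len(ncol(X))) {
    X[, j] <- pmin(pmax(X[, j], lower.bounds[j]), upper.bounds[j])
  }
  # Largest row norm possible under the bounds, computed from the bounds only
  max.norm <- sqrt(sum(pmax(abs(upper.bounds), abs(lower.bounds))^2))
  X / max.norm
}

X <- data.frame(a = c(-3, 0.5, 2), b = c(0.2, -5, 0.9))
Xs <- clip.and.scale(X, upper.bounds = c(1, 1), lower.bounds = c(-1, -1))
print(sqrt(rowSums(Xs^2)))  # all row norms are at most 1
```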
Method predict()
Predict label(s) for given X using the fitted coefficients.
Usage
EmpiricalRiskMinimizationDP.CMS$predict(X, add.bias = FALSE)
Arguments
X
Dataframe of data on which to make predictions. Must be of the same form as X used to fit coefficients.
add.bias
Boolean indicating whether to add a bias term to X. Defaults to FALSE. If add.bias was set to TRUE when fitting the coefficients, add.bias should be set to TRUE for predictions.
Returns
Matrix of predicted labels corresponding to each row of X.
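As a rough illustration, assuming predict applies mapXy to the new data with the stored coeff, prediction with the sigmoid map from the Examples yields probabilities that can be rounded to 0/1 labels, as the Examples do with round(predicted.y). The function sketch.predict and its handling of add.bias (prepending a column of ones) are assumptions for illustration only.

```r
# Hypothetical sketch of prediction with fitted coefficients; the add.bias
# behaviour (prepending a column of ones) is an assumption.
sigmoid <- function(z) 1 / (1 + exp(-z))
mapXy <- function(X, coeff) sigmoid(X %*% coeff)

sketch.predict <- function(X, coeff, add.bias = FALSE) {
  X <- as.matrix(X)
  if (add.bias) X <- cbind(1, X)
  mapXy(X, coeff)  # column matrix, one prediction per row of X
}

Xnew <- data.frame(x1 = c(1, 0), x2 = c(0, 1))
probs <- sketch.predict(Xnew, coeff = c(2, -1))
labels <- round(probs)  # threshold the probabilities at 0.5
print(cbind(probs, labels))
```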
Method clone()
The objects of this class are cloneable with this method.
Usage
EmpiricalRiskMinimizationDP.CMS$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
References
Chaudhuri K, Monteleoni C, Sarwate AD (2011). “Differentially Private Empirical Risk Minimization.” Journal of Machine Learning Research, 12(29), 1069-1109. https://jmlr.org/papers/v12/chaudhuri11a.html.
Examples
# Build train dataset X and y, and test dataset Xtest and ytest
N <- 200
K <- 2
X <- data.frame()
y <- data.frame()
for (j in (1:K)){
t <- seq(-.25, .25, length.out = N)
if (j==1) m <- stats::rnorm(N,-.2, .1)
if (j==2) m <- stats::rnorm(N, .2, .1)
Xtemp <- data.frame(x1 = 3*t , x2 = m - t)
ytemp <- data.frame(matrix(j-1, N, 1))
X <- rbind(X, Xtemp)
y <- rbind(y, ytemp)
}
Xtest <- X[seq(1,(N*K),10),]
ytest <- y[seq(1,(N*K),10),,drop=FALSE]
X <- X[-seq(1,(N*K),10),]
y <- y[-seq(1,(N*K),10),,drop=FALSE]
# Construct object for logistic regression
mapXy <- function(X, coeff) e1071::sigmoid(X%*%coeff)
# Cross entropy loss
loss <- function(y.hat,y) -(y*log(y.hat) + (1-y)*log(1-y.hat))
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
gamma <- 1
perturbation.method <- 'objective'
c <- 1/4 # Required value for logistic regression
# Scale each row of X by its own dsigmoid value, then transpose (p x n result)
mapXy.gr <- function(X, coeff) t(as.numeric(e1071::dsigmoid(X%*%coeff))*X)
loss.gr <- function(y.hat, y) -y/y.hat + (1-y)/(1-y.hat)
regularizer.gr <- function(coeff) coeff
ermdp <- EmpiricalRiskMinimizationDP.CMS$new(mapXy, loss, regularizer, eps,
gamma, perturbation.method, c,
mapXy.gr, loss.gr,
regularizer.gr)
# Fit with data
# Bounds for X based on construction
upper.bounds <- c( 1, 1)
lower.bounds <- c(-1,-1)
ermdp$fit(X, y, upper.bounds, lower.bounds) # No bias term
ermdp$coeff # Gets private coefficients
# Predict new data points
predicted.y <- ermdp$predict(Xtest)
n.errors <- sum(round(predicted.y)!=ytest)