NPCoImp {CoImp}    R Documentation

Nonparametric Copula-Based Imputation Method

Description

Imputation method based on empirical conditional copula functions.

Usage

NPCoImp(X, Psi=seq(0.05,0.45,by=0.05), smoothing="beta", K=7, method="gower")

Arguments

X

a data matrix with missing values. Missing values should be denoted with NA.

Psi

a vector of probabilities used to evaluate the radial symmetry/asymmetry of the conditional empirical copula function and to find the best lower-orthant quantile for the imputation (see Details).

smoothing

a character string specifying the type of smoothing applied to the empirical copula. The default is "beta" (the empirical beta copula); "none" (the original empirical copula) can also be used.

K

the number of rows of the data matrix most similar to the incomplete one that are used for the imputation.

method

the distance measure used to identify the most similar rows: Euclidean, Manhattan, Canberra, Gower (the default, "gower"), or one of two measures based on Kendall's correlation coefficient.
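The difference between the two smoothing options can be illustrated with a minimal base-R sketch that evaluates the (unconditional) empirical copula and the empirical beta copula of a data matrix at a point u. It assumes continuous data without ties and pseudo-observations given by column ranks divided by n. The function name emp_copula is hypothetical; this is not the code used inside NPCoImp, which works with conditional copulas.

```r
# Hypothetical helper (not part of CoImp): evaluate at the point u
#   "none": C_n(u)      = (1/n) sum_i prod_j 1{ R_ij / n <= u_j }
#   "beta": C_n^beta(u) = (1/n) sum_i prod_j F_{n,R_ij}(u_j),
# where R_ij is the rank of X[i, j] within column j and F_{n,r} is the
# CDF of the Beta(r, n + 1 - r) distribution.
emp_copula <- function(u, X, smoothing = c("beta", "none")) {
  smoothing <- match.arg(smoothing)
  X <- as.matrix(X)
  n <- nrow(X)
  # column-wise ranks; "max" makes R / n match the marginal empirical CDFs
  R <- apply(X, 2, rank, ties.method = "max")
  R <- matrix(R, nrow = n)                  # keep a matrix even when n == 1
  terms <- matrix(NA_real_, nrow = n, ncol = ncol(X))
  for (j in seq_len(ncol(X))) {
    terms[, j] <- if (smoothing == "none") {
      as.numeric(R[, j] / n <= u[j])        # indicator 1{R_ij / n <= u_j}
    } else {
      pbeta(u[j], R[, j], n + 1 - R[, j])   # Beta(R_ij, n + 1 - R_ij) CDF
    }
  }
  mean(apply(terms, 1, prod))               # average over the n rows
}
```

For instance, with X <- cbind(1:4, 1:4), emp_copula(c(0.5, 0.5), X, "none") returns 0.5. Unlike the step function of the original empirical copula, the beta version is smooth in u and its margins are exactly uniform, so emp_copula(c(0.3, 1), X, "beta") returns 0.3.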

Details

NPCoImp is a nonparametric imputation method based on the conditional empirical copula function. To choose the best lower-orthant quantile for the imputation, it evaluates the radial (a)symmetry of the conditional empirical copula and then uses the K pseudo-observations most similar to the incomplete one. NPCoImp imputes missing observations according to the multivariate dependence structure of the data generating process, without any assumption on the marginal distributions. The method can be used regardless of the dimension of the data and of the type (monotone or non-monotone) of missing pattern. A brief description of the approach:

  1. estimate the conditional empirical (beta) copula of the missing observation(s) given the available ones;

  2. evaluate the radial (a)symmetry of the conditional empirical copula around 0.5 (see the paper in the references for details);

  3. select the lower-orthant quantile of the conditional empirical copula on the basis of its radial (a)symmetry (see the paper in the references for details);

  4. select the K pseudo-observations closest to the imputed one and the corresponding original observations;

  5. replace each missing value with the average of the original observations selected in the previous step;

  6. calculate the conditional probability of the lower-orthant quantile used for imputing.
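Steps 4 and 5 can be sketched in base R as follows, using the Gower distance for numeric variables, i.e. the average over the observed columns of |x_ik - x_jk| / R_k, where R_k is the range of column k. The helper names are hypothetical, and the candidate point target_u is an assumed input standing in for the point obtained in steps 2 and 3 (whose selection rule is described in the paper and is not reproduced here). The package's internal implementation may differ.

```r
# Hypothetical helper: Gower distance (numeric columns only) between `target`
# and every row of X, computed over the columns observed (non-NA) in `target`.
gower_to_rows <- function(X, target) {
  obs <- which(!is.na(target))
  Xo  <- X[, obs, drop = FALSE]
  rng <- apply(Xo, 2, function(v) diff(range(v, na.rm = TRUE)))
  rng[rng == 0] <- 1                         # avoid division by zero
  d <- sweep(abs(sweep(Xo, 2, target[obs])), 2, rng, "/")
  rowMeans(d, na.rm = TRUE)                  # average scaled difference
}

# Hypothetical helper (step 4): indices of the K rows of X closest to `target`
k_nearest <- function(X, target, K = 7) {
  order(gower_to_rows(X, target))[seq_len(min(K, nrow(X)))]
}

# Hypothetical helper (step 5): the K nearest rows are searched among the
# pseudo-observations U, and the missing entries of `target` are filled with
# the column means of the corresponding original rows of X_orig.
fill_from_neighbours <- function(X_orig, U, target_u, target, K = 7) {
  nn   <- k_nearest(U, target_u, K)
  miss <- which(is.na(target))
  target[miss] <- colMeans(X_orig[nn, miss, drop = FALSE])
  target
}
```

For example, with X <- rbind(c(0, 0, 0), c(1, 1, 1), c(2, 2, 2), c(10, 10, 10)), the call fill_from_neighbours(X, X, c(1, 1, 1), c(1, NA, 1), K = 2) selects rows 2 and 1 and returns c(1, 0.5, 1). Averaging the original observations, rather than back-transforming a single pseudo-observation, keeps the imputed values on the scale of the observed margins.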

Value

An object of S4 class "NPCoImp" with the following slots:

Imputed.matrix

the imputed data matrix.

Selected.alpha

the (conditional) probability of the lower-orthant quantile selected for the imputation.

numFlat

the number of conditional empirical copulas that are possibly flat, i.e. identically equal to zero.

Author(s)

F. Marta L. Di Lascio <marta.dilascio@unibz.it>, Aurora Gatto <aurora.gatto@unibz.it>

References

Di Lascio, F.M.L., Gatto, A. (202x). "A nonparametric conditional copula-based imputation method". Under review.

See Also

CoImp, MCAR, MAR.

Examples

## generate data from a 4-variate Frank copula with different margins

library(copula)  # frankCopula(), mvdc(), rMvdc()
library(CoImp)   # NPCoImp(), MCAR()
set.seed(21)
n.marg <- 4
theta  <- 5
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta", "gamma"),
               list(list(mean=7, sd=2), list(shape=3, rate=2),
                    list(shape1=4, shape2=1), list(shape=4, rate=3)))
n      <- 20
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing values

perc.mis    <- 0.25
set.seed(14)
miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
miss        <- cbind(miss.row,miss.col)
x.samp.miss <- replace(x.samp,miss,NA)
x.samp.miss
probs <- seq(0.05,0.45,by=0.1)
ndist <- 7
dist.meth <- "gower"  

# impute missing values
NPimp <- NPCoImp(X=x.samp.miss, Psi=probs, smoothing="beta", K=ndist, 
                    method=dist.meth)

# methods show

show(NPimp)

## Not run: 
## generate data from a 3-variate Clayton copula, introduce missing values
## with the MCAR function and impute them with NPCoImp

set.seed(11)
n.marg <- 3
theta  <- 5
copula <- claytonCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("beta", "beta", "beta"), list(list(shape1=4, shape2=1),
                list(shape1=.5, shape2=.5), list(shape1=2, shape2=3)))
n      <- 50
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce MCAR univariate and multivariate missing values

perc.miss <- 0.15
setseed   <- 13    # seed passed to MCAR(); set.seed() itself returns NULL
x.samp.miss <- MCAR(x.samp, perc.miss, setseed)
x.samp.miss <- x.samp.miss@"db.missing"
probs <- seq(0.05,0.45,by=0.05)
ndist <- 7
dist.meth <- "gower" 

# impute missing values

NPimp2 <- NPCoImp(X=x.samp.miss, Psi=probs, smoothing="beta", K=ndist, 
                    method=dist.meth)

# methods show

show(NPimp2)

## End(Not run)

[Package CoImp version 2.1.0 Index]