npuniden.sc {np} | R Documentation |
Kernel Shape Constrained Bounded Univariate Density Estimation
Description
npuniden.sc
computes shape constrained kernel univariate
unconditional density estimates given a vector of continuously
distributed training data and a bandwidth. Lower and upper bounds
[a
,b
] can be supplied (default is [0,1]) and if a
is set to -Inf
there is only one bound on the right, while if
b
is set to Inf
there is only one bound on the left.
Usage
npuniden.sc(X = NULL,
Y = NULL,
h = NULL,
a = 0,
b = 1,
lb = NULL,
ub = NULL,
extend.range = 0,
num.grid = 0,
function.distance = TRUE,
integral.equal = FALSE,
constraint = c("density",
"mono.incr",
"mono.decr",
"concave",
"convex",
"log-concave",
"log-convex"))
Arguments
X |
a required numeric vector of training data lying in |
Y |
an optional numeric vector of evaluation data lying in |
h |
a bandwidth ( |
a |
an optional lower bound on the support of |
b |
an optional upper bound on the support of |
lb |
a scalar lower bound ( |
ub |
a scalar upper bound ( |
extend.range |
number specifying the fraction by which the range of the training data
should be extended for the additional grid points (passed to the
function |
num.grid |
number of additional grid points (in addition to |
function.distance |
a logical value that, if |
integral.equal |
a logical value, that, if |
constraint |
a character string indicating whether the estimate is to be
constrained to be monotonically increasing
( |
Details
Typical usages are (see below for a complete list of options and also the examples at the end of this help file)
model <- npuniden.sc(X,a=-2,b=3)
npuniden.sc
implements a methods for estimating a univariate
density function defined over a continuous random variable in the
presence of bounds subject to a variety of shape constraints. The
bounded estimates use the truncated Gaussian kernel function.
Note that for the log-constrained estimates, the derivative estimate returned is that for the log-constrained estimate not the non-log value of the estimate returned by the function. See Example 5 below hat manually plots the log-density and returned derivative (no transformation is needed when plotting the density estimate itself).
If the quadratic program solver fails to find a solution, the
unconstrained estimate is returned with an immediate warning. Possible
causes to be investigated are undersmoothing, sparsity, and the
presence of non-sample grid points. To investigate the possibility of
undersmoothing try using a larger bandwidth, to investigate sparsity
try decreasing extend.range
, and to investigate non-sample grid
points try setting num.grid
to 0
.
Mean square error performance seems to improve generally when using
additional grid points in the empirical support of X
and
Y
(i.e., in the observed range of the data sample) but appears
to deteriorate when imposing constraints beyond the empirical support
(i.e., when extend.range
is positive). Increasing the number of
additional points beyond a hundred or so appears to have a limited
impact.
The option function.distance=TRUE
appears to perform better for
imposing convexity, concavity, log-convexity and log-concavity, while
function.distance=FALSE
appears to perform better for imposing
monotonicity, whether increasing or decreasing (based on simulations
for the Beta(s1,s2) distribution with sample size n=100
).
Value
A list with the following elements:
f |
unconstrained density estimate |
f.sc |
shape constrained density estimate |
se.f |
asymptotic standard error of the unconstrained density estimate |
se.f.sc |
asymptotic standard error of the shape constrained density estimate |
f.deriv |
unconstrained derivative estimate (of order 1 or 2 or log thereof) |
f.sc.deriv |
shape constrained derivative estimate (of order 1 or 2 or log thereof) |
F |
unconstrained distribution estimate |
F.sc |
shape constrained distribution estimate |
integral.f |
the integral of the unconstrained estimate over |
integral.f.sc |
the integral of the constrained estimate over |
solve.QP |
logical, if |
attempts |
number of attempts when |
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca
References
Du, P. and C. Parmeter and J. Racine (forthcoming), “Shape Constrained Kernel PDF and PMF Estimation”, Statistica Sinica.
See Also
The logcondens, LogConDEAD, and scdensity packages,
and the function npuniden.boundary
.
Examples
## Not run:
n <- 100
set.seed(42)
## Example 1: N(0,1), constrain the density to lie within lb=.1 and ub=.2
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X,h=h,constraint="density",a=-Inf,b=Inf,lb=.1,ub=.2)
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 2: Beta(5,1), DGP is monotone increasing, impose valid
## restriction
X <- sort(rbeta(n,5,1))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.incr"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 3: Beta(1,5), DGP is monotone decreasing, impose valid
## restriction
X <- sort(rbeta(n,1,5))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.decr"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 4: N(0,1), DGP is log-concave, impose invalid concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("concave"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 45: Beta(3/4,3/4), DGP is convex, impose valid restriction
X <- sort(rbeta(n,3/4,3/4))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("convex"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 6: N(0,1), DGP is log-concave, impose log-concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("log-concave"))
par(mfrow=c(1,2))
ylim <- range(c(log(foo$f.sc),log(foo$f)))
plot(X,log(foo$f.sc),type="l",ylim=ylim,xlab="X",ylab="Log-Density")
lines(X,log(foo$f),col=2,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative of Log-Density")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
## End(Not run)