prepareKernel {smoothemplik} | R Documentation |
Check the data for kernel estimation
Description
Checks that the order is 2, 4, or 6, converts the objects into matrices, checks their dimensions, provides the bandwidth, creates default arguments to pass to the C++ functions, and de-duplicates the data for speed-up, among other preparatory steps.
Usage
prepareKernel(
x,
y = NULL,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
sparse = FALSE,
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
PIT = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
Optional: a vector of dependent variable values. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even the Gaussian kernel has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are non-negative everywhere. Kernels of order 4 and 6 take some negative values, which reduces bias but may hamper density estimation (the estimated density can become negative). |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
sparse |
Logical: TODO (ignored) |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
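The order and convolution arguments can be illustrated in base R with the Gaussian kernel. The sketch below uses the textbook fourth-order Gaussian-based kernel and the closed-form convolution of the Gaussian kernel with itself; the helper names k2, k4, and k2conv are hypothetical, and the exact formulas used inside the package may differ.

```r
# Textbook forms, assumed for illustration; not taken from the package source.
k2     <- function(u) dnorm(u)                    # order 2: standard Gaussian
k4     <- function(u) (3 - u^2) / 2 * dnorm(u)    # order 4: Gaussian-based
k2conv <- function(u) dnorm(u, sd = sqrt(2))      # k2 convolved with itself

# The order-4 kernel integrates to one and has a zero second moment
# (hence the smaller bias), but it is negative for |u| > sqrt(3).
int1 <- integrate(k4, -Inf, Inf)$value
mom2 <- integrate(function(u) u^2 * k4(u), -Inf, Inf)$value
neg  <- k4(2) < 0

# Numerical check of the convolution kernel at an arbitrary point
u0  <- 0.7
num <- integrate(function(t) k2(t) * k2(u0 - t), -Inf, Inf)$value

print(c(integral = int1, second.moment = mom2))
print(neg)
print(c(numerical = num, closed.form = k2conv(u0)))
```

The negative lobes of k4 are the reason why a density estimate based on a higher-order kernel can dip below zero in sparse regions.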
Value
A list of arguments that are taken by kernelDensity() and kernelSmooth().
Examples
# De-duplication facilities
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 10000
n <- 60000
inds <- ceiling(runif(n, 0, n.uniq))
x.uniq <- matrix(rnorm(n.uniq*10), ncol = 10)
x <- x.uniq[inds, ]
y <- runif(n.uniq)[inds]
xout <- x.uniq[ceiling(runif(n.uniq*3, 0, n.uniq)), ]
w <- runif(n)
print(system.time(a1 <- prepareKernel(x, y, xout, w, bw = 0.5)))
print(system.time(a2 <- prepareKernel(x, y, xout, w, bw = 0.5,
  deduplicate.x = FALSE, deduplicate.xout = FALSE)))
print(c(object.size(a1), object.size(a2)) / 1024) # Kilobytes used
# Speed-memory trade-off: 4 times smaller, takes 0.2 s, but reduces the
# number of matrix operations by a factor of
1 - prod(1 - a1$duplicate.stats[1:2]) # 95% fewer operations
sum(a1$weights) - sum(a2$weights) # Should be 0 or near machine epsilon
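
The idea behind de-duplication with weights can be sketched in base R without the package: repeated rows are collapsed into one, and their weights are summed so that weighted totals are unchanged. This is only an illustration of the principle under simplifying assumptions (no dependent variable, exact duplicates only); the object names key, group, x.uniq, and w.uniq are hypothetical, and the package's internal implementation may work differently.

```r
set.seed(1)
# 20 rows drawn from 9 possible combinations, so duplicates are guaranteed
x <- matrix(sample(1:3, 40, replace = TRUE), ncol = 2)
w <- runif(nrow(x))

# One string key per row; identical rows get identical keys
key   <- do.call(paste, c(as.data.frame(x), sep = "\r"))
first <- !duplicated(key)                 # first occurrence of each unique row
x.uniq <- x[first, , drop = FALSE]

# Index of the unique row that each original observation maps to
group  <- match(key, key[first])
# Sum the weights within each group; rows come out ordered by group index,
# which matches the row order of x.uniq
w.uniq <- as.numeric(rowsum(w, group))

print(c(rows.before = nrow(x), rows.after = nrow(x.uniq)))
print(sum(w) - sum(w.uniq))               # zero up to rounding error
```

Any weighted sum over the rows, such as colSums(x * w), gives the same result when computed from x.uniq and w.uniq, which is why operating on the smaller matrix is safe.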