GEcalib {GECal} | R Documentation |
Generalized Entropy Calibration
Description
GEcalib
computes the calibration weights.
Generalized entropy calibration weights maximize the generalized entropy:
H(\bm{\omega}) = -\sum_{i \in A} G(\omega_i),
subject to the calibration constraints \sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i
,
where A
denotes the sample index, and U
represents the population index.
The auxiliary variables, whose population totals are known, are defined as \bm{z}_i^T = (\bm{x}_i^T, g(d_i))
,
where g
is the first-order derivative of the gerenalized entropy G
,
and d_i
is the design weight for each sampled unit i \in A
.
Usage
GEcalib(
formula,
dweight,
data = NULL,
const,
method = c("GEC", "GEC0", "DS"),
entropy = c("SL", "EL", "ET", "CE", "HD", "PH"),
weight.scale = 1,
G.scale = 1,
K_alpha = NULL,
is.total = TRUE,
del = NULL
)
Arguments
formula |
An object of class "formula" specifying the calibration model. |
dweight |
A vector of sampling weights. |
data |
An optional data frame containing the variables in the model (specified by |
const |
A vector used in the calibration constraint for population totals( or means). |
method |
The method to be used in calibration. See "Details" for more information. |
entropy |
The generalized entropy used in calibration, which can be either a numeric value or a string.
If numeric, |
weight.scale |
Positive scaling factor for the calibration weights |
G.scale |
Positive scaling factor for the generalized entropy function |
K_alpha |
The |
is.total |
Logical, |
del |
The optional threshold ( |
Details
The GEcal
object returns the calibration weights and necessary information for estimating population totals(or mean).
The terms to the right of the ~ symbol in the formula
argument define the calibration constraints.
When method == "GEC"
, the debiasing covariate g(dweight)
must be included in the formula
.
If the population total(mean) of g(dweight)
is unavailable, const
that corresponds to g(dweight)
can be set to NA
.
In this case, GECalib
performs joint optimization over
both the calibration weights \omega_i
and the missing value of const
.
The length of the const
vector should match the number of columns in the model.matrix
generated by formula
.
Additionally, the condition number of the model.matrix
must exceed .Machine$double.eps
to ensure its invertibility.
Both weight.scale
and G.scale
are positive scaling factors used for calibration.
Note that weight.scale
is not supported when method == "DS"
.
Let q_i
be the scaling factor for the generalized entropy function G
,
and \phi_i
be the scaling factor for the calibration weights \omega_i
.
If method == "GEC"
, GEcalib
minimizes the negative entropy:
\sum_{i \in A} q_iG(\phi_i\omega_i),
with respect to \bm \omega
subject to the calibration constraints \sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i
,
where \bm{z}_i^T = (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i))
, A
denotes the sample index, and U
represents the population index.
If method == "GEC"
, but an element of const
corresponding to the debiasing covariate
g(d_i)
is NA
, GEcalib
minimizes the negative adjusted entropy:
\sum_{i \in A} q_iG(\phi_i\omega_i) - K(\alpha),
with respect to \bm \omega
and \alpha
subject to the calibration constraints
\sum_{i \in A} \omega_i (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i)) = \left(\sum_{i \in U} \bm x_i, \alpha \right)
,
where the solution \hat \alpha
is an estimate of population total for g(d_i)
.
Examples of K(\alpha)
includes K(\alpha) = \alpha
when K_alpha == NULL
, and
K(\alpha) = \left(\sum_{i \in A} d_i g(d_i) + N \right)
\log \left| \frac{1}{N}\sum_{i \in A}q_i \phi_i \omega_i g(\phi_i \omega_i) + 1 \right|
when K_alpha == "log"
.
If method == "GEC0"
, GEcalib
minimizes the negative adjusted entropy:
\sum_{i \in A} q_iG(\phi_i\omega_i) - q_i\phi_i\omega_i g(\phi_i \omega_i)
with respect to \bm \omega
subject to the calibration constraints \sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i
.
If method == "DS"
, GEcalib
minimizes the divergence between \bm \omega
and \bm d
:
\sum_{i \in A} q_id_i \tilde G(\omega_i / d_i)
with respect to \bm \omega
subject to the calibration constraints \sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i
.
When method == "DS"
, weight.scale
, the scaling factor for the calibration weights \phi_i
, is not applicable.
Examples of G
and \tilde G
are given in "Summary".
Value
A list of class calibration
including the calibration weights
and data needed for estimation.
Summary
The table below provides a comparison between the GEC and DS methods.
GEC | DS |
\min_{\bm \omega} \left(-H(\bm \omega)\right) = \sum_{i \in A}G(\omega_i) \quad |
\quad \min_{\bm \omega} D(\bm \omega, \bm d) = \sum_{i \in A}d_i \tilde G(\omega_i / d_i) |
s.t. \sum_{i \in A} \omega_i (\bm{x}_i^T, g(d_i)) = \sum_{i \in U} (\bm{x}_i^T, g(d_i)) |
s.t. \sum_{i \in A} \omega_i \bm{x}_i^T = \sum_{i \in U} \bm{x}_i^T |
G(\omega) = \begin{cases} \frac{1}{r(r+1)} \omega^{r+1} & r \neq 0, -1\\
\omega \log \omega - \omega & r = 0\text{(ET)} \\
-\log \omega & r = -1\text{(EL)} \end{cases}
| \tilde G(\omega) = \begin{cases} \frac{1}{r(r+1)} \left(\omega^{r+1} - (r+1)\omega + r\right) & r \neq 0, -1 \\
\omega \log \omega - \omega + 1 & r = 0\text{(ET)} \\
-\log \omega + \omega - 1 & r = -1\text{(EL)} \end{cases} |
If method == "GEC"
, further examples include
G(\omega) = (\omega - 1) \log (\omega-1) - \omega \log \omega
when entropy == "CE"
, and
G(\omega) = \delta^2 \left(1 + (\omega / \delta)^2 \right)^{1/2}
for a threshold \delta
when entropy == "PH"
.
Author(s)
Yonghyun Kwon
References
Kwon, Y., Kim, J., & Qiu, Y. (2024). Debiased calibration estimation using generalized entropy in survey sampling. Arxiv preprint <https://arxiv.org/abs/2404.01076>
Deville, J. C., and Särndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American statistical Association, 87(418), 376-382.
Examples
set.seed(11)
N = 10000
x = data.frame(x1 = rnorm(N, 2, 1), x2= runif(N, 0, 4))
pi = pt((-x[,1] / 2 - x[,2] / 2), 3);
pi = ifelse(pi >.7, .7, pi)
delta = rbinom(N, 1, pi)
Index_S = (delta == 1)
pi_S = pi[Index_S]; d_S = 1 / pi_S
x_S = x[Index_S,]
# Deville & Sarndal(1992)'s calibration using divergence
w1 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
const = colSums(cbind(1, x)),
entropy = "ET", method = "DS")$w
# Generalized entropy calibration without debiasing covariate
w2 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
const = colSums(cbind(1, x)),
entropy = "ET", method = "GEC0")$w
all.equal(w1, w2)
# Generalized entropy calibration with debiasing covariate
w3 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
const = colSums(cbind(1, x, log(1 / pi))),
entropy = "ET", method = "GEC")$w
# Generalized entropy calibration with debiasing covariate
# when its population total is unknown
w4 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
const = colSums(cbind(1, x, NA)),
entropy = "ET", method = "GEC")$w
all.equal(w1, w4)
w5 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
const = colSums(cbind(1, x, NA)),
entropy = "ET", method = "GEC", K_alpha = "log")$w