get_agg_proxy {fastei} | R Documentation |
Runs the EM algorithm aggregating adjacent groups, maximizing the variability of macro-group allocation in ballot boxes.
Description
This function estimates the voting probabilities (computed using run_em) while aggregating adjacent groups so that the standard deviation of the estimated probabilities (computed using bootstrap) is below a given threshold. See Details for more information.
Usage
get_agg_proxy(
  object = NULL,
  X = NULL,
  W = NULL,
  json_path = NULL,
  sd_statistic = "maximum",
  sd_threshold = 0.05,
  method = "mult",
  feasible = TRUE,
  nboot = 100,
  allow_mismatch = TRUE,
  seed = NULL,
  ...
)
Arguments
object |
An object of class |
X |
A |
W |
A |
json_path |
A path to a JSON file containing |
sd_statistic |
String indicating the statistic for the standard deviation |
sd_threshold |
Numeric with the value to use as a threshold for the statistic ( |
method |
An optional string specifying the method used for estimating the E-step. Valid options are:
|
feasible |
Logical indicating whether the returned matrix must strictly satisfy the |
nboot |
Integer specifying how many times to run the EM algorithm. |
allow_mismatch |
Boolean, if |
seed |
An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if |
... |
Additional arguments passed to the run_em function that will execute the EM algorithm. |
Details
Groups need to have an order relation so that adjacent groups can be merged. Groups with consecutive column indices in the matrix W are considered adjacent. For example, consider the following seven groups defined by voters' age ranges: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+. A possible group aggregation consists of the following three macro-groups: 20-39, 40-59, and 60+. Since there are multiple group aggregations, even for a fixed number of macro-groups, a dynamic programming (DP) procedure is used to find, for a specific number of macro-groups, the group aggregation that maximizes the sum of the standard deviations of the macro-group proportions among ballot boxes. If no group aggregation has a standard deviation statistic that meets the threshold condition, NULL is returned.
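The DP step described above can be sketched in base R as follows. This is a minimal illustration rather than the package's implementation: the helper name best_contiguous_partition is hypothetical, and the objective is one reading of the text, namely the sum, over macro-groups, of the standard deviation across ballot boxes of each macro-group's proportion. It assumes at least two ballot boxes and no empty ballot box.

```r
# Hypothetical helper (NOT part of fastei): split the g ordered groups into
# k contiguous macro-groups, maximizing the sum of the standard deviations
# (across ballot boxes) of the macro-group proportions.
best_contiguous_partition <- function(W, k) {
  g <- ncol(W)
  stopifnot(k >= 1, k <= g, nrow(W) >= 2)
  P <- W / rowSums(W)  # group proportions within each ballot box

  # score[i, j]: sd across ballot boxes of the merged proportion of groups i..j
  score <- matrix(-Inf, g, g)
  for (i in seq_len(g)) {
    for (j in i:g) {
      score[i, j] <- sd(rowSums(P[, i:j, drop = FALSE]))
    }
  }

  # best[m, j]: best total score when groups 1..j form m macro-groups
  # prev_end[m, j]: last column of the (m - 1)-th macro-group in that optimum
  best <- matrix(-Inf, k, g)
  prev_end <- matrix(NA_integer_, k, g)
  best[1, ] <- score[1, ]
  for (m in seq_len(k)[-1]) {
    for (j in m:g) {
      for (i in m:j) {  # the last macro-group covers groups i..j
        cand <- best[m - 1, i - 1] + score[i, j]
        if (cand > best[m, j]) {
          best[m, j] <- cand
          prev_end[m, j] <- i - 1L
        }
      }
    }
  }

  # Backtrack to recover the closing column index of each macro-group
  ends <- integer(k)
  j <- g
  for (m in k:1) {
    ends[m] <- j
    j <- prev_end[m, j]
  }
  list(group_agg = ends, value = best[k, g])
}

# Toy data (assumed, not from the package): 50 ballot boxes, 7 groups
set.seed(42)
W_toy <- matrix(rpois(50 * 7, lambda = 40) + 1, nrow = 50, ncol = 7)
best_contiguous_partition(W_toy, k = 3)$group_agg
```

The returned vector uses the same convention as the group_agg values shown in the Examples: each entry is the last original group included in a macro-group.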
To find the best group aggregation, the function runs the DP iteratively, starting with all groups (this case is trivial, since every macro-group matches exactly one original group). If the standard deviation statistic (sd_statistic) is below the threshold (sd_threshold), it stops. Otherwise, it runs the DP with one fewer macro-group and checks the statistic again. This continues until either the threshold condition is met or no group aggregation obtained by the DP satisfies it. In the former case, the last group aggregation obtained (before stopping) is returned; in the latter, no output is returned unless the user sets the input parameter feasible=FALSE, in which case the function returns the group aggregation with the smallest standard deviation statistic among the group aggregations obtained from the DP.
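The control flow of this search can be sketched as below. It is a hedged outline, not the package's code: search_aggregation, partition_for, and fit_sd are hypothetical names, where partition_for(k) stands in for the DP with k macro-groups and fit_sd(agg) stands in for the bootstrap that yields the standard deviation matrix. The strict comparison with the threshold is also an assumption taken from the word "below".

```r
# Hypothetical sketch (NOT part of fastei) of the iterative search.
#   partition_for(k): returns the aggregation chosen for k macro-groups
#   fit_sd(agg):      returns the bootstrap standard deviation matrix
search_aggregation <- function(g, partition_for, fit_sd,
                               sd_statistic = "maximum",
                               sd_threshold = 0.05,
                               feasible = TRUE) {
  stat_fun <- switch(sd_statistic,
    maximum = max,
    average = mean,
    stop("unsupported sd_statistic")
  )
  best <- NULL
  for (k in g:1) {  # start with all groups, then merge one more each round
    agg <- partition_for(k)
    stat <- stat_fun(fit_sd(agg))
    if (stat < sd_threshold) {
      return(list(group_agg = agg, statistic = stat, is_feasible = TRUE))
    }
    # Remember the aggregation with the smallest statistic seen so far
    if (is.null(best) || stat < best$statistic) {
      best <- list(group_agg = agg, statistic = stat, is_feasible = FALSE)
    }
  }
  # Threshold never met: nothing is returned unless feasible = FALSE
  if (feasible) NULL else best
}

# Toy stand-ins (assumed): the statistic shrinks as macro-groups are merged
toy_partition <- function(k) c(seq_len(k - 1), 6)
toy_sd <- function(agg) matrix(0.01 * length(agg), nrow = length(agg), ncol = 3)

search_aggregation(6, toy_partition, toy_sd, sd_threshold = 0.045)$group_agg
```

With feasible = FALSE and a threshold that is never met, the sketch falls back to the aggregation with the smallest statistic, mirroring the behavior described above.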
Value
It returns an eim object with the same attributes as the output of run_em, plus the attributes:
- sd: An (a x c) matrix with the standard deviation of the estimated probabilities computed with bootstrapping. Note that a denotes the number of macro-groups of the resulting group aggregation; it should be between 1 and g.
- nboot: Number of samples used for the bootstrap method.
- seed: Random seed used (if specified).
- sd_statistic: The statistic used as input.
- sd_threshold: The threshold used as input.
- is_feasible: Boolean indicating whether the statistic of the standard deviation matrix is below the threshold.
- group_agg: Vector with the resulting group aggregation. See Examples for more details.
Additionally, it will create the W_agg attribute with the aggregated groups, along with the attributes corresponding to running run_em with the aggregated groups.
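To make the relationship between group_agg and the aggregated matrix concrete, the following base R sketch merges adjacent columns of W according to a vector of closing column indices, which is the convention the Examples use (c(2, 6) means groups 1-2 and 3-6). The helper aggregate_groups is hypothetical and only illustrates the idea; the package builds W_agg itself.

```r
# Hypothetical helper (NOT part of fastei): sum adjacent columns of W so that
# macro-group m covers the columns from the previous closing index + 1 up to
# group_agg[m].
aggregate_groups <- function(W, group_agg) {
  starts <- c(1L, head(group_agg, -1) + 1L)
  W_agg <- vapply(
    seq_along(group_agg),
    function(m) rowSums(W[, starts[m]:group_agg[m], drop = FALSE]),
    numeric(nrow(W))
  )
  # Keep a matrix shape even when W has a single ballot box
  matrix(W_agg, nrow = nrow(W))
}

# Toy matrix (assumed): 2 ballot boxes, 6 groups
W_toy <- matrix(1:12, nrow = 2)
aggregate_groups(W_toy, c(2, 6))
# Row 1: 4 32; row 2: 6 36
```

Each row total is unchanged by the aggregation, so the number of voters per ballot box is preserved.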
See Also
The eim object and run_em implementation.
Examples
# Example 1: Using a simulated instance
simulations <- simulate_election(
  num_ballots = 400,
  num_candidates = 3,
  num_groups = 6,
  group_proportions = c(0.4, 0.1, 0.1, 0.1, 0.2, 0.1),
  lambda = 0.7,
  seed = 42
)
result <- get_agg_proxy(
  X = simulations$X,
  W = simulations$W,
  sd_threshold = 0.015,
  seed = 42
)
result$group_agg # c(2, 6)
# This means that the resulting group aggregation consists of
# two macro-groups: one that has the original groups 1 and 2; and
# a second that has the original groups 3, 4, 5, and 6.
# Example 2: Using the Chilean election results
data(chile_election_2021)
niebla_df <- chile_election_2021[chile_election_2021$ELECTORAL.DISTRICT == "NIEBLA", ]
# Create the X matrix with selected columns
X <- as.matrix(niebla_df[, c("C1", "C2", "C3", "C4", "C5", "C6", "C7")])
# Create the W matrix with selected columns
W <- as.matrix(niebla_df[, c(
  "X18.19", "X20.29",
  "X30.39", "X40.49",
  "X50.59", "X60.69",
  "X70.79", "X80."
)])
solution <- get_agg_proxy(
  X = X, W = W,
  allow_mismatch = TRUE, sd_threshold = 0.03,
  sd_statistic = "average", seed = 42
)
solution$group_agg # c(3, 4, 5, 6, 8)
# This means that the resulting group aggregation consists of
# five macro-groups: one that includes the original groups 1, 2, and 3;
# three singleton groups (4, 5, and 6); and one macro-group that includes groups 7 and 8.