get_agg_opt {fastei} | R Documentation |
Runs the EM algorithm over all possible group aggregations, returning the one with the highest likelihood while constraining the standard deviation of the probabilities.
Description
This function estimates the voting probabilities (computed using run_em) by trying all aggregations of adjacent groups and choosing the one that achieves the highest likelihood, provided that the standard deviation (computed using bootstrap) of the estimated probabilities is below a given threshold. See Details for more information on adjacent groups.
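The selection rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `select_aggregation` and the `candidates` list are hypothetical (they are not part of fastei), the log-likelihoods and standard deviations are made-up toy numbers, the statistic is assumed to be the maximum entry of the standard deviation matrix, and the comparison is assumed to be "less than or equal to" the threshold.

```r
# Hypothetical candidates: one entry per group aggregation, each with a
# log-likelihood and an (a x c) matrix of bootstrap standard deviations.
# All numbers are invented for illustration only.
candidates <- list(
  list(group_agg = c(8), loglik = -130,
       sd = matrix(c(0.01, 0.02, 0.01), nrow = 1)),
  list(group_agg = c(3, 8), loglik = -118,
       sd = matrix(c(0.02, 0.04, 0.03, 0.01, 0.02, 0.03), nrow = 2)),
  list(group_agg = c(2, 5, 8), loglik = -110,
       sd = matrix(c(0.05, 0.09, 0.03, 0.02, 0.06, 0.04, 0.03, 0.02, 0.07),
                   nrow = 3))
)

# Hypothetical helper: keep the candidates whose sd statistic does not
# exceed the threshold, then return the one with the highest likelihood.
select_aggregation <- function(candidates, sd_threshold, statistic = max) {
  feasible <- Filter(
    function(cand) statistic(cand$sd) <= sd_threshold,
    candidates
  )
  if (length(feasible) == 0) {
    return(NULL) # no aggregation satisfies the constraint
  }
  logliks <- vapply(feasible, function(cand) cand$loglik, numeric(1))
  feasible[[which.max(logliks)]]
}

best <- select_aggregation(candidates, sd_threshold = 0.05)
none <- select_aggregation(candidates, sd_threshold = 0.001)
```

In this toy setting the three-macro-group candidate has the highest likelihood but violates the 0.05 threshold, so the two-macro-group candidate is selected; with a threshold of 0.001 no candidate is feasible.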
Usage
get_agg_opt(
  object = NULL,
  X = NULL,
  W = NULL,
  json_path = NULL,
  sd_statistic = "maximum",
  sd_threshold = 0.05,
  method = "mult",
  nboot = 100,
  allow_mismatch = TRUE,
  seed = NULL,
  ...
)
Arguments
object |
An object of class |
X |
A |
W |
A |
json_path |
A path to a JSON file containing |
sd_statistic |
String indicating the statistic for the standard deviation |
sd_threshold |
Numeric with the value to use as a threshold for the statistic ( |
method |
An optional string specifying the method used for estimating the E-step. Valid options are:
|
nboot |
Integer specifying how many times to run the EM algorithm. |
allow_mismatch |
Boolean, if |
seed |
An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if |
... |
Additional arguments passed to the run_em function that will execute the EM algorithm. |
Details
Groups of consecutive column indices in the matrix W are considered adjacent. For example, consider the following seven groups defined by voters' age ranges: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+. One possible group aggregation consists of three macro-groups with the age ranges 20-39, 40-59, and 60+. Since there are many such aggregations, the method evaluates all of them, merging only adjacent groups.
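To make the search space concrete, the sketch below enumerates every aggregation of g ordered groups that merges only adjacent groups. The helper `enumerate_adjacent_aggs` is hypothetical (it is not part of fastei) and says nothing about how the package itself performs the search. Each aggregation is encoded by the last original index of every macro-group, the same encoding shown for group_agg in the Examples. If every aggregation is considered, there are 2^(g - 1) of them, because each of the g - 1 boundaries between neighbouring groups is either kept or merged away.

```r
# Hypothetical helper (not part of fastei): list all aggregations of g
# ordered groups that merge only adjacent groups.
enumerate_adjacent_aggs <- function(g) {
  stopifnot(g >= 1)
  boundaries <- seq_len(g - 1)
  aggs <- list()
  for (mask in 0:(2^(g - 1) - 1)) {
    # Bit k of the mask says whether a macro-group ends at group k.
    cut_here <- (mask %/% 2^(boundaries - 1)) %% 2 == 1
    aggs[[length(aggs) + 1]] <- c(which(cut_here), g)
  }
  aggs
}

# Seven age ranges: 20-29, 30-39, ..., 70-79, 80+.
aggs <- enumerate_adjacent_aggs(7)
n_aggs <- length(aggs)

# The aggregation 20-39, 40-59, 60+ ends its macro-groups at 2, 4 and 7.
has_example <- any(vapply(
  aggs,
  function(a) length(a) == 3 && all(a == c(2, 4, 7)),
  logical(1)
))
```

The count doubles with every extra group, which is one reason the number of EM and bootstrap runs can grow quickly as g increases.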
Value
It returns an eim object with the same attributes as the output of run_em, plus the attributes:
- sd: A (a x c) matrix with the standard deviation of the estimated probabilities computed with bootstrapping. Note that a denotes the number of macro-groups of the resulting group aggregation; it should be between 1 and g.
- nboot: Number of samples used for the bootstrap method.
- seed: Random seed used (if specified).
- sd_statistic: The statistic used as input.
- sd_threshold: The threshold used as input.
- group_agg: Vector with the resulting group aggregation. See Examples for more details.
Additionally, it will create the W_agg attribute with the aggregated groups, along with the attributes corresponding to running run_em with the aggregated groups.
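As a rough illustration of how group_agg relates to W_agg, the sketch below sums the columns of a matrix within each macro-group. It assumes that aggregating groups means adding their columns of W. The helper `aggregate_columns` is hypothetical (not part of fastei), and the matrix is a toy input rather than real election data.

```r
# Hypothetical helper (not part of fastei): sum the columns of W inside
# each macro-group. group_agg holds the last original column of every
# macro-group, e.g. c(3, 8) means columns 1-3 and columns 4-8.
aggregate_columns <- function(W, group_agg) {
  stopifnot(
    length(group_agg) >= 1,
    all(diff(group_agg) > 0),
    tail(group_agg, 1) == ncol(W)
  )
  starts <- c(1, head(group_agg, -1) + 1)
  W_agg <- sapply(seq_along(group_agg), function(k) {
    rowSums(W[, starts[k]:group_agg[k], drop = FALSE])
  })
  # sapply returns a plain vector when W has one row; restore the shape.
  matrix(W_agg, nrow = nrow(W))
}

W <- matrix(1:16, nrow = 2) # toy 2 x 8 matrix
W_agg <- aggregate_columns(W, c(3, 8))
```

Under this assumption each row keeps its total number of voters, so `rowSums(W_agg)` equals `rowSums(W)`.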
Examples
# Example 1: Using a simulated instance
simulations <- simulate_election(
  num_ballots = 20,
  num_candidates = 3,
  num_groups = 8,
  seed = 42
)
result <- get_agg_opt(
  X = simulations$X,
  W = simulations$W,
  sd_threshold = 0.05,
  seed = 42
)
result$group_agg # c(3,8)
# This means that the resulting group aggregation consists of
# two macro-groups: one that includes the original groups 1, 2, and 3;
# the remaining one with groups 4, 5, 6, 7 and 8.
# Example 2: Getting an unfeasible result
result2 <- get_agg_opt(
  X = simulations$X,
  W = simulations$W,
  sd_threshold = 0.001
)
result2$group_agg # Error
result2$X # Input candidates' vote matrix
result2$W # Input group-level voter matrix