run_em {fastei} | R Documentation |
Compute the Expectation-Maximization Algorithm
Description
Executes the Expectation-Maximization (EM) algorithm, using the specified approximation method in the E-step.
Certain methods may require additional arguments, which can be passed through ...
(see fastei-package for more details).
Usage
run_em(
  object = NULL,
  X = NULL,
  W = NULL,
  json_path = NULL,
  method = "mult",
  initial_prob = "group_proportional",
  allow_mismatch = TRUE,
  maxiter = 1000,
  maxtime = 3600,
  param_threshold = 0.001,
  ll_threshold = as.double(-Inf),
  seed = NULL,
  verbose = FALSE,
  group_agg = NULL,
  mcmc_samples = 1000,
  mcmc_stepsize = 3000,
  mvncdf_method = "genz",
  mvncdf_error = 0.00001,
  mvncdf_samples = 5000,
  ...
)
Arguments
object |
An object of class |
X |
A |
W |
A |
json_path |
A path to a JSON file containing |
method |
An optional string specifying the method used for estimating the E-step. Valid options are:
For a detailed description of each method, see fastei-package and References. |
initial_prob |
An optional string specifying the method used to obtain the initial probability. Accepted values are:
|
allow_mismatch |
Boolean, if |
maxiter |
An optional integer indicating the maximum number of EM iterations.
The default value is |
maxtime |
An optional numeric specifying the maximum running time (in seconds) for the
algorithm. This is checked at every iteration of the EM algorithm. The default value is |
param_threshold |
An optional numeric value indicating the minimum difference between
consecutive probability values required to stop iterating. The default value is |
ll_threshold |
An optional numeric value indicating the minimum difference between consecutive log-likelihood values to stop iterating. The default value is |
seed |
An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if |
verbose |
An optional boolean indicating whether to print informational messages during the EM
iterations. The default value is |
group_agg |
An optional vector of increasing integers from 1 to the number of columns in |
mcmc_samples |
An optional integer indicating the number of samples to generate for the
MCMC method. This parameter is only relevant when |
mcmc_stepsize |
An optional integer specifying the step size for the |
mvncdf_method |
An optional string specifying the method used to estimate the |
mvncdf_error |
An optional numeric value defining the error threshold for the Monte Carlo
simulation when estimating the |
mvncdf_samples |
An optional integer specifying the number of Monte Carlo
samples for the |
... |
Added for compatibility |
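The stopping-related arguments (maxiter, maxtime, param_threshold and ll_threshold) can be pictured together with a short base R sketch. This is only one plausible reading of the descriptions above, not the package's implementation: the helper name check_stop, the use of the largest absolute change in the probabilities, and the order of the checks are all assumptions made for illustration.

```r
# Hypothetical helper (not part of fastei): decide whether an EM loop should stop.
# Returns 0 (converged), 1 (maximum time reached), 2 (maximum iterations reached),
# or NA_integer_ if the loop should keep going.
check_stop <- function(prob_old, prob_new, ll_old, ll_new, iter, elapsed,
                       maxiter = 1000, maxtime = 3600,
                       param_threshold = 0.001,
                       ll_threshold = as.double(-Inf)) {
  # Assumed rule: converge when the largest change in any probability is small ...
  param_converged <- max(abs(prob_new - prob_old)) < param_threshold
  # ... or when the log-likelihood changes by less than ll_threshold.
  # With the default of -Inf this comparison is never TRUE.
  ll_converged <- abs(ll_new - ll_old) < ll_threshold
  if (param_converged || ll_converged) return(0L)
  if (elapsed >= maxtime) return(1L)
  if (iter >= maxiter) return(2L)
  NA_integer_
}

prob_old  <- matrix(c(0.50, 0.50, 0.30, 0.70), nrow = 2, byrow = TRUE)
prob_near <- matrix(c(0.5004, 0.4996, 0.3002, 0.6998), nrow = 2, byrow = TRUE)
prob_far  <- matrix(c(0.60, 0.40, 0.40, 0.60), nrow = 2, byrow = TRUE)

# Small parameter change: treated as converged.
status_near <- check_stop(prob_old, prob_near, ll_old = -120.5, ll_new = -120.4,
                          iter = 12, elapsed = 0.8)

# Large parameter change at the iteration limit: stops with status 2.
status_limit <- check_stop(prob_old, prob_far, ll_old = -120.5, ll_new = -120.4,
                           iter = 1000, elapsed = 0.8)

# Large parameter change, limits not reached: keep iterating.
status_continue <- check_stop(prob_old, prob_far, ll_old = -120.5, ll_new = -120.4,
                              iter = 12, elapsed = 0.8)
```

Under this reading, the default ll_threshold of -Inf effectively disables the log-likelihood check, so convergence is driven by param_threshold unless a finite ll_threshold is supplied.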
Value
The function returns an eim object with the function arguments and the following attributes:
- prob
The estimated probability matrix (g x c).
- cond_prob
A (b x g x c) 3D array with the probability that, at each ballot box, a voter of each group voted for each candidate, given the observed outcome at that ballot box.
- logLik
The log-likelihood value from the last iteration.
- iterations
The total number of iterations performed by the EM algorithm.
- time
The total execution time of the algorithm in seconds.
- status
The final status ID of the algorithm upon completion:
  - 0: Converged.
  - 1: Maximum time reached.
  - 2: Maximum iterations reached.
- message
The finishing status displayed as a message, matching the status ID value.
- method
The method for estimating the conditional probability in the E-step.
Additionally, the object includes the mcmc_samples and mcmc_stepsize parameters if method = "mcmc", or mvncdf_method, mvncdf_error and mvncdf_samples if method = "mvn_cdf".
If the supplied eim object was created with simulate_election, the result also includes the real probability under the name real_prob. See simulate_election.
If group_agg is not NULL, two additional values are returned: W_agg, a (b x a) matrix with the number of voters of each aggregated group at each ballot box, and group_agg, the same input vector.
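As a small illustration of how the status and message fields relate, the base R sketch below maps a status ID to the descriptions listed above. The helper name status_label is hypothetical and not part of fastei; the returned object's message field is documented as already carrying the matching text.

```r
# Hypothetical helper (not part of fastei): map a status ID to its description.
status_label <- function(status) {
  labels <- c("0" = "Converged",
              "1" = "Maximum time reached",
              "2" = "Maximum iterations reached")
  key <- as.character(status)
  if (!key %in% names(labels)) {
    stop("Unknown status ID: ", key)
  }
  unname(labels[key])
}

label_converged <- status_label(0)   # "Converged"
label_maxiter   <- status_label(2)   # "Maximum iterations reached"
```

A check such as comparing the status field against 0 before using the estimated probabilities is one way to avoid reading results from a run that stopped on a time or iteration limit.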
Note
This function can be executed using one of three mutually exclusive approaches:
- By providing an existing eim object.
- By supplying both input matrices (X and W) directly.
- By specifying a JSON file (json_path) containing the matrices.
You must provide exactly one of these options. Providing more than one, or none, results in an error.
When called with an eim object, the function updates the object with the computed results.
If an eim object is not provided, the function creates one internally from either the supplied matrices or the data in the JSON file before executing the algorithm.
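The exactly-one-input rule can be pictured with a short base R sketch. The function name resolve_input, the requirement that X and W arrive together, and the error messages are assumptions made for illustration; the package's internal validation may differ.

```r
# Hypothetical validator (not part of fastei): enforce exactly one input route.
resolve_input <- function(object = NULL, X = NULL, W = NULL, json_path = NULL) {
  # Assumed: X and W form a single route and must be supplied together.
  if (xor(is.null(X), is.null(W))) {
    stop("X and W must be supplied together.")
  }
  routes <- c(
    object   = !is.null(object),
    matrices = !is.null(X) && !is.null(W),
    json     = !is.null(json_path)
  )
  if (sum(routes) != 1) {
    stop("Provide exactly one of: an eim object, X and W, or json_path.")
  }
  # Name of the single route that was used.
  names(routes)[routes]
}

X <- matrix(1:6, nrow = 3)   # toy matrices, for illustration only
W <- matrix(6:1, nrow = 3)

route_used <- resolve_input(X = X, W = W)   # "matrices"

# No input at all, or two routes at once, both fail.
no_input <- tryCatch(resolve_input(), error = function(e) "error")
two_routes <- tryCatch(resolve_input(X = X, W = W, json_path = "a/json/file.json"),
                       error = function(e) "error")
```

Treating X and W as one route keeps the count of supplied routes meaningful: a call with only X fails early with its own message instead of being reported as "no input".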
References
Thraves, C., Ubilla, P. and Hermosilla, D.: "Fast Ecological Inference Algorithm for the RxC Case".
Additionally, the MVN CDF is computed by the methods introduced in Genz, A. (2000). Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics.
See Also
The eim object implementation.
Examples
# Example 1: Run the Expectation-Maximization algorithm with default settings
simulations <- simulate_election(
  num_ballots = 300,
  num_candidates = 5,
  num_groups = 3
)
model <- eim(simulations$X, simulations$W)
model <- run_em(model) # Returns the object with updated attributes
# Example 2: Run the Expectation-Maximization algorithm using the mvn_pdf method
model <- run_em(
  object = model,
  method = "mvn_pdf"
)
# Example 3: Run the mvn_cdf method with default settings
model <- run_em(object = model, method = "mvn_cdf")
## Not run:
# Example 4: Perform an exact estimation using user-defined parameters
run_em(
  json_path = "a/json/file.json",
  method = "exact",
  initial_prob = "uniform",
  maxiter = 10,
  maxtime = 600,
  param_threshold = 1e-3,
  ll_threshold = 1e-5,
  verbose = TRUE
)
## End(Not run)