ei_gme {EIEntropy}    R Documentation

Ecological Inference Applying Entropy

Description

The function ei_gme defines the Shannon entropy function, which takes a vector of probabilities p as input and returns the negative sum of p times the natural logarithm of p. It then sets the optimization parameters and obtains an optimal solution with the "nlminb" function.

The function defines the independent variables in the two databases needed, dataA with "n_A" observations and dataB with "n_B" observations, and the function of the binary variable of interest y. It then defines the weights of each observation in both databases; if no weights are available, every weight is set to 1. The errors are calculated by weighting the support vector (var, 0, -var). The user can specify this support vector; the default is based on the variance. We recommend a wider interval, with v = (1, 0, -1) as the maximum. The restrictions are defined to guarantee consistency. The Shannon entropy function is optimized with "nlminb", using a default maximum of 1000 iterations and a tolerance defined by the user.
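As a minimal sketch of the entropy measure described above, the following base R snippet computes the negative sum of p times log(p). The name shannon_entropy is illustrative only and is not a function exported by the package.

```r
# Shannon entropy: negative sum of p * log(p), using the natural logarithm.
# `shannon_entropy` is a hypothetical helper name used for illustration.
shannon_entropy <- function(p) {
  -sum(p * log(p))
}

p_uniform <- rep(1 / 4, 4)            # uniform distribution over 4 outcomes
p_skewed  <- c(0.7, 0.1, 0.1, 0.1)    # a more concentrated distribution

h_uniform <- shannon_entropy(p_uniform)  # equals log(4), the maximum for 4 outcomes
h_skewed  <- shannon_entropy(p_skewed)   # lower than h_uniform
```

The uniform distribution attains the largest entropy, which is why it is a natural starting point for an entropy maximization.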

Usage

ei_gme(fn, dataA, dataB, weights = NULL, tol, v, iter)

Arguments

fn

The formula that represents the dependent variable in the optimization. In this function, 'fn' defines the dependent variable to be optimized by the entropy function. Note: If the dependent variable is categorical, the columns, and therefore J, are sorted in alphabetical order.
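To illustrate the alphabetical ordering mentioned in the note, the sketch below sorts the labels of a made-up categorical variable with base R. The variable y_toy is hypothetical, and treating J as the number of categories is an assumption; whether the package uses sort internally is not stated here.

```r
# Hypothetical categorical dependent variable
y_toy <- c("medium", "low", "high", "low")

# Alphabetical ordering of the distinct categories
categories <- sort(unique(y_toy))

# Assumed meaning of J: the number of categories (one column per category)
J <- length(categories)
```

With these labels the first column would correspond to "high", the second to "low" and the third to "medium", regardless of the order in which they appear in the data.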

dataA

The data in which both the variable of interest y and the independent variables are available. Note: The variables and weights used as independent variables must have the same name in 'dataA' and in 'dataB'. The variables in both databases must also match in content.

dataB

The data that contains information on the independent variables at a disaggregated level. Note: The variables and weights used as independent variables must be the same and must have the same name in 'dataA' and in 'dataB'.

weights

A character string specifying the column name to be used as weights in both 'dataA' and 'dataB' datasets. If the argument weights is provided and present in both datasets, the weights in each dataset will be normalized by the sum of the weights within that dataset. If weights is NULL or the specified column does not exist in both datasets, equal weights are applied across all observations.
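The weighting rule described above can be sketched as follows. The helper resolve_weights and the toy data frames are hypothetical and only mirror the behaviour stated in this section; the package's internal code may differ.

```r
# Hypothetical toy data; column names are for illustration only
toyA <- data.frame(x = c(1, 2, 3), w = c(2, 1, 1))
toyB <- data.frame(x = c(4, 5),    w = c(3, 1))

# `resolve_weights` is an illustrative helper, not part of the package
resolve_weights <- function(dataA, dataB, weights = NULL) {
  has_weights <- !is.null(weights) &&
    weights %in% names(dataA) && weights %in% names(dataB)
  if (has_weights) {
    # Normalize by the sum of the weights within each dataset
    list(A = dataA[[weights]] / sum(dataA[[weights]]),
         B = dataB[[weights]] / sum(dataB[[weights]]))
  } else {
    # No usable weights: every observation gets the same weight
    list(A = rep(1, nrow(dataA)), B = rep(1, nrow(dataB)))
  }
}

w_norm  <- resolve_weights(toyA, toyB, weights = "w")
w_equal <- resolve_weights(toyA, toyB)
```

After normalization the weights in each dataset sum to one, so datasets with different sample sizes or weight scales are treated comparably.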

tol

The tolerance to be applied in the optimization function. If the tolerance is not specified, it defaults to 1e-10.

v

The support vector for the errors. If it is not specified, the default support vector is based on the variance.
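The role of the support vector can be sketched as follows: each error is taken as a weighted combination of the support points, with weights w that sum to one. This is the usual generalized maximum entropy formulation and is assumed here for illustration; the object names are hypothetical.

```r
# Support vector with three points (L = 3)
v <- c(1, 0, -1)
L <- length(v)

# Equal weights of 1/L on each support point
w_start <- rep(1 / L, L)
e_start <- sum(v * w_start)    # symmetric support and equal weights: error is 0

# Weights tilted towards the positive support point
w_tilted <- c(0.5, 0.3, 0.2)
e_tilted <- sum(v * w_tilted)  # positive error, bounded by the support
```

Because the weights are probabilities, the implied error can never leave the interval spanned by v, which is why the width of the support matters.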

iter

The maximum number of iterations allowed for the optimization algorithm to run. Increasing the number of iterations may improve the likelihood of finding an optimal solution, but can also increase computation time. If the maximum number of iterations is not specified, it defaults to 1000.

Details

To solve the optimization, upper and lower bounds are set for p and w: both must be greater than 0 and less than 1. In addition, the initial values of p are set to a uniform distribution and those of the errors (w) to 1/L.
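The following toy sketch shows how uniform starting values, box bounds, an iteration limit and a tolerance can be passed to nlminb. It is not the package's internal problem: the consistency restrictions are omitted, the value of eps and the mapping of the tolerance to rel.tol are assumptions, and all object names are hypothetical.

```r
J <- 4                      # hypothetical number of probabilities
p_start <- rep(1 / J, J)    # uniform starting values
eps <- 1e-10                # assumed small offset to keep p strictly inside (0, 1)

# nlminb minimizes, so entropy is maximized by minimizing its negative
neg_entropy      <- function(p) sum(p * log(p))
neg_entropy_grad <- function(p) log(p) + 1

fit <- nlminb(
  start     = p_start,
  objective = neg_entropy,
  gradient  = neg_entropy_grad,
  lower     = eps,
  upper     = 1 - eps,
  control   = list(iter.max = 1000, rel.tol = 1e-10)
)

fit$par          # estimated values
fit$convergence  # 0 indicates successful convergence
```

Because this toy problem has no adding-up restriction, each element moves independently to exp(-1), the unconstrained minimizer of p * log(p); in the actual estimation the restrictions tie the probabilities together.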

Value

The function returns a data frame called table with the following information:

The restriction g3 can be checked in detail using the separately returned objects.

References

Fernandez-Vazquez, E., Díaz-Dapena, A., Rubiera-Morollon, F., Viñuela, A. (2020). Spatial Disaggregation of Social Indicators: An Info-Metrics Approach. Social Indicators Research, 152(2), 809–821. https://doi.org/10.1007/s11205-020-02455-z

Examples

# In this example we use the data included in this package
dataA <- financial()
dataB <- social()
# Set up the formula for the dependent variable
fn <- dataA$poor_liq ~ Dcollege + Totalincome + Dunemp
# Apply ei_gme to our databases. Here dataA is the data that contains
# the variable of interest and dataB is the data that contains the
# information for the disaggregation.
# weights = "w" can be included if both surveys have weights
# The tolerance is left at its default of 1e-10 and v is set to (1, 0, -1)
v <- matrix(c(1, 0, -1), nrow = 1)
result <- ei_gme(fn = fn, dataA = dataA, dataB = dataB, weights = "w", v = v)

[Package EIEntropy version 0.0.1.4 Index]