projection_rf {sae.projection} | R Documentation
Projection RF Function
Description
This function trains a random forest model and performs domain-level estimation **without bias correction**.
Usage
projection_rf(
  data_model,
  target_column,
  predictor_cols,
  data_proj,
  domain1,
  domain2,
  psu,
  ssu,
  strata,
  weights,
  split_ratio = 0.8,
  metric = "Accuracy"
)
Arguments
data_model
The training dataset, consisting of auxiliary variables and the target variable.
target_column
The name of the target column in data_model.
predictor_cols
A vector of predictor column names.
data_proj
The dataset to be projected (predicted) using the trained model. It must contain the same auxiliary variables as data_model.
domain1
Domain variable for survey estimation (e.g., "province").
domain2
Domain variable for survey estimation (e.g., "regency").
psu
Primary sampling units, representing the structure of the sampling frame.
ssu
Secondary sampling units, representing the structure of the sampling frame.
strata
Stratification variable, ensuring that specific subgroups are represented.
weights
Weights used for the direct estimation.
split_ratio
Proportion of the data used for training (default is 0.8, meaning 80 percent for training and 20 percent for validation).
metric
The metric used for model evaluation (default is "Accuracy"; other options include "AUC", "F1", etc.).
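To illustrate what split_ratio controls, the following is a minimal base R sketch of an 80/20 train/validation split. It is not the package's internal code: the toy data frame `toy` and all variable names are assumptions made for illustration, and projection_rf may split the data differently (for example, with stratified sampling).

```r
# Hypothetical toy data; not part of sae.projection.
set.seed(123)
toy <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y  = factor(sample(c("no", "yes"), 100, replace = TRUE))
)

split_ratio <- 0.8

# Draw the training row indices at random, without replacement.
n_train   <- floor(split_ratio * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

# Rows not drawn for training form the validation set.
train_set      <- toy[train_idx, ]
validation_set <- toy[-train_idx, ]

nrow(train_set)
nrow(validation_set)
```

With split_ratio = 0.8 and 100 rows, the two sets are disjoint and together cover every row of the toy data.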
Value
A list containing the following elements:
- model: The trained random forest model.
- importance: Feature importance, showing which features contributed most to the model's predictions.
- train_accuracy: Accuracy of the model on the training set.
- validation_accuracy: Accuracy of the model on the validation set.
- validation_performance: Confusion matrix for the validation set, with performance metrics such as accuracy, precision, and recall.
- data_proj: The projection data with predicted values.
- Domain1: Estimates for domain 1, including estimated values, variance, and relative standard error.
- Domain2: Estimates for domain 2, including estimated values, variance, and relative standard error.
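The relative standard error reported for each domain relates the estimate to its variance. The sketch below uses the conventional definition, the standard error divided by the estimate, expressed here as a percentage. The data frame `domain_est`, its column names, and its values are made up for illustration; whether the function reports the figure as a percentage or a proportion is an assumption to check against the actual output.

```r
# Hypothetical domain-level results; values are made up for illustration.
domain_est <- data.frame(
  province = c("A", "B"),
  estimate = c(0.25, 0.40),
  variance = c(0.0004, 0.0009)
)

# RSE (%) = standard error / estimate * 100,
# where the standard error is the square root of the variance.
domain_est$rse <- sqrt(domain_est$variance) / domain_est$estimate * 100

domain_est
```

A smaller relative standard error indicates a more precise domain estimate relative to its size.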