hmda.grid.analysis {HMDA} | R Documentation |
Analyze Hyperparameter Grid Performance
Description
Reorders an HMDA grid based on a specified performance metric and supplements the grid's summary table with additional performance metrics extracted via cross-validation. The function returns a data frame of performance metrics for each model in the grid. This enables a detailed analysis of model performance across various metrics such as logloss, AUC, RMSE, etc.
Usage
hmda.grid.analysis(
grid,
performance_metrics = c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr",
"mean_per_class_error", "r2"),
sort_by = "logloss"
)
Arguments
grid
An HMDA grid object from which the performance summary will be extracted.
performance_metrics
A character vector of additional performance metric names to be included in the analysis. Default is c("logloss", "mse", "rmse", "rmsle", "auc", "aucpr", "mean_per_class_error", "r2").
sort_by
A character string indicating the performance metric to sort the grid by. Default is "logloss".
Details
The function performs the following steps:

- Grid Reordering: It calls h2o.getGrid() to reorder the grid based on the sort_by metric. For metrics like "logloss", "mse", "rmse", and "rmsle", sorting is in ascending order; for other metrics, it is in descending order.

- Performance Table Extraction: The grid's summary table is converted into a data frame.

- Additional Metric Calculation: For each metric specified in performance_metrics (other than the one used for sorting), the function initializes a column with NA values and iterates over each model in the grid (via its model_ids) to extract the corresponding cross-validated performance metric using functions such as h2o.auc(), h2o.rmse(), etc. For threshold-based metrics (e.g., f1, f2, mcc, kappa), it retrieves performance via h2o.performance().

- Return: The function returns the merged data frame of performance metrics.
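The steps above can be sketched in R roughly as follows. This is a minimal illustration of the described procedure, not the package's actual implementation; it assumes a running H2O cluster and an existing H2OGrid object named grid, and extracts only one additional metric (AUC) for brevity.

```r
library(h2o)

# Step 1: reorder the grid by the chosen metric.
# Error-type metrics sort ascending; others descending.
sort_by    <- "logloss"
decreasing <- !(sort_by %in% c("logloss", "mse", "rmse", "rmsle"))
grid <- h2o.getGrid(grid@grid_id, sort_by = sort_by,
                    decreasing = decreasing)

# Step 2: convert the grid's summary table to a data frame.
tbl <- as.data.frame(grid@summary_table)

# Step 3: add a cross-validated metric column, model by model.
tbl$auc <- NA
for (i in seq_along(grid@model_ids)) {
  model      <- h2o.getModel(grid@model_ids[[i]])
  tbl$auc[i] <- h2o.auc(model, xval = TRUE)
}

# Step 4: tbl now holds the grid summary merged with the extra metric.
```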
Value
A data frame of class "hmda.grid.analysis"
that contains the merged
performance summary table. This table includes the default metrics from the grid
summary along with the additional metrics specified by performance_metrics
(if available). The data frame is sorted according to the sort_by
metric.
Author(s)
E. F. Haghish
Examples
## Not run:
# NOTE: This example may take a long time to run on your machine
# Initialize H2O (if not already running)
library(HMDA)
library(h2o)
hmda.init()
# Import a sample binary outcome train/test set into H2O
train <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])
# Run the hyperparameter search using the GBM algorithm.
result <- hmda.search.param(algorithm = c("gbm"),
x = x,
y = y,
training_frame = train,
max_models = 100,
nfolds = 10,
stopping_metric = "AUC",
stopping_rounds = 3)
# Assess the performances of the models
grid_performance <- hmda.grid.analysis(result)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
## End(Not run)