viralpreds {viralmodels}    R Documentation

Predict Viral Load or CD4 Count using Many Models

Description

This function predicts viral load or CD4 count values from multiple machine learning models fitted with cross-validation. Two prediction types are available through the prediction_type argument: predictions on the full dataset at once, or observation-by-observation (obs-by-obs) predictions.

Usage

viralpreds(output, semilla, data, prediction_type = "full")

Arguments

output

The non-ranked output of viraltab(), that is, the result of calling it with rank_output = FALSE.

semilla

An integer specifying the seed for random number generation to ensure reproducibility.

data

A data frame containing the predictors and the target variable.

prediction_type

A character string specifying the type of predictions to perform. Use "full" (default) to perform predictions on the full dataset at once, or "batch" to perform predictions on smaller batches of data.

Value

A list containing two elements: predictions (a vector of predicted values for the target variable) and RMSE (the root mean square error of the best model).
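
For orientation, root mean square error is conventionally the square root of the mean squared difference between observed and predicted values. The sketch below illustrates that definition only: the helper name rmse and the toy vectors are hypothetical and not part of viralmodels, and this page does not state on which scale or on which resamples the package computes its reported value.

```r
# Hypothetical helper illustrating the conventional RMSE definition;
# it is not a function exported by viralmodels.
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Made-up observed and predicted values, for illustration only
actual <- c(2.0, 3.0, 4.0)
predicted <- c(2.5, 2.5, 4.0)
rmse(actual, predicted)
```

A lower value indicates predictions that sit closer to the observed target on average.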

Examples


library(viralmodels)
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
# (values at or below 40 are replaced by random draws)
set.seed(123)
impute_undetectable <- function(column) {
  ifelse(column <= 40,
         rexp(sum(column <= 40), rate = 1/13) + 1,
         column)
}
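# Illustrative check of impute_undetectable() on a made-up vector. The name
# toy_vl and its values are hypothetical and not part of the viral dataset.
# Values above 40 should come back unchanged, while values at or below 40
# should be replaced by random draws that are at least 1.
toy_vl <- c(20, 40, 41, 1500)
toy_imputed <- impute_undetectable(toy_vl)
stopifnot(identical(toy_imputed[3:4], c(41, 1500)))
stopifnot(all(toy_imputed[1:2] >= 1))
# Reset the seed so the draws above do not change the imputation that follows
set.seed(123)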
# Load the example data and apply the function to all vl columns
# using dplyr's mutate() and across()
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral %>%
  mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 5
repeticiones <- 2
rejilla <- 2
semilla <- 123
# Build the non-ranked viraltab output and pipe it into viralpreds
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
         repeticiones, rejilla, rank_output = FALSE) %>%
  viralpreds(semilla, traindata, prediction_type = "full")
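# A hedged variant of the pipeline above that keeps the intermediate objects
# so the result can be inspected. The element names predictions and RMSE are
# taken from the Value section; tab_output, preds_full and preds_batch are
# illustrative names. This repeats the model fitting, so it may be slow.
tab_output <- viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
                       repeticiones, rejilla, rank_output = FALSE)
preds_full <- viralpreds(tab_output, semilla, traindata, prediction_type = "full")
names(preds_full)              # confirm the element names before relying on them
head(preds_full$predictions)   # first few predicted values
preds_full$RMSE                # error of the best model
# The same call with the other documented prediction type
preds_batch <- viralpreds(tab_output, semilla, traindata, prediction_type = "batch")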


[Package viralmodels version 1.3.4 Index]