data_check {pprof} | R Documentation |
Data quality check function
Description
Conducts data quality checks on the variables, covering missingness, variation, correlation, and variance inflation factors (VIF).
Usage
data_check(Y, Z, ProvID)
Arguments
Y | a numeric vector indicating the outcome variable. |
Z | a matrix or data frame representing covariates. |
ProvID | a numeric vector representing the provider identifier. |
Details
The function performs the following checks:
- Missingness: Checks for any missing values in the dataset and provides a summary of missing data.
- Variation: Identifies covariates with zero or near-zero variance, which might affect model stability.
- Correlation: Analyzes pairwise correlation among covariates and highlights highly correlated pairs.
- VIF: Computes the Variance Inflation Factors to identify covariates with potential multicollinearity issues.
If issues arise when using the model functions logis_fe, linear_fe and linear_re, this function can be called for data quality checking purposes.
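To make the four checks concrete, the following base-R sketch approximates them on a covariate matrix. It is not the implementation used by data_check: the helper name sketch_data_check, the correlation cutoff of 0.9, and the VIF cutoff of 10 are all assumptions chosen for illustration, and the package may use different thresholds and output.

```r
# Hypothetical helper (not part of pprof): a rough approximation of the checks.
# The cutoffs 0.9 and 10 are illustrative assumptions, not package defaults.
sketch_data_check <- function(Z, cor_cutoff = 0.9, vif_cutoff = 10) {
  Z <- as.data.frame(Z)

  # Missingness: number of NA values in each covariate.
  n_missing <- colSums(is.na(Z))

  # The remaining checks use complete rows only.
  Zc <- Z[stats::complete.cases(Z), , drop = FALSE]

  # Variation: variance of each covariate; values at or near zero are suspect.
  variances <- vapply(Zc, stats::var, numeric(1))

  # Correlation: pairs whose absolute Pearson correlation exceeds the cutoff.
  R <- stats::cor(Zc)
  idx <- which(abs(R) > cor_cutoff & upper.tri(R), arr.ind = TRUE)
  high_cor <- data.frame(
    var1 = rownames(R)[idx[, "row"]],
    var2 = colnames(R)[idx[, "col"]],
    cor  = R[idx],
    stringsAsFactors = FALSE
  )

  # VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing covariate j on the rest.
  vif <- vapply(names(Zc), function(v) {
    f <- stats::reformulate(setdiff(names(Zc), v), response = v)
    1 / (1 - summary(stats::lm(f, data = Zc))$r.squared)
  }, numeric(1))

  list(
    n_missing = n_missing,
    variances = variances,
    high_cor  = high_cor,
    vif       = vif,
    high_vif  = names(vif)[vif > vif_cutoff]
  )
}

# Simulated covariates: x2 is nearly a copy of x1, and x3 has three missing values.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
Z  <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.05), x3 = rnorm(n))
Z$x3[1:3] <- NA

res <- sketch_data_check(Z)
print(res$n_missing)
print(res$high_cor)
print(res$high_vif)
```

In this simulated example, x1 and x2 should be flagged both as a highly correlated pair and as high-VIF covariates, which is the kind of pattern that can destabilize the model functions named above.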
Value
No return value, called for side effects.
Examples
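library(pprof)
data(ExampleDataBinary)
# Extract the outcome, covariates, and provider identifiers
outcome <- ExampleDataBinary$Y
covar <- ExampleDataBinary$Z
ProvID <- ExampleDataBinary$ProvID
# Run the data quality checks (called for its side effects)
data_check(outcome, covar, ProvID)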