neighborhood_net {mantar} | R Documentation |
Estimate Network using Neighborhood Selection based on Information Criteria
Description
Estimate Network using Neighborhood Selection based on Information Criteria
Usage
neighborhood_net(
  data = NULL,
  ns = NULL,
  mat = NULL,
  n_calc = "individual",
  missing_handling = "two-step-em",
  k = "log(n)",
  nimp = 20,
  pcor_merge_rule = "and"
)
Arguments
data |
Raw data containing only the variables to be included in the network. May include missing values. |
ns |
Numeric vector specifying the sample size for each variable in the data.
If not provided, it will be computed based on the data.
Must be provided if a correlation matrix ( |
mat |
Optional covariance or correlation matrix for the variables to be included in the network.
Used only if |
n_calc |
Method for calculating the sample size for node-wise regression models. Can be one of:
|
missing_handling |
Method for estimating the correlation matrix in the presence of missing data.
|
k |
Penalty per parameter (number of predictors + 1) to be used in the node-wise regressions. The default "log(n)", where n is the number of observations for the dependent variable, gives the classical BIC. Alternatively, the classical AIC would be k = "2". |
nimp |
Number of multiple imputations to perform when using multiple imputation for missing data (default: 20). |
pcor_merge_rule |
Rule for merging regression weights into partial correlations.
|
Details
This function estimates a network structure using neighborhood selection guided by information criteria.
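The general idea of neighborhood selection can be sketched in base R: each variable is regressed on all others, and an information criterion decides which predictors stay in the model. The sketch below is an illustration only. It assumes a backward stepwise search via stats::step(), which may differ from the search that neighborhood_net() performs internally, and the data and the helper select_neighbors() are hypothetical.

```r
# Simulated data: x1 and x2 are related, x3 is independent noise
set.seed(42)
n <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
dat <- data.frame(x1, x2, x3)

# Hypothetical helper: regress one node on all other nodes and keep the
# predictors retained by a backward stepwise search with penalty k
select_neighbors <- function(node, dat, k) {
  predictors <- setdiff(names(dat), node)
  full <- lm(reformulate(predictors, response = node), data = dat)
  best <- step(full, k = k, trace = 0)
  setdiff(names(coef(best)), "(Intercept)")
}

# One regression per node, here with the BIC penalty log(n)
neighbors <- lapply(names(dat), select_neighbors, dat = dat, k = log(n))
names(neighbors) <- names(dat)
neighbors
```

Each element of neighbors lists the predictors selected for one node. The selected sets of two nodes do not have to agree, which is why a merge rule is needed afterwards.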
Simulations by Williams et al. (2019) indicated that using the "and" rule for merging regression weights tends to yield more accurate partial correlation estimates than the "or" rule.
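The difference between the two rules can be illustrated with a small, made-up matrix of regression weights. This is a sketch under stated assumptions: the layout of betas (row = dependent variable, column = predictor) and the conversion sign(b_ij) * sqrt(b_ij * b_ji) are common conventions and are assumed here, not taken from the mantar source.

```r
# Hypothetical regression weights: betas[i, j] is the weight of predictor j
# in the regression of node i (0 means the predictor was not selected)
nodes <- c("A", "B", "C")
betas <- matrix(c(0,   0.4, 0,
                  0.3, 0,   0.2,
                  0,   0,   0),
                nrow = 3, byrow = TRUE,
                dimnames = list(nodes, nodes))

selected <- betas != 0

# "and" rule: keep an edge only if both node-wise regressions selected it
edges_and <- selected & t(selected)

# "or" rule: keep an edge if at least one regression selected it
edges_or <- selected | t(selected)

# Assumed conversion under the "and" rule:
# pcor_ij = sign(b_ij) * sqrt(b_ij * b_ji)
prod_betas <- betas * t(betas)
pcor_and <- sign(betas) * sqrt(pmax(prod_betas, 0)) * edges_and

edges_and
edges_or
pcor_and
```

In this example the A-B edge is kept under both rules, while the B-C edge is kept only under the "or" rule because C's regression did not select B.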
Both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are supported and have been shown to produce valid network structures.
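The role of the penalty k can be illustrated with base R, where stats::AIC() accepts the same kind of per-parameter penalty. This is a minimal sketch on simulated data. Base R counts the residual variance as a parameter, so the parameter count may differ from the one used by neighborhood_net().

```r
# Simulated data; the variable names are illustrative only
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Penalty of 2 per parameter: classical AIC
aic_value <- AIC(fit, k = 2)

# Penalty of log(n) per parameter: classical BIC
bic_value <- AIC(fit, k = log(nobs(fit)))

# Both comparisons should return TRUE
all.equal(aic_value, AIC(fit))
all.equal(bic_value, BIC(fit))
```

Because log(n) exceeds 2 for any n above 7, the BIC penalizes each additional predictor more heavily than the AIC does, which tends to produce sparser networks.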
To handle missing data, the function offers two approaches: a two-step expectation-maximization (EM) algorithm and stacked multiple imputation. According to simulations by Nehler and Schultze (2024), stacked multiple imputation performs reliably across a range of sample sizes. In contrast, the two-step EM algorithm provides accurate results primarily when the sample size is large relative to the amount of missingness and the network complexity. In such cases it may still be preferred because of its much faster runtime.
Currently, the function only supports variables that are directly included in the network analysis; auxiliary variables for handling missing data are not yet supported. During imputation, all variables are imputed using predictive mean matching (see, e.g., van Buuren, 2018), with all other variables in the data set used as predictors.
Value
A list with the following elements:
- pcor
Partial correlation matrix estimated from the node-wise regressions.
- betas
Matrix of regression coefficients from the final regression models.
- ns
Sample sizes used for each variable in the node-wise regressions.
- args
List of arguments used in the function call, including pcor_merge_rule, k, missing_handling, and nimp.
References
Nehler, K. J., & Schultze, M. (2024). Handling missing values when using neighborhood selection for network analysis. https://doi.org/10.31234/osf.io/qpj35
van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). CRC Press.
Williams, D. R., Rhemtulla, M., Wysocki, A. C., & Rast, P. (2019). On nonregularized estimation of psychological networks. Multivariate Behavioral Research, 54(5), 719–750. https://doi.org/10.1080/00273171.2019.1575716
Examples
# Estimate network from full data set
# Using Akaike information criterion
result <- neighborhood_net(data = mantar_dummy_full,
                           k = "2")
# View estimated partial correlations
result$pcor
# Estimate network for data set with missing values
# Using Bayesian Information Criterion, individual sample sizes, and two-step EM
result_mis <- neighborhood_net(data = mantar_dummy_mis,
                               n_calc = "individual",
                               missing_handling = "two-step-em")
# View estimated partial correlations
result_mis$pcor