dfr_adap_sgl {dfr} | R Documentation
Fit a DFR-aSGL model.
Description
Main fitting function for the adaptive sparse-group lasso (aSGL) with dual feature reduction (DFR). Supports both linear and logistic regression, with dense and sparse matrix implementations.
Usage
dfr_adap_sgl(
X,
y,
groups,
type = "linear",
lambda = "path",
alpha = 0.95,
gamma_1 = 0.1,
gamma_2 = 0.1,
max_iter = 5000,
backtracking = 0.7,
max_iter_backtracking = 100,
tol = 1e-05,
standardise = "l2",
intercept = TRUE,
path_length = 20,
min_frac = 0.05,
screen = TRUE,
verbose = FALSE,
v_weights = NULL,
w_weights = NULL
)
Arguments
X: Input matrix of dimensions n x p. Can be a sparse matrix (using class "sparseMatrix" from the Matrix package).
y: Output vector of dimension n. For type = "linear" this should be continuous, and for type = "logistic" it should be binary.
groups: A grouping structure for the input data. Should take the form of a vector of group indices.
type: The type of regression to perform. Supported values are: "linear" and "logistic".
lambda: The regularisation parameter. Defines the level of sparsity in the model; a higher value leads to a sparser model. Either "path", which computes a path of regularisation parameters of length path_length, or a user-specified value or sequence of values.
alpha: The value of alpha, which defines the convex balance between the lasso and group lasso penalties. Must be between 0 and 1.
gamma_1: Hyperparameter which determines the shape of the variable penalties.
gamma_2: Hyperparameter which determines the shape of the group penalties.
max_iter: Maximum number of ATOS iterations to perform.
backtracking: The backtracking parameter, tau, as defined in Pedregosa and Gidel (2018).
max_iter_backtracking: Maximum number of backtracking line search iterations to perform per global iteration.
tol: Convergence tolerance for the stopping criteria.
standardise: Type of standardisation to perform on X: "l2" standardises the input data to have l2 norms of one; "l1" standardises the input data to have l1 norms of one; "sd" standardises the input data to have standard deviation of one; "none" applies no standardisation.
intercept: Logical flag for whether to fit an intercept.
path_length: The number of lambda values to fit the model for. Ignored if lambda is supplied directly.
min_frac: Smallest value of lambda as a fraction of the maximum value, used to terminate the path.
screen: Logical flag for whether to apply the DFR screening rules (see Feser and Evangelou (2024)).
verbose: Logical flag for whether to print fitting information.
v_weights: Optional vector for the variable penalty weights. Overrides the adaptive SGL penalties if specified. When entering custom weights, these are multiplied internally by lambda and alpha.
w_weights: Optional vector for the group penalty weights. Overrides the adaptive SGL penalties if specified. When entering custom weights, these are multiplied internally by lambda and 1 - alpha.
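As a brief illustration of the weight overrides above (a sketch only; the uniform weights and toy data here are illustrative, not from the package documentation):

# sketch: override the adaptive penalties with uniform custom weights
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
data = sgs::gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# one weight per variable (p = 10) and one per group (m = 4);
# internally multiplied by lambda*alpha and lambda*(1-alpha)
model_custom = dfr_adap_sgl(X = data$X, y = data$y, groups = groups, type = "linear",
  path_length = 5, v_weights = rep(1, 10), w_weights = rep(1, 4))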
Details
dfr_adap_sgl()
fits a DFR-aSGL model (Feser and Evangelou (2024)) using Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018)).
It solves the convex optimisation problem given by (Poignard (2020) and Mendez-Civieta et al. (2020))
\frac{1}{2n} f(b ; y, \mathbf{X}) + \lambda \alpha \sum_{i=1}^{p}v_i |b_i| + \lambda (1-\alpha)\sum_{g=1}^{m} w_g \sqrt{p_g} \|b^{(g)}\|_2,
where f(\cdot)
is the loss function, p_g
are the group sizes, and (v,w)
are adaptive weights. In the case of the linear model, the loss function is given by the mean-squared error loss:
f(b; y, \mathbf{X}) = \left\|y-\mathbf{X}b \right\|_2^2.
In the logistic model, the loss function is given by
f(b; y, \mathbf{X}) = -\frac{1}{n} \log(\mathcal{L}(b; y, \mathbf{X})),
where the log-likelihood is given by
\mathcal{L}(b; y, \mathbf{X}) = \sum_{i=1}^{n}\left\{y_i b^\intercal x_i - \log(1+\exp(b^\intercal x_i)) \right\}.
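For concreteness, a minimal R sketch of this logistic loss (the helper name logistic_loss is illustrative, not part of the package):

# minimal sketch: logistic loss as defined above
logistic_loss = function(b, y, X) {
  eta = X %*% b                                # linear predictor b^T x_i for each i
  -(1 / nrow(X)) * sum(y * eta - log(1 + exp(eta)))
}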
The adaptive weights are chosen, for a group g and variable i (Mendez-Civieta et al. (2020)), as
v_i = \frac{1}{|q_{1i}|^{\gamma_1}}, \quad w_g = \frac{1}{\|q_1^{(g)}\|_2^{\gamma_2}},
where q_1 is the first principal component of \mathbf{X}.
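An illustrative sketch of these weights (the package computes them internally; the use of prcomp() and the toy data here are assumptions, not the package's code):

# sketch only: adaptive weights from the first principal component of X
set.seed(3)
X = matrix(rnorm(5 * 10), nrow = 5)           # toy data, n = 5, p = 10
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
q1 = prcomp(X)$rotation[, 1]                  # assumed: q_1 = first PC loading vector
gamma_1 = 0.1
gamma_2 = 0.1
v = 1 / abs(q1)^gamma_1                       # variable weights v_i
w = sapply(unique(groups), function(g)
  1 / sqrt(sum(q1[groups == g]^2))^gamma_2)   # group weights w_g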
DFR uses the dual norm (the \epsilon-norm) and the KKT conditions to discard features at \lambda_k that would have been inactive at \lambda_{k+1}. It applies two layers of screening: it first screens out any groups that satisfy
\|\nabla_g f(\hat{\beta}(\lambda_{k}))\|_{\epsilon_g'} \leq \gamma_g(2\lambda_{k+1} - \lambda_k)
and then screens out any variables that satisfy
|\nabla_i f(\hat{\beta}(\lambda_{k}))| \leq \alpha v_i (2\lambda_{k+1} - \lambda_k),
leading to effective input dimensionality reduction. See Feser and Evangelou (2024) for full details.
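The screened sets are returned in the fitted object (see Value), so the effect of both rules can be inspected directly; a brief sketch, reusing data and groups from the custom-weights sketch above:

fit = dfr_adap_sgl(X = data$X, y = data$y, groups = groups, type = "linear", path_length = 5)
fit$screen_set_grp   # groups kept by the group-level rule at each lambda
fit$screen_set_var   # variables kept by the variable-level rule at each lambda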
Value
A list containing:
beta: The fitted values from the regression. Taken to be the more stable fit between x and z, which is usually the former.
group_effects: The group values from the regression, taken by applying the l2 norm within each group on the values of beta.
selected_var: A list containing the indices of the active/selected variables for each lambda value. Index 1 corresponds to the first column in X.
selected_grp: A list containing the indices of the active/selected groups for each lambda value. Index 1 corresponds to the first group in the groups vector.
num_it: Number of iterations performed. If convergence is not reached, this will be max_iter.
success: Logical flag indicating whether ATOS converged, according to tol.
certificate: Final value of the convergence criterion.
x: The solution to the original problem (see Pedregosa and Gidel (2018)).
u: The solution to the dual problem (see Pedregosa and Gidel (2018)).
z: The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)).
screen_set_var: List of variables that were kept after the screening step for each lambda value.
screen_set_grp: List of groups that were kept after the screening step for each lambda value.
epsilon_set_var: List of variables that were used for fitting after screening for each lambda value.
epsilon_set_grp: List of groups that were used for fitting after screening for each lambda value.
kkt_violations_var: List of variables that violated the KKT conditions at each lambda value.
kkt_violations_grp: List of groups that violated the KKT conditions at each lambda value.
v_weights: Vector of the variable penalty sequence.
w_weights: Vector of the group penalty sequence.
screen: Logical flag indicating whether screening was performed.
type: Indicates which type of regression was performed.
intercept: Logical flag indicating whether an intercept was fit.
lambda: Value(s) of lambda used to fit the model.
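For example, given a fitted object model (as in the Examples below), these components are accessed directly:

model$beta           # fitted coefficients at each lambda value
model$group_effects  # within-group l2 norms of beta
model$selected_var   # indices of active variables per lambda value
model$success        # whether ATOS converged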
References
Feser, F., Evangelou, M. (2024). Dual feature reduction for the sparse-group lasso and its adaptive variant, https://arxiv.org/abs/2405.17094
Mendez-Civieta, A., Carmen Aguilera-Morillo, M., Lillo, R. (2020). Adaptive sparse group LASSO in quantile regression, doi:10.1007/s11634-020-00413-8
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Poignard, B. (2020). Asymptotic theory of the adaptive Sparse Group Lasso, doi:10.1007/s10463-018-0692-7
See Also
Other SGL-methods: dfr_adap_sgl.cv(), dfr_sgl(), dfr_sgl.cv(), plot.sgl(), predict.sgl(), print.sgl()
Examples
# specify a grouping structure
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# generate data
data = sgs::gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# run DFR-aSGL
model = dfr_adap_sgl(X = data$X, y = data$y, groups = groups, type = "linear",
  path_length = 5, alpha = 0.95, standardise = "l2", intercept = TRUE, verbose = FALSE)
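# (sketch) inspect the fit using components documented in Value:
print(model)         # dispatches to print.sgl()
model$selected_grp   # active groups at each lambda value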