glmTLP {glmtlp} | R Documentation |
Fit a GLM with truncated lasso regularization
Description
Fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the truncated lasso penalty (TLP) at a grid of values for the regularization parameter lambda. Can handle data of all shapes, including very large sparse data matrices. Fits linear, logistic, multinomial, Poisson, and Cox regression models, as well as multi-response Gaussian models.
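To see what distinguishes the truncated lasso from the ordinary lasso, the short R sketch below evaluates the truncated L1 function of Shen, Pan and Zhu (2012), J_tau(|b|) = min(|b|/tau, 1), next to a lasso term |b|/tau that has the same slope near zero. The helper names tlp_penalty and lasso_penalty are hypothetical and not part of glmtlp, and glmTLP's internal scaling of the penalty may differ from this form.

```r
# Hypothetical helper: truncated L1 (TLP) function of Shen, Pan and Zhu (2012),
# J_tau(|b|) = min(|b| / tau, 1); linear up to tau and flat beyond it.
tlp_penalty <- function(b, tau = 0.3) pmin(abs(b) / tau, 1)

# Hypothetical helper: lasso term with the same slope (1 / tau) near zero,
# but growing without bound in |b|.
lasso_penalty <- function(b, tau = 0.3) abs(b) / tau

b <- c(0, 0.15, 0.3, 1, 5)
rbind(b = b, tlp = tlp_penalty(b), lasso = lasso_penalty(b))
```

Beyond tau the TLP contribution is constant, so large coefficients incur no extra penalty and are not shrunk, while the lasso keeps penalizing them in proportion to their size.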
Usage
glmTLP(x, y, family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
       weights, offset = NULL, lambda, tau = 0.3, nlambda = 100,
       penalty.factor = rep(1, nvars), lambda.min.ratio = ifelse(nobs < nvars, 1e-3, 1e-4),
       standardize = TRUE, intercept = TRUE, dfmax = nvars + 1, pmax = min(dfmax * 2 + 20, nvars),
       lower.limits = -Inf, upper.limits = Inf,
       standardize.response = FALSE, maxIter = 100, Tol = 1e-4)
Arguments
x |
input matrix, of dimension nobs x nvars; each row is an
observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix; not yet available for family="cox").
|
y |
response variable. Quantitative for family="gaussian",
or family="poisson" (non-negative counts). For
family="binomial", should be either a factor with two levels, or
a two-column matrix of counts or proportions (the second column is
treated as the target class; for a factor, the last level in
alphabetical order is the target class). For
family="multinomial", can be a factor with nc >= 2 levels, or a
matrix with nc columns of counts or proportions.
For either "binomial" or "multinomial", if y is
presented as a vector, it will be coerced into a factor. For
family="cox", y should be a two-column matrix with
columns named 'time' and 'status'. The latter is a binary variable,
with '1' indicating death, and '0' indicating right censored. The
function Surv() in package survival produces such a
matrix. For family="mgaussian", y is a matrix of quantitative responses.
|
family |
Response type (see above)
|
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation.
|
offset |
A vector of length nobs that is included in the linear predictor (a nobs x nc matrix for the "multinomial" family). Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function.
|
tau |
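Truncation parameter of the truncated lasso penalty (default 0.3). In the truncated L1 function of Shen, Pan and Zhu (2012), J_tau(|b|) = min(|b|/tau, 1), the penalty grows linearly in |b| up to tau and is constant beyond it, so coefficients larger than tau in absolute value are not shrunk.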
|
nlambda |
The number of lambda values - default is 100.
|
penalty.factor |
Separate penalty factors can be applied to each
coefficient. This is a number that multiplies lambda to allow
differential shrinkage. Can be 0 for some variables, which implies
no shrinkage, and that variable is always included in the
model. Default is 1 for all variables. Note: the penalty factors are
internally rescaled to sum to nvars, and the lambda sequence will
reflect this change.
|
lambda.min.ratio |
Smallest value for lambda, as a fraction of
lambda.max, the (data-derived) entry value (i.e. the smallest
value for which all coefficients are zero). The default depends on the
sample size nobs relative to the number of variables
nvars. If nobs >= nvars, the default is 0.0001,
close to zero. If nobs < nvars, the default is 0.001.
A very small value of
lambda.min.ratio will lead to a saturated fit in the nobs <
nvars case. A saturated fit is undefined for
"binomial" and "multinomial" models, and glmnet
will exit gracefully when the fraction of deviance explained is almost
1.
|
lambda |
A user-supplied lambda sequence. Typical usage
is to have the
program compute its own lambda sequence based on
nlambda and lambda.min.ratio. Supplying a value of
lambda overrides this. WARNING: use with care. Do not supply
a single value for lambda (for predictions after CV, use predict()
instead). Instead, supply
a decreasing sequence of lambda values. glmnet relies
on its warm starts for speed, and it is often faster to fit a whole
path than to compute a single fit.
|
standardize |
Logical flag for x variable standardization, prior to
fitting the model sequence. The coefficients are always returned on
the original scale. Default is standardize=TRUE.
If variables are in the same units already, you might not wish to
standardize. See details below for y standardization with family="gaussian".
|
intercept |
Should intercept(s) be fitted (default=TRUE) or set to
zero (FALSE)
|
dfmax |
Limit the maximum number of variables in the
model. Useful for very large nvars, if a partial path is desired.
|
pmax |
Limit the maximum number of variables ever to be nonzero
|
lower.limits |
Vector of lower limits for each coefficient;
default -Inf. Each
of these must be non-positive. Can be presented as a single value
(which will then be replicated), else a vector of length nvars.
|
upper.limits |
Vector of upper limits for each coefficient;
default Inf. See lower.limits.
|
standardize.response |
Logical flag used only for family="mgaussian";
allows the user to standardize the response variables.
|
maxIter |
Maximum number of iterations for the TLP algorithm.
|
Tol |
Convergence tolerance for the TLP iterations.
|
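The defaults for lambda.min.ratio, nlambda and penalty.factor described above can be sketched in plain R. The helpers below (default_lambda_min_ratio, lambda_grid, rescale_penalty_factor) are hypothetical and not part of glmtlp; lambda.max is taken as given rather than computed from data, and the log-linear spacing of the grid is an assumption (it is what glmnet uses, but this page only states the endpoints and the number of values).

```r
# Hypothetical helper: default lambda.min.ratio, copied from the Usage section.
default_lambda_min_ratio <- function(nobs, nvars) ifelse(nobs < nvars, 1e-3, 1e-4)

# Hypothetical helper: decreasing grid of nlambda values from lambda_max down to
# lambda.min.ratio * lambda_max. Log-linear spacing is an assumption.
lambda_grid <- function(lambda_max, nlambda = 100, lambda.min.ratio = 1e-4) {
  exp(seq(log(lambda_max), log(lambda.min.ratio * lambda_max), length.out = nlambda))
}

# Hypothetical helper: rescale penalty factors so they sum to nvars, as stated
# under penalty.factor; a factor of 0 stays 0 (that variable is never penalized).
rescale_penalty_factor <- function(pf) pf * length(pf) / sum(pf)

ratio <- default_lambda_min_ratio(nobs = 50, nvars = 200)   # nobs < nvars, so 1e-3
grid <- lambda_grid(lambda_max = 2, nlambda = 5, lambda.min.ratio = ratio)
pf <- rescale_penalty_factor(c(0, 1, 1, 1))
```

With a ratio of 1e-3 the five values span three orders of magnitude, from 2 down to 0.002, and the rescaled penalty factors c(0, 4/3, 4/3, 4/3) sum to nvars = 4.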
Details
Write something about the details.
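One way to compute a truncated lasso fit for a single lambda is the difference-of-convex (DC) iteration described by Shen, Pan and Zhu (2012): because min(|b|/tau, 1) is concave in |b|, it can be majorized by its tangent at the current estimate, which turns each step into a weighted lasso in which only coefficients with |b_j| <= tau remain penalized. The R sketch below assumes a Gaussian model without intercept or standardization and solves each weighted lasso by coordinate descent. The helper names (soft_threshold, weighted_lasso, tlp_dc) are hypothetical, this is not claimed to be how glmTLP is implemented, and the maxIter and Tol arguments are used only by analogy with glmTLP's arguments of the same names.

```r
# Soft-thresholding operator: sign(z) * max(|z| - g, 0).
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: weighted lasso by cyclic coordinate descent, minimizing
#   (1 / (2 n)) * ||y - X b||^2 + lam * sum(w * |b|)   (no intercept).
weighted_lasso <- function(X, y, lam, w, b = rep(0, ncol(X)),
                           tol = 1e-10, max_sweeps = 10000) {
  n <- nrow(X)
  xx <- colSums(X^2) / n               # curvature of the loss in each coordinate
  r <- y - drop(X %*% b)               # residual for the warm start b
  for (sweep in seq_len(max_sweeps)) {
    max_change <- 0
    for (j in seq_len(ncol(X))) {
      old <- b[j]
      z <- sum(X[, j] * r) / n + xx[j] * old
      b[j] <- soft_threshold(z, lam * w[j]) / xx[j]
      if (b[j] != old) {
        r <- r - X[, j] * (b[j] - old)   # keep the residual in sync
        max_change <- max(max_change, abs(b[j] - old))
      }
    }
    if (max_change < tol) break
  }
  b
}

# Hypothetical helper: DC iterations for
#   (1 / (2 n)) * ||y - X b||^2 + lambda * sum(pmin(abs(b) / tau, 1)).
# The tangent of the concave penalty has slope 1 / tau where |b_j| <= tau and 0
# elsewhere, so each step is a weighted lasso with penalty lambda / tau.
tlp_dc <- function(X, y, lambda, tau = 0.3, maxIter = 100, Tol = 1e-4) {
  p <- ncol(X)
  b <- rep(0, p)
  w <- rep(1, p)                       # first step is an ordinary lasso
  for (iter in seq_len(maxIter)) {
    b_new <- weighted_lasso(X, y, lambda / tau, w, b)   # warm start from b
    converged <- max(abs(b_new - b)) < Tol
    b <- b_new
    if (converged) break
    w <- as.numeric(abs(b) <= tau)     # penalize only coefficients up to tau
  }
  list(beta = b, weights = w, iterations = iter)
}

set.seed(42)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
y <- drop(X %*% c(3, -2, 0, 0, 0)) + rnorm(n, sd = 0.5)
fit <- tlp_dc(X, y, lambda = 0.1, tau = 0.3)
round(fit$beta, 3)
```

On this simulated data the two large coefficients end up with weight 0, so the final step leaves them unpenalized and they coincide (up to the coordinate-descent tolerance) with least squares on those two columns, while the three null coefficients stay exactly zero; the first, plain-lasso step instead shrinks the large coefficients toward zero.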
Value
An object that inherits from glmnet.
call |
the call that produced this object
|
a0 |
Intercept sequence of length length(lambda)
|
beta |
For "elnet", "lognet", "fishnet" and "coxnet" models, an nvars x
length(lambda) matrix of coefficients, stored in sparse column
format ("CsparseMatrix"). For "multnet" and "mgaussian", a list of nc such
matrices, one for each class.
|
lambda |
The actual sequence of lambda values used.
|
dev.ratio |
The fraction of (null) deviance explained (for "elnet", this
is the R-square). The deviance calculations incorporate weights if
present in the model. The deviance is defined to be 2*(loglike_sat -
loglike), where loglike_sat is the log-likelihood for the saturated
model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev.
|
nulldev |
Null deviance (per observation). This is defined to
be 2*(loglike_sat - loglike(Null)); the null model refers to the
intercept model, except for the Cox model, where it is the 0 model.
|
df |
The number of nonzero coefficients for each value of
lambda.
|
dim |
dimension of the coefficient matrix (or matrices)
|
nobs |
number of observations
|
npasses |
total passes over the data summed over all lambda
values
|
offset |
a logical variable indicating whether an offset was included in the model
|
jerr |
error flag, for warnings and errors (largely for internal debugging).
|
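The definitions of dev.ratio and nulldev above can be checked on ordinary unpenalized fits with base R (stats). The sketch below uses lm and glm rather than glmTLP, so it only illustrates the formulas, not glmTLP's output: for a Gaussian model with unit variance the deviance 2*(loglike_sat - loglike) reduces to the residual sum of squares, so dev.ratio equals the R-square, and for a 0/1 binomial response the saturated log-likelihood is 0.

```r
set.seed(1)
n <- 100
x <- rnorm(n)

# Gaussian: with unit variance, 2 * (loglike_sat - loglike) is the residual sum of squares.
y <- 1 + 2 * x + rnorm(n)
fit_g <- lm(y ~ x)
dev_g <- sum(residuals(fit_g)^2)
nulldev_g <- sum((y - mean(y))^2)        # intercept-only (null) model
dev_ratio_g <- 1 - dev_g / nulldev_g     # equals the R-square

# Binomial with a 0/1 response: loglike_sat = 0, so the deviance is -2 * loglike.
yb <- rbinom(n, 1, plogis(0.5 * x))
fit_b <- glm(yb ~ x, family = binomial)
loglike_b <- sum(dbinom(yb, 1, fitted(fit_b), log = TRUE))
dev_b <- 2 * (0 - loglike_b)
dev_ratio_b <- 1 - dev_b / fit_b$null.deviance

c(gaussian = dev_ratio_g, binomial = dev_ratio_b)
```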
Author(s)
Chong Wu, Wei Pan
Maintainer: Chong Wu <wuxx0845@umn.edu>
References
Xiaotong Shen, Wei Pan and Yunzhang Zhu (2012)
Likelihood-Based Selection and Sharp Parameter Estimation,
Journal of the American Statistical Association, 107:497, 223-232
Examples
library(glmtlp)
data("QuickStartExample")
# nlambda is set to 3 only to keep the example fast and to pass the
# CRAN check. In practice, either use the default setting or
# search a larger grid of lambda values.
fit <- glmTLP(x, y, nlambda = 3)
[Package glmtlp version 1.1 Index]