cv.lmgce {GCEstim} | R Documentation |
Cross-validation for lmgce
Description
Performs k-fold cross-validation for some of the lmgce
parameters.
Usage
cv.lmgce(
formula,
data,
subset,
na.action,
offset,
contrasts = NULL,
model = TRUE,
x = FALSE,
y = FALSE,
cv = TRUE,
cv.nfolds = 5,
errormeasure = c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE"),
errormeasure.which = {
if (isTRUE(cv))
c("1se", "min", "elbow")
else c("min", "elbow")
},
support.method = c("standardized", "ridge"),
support.method.penalize.intercept = TRUE,
support.signal = NULL,
support.signal.vector = NULL,
support.signal.vector.min = 0.3,
support.signal.vector.max = 20,
support.signal.vector.n = 20,
support.signal.points = c(3, 5, 7, 9),
support.noise = NULL,
support.noise.points = c(3, 5, 7, 9),
weight = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
twosteps.n = 1,
method = c("dual.lbfgsb3c", "dual.BFGS", "dual", "primal.solnl", "primal.solnp",
"dual.CG", "dual.L-BFGS-B", "dual.Rcgmin", "dual.bobyqa", "dual.newuoa",
"dual.nlminb", "dual.nlm", "dual.lbfgs", "dual.optimParallel"),
caseGLM = c("D", "M", "NM"),
boot.B = 0,
boot.method = c("residuals", "cases", "wild"),
seed = 230676,
OLS = TRUE,
verbose = 0,
coef = NULL
)
Arguments
formula |
An object of class formula (or one that
can be coerced to that class): a symbolic description of the model to be
fitted.
|
data |
A data frame (or object coercible by
as.data.frame to a data frame) containing the variables
in the model.
|
subset |
an optional vector specifying a subset of observations to be
used in the fitting process.
|
na.action |
a function which indicates what should happen when the data
contain NAs. The default is set by the na.action setting of
options, and is na.fail if that is
unset. The ‘factory-fresh’ default is na.omit. Another
possible value is NULL, no action. Value
na.exclude can be useful.
|
offset |
this can be used to specify an a priori known component to be
included in the linear predictor during fitting. This should be NULL
or a numeric vector or matrix of extents matching those of the response. One
or more offset terms can be included in the formula
instead or as well, and if more than one are specified their sum is used.
See model.offset.
|
contrasts |
An optional list. See the contrasts.arg of
model.matrix.default.
|
model |
Boolean value. If TRUE, the model frame used is returned.
The default is model = TRUE.
|
x |
Boolean value. If TRUE, the model matrix used is returned.
The default is x = FALSE.
|
y |
Boolean value. If TRUE, the response used is returned.
The default is y = FALSE.
|
cv |
Boolean value. If TRUE, the error measure given by errormeasure
is computed using cross-validation. If FALSE, the error is
computed in sample. The default is cv = TRUE.
|
cv.nfolds |
Number of folds used for cross-validation when
cv = TRUE. The default is cv.nfolds = 5 and the smallest
allowable value is cv.nfolds = 3.
|
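As an illustration of what cv.nfolds controls, the following base R sketch assigns observations to folds. It is not the package's internal code; the helper name make_folds is hypothetical, and the seed simply reuses the documented default of the seed argument.

```r
# Sketch: assign n observations to k roughly equal folds.
# make_folds() is a hypothetical helper, not part of GCEstim.
make_folds <- function(n, k = 5, seed = 230676) {
  stopifnot(k >= 3, k <= n)
  set.seed(seed)
  # Repeat the fold labels 1..k up to length n, then shuffle them.
  sample(rep(seq_len(k), length.out = n))
}

folds <- make_folds(n = 100, k = 5)
table(folds)

# For fold 1, rows with folds == 1 are held out; the rest are used for fitting.
test_rows  <- which(folds == 1)
train_rows <- which(folds != 1)
```

Each observation is held out exactly once across the k folds, so the cross-validation error is an average over k out-of-sample evaluations.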
errormeasure |
Loss function (error) to be used for the selection
of the support spaces. One of c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE").
The default is errormeasure = "RMSE".
|
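For orientation, the six error measures can be written out in base R as below. These are common textbook definitions and are only an assumption about what the package computes; sMAPE and MASE in particular have several variants in the literature, and the helper name error_measures is hypothetical.

```r
# Common textbook definitions; the package may use different variants,
# in particular for sMAPE and MASE.
error_measures <- function(actual, predicted) {
  e <- actual - predicted
  c(
    MSE   = mean(e^2),
    RMSE  = sqrt(mean(e^2)),
    MAE   = mean(abs(e)),
    MAPE  = 100 * mean(abs(e / actual)),
    sMAPE = 100 * mean(2 * abs(e) / (abs(actual) + abs(predicted))),
    # Assumed scaling: MAE of a naive one-step forecast on the actual values.
    MASE  = mean(abs(e)) / mean(abs(diff(actual)))
  )
}

actual    <- c(2, 4, 6, 8)
predicted <- c(1, 5, 6, 10)
round(error_measures(actual, predicted), 3)
```

MAPE is undefined when the response contains zeros, which is one reason to prefer RMSE or MAE in that case.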
errormeasure.which |
Which value of errormeasure is used to select a support space
upper limit from support.signal.vector.
One of c("min", "1se", "elbow"), where "min" corresponds to the
support spaces that produced the lowest error, "1se" corresponds to
the support spaces whose error is within 1 standard error of the CV error
for "min" (available only when cv = TRUE), and "elbow" corresponds to the
elbow point of the error curve: the point that maximizes the distance
between each observation (the pair composed of the upper limit of the
support space and the error) and the line between the first and last
observations (the lowest and the highest upper limits of the support space,
respectively); see find_curve_elbow. The default is
errormeasure.which = "1se".
|
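The three selection rules can be sketched in base R on a made-up error curve. All numbers below are hypothetical, the helper elbow_index is not the package's find_curve_elbow, and choosing the smallest qualifying upper limit for "1se" is one common convention that is assumed here rather than taken from the package.

```r
# Hypothetical CV results: one mean error and one standard error per upper limit.
upper  <- c(0.3, 1, 2, 5, 10, 20)
err    <- c(3.0, 1.6, 1.1, 1.0, 1.05, 1.2)
err_se <- c(0.30, 0.20, 0.15, 0.15, 0.15, 0.20)

# "min": the upper limit with the lowest error.
i_min <- which.min(err)

# "1se" (assumed convention): the smallest upper limit whose error is
# within one standard error of the minimum error.
i_1se <- min(which(err <= err[i_min] + err_se[i_min]))

# "elbow": the point farthest from the straight line joining the first and
# last points of the error curve.
elbow_index <- function(x, y) {
  n  <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # Perpendicular distance from each (x, y) to the line through the end points.
  d  <- abs(dy * (x - x[1]) - dx * (y - y[1])) / sqrt(dx^2 + dy^2)
  which.max(d)
}
i_elbow <- elbow_index(upper, err)

c(min = upper[i_min], "1se" = upper[i_1se], elbow = upper[i_elbow])
```

In this toy curve "1se" and "elbow" both pick a narrower support than "min", trading a slightly higher error for more shrinkage.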
support.method |
One of c("standardized", "ridge"). If
support.method = "standardized", the default, standardized coefficients
are used to define the signal support spaces. If
support.method = "ridge", the signal support spaces are defined by the
ridge trace.
|
support.method.penalize.intercept |
Boolean value. If TRUE,
the default, the intercept will be penalized. To be used when
support.method = "ridge".
|
support.signal |
NULL or fixed positive upper limit (L) for the
support spaces (-L,L) on standardized data (when
support.method = "standardized"); NULL or fixed positive factor
to be multiplied by the maximum absolute value of the ridge trace for each
coefficient (when support.method = "ridge"); a pair (LL,UL) or a
matrix ((k+1) x 2) for the support spaces on original data. The default is
support.signal = NULL.
|
support.signal.vector |
NULL or a vector of positive values when
support.signal = NULL. If support.signal.vector = NULL,
the default, a vector
c(support.signal.vector.min,...,support.signal.vector.max) of dimension
support.signal.vector.n and logarithmically equally spaced will be
generated. Each value represents the upper limit of the standardized support
spaces, when support.method = "standardized", or the factor to be
multiplied by the maximum absolute value of the ridge trace for each
coefficient, when support.method = "ridge".
|
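A logarithmically equally spaced vector between the default limits can be built in base R as follows. This is an assumed construction that matches the description above, not code taken from the package; the variable names are hypothetical.

```r
# Sketch of a logarithmically equally spaced grid (assumed construction),
# using the documented defaults for min, max and n.
vmin <- 0.3
vmax <- 20
n    <- 20
support_grid <- exp(seq(log(vmin), log(vmax), length.out = n))
round(support_grid, 3)

# Consecutive values share a constant ratio rather than a constant difference.
ratios <- support_grid[-1] / support_grid[-n]
```

Log spacing places more candidate upper limits near the lower end of the range, where the error curve tends to change fastest.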
support.signal.vector.min |
A positive value for the lowest limit of the
support.signal.vector when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.min = 0.3.
|
support.signal.vector.max |
A positive value for the highest limit of the
support.signal.vector when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.max = 20.
|
support.signal.vector.n |
A positive integer for the number of support
spaces to be used when support.signal = NULL and
support.signal.vector = NULL. The default is
support.signal.vector.n = 20.
|
support.signal.points |
A vector of positive integers defining the number
of points for the signal support to be tested. The default is
support.signal.points = c(3, 5, 7, 9).
|
support.noise |
An interval, preferably centered around zero, given in the form
c(LL,UL). If support.noise = NULL, the default, then a vector
c(-L,L) is computed using the empirical three-sigma rule
(Pukelsheim, 1994).
|
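A minimal sketch of the empirical three-sigma rule, assuming L is three times the sample standard deviation of the response; the package's exact computation may differ. The simulated data, the variable names and the equally spaced support points are illustrative assumptions.

```r
# Sketch of the empirical three-sigma rule for the noise support (assumed form):
# L is three times the sample standard deviation of the response.
set.seed(1)
y <- rnorm(50, mean = 10, sd = 2)
L <- 3 * sd(y)
support_noise <- c(-L, L)

# With, say, 3 support points, an equally spaced grid over (-L, L) is assumed.
noise_points <- seq(support_noise[1], support_noise[2], length.out = 3)
noise_points
```

The interval is symmetric around zero, so the middle support point of an odd-sized grid is zero.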
support.noise.points |
A vector of positive integers defining the number
of points for the noise support to be tested. The default is
support.noise.points = c(3, 5, 7, 9).
|
weight |
A vector of values between zero and one representing the
prediction-precision loss trade-off. The default is
weight = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9).
|
twosteps.n |
Number of GCE reestimations using a previously estimated
vector of signal probabilities.
|
method |
Use "primal.solnl" (GCE using Sequential Quadratic
Programming (SQP) method; see solnl) or
"primal.solnp" (GCE using the augmented Lagrange multiplier method
with an SQP interior algorithm; see solnp) for the primal
form of the optimization problem and "dual" (GME), "dual.CG"
(GCE using a conjugate gradients method; see optim),
"dual.BFGS" (GCE using Broyden-Fletcher-Goldfarb-Shanno quasi-Newton
method; see optim), "dual.L-BFGS-B" (GCE using a
box-constrained optimization with limited-memory modification of the BFGS
quasi-Newton method; see optim), "dual.Rcgmin"
(GCE using an update of the conjugate gradient algorithm; see
optimx),
"dual.bobyqa" (GCE using a derivative-free optimization by quadratic
approximation; see optimx and
bobyqa), "dual.newuoa" (GCE using a
derivative-free optimization by quadratic approximation; see
optimx and newuoa),
"dual.nlminb" (GCE; see optimx and
nlminb), "dual.nlm" (GCE; see
optimx and nlm),
"dual.lbfgs" (GCE using the Limited-memory
Broyden-Fletcher-Goldfarb-Shanno; see lbfgs),
"dual.lbfgsb3c" (GCE using L-BFGS-B implemented in Fortran code and with
an Rcpp interface; see lbfgsb3c) or
"dual.optimParallel" (GCE using a parallel version of the L-BFGS-B; see
optimParallel) for the dual form. The
default is method = "dual.BFGS".
|
caseGLM |
Special cases of the generic general linear model. One of
c("D", "M", "NM"), where "D" stands for data, "M" for moment and
"NM" for normed-moment. The default is
caseGLM = "D".
|
boot.B |
A single positive integer greater than or equal to 10 for the number
of bootstrap replicates to be used for the computation of the bootstrap
confidence interval(s). A value of zero generates no replicates. The default
is boot.B = 0.
|
boot.method |
Method to be used for bootstrapping. One of
c("residuals", "cases", "wild"), which correspond to resampling on
residuals, on individual cases or on residuals multiplied by a N(0,1) variable,
respectively. The default is boot.method = "residuals".
|
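The three resampling schemes can be sketched in base R as below. An ordinary lm() fit stands in for the GCE estimator purely to keep the example self-contained; the simulated data and all variable names are assumptions, and the package's own bootstrap code may differ in detail.

```r
# Sketch of the three resampling schemes, with lm() standing in for the
# GCE estimator (an assumption made to keep the example self-contained).
set.seed(230676)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)
res <- residuals(fit)
fitted_y <- fitted(fit)

# "residuals": resample residuals with replacement and add them to the fitted values.
y_resid <- fitted_y + sample(res, n, replace = TRUE)

# "cases": resample whole rows (response and predictors together).
d_cases <- d[sample(n, n, replace = TRUE), ]

# "wild": multiply each residual by an independent N(0, 1) draw.
y_wild <- fitted_y + res * rnorm(n)

# One bootstrap replicate of the coefficients under each scheme.
boot_coefs <- rbind(
  residuals = coef(lm(y_resid ~ x, data = d)),
  cases     = coef(lm(y ~ x, data = d_cases)),
  wild      = coef(lm(y_wild ~ x, data = d))
)
boot_coefs
```

Repeating any one of these steps boot.B times yields the bootstrap distribution of the coefficients from which confidence intervals are computed.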
seed |
A single value, interpreted as an integer, for reproducibility
or NULL for randomness. The default is seed = 230676.
|
OLS |
Boolean value. If TRUE, the default, OLS estimation is
performed.
|
verbose |
An integer to control how verbose the output is. For a value
of 0 no messages or output are shown and for a value of 3 all messages
are shown. The default is verbose = 0.
|
coef |
A vector of the true coefficients, when available.
|
Details
The cv.lmgce function fits several linear regression models via
generalized cross entropy according to the defined arguments. In particular,
support.signal.points, support.noise.points and weight
can be defined as vectors.
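The set of models fitted can be pictured as a grid over the three vector arguments. The base R sketch below enumerates that grid for the default values; it only illustrates the count of combinations and is not the package's internal code.

```r
# Sketch: enumerate the combinations implied by the default arguments.
grid <- expand.grid(
  support.signal.points = c(3, 5, 7, 9),
  support.noise.points  = c(3, 5, 7, 9),
  weight                = (1:9) / 10
)
nrow(grid)  # 4 * 4 * 9 = 144 combinations
head(grid)
```

Passing shorter vectors for any of the three arguments shrinks the grid, and with it the computation time, proportionally.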
Value
cv.lmgce returns an object of class cv.lmgce.
An object of class cv.lmgce is a list containing at
least the following components:
results |
a C × 8 data.frame, where C is the number of
combinations of the arguments support.signal.points,
support.noise.points and weight. Contains information about the
arguments, error, convergence of the optimization method and time of
computation.
|
best |
an lmgce object obtained with the combination of
arguments that produced the lowest cross-validation error.
|
support.signal.points |
a vector of the support.signal.points
tested.
|
support.signal.points.best |
the value of support.signal.points
that produced the lowest cross-validation error.
|
support.noise.points |
a vector of the support.noise.points
tested.
|
support.noise.points.best |
the value of support.noise.points
that produced the lowest cross-validation error.
|
weight |
a vector of the weight values tested.
|
weight.best |
the value of weight that produced the lowest
cross-validation error.
|
Author(s)
Jorge Cabral, jorgecabral@ua.pt
References
Golan, A., Judge, G. G. and Miller, D. (1996)
Maximum entropy econometrics: robust estimation with limited data.
Wiley.
Golan, A. (2008)
Information and Entropy Econometrics — A Review and Synthesis.
Foundations and Trends® in Econometrics, 2(1–2), 1–145.
doi:10.1561/0800000004
Golan, A. (2017)
Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information (Vol. 1).
Oxford University Press.
doi:10.1093/oso/9780199349524.001.0001
Pukelsheim, F. (1994)
The Three Sigma Rule.
The American Statistician, 48(2), 88–91.
doi:10.2307/2684253
See Also
See the generic functions plot.cv.lmgce,
print.cv.lmgce and coef.cv.lmgce.
Examples
res.cv.lmgce <-
cv.lmgce(y ~ .,
data = dataGCE)
res.cv.lmgce
[Package GCEstim version 0.1.0 Index]