tsbootgce {GCEstim}    R Documentation

Time series bootstrap cross entropy estimation

Description

This generic function fits linear regression models via generalized cross entropy (GCE) to maximum entropy bootstrap replicates of a time series.

Usage

tsbootgce(
  formula,
  data,
  subset,
  na.action,
  offset,
  contrasts = NULL,
  trim = 0.05,
  reps = 1000,
  start = NULL,
  end = NULL,
  coef.method = c("mode", "median"),
  cv = TRUE,
  cv.nfolds = 5,
  errormeasure = c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE"),
  errormeasure.which = {
    if (isTRUE(cv))
      c("1se", "min", "elbow")
    else c("min", "elbow")
  },
  support.method = c("standardized", "ridge"),
  support.method.penalize.intercept = TRUE,
  support.signal = NULL,
  support.signal.vector = NULL,
  support.signal.vector.min = 0.3,
  support.signal.vector.max = 20,
  support.signal.vector.n = 20,
  support.signal.points = c(1/5, 1/5, 1/5, 1/5, 1/5),
  support.noise = NULL,
  support.noise.points = c(1/3, 1/3, 1/3),
  weight = 0.5,
  twosteps.n = 1,
  method = c("dual.BFGS", "dual.lbfgsb3c", "dual", "primal.solnl", "primal.solnp",
    "dual.CG", "dual.L-BFGS-B", "dual.Rcgmin", "dual.bobyqa", "dual.newuoa",
    "dual.nlminb", "dual.nlm", "dual.lbfgs", "dual.optimParallel"),
  caseGLM = c("D", "M", "NM"),
  boot.B = 0,
  boot.method = c("residuals", "cases", "wild"),
  seed = 230676,
  OLS = TRUE,
  verbose = 0
)

Arguments

formula

a "formula" describing the linear model to be fit. For details see lm and dynlm.

data

A data.frame (or object coercible by as.data.frame to a data frame) or time series object (e.g., ts or zoo), containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.

contrasts

An optional list. See the contrasts.arg of model.matrix.default.

trim

The trimming proportion (see meboot). The default is trim = 0.05.

reps

The number of replicates to generate (see meboot). The default is reps = 1000.

start

The time of the first observation. Either a single number or a vector of two numbers (the second of which is an integer), which specify a natural time unit and a (1-based) number of samples into the time unit (see ts).

end

The time of the last observation, specified in the same way as start (see ts).
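The start and end conventions are those of ts(). The base R sketch below shows how a (year, period) pair and the equivalent single number address points of a quarterly series. Whether tsbootgce applies these limits through window() is an assumption made only for illustration.

```r
# Quarterly series from 2000 Q1 to 2001 Q4 (frequency = 4 samples per year)
x <- ts(1:8, start = c(2000, 1), frequency = 4)

# c(2000, 3) is "year 2000, 3rd quarter"; c(2001, 2) is "year 2001, 2nd quarter"
w <- window(x, start = c(2000, 3), end = c(2001, 2))

# A single number addresses the same points: 2000.5 == 2000 + (3 - 1) / 4
w2 <- window(x, start = 2000.5, end = 2001.25)
w
```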

coef.method

Method used to summarize the bootstrap coefficient estimates into a single estimate. One of c("mode", "median"). For "mode", see hdr.
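As a rough illustration of the two summaries, the sketch below reduces simulated replicates of one coefficient to a mode and a median. It assumes the mode is the peak of a kernel density estimate, computed here with base R density() as a stand-in for hdr(); density_mode is a hypothetical helper, and the package's exact estimator may differ.

```r
# Assumption: the mode is the peak of a Gaussian kernel density estimate
# (base R density()), standing in for the mode reported by hdr().
density_mode <- function(x) {
  d <- density(x)          # kernel density estimate evaluated on a grid
  d$x[which.max(d$y)]      # grid point with the highest estimated density
}

set.seed(1)
beta_reps <- rnorm(1000, mean = 2, sd = 0.5)  # simulated replicates, illustrative only
c(mode = density_mode(beta_reps), median = median(beta_reps))
```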

cv

Boolean value. If TRUE the error, errormeasure, will be computed using cross-validation. If FALSE the error will be computed in sample. The default is cv = TRUE.

cv.nfolds

number of folds used for cross-validation when cv = TRUE. The default is cv.nfolds = 5 and the smallest value allowable is cv.nfolds = 3.
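A minimal sketch of splitting n observations into cv.nfolds folds. make_folds is a hypothetical helper; it assigns observations to folds at random, which is an assumption (the package may instead use contiguous blocks that respect the time ordering).

```r
# Hypothetical fold assignment for cross-validation; not GCEstim's code.
make_folds <- function(n, nfolds = 5) {
  if (nfolds < 3) stop("cv.nfolds must be at least 3")
  sample(rep_len(seq_len(nfolds), n))  # balanced labels 1..nfolds, shuffled
}

set.seed(1)
folds <- make_folds(23, nfolds = 5)
table(folds)  # fold sizes differ by at most one
```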

errormeasure

Loss function (error) to be used for the selection of the support spaces. One of c("RMSE", "MSE", "MAE", "MAPE", "sMAPE", "MASE"). The default is errormeasure = "RMSE".
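For reference, the sketch below uses common textbook definitions of these loss functions. error_measures is a hypothetical helper; the sMAPE formula and the MASE scaling (here the in-sample MAE of a one-step naive forecast) are assumptions, since several variants exist and GCEstim's exact formulas may differ.

```r
# Hypothetical helper with textbook definitions; not GCEstim's code.
error_measures <- function(y, yhat) {
  e <- y - yhat
  c(
    RMSE  = sqrt(mean(e^2)),
    MSE   = mean(e^2),
    MAE   = mean(abs(e)),
    MAPE  = 100 * mean(abs(e / y)),
    sMAPE = 100 * mean(2 * abs(e) / (abs(y) + abs(yhat))),
    # MAE scaled by the in-sample MAE of the one-step naive forecast
    MASE  = mean(abs(e)) / mean(abs(diff(y)))
  )
}

y    <- c(10, 12, 11, 13)
yhat <- c(11, 12, 10, 14)
m <- error_measures(y, yhat)
round(m, 4)
```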

errormeasure.which

Which value of errormeasure to use when selecting a support space upper limit from support.signal.vector. One of c("min", "1se", "elbow"), where "min" selects the support spaces that produced the lowest error; "1se" selects the support spaces whose error is within one standard error of the CV error for "min"; and "elbow" selects the elbow point of the error curve, i.e., the point (upper limit of the support space, error) with the greatest distance to the line joining the first and last points of the curve, which correspond to the lowest and highest upper limits of the support space, respectively (see find_curve_elbow). The default is errormeasure.which = "1se" when cv = TRUE and errormeasure.which = "min" when cv = FALSE ("1se" is only available with cross-validation).
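The three rules can be sketched as follows. select_support is a hypothetical helper, not the package's internal code. Two assumptions: "1se" picks the smallest upper limit (the narrowest support) whose error is within one standard error of the minimum, by analogy with the usual one-standard-error rule; and "elbow" is computed as the perpendicular distance to the line through the first and last points, which is the idea behind find_curve_elbow.

```r
# Hypothetical sketch; not GCEstim's internal code.
# L: increasing candidate upper limits; err: CV error per limit;
# err_se: standard error of the CV error per limit.
select_support <- function(L, err, err_se) {
  i_min <- which.min(err)
  # "1se" (assumed direction): smallest limit within one SE of the minimum error
  i_1se <- min(which(err <= err[i_min] + err_se[i_min]))
  # "elbow": largest distance to the line through the first and last points
  x1 <- L[1]; y1 <- err[1]
  x2 <- L[length(L)]; y2 <- err[length(err)]
  d <- abs((y2 - y1) * L - (x2 - x1) * err + x2 * y1 - y2 * x1) /
    sqrt((y2 - y1)^2 + (x2 - x1)^2)
  i_elbow <- which.max(d)
  c(min = L[i_min], `1se` = L[i_1se], elbow = L[i_elbow])
}

s <- select_support(L = 1:5, err = c(10, 4, 3, 2.9, 2.8), err_se = rep(0.5, 5))
s  # min = 5, 1se = 3, elbow = 2
```

Note that the elbow distance depends on the relative scales of the two axes, so rescaling either the limits or the errors can move the selected point.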

support.method

One of c("standardized", "ridge"). If support.method = "standardized", the default, standardized coefficients are used to define the signal support spaces. If support.method = "ridge", the signal support spaces are defined by the ridge trace.

support.method.penalize.intercept

Boolean value. If TRUE, the default, the intercept is penalized. Only used when support.method = "ridge".

support.signal

NULL or a fixed positive upper limit L for the support spaces (-L, L) on standardized data (when support.method = "standardized"); NULL or a fixed positive factor to be multiplied by the maximum absolute value of the ridge trace of each coefficient (when support.method = "ridge"); or a pair (LL, UL) or a ((k+1) x 2) matrix defining the support spaces on the original data, with one row per coefficient (the intercept plus the k explanatory variables). The default is support.signal = NULL.
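Two illustrative values for support.signal are built below (the numbers are arbitrary). That a single L is split into five equally spaced support points, matching the five default prior weights, is an assumption; the matrix form follows the ((k+1) x 2) layout described above for a model with an intercept and three regressors, as in the Examples formula.

```r
# Standardized form: a single L gives the symmetric support (-L, L);
# assumed to be split into five equally spaced points.
L <- 2
z_std <- seq(-L, L, length.out = 5)

# Original-data form: one (LL, UL) row per coefficient,
# i.e. the intercept plus k = 3 explanatory variables.
support_matrix <- matrix(c(-10, 10,
                            -5,  5,
                            -5,  5,
                            -1,  1),
                         ncol = 2, byrow = TRUE)
```

Either object could then be passed as support.signal = L or support.signal = support_matrix.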

support.signal.vector

NULL or a vector of positive values, used when support.signal = NULL. If support.signal.vector = NULL, the default, a vector of support.signal.vector.n values, logarithmically equally spaced from support.signal.vector.min to support.signal.vector.max, is generated. Each value is either the upper limit of the standardized support spaces (when support.method = "standardized") or the factor to be multiplied by the maximum absolute value of the ridge trace of each coefficient (when support.method = "ridge").
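Assuming "logarithmically equally spaced" means equal steps on the log scale, the default grid can be reproduced in base R as follows; signal_vector is an illustrative name, not a package object.

```r
# Defaults from the Usage section
support.signal.vector.min <- 0.3
support.signal.vector.max <- 20
support.signal.vector.n   <- 20

# Equal steps on the log scale, then back-transform
signal_vector <- exp(seq(log(support.signal.vector.min),
                         log(support.signal.vector.max),
                         length.out = support.signal.vector.n))
signif(signal_vector, 3)
```

Equal log steps mean consecutive values share a constant ratio, so the grid is denser near the small limits, where the error curve tends to change fastest.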

support.signal.vector.min

A positive value for the lowest limit of the support.signal.vector when support.signal = NULL and support.signal.vector = NULL. The default is support.signal.vector.min = 0.3.

support.signal.vector.max

A positive value for the highest limit of the support.signal.vector when support.signal = NULL and support.signal.vector = NULL. The default is support.signal.vector.max = 20.

support.signal.vector.n

A positive integer for the number of support spaces to be used when support.signal = NULL and support.signal.vector = NULL. The default is support.signal.vector.n = 20.

support.signal.points

A positive integer, a vector or a matrix of prior weights for the signal. If not a positive integer, the weights in each row must sum to 1. The default is support.signal.points = c(1 / 5, 1 / 5, 1 / 5, 1 / 5, 1 / 5).
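The row-sum rule for prior weights can be sketched with a hypothetical validator. Reading a single positive integer M as M equally weighted points, and reading matrix rows as one row per coefficient, are both assumptions.

```r
# Hypothetical helper mirroring the rule above; not GCEstim's code.
# Assumption: a single positive integer M means M equally weighted points.
check_prior <- function(points) {
  if (length(points) == 1 && points >= 1 && points == round(points)) {
    points <- rep(1 / points, points)  # uniform prior over M points
  }
  P <- if (is.matrix(points)) points else matrix(points, nrow = 1)
  if (any(abs(rowSums(P) - 1) > 1e-8)) {
    stop("prior weights must sum to 1 in each row")
  }
  P
}

check_prior(5)                         # same as the default c(1/5, ..., 1/5)
check_prior(rbind(c(0.2, 0.6, 0.2),    # assumed: one row per coefficient
                  c(1/3, 1/3, 1/3)))
```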

support.noise

An interval, preferably centered around zero, given in the form c(LL,UL). If support.noise = NULL, the default, a vector c(-L,L) is computed using the empirical three-sigma rule (Pukelsheim, 1994).
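Pukelsheim (1994) shows that for any unimodal distribution at most 4/81 (about 5%) of the probability lies more than three standard deviations from the mean, which motivates a noise support of about plus or minus three standard deviations. The sketch assumes L = 3 * sd(y) on the response and three equally spaced points matching the default support.noise.points; the package's exact scale estimate may differ.

```r
# Assumption: L = 3 * sd(y); GCEstim's exact scale estimate may differ.
y <- c(2.1, 3.4, 1.9, 4.2, 3.0, 2.6)   # illustrative data
L <- 3 * sd(y)

# Three equally spaced points, matching the default support.noise.points
noise_support <- seq(-L, L, length.out = 3)
noise_support
```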

support.noise.points

A positive integer, a vector or a matrix of prior weights for the noise. If not a positive integer, the weights in each row must sum to 1. The default is support.noise.points = c(1 / 3, 1 / 3, 1 / 3).

weight

A value between zero and one representing the prediction-precision loss trade-off. If weight = 0.5, the default, equal weight is placed on the signal and noise entropies. A value greater than 0.5 places more weight on the noise entropy, whereas a value less than 0.5 places more weight on the signal entropy.
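One way to write a weighted GCE objective in the spirit of Golan, Judge and Miller (1996) is shown below, with gamma standing for weight, z and v for the signal and noise support points, and q and u for the signal and noise prior weights (support.signal.points and support.noise.points). It is an illustrative assumption, not necessarily the exact objective implemented in GCEstim; it does reproduce the behavior described above, since gamma = 0.5 weighs both terms equally and a larger gamma emphasizes the noise term.

```latex
\min_{p,\,w}\;(1-\gamma)\sum_{k=0}^{K}\sum_{m=1}^{M} p_{km}\ln\frac{p_{km}}{q_{km}}
 \;+\;\gamma\sum_{n=1}^{N}\sum_{j=1}^{J} w_{nj}\ln\frac{w_{nj}}{u_{nj}}
\quad\text{s.t.}\quad
y_n=\sum_{k=0}^{K} x_{nk}\sum_{m=1}^{M} z_{km}\,p_{km}+\sum_{j=1}^{J} v_{nj}\,w_{nj},
\qquad \sum_{m=1}^{M} p_{km}=1,\qquad \sum_{j=1}^{J} w_{nj}=1
```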

twosteps.n

Number of GCE reestimations using a previously estimated vector of signal probabilities.

method

Optimization method. For the primal form of the problem, use "primal.solnl" (GCE using the Sequential Quadratic Programming (SQP) method; see solnl) or "primal.solnp" (GCE using the augmented Lagrange multiplier method with an SQP interior algorithm; see solnp). For the dual form, use "dual" (GME), "dual.CG" (GCE using a conjugate gradients method; see optim), "dual.BFGS" (GCE using the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method; see optim), "dual.L-BFGS-B" (GCE using box-constrained optimization with the limited-memory modification of the BFGS quasi-Newton method; see optim), "dual.Rcgmin" (GCE using an update of the conjugate gradient algorithm; see optimx), "dual.bobyqa" (GCE using derivative-free optimization by quadratic approximation; see optimx and bobyqa), "dual.newuoa" (GCE using derivative-free optimization by quadratic approximation; see optimx and newuoa), "dual.nlminb" (GCE; see optimx and nlminb), "dual.nlm" (GCE; see optimx and nlm), "dual.lbfgs" (GCE using the limited-memory Broyden-Fletcher-Goldfarb-Shanno method; see lbfgs), "dual.lbfgsb3c" (GCE using L-BFGS-B implemented in Fortran with an Rcpp interface; see lbfgsb3c) or "dual.optimParallel" (GCE using a parallel version of L-BFGS-B; see optimParallel). The default is method = "dual.BFGS".

caseGLM

Special case of the generic general linear model. One of c("D", "M", "NM"), where "D" stands for data, "M" for moment and "NM" for normed moment. The default is caseGLM = "D".

boot.B

A single non-negative integer, either 0 or an integer greater than or equal to 10, giving the number of bootstrap replicates used to compute the bootstrap confidence interval(s). A value of 0 generates no replicates. The default is boot.B = 0.

boot.method

Method to be used for bootstrapping. One of c("residuals", "cases", "wild"), which correspond to resampling the residuals, resampling individual cases, or multiplying the residuals by N(0,1) draws, respectively. The default is boot.method = "residuals".
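The three schemes can be sketched for a single bootstrap replicate of a fitted linear model. boot_sample is a hypothetical helper and lm.fit is used only to obtain fitted values and residuals; this is not GCEstim's internal code.

```r
# Hypothetical sketch of one bootstrap replicate; not GCEstim's code.
boot_sample <- function(X, y, fitted, resid,
                        method = c("residuals", "cases", "wild")) {
  method <- match.arg(method)
  n <- length(y)
  switch(method,
    # resample residuals with replacement and add them to the fitted values
    residuals = list(X = X, y = fitted + sample(resid, n, replace = TRUE)),
    # resample whole rows (cases) of (X, y) with replacement
    cases = {
      i <- sample.int(n, n, replace = TRUE)
      list(X = X[i, , drop = FALSE], y = y[i])
    },
    # keep each residual in place but multiply it by an N(0, 1) draw
    wild = list(X = X, y = fitted + resid * rnorm(n))
  )
}

set.seed(1)
X <- cbind(1, 1:10)
y <- 2 + 0.5 * (1:10) + rnorm(10)
fit <- lm.fit(X, y)
b <- boot_sample(X, y, fit$fitted.values, fit$residuals, method = "wild")
```

Refitting the model on many such replicates and taking quantiles of the resulting coefficients is the usual route to percentile confidence intervals.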

seed

A single value, interpreted as an integer, for reproducibility or NULL for randomness. The default is seed = 230676.

OLS

Boolean value. If TRUE, the default, OLS estimation is performed.

verbose

An integer to control how verbose the output is. For a value of 0 no messages or output are shown and for a value of 3 all messages are shown. The default is verbose = 0.

Details

The tsbootgce function fits a linear regression model via generalized cross entropy to each replicate of the time series obtained with meboot. Models for tsbootgce are specified symbolically (see lm and dynlm).

Value

tsbootgce returns an object of class tsbootgce. The generic accessor functions coef.tsbootgce, confint.tsbootgce and plot.tsbootgce extract various useful features of this object.

An object of class tsbootgce is a list containing at least the following components:

call

the matched call.

coefficients

a named data frame of coefficients determined by coef.method.

data.ts

the data as a ts object.

error

loss function (error) used for the selection of the support spaces.

error.measure

in-sample error for the selected support space.

fitted.values

the fitted mean values.

frequency

see zoo.

index

see zoo.

lmgce

lmgce object.

meboot

meboot replicates.

model

the model frame used.

nep

normalized entropy of the signal of the model.

nepk

normalized entropy of the signal of each coefficient.
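A sketch of the normalized entropy measures in the form used by Golan, Judge and Miller (1996), assuming uniform prior weights over M support points: for each coefficient, nepk = -sum(p log p) / log(M), and nep is the same quantity pooled over the K coefficients, which equals the mean of nepk. The helper names are hypothetical; with non-uniform priors the normalization would differ.

```r
# Assumption: uniform prior weights over M support points.
# P: K x M matrix of estimated signal probabilities, one row per coefficient.
nepk_fun <- function(P) {
  M <- ncol(P)
  plogp <- ifelse(P > 0, P * log(P), 0)  # convention: 0 * log(0) = 0
  -rowSums(plogp) / log(M)               # 1 = uniform (no information), 0 = degenerate
}
nep_fun <- function(P) mean(nepk_fun(P))  # pooled over all K coefficients

P <- rbind(rep(1/5, 5),        # uniform row
           c(0, 0, 1, 0, 0))   # all mass on one support point
nepk_fun(P)  # 1 0
nep_fun(P)   # 0.5
```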

residuals

the residuals, that is response minus fitted values.

results

a list containing the bootstrap results: "coef.matrix", a named data frame of all the coefficients; "nepk.matrix", a named data frame of all the normalized entropy values of each parameter; "nep.vector", a vector of all the normalized entropy values of the model.

seed

the seed used.

terms

the terms object used.

x

if requested (the default), the model matrix used.

xlevels

(only where relevant) a record of the levels of the factors used in fitting.

y

if requested (the default), the response used.

Author(s)

Jorge Cabral, jorgecabral@ua.pt

References

Golan, A., Judge, G. G. and Miller, D. (1996) Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley.

Golan, A. (2008) Information and Entropy Econometrics — A Review and Synthesis. Foundations and Trends® in Econometrics, 2(1–2), 1–145. doi:10.1561/0800000004

Golan, A. (2017) Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information (Vol. 1). Oxford University Press. doi:10.1093/oso/9780199349524.001.0001

Hyndman, R. J. (1996) Computing and graphing highest density regions. The American Statistician, 50, 120–126. doi:10.2307/2684423

Pukelsheim, F. (1994) The Three Sigma Rule. The American Statistician, 48(2), 88–91. doi:10.2307/2684253

Vinod, H. D. and Lopez-de-Lacalle, J. (2009) Maximum Entropy Bootstrap for Time Series: The meboot R Package. Journal of Statistical Software, 29(5), 1–19. doi:10.18637/jss.v029.i05

See Also

The generic functions plot.tsbootgce, print.tsbootgce, coef.tsbootgce and confint.tsbootgce.

Examples


res.tsbootgce <-
  tsbootgce(
    formula = CO2 ~ 1 + L(GDP, 1) + L(EPC, 1) + L(EU, 1),
    data = moz_ts)

res.tsbootgce



[Package GCEstim version 0.1.0 Index]