lrv {robcp} | R Documentation |
Long Run Variance
Description
Estimates the long run variance or, for multivariate data, the long run covariance matrix of the supplied time series.
Usage
lrv(x, method = c("kernel", "subsampling", "bootstrap", "none"), control = list())
Arguments
x |
vector or matrix with each column representing a time series (numeric). |
method |
method of estimation. Options are "kernel", "subsampling", "bootstrap" and "none". |
control |
a list of control parameters. See 'Details'. |
Details
The long run variance equals the limit of n
times the variance of the arithmetic mean of a short range dependent time series, where n
is the length of the time series. It is used to standardize tests concerning the mean on dependent data.
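The limit in this definition can be made concrete with a small base R sketch. It assumes a stationary AR(1) process x_t = phi * x_{t-1} + e_t (an illustrative model choice, not something the package requires), for which n times the variance of the mean has a closed form and the long run variance equals Var(e_t) / (1 - phi)^2. The function name n_var_mean_ar1 is hypothetical.

```r
# Exact value of n * Var(mean(x_1, ..., x_n)) for a stationary AR(1) process
# with autocovariances gamma(h) = sigma2 * phi^|h| / (1 - phi^2).
n_var_mean_ar1 <- function(n, phi = 0.5, sigma2 = 1) {
  gamma0 <- sigma2 / (1 - phi^2)   # variance of x_t
  h <- seq_len(n - 1)              # positive lags 1, ..., n - 1
  gamma0 * (1 + 2 * sum((1 - h / n) * phi^h))
}

# Long run variance of the same AR(1) process
lrv_ar1 <- 1 / (1 - 0.5)^2

# n * Var(mean) approaches the long run variance as n grows
sapply(c(10, 100, 1000, 10000), n_var_mean_ar1)
lrv_ar1
```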
If method = "none"
, no long run variance estimation is performed and the value 1 is returned (i.e. the test statistic is left unchanged).
The control
argument is a list that can supply any of the following components:
kFun
Kernel function (character string). More in 'Notes'.
b_n
Bandwidth (numeric > 0 and smaller than sample size).
gamma0
Replace a negative long run variance estimate by the estimated variance? Boolean.
l
Block length (numeric > 0 and smaller than sample size).
overlapping
Overlapping subsampling estimation? Boolean.
distr
Transform observations by their empirical distribution function? Boolean. Default is
FALSE
.
B
Bootstrap repetitions (integer).
seed
RNG seed (numeric).
version
What property does the CUSUM test test for? Character string, details below.
loc
Estimated location corresponding to
version
. Numeric value, details below.
scale
Estimated scale corresponding to
version
. Numeric value, details below.
Kernel-based estimation
The kernel-based long run variance estimation is available for various testing scenarios (set by control$version
) and both for one- and multi-dimensional data. It uses the bandwidth b_n =
control$b_n
and kernel function k(x) =
control$kFun
. For tests on certain properties also a corresponding location control$loc
(m_n
) and scale control$scale
(v_n
) estimate is required; if not supplied, the default values listed below are used. Supported testing scenarios are:
-
"mean"
1-dim. data:
\hat{\sigma}^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2 + \frac{2}{n} \sum_{h = 1}^{b_n} \sum_{i = 1}^{n - h} (x_i - \bar{x}) (x_{i + h} - \bar{x}) k(h / b_n).
If
control$distr = TRUE
, then the long run variance is estimated on the empirical distribution of x
. The resulting value is then multiplied by \sqrt{\pi} / 2
.
Default values:
b_n
= 0.9 n^{1/3}
, kFun = "bartlett"
.
multivariate time series: The
k,l
-element of \Sigma
is estimated by
\hat{\Sigma}^{(k,l)} = \frac{1}{n} \sum_{i,j = 1}^{n}(x_i^{(k)} - \bar{x}^{(k)}) (x_j^{(l)} - \bar{x}^{(l)}) k((i-j) / b_n),
k, l = 1, ..., m
.
Default values:
b_n
= \log_{1.8 + m / 40}(n / 50)
, kFun = "bartlett"
.
-
"empVar"
for tests on changes in the empirical variance.
\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} ((x_i - m_n)^2 - v_n)((x_{i+|h|} - m_n)^2 - v_n).
Default values:
m_n =
mean(x)
, v_n =
var(x)
.
-
"MD"
for tests on a change in the median deviation.
\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} (|x_i - m_n| - v_n)(|x_{i+|h|} - m_n| - v_n).
Default values:
m_n =
median(x)
, v_n = \frac{1}{n-1} \sum_{i = 1}^n |x_i - m_n|
.
-
"GMD"
for tests on changes in Gini's mean difference.
\hat{\sigma}^2 = 4 \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|})
with
\hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n |x - x_i| - v_n
.
Default value:
v_n =
\frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} |x_i - x_j|.
-
"Qalpha"
for tests on changes in Qalpha
.
\hat{\sigma}^2 = \frac{4}{\hat{u}(v_n)} \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|}),
where
\hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n 1_{\{|x - x_i| \leq v_n\}} - m_n
and
\hat{u}(t) = \frac{2}{n(n-1)h_n} \sum_{1 \leq i < j \leq n} K\left(\frac{|x_i - x_j| - t}{h_n}\right)
is the kernel density estimate of the density
u
corresponding to the distribution function U(t) = P(|X-Y| \leq t)
, h_n =
IQR(x)
n^{-\frac{1}{3}}
and K
is the "quatratic" kernel function (see 'Note').
Default values:
m_n = \alpha = 0.5
, v_n =
Qalpha(x, m_n)[n-1]
.
-
"tau"
for tests on changes in Kendall's tau.
Only available for bivariate data: assume that the given data
x
has the format (x_i, y_i)_{i = 1, ..., n}
.
\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n((x_i, y_i))\hat{\phi}_n((x_{i+|h|}, y_{i+|h|})),
where
\hat{\phi}_n((x, y)) = 4 F_n(x, y) - 2 F_{X,n}(x) - 2 F_{Y,n}(y) + 1 - v_n
and F_n
, F_{X,n}
and F_{Y,n}
are the empirical distribution functions of ((X_i, Y_i))_{i = 1, ..., n}
, (X_i)_{i = 1, ..., n}
and (Y_i)_{i = 1, ..., n}
.
Default value:
v_n = \hat{\tau}_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} sign\left((x_j - x_i)(y_j - y_i)\right)
.
-
"rho"
for tests on changes in Spearman's rho.
Only available for
d
-variate data with d > 1
: assume that the given data x
has the format (x_{i,j} | i = 1, ..., n; j = 1, ..., d)
.
\hat{\sigma}^2 = a(d)^2 2^{2d} \left\{ \sum_{h = -(n-1)}^{n-1} K\left( \frac{|h|}{b_n} \right) \left( \sum_{i = 1}^{n-|h|} n^{-1} \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j) \hat{\phi}_n(x_{i+|h|}, x_j) - M^2 \right) \right\} ,
where
a(d) = (d+1) / (2^d - d - 1)
, M = n^{-1} \sum_{i = 1}^n \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j)
and \hat{\phi}_n(x, y) = 1 - \hat{U}_n(x, y)
, \hat{U}_n(x, y) = n^{-1}
(rank of x_{i,j}
in x_{i,1}, ..., x_{i,n})
.
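Several of the default scale values listed above are plain sample statistics. As a rough illustration, the defaults for "MD", "GMD" and "tau" could be computed in base R as follows. The helper names md_default, gmd_default and tau_default are hypothetical and not part of robcp; the package may compute these values differently.

```r
# "MD": v_n = 1/(n-1) * sum_i |x_i - median(x)|
md_default <- function(x) sum(abs(x - median(x))) / (length(x) - 1)

# "GMD": v_n = 2/(n(n-1)) * sum_{i<j} |x_i - x_j| (Gini's mean difference)
gmd_default <- function(x) {
  n <- length(x)
  d <- abs(outer(x, x, "-"))            # all pairwise absolute differences
  2 / (n * (n - 1)) * sum(d[upper.tri(d)])
}

# "tau": v_n = 2/(n(n-1)) * sum_{i<j} sign((x_j - x_i)(y_j - y_i))
tau_default <- function(x, y) {
  n <- length(x)
  s <- sign(outer(x, x, "-") * outer(y, y, "-"))
  2 / (n * (n - 1)) * sum(s[upper.tri(s)])
}

x <- c(1, 2, 3, 4)
y <- c(1, 3, 2, 4)
tau_default(x, y)                  # agrees with cor(x, y, method = "kendall") when there are no ties
```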
When control$gamma0 = TRUE
(default) then negative estimates of the long run variance are replaced by the autocovariance at lag 0 (= ordinary variance of the data). The function will then throw a warning.
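The univariate "mean" formula and the gamma0 fallback can be sketched in a few lines of base R. This is only an illustration of the displayed formula, not the package's implementation: the function name lrv_kernel_sketch is hypothetical, and truncating the lag sum at floor(b_n) for non-integer bandwidths is an assumption of the sketch.

```r
# Bartlett kernel as defined in the 'Note' section
bartlett <- function(x) (1 - abs(x)) * (abs(x) < 1)

# Kernel-based long run variance estimate for a univariate series
lrv_kernel_sketch <- function(x, b_n = 0.9 * length(x)^(1/3),
                              k = bartlett, gamma0 = TRUE) {
  n <- length(x)
  xc <- x - mean(x)
  lag0 <- sum(xc^2) / n                       # autocovariance at lag 0
  est <- lag0
  # Weighted autocovariances at lags 1, ..., floor(b_n) (assumed truncation)
  for (h in seq_len(min(floor(b_n), n - 1))) {
    est <- est + 2 / n * sum(xc[1:(n - h)] * xc[(h + 1):n]) * k(h / b_n)
  }
  # Fallback described above: replace a negative estimate by the lag-0 term
  if (gamma0 && est < 0) {
    warning("negative long run variance estimate; using lag-0 autocovariance")
    est <- lag0
  }
  est
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 200))
lrv_kernel_sketch(x)
```

With a kernel that is not positive semi-definite (for example the truncated kernel) the estimate can become negative, which is the case the gamma0 switch guards against.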
Subsampling estimation
For method = "subsampling"
there are an overlapping and a non-overlapping version (parameter control$overlapping
). It can also be specified whether the observations x are transformed by their empirical distribution function \tilde{F}_n
(parameter control$distr
). Via control$l
the block length l
can be controlled.
If control$overlapping = TRUE
and control$distr = TRUE
:
\hat{\sigma}_n = \frac{\sqrt{\pi}}{\sqrt{2l}(n - l + 1)} \sum_{i = 0}^{n-l} \left| \sum_{j = i+1}^{i+l} (F_n(x_j) - 0.5) \right|.
Otherwise, if control$distr = FALSE
, the estimator is
\hat{\sigma}^2 = \frac{1}{l (n - l + 1)} \sum_{i = 0}^{n-l} \left( \sum_{j = i + 1}^{i+l} x_j - \frac{l}{n} \sum_{j = 1}^n x_j \right)^2.
If control$overlapping = FALSE
and control$distr = TRUE
:
\hat{\sigma} = \frac{1}{n/l} \sqrt{\pi/2} \sum_{i = 1}^{n/l} \frac{1}{\sqrt{l}} \left| \sum_{j = (i-1)l + 1}^{il} F_n(x_j) - \frac{l}{n} \sum_{j = 1}^n F_n(x_j) \right|.
Otherwise, if control$distr = FALSE
, the estimator is
\hat{\sigma}^2 = \frac{1}{n/l} \sum_{i = 1}^{n/l} \frac{1}{l} \left(\sum_{j = (i-1)l + 1}^{il} x_j - \frac{l}{n} \sum_{j = 1}^n x_j\right)^2.
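For control$distr = FALSE the two subsampling formulas differ only in which blocks are used, so both can be written with one helper. The following base R sketch is an illustration under assumptions: the function name lrv_subsampling_sketch is hypothetical, and when l does not divide n the non-overlapping version simply uses the first floor(n / l) complete blocks.

```r
lrv_subsampling_sketch <- function(x, l, overlapping = TRUE) {
  n <- length(x)
  # Block start offsets i, so that a block covers x[i + 1], ..., x[i + l]
  starts <- if (overlapping) {
    0:(n - l)                                  # n - l + 1 overlapping blocks
  } else {
    seq(0, by = l, length.out = n %/% l)       # floor(n / l) disjoint blocks
  }
  # Block sums minus l times the overall mean
  block_sums <- vapply(starts, function(i) sum(x[(i + 1):(i + l)]), numeric(1))
  dev <- block_sums - l * mean(x)
  sum(dev^2) / (l * length(starts))
}

x <- c(1, 2, 3, 4, 5, 6)
lrv_subsampling_sketch(x, l = 2, overlapping = TRUE)
lrv_subsampling_sketch(x, l = 2, overlapping = FALSE)
```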
Default values: overlapping = TRUE, the block length is chosen adaptively:
l_n = \max{\left\{ \left\lceil n^{1/3} \left( \frac{2 \rho}{1 - \rho^2} \right)^{(2/3)} \right\rceil, 1 \right\}}
where \rho
is the Spearman autocorrelation at lag 1.
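The adaptive block length can be evaluated directly from the formula. The sketch below makes two assumptions that are not stated in the text: the Spearman autocorrelation at lag 1 is computed as the Spearman correlation between x_1, ..., x_{n-1} and x_2, ..., x_n, and the absolute value of \rho is used so that the fractional power is defined for negative autocorrelation. Both function names are hypothetical.

```r
# Block length from the formula, for a given lag-1 autocorrelation rho
block_length_from_rho <- function(n, rho) {
  rho <- abs(rho)   # assumption: avoids NaN from a negative base
  max(ceiling(n^(1/3) * (2 * rho / (1 - rho^2))^(2/3)), 1)
}

# Plug in an estimate of the Spearman autocorrelation at lag 1
block_length_sketch <- function(x) {
  n <- length(x)
  rho <- cor(x[-n], x[-1], method = "spearman")
  block_length_from_rho(n, rho)
}

block_length_from_rho(1000, 0.5)
block_length_from_rho(1000, 0)     # no dependence gives the minimum length 1
```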
Bootstrap estimation
If method = "bootstrap"
a dependent wild bootstrap with the parameters B =
control$B
, l =
control$l
and k(x) =
control$kFun
is performed:
\hat{\sigma}^2 = \sqrt{n} Var(\bar{x^*_k} - \bar{x}), k = 1, ..., B
A single x_{ik}^*
is generated by x_i^* = \bar{x} + (x_i - \bar{x}) a_i
where a_i
are independent from the data x
and are generated from a multivariate normal distribution with E(A_i) = 0
, Var(A_i) = 1
and Cov(A_i, A_j) = k\left(\frac{i - j}{l}\right), i = 1, ..., n; j \neq i
. Via control$seed
a seed can optionally be specified (cf. set.seed
). Only "bartlett"
, "parzen"
and "QS"
are supported as kernel functions. Uses the function sqrtm
from package pracma
.
Default values: B
= 1000, kFun = "bartlett"
, l
is the same as for subsampling.
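The bootstrap scheme described above can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: the function name lrv_dwb_sketch is hypothetical, the matrix square root is taken via eigen() instead of pracma's sqrtm, and the bootstrap variance is scaled by n, following the definition of the long run variance in 'Details'.

```r
bartlett <- function(x) (1 - abs(x)) * (abs(x) < 1)

lrv_dwb_sketch <- function(x, l, B = 1000, k = bartlett, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  xc <- x - mean(x)
  # Covariance matrix of the multipliers: Cov(A_i, A_j) = k((i - j) / l)
  K <- k(outer(1:n, 1:n, "-") / l)
  # Symmetric square root of K via eigendecomposition (swap-in for sqrtm)
  e <- eigen(K, symmetric = TRUE)
  S <- e$vectors %*% (sqrt(pmax(e$values, 0)) * t(e$vectors))
  # Each column of A is one draw of the multipliers (a_1, ..., a_n)
  A <- S %*% matrix(rnorm(n * B), nrow = n)
  # mean(x*) - mean(x) equals mean((x_i - mean(x)) * a_i) in each replicate
  diffs <- colMeans(xc * A)
  n * var(diffs)                               # assumed scaling by n
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 100))
lrv_dwb_sketch(x, l = 5, B = 500, seed = 123)
```

Under this scaling the expected bootstrap variance is (1/n) \sum_{i,j} (x_i - \bar{x})(x_j - \bar{x}) k((i-j)/l), which has the same form as the kernel-based estimator with l in the role of the bandwidth.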
Value
long run variance \sigma^2
(numeric) or \Sigma
(numeric matrix)
Note
Kernel functions
bartlett
:
k(x) = (1 - |x|) * 1_{\{|x| < 1\}}
FT
:
k(x) = 1 * 1_{\{|x| \leq 0.5\}} + (2 - 2 * |x|) * 1_{\{0.5 < |x| < 1\}}
parzen
:
k(x) = (1 - 6x^2 + 6|x|^3) * 1_{\{0 \leq |x| \leq 0.5\}} + 2(1 - |x|)^3 * 1_{\{0.5 < |x| \leq 1\}}
QS
:
k(x) = \frac{25}{12 \pi ^2 x^2} \left(\frac{\sin(6\pi x / 5)}{6\pi x / 5} - \cos(6 \pi x / 5)\right)
TH
:
k(x) = (1 + \cos(\pi x)) / 2 * 1_{\{|x| < 1\}}
truncated
:
k(x) = 1_{\{|x| < 1\}}
SFT
:
k(x) = (1 - 4(|x| - 0.5)^2)^2 * 1_{\{|x| < 1\}}
Epanechnikov
:
k(x) = 3 \frac{1 - x^2}{4} * 1_{\{|x| < 1\}}
quatratic
:
k(x) = (1 - x^2)^2 * 1_{\{|x| < 1\}}
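For reference, a selection of the kernels above written as vectorized R functions. The names k_bartlett, k_FT and so on are hypothetical and chosen for this sketch; in lrv the kernel is selected via the character string in control$kFun. The QS formula is undefined at x = 0, so the sketch sets its limit value 1 there explicitly.

```r
k_bartlett  <- function(x) (1 - abs(x)) * (abs(x) < 1)
k_FT        <- function(x) 1 * (abs(x) <= 0.5) +
                           (2 - 2 * abs(x)) * (abs(x) > 0.5 & abs(x) < 1)
k_parzen    <- function(x) (1 - 6 * x^2 + 6 * abs(x)^3) * (abs(x) <= 0.5) +
                           2 * (1 - abs(x))^3 * (abs(x) > 0.5 & abs(x) <= 1)
k_TH        <- function(x) (1 + cos(pi * x)) / 2 * (abs(x) < 1)
k_truncated <- function(x) as.numeric(abs(x) < 1)
k_QS <- function(x) {
  z <- 6 * pi * x / 5
  out <- 25 / (12 * pi^2 * x^2) * (sin(z) / z - cos(z))
  out[x == 0] <- 1                 # limit of the expression as x -> 0
  out
}

# Evaluate the kernels on a small grid
grid <- c(0, 0.25, 0.5, 0.75, 1)
rbind(bartlett = k_bartlett(grid), FT = k_FT(grid), parzen = k_parzen(grid),
      TH = k_TH(grid), truncated = k_truncated(grid), QS = k_QS(grid))
```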
Author(s)
Sheila Görz
References
Andrews, D.W. "Heteroskedasticity and autocorrelation consistent covariance matrix estimation." Econometrica: Journal of the Econometric Society (1991): 817-858.
Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, (2015). 195-220.
Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.
Parzen, E. "On consistent estimates of the spectrum of a stationary time series." The Annals of Mathematical Statistics (1957): 329-348.
Shao, X. "The dependent wild bootstrap." Journal of the American Statistical Association 105.489 (2010): 218-235.
See Also
CUSUM
, HodgesLehmann
, wilcox_stat
Examples
Z <- c(rnorm(20), rnorm(20, 2))
## kernel-based estimation
lrv(Z)
## non-overlapping subsampling
lrv(Z, method = "subsampling", control = list(overlapping = FALSE, distr = TRUE, l = 5))
## dependent wild bootstrap estimation
lrv(Z, method = "bootstrap", control = list(l = 5, kFun = "parzen"))