sp.runs.test {spqdep} | R Documentation |
Compute the global spatial runs test.
Description
This function computes the global spatial runs test for spatial independence of a categorical spatial data set.
Usage
sp.runs.test(formula = NULL, data = NULL, fx = NULL,
listw = listw, alternative = "two.sided",
distr = "asymptotic", nsim = NULL, control = list())
Arguments
formula |
a symbolic description of the factor (optional). |
data |
an (optional) data frame or an sf object containing the variable to be tested. |
fx |
a factor (optional). |
listw |
A neighbourhood list (of class knn or nb) or a W matrix that indicates the order of the elements in each m_i-environment. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". |
distr |
A character string. Distribution used for the test: "asymptotic" (default) or "bootstrap". |
nsim |
Number of permutations used to obtain the pseudo p-value and confidence intervals (CI). The default is NULL, in which case no CI for the number of runs is computed. |
control |
List of additional control arguments. |
Details
The order of the neighbourhoods (m_i-environments) is critical to obtain the test.
To obtain the number of runs observed in each m_i-environment, each element must be associated
with a set of neighbours ordered by proximity.
Three kinds of lists can be included to identify m_i-environments:
- knn: Objects of the class knn that consider the neighbours in order of proximity.
- nb: If the neighbours are obtained from an sf object, the code internally calls the function nb2nb_order, which orders them by proximity of the centroids.
- matrix: If an object of class matrix based on the inverse of the distance is supplied as an argument, the function nb2nb_order is also called internally to transform the matrix into an object of class nb with ordered neighbours.
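To make "neighbours ordered by proximity" concrete, the following base R sketch builds, for every point, the indices of its k nearest neighbours sorted from nearest to furthest. The helper ordered_knn is hypothetical and is not part of spqdep; it only illustrates the ordering idea and is not how knearneigh or nb2nb_order are implemented.

```r
# Hypothetical helper (not part of spqdep): indices of the k nearest
# neighbours of every point, ordered from nearest to furthest.
ordered_knn <- function(coords, k) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  diag(d) <- Inf                 # a point is never its own neighbour
  # For each row, sort the other points by distance and keep the first k
  idx <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  matrix(unlist(idx), ncol = k, byrow = TRUE)
}

set.seed(123)
coords <- cbind(runif(10), runif(10))
nn <- ordered_knn(coords, k = 3)
nn[1, ]  # neighbours of the first point, nearest first
```

Each row of nn plays the role of one m_i-environment with a fixed number of neighbours; lists of class nb can have a different number of neighbours per location.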
Two alternative sets of arguments can be passed to this function to compute the spatial runs test:
Option 1 | A factor (fx) and a neighbourhood list (listw) of class knn. |
Option 2 | An sf object (data) and a formula to specify the factor, together with a neighbourhood list (listw). |
Value
An object of the htest and sprunstest classes with the following components:
data.name | a character string giving the names of the data. |
method | the type of test applied (). |
SR | total number of runs |
dnr | empirical distribution of the number of runs |
statistic | Value of the homogeneity runs statistic. Negative sign indicates global homogeneity |
alternative | a character string describing the alternative hypothesis. |
p.value | p-value of the SRQ test |
pseudo.value | the pseudo p-value of the SRQ test if nsim is not NULL |
MeanNeig | Mean of the maximum number of neighbours |
MaxNeig | Maximum number of neighbours |
listw | The neighbourhood list |
nsim | number of simulations (bootstrap version only) |
SRGP | nsim simulated values of statistic. |
SRLP | matrix with the number of runs for each location. |
Definition of spatial run
This section defines the concepts of spatial encoding and runs, and constructs the main statistics necessary
for testing spatial homogeneity of categorical variables. In order to develop a general theoretical setting,
let \{X_s\}_{s \in S} be the categorical spatial process of interest with Q different
categories, where S is a set of coordinates.
Spatial encoding:
For a location s \in S, denote by N_s = \{s_1, s_2, ..., s_{n_s}\} the set of neighbours according
to the interaction scheme W, ordered from smallest to largest Euclidean distance with respect to location s.
A sequence X_{s_i}, X_{s_{i+1}}, ..., X_{s_{i+r}} whose elements all have the same value (or are identified by the same class)
is called a spatial run at location s of length r.
Spatial run statistic
The total number of runs is defined as:
SR^Q=n+\sum_{s \in S}\sum_{j=1}^{n_s}I_j^s
where I_j^s = 1 if X_{s_{j-1}} \neq X_{s_j} and 0 otherwise, for j = 1, 2, ..., n_s.
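The definition of SR^Q can be sketched in a few lines of base R. The helper count_runs below is hypothetical (it is not a spqdep function) and assumes that s_0 = s, that is, each m_i-environment starts at the location itself, and that every location has the same number of neighbours stored as a row of an index matrix.

```r
# Hypothetical helper (not part of spqdep): total number of runs SR^Q.
# fx: factor of length n.
# nn: n x k matrix of neighbour indices, each row ordered by proximity.
# Assumption: s_0 = s, so the first comparison is X_s versus X_{s_1}.
count_runs <- function(fx, nn) {
  x <- as.integer(fx)
  n <- length(x)
  changes <- 0
  for (s in seq_len(n)) {
    env <- x[c(s, nn[s, ])]                   # X_s, X_{s_1}, ..., X_{s_{n_s}}
    changes <- changes + sum(diff(env) != 0)  # sum over j of I_j^s
  }
  n + changes
}

fx <- factor(c("a", "a", "b", "b"))
nn <- rbind(c(2, 3), c(1, 3), c(4, 2), c(3, 2))
count_runs(fx, nn)  # 8: each of the 4 environments contains 2 runs
```

In the toy data every environment has exactly one change of category, so the total is n = 4 plus 4 changes.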
By the Central Limit Theorem, the asymptotic distribution of SR^Q
is:
SR^Q \sim N(\mu_{SR^Q},\sigma_{SR^Q})
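As a reading aid, the normal approximation can be written in standardised form. This is a sketch that assumes \sigma_{SR^Q} denotes the standard deviation of SR^Q under the null hypothesis; the symbol Z is introduced here for illustration only.

```latex
Z = \frac{SR^Q - \mu_{SR^Q}}{\sigma_{SR^Q}} \;\xrightarrow{d}\; N(0,1)
```

Under this reading, a negative Z means fewer runs than expected under the null, which agrees with the description of the statistic component, where a negative sign indicates global homogeneity.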
In the one-tailed case, we must distinguish the lower-tailed test and the upper-tailed, which are associated
with homogeneity and heterogeneity respectively. In the case of the lower-tailed test,
the following hypotheses are used:
H_0: \{X_s\}_{s \in S} is i.i.d.
H_1: The spatial distribution of the values of the categorical variable is more homogeneous than under the null hypothesis (according to the fixed association scheme).
In the upper-tailed test, the following hypotheses are used:
H_0: \{X_s\}_{s \in S} is i.i.d.
H_1: The spatial distribution of the values of the categorical variable is more heterogeneous than under the null hypothesis (according to the fixed association scheme).
These hypotheses provide a decision rule regarding the degree of homogeneity in the spatial distribution
of the values of the spatial categorical random variable.
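The decision rule can be illustrated with a simple permutation scheme in base R. This is only a sketch under stated assumptions: perm_runs_test and count_runs are hypothetical helpers, not spqdep functions, the locations lie on a line so that the neighbours are easy to check by hand, and the pseudo p-value formula used here is the common (count + 1) / (nsim + 1) convention, which may differ from what the package computes when distr = "bootstrap".

```r
# Hypothetical helper: total number of runs, assuming s_0 = s.
count_runs <- function(x, nn) {
  n <- length(x)
  n + sum(vapply(seq_len(n),
                 function(s) sum(diff(x[c(s, nn[s, ])]) != 0),
                 integer(1)))
}

# Hypothetical helper: permutation pseudo p-value for the number of runs.
perm_runs_test <- function(fx, nn, nsim = 199,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  x <- as.integer(fx)
  obs <- count_runs(x, nn)
  # Shuffle the categories over the fixed locations (i.i.d. null)
  sim <- replicate(nsim, count_runs(sample(x), nn))
  p_less <- (sum(sim <= obs) + 1) / (nsim + 1)     # few runs: homogeneity
  p_greater <- (sum(sim >= obs) + 1) / (nsim + 1)  # many runs: heterogeneity
  p <- switch(alternative,
              less = p_less,
              greater = p_greater,
              two.sided = min(1, 2 * min(p_less, p_greater)))
  list(SR = obs, sim = sim, pseudo.value = p)
}

# Toy data: 20 points on a line, two perfectly clustered categories
d <- as.matrix(dist(1:20))
diag(d) <- Inf
nn <- t(apply(d, 1, function(row) order(row)[1:2]))  # 2 nearest neighbours
fx <- factor(rep(c("a", "b"), each = 10))

set.seed(1)
res <- perm_runs_test(fx, nn, nsim = 199, alternative = "less")
res$SR
res$pseudo.value
```

Because the two categories are perfectly clustered, the observed number of runs lies far in the lower tail of the permutation distribution, so the lower-tailed test points to homogeneity.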
Control arguments
seedinit | Numerical value for the seed (bootstrap version only). Default value seedinit=123 |
Author(s)
Fernando López | fernando.lopez@upct.es |
Román Mínguez | roman.minguez@uclm.es |
Antonio Páez | paezha@gmail.com |
Manuel Ruiz | manuel.ruiz@upct.es |
References
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
See Also
local.sp.runs.test, dgp.spq, Q.test
Examples
library(spqdep)
# Case 1: SRQ test based on a factor and a knn object
n <- 100
cx <- runif(n)
cy <- runif(n)
x <- cbind(cx, cy)
listw <- spdep::knearneigh(x, k = 3)
p <- c(1/6,3/6,2/6)
rho <- 0.5
fx <- dgp.spq(listw = listw, p = p, rho = rho)
srq <- sp.runs.test(fx = fx, listw = listw)
print(srq)
plot(srq)
# Bootstrap version
control <- list(seedinit = 1255)
srq <- sp.runs.test(fx = fx, listw = listw, distr = "bootstrap",
nsim = 299, control = control)
print(srq)
plot(srq)
# Case 2: SRQ test with formula, a sf object (points) and knn
data("FastFood.sf")
x <- sf::st_coordinates(sf::st_centroid(FastFood.sf))
listw <- spdep::knearneigh(x, k=4)
formula <- ~ Type
srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw)
print(srq)
plot(srq)
# Bootstrap version
srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw,
distr = "bootstrap", nsim = 199)
print(srq)
plot(srq)
# Case 3: SRQ test (permutation) using formula with a sf object (polygons) and nb
library(sf)
fname <- system.file("shape/nc.shp", package="sf")
nc <- sf::st_read(fname)
listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE)
p <- c(1/6,3/6,2/6)
rho <- 0.5
co <- sf::st_coordinates(sf::st_centroid(nc))
nc$fx <- dgp.spq(listw = listw, p = p, rho = rho)
plot(nc["fx"])
formula <- ~ fx
srq <- sp.runs.test(formula = formula, data = nc, listw = listw,
distr = "bootstrap", nsim = 399)
print(srq)
plot(srq)
# Case 4: SRQ test (Asymptotic) using formula with a sf object (polygons) and nb
data(provinces_spain)
# sf::sf_use_s2(FALSE)
listw <- spdep::poly2nb(provinces_spain, queen = FALSE)
provinces_spain$Coast <- factor(provinces_spain$Coast)
levels(provinces_spain$Coast) <- c("no", "yes")
plot(provinces_spain["Coast"])
formula <- ~ Coast
srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw)
print(srq)
plot(srq)
# Bootstrap version
srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw,
distr = "bootstrap", nsim = 299)
print(srq)
plot(srq)
# Case 5: SRQ test based on a distance matrix (inverse distance)
N <- 100
cx <- runif(N)
cy <- runif(N)
data <- as.data.frame(cbind(cx,cy))
data <- sf::st_as_sf(data,coords = c("cx","cy"))
n <- nrow(data)
dis <- 1/matrix(as.numeric(sf::st_distance(data,data)),ncol=n,nrow=n)
diag(dis) <- 0
dis <- (dis < quantile(dis,.10))*dis
p <- c(1/6,3/6,2/6)
rho <- 0.5
fx <- dgp.spq(listw = dis , p = p, rho = rho)
srq <- sp.runs.test(fx = fx, listw = dis)
print(srq)
plot(srq)
srq <- sp.runs.test(fx = fx, listw = dis, data = data)
print(srq)
plot(srq)
# Bootstrap version
srq <- sp.runs.test(fx = fx, listw = dis, data = data,
distr = "bootstrap", nsim = 299)
print(srq)
plot(srq)
# Case 6: SRQ test based on a distance matrix (inverse distance), using a formula and an sf object
data("FastFood.sf")
# sf::sf_use_s2(FALSE)
n <- nrow(FastFood.sf)
dis <- 1000000/matrix(as.numeric(sf::st_distance(FastFood.sf,FastFood.sf)), ncol = n, nrow = n)
diag(dis) <- 0
dis <- (dis < quantile(dis,.005))*dis
p <- c(1/6,3/6,2/6)
rho <- 0.5
co <- sf::st_coordinates(sf::st_centroid(FastFood.sf))
FastFood.sf$fx <- dgp.spq(p = p, listw = dis, rho = rho)
plot(FastFood.sf["fx"])
formula <- ~ fx
# Bootstrap version
srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = dis,
distr = "bootstrap", nsim = 299)
print(srq)
plot(srq)