compare {distfreereg} | R Documentation |
Compare the simulated statistic distribution with the observed statistic distribution used in distribution-free parametric regression testing
Description
Simulate response data repeatedly with true_mean as the mean and true_covariance as the covariance structure, each time running distfreereg on the simulated data. The observed statistics and p-values are saved, as are the simulated statistics from the first replication.
See the Comparing Distributions with the distfreereg Package vignette for an introduction.
Usage
compare(true_mean, true_method = NULL, true_method_args = NULL, true_covariance,
true_X = NULL, true_data = NULL, theta = NULL, n = NULL, reps = 1e3,
prog = reps/10, simulate_args = NULL, err_dist_fun = NULL,
err_dist_args = NULL, keep = NULL, manual = NULL, update_args = NULL,
global_override = NULL, ...)
Arguments
true_mean |
Object specifying the mean structure of the true model. It is used to generate the true values of |
true_method |
Character vector of length one; specifies the function (e.g., |
true_method_args |
Optional list; values are passed to the function specified by |
true_covariance |
Named list; specifies the covariance structures of the true error distribution in the format described in the documentation for the |
true_X , true_data |
Optional numeric matrix or data frame, respectively; specifies the covariate values for the true model. |
theta |
Numeric vector; used as the (true) parameter values for the model when |
n |
Optional integer; indicates how long each simulated data vector should be. Required only when no covariate values are specified for either the true or test mean. Silently converted to integer if numeric. |
reps |
Integer; specifies number of replications. Silently converted to integer if numeric. |
prog |
Integer or |
simulate_args |
Optional list; specifies additional named arguments to pass to |
err_dist_fun |
Character string; specifies the name of the function to be used to simulate errors when |
err_dist_args |
Optional list; specifies additional named arguments to pass to |
keep |
A vector of integers, or the character string " |
manual |
Optional function; applied to the |
update_args |
Optional named list; specifies arguments to pass to |
global_override |
Optional named list; specifies arguments to pass to the |
... |
Additional arguments passed to |
Details
This function allows the user to explore the asymptotic behavior of the distributions involved in the test conducted by distfreereg. If the sample size is large enough and the true covariance matrix of the errors is known or is estimated well enough, then the observed and simulated statistics have nearly the same distribution. How large the sample size must be depends on the details of the situation. This function can be used to determine how large the sample size must be to obtain approximately equal distributions, and to estimate the power of the test against a specific alternative.
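As a rough illustration of the power estimate mentioned above, the sketch below computes an empirical rejection rate and its Monte Carlo standard error from a numeric vector of p-values, such as those in the p component of a compare object. The helper name empirical_rejection is hypothetical; the package's own rejection() function (listed under See Also) may serve a similar purpose, but its interface is not assumed here. If p holds more than one column (for example, one per statistic), the helper would be applied to each column separately.

```r
# Hypothetical helper: empirical rejection rate of a level-alpha test,
# computed from a numeric vector of p-values collected over replications.
empirical_rejection <- function(p, alpha = 0.05) {
  stopifnot(is.numeric(p), length(p) > 0)
  rate <- mean(p < alpha)
  # Binomial standard error of the estimated rate over length(p) replications
  se <- sqrt(rate * (1 - rate) / length(p))
  list(rate = rate, se = se, reps = length(p))
}

# Toy p-values: when the tested model is correct and the asymptotics hold,
# p-values are roughly uniform, so the rate should be near alpha.
set.seed(1)
p_null <- runif(1000)
empirical_rejection(p_null, alpha = 0.05)
```

Under a specific alternative, the same rate estimates the power; the standard error indicates whether reps is large enough for the comparison at hand.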
The user specifies a particular true model which is used to generate outcome values. There are three cases:
When true_mean is a function, this function determines the mean of the outcome values and err_dist_fun is used to generate errors. The error-generating function will usually include an element of true_covariance as an argument, and in that case must accept the appropriate class of object. For example, if the true covariance is a list of matrices corresponding to a block-diagonal covariance matrix, then err_dist_fun must accept such a list as an argument.
When true_mean is an nls object, or when it is a formula and true_method is "nls", the function determined by the formula (in the model call or user-specified, respectively) is used to determine the mean function, and err_dist_fun generates the errors.
When true_mean is a model object that is not an nls object, or a formula and true_method is not "nls", then simulate is used to generate outcome values.
If none of these cases apply to true_mean, then compare() cannot be used. (E.g., true_mean cannot be a glm object fitted using a "quasi" family, because simulate does not work for that family.)
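The third case relies on R's generic simulate(). The base-R sketch below, independent of distfreereg and using made-up data, shows what that generic returns for an lm fit: a data frame with one row per observation and one column of simulated responses per draw. How compare arranges these draws internally is not shown here.

```r
# Made-up data for a simple linear model
set.seed(20240201)
n <- 50
dat <- data.frame(x = rexp(n, rate = 1))
dat$y <- 2 + 5 * dat$x + rnorm(n)
fit <- lm(y ~ x, data = dat)

# simulate() draws new responses from the fitted model:
# one column per draw, one row per observation
sims <- simulate(fit, nsim = 3, seed = 1)
dim(sims)    # 50 3
names(sims)  # "sim_1" "sim_2" "sim_3"
```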
The user also specifies arguments to pass to distfreereg, most notably a model to test comprising a mean function test_mean and a covariance structure specified by covariance. For each repetition, compare sends the simulated data, as Y or as part of data, to distfreereg.
The true_covariance argument specifies the covariance structure that is available to err_dist_fun for generating errors. The needs of err_dist_fun can vary (for example, the default function uses SqrtSigma to generate multivariate normal errors), so any one of the elements Sigma, SqrtSigma, P, and Q (defined in the documentation of distfreereg) can be specified. Any element needed by err_dist_fun is calculated automatically if not supplied.
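To make the relationship between these elements concrete, the base-R sketch below derives SqrtSigma, P, and Q from a given Sigma by eigendecomposition. It assumes the conventions of the distfreereg documentation as understood here: SqrtSigma is the symmetric square root of Sigma, P is the inverse of Sigma, and Q is the symmetric square root of P. The helper name covariance_elements is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: compute SqrtSigma, P, and Q from a symmetric
# positive definite covariance matrix Sigma via its eigendecomposition.
covariance_elements <- function(Sigma) {
  e <- eigen(Sigma, symmetric = TRUE)
  stopifnot(all(e$values > 0))  # require positive definiteness
  V <- e$vectors
  # Symmetric matrix power: V diag(lambda^pow) V^T
  sym_power <- function(pow) {
    V %*% diag(e$values^pow, nrow = length(e$values)) %*% t(V)
  }
  list(
    Sigma     = Sigma,
    SqrtSigma = sym_power(1/2),   # SqrtSigma %*% SqrtSigma equals Sigma
    P         = sym_power(-1),    # P equals solve(Sigma)
    Q         = sym_power(-1/2)   # Q %*% Q equals P
  )
}

set.seed(20240201)
n <- 5
Sig <- rWishart(1, df = 2 * n, Sigma = diag(n))[, , 1]
el <- covariance_elements(Sig)
max(abs(el$SqrtSigma %*% el$SqrtSigma - Sig))  # near machine precision
```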
The value of err_dist_fun must be a function whose output is a numeric matrix with n rows and reps columns. Each column is used as the vector of errors in one repetition. The error function's arguments can include the special values n, reps, Sigma, SqrtSigma, P, and Q. These arguments are automatically assigned their corresponding values from the values passed to compare. For example, the default value rmvnorm uses SqrtSigma to generate multivariate normal values with mean 0 and covariance Sigma.
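As an example of a non-normal alternative, the sketch below is a hypothetical error generator (t_errors is not part of distfreereg) that follows the contract just described: it takes the special arguments n, reps, and SqrtSigma and returns an n by reps matrix. Each column is a multivariate t vector rescaled so that its covariance is Sigma, assuming SqrtSigma is the symmetric square root of Sigma. The extra df argument is an assumption; it would presumably be supplied through err_dist_args.

```r
# Hypothetical error generator: multivariate t errors whose columns each have
# covariance Sigma (requires df > 2 so the covariance exists).
t_errors <- function(n, reps, SqrtSigma, df = 5) {
  stopifnot(df > 2)
  Z <- matrix(rnorm(n * reps), nrow = n, ncol = reps)  # iid N(0, 1) entries
  W <- rchisq(reps, df = df)                           # one mixing draw per column
  # Dividing column j by sqrt(W[j] / df) gives a multivariate t vector with
  # identity scale; multiplying by sqrt((df - 2) / df) gives identity covariance
  Tm <- sweep(Z, 2, sqrt(W / df), "/") * sqrt((df - 2) / df)
  # Covariance becomes SqrtSigma %*% t(SqrtSigma), i.e. Sigma for a symmetric root
  SqrtSigma %*% Tm
}

set.seed(1)
E <- t_errors(n = 100, reps = 20, SqrtSigma = diag(100))
dim(E)  # 100 20
```

In a compare call this might be supplied along the lines of err_dist_fun = t_errors, err_dist_args = list(df = 5); the exact way err_dist_args reaches the function is an assumption based on the argument descriptions above. Because the test is distribution-free but the finite-sample behavior is not, comparing results under normal and heavier-tailed errors can show how sensitive the required sample size is to the error distribution.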
The argument keep is useful for diagnosing problems, but caution should be used lest a very large object be created. It is often sufficient to save the distfreereg objects from only the first few replications.
For more specialized needs, the manual argument allows the calculation and saving of objects during each repetition. For example, using manual = function(x) residuals(x) will save the (raw) residuals from each repetition.
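Saving every residual vector can itself produce a large object when reps is large. The hypothetical sketch below (resid_summary is not part of distfreereg) keeps only a few summaries per repetition. It assumes, as the residuals example above implies, that residuals() works on the object passed to manual; here it is exercised on an ordinary lm fit as a stand-in.

```r
# Hypothetical manual function: summarize the raw residuals of each
# repetition instead of storing all of them.
resid_summary <- function(x) {
  r <- residuals(x)
  c(mean = mean(r), sd = sd(r), max_abs = max(abs(r)))
}

# Stand-in check on an ordinary lm fit (not a distfreereg object)
set.seed(2)
d <- data.frame(x = runif(30))
d$y <- 1 + 3 * d$x + rnorm(30, sd = 0.5)
resid_summary(lm(y ~ x, data = d))
```

With compare, it would be supplied as manual = resid_summary, and the results would appear in the manual component of the returned object.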
The first repetition creates a distfreereg object. During each subsequent repetition, this object is passed to update.distfreereg to create a new object. The update_args argument can be used to modify this call.
If necessary, global_override can be used to pass an override argument to distfreereg in each repetition. For example, using global_override = list(theta_hat = theta) forces the estimated parameter vector used in the test in each call to be the true parameter vector theta.
Value
An object of class compare with the following components:
call |
The matched call. |
Y |
The matrix whose columns contain the model outcome values used for the corresponding repetitions. |
theta |
Supplied vector of parameter values. |
true_mean |
Supplied object specifying the true mean function. |
true_covariance |
List containing element(s) that specify the true covariance structure. |
true_X |
Supplied matrix of true covariate values. |
true_data |
Supplied data frame of true covariate values. |
test_mean |
Supplied object specifying the mean function being tested. |
covariance |
List containing element(s) that specify the test covariance structure. |
X |
Supplied matrix of test covariate values. |
data |
Supplied data frame of test covariate values. |
observed_stats |
The observed statistics collected in each repetition. |
mcsim_stats |
The simulated statistics from the first repetition. (They are the same for each repetition, because |
p |
The p-values for the observed statistics. |
dfrs |
A list containing the outputs of |
manual |
A list containing the results of the function specified by the argument |
Warnings
The generation of new outcome values requires specifying an error distribution. The default behavior when true_mean is a function, an nls object, or a formula with true_method equal to "nls" is to use a multivariate normal error distribution, but different error-generating functions can be defined by the user. When true_mean is a model object that is not an nls object, or a formula and true_method is not "nls", then the errors are generated using simulate and are therefore distributed according to that function's specifications.
In short, the asymptotic behavior is determined for a specific (true) error distribution, even though the test itself is distribution-free.
Note
Some of the processing of the elements of true_covariance is analogous to the processing of covariance by distfreereg. Any values of solve_tol and symmetric specified in distfreereg's control argument are used by compare to similar effect in processing true_covariance.
Support for glm objects is limited to those created using a family that has a simulate element.
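Whether a given glm family qualifies can be checked before calling compare by looking for the simulate element of the family object. The helper below is a hypothetical base-R convenience, not part of distfreereg.

```r
# Hypothetical check: does this glm family object carry a simulate element?
has_simulate <- function(fam) is.function(fam$simulate)

has_simulate(poisson())        # TRUE
has_simulate(quasipoisson())   # FALSE
has_simulate(quasibinomial())  # FALSE
```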
The presence of call in the value allows a compare object to be passed to update.
Author(s)
Jesse Miller
See Also
asymptotics, distfreereg, rejection, plot.compare, ks.test.compare
Examples
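set.seed(20240201)
n <- 100
# Mean function: intercept theta[1] plus slope theta[2] times the first covariate
func <- function(X, theta) theta[1] + theta[2]*X[,1]
# A random n x n positive definite covariance matrix (a single Wishart draw)
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
# A single covariate drawn from an exponential distribution
X <- matrix(rexp(n, rate = 1))
# The true and tested models coincide here, so the null hypothesis holds
# In practice, 'reps' should be much larger
cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
test_mean = func, X = X, covariance = list(Sigma = Sig),
reps = 10, prog = Inf, theta = theta, theta_init = rep(1, length(theta)))
# p-values for the observed statistics
cdfr$p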