unif_cp {fitdistcp}    R Documentation

Uniform Distribution Predictions Based on a Calibrating Prior

Description

The fitdistcp package contains functions that generate predictive distributions for various statistical models, with and without parameter uncertainty. Parameter uncertainty is included by using Bayesian prediction with a type of objective prior known as a calibrating prior. Calibrating priors are chosen to give predictions that are reliable (i.e., well calibrated) for any underlying true parameter values.

There are five functions for each model, each of which uses training data x. For model ****, the five functions are the q****_cp, r****_cp, d****_cp, p****_cp and t****_cp routines.

The q, r, d, p routines return two sets of results, one based on maximum likelihood, and the other based on a calibrating prior. The prior used depends on the model, and is given under Details below.

Where possible, the Bayesian prediction integral is solved analytically. Otherwise, DMGS asymptotic expansions are used. Optionally, a third set of results is returned, in which the prediction integral is evaluated by sampling the parameter posterior distribution using the RUST rejection sampling algorithm.

Usage

qunif_cp(
  x,
  p = seq(0.1, 0.9, 0.1),
  means = FALSE,
  debug = FALSE,
  aderivs = TRUE
)

runif_cp(n, x, mlcp = TRUE, debug = FALSE, aderivs = TRUE)

dunif_cp(x, y = x, debug = FALSE, aderivs = TRUE)

punif_cp(x, y = x, debug = FALSE, aderivs = TRUE)

Arguments

x

a vector of training data values

p

a vector of probabilities at which to generate predictive quantiles

means

logical that indicates whether to run additional calculations and return analytical estimates for the distribution means (longer runtime)

debug

logical for turning on debug messages

aderivs

(for code testing only) logical for whether to use analytic derivatives (instead of numerical). By default almost all models now use analytical derivatives.

n

the number of random samples required

mlcp

logical that indicates whether maximum likelihood and parameter uncertainty calculations should be performed (turn off to speed up RUST)

y

a vector of values at which to calculate the density and distribution functions
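The calls listed under Usage can be sketched as follows. This is a minimal illustration only: the training data are simulated stand-ins, the evaluation grid is an arbitrary choice, and the returned lists are inspected with str() rather than assuming any element names. The block does nothing beyond creating the inputs if fitdistcp is not installed.

```r
# Hypothetical usage sketch; the calls run only if fitdistcp is installed.
set.seed(3)
x <- runif(25, min = 0, max = 1)            # stand-in training data (assumption)
y_grid <- seq(-0.2, 1.2, length.out = 15)   # points at which to evaluate d and p

if (requireNamespace("fitdistcp", quietly = TRUE)) {
  q <- fitdistcp::qunif_cp(x, p = c(0.5, 0.9, 0.99))  # predictive quantiles
  r <- fitdistcp::runif_cp(n = 5, x = x)              # random deviates
  d <- fitdistcp::dunif_cp(x, y = y_grid)             # predictive densities
  p <- fitdistcp::punif_cp(x, y = y_grid)             # predictive probabilities
  # Inspect the structure of each returned list
  str(q)
  str(r)
  str(d)
  str(p)
}
```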

Value

q**** returns a list containing at least the following:

For models with predictors, q**** additionally returns:

r**** returns a list containing the following:

d**** returns a list containing the following:

p**** returns a list containing the following:

t**** returns a list containing the following:

Details of the Model

The uniform distribution has probability density function

f(x;min,max)=\frac{1}{max-min}

for min \le x \le max, and zero otherwise, where x is the random variable and min, max are the parameters.

The calibrating prior is given by the right Haar prior, which is

\pi(min,max) \propto \frac{1}{max-min}

as given in Jewson et al. (2025).
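The density and prior above can be written out in base R as a rough sketch. The function names unif_density, haar_prior_unnorm and posterior_unnorm are hypothetical and are not part of fitdistcp; the prior is only defined up to a constant, so the posterior here is unnormalised.

```r
# Uniform density: 1/(max - min) on [min, max], zero elsewhere.
unif_density <- function(x, min, max) {
  ifelse(x >= min & x <= max, 1 / (max - min), 0)
}

# Prior proportional to 1/(max - min), set to zero when max <= min.
haar_prior_unnorm <- function(min, max) {
  ifelse(max > min, 1 / (max - min), 0)
}

# Unnormalised posterior: likelihood of the training data times the prior.
posterior_unnorm <- function(min, max, x) {
  prod(unif_density(x, min, max)) * haar_prior_unnorm(min, max)
}

# The density agrees with the base R function stats::dunif.
xs <- c(-1, 0.5, 3)
print(cbind(sketch = unif_density(xs, 0, 2), base = dunif(xs, 0, 2)))

# The posterior is zero whenever [min, max] fails to cover all the data.
x_train <- c(0.2, 0.7, 0.9)
print(posterior_unnorm(0, 1, x_train))    # covers the data: positive
print(posterior_unnorm(0.3, 1, x_train))  # excludes 0.2: zero
```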

Optional Return Values

q**** optionally returns the following:

If rust=TRUE:

If waicscores=TRUE:

If logscores=TRUE:

If means=TRUE:

r**** optionally returns the following:

If rust=TRUE:

d**** optionally returns the following:

If rust=TRUE:

p**** optionally returns the following:

If rust=TRUE:

Selecting these additional outputs increases runtime. They are optional so that runtime for the basic outputs is minimised. This facilitates repeated experiments that evaluate reliability over many thousands of repeats.

Details (homogeneous models)

This model is a homogeneous model, and the cp results are based on the right Haar prior. For homogeneous models (models with sharply transitive transformation groups), a Bayesian prediction based on the right Haar prior gives exact reliability, as shown by Severini et al. (2002), even when the true parameters are unknown. This means that probabilities in the prediction will correspond to frequencies of future outcomes in repeated trials (if the model is correct).

Maximum likelihood prediction is not reliable, even when the model is correct, because it does not account for parameter uncertainty. In particular, maximum likelihood predictions typically underestimate tail probabilities in repeated trials.
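The tail underestimation described above can be illustrated with a small hand-rolled simulation in base R. This sketch does not use fitdistcp or reltest; the sample size, number of trials and true parameter values are arbitrary illustrative choices. It plugs the maximum likelihood estimates (the sample minimum and maximum) into the uniform quantile function and counts how often a future value exceeds the resulting 90% quantile.

```r
# Repeated-trials check of the maximum likelihood plug-in quantile
# for a uniform model (illustrative settings, base R only).
set.seed(1)
ntrials <- 10000   # number of repeated experiments
n <- 10            # training sample size
p <- 0.9           # nominal non-exceedance probability

exceed <- logical(ntrials)
for (i in seq_len(ntrials)) {
  x <- runif(n)                             # training data from U(0, 1)
  ml_q <- min(x) + p * (max(x) - min(x))    # plug-in quantile from ML estimates
  y <- runif(1)                             # one future outcome
  exceed[i] <- y > ml_q
}
ml_exceedance <- mean(exceed)
cat("nominal exceedance:", 1 - p, " observed:", ml_exceedance, "\n")
```

With these settings the observed exceedance frequency should come out well above the nominal 0.1, because the sample range is on average narrower than the true range.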

The reliability of the maximum likelihood and the calibrating prior predictive quantiles produced by the q****_cp routines in fitdistcp can be quantified using repeated simulations with the routine reltest.

Details (analytic integration)

For this model, the Bayesian prediction equation is integrated analytically.
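As an illustration of what analytic integration can look like for this model, the sketch below codes a closed-form predictive density obtained by hand from the uniform likelihood and a prior proportional to 1/(max-min). It is an independent derivation under those assumptions, not code taken from fitdistcp, and the function name cp_density_sketch is hypothetical. With n training values and sample range r, the density is (n-1)/((n+1)r) between the sample extremes and decays as a power law outside them.

```r
# Hand-derived predictive density for a uniform model with a 1/(max - min)
# prior; a sketch only, not the fitdistcp implementation.
cp_density_sketch <- function(y, x) {
  n <- length(x)
  lo <- min(x)
  hi <- max(x)
  r <- hi - lo
  const <- (n - 1) / (n + 1)
  d <- rep(const / r, length(y))                      # flat part inside [lo, hi]
  above <- y > hi
  below <- y < lo
  d[above] <- const * r^(n - 1) / (y[above] - lo)^n   # upper tail
  d[below] <- const * r^(n - 1) / (hi - y[below])^n   # lower tail
  d
}

set.seed(2)
x <- runif(15, min = 2, max = 5)   # stand-in training data (assumption)
f <- function(y) cp_density_sketch(y, x)

# Numerical check: the three pieces should integrate to one, and the mass
# above the sample maximum should be 1/(n + 1).
lower_mass <- integrate(f, -Inf, min(x))$value
middle_mass <- integrate(f, min(x), max(x))$value
upper_mass <- integrate(f, max(x), Inf)$value
total_mass <- lower_mass + middle_mass + upper_mass
cat("total:", total_mass, " upper tail:", upper_mass, "\n")
```

The upper tail mass of 1/(n+1) matches the frequency with which a new value from any continuous distribution exceeds the maximum of n earlier values, which is consistent with the exact reliability described for homogeneous models.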

Details (RUST)

The Bayesian prediction equation can also be integrated using ratio-of-uniforms-sampling-with-transformation (RUST), using the option rust=TRUE. fitdistcp then calls Paul Northrop's rust package (Northrop, 2023). The RUST calculations are slower than the DMGS calculations.

For small sample sizes (e.g., n<20), and the very extreme tail, the DMGS approximation is somewhat poor (although always better than maximum likelihood) and it may be better to use RUST. For medium sample sizes (30+), DMGS is reasonably accurate, except for the very far tail.

It is advisable to check the RUST results for convergence versus the number of RUST samples.

It may be interesting to compare the DMGS and RUST results.

Author(s)

Stephen Jewson stephen.jewson@gmail.com

References

If you use this package, we would be grateful if you would cite the following reference, which gives the various calibrating priors, and tests them for reliability:

See Also

An introduction to fitdistcp, with more examples, is given on this webpage.

The fitdistcp package currently includes the following models (in alphabetical order):

The level of predictive probability matching achieved by the maximum likelihood and calibrating prior quantiles, for any model, sample size and true parameter values, can be demonstrated using the routine reltest.

Model selection can be demonstrated using the routines ms_flat_1tail, ms_flat_2tail, ms_predictors_1tail, and ms_predictors_2tail.

Examples

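# example 1: compare maximum likelihood and calibrating prior quantiles
x <- fitdistcp::d25unif_example_data_v1
cat("length(x)=", length(x), "\n")
p <- c(1:9) / 10
q <- qunif_cp(x, p)
# common x-axis range covering both sets of quantiles
xmin <- min(q$ml_quantiles, q$cp_quantiles)
xmax <- max(q$ml_quantiles, q$cp_quantiles)
# maximum likelihood quantiles (default plotting colour)
plot(q$ml_quantiles, p, xlab = "quantile estimates", xlim = c(xmin, xmax),
     sub = "(from qunif_cp)",
     main = "unif: quantile estimates")
# calibrating prior quantiles (red)
points(q$cp_quantiles, p, col = "red", lwd = 2)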

[Package fitdistcp version 0.1.1 Index]