rdata {pricelevels}    R Documentation
Simulate random price and quantity data
Description
Simulate random price and quantity data for a specified number of regions (r=1,\ldots,R), product groups (b=1,\ldots,B), and individual products (n=1,\ldots,N_{b}) using the function rdata().
The generation of prices follows the NLCPD model (see nlcpd()), while expenditure share weights for product groups can be sampled using the function rweights(). Purchased quantities are assigned to individual products. Moreover, random sales and gaps (using the function rgaps()) can be introduced in the simulated data.
Usage
rgaps(r, n, amount=0, prob=NULL, pairs=FALSE, exclude=NULL)
rweights(r, b, type=~1)
rdata(R, B, N, gaps=0, weights=~b+r, sales=0, settings=list())
Arguments
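r, n, b | A character vector or factor of regional entities (r), individual products (n), and product groups (b), respectively. |
R, B, N | A single integer specifying the number of regions (R) and product groups (B), respectively. N is either a single integer giving the number of individual products per product group or an integer vector of length B giving the number of products in each group. |
weights, type | A formula specifying the sampling of expenditure share weights for product groups: ~1 for constant weights, ~b for group-specific weights, and ~b+r for group- and region-specific weights (see Examples). If … |
gaps, sales, amount | Share of gaps (gaps, amount) and sales (sales) to be introduced in the data, given as a number between 0 and 1. |
prob | A vector of probability weights for the observations to become gaps (see Examples); see also … |
pairs | A logical indicating whether gaps should be introduced such that at least two observations per product always remain available (TRUE) or not (FALSE). |
exclude | A data.frame of two (character) variables r and n identifying observations that must not exhibit gaps; n=NA excludes all observations of the respective region (see Examples). |
settings | A list of control settings to be used, for example par.add (see Value) and gaps.exclude, which corresponds to the exclude argument of rgaps() (see Examples). |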
Details
The function rgaps() ensures that gaps do not lead to non-connected price data (see is.connected()). Therefore, the amount of gaps specified in rgaps() may only be achieved approximately, in particular when certain regions and/or products are additionally excluded from exhibiting gaps via exclude.
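The connectivity requirement can be illustrated with a minimal base-R sketch. It treats regions and products as nodes of a bipartite graph linked by observations and visits candidate gaps in random order, accepting a gap only if the remaining data stay connected; this is one way to see why the realised share of gaps can fall short of amount. Both connected_sketch() and rgaps_sketch() are hypothetical helpers written for illustration, not the implementations of is.connected() or rgaps(). The greedy strategy, the keep argument (a logical mask of observations that must not become gaps) and the rule that every region and product keeps at least one observation are assumptions of this sketch.

```r
# Hypothetical helper, NOT pricelevels::is.connected(): TRUE if the bipartite
# graph that links regions and products through observations is connected.
connected_sketch <- function(r, n) {
  r <- as.character(r)
  n <- as.character(n)
  if (length(r) == 0L) return(TRUE)
  seen_r <- r[1L]
  seen_n <- character(0)
  repeat {
    # products observed in any region reached so far
    new_n <- setdiff(unique(n[r %in% seen_r]), seen_n)
    seen_n <- c(seen_n, new_n)
    # regions observing any product reached so far
    new_r <- setdiff(unique(r[n %in% seen_n]), seen_r)
    seen_r <- c(seen_r, new_r)
    if (length(new_n) == 0L && length(new_r) == 0L) break
  }
  length(seen_r) == length(unique(r))
}

# Hypothetical greedy sketch, NOT pricelevels::rgaps(): visit observations in
# random order and mark one as a gap only if the remaining data stay connected
# and every region and product keeps at least one observation (assumption).
rgaps_sketch <- function(r, n, amount = 0, prob = NULL, keep = NULL) {
  r <- as.character(r)
  n <- as.character(n)
  gaps <- logical(length(r))
  target <- floor(amount * length(r))
  candidates <- seq_along(r)
  if (!is.null(keep)) candidates <- candidates[!keep]
  if (target == 0 || length(candidates) == 0L) return(gaps)
  # random visiting order; larger (strictly positive) prob values tend to be
  # visited earlier, as in sample.int()
  p <- if (is.null(prob)) NULL else prob[candidates]
  visit <- candidates[sample.int(length(candidates), prob = p)]
  for (i in visit) {
    if (sum(gaps) >= target) break
    trial <- gaps
    trial[i] <- TRUE
    ok <- setequal(r[!trial], r) && setequal(n[!trial], n) &&
      connected_sketch(r[!trial], n[!trial])
    if (ok) gaps <- trial
  }
  gaps  # TRUE = gap, FALSE = observation kept
}

# Usage: complete data for 4 regions and 3 products, about 25% gaps.
set.seed(1)
r <- rep(paste0("r", 1:4), times = 3)
n <- rep(paste0("n", 1:3), each = 4)
g <- rgaps_sketch(r, n, amount = 0.25)
sum(g)                          # at most floor(0.25 * 12) = 3 gaps
connected_sketch(r[!g], n[!g])  # remaining data are still connected
```

Because candidates that would break connectivity are simply skipped, a target close to the connectivity limit (or a restrictive keep mask) can leave fewer gaps than requested.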
If rgaps(pairs=FALSE), the minimum number of observations for a connected data set is R+N-1. Otherwise, for rgaps(pairs=TRUE), this number is 2N+\max(0, R-N-1).
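The two bounds follow from viewing the data as a bipartite graph of R regions and N products: a connected graph on R+N nodes needs at least R+N-1 edges (observations), and requiring two observations per product raises this to \max(2N, R+N-1), which equals 2N+\max(0, R-N-1). The small helper below is a hypothetical illustration of this arithmetic, not a function of the package.

```r
# Hypothetical helper: minimum number of observations needed for connected
# data with R regions and N products, following the two cases above.
min_obs <- function(R, N, pairs = FALSE) {
  if (pairs) {
    2 * N + max(0, R - N - 1)  # equivalently max(2 * N, R + N - 1)
  } else {
    R + N - 1                  # edges of a spanning tree on R + N nodes
  }
}

min_obs(R = 3, N = 10)                 # 12
min_obs(R = 3, N = 10, pairs = TRUE)   # 20
min_obs(R = 15, N = 10, pairs = TRUE)  # 24, same as with pairs = FALSE
```

With more regions than products (R > N+1), the pairs restriction no longer adds anything beyond the spanning-tree bound.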
Note that setting sales>0 in function rdata() distorts the initial price generating process. Consequently, parameter estimates may deviate more strongly from their true values. Note also that the expenditure share weights in variable weight represent the relevance of product groups as (often) derived from national accounts and other data sources. Therefore, they cannot be derived from the simulated prices and quantities in the data, which would only yield the expenditure shares of the individual products.
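To make the distinction concrete, the base-R sketch below computes what prices and quantities do provide: expenditure shares of individual products within each region. The toy data are made up for illustration and are not output of rdata(); the point is only that these product-level shares are a different quantity from the group weights in weight.

```r
# Made-up toy data: two regions with two products each.
toy <- data.frame(
  region   = c("r1", "r1", "r2", "r2"),
  product  = c("n1", "n2", "n1", "n2"),
  price    = c(2, 4, 3, 1),
  quantity = c(5, 5, 2, 4)
)
# Expenditure per observation and its share within the region.
toy$expenditure <- toy$price * toy$quantity
toy$share <- toy$expenditure / ave(toy$expenditure, toy$region, FUN = sum)
toy$share  # 0.3333333 0.6666667 0.6000000 0.4000000
```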
Value
Function rgaps() returns a logical vector of the same length as r, where TRUE indicates a gap and FALSE no gap.
Function rweights() returns a numeric vector of (non-negative) expenditure share weights of the same length as r.
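The three weighting schemes used in the examples (~1, ~b, ~b+r) can be mimicked in base R as follows. This is a hypothetical sketch, not the implementation of rweights(): it assumes uniform random draws (the distribution actually used by the package is not stated here) and rescales so that the weights of the distinct product groups sum to 1 within each region, matching the example where weights add up to 1 per region.

```r
# Hypothetical sketch, NOT pricelevels::rweights(): equal weights (~1), one
# uniform draw per product group (~b) or per group-region pair (~b+r), then
# rescaling so that the group weights sum to 1 within each region.
rweights_sketch <- function(r, b, type = ~1) {
  r <- as.character(r)
  b <- as.character(b)
  vars <- sort(all.vars(type))
  raw <- if (length(vars) == 0L) {
    rep(1, length(b))                        # ~1: identical weights
  } else if (identical(vars, "b")) {
    groups <- unique(b)
    runif(length(groups))[match(b, groups)]  # ~b: one draw per group
  } else if (identical(vars, c("b", "r"))) {
    key <- paste(b, r, sep = "|")
    keys <- unique(key)
    runif(length(keys))[match(key, keys)]    # ~b+r: one draw per group-region
  } else {
    stop("type must be ~1, ~b or ~b+r")
  }
  # count each group only once per region when normalising
  first <- !duplicated(paste(r, b, sep = "|"))
  totals <- tapply(raw[first], r[first], sum)
  as.numeric(raw / totals[r])
}

# Usage: two regions, three product groups with one product each.
r <- rep(c("r1", "r2"), each = 3)
b <- rep(c("b1", "b2", "b3"), times = 2)
set.seed(1)
w1 <- rweights_sketch(r, b, ~1)    # 1/3 everywhere
w2 <- rweights_sketch(r, b, ~b)    # same weights in both regions
w3 <- rweights_sketch(r, b, ~b+r)  # region-specific weights
tapply(w3, r, sum)                 # 1 in each region
```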
Function rdata() returns a data.table with the following variables:
group | product group identifier (factor) |
weight | expenditure share weight of product groups (numeric) |
region | region identifier (factor) |
product | product identifier (factor) |
sale | are prices and quantities affected by sales? (logical) |
price | price (numeric) |
quantity | consumed quantity (numeric) |
or a list with the simulated data and its underlying parameter values if settings=list(par.add=TRUE).
Author(s)
Sebastian Weinand
Examples
# simulate price data for ten regions and five product groups
# containing three individual products each:
set.seed(1)
dt <- rdata(R=10, B=5, N=3)
boxplot(price~paste(group, product, sep=":"), data=dt)
# simulate price data for ten regions and five product groups
# containing one to five individual products:
set.seed(1)
dt <- rdata(R=10, B=5, N=c(1,2,3,4,5))
boxplot(price~paste(group, product, sep=":"), data=dt)
# simulate price data for three product groups (with one
# product each) in four regions:
dt <- rdata(R=4, B=3, N=1)
# add expenditure share weights:
dt[, "w1" := rweights(r=region, b=group, type=~1)] # constant
dt[, "w2" := rweights(r=region, b=group, type=~b)] # group-specific
dt[, "w3" := rweights(r=region, b=group, type=~b+r)] # group-region-specific
# weights add up to 1:
dt[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]
# introduce 25% random gaps:
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25), ]
# weights no longer add up to 1 in each region:
dt.gaps[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]
# approx. 25% random gaps, but keep observation for product "n2"
# in region "r1" and all observations in region "r2":
no_gaps <- data.frame(r=c("r1","r2"), n=c("n2",NA))
# apply to data:
dt[!rgaps(r=region, n=product, amount=0.25, exclude=no_gaps), ]
# or, directly, in one step:
dt <- rdata(R=4, B=3, N=1, gaps=0.25, settings=list("gaps.exclude"=no_gaps))
# introduce systematic gaps:
dt <- rdata(R=15, B=1, N=10)
dt[, "prob" := data.table::rleidv(product)] # probability for gaps increases per product
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25, prob=prob), ]
plot(table(dt.gaps$product), type="l")