pairnormNum {PopulateR}    R Documentation
Pair two people using either a normal or skew-normal distribution, where households already exist
Description
Creates a data frame of pairs, based on a distribution of age differences. The function uses a skew-normal distribution when a skew ("alphaused") parameter is provided, and a normal distribution otherwise. The default skew is 0, which produces a normal distribution. Two data frames are required, and one person from each data frame is matched according to the specified age difference distribution. If the data frames are different sizes, smalldf must be the smaller of the two; in this situation, a random subsample of largedf is used. The household identifier variable can exist in either data frame, and the function applies it to both members of each pair once the pair is constructed. Both data frames must be restricted to only those people who are to be paired. At least 30 matched pairs are required for the function to run, to reduce the proportion of empty cells.
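The pairing idea can be illustrated with a small base R sketch. It is not the PopulateR algorithm: the helper name `pair_by_age_difference`, the greedy closest-age matching, and the definition of the age difference as the smalldf age minus the largedf age are assumptions made for illustration. It subsamples the larger group, draws one target age difference per person from a normal distribution, and gives each person the unused partner whose age is closest to that target.

```r
# Hypothetical sketch, NOT the PopulateR implementation.
# Assumes a normal distribution (alphaused = 0) and that the age difference is
# defined as (smalldf age - largedf age).
pair_by_age_difference <- function(small_ages, large_ages, mean_diff, sd_diff) {
  stopifnot(length(small_ages) <= length(large_ages))
  n <- length(small_ages)
  # Random subsample of the larger group, the same size as the smaller group
  large_ages <- large_ages[sample.int(length(large_ages), n)]
  # One target age difference per person in the smaller group
  target_diff <- rnorm(n, mean = mean_diff, sd = sd_diff)
  available <- rep(TRUE, n)
  partner <- integer(n)
  for (i in seq_len(n)) {
    wanted_age <- small_ages[i] - target_diff[i]
    candidates <- which(available)
    # Greedily pick the unused candidate whose age is closest to the wanted age
    best <- candidates[which.min(abs(large_ages[candidates] - wanted_age))]
    partner[i] <- best
    available[best] <- FALSE
  }
  data.frame(small_age = small_ages,
             large_age = large_ages[partner],
             age_diff = small_ages - large_ages[partner])
}

set.seed(1)
pairs <- pair_by_age_difference(small_ages = sample(24:60, 40, replace = TRUE),
                                large_ages = sample(0:19, 60, replace = TRUE),
                                mean_diff = 30, sd_diff = 5)
head(pairs)
```

A greedy pass like this depends on the order in which people are processed, which is one reason a real implementation may iterate and check the fit of the resulting age differences instead.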
Usage
pairnormNum(
  smalldf,
  smlid,
  smlage,
  largedf,
  lrgid,
  lrgage,
  directxi = NULL,
  directomega = NULL,
  alphaused = 0,
  HHNumVar,
  userseed = NULL,
  attempts = 10,
  numiters = 1e+06,
  verbose = FALSE
)
Arguments
smalldf
The data frame containing one set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame with the smaller number.
smlid
The variable in smalldf that contains the unique ID for each person.
smlage
The age variable in smalldf.
largedf
The data frame containing the second set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame with the larger number.
lrgid
The variable in largedf that contains the unique ID for each person.
lrgage
The age variable in largedf.
directxi
If a skew-normal distribution is used, this is the location value. If alphaused is 0 (the default), this is the mean of the normal distribution. Use a positive value if the older people are in smalldf.
directomega
If a skew-normal distribution is used, this is the scale value. If alphaused is 0 (the default), this is the standard deviation of the normal distribution.
alphaused
The skew. The default value of 0 (no skew) produces a normal distribution, so this can be omitted when a normal distribution is wanted.
HHNumVar
The household identifier variable. It must exist in only one of the two data frames.
userseed
If specified, the random seed is set to this value. If not, the current random number state is used (for example, one set earlier with set.seed()).
attempts
For each observation in smalldf, the maximum number of times largedf is sampled to find an age match from the specified distribution. The default is 10.
numiters
The maximum number of iterations used to construct the output data frame of pairs ($Matched). This is the stopping rule if the algorithm does not converge. The default is 1e+06 (1,000,000).
verbose
Whether the distribution used, the number of iterations used, the critical chi-squared value, and the final chi-squared value are printed to the console. The default is FALSE.
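Because directxi and directomega are the location and scale of a skew-normal distribution, they equal the mean and standard deviation only when alphaused is 0. The base R sketch below uses the standard Azzalini parameterisation (location xi, scale omega, skew alpha), which these argument names appear to follow; the helper names `dskewnorm` and `skewnorm_mean` are hypothetical, and PopulateR may compute the distribution differently internally.

```r
# Skew-normal density in base R (Azzalini parameterisation).
# xi = location (directxi), omega = scale (directomega), alpha = skew (alphaused)
dskewnorm <- function(x, xi, omega, alpha) {
  z <- (x - xi) / omega
  2 / omega * dnorm(z) * pnorm(alpha * z)
}

# Mean of the skew-normal distribution: it equals xi only when alpha = 0
skewnorm_mean <- function(xi, omega, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  xi + omega * delta * sqrt(2 / pi)
}

# With alpha = 0 the density is the normal density with mean xi and sd omega
all.equal(dskewnorm(c(20, 30, 40), 30, 5, 0), dnorm(c(20, 30, 40), 30, 5))
skewnorm_mean(30, 5, 0)  # 30
skewnorm_mean(30, 5, 2)  # about 33.57: a positive skew moves the mean above xi
```

So, for example, directxi = 30 with a positive alphaused produces average age differences somewhat larger than 30.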
Value
A list of three data frames: $Matched contains the data frame of pairs, $Smaller contains the unmatched observations from smalldf, and $Larger contains the unmatched observations from largedf.
Examples
library(PopulateR)
library(dplyr)

# Parents are older than the children, using a normal distribution of
# age differences with mean = 30 and standard deviation = 5
set.seed(1)
Parents <- Township %>%
  filter(between(Age, 24, 60)) %>%
  slice_sample(n = 120, replace = FALSE) %>%
  mutate(HouseholdID = row_number())

Children <- Township %>%
  filter(Age < 20) %>%
  slice_sample(n = nrow(Parents), replace = FALSE)

PrntChld <- pairnormNum(smalldf = Parents, smlid = "ID", smlage = "Age",
                        largedf = Children, lrgid = "ID", lrgage = "Age",
                        directxi = 30, directomega = 5, HHNumVar = "HouseholdID",
                        userseed = 4, attempts = 10, numiters = 80)

# With only 80 iterations, everyone is matched but the age differences
# do not follow the specified distribution
Matched <- PrntChld$Matched
UnmatchedAdults <- PrntChld$Smaller
UnmatchedChildren <- PrntChld$Larger
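When verbose = TRUE, the function reports a critical and a final chi-squared value. The base R sketch below shows one way such a comparison can be computed for a vector of age differences, such as the parent age minus the child age in the Matched output. The helper `chisq_fit`, the bin edges, and the 0.95 level are assumptions; PopulateR may bin the ages and choose its critical value differently.

```r
# Hypothetical goodness-of-fit check of age differences against a normal
# distribution. The breaks must start at -Inf and end at Inf so that every
# observation falls into a bin.
chisq_fit <- function(diffs, mean_diff, sd_diff, breaks, level = 0.95) {
  # Observed counts per (a, b] bin; table() keeps empty bins as zero counts
  observed <- as.vector(table(cut(diffs, breaks = breaks)))
  # Expected counts from the normal CDF over the same bins
  expected <- diff(pnorm(breaks, mean = mean_diff, sd = sd_diff)) * length(diffs)
  statistic <- sum((observed - expected)^2 / expected)
  critical <- qchisq(level, df = length(expected) - 1)
  list(statistic = statistic, critical = critical, fits = statistic <= critical)
}

bins <- c(-Inf, 25, 30, 35, Inf)
set.seed(2)
chisq_fit(rnorm(120, mean = 30, sd = 5), 30, 5, bins)  # usually fits
chisq_fit(rnorm(120, mean = 20, sd = 5), 30, 5, bins)  # clearly does not fit
```

Bins with very small expected counts make this statistic unstable, which is the sparse-cell problem that the minimum of 30 matched pairs is meant to reduce.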