pairnorm {PopulateR} | R Documentation |
Pair people into two-person households, using either a normal or skew-normal distribution of age differences
Description
Creates a data frame of couples, based on a distribution of age differences. A normal distribution is used unless a non-zero skew ("alphaused") is supplied, in which case a skew-normal distribution is used. The default skew is 0. Two data frames are required, and one person from each data frame is matched according to the specified age-difference distribution. If the data frames differ in size, smalldf must be the smaller of the two, and a random subsample of largedf is used. Both data frames must contain only those people who are to be paired into couples.
Usage
pairnorm(
smalldf,
smlid,
smlage,
largedf,
lrgid,
lrgage,
directxi = NULL,
directomega = NULL,
alphaused = 0,
HHStartNum,
HHNumVar,
userseed = NULL,
ptostop = NULL,
numiters = 1e+06,
verbose = FALSE
)
Arguments
smalldf |
A data frame containing one set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the smaller number. |
smlid |
The variable containing the unique ID for each person, in the smalldf data frame. |
smlage |
The age variable, in the smalldf data frame. |
largedf |
A data frame containing the second set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the larger number. |
lrgid |
The variable containing the unique ID for each person, in the largedf data frame. |
lrgage |
The age variable, in the largedf data frame. |
directxi |
If a skew-normal distribution is used, this is the location value. If alphaused is left at its default of 0, this is the mean of the normal distribution. |
directomega |
If a skew-normal distribution is used, this is the scale value. If alphaused is left at its default of 0, this is the standard deviation of the normal distribution. |
alphaused |
The skew. If a normal distribution is to be used, this can be omitted as the default value is 0 (no skew). |
HHStartNum |
The starting value for HHNumVar. Must be numeric. |
HHNumVar |
The name for the household variable. |
userseed |
If specified, this will set the seed to the number provided. If not, the normal set.seed() function will be used. |
ptostop |
The critical p-value used as the stopping rule for the function. If this value is not set, a critical p-value of 0.01 is used. |
numiters |
The maximum number of iterations used to construct the output data frame ($Matched) containing the couples. The default value is 1000000 (1e+06). This acts as the stopping rule if the algorithm does not converge. |
verbose |
Whether the distribution used, number of iterations used, the critical chi-squared value, and the final chi-squared value are printed to the console. The default value is FALSE. |
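The relationship between directxi, directomega, and alphaused can be illustrated with the standard skew-normal density. The sketch below uses base R only; dsn_sketch is a hypothetical helper written for this illustration, and it assumes pairnorm follows the usual location/scale/shape parameterisation, which this page does not confirm. It shows that a skew of 0 reproduces the normal density with mean directxi and standard deviation directomega.

```r
# Standard skew-normal density with location xi, scale omega, and shape alpha.
# dsn_sketch is a hypothetical helper for illustration, not a PopulateR function.
dsn_sketch <- function(x, xi, omega, alpha) {
  z <- (x - xi) / omega
  (2 / omega) * dnorm(z) * pnorm(alpha * z)
}

# A grid of candidate age differences (illustrative values only).
age_diffs <- seq(-15, 15, by = 0.5)

# With alpha = 0 the density equals a normal density with mean xi and sd omega.
no_skew <- dsn_sketch(age_diffs, xi = -2, omega = 3, alpha = 0)
normal  <- dnorm(age_diffs, mean = -2, sd = 3)
same_as_normal <- isTRUE(all.equal(no_skew, normal))

# With a positive alpha, more of the density lies to the right of xi.
right_skew <- dsn_sketch(age_diffs, xi = -2, omega = 3, alpha = 4)
mass_right <- sum(right_skew[age_diffs > -2])
mass_left  <- sum(right_skew[age_diffs < -2])

print(same_as_normal)
print(mass_right > mass_left)
```

In this sketch the values -2 and 3 mirror those used in the Examples section, while the shape value of 4 is arbitrary.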
Value
A list of two data frames. $Matched contains the data frame of pairs. $Unmatched contains the unmatched observations from largedf. If there are no unmatched people, $Unmatched will be an empty data frame.
Examples
library(dplyr)
# matched data frame sizes first, using a normal distribution
# females are younger on average: the age differences have a mean of -2
# and a standard deviation of 3
set.seed(1)
PartneredFemales1 <- Township %>%
  filter(Sex == "Female", Relationship == "Partnered") %>%
  slice_sample(n = 120, replace = FALSE)
PartneredMales1 <- Township %>%
  filter(Sex == "Male", Relationship == "Partnered") %>%
  slice_sample(n = nrow(PartneredFemales1), replace = FALSE)
# pairs females and males, using a normal distribution of age differences
# with a mean of -2 and a standard deviation of 3 (females younger on average)
OppSexCouples1 <- pairnorm(PartneredFemales1, smlid = "ID", smlage = "Age", PartneredMales1,
                           lrgid = "ID", lrgage = "Age", directxi = -2, directomega = 3,
                           HHStartNum = 1, HHNumVar = "HouseholdID", userseed = 4, ptostop = .3)
Couples1 <- OppSexCouples1$Matched
# different size data frames
# there are more partnered males than partnered females
# so all partnered females will have a matched male partner
# but not all males will be matched
# being the smaller data frame, the female one must be the first
#
# PartneredFemales2 <- Township %>%
#   filter(Sex == "Female", Relationship == "Partnered") %>%
#   slice_sample(n = 120, replace = FALSE)
# PartneredMales2 <- Township %>%
#   filter(Sex == "Male", Relationship == "Partnered") %>%
#   slice_sample(n = 140, replace = FALSE)
#
# OppSexCouples2 <- pairnorm(PartneredFemales2, smlid = "ID", smlage = "Age", PartneredMales2,
#                            lrgid = "ID", lrgage = "Age", directxi = -2, directomega = 3,
#                            HHStartNum = 1, HHNumVar = "HouseholdID", userseed = 4, ptostop = .3)
# Couples2 <- OppSexCouples2$Matched
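The examples above use only the normal distribution. A skew-normal pairing might look like the following sketch, which reuses the Township data and the argument names from the Usage section. The alphaused value of 2 and the object names ending in 3 are illustrative assumptions, not values taken from the package documentation, and the result is not shown because it depends on the data and seed.

```r
library(dplyr)
library(PopulateR)

set.seed(1)
# same sampling approach as the first example
PartneredFemales3 <- Township %>%
  filter(Sex == "Female", Relationship == "Partnered") %>%
  slice_sample(n = 120, replace = FALSE)
PartneredMales3 <- Township %>%
  filter(Sex == "Male", Relationship == "Partnered") %>%
  slice_sample(n = nrow(PartneredFemales3), replace = FALSE)

# a non-zero alphaused requests a skew-normal distribution;
# directxi is then the location and directomega the scale
# (alphaused = 2 is an arbitrary illustrative value)
OppSexCouples3 <- pairnorm(PartneredFemales3, smlid = "ID", smlage = "Age", PartneredMales3,
                           lrgid = "ID", lrgage = "Age", directxi = -2, directomega = 3,
                           alphaused = 2, HHStartNum = 1, HHNumVar = "HouseholdID",
                           userseed = 4, ptostop = .3)
Couples3 <- OppSexCouples3$Matched
Unmatched3 <- OppSexCouples3$Unmatched
```

Because the two input data frames are the same size here, $Unmatched is expected to be an empty data frame, as described in the Value section.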