pairnorm {PopulateR}    R Documentation

Pair two people, using either a normal or skew-normal distribution, into households

Description

Creates a data frame of couples, based on a distribution of age differences. The function uses either a skew-normal or a normal distribution, depending on the skew parameter ("alphaused"). The default skew is 0, which causes a normal distribution to be used. Two data frames are required. One person from each data frame is matched, according to the age difference distribution specified. If the data frames differ in size, smalldf must be the smaller of the two, and a random subsample of largedf is used. Both data frames must contain only the people who are to be paired into couples.
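The subsampling step described above can be pictured with a short base R sketch. This is only an illustration of drawing a random subsample from the larger data frame so that both sets have the same number of people; the data frames, their contents, and the variable names are invented here, and this is not the package's internal code.

```r
# Toy data frames, invented for illustration only
small <- data.frame(ID = 1:5, Age = c(30, 41, 25, 37, 52))
large <- data.frame(ID = 101:108, Age = c(33, 29, 44, 39, 55, 27, 48, 36))

set.seed(42)

# Draw as many rows from the larger data frame as there are in the smaller one
keep <- sample(nrow(large), size = nrow(small))
large_used <- large[keep, ]     # candidates for pairing
large_unused <- large[-keep, ]  # people left without a partner

nrow(large_used)    # same count as nrow(small)
nrow(large_unused)  # the remainder of the larger data frame
```

In pairnorm itself, the people left over from largedf are reported in the $Unmatched element of the result.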

Usage

pairnorm(
  smalldf,
  smlid,
  smlage,
  largedf,
  lrgid,
  lrgage,
  directxi = NULL,
  directomega = NULL,
  alphaused = 0,
  HHStartNum,
  HHNumVar,
  userseed = NULL,
  ptostop = NULL,
  numiters = 1e+06,
  verbose = FALSE
)

Arguments

smalldf

A data frame containing one set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the smaller number.

smlid

The variable containing the unique ID for each person, in the smalldf data frame.

smlage

The age variable, in the smalldf data frame.

largedf

A data frame containing the second set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the larger number.

lrgid

The variable containing the unique ID for each person, in the largedf data frame.

lrgage

The age variable, in the largedf data frame.

directxi

If a skew-normal distribution is used, this is the location value. If the default alphaused value of 0 is used, this is the mean of the normal distribution.

directomega

If a skew-normal distribution is used, this is the scale value. If the default alphaused value of 0 is used, this is the standard deviation of the normal distribution.

alphaused

The skew. If a normal distribution is to be used, this can be omitted as the default value is 0 (no skew).

HHStartNum

The starting value for HHNumVar. Must be numeric.

HHNumVar

The name for the household variable.

userseed

If specified, this will set the seed to the number provided. If not, the normal set.seed() function will be used.

ptostop

The critical p-value stopping rule for the function. If this value is not set, a critical p-value of 0.01 is used.

numiters

The maximum number of iterations used to construct the output data frame ($Matched) containing the couples. The default value is 1000000, and is the stopping rule if the algorithm does not converge.

verbose

Whether the distribution used, number of iterations used, the critical chi-squared value, and the final chi-squared value are printed to the console. The default value is FALSE.
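The relationship between directxi, directomega, and alphaused can be illustrated with a small base R sketch. It draws skew-normal values using the standard stochastic representation and shows that a skew of 0 reduces to a normal distribution with mean xi and standard deviation omega. The helper rskewnorm_sketch is hypothetical and written only for this illustration; it is an assumption about the parameterisation (location, scale, skew), not the code pairnorm uses.

```r
# Hypothetical helper: draws n skew-normal values with location xi,
# scale omega, and skew alpha. Not part of PopulateR.
rskewnorm_sketch <- function(n, xi, omega, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  u0 <- rnorm(n)
  u1 <- rnorm(n)
  # With alpha = 0, delta is 0 and z is simply a standard normal draw
  z <- delta * abs(u0) + sqrt(1 - delta^2) * u1
  xi + omega * z
}

set.seed(1)
normal_diffs <- rskewnorm_sketch(10000, xi = -2, omega = 3, alpha = 0)
skewed_diffs <- rskewnorm_sketch(10000, xi = -2, omega = 3, alpha = 5)

mean(normal_diffs)  # close to the location value, -2
sd(normal_diffs)    # close to the scale value, 3
mean(skewed_diffs)  # pulled above the location value by the positive skew
```

With a non-zero skew, the location and scale values are no longer the mean and standard deviation of the resulting age differences, which is worth keeping in mind when choosing directxi and directomega.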

Value

A list of two data frames. $Matched contains the data frame of pairs. $Unmatched contains the unmatched observations from largedf. If there are no unmatched people, $Unmatched will be an empty data frame.
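As a rough sketch of working with the returned list, the code below uses a hand-built stand-in rather than a real pairnorm result. Only the element names Matched and Unmatched come from this documentation; the columns and values in the stand-in are invented, and the real data frames may be organised differently.

```r
# Hypothetical stand-in for the list returned by pairnorm();
# the column layout is an assumption made for illustration
result <- list(
  Matched = data.frame(ID = c(1, 101, 2, 102),
                       Age = c(30, 33, 41, 44),
                       HouseholdID = c(1, 1, 2, 2)),
  Unmatched = data.frame(ID = numeric(0), Age = numeric(0))
)

couples <- result$Matched
leftover <- result$Unmatched

# An empty $Unmatched data frame means nobody in the larger set was left over
if (nrow(leftover) == 0) {
  message("Everyone in the larger data frame was paired.")
} else {
  message(nrow(leftover), " people from the larger data frame were not paired.")
}
```

Checking nrow() on $Unmatched, rather than testing for NULL, matches the documented behaviour that an empty data frame is returned when everyone is paired.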

Examples


library(dplyr)

# data frames of equal size first
# sample 120 partnered females, then the same number of partnered males
set.seed(1)
PartneredFemales1 <- Township %>%
  filter(Sex == "Female", Relationship == "Partnered") %>%
  slice_sample(n = 120, replace = FALSE)
PartneredMales1 <- Township %>%
  filter(Sex == "Male", Relationship == "Partnered") %>%
  slice_sample(n = nrow(PartneredFemales1), replace = FALSE)

# partners females and males, using a normal distribution of age differences
# with a mean of -2 (females younger, on average) and a standard deviation of 3
OppSexCouples1 <- pairnorm(PartneredFemales1, smlid = "ID", smlage = "Age", PartneredMales1,
                           lrgid = "ID", lrgage = "Age", directxi = -2, directomega = 3,
                           HHStartNum = 1, HHNumVar = "HouseholdID", userseed = 4, ptostop = 0.3)
Couples1 <- OppSexCouples1$Matched

# data frames of different sizes
# there are more partnered males than partnered females
# so all partnered females will have a matched male partner
# but not all males will be matched
# being the smaller data frame, the female one must be the first
#
# PartneredFemales2 <- Township %>%
#   filter(Sex == "Female", Relationship == "Partnered") %>%
#   slice_sample(n=120, replace = FALSE)
# PartneredMales2 <- Township %>%
#   filter(Sex == "Male", Relationship == "Partnered") %>%
#   slice_sample(n=140, replace = FALSE)
#
# OppSexCouples2 <- pairnorm(PartneredFemales2, smlid = "ID", smlage = "Age", PartneredMales2,
#                            lrgid = "ID", lrgage = "Age", directxi = -2, directomega = 3,
#                            HHStartNum = 1, HHNumVar="HouseholdID", userseed = 4, ptostop=.3)
# Couples2 <- OppSexCouples2$Matched

[Package PopulateR version 1.13 Index]