polymatch {polymatching} | R Documentation |
Polymatching
Description
polymatch generates matched samples in designs with up to 10 groups.
Usage
polymatch(
formulaMatch,
start = "small.to.large",
data,
distance = "euclidean",
exactMatch = NULL,
vectorK = NULL,
iterate = TRUE,
niter_max = 50,
withinGroupDist = TRUE,
verbose = TRUE
)
Arguments
formulaMatch |
Formula with form |
start |
An object specifying the starting point of the iterative algorithm. Three types of input are accepted:
|
data |
The |
distance |
String specifying whether the distance between pairs of observations should be computed with the
Euclidean ( |
exactMatch |
Formula with form |
vectorK |
A named vector with the number of subjects from each group in each matched set. The names of the vector must be
the labels of the groups, i.e., the levels of the variable identifying the treatment groups/exposures.
For example, in case of four groups with labels "A","B","C" and "D" and assuming that the desired design is 1:2:3:3
(1 subject from A, 2 from B, 3 from C and 3 from D in each matched set), the parameter should be set to
|
iterate |
Boolean specifying whether iterations should be done ( |
niter_max |
Maximum number of iterations. Default is 50. |
withinGroupDist |
Boolean specifying whether the distances within the same treatment/exposure group should be considered in the
total distance. For example, in a 1:2:3 matched design among the groups A, B and C, the parameter controls whether the distance
between the two subjects in B and the three pairwise distances among the subjects in C should be counted in the total distance.
The default value is |
verbose |
Boolean: should text be printed in the console? Default is |
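As a minimal sketch of the vectorK and withinGroupDist descriptions above, the base R snippet below builds a named vector for the 1:2:3:3 design and counts how many pairwise distances in a 1:2:3 matched set fall within the same group. The variable names are illustrative, and the snippet does not call polymatch.

```r
# Named vector for a 1:2:3:3 design; the names are the group labels.
vectorK <- c(A = 1, B = 2, C = 3, D = 3)

# Pair counts in one matched set of a 1:2:3 design (groups A, B and C).
k <- c(A = 1, B = 2, C = 3)
pairs_total   <- choose(sum(k), 2)   # all pairs among the 6 subjects
pairs_within  <- sum(choose(k, 2))   # pairs of subjects from the same group
pairs_between <- pairs_total - pairs_within
```

Here pairs_within counts one pair in B and three pairs in C. These are the distances that withinGroupDist includes in, or excludes from, the total distance.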
Details
The function implements the conditionally optimal matching algorithm, which iteratively uses
two-group optimal matching steps to generate matched samples with small total distance. In the current implementation,
it is possible to generate matched samples with multiple subjects per group, with the matching ratio being
specified by the vectorK parameter.
The steps of the algorithm are described with the following example. Consider a 4-group design with
group labels "A", "B", "C" and "D" and a 1:1:1:1 matching ratio. The algorithm requires a set of quadruplets as starting point.
The argument start defines the approach used to generate such a starting point. polymatch
generates the starting point by sequentially using optimal two-group matching.
In the default setting (start="small.to.large"), the steps are:
1) optimally match the two smallest groups;
2) optimally match the third smallest group to the pairs generated in the first step;
3) optimally match the last group to the triplets generated in the second step.
Notably, we can use the optimal two-group algorithm in steps 2) and 3) because they are
two-dimensional problems: the elements of one group on one hand, fixed matched sets on the other hand. The order of the
groups to be considered when generating the starting point can be user-specified (e.g., start="D-B-A-C").
Alternatively, the user can provide a matched set that will be used as starting point.
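The small-to-large ordering described above can be sketched in base R. The snippet derives the order from the group sizes and writes it in the "label-label-label" form accepted for a user-specified order. The group sizes are those of the Examples section, the variable names are illustrative, and whether polymatch derives the order this way internally is an assumption.

```r
# Group labels with sizes 10, 20 and 30, as in the Examples section.
group <- c(rep("A", 10), rep("B", 20), rep("C", 30))

# Order the labels from the smallest to the largest group.
small_to_large <- names(sort(table(group)))

# Collapse into the dash-separated form used for a user-specified order.
start_order <- paste(small_to_large, collapse = "-")
```

A string built this way could then be passed as the start argument to impose an explicit group order.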
Given the starting matched set, the algorithm iteratively explores possible reductions in the total distance (if iterate=TRUE),
by sequentially relaxing the connection to each group and rematching units of that group. In our example:
1) rematch "B-C-D" triplets within the starting quadruplets to units in group "A";
2) rematch "A-C-D" triplets within the starting quadruplets to units in group "B";
3) rematch "A-B-D" triplets within the starting quadruplets to units in group "C";
4) rematch "A-B-C" triplets within the starting quadruplets to units in group "D".
If none of the sets of quadruplets generated in 1)-4) has smaller total distance than the starting point, the algorithm stops.
Otherwise, the set of quadruplets with the smallest distance is selected and the process is iterated, until no reduction in the total
distance is found or the maximum number of iterations is reached (niter_max=50 by default).
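The procedure above can be illustrated with a toy sketch in base R, assuming three groups, one covariate, a 1:1:1 ratio and absolute differences as distances. The helper names (total_distance, assign_optimal, match_group) are hypothetical, and a brute-force search stands in for the optimal two-group matching solver, so this is not the package's implementation and is only practical for very small groups.

```r
# Toy data: one covariate, three groups of sizes 2, 3 and 4.
x     <- c(0.1, 2.0, 0.3, 1.8, 5.0, 0.0, 2.2, 1.0, 4.0)
group <- c("A", "A", "B", "B", "B", "C", "C", "C", "C")
units <- split(seq_along(x), group)   # row indices of each group

# Total distance: sum over matched sets of all pairwise distances.
total_distance <- function(sets) {
  sum(apply(sets, 1, function(s) sum(dist(x[s]))))
}

# Brute-force stand-in for an optimal two-group matching solver:
# pick one distinct column of `cost` per row, minimizing the summed cost.
assign_optimal <- function(cost) {
  n <- nrow(cost)
  cand <- as.matrix(expand.grid(rep(list(seq_len(ncol(cost))), n)))
  cand <- cand[apply(cand, 1, function(r) !anyDuplicated(r)), , drop = FALSE]
  totals <- apply(cand, 1, function(r) sum(cost[cbind(seq_len(n), r)]))
  cand[which.min(totals), ]
}

# Match the units of one group to fixed (partial) matched sets.
match_group <- function(fixed, cand) {
  cost <- sapply(cand, function(u) {
    rowSums(abs(x[u] - matrix(x[as.vector(fixed)], nrow(fixed))))
  })
  cand[assign_optimal(cost)]
}

# Starting point: smallest group first, then add one group at a time.
sets <- cbind(A = units$A)
sets <- cbind(sets, B = match_group(sets, units$B))
sets <- cbind(sets, C = match_group(sets, units$C))
start_distance <- total_distance(sets)

# Iterations: rematch each group in turn and keep the best improvement.
for (iter in seq_len(50)) {
  candidates <- lapply(colnames(sets), function(g) {
    s <- sets
    s[, g] <- match_group(sets[, colnames(sets) != g, drop = FALSE], units[[g]])
    s
  })
  d <- sapply(candidates, total_distance)
  if (min(d) >= total_distance(sets) - 1e-12) break   # no reduction: stop
  sets <- candidates[[which.min(d)]]
}
final_distance <- total_distance(sets)
```

Each rematching step keeps the current assignment among its feasible solutions, so final_distance can never exceed start_distance. The loop bound of 50 mirrors the default of niter_max.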
The total distance is defined as the sum of all the within-matched-set distances. The within-matched-set distance is defined as the
sum of the pairwise distances between pairs of units in the matched set. The type of distance is specified with the distance
argument. The current implementation supports Euclidean (distance="euclidean") and Mahalanobis (distance="mahalanobis")
distances. In particular, for the Mahalanobis distance, the covariance matrix is defined only once on the full dataset.
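To make these definitions concrete, the following base R sketch computes the total distance of two hypothetical matched sets under both distance types. It assumes the non-squared Mahalanobis distance and uses made-up covariate values; the names within_set_distance, X, Z and S are illustrative, not part of the package.

```r
# Toy covariates for six units.
X <- cbind(var1 = c(0.2, 1.5, -0.3, 2.1, 0.9, -1.2),
           var2 = c(0.4, 0.7,  0.1, 0.9, 0.5,  0.3))

# Covariance matrix computed once, on the full dataset.
S <- cov(X)

# Rescale so that the Euclidean distance on Z equals the Mahalanobis distance on X.
Z <- X %*% solve(chol(S))

# Within-matched-set distance: sum of pairwise distances among the units of a set.
within_set_distance <- function(M, set) sum(dist(M[set, , drop = FALSE]))

# Total distance for two hypothetical matched sets (rows 1-3 and rows 4-6).
sets <- list(c(1, 2, 3), c(4, 5, 6))
total_euclidean   <- sum(sapply(sets, within_set_distance, M = X))
total_mahalanobis <- sum(sapply(sets, within_set_distance, M = Z))
```

Because S is estimated once from all units, the Mahalanobis distance between two given units does not depend on which matched set they end up in.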
Value
A list containing the following components:
- match_id
A numeric vector identifying the matched sets—matched units have the same identifier.
- total_distance
Total distance of the returned matched sample.
- total_distance_start
Total distance at the starting point.
See Also
balance and plotBalance to summarize the balance in the covariates.
Examples
#Generate a dataset with a group indicator and four variables:
#- var1, continuous, sampled from normal distributions;
#- var2, continuous, sampled from beta distributions;
#- var3, categorical with 4 levels;
#- var4, binary.
set.seed(1234567)
dat <- data.frame(group = c(rep("A",10),rep("B",20),rep("C",30)),
                  var1 = c(rnorm(10,mean=0,sd=1),
                           rnorm(20,mean=1,sd=2),
                           rnorm(30,mean=-1,sd=2)),
                  var2 = c(rbeta(10,shape1=1,shape2=1),
                           rbeta(20,shape1=2,shape2=1),
                           rbeta(30,shape1=1,shape2=2)),
                  var3 = factor(c(rbinom(10,size=3,prob=.4),
                                  rbinom(20,size=3,prob=.5),
                                  rbinom(30,size=3,prob=.3))),
                  var4 = factor(c(rbinom(10,size=1,prob=.5),
                                  rbinom(20,size=1,prob=.3),
                                  rbinom(30,size=1,prob=.7))))
#Match on propensity score
#-------------------------
#With multiple groups, need a multinomial model for the PS
library(VGAM)
psModel <- vglm(group ~ var1 + var2 + var3 + var4,
                family=multinomial, data=dat)
#Estimated logits - 2 for each unit: log(P(group=A)/P(group=C)), log(P(group=B)/P(group=C))
logitPS <- predict(psModel, type = "link")
dat$logit_AvsC <- logitPS[,1]
dat$logit_BvsC <- logitPS[,2]
#Match on logits of PS
resultPs <- polymatch(group ~ logit_AvsC + logit_BvsC, data = dat,
                      distance = "euclidean")
dat$match_id_ps <- resultPs$match_id
#Match on covariates
#--------------------
#Match on continuous covariates with exact match on categorical/binary variables
resultCov <- polymatch(group ~ var1 + var2, data = dat,
                       distance = "mahalanobis",
                       exactMatch = ~var3+var4)
dat$match_id_cov <- resultCov$match_id