data_partition {rchallenge} | R Documentation |
Data partitioning function adapted from the caret package.
Description
data_partition creates a test/training partition.
Usage
data_partition(y, p = 0.5, groups = min(5, length(y)))
Arguments
y |
a vector of outcomes. |
p |
the proportion of data that goes to training (a value between 0 and 1) |
groups |
for numeric y, the number of percentile-based groups within which sampling is done (see Details) |
Details
The random sampling is done within the levels of y when y is a factor, in an attempt to balance the class distributions within the splits.
For numeric y, the sample is split into groups sections based on percentiles, and sampling is done within these subgroups. The number of percentiles is set via the groups argument.
Also, for very small class sizes (<= 3), the classes may not show up in both the training and test data.
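The stratified sampling described above can be sketched in base R. This is an illustration only, not the package's implementation: sketch_partition is a hypothetical helper, and taking the ceiling of the per-group size is an assumption.

```r
# Hypothetical helper (not part of rchallenge): a base-R sketch of
# stratified sampling of training row positions.
sketch_partition <- function(y, p = 0.5, groups = min(5, length(y))) {
  if (is.numeric(y)) {
    # Cut numeric outcomes into `groups` sections at percentile breaks.
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
    strata <- if (length(breaks) < 2) {
      factor(rep(1, length(y)))  # all values identical: one stratum
    } else {
      cut(y, breaks = breaks, include.lowest = TRUE)
    }
  } else {
    # For factors (or other classes), sample within each level.
    strata <- factor(y)
  }
  # Sample a proportion p of the row positions inside each stratum.
  idx_by_stratum <- split(seq_along(y), strata, drop = TRUE)
  picked <- lapply(idx_by_stratum, function(idx) {
    n_train <- ceiling(length(idx) * p)  # assumed rounding rule
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
# Factor outcome: each class contributes the same share to training.
train_cls <- sketch_partition(iris$Species, p = 0.5)
table(iris$Species[train_cls])

# Numeric outcome: sampling happens within 5 percentile groups.
train_num <- sketch_partition(iris$Sepal.Length, p = 0.5, groups = 5)
summary(iris$Sepal.Length[train_num])
```

Sampling within strata, rather than over all rows at once, keeps the distribution of y in the training set close to that of the full data.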
Value
A vector of row position integers corresponding to the training data
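A minimal usage sketch, assuming the rchallenge package is installed and exports data_partition as documented in Usage. The iris data set and the object names are illustrative choices, not part of the original page.

```r
# Assumes rchallenge is installed; the block is skipped otherwise.
if (requireNamespace("rchallenge", quietly = TRUE)) {
  set.seed(123)
  # Row positions of the training data, stratified by species.
  train_idx <- rchallenge::data_partition(iris$Species, p = 0.5)

  # Use the returned positions to split the data frame.
  train <- iris[train_idx, ]
  test  <- iris[-train_idx, ]

  # Compare class counts in the two splits.
  print(table(train$Species))
  print(table(test$Species))
}
```

Negative indexing with the returned vector gives the complementary test rows.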
Author(s)
Adapted from the createDataPartition function by Max Kuhn.
References
http://caret.r-forge.r-project.org/splitting.html