internalSubAsRest {nbc4va}	R Documentation

Substitute values in a dataframe proportionally to all other values

Description

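Substitute a target value proportionally to the distribution of the rest of the values in a column. Each column is handled separately, and the case that applies (only target values, a single unique remaining value, fewer target values than unique remaining values, or the general proportional case) is given in the pseudocode under Details.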

Usage

internalSubAsRest(
  dataset,
  x,
  cols = 1:ncol(dataset),
  ignore = c(NA, NaN),
  removal = FALSE
)

Arguments

dataset

A dataframe containing one or more occurrences of x.

x

The target value in dataset to replace, column by column, with values drawn from the rest of the values in that column.

cols

A numeric vector of column indices to consider for substitution. Defaults to all columns.

ignore

A vector of values, among the rest of the values in a column, to ignore for substitution. Defaults to NA and NaN.

removal

Set to TRUE to remove column(s) that consist only of x values.
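
As a rough illustration of how x and ignore might partition a single column, the base R sketch below splits a toy vector into target values, ignored values, and the remaining pool. It rests on an assumption that is not confirmed by this page: that values listed in ignore are simply left out of the pool that replacements are drawn from. The variable names are illustrative only.

```r
# Toy column: 99 plays the role of x, NA is one of the default ignore values
y <- c(1, 2, 2, 99, NA, 99)
x <- 99
ignore <- c(NA, NaN)

# %in% always returns TRUE/FALSE, whereas y == x would return NA for missing entries
is_target  <- y %in% x
is_ignored <- y %in% ignore

# Assumed meaning of "rest": neither the target value nor an ignored value
rest <- y[!is_target & !is_ignored]
```

Using `%in%` rather than `==` keeps the logical index free of NA, which matters because the default ignore vector contains NA and NaN.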

Details

Pseudocode of algorithm:

  SET dataset = table of values with columns and rows
  SET x = target value for substitution

  IF x in dataset:
    FOR EACH column y in cols of dataset:
      SET xv = all x values in y
      SET rest = all values not equal to x in y
      SET xn = number of xv values
      IF all values in y are x:
        REMOVE y in dataset (when removal is TRUE)
      ELSE IF number of unique values of rest == 1:
        MODIFY xv = the single unique value of rest
      ELSE IF xn < number of unique values of rest:
        MODIFY xv = random sample of rest with size xn
      ELSE:
        SET p = proportions of rest
        SET xnp = xn * p
        IF xnp has decimals:
          MODIFY xnp = round xnp such that sum(xnp) == xn via largest remainder method
        MODIFY xv = rest values with distribution of xnp
  RETURN dataset
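
The branches in the pseudocode can be sketched for a single column in base R. This is a minimal illustration, not the package's implementation: `round_fixed_sum_sketch` and `sub_column_sketch` are hypothetical names, the exclusion of ignore values from rest is an assumption, and so is the random placement of the replacement values among the target positions.

```r
# Hypothetical helper: round `v` to integers that sum to `total`
# using the largest remainder method.
round_fixed_sum_sketch <- function(v, total = round(sum(v))) {
  base  <- floor(v)
  short <- total - sum(base)               # units still to hand out
  if (short > 0) {
    # one extra unit goes to the entries with the largest fractional parts
    top <- order(v - base, decreasing = TRUE)[seq_len(short)]
    base[top] <- base[top] + 1
  }
  base
}

# Hypothetical single-column version of the substitution
sub_column_sketch <- function(y, x, ignore = c(NA, NaN)) {
  is_x <- y %in% x
  rest <- y[!is_x & !(y %in% ignore)]      # assumption: ignored values are not drawn from
  xn   <- sum(is_x)
  if (xn == 0 || length(rest) == 0) return(y)  # nothing to replace, or nothing to draw from

  vals <- unique(rest)
  if (length(vals) == 1) {
    y[is_x] <- vals                        # only one possible replacement
  } else if (xn < length(vals)) {
    y[is_x] <- sample(rest, xn)            # too few targets to mirror every proportion
  } else {
    p    <- as.vector(table(factor(rest, levels = vals))) / length(rest)
    xnp  <- round_fixed_sum_sketch(xn * p, xn)
    fill <- rep(vals, times = xnp)         # replacement values with counts xnp
    y[is_x] <- fill[sample.int(length(fill))]  # assumption: shuffled placement
  }
  y
}

set.seed(1)
out <- sub_column_sketch(c(1, 1, 2, 99, 99, 99, 99, NA), x = 99)
table(out, useNA = "ifany")
```

In this toy call the rest of the column is two 1s and one 2, so the four target values are split as 4 * (2/3, 1/3), which the largest remainder step rounds to three 1s and one 2.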

Value

out: a dataframe or a list, depending on removal.

See Also

Other data functions: internalRoundFixedSum()

Examples

library(nbc4va)
data(nbc4vaDataRaw)
unclean <- nbc4vaDataRaw
clean <- nbc4va::internalSubAsRest(unclean, 99)


[Package nbc4va version 1.2 Index]