PipeOpEncodePL {mlr3pipelines} | R Documentation |
Piecewise Linear Encoding Base Class
Description
Abstract base class for piecewise linear encoding.
Piecewise linear encoding works by splitting values of features into distinct bins, through an algorithm implemented
in private$.get_bins()
, and then creating new feature columns through a continuous alternative to one-hot encoding.
Here, one new feature per bin is constructed, with values being either
-
0
, if the original value was below the lower bin boundary, -
1
, if the original value was above or equal to the upper bin boundary, or a scaled value between
0
and1
, if the original value was inside the bin boundaries. Scaling is done by offsetting the original value by the lower bin boundary and dividing by the bin width.
PipeOp
s inheriting from this encode columns of type numeric
and integer
. Use the PipeOpTaskPreproc
$affect_columns
functionality to only encode a subset of columns, or only encode columns of a certain type, etc.
Format
Abstract R6Class
object inheriting from PipeOpTaskPreprocSimple
/PipeOpTaskPreproc
/PipeOp
.
Construction
PipeOpEncodePL$new(id = "encodepl", param_set = ps(), param_vals = list(), packages = character(0), task_type = "Task")
-
id
::character(1)
Identifier of resulting object. See$id
slot ofPipeOp
. -
param_set
::ParamSet
Parameter space description. This should be created by the subclass and given tosuper$initialize()
. -
param_vals
:: namedlist
List of hyperparameter settings, overwriting the hyperparameter settings given inparam_set
. The subclass should have its ownparam_vals
parameter and pass it on tosuper$initialize()
. Defaultlist()
. -
packages
::character
Set of all required packages for thePipeOp
'sprivate$.train()
andprivate$.predict()
methods. See$packages
slot. Default ischaracter(0)
. -
task_type
::character(1)
The class ofTask
that should be accepted as input and will be returned as output. This should generally be acharacter(1)
identifying a type ofTask
, e.g."Task"
,"TaskClassif"
or"TaskRegr"
(or another subclass introduced by other packages). Default is"Task"
.
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc
.
The output is the input Task
with all affected numeric
and integer
columns encoded using piecewise linear encoding.
State
The $state
is a named list
with the $state
elements inherited from PipeOpTaskPreproc
, as well as:
-
bins
:: namedlist
Named list of numeric vectors. Each element corresponds to and is named after one of the affected feature columns and contains the bin boundaries derived throughprivate$.get_bins()
.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc
.
Internals
PipeOpEncodePL
is an abstract class inheriting from PipeOpTaskPreprocSimple
that allows easier implementation
of different binning algorithms for piecewise linear encoding. The respective binning algorithm should be implemented
as private$.get_bins()
.
Fields
Only fields inherited from PipeOp
.
Methods
Methods inherited from PipeOpTaskPreprocSimple
/PipeOpTaskPreproc
/PipeOp
as well as
-
.get_bins(task, cols)
(Task
,character
) -> namedlist
Abstract method for splitting the value range of a feature column into distinct bins. The argumentcols
should give the names of the feature columns of thetask
for which bins should be derived. Returns a named list of numeric vectors containing the bin boundaries for each affected feature column, named by that corresponding feature column.
References
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp
,
PipeOpEnsemble
,
PipeOpImpute
,
PipeOpTargetTrafo
,
PipeOpTaskPreproc
,
PipeOpTaskPreprocSimple
,
mlr_pipeops
,
mlr_pipeops_adas
,
mlr_pipeops_blsmote
,
mlr_pipeops_boxcox
,
mlr_pipeops_branch
,
mlr_pipeops_chunk
,
mlr_pipeops_classbalancing
,
mlr_pipeops_classifavg
,
mlr_pipeops_classweights
,
mlr_pipeops_colapply
,
mlr_pipeops_collapsefactors
,
mlr_pipeops_colroles
,
mlr_pipeops_copy
,
mlr_pipeops_datefeatures
,
mlr_pipeops_decode
,
mlr_pipeops_encode
,
mlr_pipeops_encodeimpact
,
mlr_pipeops_encodelmer
,
mlr_pipeops_encodeplquantiles
,
mlr_pipeops_encodepltree
,
mlr_pipeops_featureunion
,
mlr_pipeops_filter
,
mlr_pipeops_fixfactors
,
mlr_pipeops_histbin
,
mlr_pipeops_ica
,
mlr_pipeops_imputeconstant
,
mlr_pipeops_imputehist
,
mlr_pipeops_imputelearner
,
mlr_pipeops_imputemean
,
mlr_pipeops_imputemedian
,
mlr_pipeops_imputemode
,
mlr_pipeops_imputeoor
,
mlr_pipeops_imputesample
,
mlr_pipeops_kernelpca
,
mlr_pipeops_learner
,
mlr_pipeops_learner_pi_cvplus
,
mlr_pipeops_learner_quantiles
,
mlr_pipeops_missind
,
mlr_pipeops_modelmatrix
,
mlr_pipeops_multiplicityexply
,
mlr_pipeops_multiplicityimply
,
mlr_pipeops_mutate
,
mlr_pipeops_nearmiss
,
mlr_pipeops_nmf
,
mlr_pipeops_nop
,
mlr_pipeops_ovrsplit
,
mlr_pipeops_ovrunite
,
mlr_pipeops_pca
,
mlr_pipeops_proxy
,
mlr_pipeops_quantilebin
,
mlr_pipeops_randomprojection
,
mlr_pipeops_randomresponse
,
mlr_pipeops_regravg
,
mlr_pipeops_removeconstants
,
mlr_pipeops_renamecolumns
,
mlr_pipeops_replicate
,
mlr_pipeops_rowapply
,
mlr_pipeops_scale
,
mlr_pipeops_scalemaxabs
,
mlr_pipeops_scalerange
,
mlr_pipeops_select
,
mlr_pipeops_smote
,
mlr_pipeops_smotenc
,
mlr_pipeops_spatialsign
,
mlr_pipeops_subsample
,
mlr_pipeops_targetinvert
,
mlr_pipeops_targetmutate
,
mlr_pipeops_targettrafoscalerange
,
mlr_pipeops_textvectorizer
,
mlr_pipeops_threshold
,
mlr_pipeops_tomek
,
mlr_pipeops_tunethreshold
,
mlr_pipeops_unbranch
,
mlr_pipeops_updatetarget
,
mlr_pipeops_vtreat
,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodeplquantiles
,
mlr_pipeops_encodepltree