BatchLinUCBDisjointPolicyEpsilon {cramR}    R Documentation
Batch Disjoint LinUCB Policy with Epsilon-Greedy
Description
Batch Disjoint LinUCB Policy with Epsilon-Greedy
Details
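Implements the disjoint LinUCB algorithm (a separate linear reward model per arm) with upper-confidence-bound action scores and epsilon-greedy exploration. Sufficient statistics are updated every round, but the model parameters used for action selection are refreshed only at the end of each batch of batch_size rounds.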
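For reference, the textbook disjoint LinUCB score for arm a at round t, combined with an epsilon-greedy rule, can be written as below. This is the standard formulation, not an extract from cramR; it assumes per-arm statistics initialized as A_a = I_d and b_a = 0, a uniformly random arm during exploration, and that the A_a and b_a used for scoring are the values frozen at the most recent batch boundary.

```latex
\hat{\theta}_a = A_a^{-1} b_a, \qquad
p_{t,a} = \hat{\theta}_a^{\top} x_t + \alpha \sqrt{x_t^{\top} A_a^{-1} x_t},
\qquad
a_t =
\begin{cases}
  \text{an arm drawn uniformly from } \{1, \dots, k\} & \text{with probability } \epsilon,\\
  \arg\max_{a} \, p_{t,a} & \text{with probability } 1 - \epsilon,
\end{cases}

A_{a_t} \leftarrow A_{a_t} + x_t x_t^{\top}, \qquad
b_{a_t} \leftarrow b_{a_t} + r_t x_t .
```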
Methods
- 'initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)': Constructor.
- 'set_parameters(context_params)': Initializes sufficient statistics for each arm.
- 'get_action(t, context)': Selects an arm using UCB scores and the epsilon-greedy rule.
- 'set_reward(t, context, action, reward)': Updates statistics and refreshes the model at batch boundaries.
Super class
cramR::NA
Public fields
alpha
Numeric, UCB exploration strength parameter.
epsilon
Numeric, probability of taking a random exploratory action.
batch_size
Integer, number of rounds per batch update.
A_cc
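List of per-arm Gram matrices (accumulated outer products of the observed contexts), updated every round and used to refresh the model at the end of each batch.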
b_cc
List of per-arm sums of reward-weighted context vectors.
class_name
Internal class name identifier.
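To make the A_cc and b_cc fields concrete, the following base-R sketch shows what a single arm's entries accumulate. It assumes textbook LinUCB initialization (A = I_d, b = 0), which the field descriptions above do not state, and uses synthetic data. Under that assumption, solving A theta = b yields the ridge-regression estimate with penalty 1 on that arm's data.

```r
# Illustrative sketch (not cramR code): what one arm's A_cc / b_cc entries
# accumulate, ASSUMING identity/zero initialization as in textbook LinUCB.
set.seed(42)
d <- 3                                    # feature dimension (illustrative)
n <- 50                                   # rounds on which this arm was pulled
X <- matrix(rnorm(n * d), nrow = n)       # synthetic contexts, one per row
r <- as.vector(X %*% c(1, -2, 0.5)) + rnorm(n, sd = 0.1)  # synthetic rewards

A <- diag(d)                              # Gram matrix starts at I_d
b <- rep(0, d)                            # reward-weighted sum starts at 0
for (i in seq_len(n)) {
  x <- X[i, ]
  A <- A + tcrossprod(x)                  # A <- A + x x^T
  b <- b + r[i] * x                       # b <- b + r x
}

theta_hat <- solve(A, b)                  # this arm's coefficient estimate

# Same estimate as ridge regression with penalty 1 on this arm's data
theta_ridge <- as.vector(solve(crossprod(X) + diag(d), crossprod(X, r)))
all.equal(theta_hat, theta_ridge)         # TRUE (up to floating-point tolerance)
```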
Methods
Method new()
Constructor for batched LinUCB with epsilon-greedy exploration.
Usage
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
Arguments
alpha
Numeric. UCB width parameter (exploration strength).
epsilon
Numeric. Probability of selecting a random arm.
batch_size
Integer. Number of rounds between parameter updates.
Method set_parameters()
Initialize arm-specific parameter containers.
Usage
BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params
List containing at least 'unique' (feature size) and 'k' (number of arms).
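A minimal sketch of the per-arm containers that set_parameters() is described as creating: one d x d Gram matrix and one length-d vector per arm. The helper init_arm_stats() is hypothetical, and so is the mapping from context_params: here d (assumed to be the feature dimension implied by 'unique') and k are passed in directly, and identity/zero starting values are assumed.

```r
# Hypothetical stand-in for set_parameters() (not the cramR source).
# ASSUMPTION: d is the feature dimension implied by context_params$unique
# and k = context_params$k; both are passed in directly here.
init_arm_stats <- function(d, k) {
  list(
    A_cc = replicate(k, diag(d), simplify = FALSE),   # k identity matrices, d x d
    b_cc = replicate(k, rep(0, d), simplify = FALSE)  # k zero vectors of length d
  )
}

stats <- init_arm_stats(d = 4, k = 3)
length(stats$A_cc)     # 3
dim(stats$A_cc[[1]])   # 4 4
```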
Method get_action()
Chooses an arm: with probability epsilon a random arm is selected; otherwise the arm with the highest UCB score is selected.
Usage
BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t
Integer timestep.
context
List containing the context for the decision.
Returns
A list with the selected action.
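A minimal base-R sketch of the selection rule, under stated assumptions: the context x is a single numeric vector shared by all arms, exploration picks an arm uniformly at random, and ties go to the first maximum. select_arm() and its arguments are hypothetical names. This is not cramR's get_action(), which takes t and a context list and returns a list.

```r
# Hypothetical sketch of epsilon-greedy LinUCB arm selection (not cramR code).
# A_list, b_list: per-arm Gram matrices and reward-weighted sums.
# x: numeric context vector, ASSUMED shared across arms.
select_arm <- function(x, A_list, b_list, alpha = 1, epsilon = 0.1) {
  k <- length(A_list)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))              # explore: uniformly random arm
  }
  scores <- vapply(seq_len(k), function(a) {
    A_inv <- solve(A_list[[a]])
    theta <- A_inv %*% b_list[[a]]        # ridge estimate for arm a
    mean_est <- sum(theta * x)            # predicted reward theta^T x
    width <- sqrt(as.numeric(t(x) %*% A_inv %*% x))  # confidence width
    mean_est + alpha * width              # UCB score
  }, numeric(1))
  which.max(scores)                       # exploit: highest UCB (first on ties)
}

# Arm 1 is well explored (A = 100 I), arm 2 is untouched (A = I).
A_list <- list(100 * diag(2), diag(2))
b_list <- list(c(50, 0), c(0, 0))
x <- c(1, 0)
select_arm(x, A_list, b_list, alpha = 0, epsilon = 0)  # 1: pure exploitation
select_arm(x, A_list, b_list, alpha = 1, epsilon = 0)  # 2: confidence bonus wins
```

Calling solve() for every arm on every round keeps the sketch short; an implementation would more plausibly cache each inverse (or a Cholesky factor) and recompute it only when the parameters are refreshed at a batch boundary.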
Method set_reward()
Updates the chosen arm's sufficient statistics with the observed reward. The parameters used for action selection are refreshed only at the end of a batch.
Usage
BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t
Integer timestep.
context
Context object used for decision-making.
action
List containing the chosen action.
reward
List containing the observed reward.
Returns
Updated internal model parameters.
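A minimal sketch of the batching behavior, assuming (as the method description suggests) that statistics accumulate every round while the matrices used for decisions are replaced only when t is a multiple of batch_size, with t starting at 1. The snapshot fields A_use and b_use and both function names are hypothetical.

```r
# Hypothetical sketch of batched updating (not the cramR source).
# A_cc / b_cc: running statistics, updated every round.
# A_use / b_use: ASSUMED snapshot read by the action-selection step,
# replaced only at batch boundaries (t %% batch_size == 0, t starting at 1).
make_batch_state <- function(d, k) {
  A0 <- replicate(k, diag(d), simplify = FALSE)
  b0 <- replicate(k, rep(0, d), simplify = FALSE)
  list(A_cc = A0, b_cc = b0, A_use = A0, b_use = b0)
}

batch_set_reward <- function(state, t, x, arm, reward, batch_size) {
  # Disjoint model: only the pulled arm's statistics change
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  if (t %% batch_size == 0) {
    state$A_use <- state$A_cc   # publish the accumulated statistics
    state$b_use <- state$b_cc
  }
  state
}

s <- make_batch_state(d = 2, k = 2)
for (t in 1:2) {
  s <- batch_set_reward(s, t, x = c(1, 0), arm = 1, reward = 1, batch_size = 3)
}
s$b_use[[1]]   # 0 0  (no batch boundary reached yet)
s <- batch_set_reward(s, 3, x = c(1, 0), arm = 1, reward = 1, batch_size = 3)
s$b_use[[1]]   # 3 0  (refreshed at t = 3)
```

With batch_size = 1 the snapshot is refreshed every round, so the sketch reduces to ordinary (non-batched) LinUCB with epsilon-greedy exploration.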
Method clone()
The objects of this class are cloneable with this method.
Usage
BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.