rl_config_set {RLoptimal}    R Documentation

Configuration of Reinforcement Learning

Description

Mainly specifies settings for the arguments of RLlib's training() function. Not compatible with the new API stack introduced in Ray 2.10.0.

Usage

rl_config_set(
  iter = 1000L,
  save_start_iter = NULL,
  save_every_iter = NULL,
  cores = 4L,
  gamma = 1,
  lr = 5e-05,
  train_batch_size = 10000L,
  model = rl_dnn_config(),
  sgd_minibatch_size = 200L,
  num_sgd_iter = 20L,
  ...
)

Arguments

iter

A positive integer value. Number of iterations.

save_start_iter, save_every_iter

Integer values. Checkpoints are saved every 'save_every_iter' iterations, starting from iteration 'save_start_iter' or later.

cores

A positive integer value. Number of CPU cores used for learning.

gamma

A positive numeric value. Discount factor of the Markov decision process. Default is 1.0 (no discounting).

lr

A positive numeric value. Learning rate (default 5e-5). A learning schedule can be supplied instead of a single learning rate (a hedged sketch follows this argument list).

train_batch_size

A positive integer value. Training batch size. Deprecated on the new API stack.

model

A list. Arguments passed into the policy model. See rl_dnn_config for details.

sgd_minibatch_size

A positive integer value. Total SGD minibatch size across all devices. Deprecated on the new API stack.

num_sgd_iter

A positive integer value. Number of SGD iterations in each outer loop.

...

Other settings for training(). See the arguments of the training() function in the RLlib source code:
https://github.com/ray-project/ray/blob/master/rllib/algorithms/algorithm_config.py
https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo.py
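
For illustration, a hedged sketch of building a configuration that also forwards extra training() settings through '...'. The name 'entropy_coeff' and the commented learning-rate schedule format come from RLlib's old API stack and are assumptions, not documented arguments of this function.

library(RLoptimal)

config <- rl_config_set(
  iter = 500L,
  cores = 2L,
  # Save checkpoints every 100 iterations, starting from iteration 100
  save_start_iter = 100L,
  save_every_iter = 100L,
  lr = 1e-4,
  # A learning schedule could be supplied instead of a fixed `lr` (see the
  # `lr` entry above); the (timestep, value) pair format below follows
  # RLlib's convention and is an assumption, not documented here:
  # lr = list(list(0L, 1e-4), list(1e6L, 1e-5)),
  # Forwarded unchanged to training(); `entropy_coeff` is assumed to be a
  # valid RLlib PPO setting on the old API stack:
  entropy_coeff = 0.01
)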

Value

A list of reinforcement learning configuration parameters.
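
For example, the returned list can be built once, inspected with str(), and then passed as 'rl_config' to learn_allocation_rule() (a minimal sketch):

config <- rl_config_set(iter = 200L, cores = 2L)
str(config)  # the returned named list of training() settings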

Examples

## Not run: 
allocation_rule <- learn_allocation_rule(
  models, 
  N_total = 150, N_ini = rep(10, 5), N_block = 10, Delta = 1.3,
  outcome_type = "continuous", sd_normal = sqrt(4.5), 
  seed = 123, 
  # Use 200 iterations and 2 CPU cores for reinforcement learning
  rl_config = rl_config_set(iter = 200, cores = 2), 
  alpha = 0.025
)
## End(Not run) 


[Package RLoptimal version 1.2.1 Index]