
Customize Training

To go beyond the training classes supported in the quick start training guide, see the following resources, one for each component you can customize:

1) System — How models run/interact

Handles loading models/tokenizers, performing one rollout step (tester/target generation), computing log-probs (for DPO/PPO, etc.), advancing the conversation state, and defining a reward.
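The responsibilities above can be sketched as a small Python class. Everything in it is hypothetical: `ToySystem`, `EchoModel`, `Step`, and the method names `rollout`, `logprob`, `advance`, and `reward` are illustrative, not the library's actual System API. `EchoModel` replaces real model/tokenizer loading so the sketch runs without weights, and the scorer is passed in as a plain function.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


class EchoModel:
    """Hypothetical stand-in for a loaded model/tokenizer pair: always returns the same reply."""

    def __init__(self, reply: str):
        self.reply = reply

    def generate(self, context: str) -> str:
        return self.reply

    def token_probs(self, context: str, text: str) -> List[float]:
        # Pretend each whitespace-separated token has probability 0.5.
        return [0.5] * len(text.split())


@dataclass
class Step:
    tester_text: str  # tester (adversary) utterance
    target_text: str  # target's response to it
    reward: float     # harm score of the target response


class ToySystem:
    """Hypothetical System: owns the tester, the target, and a harm scorer."""

    def __init__(self, tester: EchoModel, target: EchoModel, scorer: Callable[[str], float]):
        self.tester, self.target, self.scorer = tester, target, scorer

    def rollout(self, context: str) -> Step:
        # One rollout step: the tester speaks, then the target replies.
        tester_text = self.tester.generate(context)
        target_text = self.target.generate(context + tester_text)
        return Step(tester_text, target_text, self.reward(target_text))

    def logprob(self, context: str, text: str) -> float:
        # Sequence log-prob under the tester: sum of per-token log-probs,
        # the quantity that DPO/PPO-style losses consume.
        return sum(math.log(p) for p in self.tester.token_probs(context, text))

    def advance(self, context: str, step: Step) -> str:
        # The next conversation state appends both turns.
        return context + step.tester_text + step.target_text

    def reward(self, target_text: str) -> float:
        return self.scorer(target_text)
```

For example, `ToySystem(EchoModel(" tell me"), EchoModel(" bad idea"), lambda t: 1.0 if "bad" in t else 0.0).rollout("Hi.")` yields a step whose reward is `1.0`, and `advance` turns the context into `"Hi. tell me bad idea"`.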

2) Sampler — How data is collected

Defines how tester–target interactions are generated and structured for training/eval (e.g., single-path vs. tree rollouts), what per-step data is stored, and what the solver receives.
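A single-path rollout is a tree rollout with branching factor 1, which the sketch below makes explicit. It is a hypothetical illustration: `tree_rollout`, `Node`, and the `step_fn(state, branch)` callback (standing in for a System's rollout step) are not part of the library. Each node stores the per-step data (utterances and reward), and the returned root plays the role of the rollout graph a solver would receive.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical callback standing in for one System rollout step:
# (conversation state, branch index) -> (tester_text, target_text, reward)
StepFn = Callable[[str, int], Tuple[str, str, float]]


@dataclass
class Node:
    state: str              # conversation so far
    tester_text: str = ""   # tester utterance that led to this node
    target_text: str = ""   # target's reply
    reward: float = 0.0     # harm score of the reply
    children: List["Node"] = field(default_factory=list)


def tree_rollout(step_fn: StepFn, prompt: str, depth: int, width: int) -> Node:
    """Expand `width` branches per node for `depth` turns; width=1 gives a single path."""
    root = Node(state=prompt)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for b in range(width):
                tester_text, target_text, reward = step_fn(node.state, b)
                child = Node(node.state + tester_text + target_text,
                             tester_text, target_text, reward)
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root


def count_nodes(node: Node) -> int:
    # Total nodes in the rollout graph, including the root.
    return 1 + sum(count_nodes(c) for c in node.children)
```

A depth-2, width-2 tree contains 1 + 2 + 4 = 7 nodes, while a depth-3 single path contains 4; the tree costs more generations but gives sibling responses to the same context, which some solvers can compare.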

3) Scorers — How we define/measure harm

Scores each target generation with a scalar harm value. Instantiate your scorer inside your System so that the System's reward computation can call it directly.
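A minimal sketch of a scorer and of instantiating it inside a System, assuming a scorer is simply an object that maps a batch of target generations to one float each. `Scorer`, `KeywordScorer`, and `MySystem` are hypothetical names, and the keyword blocklist is a toy substitute for the trained harm or toxicity classifier a real scorer would typically wrap.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Scorer(ABC):
    """Hypothetical base class: maps target generations to scalar harm scores."""

    @abstractmethod
    def score(self, generations: Sequence[str]) -> List[float]:
        ...


class KeywordScorer(Scorer):
    """Toy stand-in for a learned classifier: fraction of words found on a blocklist."""

    def __init__(self, blocklist: Sequence[str]):
        self.blocklist = {w.lower() for w in blocklist}

    def score(self, generations: Sequence[str]) -> List[float]:
        scores = []
        for text in generations:
            words = text.lower().split()
            flagged = sum(w.strip(".,!?") in self.blocklist for w in words)
            scores.append(flagged / len(words) if words else 0.0)
        return scores


class MySystem:
    """Hypothetical System that owns its scorer, so reward() can call it directly."""

    def __init__(self):
        self.scorer = KeywordScorer(["idiot", "stupid"])

    def reward(self, target_text: str) -> float:
        return self.scorer.score([target_text])[0]
```

Keeping the scorer as a System attribute means swapping in a different harm definition only touches the System's constructor.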

4) Solvers (Algorithms) — How the tester learns

Consumes rollout graphs, flattens them into per-sample steps, collates those steps into batches, and computes the training loss (plus logging metrics).
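As one illustration of the flatten, collate, and loss stages, the sketch below turns a rollout tree into preference pairs (highest- vs. lowest-reward sibling) and computes a DPO loss per batch. The pairing rule, the `Node` fields (`logp`, `ref_logp`), and the function names are assumptions chosen for illustration, not the library's solver API; a real solver would compute log-probabilities with the model as differentiable tensors rather than read stored floats.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    reward: float = 0.0     # harm score of this branch
    logp: float = 0.0       # tester (policy) log-prob of the utterance
    ref_logp: float = 0.0   # same utterance under a frozen reference model
    children: List["Node"] = field(default_factory=list)


def flatten_to_pairs(root: Node) -> List[Tuple[Node, Node]]:
    """Walk the tree; at each node with >= 2 children, pair the best and worst child."""
    pairs, stack = [], [root]
    while stack:
        node = stack.pop()
        if len(node.children) >= 2:
            ranked = sorted(node.children, key=lambda c: c.reward)
            if ranked[-1].reward > ranked[0].reward:  # skip ties: no preference signal
                pairs.append((ranked[-1], ranked[0]))  # (chosen, rejected)
        stack.extend(node.children)
    return pairs


def collate(samples: Sequence, batch_size: int) -> List[list]:
    # Split flattened samples into consecutive batches.
    return [list(samples[i:i + batch_size]) for i in range(0, len(samples), batch_size)]


def neg_log_sigmoid(x: float) -> float:
    # Numerically stable -log(sigmoid(x)) = softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(batch: Sequence[Tuple[Node, Node]], beta: float = 0.1) -> Tuple[float, Dict[str, float]]:
    """Mean DPO loss over (chosen, rejected) pairs, plus a small log dict."""
    losses = []
    for chosen, rejected in batch:
        margin = (chosen.logp - chosen.ref_logp) - (rejected.logp - rejected.ref_logp)
        losses.append(neg_log_sigmoid(beta * margin))
    loss = sum(losses) / len(losses)
    return loss, {"loss": loss, "num_pairs": float(len(batch))}
```

When the policy and reference agree on both responses the margin is zero and the loss is log 2; it falls as the policy raises the chosen response's log-prob relative to the rejected one.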

5) Trainer — How the training loop runs

Orchestrates the main training loop and manages hyperparameters, the optimizer, evaluation cadence, and checkpointing.
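The loop structure can be sketched with a toy trainer. This is a hypothetical illustration, not the library's Trainer: `TrainerConfig`, `ToyTrainer`, and their fields are invented names, a single scalar parameter stands in for the tester's weights, plain SGD stands in for the optimizer, and checkpoints are kept in a dict rather than written to disk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TrainerConfig:             # hypothetical hyperparameter bundle
    lr: float = 0.1
    num_steps: int = 20
    eval_every: int = 5          # evaluation cadence, in steps
    checkpoint_every: int = 10   # checkpoint cadence, in steps


class ToyTrainer:
    """Hypothetical trainer: SGD on one scalar parameter standing in for model weights."""

    def __init__(self, cfg: TrainerConfig,
                 loss_and_grad: Callable[[float], Tuple[float, float]],
                 evaluate: Callable[[float], float]):
        self.cfg = cfg
        self.param = 0.0
        self.loss_and_grad = loss_and_grad  # in a real loop: sample rollouts, then solver loss
        self.evaluate = evaluate
        self.history: List[Tuple[int, float]] = []  # (step, eval metric)
        self.checkpoints: Dict[int, float] = {}     # step -> saved parameter

    def train(self) -> float:
        for step in range(1, self.cfg.num_steps + 1):
            loss, grad = self.loss_and_grad(self.param)
            self.param -= self.cfg.lr * grad          # optimizer step (plain SGD)
            if step % self.cfg.eval_every == 0:
                self.history.append((step, self.evaluate(self.param)))
            if step % self.cfg.checkpoint_every == 0:
                self.checkpoints[step] = self.param   # real code would save weights/optimizer state
        return self.param
```

With the default config and the quadratic loss `(p - 3) ** 2` (gradient `2 * (p - 3)`), each step maps `p` to `0.8 * p + 0.6`, so the parameter approaches 3; evaluation runs at steps 5, 10, 15, and 20, and checkpoints are saved at steps 10 and 20.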