Customize Training
To go beyond the built-in training classes covered in the quick start training guide, see the guides below. Each one covers a component you can customize:
1) System: How models run and interact
Loads models and tokenizers, performs one rollout step (tester and target generation), computes log-probs (for DPO, PPO, etc.), advances the conversation state, and defines the reward.
- Guide: System Customization
2) Sampler: How data is collected
Defines how tester–target interactions are generated and structured for training/eval (e.g., single-path vs. tree rollouts), what per-step data is stored, and what the solver receives.
- Guide: Sampler Customization
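The difference between single-path and tree rollouts can be illustrated with a minimal sketch. Everything here (`Node`, `rollout`, the toy tester/target/reward callables) is hypothetical and written for illustration only; it is not the library's sampler API, which the guide above documents.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One tester/target exchange plus the rollouts that continue from it."""
    context: str      # conversation so far
    probe: str        # tester utterance
    response: str     # target reply
    reward: float     # per-step data stored for the solver
    children: List["Node"] = field(default_factory=list)


def rollout(
    context: str,
    depth: int,
    width: int,
    tester: Callable[[str], str],
    target: Callable[[str], str],
    reward_fn: Callable[[str, str, str], float],
) -> List[Node]:
    """Expand `width` branches per turn for `depth` turns.

    width=1 gives a single path; width>1 gives a tree.
    """
    if depth == 0:
        return []
    nodes = []
    for _ in range(width):
        probe = tester(context)
        response = target(context + probe)
        node = Node(context, probe, response, reward_fn(context, probe, response))
        # Continue the conversation from the extended context.
        node.children = rollout(
            context + probe + response, depth - 1, width, tester, target, reward_fn
        )
        nodes.append(node)
    return nodes


def count_nodes(nodes: List[Node]) -> int:
    """Total number of exchanges stored in a rollout graph."""
    return sum(1 + count_nodes(n.children) for n in nodes)


# Deterministic stand-ins for real models (assumptions, not real components).
def toy_tester(ctx: str) -> str:
    return " Q?"


def toy_target(ctx: str) -> str:
    return " A."


def toy_reward(ctx: str, probe: str, response: str) -> float:
    return float(len(response))


single_path = rollout("Hi.", 3, 1, toy_tester, toy_target, toy_reward)
tree = rollout("Hi.", 3, 2, toy_tester, toy_target, toy_reward)
```

With a depth of 3, the single-path rollout stores 3 exchanges while the width-2 tree stores 14, which is the kind of trade-off a custom sampler controls.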
3) Scorers: How we define and measure harm
Score each target generation with a scalar harm value. Instantiate the scorer inside your System so rollouts can use it directly.
- Guide: Scorer Customization
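As a rough illustration of what "scalar harm" means in practice, the sketch below defines a toy scorer. The `Scorer` base class, its `score` method, and `KeywordScorer` are all hypothetical names chosen for this example; the actual interface is described in the guide above, and a real scorer would typically wrap a trained classifier rather than a keyword list.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Scorer(ABC):
    """Hypothetical interface: map each text to one scalar harm score."""

    @abstractmethod
    def score(self, texts: Sequence[str]) -> List[float]:
        ...


class KeywordScorer(Scorer):
    """Toy scorer: fraction of flagged keywords that appear in the text."""

    def __init__(self, keywords: Sequence[str]) -> None:
        if not keywords:
            raise ValueError("keywords must be non-empty")
        self.keywords = [k.lower() for k in keywords]

    def score(self, texts: Sequence[str]) -> List[float]:
        scores = []
        for text in texts:
            lowered = text.lower()
            hits = sum(1 for k in self.keywords if k in lowered)
            scores.append(hits / len(self.keywords))
        return scores


scorer = KeywordScorer(["attack", "exploit"])
scores = scorer.score(
    ["How do I bake bread?", "Exploit this to attack the server."]
)
```

Keeping the scorer batch-oriented (a list in, a list out) makes it straightforward to call once per rollout step on all target generations.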
4) Solvers (Algorithms): How the tester learns
Consume rollout graphs, flatten them into per-sample steps, collate those steps into batches, and compute the training loss along with any logged metrics.
- Guide: Solver Customization
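The flatten, collate, and loss stages can be sketched in plain Python. This is an illustrative assumption rather than the library's solver interface: `Node`, `flatten`, `collate`, and `reinforce_loss` are hypothetical names, the loss is a simple REINFORCE-style surrogate swapped in for brevity (not DPO or PPO), and log-probs are plain floats where a real solver would use tensors that carry gradients.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """One step in a rollout graph (hypothetical structure)."""
    probe: str
    log_prob: float   # log-prob of `probe` under the tester, assumed precomputed
    reward: float
    children: List["Node"] = field(default_factory=list)


def flatten(nodes: List[Node]) -> Iterator[Node]:
    """Depth-first walk that yields every step in the graph."""
    for node in nodes:
        yield node
        yield from flatten(node.children)


def collate(steps: List[Node]) -> Dict[str, list]:
    """Turn a list of steps into a batch of parallel columns."""
    return {
        "probe": [s.probe for s in steps],
        "log_prob": [s.log_prob for s in steps],
        "reward": [s.reward for s in steps],
    }


def reinforce_loss(batch: Dict[str, list]) -> Tuple[float, Dict[str, float]]:
    """REINFORCE-style surrogate: -mean(reward * log_prob), plus logs."""
    n = len(batch["reward"])
    weighted = [r * lp for r, lp in zip(batch["reward"], batch["log_prob"])]
    loss = -sum(weighted) / n
    logs = {"mean_reward": sum(batch["reward"]) / n}
    return loss, logs


graph = [
    Node("a", -1.0, 0.5, children=[Node("b", -2.0, 1.0), Node("c", -0.5, 0.0)])
]
steps = list(flatten(graph))
batch = collate(steps)
loss, logs = reinforce_loss(batch)
```

Separating the three stages this way means a different algorithm only needs to change what `flatten` emits per step (for example, preference pairs) and how the loss is computed.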
5) Trainer: How the training loop runs
Orchestrates the main loop, hyperparameters, optimizer, eval cadence, and checkpointing.
- Guide: Trainer Customization
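A minimal sketch of such a loop, under stated assumptions, might look like the following. `TrainerConfig`, `train`, and the callback arguments are hypothetical names for illustration, not the library's trainer API; here the optimizer update is assumed to live inside the `train_step` callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainerConfig:
    """Hypothetical hyperparameters controlling the loop."""
    total_steps: int = 10
    eval_every: int = 5         # eval cadence, in training steps
    checkpoint_every: int = 5   # checkpoint cadence, in training steps


def train(
    config: TrainerConfig,
    train_step: Callable[[int], float],
    evaluate: Callable[[], float],
    save_checkpoint: Callable[[int], None],
) -> Dict[str, list]:
    """Run the main loop and record what happened at each step."""
    history: Dict[str, list] = {"loss": [], "eval": [], "checkpoints": []}
    for step in range(1, config.total_steps + 1):
        # One sample -> solve -> optimizer update, returning the loss.
        history["loss"].append(train_step(step))
        if step % config.eval_every == 0:
            history["eval"].append((step, evaluate()))
        if step % config.checkpoint_every == 0:
            save_checkpoint(step)
            history["checkpoints"].append(step)
    return history


# Toy callbacks standing in for real training, evaluation, and saving.
saved: List[int] = []
history = train(
    TrainerConfig(total_steps=6, eval_every=2, checkpoint_every=3),
    train_step=lambda step: 1.0 / step,
    evaluate=lambda: 0.0,
    save_checkpoint=saved.append,
)
```

Passing the step, eval, and checkpoint behavior in as callables keeps the loop itself small, so a custom trainer mostly changes the cadence logic and what gets logged.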