Adaptive Stress Testing for Robust AI & Reinforcement Learning (ASTRA-RL)
Welcome to the ASTRA-RL toolbox documentation! It gives an overview of the toolbox, its features, and how to use it.
What is ASTRA-RL?
ASTRA-RL is a Python toolbox for training, evaluating, and analyzing language models and other generative AI systems that take textual inputs. Its focus is on reinforcement-learning-based refinement techniques that improve evaluator model performance.
The toolbox is designed primarily for LM red-teaming: identifying and benchmarking prompts that elicit harmful or otherwise undesirable behavior from target language models. Red-teaming surfaces vulnerabilities and guides fine-tuning to reduce harmful outputs.
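To make the red-teaming idea concrete, here is a minimal, library-free sketch of the scoring step: send each candidate prompt to a target, score the response with a moderator, and rank prompts by how harmful the response was. It does not use the ASTRA-RL API. The names `rank_prompts`, `echo_target`, and `keyword_moderator` are hypothetical stand-ins invented for this illustration, and the keyword-based score is a toy, not a real harm classifier.

```python
from typing import Callable, List, Tuple


def rank_prompts(
    prompts: List[str],
    target: Callable[[str], str],
    moderator: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score each prompt by the moderator's rating of the target's response.

    Returns (prompt, score) pairs sorted from highest to lowest score.
    """
    scored = [(prompt, moderator(target(prompt))) for prompt in prompts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy target: a real target would be a language model.
def echo_target(prompt: str) -> str:
    return prompt.upper()


# Hypothetical toy moderator: fraction of words found in a flagged set.
# A real moderator would be a trained classifier.
def keyword_moderator(text: str) -> float:
    flagged = {"ATTACK", "EXPLOIT"}
    words = text.split()
    return sum(word in flagged for word in words) / max(len(words), 1)


if __name__ == "__main__":
    for prompt, score in rank_prompts(
        ["hello there", "attack now"], echo_target, keyword_moderator
    ):
        print(f"{score:.2f}  {prompt}")
```

Because the target and moderator are passed in as plain callables, either can be swapped without touching the ranking logic, which is the same kind of modularity the toolbox advertises.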
Getting Started
Quick Installation
To get started quickly with ASTRA-RL:
pip install astra-rl
Then import it in your Python code:
import astra_rl
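To confirm that the installation succeeded, a small check like the following may help. It is a sketch that relies only on the Python standard library; `check_install` is a hypothetical helper written for this example, not part of ASTRA-RL. It assumes the distribution name `astra-rl` and the module name `astra_rl` shown above.

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def check_install(module_name: str, dist_name: str) -> Tuple[bool, Optional[str]]:
    """Return (importable, version) for a top-level module and its distribution."""
    # find_spec locates a top-level module without importing it.
    importable = importlib.util.find_spec(module_name) is not None
    try:
        # Look up the version recorded in the installed package metadata.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    importable, version = check_install("astra_rl", "astra-rl")
    print(f"astra_rl importable: {importable}, installed version: {version}")
```

If the module is not importable or no version is reported, the package is probably installed in a different environment than the one running the script.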
For detailed installation instructions, including development setup, see the Installation Guide.
Quick Links
- Installation Guide - Detailed installation and setup instructions
- Tutorials - Step-by-step guides for common tasks
- API Reference - Detailed documentation of all classes and functions
Key Features
- Modular Architecture: Easily swap components for your specific use case
- Pre-built Algorithms: Support for PPO, DPO, and IPO out of the box
- Multiple Moderators: Integration with Llama-Guard 3, Detoxify, and custom moderators
- HuggingFace Compatible: Seamless integration with HuggingFace models
- Extensible Framework: Build custom problems, environments, and solvers
Documentation Structure
- Installation - Setup instructions for users and developers
- Tutorials - Learn how to use ASTRA-RL with hands-on examples
  - Quick Start Training - Train your first red-teaming model
  - Quick Start Evaluation - Evaluate models with pre-trained attackers (work in progress)
  - Customization Guides - Adapt ASTRA-RL to your needs
- API Reference - Complete API documentation
Support
If you encounter any issues or have questions:
- Check the Tutorials for common use cases
- Review the API documentation for detailed information
- Open an issue on GitHub