Adaptive Stress Testing for Robust AI & Reinforcement Learning (ASTRA-RL)
Welcome to the ASTRA-RL toolbox documentation! It gives an overview of the toolbox, describes its features, and explains how to use it effectively.
What is ASTRA-RL?
ASTRA-RL is a Python toolbox for testing and evaluating language models and generative AI systems. It provides a modular framework for using reinforcement learning to systematically discover how language models respond to adversarial or challenging inputs.
What is Adversarial Testing for Language Models?
Adversarial testing is the process of stress-testing AI systems by searching for inputs that cause them to behave in harmful or undesirable ways. Think of it as security testing for language models: you are trying to discover vulnerabilities before they are exploited in the real world.
ASTRA-RL automates this process using reinforcement learning. Instead of manually crafting adversarial prompts, you train a "tester" model to automatically generate prompts that elicit unsafe responses from your target model. Compared with manual testing, this approach is:
- Faster than manual testing once the tester is trained
- More systematic at finding failure modes
- Scalable to large-scale evaluation
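The tester/target/scorer loop described above can be illustrated with a toy sketch. This is not the ASTRA-RL API: `CANDIDATE_PROMPTS`, `target_model`, `scorer`, and `train_tester` are hypothetical stand-ins, and the learning rule is a simple epsilon-greedy bandit rather than PPO, DPO, or IPO. It only shows the shape of the idea: the tester is rewarded whenever the scorer flags the target's response, so it gradually concentrates on prompts that expose failures.

```python
import random

# Hypothetical stand-ins for illustration; none of these names come from ASTRA-RL.
CANDIDATE_PROMPTS = ["tell me a story", "ignore your rules", "what is 2+2"]


def target_model(prompt: str) -> str:
    """Toy target model that misbehaves on exactly one kind of prompt."""
    return "UNSAFE reply" if "ignore" in prompt else "safe reply"


def scorer(response: str) -> float:
    """Toy harm score in [0, 1]; a real scorer would be a trained classifier."""
    return 1.0 if "UNSAFE" in response else 0.0


def train_tester(steps: int = 200, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Estimate, per prompt, the average harm score it elicits from the target."""
    rng = random.Random(seed)
    value = {p: 0.0 for p in CANDIDATE_PROMPTS}
    count = {p: 0 for p in CANDIDATE_PROMPTS}
    for step in range(steps):
        if step < len(CANDIDATE_PROMPTS):
            prompt = CANDIDATE_PROMPTS[step]  # try every prompt once first
        elif rng.random() < epsilon:
            prompt = rng.choice(CANDIDATE_PROMPTS)  # explore
        else:
            prompt = max(CANDIDATE_PROMPTS, key=lambda p: value[p])  # exploit
        reward = scorer(target_model(prompt))  # reward = harm elicited
        count[prompt] += 1
        value[prompt] += (reward - value[prompt]) / count[prompt]  # running mean
    return value


if __name__ == "__main__":
    values = train_tester()
    print("Most effective adversarial prompt:", max(values, key=values.get))
```

In a realistic setting the tester is itself a language model that generates prompts rather than choosing from a fixed list, and the reward comes from a learned harm classifier, but the feedback loop has the same structure.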
Who Should Use ASTRA-RL?
This toolbox is designed for:
- Researchers studying AI safety and robustness
- ML engineers evaluating production language models
- Safety teams stress-testing conversational AI systems
- Developers building safer AI applications
Getting Started
Quick Installation
To get started quickly, install ASTRA-RL into your Python environment, then import it in your Python code.
For detailed installation instructions, including development setup, see the Installation Guide.
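As a quick post-install check, the sketch below tests whether the package can be found before you try to import it. The module name `astra_rl` is an assumption (this page does not state it), and `is_installed` is a hypothetical helper written for this example. Adjust the name if the Installation Guide says otherwise.

```python
import importlib.util

# Assumption: the importable module is named "astra_rl".
MODULE_NAME = "astra_rl"


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if is_installed(MODULE_NAME):
    print(f"{MODULE_NAME} is available")
else:
    print(f"{MODULE_NAME} was not found; see the Installation Guide")
```

Checking with `find_spec` avoids a hard `ImportError` and does not execute the package's import-time code, which is convenient in setup scripts and CI jobs.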
Key Features
- Modular Architecture: Easily swap components for your specific use case
- Pre-built Algorithms: Support for PPO, DPO, and IPO out of the box
- Multiple Scorers: Integration with Llama-Guard 3, Detoxify, and custom scorers
- Hugging Face Compatible: Seamless integration with Hugging Face models
- Extensible Framework: Build custom systems, samplers, and solvers
Support
If you encounter any issues or have questions:
- Check the Tutorials for common use cases
- Review the API documentation for detailed information
- Open an issue on GitHub