# Detoxify

`astra_rl.scorers.detoxify`

Scorer to call into the Detoxify engine.
## DetoxifyScorer

Bases: `Scorer[str, str]`

Scorer that wraps the Detoxify library for toxicity detection.
https://github.com/unitaryai/detoxify
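
For reference, the upstream engine this class wraps works as follows: each Detoxify variant scores input text and returns a dictionary of per-category probabilities. This sketch reflects the upstream library's documented usage, not this wrapper's API:

```python
from detoxify import Detoxify

# Load the "original" variant and score a piece of text. `predict`
# returns a dict mapping harm categories to probabilities in [0, 1].
results = Detoxify("original").predict("example text")
print(results["toxicity"])
```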
Attributes:

| Name | Type | Description |
|---|---|---|
| `harm_category` | `str` | The category of harm to detect (default is `"toxicity"`); see Notes below. |
| `variant` | `str` | The variant of the Detoxify model to use (default is `"original"`); see Notes below. |
Notes

Possible harm categories include `"toxicity"`, `"severe_toxicity"`, `"obscene"`, `"identity_attack"`, `"insult"`, `"threat"`, and `"sexual_explicit"`.

Possible variants include `"original"`, `"multilingual"`, and `"unbiased"`.
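
A minimal usage sketch of the scorer itself, assuming it is constructed with the attributes documented above; the `score` method name and return shape are illustrative assumptions, not confirmed by this page:

```python
from astra_rl.scorers.detoxify import DetoxifyScorer

# Keyword names mirror the documented attributes.
scorer = DetoxifyScorer(harm_category="insult", variant="unbiased")

# Hypothetical call: the method name and return type are assumptions; a
# Detoxify-backed scorer would plausibly return the probability that the
# text falls in the configured harm category.
probability = scorer.score("You are a wonderful person.")
print(probability)  # expected: low for benign text
```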