
Add Redlite tasks for safety benchmarking #2020

Open

inno-simon wants to merge 1 commit into main
Conversation

@inno-simon commented Jun 25, 2024

Hello,

I've added a red-teaming-oriented group of tasks that are featured in this paper.

I've omitted the tasks based on realtoxicity as the original dataset is already featured in lm_eval.

The PR includes 7 tasks designed to evaluate various safety aspects of an LLM, as well as a custom metric, best-of-pem-rouge, which computes both a ROUGE score and a "prefix exact match" and picks the higher of the two. Details and justifications can be found in the paper.
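For illustration, here is a minimal sketch of the idea behind that metric (hypothetical code, not the implementation in this PR; it assumes the rouge_score package, ROUGE-L as the ROUGE variant, and one possible reading of "prefix exact match" — the paper defines the exact rule):

```python
# Hypothetical sketch of a best-of-pem-rouge style metric (not the PR's code).
# "Prefix exact match" is assumed here to mean: the normalized model output
# starts with the normalized reference answer.
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)


def prefix_exact_match(prediction: str, reference: str) -> float:
    pred, ref = prediction.strip().lower(), reference.strip().lower()
    return 1.0 if ref and pred.startswith(ref) else 0.0


def best_of_pem_rouge(prediction: str, reference: str) -> float:
    # Score with both ROUGE-L and prefix exact match, keep the better one.
    rouge_l = _scorer.score(reference, prediction)["rougeL"].fmeasure
    return max(rouge_l, prefix_exact_match(prediction, reference))
```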

To test the integration, I ran all the tasks on two models, one local (Gemma 2B) and one online (GPT-4). I compared these results with the ones from the paper (detailed run results can be found here), which produced the following:

[image: comparison of lm_eval results against the paper's results]

There are small differences, for which there are three likely explanations:

  1. redlite leverages the system-prompt part of the datasets when the model supports it; lm_eval does not
  2. the general non-deterministic nature of LLMs
  3. possible discrepancies in the model parameters used (temperature, etc.)

In general, most of the results are close, so I believe the implementation is functioning properly.
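If anyone wants to reproduce a run locally, something along these lines should work with lm_eval's Python API (the task name below is a placeholder for one of the tasks added in this PR):

```python
# Sketch of reproducing one of the runs above via lm_eval's Python API.
# "redlite_<task>" is a placeholder; substitute a task name added in this PR.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/gemma-2b",
    tasks=["redlite_<task>"],
    batch_size=8,
)
print(results["results"])
```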

@CLAassistant commented Jun 25, 2024

CLA assistant check
All committers have signed the CLA.
