Add Redlite tasks for safety benchmarking #2020
Open
Hello,

I've added a red-teaming-oriented group of tasks that have been featured in this paper. I've omitted the tasks based on `realtoxicity`, as the original dataset is already featured in `lm_eval`.
The PR includes 7 tasks designed to evaluate various safety aspects of an LLM, as well as a custom metric, `best-of-pem-rouge`, which computes both a ROUGE score and a "prefix exact match" and picks the higher value. Details and justifications can be found in the paper.

To test the integration, I ran all the tasks on two models, one local (Gemma 2B) and one online (GPT-4). I compared these results with the ones from the paper (detailed run results can be found here), and it produced the following:
There are small differences, for which there are three likely explanations:

- `redlite` leverages the `system prompt` part of the datasets when the model supports it; `lm_eval` does not.

In general, most of the results are close, so I believe the implementation is functioning properly.
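
For reviewers who don't want to dig into the paper, here is a minimal sketch of how a `best-of-pem-rouge`-style score could be computed. It assumes ROUGE-L as the ROUGE variant, a "prediction starts with the reference" definition of prefix exact match, and the `rouge_score` package; the actual implementation in this PR may differ on all three points, and the helper names below are illustrative only.

```python
# Hypothetical sketch of a best-of-pem-rouge style metric; the ROUGE variant and
# the definition of "prefix exact match" are assumptions, not the PR's code.
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)


def prefix_exact_match(reference: str, prediction: str) -> float:
    # Assumed definition: 1.0 if the generated text starts with the reference
    # answer (after trimming whitespace and lowercasing), else 0.0.
    return float(prediction.strip().lower().startswith(reference.strip().lower()))


def best_of_pem_rouge(reference: str, prediction: str) -> float:
    # Score the pair with both metrics and keep the higher value,
    # as described in the PR text.
    rouge_l = _scorer.score(reference, prediction)["rougeL"].fmeasure
    return max(rouge_l, prefix_exact_match(reference, prediction))


# Example:
# best_of_pem_rouge("I can't help with that.",
#                   "I can't help with that. It would be unsafe.")
# -> 1.0 (the prefix match wins over the partial ROUGE-L overlap)
```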