Dummy token for prompts in HH datasets #308

Open
louieworth opened this issue May 28, 2024 · 2 comments

Comments

@louieworth

I have an issue regarding the Anthropic/hh-rlhf dataset, in reward_dataset.py:

# Anthropic/hh-rlhf
# tasksource/oasst1_pairwise_rlhf_reward
if exist_and_not_none(data, "chosen") and exist_and_not_none(data, "rejected"):
    prompt = data["prompt"] if exist_and_not_none(data, "prompt") else ""
    if prompt.startswith("prompter:"):
        prompt = (
            prompt.replace("prompter:", "\nHuman:\n").replace("assistant:", "\nAssistant:\n")
            + "\nAssistant:\n"
        )
    chosen = data["chosen"]
    reject = data["rejected"]
    input_template = None  # do not modify with the input template again

This will result in prompt="" for every sample, since Anthropic/hh-rlhf only contains the keys {'chosen', 'rejected'}. Will this cause an issue in accounting for the prompt logits in the following code:

mask[:source_len] = False
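
For concreteness, here is a minimal sketch (hypothetical shapes and tokenizer, not the actual OpenRLHF code) of what that mask does when the prompt is empty:

import torch

# With Anthropic/hh-rlhf the full conversation lives in "chosen"/"rejected",
# so prompt == "" and the tokenized prompt length is (near) zero.
seq_len = 10
source_len = 0  # e.g. len(tokenizer("")["input_ids"]) in this sketch

mask = torch.ones(seq_len, dtype=torch.bool)
mask[:source_len] = False  # masks nothing when source_len == 0

print(mask)  # all True: the conversation history is *not* excluded from the loss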

@hijkzzz
Collaborator

hijkzzz commented May 29, 2024

Theoretically, whether the prompt is masked or not has no effect on the DPO loss.
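
A toy numeric check (illustrative only; dyadic numbers so the equality is exact) showing that a prompt term shared by chosen and rejected cancels in the pairwise margin that DPO feeds into the sigmoid:

import math

beta = 0.1
# Per-sequence sums of (policy - reference) token log-probs, split into a
# prompt part shared by both responses and a response-only part.
prompt_term = -3.0                       # same prompt => same term in both sums
chosen_resp, rejected_resp = -1.25, -2.5

with_prompt = (prompt_term + chosen_resp) - (prompt_term + rejected_resp)
without_prompt = chosen_resp - rejected_resp
assert with_prompt == without_prompt  # prompt term cancels exactly

# DPO loss -log(sigmoid(beta * margin)) is therefore identical either way.
loss = -math.log(1.0 / (1.0 + math.exp(-beta * with_prompt)))
print(loss)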

@louieworth
Author

It seems true, as the prompt logits will be canceled out in (chosen_logits - rejected_logits). Will this affect other algorithms, such as KTO?
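
For reference, a rough per-sample sketch (my own toy illustration, hypothetical numbers) of why the same cancellation does not obviously happen in a KTO-style loss, which scores each example against a batch-level reference point rather than a paired difference:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

beta, z_ref = 0.1, 0.2   # z_ref: batch-level reference point (KL estimate in KTO)
prompt_term = -3.0       # (policy - reference) log-prob of the prompt tokens
resp_margin = -1.25      # (policy - reference) log-prob of the response tokens

# Per-sample value for a desirable example, with and without prompt tokens:
v_with = sigmoid(beta * ((prompt_term + resp_margin) - z_ref))
v_without = sigmoid(beta * (resp_margin - z_ref))
print(v_with, v_without)  # they differ: no paired term subtracts the prompt away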
