How are prompts being picked up? #24

Open
mohdsanadzakirizvi opened this issue May 29, 2024 · 5 comments

Comments

@mohdsanadzakirizvi

I am trying to experiment with prompts, but I'm unable to tell whether the system is picking up my changed prompts.

  1. I have overwritten the "prompt_file" for my experiment (found by checking the base_config for my experiment) in the prompts/ folder.
  2. I am running IRCOT with flan-t5-base and have added "print" statements in llm_server/serve.py and commaqa/inference/ircot.py, but I still don't see the prompts in the output.
  3. Are prompts dynamically read along with the dataset at run time?
  4. What should I change if I want to change the prompt for a particular experiment?
@mohdsanadzakirizvi
Author

The following is my experiment script; I am using only prompt_set 1 and skipping the test evals for quicker prototyping:

#!/usr/bin/env bash

# Expected command line argument values.
valid_systems=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
valid_models=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
valid_datasets=("hotpotqa" "2wikimultihopqa" "musique" "iirc")

# Function to check if an argument is valid
check_argument() {
    local arg="$1"
    local position="$2"
    local valid_values=("${!3}")
    if ! [[ " ${valid_values[*]} " =~ " $arg " ]]; then
        echo "argument number $position is not a valid. Please provide one of: ${valid_values[*]}"
        exit 1
    fi

    # Inside a function, $1 refers to the function's own first argument (i.e. $arg),
    # so cross-check the model against the system via the $system variable saved below.
    if [[ $position -eq 2 && $arg == "none" && $system != "oner" ]]; then
        echo "The model argument can be 'none' only if the system argument is 'oner'."
        exit 1
    fi
}

# Check the number of arguments
if [[ $# -ne 3 ]]; then
    echo "Error: Invalid number of arguments. Expected format: ./reproduce.sh SYSTEM MODEL DATASET"
    exit 1
fi

# Check the validity of arguments
system="$1"  # saved so check_argument can cross-check the model against the system
check_argument "$1" 1 valid_systems[*]
check_argument "$2" 2 valid_models[*]
check_argument "$3" 3 valid_datasets[*]

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1

echo ">>>> Pick the best HP and save the config with that HP. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1 --best

@mohdsanadzakirizvi
Author

@HarshTrivedi did you get a chance to look at this issue? Any help is greatly appreciated!

@HarshTrivedi
Member

@mohdsanadzakirizvi Sorry for the late response.

What you are doing seems correct. But to check whether the prompt change is actually taking effect, you should put the breakpoint/print statement elsewhere. Put a breakpoint here and ensure that self.prompt is what you expect it to be.
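
A small sanity check that can go alongside that breakpoint (a sketch only; the PROMPT_FILE path is a placeholder, not the repo's actual layout): print the head of the prompt file you edited so you can compare it against whatever self.prompt contains.

# Print the size and first 500 characters of the edited prompt file, to compare
# against self.prompt at the breakpoint. PROMPT_FILE is a placeholder; use the
# prompt_file value from your experiment's base config.
PROMPT_FILE="prompts/<path_from_your_base_config>"
wc -c "$PROMPT_FILE"
head -c 500 "$PROMPT_FILE"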

I left some notes about how to navigate the code/flow better here some time ago. They should be helpful for figuring out where the relevant code is based on the config.

Lastly, note that the command python runner.py $1 $2 $3 write --prompt_set 1 is responsible for instantiating the base config with HPs and dumping them in instantiated_configs/. So if you make any change in the base config, make sure the write command is called again. If the above doesn't work as expected, a good sanity check is to see whether the instantiated configs reflect the change you made in the base config.
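
For that sanity check, something along these lines works (a sketch, assuming the instantiated configs are plain-text files containing a "prompt_file" entry; adjust the directory and pattern to your checkout if they differ):

# After the `write` command has run, confirm the instantiated configs point at the
# edited prompt file. Assumes plain-text configs under instantiated_configs/.
grep -rn "prompt_file" instantiated_configs/ | head -n 20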

@mohdsanadzakirizvi
Author

Thanks for your response! Another question I had was: how are the dev and eval sets picked up? I have seen the subsampled files in the folders, but where do you read them? It seems to me that inference/ircot.py's "StepByStepCOT..." only gets executed when I call predict on the eval set:

python runner.py $1 $2 $3 predict --prompt_set 1 --best --eval_test --official

That doesn't make sense to me. Shouldn't it also run during predict on the dev set? If so, why am I not getting any print output? Are we not making an LLM prediction on the dev set unless eval is involved?

@HarshTrivedi
Member

Just to clarify: if you leave out the --best --eval_test in that command, it'll run prediction for all instantiated configs on the dev set. The --best refers to the best HP on the dev set, and --eval_test refers to the test set (as opposed to the default dev set). The --official flag is to use the officially released eval scripts of the respective datasets (instead of internally developed ones).
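
Concretely, based on the flag descriptions above (the system/model/dataset values here are just placeholders taken from the valid lists in the script earlier in this thread):

# Run prediction for all instantiated configs on the (default) dev set:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1

# Run prediction on the test set with only the best HP found on dev,
# scored with the official per-dataset eval scripts:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1 --best --eval_test --official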

It seems you already know this. So when you drop those additional flags and still don't hit any log/breakpoint in StepByStepCOTGen, it might mean that there are no HPs set up to run.

Note that runner.py is a wrapper around run.py for running a batch of experiments. Whenever you run runner.py, it emits the run.py commands it runs. run.py is in turn a wrapper around a collection of scripts like predict.py, evaluate.py, etc., and it also emits the exact script it calls. It will be something like python predict.py {file_path} {evaluation_path} for the predict command, all the way down. For debugging purposes, going down to that root level may be easier.
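
For example, one way to drop down to that level (a sketch; the predict.py arguments are placeholders to be copied from what run.py actually prints in your run):

# 1. Capture the run.py / predict.py commands that runner.py emits:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1 2>&1 | tee runner_commands.log

# 2. Re-run the emitted predict.py command by hand under the debugger
#    (both arguments are placeholders copied from the log):
python -m pdb predict.py <file_path_from_log> <evaluation_path_from_log>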
