How are prompts being picked up? #24

Open
mohdsanadzakirizvi opened this issue May 29, 2024 · 5 comments

Comments

@mohdsanadzakirizvi

I am trying to experiment with prompts, but I'm unable to tell whether the system is picking up my changed prompts.

  1. I have overwritten the "prompt_file" for my experiment (found by checking the base_config for my experiment) in the prompts/ folder.
  2. I am running IRCOT with flan-t5-base and have added "print" statements in llm_server/serve.py and commaqa/inference/ircot.py, but I still don't see the prompts in the output.
  3. Are prompts dynamically read along with the dataset at run time?
  4. What should I change if I want to change the prompt for a particular experiment?
@mohdsanadzakirizvi
Author

The following is my experiment script; I am using only prompt_set 1 and skipping the test evals for quicker prototyping:

#!/usr/bin/env bash

# Expected command line argument values.
valid_systems=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
valid_models=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
valid_datasets=("hotpotqa" "2wikimultihopqa" "musique" "iirc")

# Function to check if an argument is valid
check_argument() {
    local arg="$1"
    local position="$2"
    local valid_values=("${!3}")
    if ! [[ " ${valid_values[*]} " =~ " $arg " ]]; then
        echo "argument number $position is not a valid. Please provide one of: ${valid_values[*]}"
        exit 1
    fi

    # Inside a function, $1 refers to the function's own first argument (i.e. $arg),
    # so cross-check the model against the system via the $system variable saved below.
    if [[ $position -eq 2 && $arg == "none" && $system != "oner" ]]; then
        echo "The model argument can be 'none' only if the system argument is 'oner'."
        exit 1
    fi
}

# Check the number of arguments
if [[ $# -ne 3 ]]; then
    echo "Error: Invalid number of arguments. Expected format: ./reproduce.sh SYSTEM MODEL DATASET"
    exit 1
fi

# Check the validity of arguments
system="$1"  # saved so check_argument can cross-check the model against the system
check_argument "$1" 1 valid_systems[*]
check_argument "$2" 2 valid_models[*]
check_argument "$3" 3 valid_datasets[*]

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1

echo ">>>> Pick the best HP and save the config with that HP. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1 --best

@mohdsanadzakirizvi
Author

@HarshTrivedi did you get a chance to look at this issue? Any help is greatly appreciated!

@HarshTrivedi
Member

@mohdsanadzakirizvi Sorry for the late response.

What you are doing seems correct. But to check whether the prompt change is actually taking effect, you should put the breakpoint/print statement elsewhere. Put a breakpoint here and ensure that self.prompt is what you expect it to be.
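
A small sanity check that can go alongside that breakpoint (a sketch only; the PROMPT_FILE path is a placeholder, not the repo's actual layout): print the head of the prompt file you edited so you can compare it against whatever self.prompt contains.

# Print the size and first 500 characters of the edited prompt file, to compare
# against self.prompt at the breakpoint. PROMPT_FILE is a placeholder; use the
# prompt_file value from your experiment's base config.
PROMPT_FILE="prompts/<path_from_your_base_config>"
wc -c "$PROMPT_FILE"
head -c 500 "$PROMPT_FILE"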

I left some notes about how to navigate the code/flow better here some time ago. They should be helpful for figuring out where the relevant code is based on the config.

Lastly, note that the command python runner.py $1 $2 $3 write --prompt_set 1 is responsible for instantiating the base config with HPs and dumping them in instantiated_configs/. So if you make any change in the base config, make sure the write command is called again. If the above doesn't work as expected, a good sanity check is to see whether the instantiated configs reflect the change you made in the base config.
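
For that sanity check, something along these lines works (a sketch, assuming the instantiated configs are plain-text files containing a "prompt_file" entry; adjust the directory and pattern to your checkout if they differ):

# After the `write` command has run, confirm the instantiated configs point at the
# edited prompt file. Assumes plain-text configs under instantiated_configs/.
grep -rn "prompt_file" instantiated_configs/ | head -n 20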

@mohdsanadzakirizvi
Author

Thanks for your response! Another question I had was: how are the dev and eval sets picked up? I have seen the subsampled files in the folders, but where do you read them? It seems to me that inference/ircot.py's "StepByStepCOT..." only gets executed when I call predict on the eval set:

python runner.py $1 $2 $3 predict --prompt_set 1 --best --eval_test --official

That doesn't make sense to me. Shouldn't it also run during predict on the dev set? If so, why am I not getting any print output? Are we not making an LLM prediction on the dev set unless eval is involved?

@HarshTrivedi
Member

Just to clarify: if you leave out the --best --eval_test in that command, it'll run prediction for all instantiated configs on the dev set. The --best refers to the best HP on the dev set, and --eval_test refers to the test set (as opposed to the default dev set). The --official flag is to use the officially released eval scripts of the respective datasets (instead of internally developed ones).
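
Concretely, based on the flag descriptions above (the system/model/dataset values here are just placeholders taken from the valid lists in the script earlier in this thread):

# Run prediction for all instantiated configs on the (default) dev set:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1

# Run prediction on the test set with only the best HP found on dev,
# scored with the official per-dataset eval scripts:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1 --best --eval_test --official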

It seems you already know this. So when you drop those additional flags and still don't hit any log/breakpoint in StepByStepCOTGen, it might mean that there are no HPs set up to run.

Note that runner.py is a wrapper around run.py for running a batch of experiments. Whenever you run runner.py, it emits the run.py commands it runs. run.py is in turn a wrapper around a collection of scripts like predict.py, evaluate.py, etc., and it also emits the exact script it calls. It will be something like python predict.py {file_path} {evaluation_path} for the predict command, all the way down. For debugging purposes, going down to that root level may be easier.
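
For example, one way to drop down to that level (a sketch; the predict.py arguments are placeholders to be copied from what run.py actually prints in your run):

# 1. Capture the run.py / predict.py commands that runner.py emits:
python runner.py ircot flan-t5-base hotpotqa predict --prompt_set 1 2>&1 | tee runner_commands.log

# 2. Re-run the emitted predict.py command by hand under the debugger
#    (both arguments are placeholders copied from the log):
python -m pdb predict.py <file_path_from_log> <evaluation_path_from_log>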
