Hi! I am running evaluations but keep getting OOM errors. Here is my script:

I am using an 80GB A100. I have already decreased `gpu_memory_utilization` and `max_model_len`, but the problem persists. I have tried Llama3-8B with the same hyperparameters and everything was fine. Can anyone tell me why this happens and how I can solve it? Thanks!
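(The script itself was not captured above. As a rough illustration of the kind of setup being described, here is a minimal sketch assuming the standard vLLM Python API; the model name and parameter values are placeholders, not the actual script:)

```python
# Minimal sketch, assuming the vLLM Python API; not the original script.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder model
    gpu_memory_utilization=0.8,   # lowered from the 0.90 default
    max_model_len=4096,           # reduced context length to shrink the KV cache
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
```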
Not sure why, but my VRAM usage does not suggest that `gpu_memory_utilization` is being respected. I also have issues running a 7B model on an A40 (40GB VRAM), despite having no issues with the HF backend (~28 GB VRAM utilized there). Specifically, it OOMs even before running any requests. I have verified that nothing else is running on the GPU.
However, passing `enforce_eager=True` in `model_args` to the vLLM backend fixes this issue for me by preventing CUDA graphs from being built.
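For reference, a minimal sketch of what that workaround looks like when constructing the engine through the vLLM Python API directly (the model name and other values are placeholders, not a confirmed configuration):

```python
# Minimal sketch, assuming the vLLM Python API; model name and values are
# placeholders. In an evaluation harness that forwards model_args to vLLM,
# enforce_eager=True would be passed there instead.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder model
    enforce_eager=True,           # skip CUDA graph capture; saves memory at some speed cost
    gpu_memory_utilization=0.8,   # fraction of VRAM vLLM is allowed to reserve
    max_model_len=4096,           # cap the context length (bounds KV-cache size)
)
```

CUDA graph capture reserves additional GPU memory on top of the weights and KV cache, so skipping it trades some throughput for headroom, which matches the behavior described above.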