Hi! I am running evaluations but keep getting OOM errors. Here is my script:

I am using an 80GB A100. I have already decreased `gpu_memory_utilization` and `max_model_len`, but the problem persists. I have tried Llama3-8B with the same hyperparameters and everything was fine. Can anyone tell me why this happens and how I can solve it? Thanks!
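(The script itself was not captured above. As a rough illustration of the kind of setup being described, here is a minimal sketch assuming the standard vLLM Python API; the model name and parameter values are placeholders, not the actual script:)

```python
# Minimal sketch, assuming the vLLM Python API; not the original script.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder model
    gpu_memory_utilization=0.8,   # lowered from the 0.90 default
    max_model_len=4096,           # reduced context length to shrink the KV cache
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
```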
Not sure why, but my VRAM usage does not suggest that `gpu_memory_utilization` is being respected. I also have issues running a 7B model on an A40 (40GB VRAM), despite having no issues with the HF backend (~28 GB VRAM utilized there). Specifically, it OOMs even before running any requests. I have verified that nothing else is running on the GPU.
However, passing `enforce_eager=True` in `model_args` to the vLLM backend fixes this issue for me by preventing CUDA graphs from being built.
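For reference, a minimal sketch of what that workaround looks like when constructing the engine through the vLLM Python API directly (the model name and other values are placeholders, not a confirmed configuration):

```python
# Minimal sketch, assuming the vLLM Python API; model name and values are
# placeholders. In an evaluation harness that forwards model_args to vLLM,
# enforce_eager=True would be passed there instead.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder model
    enforce_eager=True,           # skip CUDA graph capture; saves memory at some speed cost
    gpu_memory_utilization=0.8,   # fraction of VRAM vLLM is allowed to reserve
    max_model_len=4096,           # cap the context length (bounds KV-cache size)
)
```

CUDA graph capture reserves additional GPU memory on top of the weights and KV cache, so skipping it trades some throughput for headroom, which matches the behavior described above.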