Triton/vllm_backend launches model on incorrect GPU #7349

Open · tc8 opened this issue Jun 13, 2024 · 0 comments

tc8 commented Jun 13, 2024

Description
My issue is similar to triton-inference-server/tensorrtllm_backend#481 except it's for vllm.

I have the following config.pbtxt:

    backend: "vllm"
    instance_group [
      {
        count: 1,
        kind: KIND_GPU,
        gpus: [1]
      }
    ]

When I check nvidia-smi, I get:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           On  | 00000000:86:00.0 Off |                    0 |
| N/A   43C    P0              74W / 300W |  28465MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  | 00000000:8A:00.0 Off |                    0 |
| N/A   38C    P0              61W / 300W |    681MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Note that I requested the model to run on GPU 1, but nvidia-smi shows that it is running on GPU 0. This is a problem because I have 2 GPUs and want to run one model on each GPU; instead, vLLM/Triton tries to run both models on the same GPU (#0), which causes a CUDA OOM error. This seems related to #6855.

I saw that the latest commit on main made changes related to CUDA device selection, so I patched my Triton server with that code. The logs show that it is picking up device 1, but this does not seem to take effect:
I0613 06:03:52.276264 1 model.py:166] "Detected KIND_GPU model instance, explicitly setting GPU device=1 for vllm_model_1"
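
For reference, below is a minimal sketch of what I would expect KIND_GPU pinning to look like in a Triton Python backend model. This is not the actual vllm_backend model.py; it only uses the documented initialize(args) keys (model_instance_kind, model_instance_device_id) and torch.cuda.set_device, and the vLLM engine setup is omitted entirely:

    # Minimal sketch, not the actual vllm_backend model.py: it only shows the
    # documented initialize(args) keys and how PyTorch could be pinned to the
    # device Triton assigns to a KIND_GPU instance. vLLM engine setup omitted.
    import torch
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def initialize(self, args):
            if args["model_instance_kind"] == "GPU":
                # Triton passes the device id taken from instance_group, e.g. "1".
                device_id = int(args["model_instance_device_id"])
                torch.cuda.set_device(device_id)
                pb_utils.Logger.log_info(
                    f"{args['model_instance_name']}: pinned to GPU {device_id}"
                )

        def execute(self, requests):
            # A real backend would run vLLM inference here; this stub just
            # returns an empty response per request.
            return [pb_utils.InferenceResponse(output_tensors=[]) for _ in requests]

        def finalize(self):
            pass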

Triton Information
24.05

Are you using the Triton container or did you build it yourself?
Container from NGC.

Note that I patched the backend code.

To Reproduce

  1. Run tritonserver with the patched vllm_backend
  2. Use the following config.pbtxt:
    backend: "vllm"
    instance_group [
      {
        count: 1,
        kind: KIND_GPU,
        gpus: [1]
      }
    ]
  3. Check nvidia-smi (or the small diagnostic sketch below) to see which GPU the model was loaded on
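
For a quicker check than scanning the nvidia-smi output by hand, a rough script like the one below (assuming the nvidia-ml-py / pynvml bindings are installed) lists the compute processes per GPU, so the Triton/vLLM PID can be matched against the device requested in config.pbtxt:

    # Rough diagnostic sketch, assuming the nvidia-ml-py (pynvml) package is
    # installed: list every compute process per GPU so the Triton/vLLM PID can
    # be matched against the device requested in config.pbtxt.
    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            print(f"GPU {i}: {pynvml.nvmlDeviceGetName(handle)}")
            for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
                used = proc.usedGpuMemory or 0
                print(f"  pid={proc.pid} memory={used // (1024 * 1024)} MiB")
    finally:
        pynvml.nvmlShutdown()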

Expected behavior
The model should be loaded on GPU 1.
