
[Bug] lmdeploy lite auto_awq: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! #1675

Closed
dawnranger opened this issue May 29, 2024 · 5 comments

@dawnranger

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

An error occurs when I try to run `lmdeploy lite auto_awq` on the Qwen1.5-0.5b-chat model.

Reproduction

lmdeploy lite auto_awq ./Qwen1.5-0.5b-chat
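
For reference, the same quantization can also be invoked from Python through the `auto_awq` API that appears in the traceback below (`lmdeploy/lite/apis/auto_awq.py`). A minimal sketch, assuming the keyword names mirror the CLI flags (not verified against 0.4.2):

```python
# Sketch: calling the AWQ quantization API directly instead of the CLI.
# Keyword names are assumed to mirror the CLI flags
# (--calib-dataset, --calib-samples, --calib-seqlen, --w-bits, --w-group-size, --work-dir).
from lmdeploy.lite.apis.auto_awq import auto_awq

auto_awq(
    './Qwen1.5-0.5b-chat',      # HF model directory to quantize
    work_dir='./lmdeploy_hf',   # where the quantized weights are written
    calib_dataset='c4',
    calib_samples=64,
    calib_seqlen=2048,
    w_bits=4,
    w_group_size=128,
)
```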

Environment

cuda 11.8
python 3.8.12
lmdeploy                      0.4.2
peft                          0.9.0
torch                         2.1.2+cu118
transformers                  4.41.1

Error traceback

lmdeploy lite auto_awq ./Qwen1.5-0.5b-chat --calib-dataset 'c4' --calib-samples 64 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --work-dir ./lmdeploy_hf     
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Move model.embed_tokens to GPU.
Move model.layers.0 to CPU.
Move model.layers.1 to CPU.
Move model.layers.2 to CPU.
Move model.layers.3 to CPU.
Move model.layers.4 to CPU.
Move model.layers.5 to CPU.
Move model.layers.6 to CPU.
Move model.layers.7 to CPU.
Move model.layers.8 to CPU.
Move model.layers.9 to CPU.
Move model.layers.10 to CPU.
Move model.layers.11 to CPU.
Move model.layers.12 to CPU.
Move model.layers.13 to CPU.
Move model.layers.14 to CPU.
Move model.layers.15 to CPU.
Move model.layers.16 to CPU.
Move model.layers.17 to CPU.
Move model.layers.18 to CPU.
Move model.layers.19 to CPU.
Move model.layers.20 to CPU.
Move model.layers.21 to CPU.
Move model.layers.22 to CPU.
Move model.layers.23 to CPU.
Move model.norm to GPU.
Move lm_head to CPU.
Loading calibrate dataset ...
Traceback (most recent call last):
  File "/root/venv_lmdeploy_0.4.2/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/cli/lite.py", line 137, in auto_awq
    auto_awq(**kwargs)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/apis/auto_awq.py", line 96, in auto_awq
    vl_model, model, tokenizer, work_dir = calibrate(model,
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/apis/calibrate.py", line 235, in calibrate
    calib_ctx.calibrate(all_data)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/quantization/calibration.py", line 315, in calibrate
    _ = model(data.to(self.device))
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 978, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
@AllentDan
Collaborator

The Qwen1.5-0.5B-Chat model is not supported by the lmdeploy turbomind backend.

@dawnranger
Author

@AllentDan any plan to support qwen2-0.5b?

@AllentDan
Collaborator

After #1782

@AllentDan
Collaborator

As replied in #1782 (comment), you may try the PyTorch backend to run the Qwen1.5-0.5B-Chat model.
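
A minimal sketch of what that could look like with the lmdeploy `pipeline` API and `PytorchEngineConfig` (available in 0.4.x; treat the exact arguments as an assumption, they are not taken from this thread):

```python
# Sketch: run Qwen1.5-0.5B-Chat on the PyTorch backend instead of turbomind.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline('./Qwen1.5-0.5b-chat',
                backend_config=PytorchEngineConfig(tp=1))  # tp=1: single GPU
print(pipe(['Hello, who are you?']))
```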

@AllentDan
Collaborator

The issue will be closed. Feel free to reopen it.
