lmdeploy lite auto_awq ./Qwen1.5-0.5b-chat --calib-dataset 'c4' --calib-samples 64 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --work-dir ./lmdeploy_hf
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Move model.embed_tokens to GPU.
Move model.layers.0 to CPU.
Move model.layers.1 to CPU.
Move model.layers.2 to CPU.
Move model.layers.3 to CPU.
Move model.layers.4 to CPU.
Move model.layers.5 to CPU.
Move model.layers.6 to CPU.
Move model.layers.7 to CPU.
Move model.layers.8 to CPU.
Move model.layers.9 to CPU.
Move model.layers.10 to CPU.
Move model.layers.11 to CPU.
Move model.layers.12 to CPU.
Move model.layers.13 to CPU.
Move model.layers.14 to CPU.
Move model.layers.15 to CPU.
Move model.layers.16 to CPU.
Move model.layers.17 to CPU.
Move model.layers.18 to CPU.
Move model.layers.19 to CPU.
Move model.layers.20 to CPU.
Move model.layers.21 to CPU.
Move model.layers.22 to CPU.
Move model.layers.23 to CPU.
Move model.norm to GPU.
Move lm_head to CPU.
Loading calibrate dataset ...
Traceback (most recent call last):
File "/root/venv_lmdeploy_0.4.2/bin/lmdeploy", line 8, in <module>
sys.exit(run())
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
args.run(args)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/cli/lite.py", line 137, in auto_awq
auto_awq(**kwargs)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/apis/auto_awq.py", line 96, in auto_awq
vl_model, model, tokenizer, work_dir = calibrate(model,
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/apis/calibrate.py", line 235, in calibrate
calib_ctx.calibrate(all_data)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/lmdeploy/lite/quantization/calibration.py", line 315, in calibrate
_ = model(data.to(self.device))
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 978, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/root/venv_lmdeploy_0.4.2/lib/python3.8/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
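The RuntimeError above means the calibration input IDs and the `embed_tokens` weights live on different devices: the log shows `model.embed_tokens` was moved to the GPU, while the tensor reaching `F.embedding` is still on the CPU. The general PyTorch pattern for avoiding this class of error (a sketch with a toy embedding, not lmdeploy's internal code; sizes are illustrative) is to move the input to the device of the module's parameters before the forward pass:

```python
import torch
import torch.nn as nn

# Toy embedding standing in for model.embed_tokens.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Token IDs standing in for the calibration batch; by default on CPU.
ids = torch.tensor([[1, 2, 3]])

# Move the input to wherever the embedding weights actually are,
# instead of assuming a fixed device string.
ids = ids.to(emb.weight.device)

out = emb(ids)
print(out.shape)  # torch.Size([1, 3, 4])
```

If the embedding had been moved to `cuda:0` while `ids` stayed on the CPU, the forward pass would raise exactly the device-mismatch error seen in the traceback; querying `emb.weight.device` sidesteps that regardless of how layers were distributed across CPU and GPU.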
Checklist
Describe the bug
Running `lmdeploy lite auto_awq` on the Qwen1.5-0.5b-chat model fails with a device-mismatch RuntimeError during calibration.
Reproduction
lmdeploy lite auto_awq ./Qwen1.5-0.5b-chat
Environment
Error traceback