
2badd76 breaks examples.models.llama2.export_llama #3983

Open
amqdn opened this issue Jun 14, 2024 · 3 comments

Comments

@amqdn

amqdn commented Jun 14, 2024

Hello!

Commit 2badd76 appears to break examples.models.llama2.export_llama, specifically with Llama 3.

Expected Behavior

[INFO 2024-06-14 16:04:23,366 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:04:23,366 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:04:25,619 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:04:25,920 config.py:58] PyTorch version 2.4.0.dev20240507+cpu available.
linear: layers.0.attention.wq, in=4096, out=4096
linear: layers.0.attention.wk, in=4096, out=1024
linear: layers.0.attention.wv, in=4096, out=1024
linear: layers.0.attention.wo, in=4096, out=4096
linear: layers.0.feed_forward.w1, in=4096, out=14336
linear: layers.0.feed_forward.w2, in=14336, out=4096
linear: layers.0.feed_forward.w3, in=4096, out=14336

...

modelname: llama3
output_file: llama3.pte
[INFO 2024-06-14 16:10:11,610 utils.py:114] Saved exported program to llama3.pte

Current Behavior

[INFO 2024-06-14 16:15:24,437 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:15:24,437 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:15:24,834 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:15:24,834 builder.py:197] model.to torch.float32
Killed

Steps to Reproduce

  1. Install ExecuTorch w/ XNNPACK from scratch:
git clone --branch main https://github.com/pytorch/executorch.git
cd executorch

git submodule sync
git submodule update --init

./install_requirements.sh --pybind xnnpack
  2. Install examples.models.llama2 dependencies:
./examples/models/llama2/install_requirements.sh
  3. (Optional) Download Meta Llama 3, if necessary.
  4. Verify export fails @ main:
python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w  --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3.pte"
  5. Verify export succeeds @ parent fbbba34:
git checkout fbbba34
# Repeat export command
  6. Verify export fails @ 2badd76:
git checkout 2badd76
# Repeat export command
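
To make the failure mode easier to diagnose, here is a rough way to record the export's peak resident memory while reproducing. This is a minimal sketch, not part of the repro above: it assumes psutil is installed (pip install psutil), and the checkpoint/params paths are placeholders to replace with your own.

# monitor_export_rss.py -- sketch: launch the export and poll its RSS.
import subprocess
import sys
import time

import psutil  # assumption: installed separately via `pip install psutil`

# Same flags as the repro command above; replace the placeholder paths.
cmd = [
    sys.executable, "-m", "examples.models.llama2.export_llama",
    "--checkpoint", "<consolidated.00.pth>",
    "-p", "<params.json>",
    "-kv", "--use_sdpa_with_kv_cache", "-X",
    "-qmode", "8da4w", "--group_size", "128", "-d", "fp32",
    "--metadata", '{"get_bos_id":128000, "get_eos_id":128001}',
    "--embedding-quantize", "4,32",
    "--output_name=llama3.pte",
]

proc = subprocess.Popen(cmd)
ps = psutil.Process(proc.pid)
peak = 0
while proc.poll() is None:
    try:
        rss = ps.memory_info().rss
        rss += sum(c.memory_info().rss for c in ps.children(recursive=True))
        peak = max(peak, rss)
    except psutil.NoSuchProcess:
        pass
    time.sleep(0.5)

# A negative exit code (e.g. -9) means the process was killed by a signal,
# which is consistent with the kernel's OOM killer.
print(f"exit code: {proc.returncode}, peak RSS ~{peak / 2**30:.1f} GiB")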

Request

Please examine 2badd76 and verify the problem. If confirmed, please fix. Thanks!

@AgainstEntropy

This behavior (the process printing only Killed) may indicate that an OOM error has occurred: the kernel kills the process when the system runs out of memory.
FYI, on my machine, exporting llama2-7B with the flags -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 requires ~30 GB of RAM.
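
For a rough sense of scale, a back-of-envelope estimate (assuming ~8 billion parameters for the Llama 3 8B checkpoint in the failing log; the numbers are approximations, not measurements):

# Weight memory alone, before any export-time overhead.
params = 8.0e9                    # approximate parameter count, Llama 3 8B
bf16_gib = params * 2 / 2**30     # bfloat16: 2 bytes per parameter
fp32_gib = params * 4 / 2**30     # float32:  4 bytes per parameter

print(f"bf16 weights: ~{bf16_gib:.0f} GiB")  # ~15 GiB
print(f"fp32 weights: ~{fp32_gib:.0f} GiB")  # ~30 GiB

The failing log shows model.to torch.float32 immediately before Killed, and the fp32 copy of the weights alone is already around 30 GiB, so a machine with roughly 32 GB of RAM could plausibly run out of memory at that step.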

@cbilgin

cbilgin commented Jun 24, 2024

@larryliu0820 I guess 2badd76 is your PR. Any chance you know what's going on here?

@larryliu0820
Contributor

@cccclai can you take a look? It seems the PR causes an OOM.
