
2badd76 breaks examples.models.llama2.export_llama #3983

Open
amqdn opened this issue Jun 14, 2024 · 3 comments

Comments

@amqdn

amqdn commented Jun 14, 2024

Hello!

Commit 2badd76 appears to break examples.models.llama2.export_llama, specifically with Llama 3.

Expected Behavior

[INFO 2024-06-14 16:04:23,366 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:04:23,366 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:04:25,619 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:04:25,920 config.py:58] PyTorch version 2.4.0.dev20240507+cpu available.
linear: layers.0.attention.wq, in=4096, out=4096
linear: layers.0.attention.wk, in=4096, out=1024
linear: layers.0.attention.wv, in=4096, out=1024
linear: layers.0.attention.wo, in=4096, out=4096
linear: layers.0.feed_forward.w1, in=4096, out=14336
linear: layers.0.feed_forward.w2, in=14336, out=4096
linear: layers.0.feed_forward.w3, in=4096, out=14336

...

modelname: llama3
output_file: llama3.pte
[INFO 2024-06-14 16:10:11,610 utils.py:114] Saved exported program to llama3.pte

Current Behavior

[INFO 2024-06-14 16:15:24,437 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:15:24,437 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:15:24,834 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:15:24,834 builder.py:197] model.to torch.float32
Killed

Steps to Reproduce

  1. Install ExecuTorch w/ XNNPACK from scratch:
git clone --branch main https://github.com/pytorch/executorch.git
cd executorch

git submodule sync
git submodule update --init

./install_requirements.sh --pybind xnnpack
  2. Install examples.models.llama2 dependencies:
./examples/models/llama2/install_requirements.sh
  3. (Optional) Download Meta Llama 3, if necessary.
  4. Verify export fails @ main:
python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w  --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3.pte"
  5. Verify export succeeds @ parent fbbba34:
git checkout fbbba34
# Repeat export command
  6. Verify export fails @ 2badd76:
git checkout 2badd76
# Repeat export command
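
To make the failure mode easier to diagnose, here is a rough way to record the export's peak resident memory while reproducing. This is a minimal sketch, not part of the repro above: it assumes psutil is installed (pip install psutil), and the checkpoint/params paths are placeholders to replace with your own.

# monitor_export_rss.py -- sketch: launch the export and poll its RSS.
import subprocess
import sys
import time

import psutil  # assumption: installed separately via `pip install psutil`

# Same flags as the repro command above; replace the placeholder paths.
cmd = [
    sys.executable, "-m", "examples.models.llama2.export_llama",
    "--checkpoint", "<consolidated.00.pth>",
    "-p", "<params.json>",
    "-kv", "--use_sdpa_with_kv_cache", "-X",
    "-qmode", "8da4w", "--group_size", "128", "-d", "fp32",
    "--metadata", '{"get_bos_id":128000, "get_eos_id":128001}',
    "--embedding-quantize", "4,32",
    "--output_name=llama3.pte",
]

proc = subprocess.Popen(cmd)
ps = psutil.Process(proc.pid)
peak = 0
while proc.poll() is None:
    try:
        rss = ps.memory_info().rss
        rss += sum(c.memory_info().rss for c in ps.children(recursive=True))
        peak = max(peak, rss)
    except psutil.NoSuchProcess:
        pass
    time.sleep(0.5)

# A negative exit code (e.g. -9) means the process was killed by a signal,
# which is consistent with the kernel's OOM killer.
print(f"exit code: {proc.returncode}, peak RSS ~{peak / 2**30:.1f} GiB")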

Request

Please examine 2badd76 and verify the problem. If confirmed, please fix. Thanks!

@AgainstEntropy

This behavior (the process printing only Killed) may indicate that an OOM error has occurred: the kernel kills the process when the system runs out of memory.
FYI, on my machine, exporting llama2-7B with the flags -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 requires ~30 GB of RAM.
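
For a rough sense of scale, a back-of-envelope estimate (assuming ~8 billion parameters for the Llama 3 8B checkpoint in the failing log; the numbers are approximations, not measurements):

# Weight memory alone, before any export-time overhead.
params = 8.0e9                    # approximate parameter count, Llama 3 8B
bf16_gib = params * 2 / 2**30     # bfloat16: 2 bytes per parameter
fp32_gib = params * 4 / 2**30     # float32:  4 bytes per parameter

print(f"bf16 weights: ~{bf16_gib:.0f} GiB")  # ~15 GiB
print(f"fp32 weights: ~{fp32_gib:.0f} GiB")  # ~30 GiB

The failing log shows model.to torch.float32 immediately before Killed, and the fp32 copy of the weights alone is already around 30 GiB, so a machine with roughly 32 GB of RAM could plausibly run out of memory at that step.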

@cbilgin

cbilgin commented Jun 24, 2024

@larryliu0820 I guess 2badd76 is your PR. Any chance you know what's going on here?

@larryliu0820
Contributor

@cccclai can you take a look? It seems the PR causes an OOM.
