# Benchmark

## Table of Contents

- [Parameter Settings](#parameter-settings)
- [Quantization](#quantization)
- [Model Type & Max Length](#model-type--max-length)
- [Batch Size](#batch-size)
- [Use Flash Attn & Gradient Checkpointing](#use-flash-attn--gradient-checkpointing)
- [LoRA Rank & LoRA Target Modules](#lora-rank--lora-target-modules)
- [Gradient Accumulation Steps](#gradient-accumulation-steps)
- [Tuners](#tuners)
- [unsloth](#unsloth)
- [Export](#export)
- [AWQ](#awq)
- [AQLM](#aqlm)
- [Sequence Parallel](#sequence-parallel)

## Parameter Settings

Experimental environment:

- A100
- CUDA 11.8
- python 3.10
- torch 2.1.1
- flash_attn 2.3.4
- xformers 0.0.23
- auto_gptq 0.5.1
- bitsandbytes 0.41.3.post2

The following command-line settings are shared by all experiments:

```bash
    --dataset_test_ratio 0 \
    --dataset cls-fudan-news-zh \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
```

Unless otherwise specified, the following default values are used:

```bash
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16 \
```

Token statistics of the corresponding test dataset (computed with the qwen tokenizer): 3234.4±2547.5, min=91, max=19548.

The experimental script can be found in `scripts/benchmark/test_memory_time/`.
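
Putting the shared settings and the defaults together, a single benchmark run looks roughly like the sketch below; the model type, `--sft_type`, and the swept parameter vary per experiment:

```bash
# Minimal sketch of one benchmark run assembled from the shared flags and defaults above.
# qwen-7b-chat and --sft_type lora stand in for whatever a given experiment sweeps.
swift sft \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --dataset cls-fudan-news-zh \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16
```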

## Quantization

The test script is:

```bash
swift sft \
    --model_type {MODEL_TYPE} \
    --quantization_bit {QUANTIZATION_BIT} \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | Quantization | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ------------ | -------------------------- | ---------------- |
| qwen-7b-chat | bf16 | 4.31 | 27.74 |
| | int4 (gptq) | 2.05 | 19.21 |
| | int8 (gptq) | 1.97 | 22.20 |
| | int4 (bnb) | 2.41 | 23.85 |
| qwen-14b-chat | bf16 | 2.60 | 40.14 |
| | int4 (gptq) | 1.15 | 23.30 |
| | int8 (gptq) | 1.08 | 29.13 |
| | int4 (bnb) | 1.36 | 30.05 |
| qwen-72b-chat | bf16 | 0.59 (2*A100) | 73.71+78.54 |
| | int4 (gptq) | 0.23 | 54.86 |
| | int8 (gptq) | 0.21 | 78.44 |
| | int4 (bnb) | 0.28 | 74.87 |

## Model Type & Max Length

### LoRA

The test script is:

```bash
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 9.88 | 6.99 |
| | 1024 | 9.90 | 10.71 |
| | 2048 | 8.77 | 16.35 |
| | 4096 | 5.92 | 23.80 |
| | 8192 | 4.19 | 37.03 |
| qwen-7b-chat | 512 | 7.43 | 18.01 |
| | 1024 | 6.51 | 21.73 |
| | 2048 | 4.31 | 27.74 |
| | 4096 | 2.05 | 35.31 |
| | 8192 | 1.34 | 48.41 |
| qwen-14b-chat | 512 | 5.63 | 30.14 |
| | 1024 | 4.36 | 34.43 |
| | 2048 | 2.60 | 40.14 |
| | 4096 | 1.17 | 47.95 |
| | 8192 | 0.79 | 60.74 |
| qwen-72b-chat (2*A100) | 512 | 1.41 | 67.68+73.07 |
| | 1024 | 1.02 | 70.25+77.11 |
| | 2048 | 0.59 | 73.71+78.54 |
| | 4096 | - | OOM |
| | 8192 | - | OOM |
| chatglm3-6b | 512 | 6.72 | 13.94 |
| | 1024 | 6.16 | 12.99 |
| | 2048 | 4.20 | 17.20 |
| | 4096 | 1.92 | 29.80 |
| | 8192 | 1.24 | 66.82 |
| yi-6b-chat | 512 | 5.27 | 13.72 |
| | 1024 | 5.07 | 15.44 |
| | 2048 | 3.84 | 16.95 |
| | 4096 | 1.99 | 28.25 |
| | 8192 | 1.35 | 43.81 |
| yi-34b-chat | 512 | 2.32 | 66.72 |
| | 1024 | 1.76 | 69.10 |
| | 2048 | 1.05 | 71.34 |
| | 4096 | 0.47 | 78.72 |
| | 8192 | 0.31 (2*A100) | 47.01+65.03 |
| openbuddy-zephyr-7b-chat | 512 | 5.17 | 14.99 |
| | 1024 | 3.92 | 16.57 |
| | 2048 | 3.08 | 19.89 |
| | 4096 | 1.85 | 23.29 |
| | 8192 | 0.92 | 52.14 |
| baichuan2-7b-chat | 512 | 6.09 | 18.18 |
| | 1024 | 5.36 | 17.45 |
| | 2048 | 3.43 | 19.18 |
| | 4096 | 1.69 | 34.22 |
| | 8192 | 1.16 | 45.47 |
| baichuan2-13b-chat | 512 | 5.32 | 31.01 |
| | 1024 | 3.91 | 31.58 |
| | 2048 | 1.77 | 32.40 |
| | 4096 | 0.65 | 49.63 |
| | 8192 | 0.36 | 76.17 |

### Full

The test script is:

```bash
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type full \
    ...
```

| Model Type [FULL] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 10.77 | 18.16 |
| | 1024 | 10.39 | 18.62 |
| | 2048 | 8.73 | 35.11 |
| | 4096 | 5.45 | 31.62 |
| | 8192 | 3.81 | 38.93 |
| qwen-7b-chat | 512 | 5.96 | 73.37 |
| | 1024 | 5.00 | 73.64 |
| | 2048 | 3.30 | 74.26 |
| | 4096 | 1.64 | 78.76 |
| | 8192 | 1.11 (2*A100) | 61.34+73.00 |
| qwen-14b-chat (2*A100) | 512 | 3.66 | 60.42+72.31 |
| | 1024 | 2.98 | 60.61+74.37 |
| | 2048 | 1.93 | 60.70+78.22 |
| | 4096 | 0.92 | 75.59+78.64 |
| | 8192 | 0.62 | 76.59+77.68 |

## Batch Size

The test script is:

```bash
swift sft \
    --batch_size {BATCH_SIZE} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | Batch Size | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.31 | 27.74 |
| | 2 | 3.60 | 43.11 |
| | 4 | 3.02 | 63.81 |
| | 8 | 2.77 | 76.14 |

## Use Flash Attn & Gradient Checkpointing

The test script is:

```bash
swift sft \
    --use_flash_attn {USE_FLASH_ATTN} \
    --gradient_checkpointing {GRADIENT_CHECKPOINTING} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | Use Flash Attn | Gradient Checkpointing | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | -------------- | ---------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | ✔ | ✔ | 4.31 | 27.74 |
| | ✔ | ✘ | 6.19 | 37.70 |
| | ✘ | ✔ | 3.13 | 27.71 |
| | ✘ | ✘ | 4.45 | 57.67 |

## LoRA Rank & LoRA Target Modules

The test script is:

```bash
swift sft \
    --lora_rank {LORA_RANK} \
    --lora_target_modules {LORA_TARGET_MODULES} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | LoRA Rank | LoRA Target Modules | Training Speed (samples/s) | GPU Memory (GiB) | Trainable Params (M) |
| ----------------- | --------- | ------------------- | -------------------------- | ---------------- | -------------------- |
| qwen-7b-chat | 2 | DEFAULT (c_attn) | 4.27 | 27.72 | 1.05 |
| | 8 | DEFAULT | 4.31 | 27.74 | 4.19 |
| | 64 | DEFAULT | 4.19 | 27.85 | 33.55 |
| | 8 | ALL (all linear) | 3.22 | 27.87 | 17.89 |
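
The trainable-parameter counts behave as expected for LoRA: each adapted weight matrix adds roughly rank × (d_in + d_out) parameters, so the count grows linearly with the rank (4.19 M at rank 8 versus about 8 × 4.19 ≈ 33.5 M at rank 64) and with the number of target modules (ALL adapts every linear layer rather than only c_attn, hence 17.89 M at the same rank 8), while GPU memory is barely affected.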

## Gradient Accumulation Steps

The test script is:

```bash
swift sft \
    --gradient_accumulation_steps {GRADIENT_ACCUMULATION_STEPS} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```

| Model Type [LoRA] | Gradient Accumulation Steps | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | --------------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.26 | 27.73 |
| | 2 | 4.32 | 27.74 |
| | 4 | 4.31 | 27.74 |
| | 8 | 4.32 | 27.74 |
| | 16 | 4.33 | 27.74 |
| | 32 | 4.30 | 27.74 |
| | 64 | 4.32 | 27.74 |
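
As the table shows, the number of gradient accumulation steps has essentially no effect on throughput or memory; it only controls how many micro-batches are summed before each optimizer update, so the effective batch size is batch_size × gradient_accumulation_steps (1 × 16 = 16 with the defaults above).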

## Tuners

| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params(M) | flash_attn | gradient_checkpointing | hypers | memory | train speed(samples/s) | infer speed(tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| adalora | qwen-7b-chat | ms-agent | 2.0 | adalora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 26.8389(0.3464%) | True | True | lr=5e-05/epoch=2 | 32.55GiB | 0.92(87543 samples/95338.71 seconds) | 17.33(2345 tokens/135.29 seconds) | 0.57 | 1.07 | 0.391 | 0.665 | 0.569 |
| adapter | qwen-7b-chat | ms-agent | 2.0 | adapter | | 33.6896(0.4344%) | True | True | lr=5e-05/epoch=2 | 32.19GiB | 1.48(87543 samples/59067.71 seconds) | 26.63(4019 tokens/150.90 seconds) | 0.55 | 1.03 | 0.438 | 0.662 | 0.565 |
| dora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=True | 19.2512(0.2487%) | True | True | lr=5e-05/epoch=2 | 32.46GiB | 0.51(87543 samples/171110.54 seconds) | 4.29(2413 tokens/562.32 seconds) | 0.53 | 1.01 | 0.466 | 0.683 | 0.577 |
| full+galore128 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.10(87543 samples/79481.96 seconds) | 28.96(2400 tokens/82.88 seconds) | 0.55 | 1.00 | 0.358 | 0.688 | 0.577 |
| full+galore32 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=32/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.05GiB | 1.11(87543 samples/78989.74 seconds) | 29.17(2431 tokens/83.35 seconds) | 0.56 | 1.01 | 0.386 | 0.667 | 0.539 |
| full+galore64 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=64/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 46.91GiB | 1.11(87543 samples/79200.36 seconds) | 28.94(2448 tokens/84.60 seconds) | 0.56 | 1.01 | 0.397 | 0.674 | 0.544 |
| full+galore_emb | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=true | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 44.53GiB | 1.10(87543 samples/79775.02 seconds) | 29.45(2433 tokens/82.62 seconds) | 0.55 | 1.00 | 0.398 | 0.670 | 0.568 |
| full+galore_perparam | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=true/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.25(87543 samples/69821.89 seconds) | 29.02(2478 tokens/85.39 seconds) | 0.54 | 1.00 | 0.372 | 0.669 | 0.524 |
| full+no_mix | qwen-7b-chat | ms-agent | 0.0 | full | | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 72.56GiB | 1.27(29698 samples/23356.97 seconds) | 30.31(11738 tokens/387.29 seconds) | 0.57 | 0.44 | 0.174 | 0.652 | 0.553 |
| full | qwen-7b-chat | ms-agent | 2.0 | full | | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 73.53GiB | 1.43(87543 samples/61022.97 seconds) | 29.51(3382 tokens/114.62 seconds) | 0.54 | 0.95 | 0.343 | 0.536 | 0.495 |
| llamapro | qwen-7b-chat | ms-agent | 2.0 | llamapro | num_blocks=4 | 809.5826(9.4900%) | True | True | lr=5e-05/epoch=2 | 38.11GiB | 1.53(87543 samples/57294.42 seconds) | 25.80(2374 tokens/92.02 seconds) | 0.53 | 1.00 | 0.434 | 0.645 | 0.357 |
| lora+ | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=16.0/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91923.80 seconds) | 18.81(3329 tokens/176.94 seconds) | 0.53 | 0.98 | 0.432 | 0.647 | 0.344 |
| lora+neftune | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/neftune_noise_alpha=15.0 | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.96(87543 samples/91525.50 seconds) | 19.84(161792 tokens/8156.02 seconds) | 0.53 | 1.02 | 0.456 | 0.671 | 0.401 |
| lora+no_mix | qwen-7b-chat | ms-agent | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 30.86GiB | 0.91(29698 samples/32570.15 seconds) | 19.89(36308 tokens/1825.26 seconds) | 0.53 | 0.53 | 0.470 | 0.666 | 0.574 |
| lora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91974.29 seconds) | 18.11(2415 tokens/133.32 seconds) | 0.53 | 1.01 | 0.462 | 0.676 | 0.304 |
| qwen-7b-chat-eval | qwen-7b-chat | None | 0.0 | None | | None(None) | | | | None | | 30.81(13765 tokens/446.83 seconds) | | | 0.517 | 0.679 | 0.568 |
| rslora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=True/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.94(87543 samples/92758.63 seconds) | 18.87(2762 tokens/146.34 seconds) | 0.53 | 0.99 | 0.451 | 0.679 | 0.339 |
| full+lisa_2 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=2/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.11GiB | 2.66(76837 samples/28881.28 seconds) | 36.10(134469 tokens/3725.21 seconds) | 0.62 | 1.06 | 0.349 | 0.653 | 0.592 |
| full+lisa_4 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=4/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.87GiB | 2.63(76837 samples/29215.15 seconds) | 36.75(135477 tokens/3686.17 seconds) | 0.63 | 1.06 | 0.377 | 0.656 | 0.607 |
| lora+packing+ddp | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/packing=True | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 35.65GiB*2 | 1.56(7900 samples/5057.30 seconds) | 26.20(421094 tokens/16073.09 seconds) | 0.63 | 0.98 | 0.473 | 0.664 | 0.552 |
| lora+packing+lazytokenize | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/packing=True | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.83GiB | 7.69(78237 samples/10179.40 seconds) | 25.86(307390 tokens/11888.17 seconds) | 0.63 | 1.04 | 0.472 | 0.660 | 0.554 |
| lora+packing | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/packing=True | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 28.06GiB | 0.79(7900 samples/10048.53 seconds) | 26.12(409507 tokens/15675.36 seconds) | 0.61 | 0.95 | 0.492 | 0.676 | 0.539 |
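
The tuner_params and hypers columns map onto `swift sft` flags. As an illustration, the dora row corresponds roughly to the sketch below; the flag names (in particular `--train_dataset_mix_ratio` for the ms-bench mix ratio) are assumptions inferred from the table and may differ between swift versions:

```bash
# Hypothetical reconstruction of the `dora` row above; flag names are assumptions
# inferred from the tuner_params/hypers columns, not taken from the original scripts.
swift sft \
    --model_type qwen-7b-chat \
    --dataset ms-agent \
    --train_dataset_mix_ratio 2.0 \
    --sft_type lora \
    --lora_rank 8 \
    --lora_target_modules ALL \
    --lora_alpha 32 \
    --use_dora true \
    --learning_rate 5e-5 \
    --num_train_epochs 2
```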

## unsloth

| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params(M) | flash_attn | gradient_checkpointing | hypers | memory | train speed(samples/s) | infer speed(tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| unsloth+lora+q4 | llama3-8b-instruct | ms-agent | 2.0 | lora | | 4.7186(0.1038%) | True | True | lr=5e-05/epoch=2 | 21.69GiB | 1.76(76839 samples/43763.01 seconds) | 15.22(160885 tokens/10570.90 seconds) | 0.58 | 1.03 | 0.668 | 0.755 | 0.501 |
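
The unsloth run uses the same LoRA recipe but routes training through the unsloth backend with 4-bit quantization. A rough sketch is below; the `--tuner_backend unsloth` flag and the other names are assumptions, not the original script:

```bash
# Hypothetical sketch of the unsloth+lora+q4 row; flag names are assumptions.
swift sft \
    --model_type llama3-8b-instruct \
    --dataset ms-agent \
    --train_dataset_mix_ratio 2.0 \
    --sft_type lora \
    --tuner_backend unsloth \
    --quantization_bit 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 2
```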

## Export

| exp_name | model_type | calibration dataset | quantization method | quantization bits | infer speed(tokens/s) | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| awq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | awq | 4 | 27.25(16501 tokens/605.47 seconds) | 0.494 | 0.665 | 0.571 |
| awq-pileval | qwen-7b-chat | pileval | awq | 4 | 26.92(12994 tokens/482.72 seconds) | 0.497 | 0.675 | 0.577 |
| gptq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | gptq | 4 | 31.16(15349 tokens/492.54 seconds) | 0.482 | 0.642 | 0.556 |
| gptq-pileval | qwen-7b-chat | pileval | gptq | 4 | 31.67(15185 tokens/479.54 seconds) | 0.478 | 0.654 | 0.559 |
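
These rows measure post-training quantization via `swift export` with the listed calibration dataset, followed by inference and evaluation of the quantized model. A sketch of the export step for the first row is below; the `--quant_method` / `--quant_bits` flag names are assumptions and may differ by version:

```bash
# Hypothetical sketch of the awq-ms-bench-mini export; flag names are assumptions.
swift export \
    --model_type qwen-7b-chat \
    --quant_method awq \
    --quant_bits 4 \
    --dataset ms-bench-mini
```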

## AWQ

| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params(M) | flash_attn | gradient_checkpointing | hypers | memory | train speed(samples/s) | infer speed(tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen1half-7b-chat-awq | qwen1half-7b-chat-awq | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.5802%) | True | True | lr=5e-05/epoch=2 | 24.26GiB | 0.45(87543 samples/194746.58 seconds) | 16.08(2469 tokens/153.58 seconds) | 0.55 | 1.19 | 0.505 | 0.737 | 0.656 |

## AQLM

| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params(M) | flash_attn | gradient_checkpointing | hypers | memory | train speed(samples/s) | infer speed(tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama2-7b-aqlm-2bit-1x16 | llama2-7b-aqlm-2bit-1x16 | dureader-robust-zh | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.6510%) | True | True | lr=5e-05/epoch=2 | 4.04GiB | 0.17(14994 samples/86140.71 seconds) | | 0.48 | 0.74 | | | |

## Sequence Parallel

| Model | Dataset | Hyper params | Total steps | Train speed | GPU memory |
| ----- | ------- | ------------ | ----------- | ----------- | ---------- |
| chatglm3-6b-32k | long-alpaca-12k (8055 tokens * 12000 rows) | gpu=2 / sequence_parallel_size=1 (2-GPU DDP baseline) | 5940 | 0.30 iter/s (5h13min total) | 27G*2 |
| | | gpu=2 / sequence_parallel_size=2 (2 GPUs, sequence parallel 2) | 11880 | 0.5 iter/s (6h total) | 20G*2 |
| | | gpu=4 / sequence_parallel_size=4 (4 GPUs, sequence parallel 4) | 11880 | 1 iter/s (3h20min total) | 18G*4 |
| | | gpu=4 / sequence_parallel_size=2 (4 GPUs, sequence parallel 2) | 5940 | 0.45 iter/s (3h total) | 21G*4 |
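
Sequence parallelism splits each long sequence across `sequence_parallel_size` GPUs, which is why per-GPU memory drops as the parallel size grows while the number of optimizer steps rises (the data-parallel world size shrinks correspondingly). A sketch of the 2-GPU, sequence_parallel_size=2 run is below; the launch convention and flag names are assumptions:

```bash
# Hypothetical sketch of the 2-GPU sequence-parallel run; names are assumptions.
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
swift sft \
    --model_type chatglm3-6b-32k \
    --dataset long-alpaca-12k \
    --sequence_parallel_size 2
```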