
--fp16 True question #78

Open
Liavan0122 opened this issue Jun 13, 2024 · 6 comments

@Liavan0122

I used custom_finetune.sh, with no other redundant parameter settings changed, and encountered this error:

ValueError: Type fp16 is not supported.

The installation followed README.md exactly. However, I can set fp16 in other projects on the same hardware.
Please give me some advice. Thank you!

@shiym2000
Collaborator

Could you please provide more details about the experimental setup and the error encountered? Additionally, can you confirm if other scripts are running correctly?

@Liavan0122
Author

Here is my script in scripts/train/custom_finetune.sh. I only changed DATA_PATH, IMAGE_PATH, and OUTPUT_PATH,
and localhost:0,1,2,3 -> localhost:0,1.

DATA_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/text_files/output_dataformat.json"
IMAGE_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/images"
MODEL_MAX_LENGTH=3072
OUTPUT_DIR="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora"

deepspeed --include localhost:0,1 --master_port 29501 tinyllava/train/custom_finetune.py \
    --deepspeed ./scripts/zero2.json \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --is_multimodal True \
    --conv_version phi \
    --mm_vision_select_layer -2 \
    --image_aspect_ratio square \
    --fp16 True \
    --training_recipe lora \
    --tune_type_llm lora \
    --tune_type_vision_tower frozen \
    --tune_vision_tower_from_layer 0 \
    --tune_type_connector full \
    --lora_r 128 \
    --lora_alpha 256 \
    --group_by_modality_length False \
    --pretrained_model_path "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B" \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length $MODEL_MAX_LENGTH \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to tensorboard \
    --tokenizer_use_fast False \
    --run_name custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora

And this is my error message.
......
base_model.model.connector._connector.2.weight: 6553600 parameters
base_model.model.connector._connector.2.bias: 2560 parameters
Traceback (most recent call last):
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 52, in <module>
    train()
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 47, in train
    trainer.train()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1933, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 240, in __init__
    self._do_sanity_check()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1040, in _do_sanity_check
    raise ValueError("Type fp16 is not supported.")
ValueError: Type fp16 is not supported.

Thanks for the help!

@YingHuTsing
Collaborator

Hi, could you please check the versions of your packages? accelerate==0.27.2? deepspeed==0.14.0?
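
For example, a quick way to confirm from inside the training environment (a minimal sketch):

import accelerate
import deepspeed

# Print the installed versions; the repo expects accelerate 0.27.2 and deepspeed 0.14.0.
print(accelerate.__version__)
print(deepspeed.__version__)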

@Liavan0122
Author

Yes, they are the same:
accelerate 0.27.2
deepspeed 0.14.0

I also re-downloaded everything and set it up again (without using a conda environment), and I encounter the same --fp16 problem.

@YingHuTsing
Collaborator

from deepspeed.accelerator import get_accelerator
flag = get_accelerator().is_fp16_supported()
print(flag)

Please check whether this flag is True or False.

If it is False, then your combination of GPU, CUDA, and DeepSpeed/Accelerate does not support fp16; their versions may not be compatible with one another.
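To narrow down which part is the problem, here is a minimal diagnostic sketch (the compute-capability threshold below is an assumption about how DeepSpeed's CUDA accelerator typically decides fp16 support, not a quote of its source):

import torch

# A CPU-only PyTorch build (a version string without a "+cuXXX" suffix) makes
# DeepSpeed fall back to a CPU accelerator, which reports fp16 as unsupported.
print(torch.__version__)
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # DeepSpeed's CUDA accelerator bases is_fp16_supported() on the device's
    # compute capability (assumed here to need roughly >= 7.0).
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_capability(0))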

@Liavan0122
Author

The flag is False.
Thank you for the help!
