
--fp16 True question #78

Open
Liavan0122 opened this issue Jun 13, 2024 · 6 comments

@Liavan0122

I used custom_finetune.sh, with no other redundant parameter settings changed, and encountered this error:

ValueError: Type fp16 is not supported.

The installation followed README.md exactly. However, I can set fp16 in other projects on the same hardware.
Please give me some advice. Thank you!

@shiym2000
Collaborator

Could you please provide more details about the experimental setup and the error encountered? Additionally, can you confirm if other scripts are running correctly?

@Liavan0122
Author

Here is my script in scripts/train/custom_finetune.sh. I only changed DATA_PATH, IMAGE_PATH, and OUTPUT_PATH,
and localhost:0,1,2,3 -> localhost:0,1.

DATA_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/text_files/output_dataformat.json"
IMAGE_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/images"
MODEL_MAX_LENGTH=3072
OUTPUT_DIR="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora"

deepspeed --include localhost:0,1 --master_port 29501 tinyllava/train/custom_finetune.py \
    --deepspeed ./scripts/zero2.json \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --is_multimodal True \
    --conv_version phi \
    --mm_vision_select_layer -2 \
    --image_aspect_ratio square \
    --fp16 True \
    --training_recipe lora \
    --tune_type_llm lora \
    --tune_type_vision_tower frozen \
    --tune_vision_tower_from_layer 0 \
    --tune_type_connector full \
    --lora_r 128 \
    --lora_alpha 256 \
    --group_by_modality_length False \
    --pretrained_model_path "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B" \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length $MODEL_MAX_LENGTH \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to tensorboard \
    --tokenizer_use_fast False \
    --run_name custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora

And this is my error message.
......
base_model.model.connector._connector.2.weight: 6553600 parameters
base_model.model.connector._connector.2.bias: 2560 parameters
Traceback (most recent call last):
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 52, in <module>
    train()
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 47, in train
    trainer.train()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1933, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 240, in __init__
    self._do_sanity_check()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1040, in _do_sanity_check
    raise ValueError("Type fp16 is not supported.")
ValueError: Type fp16 is not supported.

Thanks for the help!

@YingHuTsing
Collaborator

Hi, could you please check the versions of your packages? accelerate==0.27.2? deepspeed==0.14.0?
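
For example, a quick way to confirm from inside the training environment (a minimal sketch):

import accelerate
import deepspeed

# Print the installed versions; the repo expects accelerate 0.27.2 and deepspeed 0.14.0.
print(accelerate.__version__)
print(deepspeed.__version__)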

@Liavan0122
Author

Yes, they are the same:
accelerate 0.27.2
deepspeed 0.14.0

I also re-downloaded everything and set it up again (without using a conda environment), and I encounter the same --fp16 problem.

@YingHuTsing
Collaborator

from deepspeed.accelerator import get_accelerator
flag = get_accelerator().is_fp16_supported()
print(flag)

Please check whether this flag is True or False.

If it is False, then your combination of GPU, CUDA, and DeepSpeed/Accelerate does not support fp16; their versions may not be compatible with one another.
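To narrow down which part is the problem, here is a minimal diagnostic sketch (the compute-capability threshold below is an assumption about how DeepSpeed's CUDA accelerator typically decides fp16 support, not a quote of its source):

import torch

# A CPU-only PyTorch build (a version string without a "+cuXXX" suffix) makes
# DeepSpeed fall back to a CPU accelerator, which reports fp16 as unsupported.
print(torch.__version__)
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # DeepSpeed's CUDA accelerator bases is_fp16_supported() on the device's
    # compute capability (assumed here to need roughly >= 7.0).
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_capability(0))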

@Liavan0122
Author

The flag is False.
Thank you for the help!
