OOM during LoRA fine-tuning on a single 3090 Ti #228
Comments
Can you run the non-DeepSpeed task directly, or does that error out as well?

Running the non-DeepSpeed task errors out too; the fine-tuning is being done in a WSL environment.

Have you updated to the latest fine-tuning code? The old code could indeed run out of GPU memory.

After updating to the latest fine-tuning code, the loss stays at 0 once training starts.

Make sure you are fine-tuning in BF16 precision.

Where exactly do I set BF16 precision for fine-tuning?

I have already set the bf16 field to true in the LoRA config file lora.yaml.
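(For reference, a minimal sketch of that setting. Placing it under training_args is an assumption, based on that block being passed to transformers.Seq2SeqTrainingArguments, which accepts a bf16 flag:)

```yaml
training_args:
  # enable BF16 mixed-precision training (a Seq2SeqTrainingArguments flag)
  bf16: true
```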
It needs to be added. Could you post a screenshot of the run showing the dataset being loaded?

Is your dataset content being recognized correctly? Before starting fine-tuning, I suggest you check the label part after apply chat template (see the sketch below).
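(A minimal sketch of such a check, assuming a tokenizer that implements apply_chat_template; the model path and messages are placeholders. If the template or label masking is wrong, the loss can end up stuck at 0:)

```python
from transformers import AutoTokenizer

# Placeholder path -- substitute the checkpoint you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("path/to/your/model", trust_remote_code=True)

messages = [
    {"role": "user", "content": "你好"},
    {"role": "assistant", "content": "你好,有什么可以帮你的吗?"},
]

# Render the conversation through the chat template and inspect it;
# the assistant turn is the part that should become the training labels.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)

ids = tokenizer.apply_chat_template(messages, tokenize=True)
print(tokenizer.decode(ids))
```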
After debugging, I found the problem was in the construction of …

But after fixing that, the OOM problem still showed up 😂
Thanks a lot, the data is now read successfully. The OOM was probably caused by the dataset being too large combined with the number of training epochs. On my side, multi-GPU fine-tuning also errored because one of the cards did not have enough free memory, but once that card was excluded it ran successfully.
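(Excluding a card is typically done with CUDA_VISIBLE_DEVICES; a minimal sketch, assuming four GPUs of which card 1 is the busy one to skip:)

```python
import os

# Hide GPU 1; only GPUs 0, 2 and 3 remain visible to PyTorch.
# Must be set before the first CUDA call in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3"

import torch
print(torch.cuda.device_count())  # counts only the visible devices
```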
System Info / 系統信息
torch 2.1.0; hardware: a single RTX 3090 Ti
lora.yaml

```yaml
training_args:
  # see `transformers.Seq2SeqTrainingArguments`
  output_dir: ./output
  max_steps: 27000
  # needed to be fit for the dataset
  learning_rate: 5e-4
  # settings for data loading
  per_device_train_batch_size: 1
  dataloader_num_workers: 16
  remove_unused_columns: false
  # settings for saving checkpoints
  save_strategy: steps
  save_steps: 500
  # settings for logging
  log_level: info
  logging_strategy: steps
  logging_steps: 10
  # settings for evaluation
  per_device_eval_batch_size: 2
  evaluation_strategy: steps
  eval_steps: 500
  # settings for optimizer
  adam_epsilon: 1e-6
  # uncomment the following line to detect nan or inf values
  # debug: underflow_overflow
  predict_with_generate: true
  # see `transformers.GenerationConfig`
  generation_config:
    max_new_tokens: 512
  # set your absolute deepspeed path here
  # deepspeed: ds_zero_2.json
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
```
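(As the comments note, the training_args block is consumed by transformers.Seq2SeqTrainingArguments; a minimal Python equivalent of a subset of the fields above:)

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./output",
    max_steps=27000,
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    dataloader_num_workers=16,
    remove_unused_columns=False,
    save_strategy="steps",
    save_steps=500,
    logging_steps=10,
    per_device_eval_batch_size=2,
    evaluation_strategy="steps",
    eval_steps=500,
    adam_epsilon=1e-6,
    predict_with_generate=True,
)
```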
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
An out-of-memory error occurred when training on 9,000 samples:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB. GPU 0 has a total capacty of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 23.01 GiB is allocated by PyTorch, and 195.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
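(The message itself suggests max_split_size_mb; a minimal sketch of setting it through PYTORCH_CUDA_ALLOC_CONF. The value 128 is an example, not a tuned setting:)

```python
import os

# Cap the caching allocator's split size at 128 MiB to reduce
# fragmentation; must be set before the first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
x = torch.zeros(1, device="cuda")  # allocator reads the setting on first use
```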
Expected behavior / 期待表现
I hope fine-tuning can run normally.