
Why does running finetune.sh with the long-audio version of the paraformer-large model still fail to recognize audio files longer than 20 s? #1843

Closed
lllmd opened this issue Jun 24, 2024 · 1 comment
Labels
question Further information is requested

Comments

lllmd commented Jun 24, 2024

Notice: In order to resolve issues more efficiently, please raise issue following the template.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

I previously used the paraformer-large model and added the max_token_length parameter in finetune.sh, but it still could not recognize audio files longer than 20 s. After switching to the long-audio version of paraformer-large, the same problem persists.
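For reference, assuming max_token_length is counted in raw 10 ms fbank frames (an assumption; the exact unit depends on how dataset_conf is interpreted in your FunASR version), a 20 s clip sits well below the 30000 limit, which matches the observation that raising it alone does not change behavior:

```python
# Hedged sketch: estimate how many fbank frames a 20 s clip yields,
# assuming the common 25 ms window / 10 ms hop and an LFR stacking
# factor of 6. These values are assumptions, not read from the model
# config of the long-audio paraformer.

def num_fbank_frames(duration_s: float, hop_ms: float = 10.0) -> int:
    """Approximate raw fbank frame count for a clip of the given duration."""
    return int(duration_s * 1000 / hop_ms)

def num_lfr_frames(n_frames: int, lfr_n: int = 6) -> int:
    """Frames remaining after low-frame-rate stacking (ceil division)."""
    return -(-n_frames // lfr_n)

frames_20s = num_fbank_frames(20.0)   # 2000 raw frames for 20 s of audio
lfr_20s = num_lfr_frames(frames_20s)  # 334 frames after LFR-6 stacking
print(frames_20s, lfr_20s)
```

Either way the count is far below 30000, so the length limit itself is unlikely to be what rejects a 20 s sample.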

This is what is printed at runtime:
{'scp_file_list': ['/home/ubuntu1/data/list/train_wav.scp', '/home/ubuntu1/data/list/train_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/train.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5.67it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 4804.47it/s]
processed 5 samples
{'scp_file_list': ['/home/ubuntu1/data/list/val_wav.scp', '/home/ubuntu1/data/list/val_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/val.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.29it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3818.21it/s]
processed 2 samples
log_file: ./outputs/log.txt
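For context, the scp-to-jsonl step logged above can be sketched roughly as follows. The record layout (`key`, `source`, `target`) is an assumption based on the logged `data_type_list`; FunASR's own scp2jsonl tool is the authoritative implementation.

```python
# Hedged sketch of the wav.scp + text -> jsonl conversion the log describes.
# Pairs utterance ids from a Kaldi-style wav.scp with transcripts from a
# text file and writes one JSON record per line.
import json

def scp_to_jsonl(wav_scp: str, text_file: str, jsonl_out: str) -> int:
    """Return the number of samples written to jsonl_out."""
    def read_kv(path):
        # Each line is "<utt-id> <value>"; split on first whitespace only.
        with open(path, encoding="utf-8") as f:
            return dict(line.strip().split(maxsplit=1)
                        for line in f if line.strip())

    wavs = read_kv(wav_scp)
    texts = read_kv(text_file)
    n = 0
    with open(jsonl_out, "w", encoding="utf-8") as out:
        for utt, wav_path in wavs.items():
            if utt not in texts:
                continue  # skip utterances that have no transcript
            record = {"key": utt, "source": wav_path, "target": texts[utt]}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            n += 1
    return n
```

With 5 train and 2 validation pairs this would report "processed 5 samples" and "processed 2 samples", matching the log above.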

This is what the log file shows:
Model summary:
Class Name: BiCifParaformer
Total Number of model parameters: 225.07 M
Number of trainable parameters: 225.07 M (100.0%)
Type: torch.float32
[2024-06-24 15:32:07,818][root][INFO] - Build optim
[2024-06-24 15:32:07,822][root][INFO] - Build scheduler
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 1, /home/ubuntu1/data/list/train.jsonl
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 2, /home/ubuntu1/data/list/val.jsonl
[2024-06-24 15:32:07,823][root][WARNING] - distributed is not initialized, only single shard
[2024-06-24 15:32:07,853][root][INFO] - Train epoch: 0, rank: 0

What have you tried?

The following has already been added in finetune.sh:
++dataset_conf.max_token_length=30000
but it still has no effect.
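For reference, a minimal sketch of where such an override typically sits in a FunASR finetune script. The training entry point, model id placeholder, and data paths are assumptions and may differ from the actual finetune.sh; only the `++dataset_conf.max_token_length=30000` override is taken from this report. The `++` prefix is Hydra-style syntax for adding or overriding a config key.

```shell
# Hedged sketch of a finetune.sh fragment (paths and model id are
# placeholders, not verified against this repo's finetune.sh).
python -m funasr.bin.train \
  ++model="<long-audio-paraformer-model-id>" \
  ++train_data_set_list="/home/ubuntu1/data/list/train.jsonl" \
  ++valid_data_set_list="/home/ubuntu1/data/list/val.jsonl" \
  ++dataset_conf.max_token_length=30000 \
  ++output_dir="./outputs"
```

If the override is listed after the script's own defaults, as above, it should take precedence; if it still has no effect, the length filtering may be happening elsewhere (e.g., at inference time or in VAD segmentation) rather than in dataset_conf.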

What's your environment?

  • OS (e.g., Linux):
  • FunASR Version (e.g., 1.0.27):
  • ModelScope Version (e.g., 1.15.0):
  • PyTorch Version (e.g., 2.3.0):
  • How you installed funasr (pip, source):
  • Python version:
  • GPU (e.g., Tesla P40)
  • CUDA/cuDNN version (e.g., cuda12.4):
  • Any other relevant information:
lllmd added the question (Further information is requested) label on Jun 24, 2024
@LauraGPT (Collaborator)