
About using BEATs as audio feature extractor #1567

Open
XiaokangY opened this issue May 28, 2024 · 2 comments

Comments

@XiaokangY

Do I have to use an audio sequence sampled at 16 kHz to use BEATs?
I ask because when I fed the extracted features into a resnet18 for a downstream classification task, the loss would not decrease.

@tcourat

tcourat commented Jun 3, 2024

I believe that the model already preprocesses the input data at 16000 Hz here:

fbank = ta_kaldi.fbank(waveform, num_mel_bins=128, sample_frequency=16000, frame_length=25, frame_shift=10)
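Note that `sample_frequency=16000` tells `ta_kaldi.fbank` what rate the waveform is assumed to be; it does not resample the audio, so a waveform recorded at, say, 44.1 kHz should be converted to 16 kHz first (in practice with `torchaudio.functional.resample`). A minimal pure-Python sketch of that conversion, using hypothetical names and linear interpolation only to illustrate the idea:

```python
# Sketch: bring a waveform to 16 kHz before fbank extraction.
# Linear interpolation is a stand-in here; a proper resampler
# (e.g. torchaudio.functional.resample) should be used in practice.

def resample_linear(samples, orig_sr, target_sr=16000):
    """Resample a 1-D sequence of samples via linear interpolation."""
    if orig_sr == target_sr:
        return list(samples)
    n_out = int(len(samples) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# One second of 44.1 kHz audio becomes 16000 samples at 16 kHz.
wav_44k = [0.0] * 44100
wav_16k = resample_linear(wav_44k, 44100)
print(len(wav_16k))  # 16000
```

After this conversion, the resulting 16 kHz waveform matches what the `fbank` call above assumes.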

@XiaokangY
Author

I believe that the model already preprocesses the input data at 16000 Hz here:

fbank = ta_kaldi.fbank(waveform, num_mel_bins=128, sample_frequency=16000, frame_length=25, frame_shift=10)

Haha, I have solved this problem. The issue was that the loss would not decrease when the extracted features were fed into the pre-trained model for training. It had been bothering me for a long time.
