
About using BEATs as audio feature extractor #1567

Open
XiaokangY opened this issue May 28, 2024 · 2 comments

Comments

@XiaokangY

Do I have to use an audio sequence sampled at 16 kHz to use BEATs?
I ask because when I fed the extracted features into a resnet18 for a downstream classification task, the loss would not decrease.

@tcourat

tcourat commented Jun 3, 2024

I believe that the model already preprocesses the input data at 16000 Hz here:

fbank = ta_kaldi.fbank(waveform, num_mel_bins=128, sample_frequency=16000, frame_length=25, frame_shift=10)
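Note that `sample_frequency=16000` tells `ta_kaldi.fbank` what rate the waveform is assumed to be; it does not resample the audio, so a waveform recorded at, say, 44.1 kHz should be converted to 16 kHz first (in practice with `torchaudio.functional.resample`). A minimal pure-Python sketch of that conversion, using hypothetical names and linear interpolation only to illustrate the idea:

```python
# Sketch: bring a waveform to 16 kHz before fbank extraction.
# Linear interpolation is a stand-in here; a proper resampler
# (e.g. torchaudio.functional.resample) should be used in practice.

def resample_linear(samples, orig_sr, target_sr=16000):
    """Resample a 1-D sequence of samples via linear interpolation."""
    if orig_sr == target_sr:
        return list(samples)
    n_out = int(len(samples) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# One second of 44.1 kHz audio becomes 16000 samples at 16 kHz.
wav_44k = [0.0] * 44100
wav_16k = resample_linear(wav_44k, 44100)
print(len(wav_16k))  # 16000
```

After this conversion, the resulting 16 kHz waveform matches what the `fbank` call above assumes.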

@XiaokangY
Author

I believe that the model already preprocesses the input data at 16000 Hz here:

fbank = ta_kaldi.fbank(waveform, num_mel_bins=128, sample_frequency=16000, frame_length=25, frame_shift=10)

Haha, I have solved this problem. The issue was that the loss would not decrease when the extracted features were fed into the pre-trained model for training. It had been bothering me for a long time.
