about ViT performance on EEG data #2

DrugLover opened this issue Apr 12, 2024 · 3 comments

Comments

@DrugLover

Hello Wang,
I wrote a simple ViT model to decode MI-EEG signals.
The overall model is much the same as the original ViT, and the code is here.
I used the BCI Competition IV 2a dataset, where the input data shape is [1, 22, 1125].
I directly used a patch size of [22, 25], so the number of patches is 1125/25 = 45.
With this patch setting, I soon ran into the problem you mentioned in the README: the model overfits the training set.
The results show that its generalization ability is worse than EEGNet's.
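For reference, a rough PyTorch sketch of this patch setup (the module and argument names here are only illustrative, not the actual code from the linked repo):

```python
import torch
import torch.nn as nn

class EEGPatchEmbed(nn.Module):
    """Splits a [1, 22, 1125] EEG trial into 45 patches of shape [22, 25] and embeds them."""
    def __init__(self, n_channels=22, n_samples=1125, patch_len=25, embed_dim=256):
        super().__init__()
        self.n_patches = n_samples // patch_len              # 1125 / 25 = 45
        # Conv2d with kernel == stride is the usual ViT patchify trick.
        self.proj = nn.Conv2d(1, embed_dim,
                              kernel_size=(n_channels, patch_len),
                              stride=(n_channels, patch_len))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_patches + 1, embed_dim))

    def forward(self, x):                    # x: [B, 1, 22, 1125]
        x = self.proj(x)                     # [B, D, 1, 45]
        x = x.flatten(2).transpose(1, 2)     # [B, 45, D]
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # [B, 46, D]
        return x + self.pos_embed

tokens = EEGPatchEmbed()(torch.randn(8, 1, 22, 1125))   # -> [8, 46, 256]
```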

Moreover, I applied a dropout layer in the patch embedding that drops whole patches.
With this patch dropout, the results got much better, but it took many more epochs to converge (and it is still worse than EEGNet).
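Roughly, the patch-level dropout I mean is something like the sketch below (not my exact implementation; the keep ratio is just an example):

```python
import torch
import torch.nn as nn

class PatchDropout(nn.Module):
    """During training, randomly keeps only a fraction of the patch tokens (the cls token always stays)."""
    def __init__(self, keep_ratio=0.75):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x):                                   # x: [B, 1 + N, D]
        if not self.training or self.keep_ratio >= 1.0:
            return x
        cls, patches = x[:, :1], x[:, 1:]
        B, N, D = patches.shape
        n_keep = max(1, int(N * self.keep_ratio))
        # Per-sample random subset of patch indices.
        idx = torch.argsort(torch.rand(B, N, device=x.device), dim=1)[:, :n_keep]
        patches = torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        return torch.cat([cls, patches], dim=1)
```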

Recently, I found other transformer variants for MI-EEG: ShallowMirrorTransformer and Conformer.
Sadly, neither of them performed as well as some CNN- or LSTM-based models.

I would like to know whether there are any tricks for training a ViT, and to see your experiment results.
Thanks a lot!

@zwcolin
Owner

zwcolin commented Apr 12, 2024

Thanks for your interest in my course project. This project was started in 2022 and I feel it's a bit outdated (and I'm no longer maintaining it because I don't do research in the EEG domain).

My suggestion would simply be to give up the classical ViT architecture and instead try the following:

(1) Fine-tune from an autoregressive ViT that predicts both patches and labels. This gives you more training signal given limited data. You can use LoRA and/or an adapter to make training efficient and/or to accommodate your input/output (a rough LoRA sketch is below).
(2) Try structured state-space models (SSMs) such as Mamba.
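For (1), a minimal pure-PyTorch sketch of the LoRA idea is below; the rank/alpha values and the 'qkv'/'proj' module names are only illustrative and assume a timm-style ViT, so adjust them for your model:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: y = Wx + scale * B(A(x))."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # keep the pretrained weights frozen
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)           # update starts at zero, so behavior is unchanged at init
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scale

def add_lora(model, target_names=("qkv", "proj"), rank=8):
    """Swap the attention projections of a timm-style ViT for LoRA-wrapped versions."""
    for module in model.modules():
        for child_name, child in list(module.named_children()):
            if isinstance(child, nn.Linear) and child_name in target_names:
                setattr(module, child_name, LoRALinear(child, rank=rank))
    return model
```

With this, only the LoRA matrices (and whatever new head you attach) receive gradients, which keeps fine-tuning cheap on a small EEG dataset.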

@DrugLover
Author

Thanks for your advice! I tried simply swapping Mamba into the aforementioned Conformer, but it got even worse.
I did another domain generalization experiment on EEG where I simply changed the depth of the transformer blocks, and it reached SOTA performance!

@LiuyinYang1101


Hi DrugLover,
I'm currently working on a similar project, aiming to do SSL pretraining on EEG data from various sources and later fine-tune on downstream classification tasks using transformer-based models. My experiments also showed that, most of the time, these large models did not perform as well as small models (e.g., EEGNet). I am wondering if we could discuss this somewhere; maybe we can do something together. You can reach me at the following email: [email protected]. I'm looking forward to hearing from you.
