CoCa v2: fixes and improvements #554

Open

iejMac opened this issue Jun 27, 2023 · 4 comments

Comments

iejMac (Contributor) commented Jun 27, 2023

There have been some issues raised about mistakes in the current CoCa implementation, and there are also some improvements that can be made. This issue will enumerate them, and we can track progress here. Once the TODOs are completed we can attempt another set of re-training runs, and maybe even scale up if the results make sense.

Problems:

  • Fix the cls attention mask so the appended cls token does not attend to padding (the "fixed cls mask" evaluated below)
  • Remove MHA from the attn pooler (see the discussion below)

Improvements:

  • Integrate generation with HF (so we can take out the custom generate func; a rough sketch of the hook-up follows this list)
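
For reference, a minimal sketch of what hooking into HF generation could look like. The `ToyConfig`/`ToyDecoder` model below is entirely invented for illustration; only the transformers pieces (`PreTrainedModel`, `GenerationMixin`, `generate`) are the real API, and the actual wiring to CoCa's text decoder would differ.

```python
# Hypothetical sketch, not the planned implementation: a decoder-only model
# wrapped so that transformers' generate() (greedy, beam search, sampling)
# drives decoding, removing the need for a hand-rolled generate loop.
import torch
import torch.nn as nn
from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutputWithPast

class ToyConfig(PretrainedConfig):          # stand-in for a CoCa text config
    model_type = "toy_decoder"
    def __init__(self, vocab_size=64, width=32, **kwargs):
        self.vocab_size = vocab_size
        self.width = width
        super().__init__(**kwargs)

class ToyDecoder(PreTrainedModel, GenerationMixin):  # explicit mixin for newer transformers
    config_class = ToyConfig
    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.width)
        self.lm_head = nn.Linear(config.width, config.vocab_size)
    def forward(self, input_ids, **kwargs):
        hidden = self.embed(input_ids)              # (batch, seq, width)
        return CausalLMOutputWithPast(logits=self.lm_head(hidden))
    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        return {"input_ids": input_ids}             # no KV cache in this toy

model = ToyDecoder(ToyConfig())
out = model.generate(torch.tensor([[1, 2, 3]]), max_new_tokens=5, do_sample=False)
print(out.shape)  # (1, 8): the 3 prompt tokens plus 5 greedily decoded tokens
```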

Please add anything that I might've missed
cc: @gpucce @rom1504 @rwightman

iejMac (Contributor, Author) commented Jun 27, 2023

I will try to start a small B/32 run with the first 2 problems "solved" in that PR. We can compare to the first few B/32 runs.

gpucce (Contributor) commented Jun 27, 2023

|   | dataset    | model                        | acc1  | acc5  |
|---|------------|------------------------------|-------|-------|
| 0 | imagenet1k | coca_ViT-B-32                | 0.636 | 0.881 |
| 1 | imagenet1k | coca_ViT-B-32_fixed_cls_mask | 0.638 | 0.882 |
| 2 | imagenet1k | coca_ViT-L-14                | 0.756 | 0.943 |
| 3 | imagenet1k | coca_ViT-L-14_fixed_cls_mask | 0.755 | 0.941 |

Changing the cls mask leaves performance almost unchanged, even without retraining.
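
For anyone following along, a minimal sketch (assumed shapes and names, not the open_clip code) of what the cls-mask fix amounts to: when a learned cls token is appended after the text tokens, the attention mask has to stop it from attending to padding positions.

```python
# Hedged sketch of building a per-sample additive attention mask for a text
# sequence with one cls token appended at the end; masked (pad) positions
# get -inf, allowed positions get 0, as expected by attention modules.
import torch

def build_cls_mask(text: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """text: (batch, seq) token ids -> additive mask of shape
    (batch, seq + 1, seq + 1) for the sequence with the cls token appended."""
    batch, seq = text.shape
    keep = text != pad_id                                       # real (non-pad) tokens
    keep = torch.cat([keep, keep.new_ones(batch, 1)], dim=1)    # cls column is always valid
    allowed = keep[:, None, :].expand(batch, seq + 1, seq + 1)  # same key mask for every query row
    mask = torch.zeros(batch, seq + 1, seq + 1)
    mask.masked_fill_(~allowed, float("-inf"))
    return mask

text = torch.tensor([[5, 7, 0, 0]])   # two real tokens, two pads (pad_id=0)
print(build_cls_mask(text)[0, -1])    # cls row: -inf only on the pad columns
```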

rwightman (Collaborator) commented Jun 27, 2023

Re the remove MHA from attn pooler, I don't recall what the motivation for that was? It doesn't look like it's doing anything that isn't supported by MHA at this point (like q/k norms, etc)...

gpucce (Contributor) commented Jun 27, 2023

> Re the remove MHA from attn pooler, I don't recall what the motivation for that was? It doesn't look like it's doing anything that isn't supported by MHA at this point (like q/k norms, etc)...

The point was to split the linear layer that passes the "cls" token to the CLIP contrastive loss from the one that passes the remaining tokens to the decoder, since this is a difference with respect to the original paper.
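
A rough sketch of that design (hypothetical names, plain PyTorch, not a quote of either implementation): one attention pool over the image tokens, with the output projection split so the contrastive query and the decoder queries go through separate linear layers.

```python
# Hedged sketch: single attention pool, split output projections.
# n_queries feeds the caption decoder; one extra query feeds the CLIP loss.
import torch
import torch.nn as nn

class SplitAttentionPooler(nn.Module):
    def __init__(self, dim: int, n_queries: int = 256, n_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(n_queries + 1, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj_contrastive = nn.Linear(dim, dim)   # -> CLIP contrastive loss
        self.proj_caption = nn.Linear(dim, dim)       # -> text decoder

    def forward(self, image_tokens: torch.Tensor):
        batch = image_tokens.shape[0]
        q = self.query.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.attn(q, image_tokens, image_tokens)
        cls_out = self.proj_contrastive(pooled[:, 0])  # (batch, dim)
        dec_out = self.proj_caption(pooled[:, 1:])     # (batch, n_queries, dim)
        return cls_out, dec_out

pooler = SplitAttentionPooler(dim=64, n_queries=8)
img = torch.randn(2, 50, 64)               # e.g. ViT patch tokens
cls_emb, dec_tokens = pooler(img)
print(cls_emb.shape, dec_tokens.shape)     # (2, 64) and (2, 8, 64)
```

Note that in this sketch the split itself lives in two `nn.Linear` layers after a stock `nn.MultiheadAttention`, so it doesn't by itself seem to require a custom attention module, which appears to be rwightman's point above.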
