[Question]: Experimenting with xLSTM with Flair? #3454

Open
None-Such opened this issue May 12, 2024 · 0 comments
Labels
question (Further information is requested)

Question

I just noticed a paper that came out this month on arXiv, titled "xLSTM: Extended Long Short-Term Memory".

It looks like the LSTM architecture behind Flair's original success has received a significant upgrade, leveraging the latest techniques from modern LLMs.

The authors claim:

Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
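
To make that concrete: my rough reading of the paper's sLSTM recurrence with exponential gating, written as a minimal PyTorch-style step function, looks like the sketch below. The parameter dictionaries `W`, `R`, `b` are my own naming, and I've left out the stabilizer state the paper adds to keep the exponentials numerically safe, so treat this as a sketch rather than a faithful reimplementation.

```python
import torch

def slstm_step(x_t, h_prev, c_prev, n_prev, W, R, b):
    """One simplified sLSTM step: exponential gating plus a normalizer state."""
    z = torch.tanh(W["z"] @ x_t + R["z"] @ h_prev + b["z"])       # cell input
    i = torch.exp(W["i"] @ x_t + R["i"] @ h_prev + b["i"])        # exponential input gate
    f = torch.exp(W["f"] @ x_t + R["f"] @ h_prev + b["f"])        # forget gate (paper allows sigmoid or exp)
    o = torch.sigmoid(W["o"] @ x_t + R["o"] @ h_prev + b["o"])    # output gate
    c = f * c_prev + i * z    # cell state update
    n = f * n_prev + i        # normalizer state, keeps the exponential gates in check
    h = o * (c / n)           # normalized hidden state
    return h, c, n
```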

I am curious whether anyone has experimented with xLSTM in combination with Flair.

See - https://arxiv.org/abs/2405.04517
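
If someone did want to try it, my first thought is that an xLSTM character language model could be exposed to Flair as a custom TokenEmbeddings subclass, the same way FlairEmbeddings wraps its character LMs. Below is a minimal sketch under that assumption: `char_lm` and its `encode()` method are hypothetical placeholders for whatever an actual xLSTM model would expose, and the character-offset lookup just mirrors my understanding of how contextual string embeddings are extracted.

```python
from typing import List

from flair.data import Sentence
from flair.embeddings import TokenEmbeddings


class XLSTMEmbeddings(TokenEmbeddings):
    """Sketch: wrap a (hypothetical) pretrained xLSTM character LM as a Flair embedding."""

    def __init__(self, char_lm, embedding_dim: int):
        self.name = "xlstm-char-lm"        # key under which Flair stores the vectors on each token
        self.static_embeddings = False     # recompute for every batch, like FlairEmbeddings
        super().__init__()
        self.char_lm = char_lm             # hypothetical pretrained xLSTM character LM
        self.__embedding_length = embedding_dim

    @property
    def embedding_length(self) -> int:
        return self.__embedding_length

    def _add_embeddings_internal(self, sentences: List[Sentence]) -> List[Sentence]:
        for sentence in sentences:
            # assumed interface: one hidden state per character position of the raw text
            hidden = self.char_lm.encode(sentence.to_plain_string())
            for token in sentence:
                # read out the hidden state at the token's last character
                offset = token.start_position + len(token.text) - 1
                token.set_embedding(self.name, hidden[offset])
        return sentences
```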

I appreciate that the best-scoring NER model in Flair is currently based on XLMRobertaModel. However, at 2.24 GB it is quite large and expensive to use for large-scale inference, while the original FlairEmbeddings-based models are a fraction of that size at 257 MB or 432 MB. It's an intriguing question whether such a new approach could benefit Flair.
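
For a concrete comparison of the two existing options (the model identifiers are the published Flair taggers on the Hugging Face Hub; the sizes quoted above are approximate):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# XLM-RoBERTa-large based tagger: best F1, but ~2.2 GB and costly at inference time
large_tagger = SequenceTagger.load("flair/ner-english-large")

# FlairEmbeddings (character LM) based tagger: a few hundred MB, much cheaper to run
small_tagger = SequenceTagger.load("flair/ner-english")

sentence = Sentence("George Washington went to Washington.")
small_tagger.predict(sentence)
print(sentence.get_spans("ner"))
```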

Please share your thoughts.
