I just noticed a paper that came out this month on arXiv: xLSTM: Extended Long Short-Term Memory (https://arxiv.org/abs/2405.04517).
It looks like the LSTM, the algorithm behind Flair's original success, has received a significant upgrade that leverages techniques from modern LLMs.
The authors claim:
Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
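For anyone curious what the "exponential gating" in that claim refers to: the paper replaces the sigmoid input/forget gates with exponentials, kept numerically stable by an extra max-based stabilizer state and a normalizer. A minimal scalar sketch of one such recurrent step (my own illustration based on the paper's sLSTM description, not code from the paper or from Flair):

```python
import math

def slstm_step(c, n, m, z, i_tilde, f_tilde, o):
    """One sLSTM-style step with stabilized exponential gating,
    as described in the xLSTM paper (arXiv:2405.04517).

    c, n, m   -- previous cell state, normalizer state, stabilizer state
    z         -- cell input (pre-activated, for simplicity)
    i_tilde   -- input gate pre-activation
    f_tilde   -- forget gate pre-activation
    o         -- output gate activation in [0, 1]
    """
    # Stabilizer: keeps the arguments of exp() at or below zero,
    # so the exponential gates never overflow.
    m_new = max(f_tilde + m, i_tilde)
    i = math.exp(i_tilde - m_new)        # stabilized exponential input gate
    f = math.exp(f_tilde + m - m_new)    # stabilized exponential forget gate
    c_new = f * c + i * z                # cell state update
    n_new = f * n + i                    # normalizer accumulates gate mass
    h = o * (c_new / n_new)              # normalized hidden output
    return c_new, n_new, m_new, h
```

Even with an extreme pre-activation like `i_tilde = 100`, the stabilizer keeps every `exp()` argument non-positive, so nothing overflows. A production version would of course be vectorized (and the paper's mLSTM variant adds a matrix memory on top), but the gating mechanics are the part that differs most from the classic LSTM cell Flair uses.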
I was curious whether anyone has experimented with xLSTM in Flair.
I appreciate that the best-scoring NER model in Flair is currently based on XLMRobertaModel. However, at 2.24 GB it is quite large and expensive for large-scale inference, while the original FlairEmbeddings-based models are a mere fraction of the size at 257 MB or 432 MB. It is an intriguing question whether such a new approach could benefit Flair.
Please share your thoughts