I just noticed a paper that came out this month on arXiv: xLSTM: Extended Long Short-Term Memory (https://arxiv.org/abs/2405.04517).
It looks like the LSTM, the algorithm behind Flair's original success, has received a significant upgrade that leverages techniques from modern LLMs.
The authors claim:
Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
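For anyone curious what the "exponential gating" in that claim refers to: the paper replaces the sigmoid input/forget gates with exponentials, kept numerically stable by an extra max-based stabilizer state and a normalizer. A minimal scalar sketch of one such recurrent step (my own illustration based on the paper's sLSTM description, not code from the paper or from Flair):

```python
import math

def slstm_step(c, n, m, z, i_tilde, f_tilde, o):
    """One sLSTM-style step with stabilized exponential gating,
    as described in the xLSTM paper (arXiv:2405.04517).

    c, n, m   -- previous cell state, normalizer state, stabilizer state
    z         -- cell input (pre-activated, for simplicity)
    i_tilde   -- input gate pre-activation
    f_tilde   -- forget gate pre-activation
    o         -- output gate activation in [0, 1]
    """
    # Stabilizer: keeps the arguments of exp() at or below zero,
    # so the exponential gates never overflow.
    m_new = max(f_tilde + m, i_tilde)
    i = math.exp(i_tilde - m_new)        # stabilized exponential input gate
    f = math.exp(f_tilde + m - m_new)    # stabilized exponential forget gate
    c_new = f * c + i * z                # cell state update
    n_new = f * n + i                    # normalizer accumulates gate mass
    h = o * (c_new / n_new)              # normalized hidden output
    return c_new, n_new, m_new, h
```

Even with an extreme pre-activation like `i_tilde = 100`, the stabilizer keeps every `exp()` argument non-positive, so nothing overflows. A production version would of course be vectorized (and the paper's mLSTM variant adds a matrix memory on top), but the gating mechanics are the part that differs most from the classic LSTM cell Flair uses.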
I was curious whether anyone has experimented with xLSTM in Flair.
I appreciate that the best-scoring NER model in Flair is currently based on XLMRobertaModel. However, at 2.24 GB it is quite large and expensive for large-scale inference, while the original FlairEmbeddings-based models are a mere fraction of the size at 257 MB or 432 MB. It is an intriguing question whether such a new approach could benefit Flair.
Please share your thoughts