Hi, thanks for the effort put into Microsoft.ML.Tokenizers!
I'm the author of the latest performance improvements in the SharpToken library.
Since Microsoft.ML.Tokenizers is now faster than SharpToken, I looked into the sources to understand where this performance comes from.
Now I have a question, out of curiosity:
Why is it required to copy a ReadOnlySpan<char> to a buffer, when the rest of the code just uses a ReadOnlySpan<char> again?
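To make the question concrete, here is a minimal sketch of the pattern I mean. The names and logic are hypothetical, not the actual Microsoft.ML.Tokenizers code; it just shows an incoming span being copied into a rented buffer even though the subsequent processing is span-based anyway:

```csharp
using System;
using System.Buffers;

public static class SpanCopyExample
{
    public static int CountLetters(ReadOnlySpan<char> text)
    {
        // Copy the incoming span into a pooled buffer...
        char[] buffer = ArrayPool<char>.Shared.Rent(text.Length);
        try
        {
            text.CopyTo(buffer);

            // ...and then process a span over that buffer anyway,
            // which is what prompts the question: why not process
            // the original span directly?
            ReadOnlySpan<char> copy = buffer.AsSpan(0, text.Length);
            int count = 0;
            foreach (char c in copy)
            {
                if (char.IsLetter(c))
                {
                    count++;
                }
            }
            return count;
        }
        finally
        {
            ArrayPool<char>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountLetters("hello, world".AsSpan()));
    }
}
```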
TiktokenPreTokenizer.cs line: 104
machinelearning/src/Microsoft.ML.Tokenizers/PreTokenizer/TiktokenPreTokenizer.cs, lines 95 to 107 at commit 72cfdf6
PreTokenizer.cs line: 74
machinelearning/src/Microsoft.ML.Tokenizers/PreTokenizer/PreTokenizer.cs, lines 43 to 54 at commit 72cfdf6