bpe
Here are 73 public repositories matching this topic...
Simple-to-use scoring function for arbitrarily tokenized texts.
Updated Jun 12, 2024 - Python
Python package for training your own BPE-based tokenizer for LLMs (supports regex patterns and special tokens)
Updated Jun 7, 2024 - Jupyter Notebook
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Updated Jun 4, 2024 - Python
Byte-level byte pair encoding (BPE) in Haskell
Updated May 27, 2024 - Haskell
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Updated Apr 30, 2024 - Python
A modified, secure version of the BPE algorithm
Updated Mar 29, 2024 - Python
Unsupervised text tokenizer focused on computational efficiency
Updated Mar 29, 2024 - C++
Translating Indian Names to Hindi, a sequence-to-sequence modeling task, using character-level conditional language models.
Updated Mar 16, 2024 - Jupyter Notebook