how to split chunks without cutting the sentences? #11

I see - in that case you may want to use the CharacterSplitter with the separator set to the character that ends a sentence, like this:

let splitter = CharacterSplitter(withSeparator: ".")
let (chunks, _) = splitter.split(text: "First sentence. Second Sentence", chunkSize: 5)

In this case, chunkSize counts whole sentences rather than tokens, so it will be much smaller than a token-based chunk size. If you need more custom functionality than that, you can try implementing your own splitter; it just needs to conform to TextSplitterProtocol:

public protocol TextSplitterProtocol {
    /// Splits the input text into a tuple of chunks and optionally token ids.
    ///
    /// - Parameters:
    ///   - text: The …
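
To make the idea concrete, here is a minimal standalone sketch of sentence-preserving chunking. It is not the library's implementation and does not conform to TextSplitterProtocol (whose full signature is not shown above); the function name `chunkBySentence` and the parameter `maxSentences` are hypothetical and exist only for illustration.

```swift
import Foundation

/// Hypothetical helper (not part of the library): groups whole sentences
/// into chunks of at most `maxSentences` sentences each, so no sentence is cut.
func chunkBySentence(_ text: String, separator: Character = ".", maxSentences: Int) -> [String] {
    precondition(maxSentences > 0, "maxSentences must be positive")

    // Split on the separator, trim surrounding whitespace, drop empty fragments.
    let sentences = text
        .split(separator: separator)
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }

    var chunks: [String] = []
    var start = 0
    while start < sentences.count {
        let end = min(start + maxSentences, sentences.count)
        // split(separator:) removes the separator, so re-attach it to every sentence.
        // Note: this also adds one to a final sentence that originally lacked it.
        let chunk = sentences[start..<end]
            .map { $0 + String(separator) }
            .joined(separator: " ")
        chunks.append(chunk)
        start = end
    }
    return chunks
}

let chunks = chunkBySentence("First sentence. Second sentence. Third sentence.", maxSentences: 2)
print(chunks)  // ["First sentence. Second sentence.", "Third sentence."]
```

Splitting on a single character will also break on abbreviations such as "e.g." or on decimal numbers. If that matters for your text, Foundation's `enumerateSubstrings(in:options:)` with `.bySentences` may be a more robust way to find sentence boundaries before grouping them.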

Answer selected by ZachNagengast

This discussion was converted from issue #10 on June 28, 2023 21:57.