Enable stemming and choosing tokenizer, when doing full text search in tantivy #1315

Closed
josca42 opened this issue May 19, 2024 · 1 comment · Fixed by #1356
Labels
enhancement New feature or request

Comments

@josca42
Contributor

josca42 commented May 19, 2024

SDK

Python

Description

Enabling stemming and using a language-specific tokenizer tends to improve recall quite a bit when doing full-text search.
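To illustrate the recall claim, here is a minimal, self-contained sketch. It does not use tantivy: `toy_stem` is a crude suffix stripper invented for this example (not the stemmer tantivy ships), and `tokenize` and `search` are likewise hypothetical helpers. It only shows why reducing words to a common stem lets a query match more documents.

```python
import re


def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def search(docs, query, stem=False):
    """Return indices of docs sharing at least one token with the query."""
    norm = toy_stem if stem else (lambda t: t)
    query_terms = {norm(t) for t in tokenize(query)}
    return [
        i for i, doc in enumerate(docs)
        if query_terms & {norm(t) for t in tokenize(doc)}
    ]


docs = ["Indexing documents quickly", "The index was rebuilt", "Unrelated text"]

plain_hits = search(docs, "indexes")               # exact token match only
stemmed_hits = search(docs, "indexes", stem=True)  # match on the shared stem
```

Without stemming the query `indexes` matches nothing, because no document contains that exact token. With stemming, `indexes`, `Indexing`, and `index` all reduce to the same stem, so the first two documents are found.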

Tantivy has support for this through the tokenizer_name argument in add_text_field.

As far as I can tell, the change needed is to add a tokenizer_name argument to the following line

And then add the tokenizer_name argument to the create_fts_index method.

I would personally much prefer that the argument be exposed, rather than just enabling the English stemmer. Tantivy supports tokenizers for a few different languages, which I think a lot of people would like to use instead of English.
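The proposal above could look roughly like the following sketch. It is not the LanceDB implementation: `build_text_schema` is a hypothetical helper, and `RecordingBuilder` is a stand-in that mimics only the `add_text_field(..., tokenizer_name=...)` call mentioned in this issue, so the example runs without tantivy installed. The point is that the tokenizer name is passed through from the caller rather than hard-coded.

```python
def build_text_schema(schema_builder, field_names, tokenizer_name="default"):
    """Add one text field per name, forwarding the caller's tokenizer choice.

    `schema_builder` is assumed to expose an
    add_text_field(name, stored=..., tokenizer_name=...) method,
    as tantivy's schema builder does.
    """
    for name in field_names:
        schema_builder.add_text_field(
            name, stored=True, tokenizer_name=tokenizer_name
        )
    return schema_builder


class RecordingBuilder:
    """Hypothetical stand-in for a tantivy schema builder; records calls."""

    def __init__(self):
        self.calls = []

    def add_text_field(self, name, stored=False, tokenizer_name="default"):
        self.calls.append((name, stored, tokenizer_name))


# "en_stem" is the name tantivy uses for its English stemming tokenizer.
builder = build_text_schema(
    RecordingBuilder(), ["title", "body"], tokenizer_name="en_stem"
)
```

With the real library, the stand-in would presumably be replaced by a tantivy schema builder, and `create_fts_index` would accept `tokenizer_name` and forward it in the same way.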

I can create a pull request with the suggested changes if you think it is a good idea :-).

josca42 added the enhancement label May 19, 2024
@wjones127
Contributor

This all sounds good to me. Feel free to make a PR :)

wjones127 pushed a commit that referenced this issue Jun 20, 2024
Added the ability to specify tokenizer_name, when creating a full text
search index using tantivy. This enables the use of language specific
stemming.

Also updated the [guide on full text
search](https://lancedb.github.io/lancedb/fts/) with a short section on
choosing tokenizer.

Fixes #1315