feat: enable stemming #1356

Merged
merged 4 commits into from
Jun 20, 2024

josca42
Contributor

@josca42 josca42 commented Jun 5, 2024

Added the ability to specify tokenizer_name when creating a full-text search index using tantivy. This enables language-specific stemming.

Also updated the full-text search guide with a short section on choosing a tokenizer.
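
For illustration, here is a minimal sketch of how the new argument might be used from the Python SDK. The helper names `build_stemmed_index` and `demo`, the table name, the sample rows, and the database path are hypothetical. The tokenizer name "en_stem" is assumed to be the English stemming tokenizer registered by tantivy, and the exact `create_fts_index` signature may differ between lancedb versions.

```python
def build_stemmed_index(table, column="text", tokenizer_name="en_stem"):
    """Hypothetical helper: build a full-text search index with a chosen tokenizer.

    Assumption: `table` is a LanceDB table whose `create_fts_index` accepts
    the `tokenizer_name` keyword described in this PR.
    """
    table.create_fts_index(column, tokenizer_name=tokenizer_name)
    return table


def demo(db_path="/tmp/lancedb-stemming-demo"):
    """Hypothetical end-to-end usage; requires lancedb and tantivy to be installed."""
    import lancedb  # imported here so the helper above has no hard dependency

    db = lancedb.connect(db_path)
    table = db.create_table(
        "docs",
        data=[
            {"id": 1, "text": "She was running quickly"},
            {"id": 2, "text": "He runs every morning"},
        ],
        mode="overwrite",
    )
    build_stemmed_index(table)
    # With a stemming tokenizer, a query for "run" is expected to be able to
    # match inflected forms such as "running" and "runs".
    return table.search("run").limit(5).to_list()
```

Leaving `tokenizer_name` at the library default should keep the previous behavior, so stemming stays opt-in per index.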

Fixes #1315

@github-actions github-actions bot added the Python Python SDK label Jun 5, 2024

github-actions bot commented Jun 5, 2024

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@josca42 josca42 changed the title from "Enable stemming" to "feat: enable stemming" Jun 5, 2024
@github-actions github-actions bot added the enhancement New feature or request label Jun 5, 2024
@wjones127 wjones127 self-requested a review June 11, 2024 18:20
Contributor

@wjones127 wjones127 left a comment

Thanks for making a PR! 😄

Could you add a unit test where we specify the tokenizer?

@josca42 josca42 requested a review from wjones127 June 17, 2024 11:17
@wjones127 wjones127 merged commit 0fe8440 into lancedb:main Jun 20, 2024
10 of 12 checks passed
Labels
enhancement New feature or request Python Python SDK
Development

Successfully merging this pull request may close these issues.

Enable stemming and choosing tokenizer, when doing full text search in tantivy
2 participants