unexpected result using like clause with fts index #33
Currently, ClickHouse built-in functions only return the same results as without a Full-Text Search (FTS) index when using the We have not discovered a method to achieve consistent results with other tokenizers. Perhaps we should consider disabling FTS indexing with other tokenizers in built-in functions? @MochiXu @chtlp
Thank you for bringing this issue to our attention, @HowePa! We will resolve it by disabling the FTS index for the In the meantime, you can still use
@HowePa The issue you're encountering is quite specific. To address this, we first need to understand the behavior of the Tantivy tokenizer. By default, the Tantivy tokenizer converts all text to lowercase. This means that in the index file, ' If you want Tantivy to be case-sensitive during tokenization, you need to configure the tokenizer with the
ALTER TABLE simple_table ADD INDEX doc_idx doc TYPE fts('{"doc":{"tokenizer":{"type":"simple","case_sensitive":true}}}') GRANULARITY 1;
ALTER TABLE simple_table MATERIALIZE INDEX doc_idx;
This will ensure that the tokenizer respects the case sensitivity of the text.
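To make the lowercase point concrete, here is a minimal Python sketch of a tokenizer with and without case sensitivity. It is only a toy stand-in: `simple_tokenize` and its `case_sensitive` flag are hypothetical names chosen to mirror the index option above, and the splitting rule (break on non-alphanumeric characters) is an assumption, not Tantivy's or MyScale's actual code.

```python
import re

def simple_tokenize(text, case_sensitive=False):
    """Toy tokenizer (hypothetical): split on non-alphanumerics, optionally lowercase."""
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]
    return tokens if case_sensitive else [t.lower() for t in tokens]

doc = "Hello World, hello MyScale"

default_tokens = simple_tokenize(doc)                      # lowercased tokens
cased_tokens = simple_tokenize(doc, case_sensitive=True)   # original casing kept

print(default_tokens)  # ['hello', 'world', 'hello', 'myscale']
print(cased_tokens)    # ['Hello', 'World', 'hello', 'MyScale']

# A lookup for the exact token "Hello" only succeeds when casing was preserved.
print("Hello" in default_tokens, "Hello" in cased_tokens)  # False True
```

In this toy model the default index no longer knows whether a row contained "Hello" or "hello", which is why a case-sensitive filter cannot be answered from lowercased tokens alone.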
Forgive me if I'm a little wordy, @MochiXu. According to your description, if I set Besides, why
If you would like to gain a deeper understanding of how Tantivy indexing works, it would be beneficial to first learn about ClickHouse's Skip Index, as Tantivy is integrated based on SkipIndex. For example, if there are 10 granules in a table, without using the Tantivy index, ClickHouse would need to access all the granules to perform a search. However, when using the Tantivy index, ClickHouse will only access a subset of granules when executing the
Regarding your earlier question, when you set
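The granule-skipping idea can be sketched with a toy model in Python. Everything here is an assumption made for illustration (the sample rows, the three granules, and the helper names `tokens`, `full_scan`, and `indexed_scan`); it is not how MyScale or ClickHouse implement the index. It only shows that pruning granules by token lookup matches a full scan when the pattern is a whole token, and can diverge when the pattern is a substring of a token.

```python
import re

def tokens(text):
    """Lowercased token set for one row (toy tokenizer, an assumption)."""
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t}

# Three hypothetical granules, each holding a few rows of a String column.
granules = [
    ["Hello world", "nothing here"],
    ["hello again", "still nothing"],
    ["goodbye", "see you"],
]

# Toy skip index: one token set per granule.
index = [set().union(*(tokens(row) for row in g)) for g in granules]

def like_contains(row, needle):
    """Case-sensitive substring test, in the spirit of LIKE '%needle%'."""
    return needle in row

def full_scan(needle):
    """No index: read every granule."""
    return [row for g in granules for row in g if like_contains(row, needle)]

def indexed_scan(needle):
    """With index: read only granules whose token set contains the term."""
    term = needle.lower()
    candidates = [i for i, toks in enumerate(index) if term in toks]
    return [row for i in candidates for row in granules[i]
            if like_contains(row, needle)]

print(full_scan("Hello"), indexed_scan("Hello"))  # ['Hello world'] ['Hello world']
print(full_scan("ell"), indexed_scan("ell"))      # ['Hello world', 'hello again'] []
```

In this sketch the second query returns nothing through the index because "ell" is not a token in any granule, so every granule is skipped even though two rows contain the substring.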
So, when I want to do
Currently, there isn't an effective solution to achieve this, unless you configure a case-sensitive tokenizer. We plan to optimize the search logic for functions like
Many thanks for your patience. 👍
Describe the unexpected behaviour
When I filter String using like clause, MyScale gets different results.
How to reproduce
![image](https://private-user-images.githubusercontent.com/35292408/336380385-13795eb5-73df-4624-a06a-f6ba70e362d2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk1OTczMTUsIm5iZiI6MTcxOTU5NzAxNSwicGF0aCI6Ii8zNTI5MjQwOC8zMzYzODAzODUtMTM3OTVlYjUtNzNkZi00NjI0LWEwNmEtZjZiYTcwZTM2MmQyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI4VDE3NTAxNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNiZmRkNTVkZTZlYTJhNmVmNDI0ODRlMTMyNDViYjRiNmZiZTEzOWE1YTY3MDJhOTAyYTdiNmZkYjAwNGFmNjcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.pYa1gh6ltLZrheySn7xFg-TXmQ7SheDiPW3E7kUma8k)
version: myscale/myscaledb:1.5
case 1: filter with fts
case 2: filter without fts
![image](https://private-user-images.githubusercontent.com/35292408/336380630-b8eb4569-b246-4341-99b8-de795aba73c6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk1OTczMTUsIm5iZiI6MTcxOTU5NzAxNSwicGF0aCI6Ii8zNTI5MjQwOC8zMzYzODA2MzAtYjhlYjQ1NjktYjI0Ni00MzQxLTk5YjgtZGU3OTVhYmE3M2M2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI4VDE3NTAxNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQyN2M3Njk5MTczYmE5MjExOTMzMmJjZWRiMzcxYTI0NWUyZGRiMDE3MTM4YmM3YzBkYTYzYzY5NzMwMmI4MTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Cf9YEd54-wiWDQ4Ytd4vJQNO0RtbpO1ak30P6MbrLIQ)
Expected behavior
The results need to be uniform.