Multi-target vector search #5138
Comments
Migrating some comments/questions from: I would like to suggest the ability to configure scoring/ranking. For a nearText query, for example, results could be sorted by the minimum weighted average distance under a distance metric such as cosine. With 3 vectors, weights of [0.4, 0.3, 0.3] would weight the first vector more heavily. Depending on the distance metric, some normalization may be needed, especially if the vectors come from different embedding models.
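To make the weighting suggestion concrete, here is a minimal sketch in plain Python. It is not Weaviate's API or implementation: the function names, the brute-force scan, and the choice of min-max normalization per target vector are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def min_max_normalize(values):
    """Rescale distances to [0, 1] so targets from different models are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_weighted_distance(query_vectors, objects, weights):
    """Rank object ids by the weighted sum of normalized per-target distances.

    objects maps an id to a list of vectors, one per target (hypothetical layout).
    query_vectors and weights hold one entry per target, in the same order.
    """
    ids = list(objects)
    combined = {oid: 0.0 for oid in ids}
    for target, (query, weight) in enumerate(zip(query_vectors, weights)):
        distances = [cosine_distance(query, objects[oid][target]) for oid in ids]
        for oid, distance in zip(ids, min_max_normalize(distances)):
            combined[oid] += weight * distance
    return sorted(ids, key=lambda oid: combined[oid])


# Hypothetical data: three objects, two target vectors each.
objects = {
    "a": [[1, 0], [0, 1]],
    "b": [[0, 1], [1, 0]],
    "c": [[1, 1], [1, 1]],
}
query_vectors = [[1, 0], [1, 0]]
print(rank_by_weighted_distance(query_vectors, objects, [0.8, 0.2]))
print(rank_by_weighted_distance(query_vectors, objects, [0.2, 0.8]))
```

Swapping the weights changes which object ranks first, which is the effect the request is after. Min-max normalization is only one option; it depends on the candidate set, so a fixed per-metric rescaling may be preferable in practice.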
Migrating some comments/questions from: Related to this, but a different usage scenario, is a query that spans collections and involves more than one vector. Given a data model like: Document (Collection), Topic (Collection), Image (Collection) Document: Topic: Image: Query: Document content: nearText("cute kittens")
Migrating some comments/questions from: For the parallel N-vector query case, is there a concept of optimizing the ordering, such that the vector with the fewest nearby results can act as a gating factor on the others? In document search, if you were querying for "happy" AND "aardvark", you would search for "aardvark" first, since it is presumably less frequent and helps filter the "happy" results. The situation with vectors is not exactly the same, but I thought a similar process might help. In a query I would use this for, one of the vectors would have something like 1000x as many nearby vectors as the others, so it could be bad performance-wise to enumerate them all only to intersect them with the other, much smaller sets.
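The gating idea can be sketched as follows. This is a brute-force illustration under stated assumptions, not how any vector index executes a query: `gated_search`, the `estimated_counts` input (a guess at how many objects lie near each query vector), and the distance threshold are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def gated_search(query_vectors, objects, max_distance, estimated_counts):
    """Filter objects target by target, most selective target first.

    Returns the surviving ids and the number of distance evaluations performed,
    so the cost of different orderings can be compared.
    """
    # Visit targets in ascending order of their estimated number of nearby objects.
    order = sorted(range(len(query_vectors)), key=lambda t: estimated_counts[t])
    survivors = set(objects)
    evaluations = 0
    for target in order:
        kept = set()
        for oid in survivors:
            evaluations += 1
            distance = cosine_distance(query_vectors[target], objects[oid][target])
            if distance <= max_distance:
                kept.add(oid)
        survivors = kept
        if not survivors:
            break  # nothing can match the remaining targets
    return survivors, evaluations


# Hypothetical data: every object is near the query on target 0, only one on target 1.
objects = {
    "o1": [[1, 0], [1, 0]],
    "o2": [[1, 0], [0, 1]],
    "o3": [[1, 0], [0, 1]],
    "o4": [[1, 0], [0, 1]],
}
query_vectors = [[1, 0], [1, 0]]

# Good estimate: target 1 is checked first and prunes the candidate set early.
print(gated_search(query_vectors, objects, 0.1, estimated_counts=[4, 1]))
# Bad estimate: target 0 is checked first and prunes nothing.
print(gated_search(query_vectors, objects, 0.1, estimated_counts=[1, 4]))
```

Both orderings return the same survivors, but the selective-first ordering needs fewer distance evaluations. Whether the saving carries over to an approximate index depends on how cheaply selectivity can be estimated, which this sketch takes as given.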
Hi @hadfield
This will be included. The options will be:
This won't be added in the near future. It is more complex to add and would need more work.
All searches run concurrently, so there is no explicit order. In my testing, multi-target vector search is not much slower than single-target vector search (<10%).
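For readers unfamiliar with the concurrent approach, the rough shape is sketched below in plain Python. It is an illustration only and not the project's implementation: the brute-force per-target search, the thread pool, and combining results by summing distances are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_target(query, objects, target, k):
    """Brute-force top-k for one target vector (stand-in for an index lookup)."""
    distances = {oid: cosine_distance(query, vecs[target]) for oid, vecs in objects.items()}
    top = sorted(distances, key=distances.get)[:k]
    return {oid: distances[oid] for oid in top}


def multi_target_search(query_vectors, objects, k):
    """Run one search per target concurrently, then rank by summed distance."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_target, query, objects, target, k)
            for target, query in enumerate(query_vectors)
        ]
        per_target = [future.result() for future in futures]

    # Any object found by at least one target search is a candidate.
    candidates = set().union(*per_target)
    totals = {}
    for oid in candidates:
        totals[oid] = sum(
            # Reuse the search result if present, otherwise compute the missing distance.
            hits[oid] if oid in hits else cosine_distance(query, objects[oid][target])
            for target, (query, hits) in enumerate(zip(query_vectors, per_target))
        )
    return sorted(totals, key=totals.get)[:k]


# Hypothetical data: three objects, two target vectors each.
objects = {
    "a": [[1, 0], [0, 1]],
    "b": [[0, 1], [1, 0]],
    "c": [[1, 1], [1, 1]],
}
print(multi_target_search([[1, 0], [1, 0]], objects, k=2))
```

Because each per-target search is independent, no target waits on another, which is consistent with there being no explicit order. The combination step is where candidates from a very large result set meet those from a very small one.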
Ok, I'll open a new issue specific to this to track it. |
I would suggest including wildly imbalanced per-vector result sets in your tests to explore performance in such cases, for example a ratio of 10,000+ to 1 for an object O with vectors A and B that is near very many objects in A and very few objects in B.
Describe your feature request
tracking issue