Multi-target vector search #5138

dirkkul · 2024-06-11T13:43:40Z

Describe your feature request

tracking issue

Code of Conduct

I have read and agree to Weaviate's Contributor Guide and Code of Conduct

hadfield · 2024-06-24T20:20:10Z

Migrating some comments/questions from:
#4955

I would like to suggest the ability to configure scoring/ranking. For a nearText query, for example, results could be sorted by the minimum average distance under a distance metric (such as cosine), with optional weighting: with 3 vectors, weights of [0.4, 0.3, 0.3] would weight the first vector more heavily. Depending on the distance metric, some normalization may be needed, especially if the vectors come from different embedding models.

hadfield · 2024-06-24T20:21:42Z

Migrating some comments/questions from:
#4955

Related to this, but a different usage scenario, is a query that extends across collections that involves more than one vector.

Given a data model like:

Document (Collection), Topic (Collection), Image (Collection)

Document:
content (Vector, Text Embedding)

Topic:
description (Vector, Text Embedding)
multiModalDescription (Vector, MultiModal Embedding)

Image:
content (Vector, MultiModal Embedding)

Query:

Document content: nearText("cute kittens")
Matching Documents provide vectors to find nearby Topics based on closeness in the Text Embedding space
Matching Topics provide vectors to find nearby Images based on closeness in the MultiModal Embedding space
So the Topic collection has two vectors and serves to "join" the two embedding spaces, allowing queries to traverse from one embedding space to the other. One scenario where this arises is when there is an existing dataset for "documents" and an existing dataset for "images", and you want to query across them without having to modify the current data (or the processes that maintain it).
I briefly discussed this use-case with @bobvanluijt a few months back at an event in NYC.
Hopefully I articulated what I mean, but let me know if clarifications are needed, or if I'm on the wrong track.
If this use case is completely separate, I guess a separate issue could be added?
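
A minimal sketch of the chained lookup described above, using brute-force cosine distance over in-memory lists. This is not a Weaviate API; the toy vectors, the ids, and the helper names (`nearest`, `chained_search`) are hypothetical and exist only to show how Topic's two vectors bridge the text and multimodal spaces. The query is assumed to be embedded already.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query_vec, items, vector_name, k):
    """Return the k items whose named vector is closest to query_vec."""
    return sorted(items, key=lambda it: cosine_distance(query_vec, it[vector_name]))[:k]


# Hypothetical toy data: 2-d "text" space, 3-d "multimodal" space.
documents = [
    {"id": "doc-kittens", "content": [1.0, 0.0]},
    {"id": "doc-trucks", "content": [0.0, 1.0]},
]
topics = [
    {"id": "topic-cats", "description": [0.9, 0.1], "multiModalDescription": [0.0, 1.0, 0.0]},
    {"id": "topic-vehicles", "description": [0.1, 0.9], "multiModalDescription": [1.0, 0.0, 0.0]},
]
images = [
    {"id": "img-kitten", "content": [0.1, 0.9, 0.0]},
    {"id": "img-truck", "content": [0.9, 0.1, 0.0]},
]


def chained_search(query_text_vec, k=1):
    """Document -> Topic (text space) -> Image (multimodal space)."""
    docs = nearest(query_text_vec, documents, "content", k)
    found_topics = []
    for doc in docs:
        found_topics.extend(nearest(doc["content"], topics, "description", k))
    found_images = []
    for topic in found_topics:
        found_images.extend(nearest(topic["multiModalDescription"], images, "content", k))
    return [img["id"] for img in found_images]


# Stand-in for the embedding of "cute kittens" in the text space.
result = chained_search([0.95, 0.05])
```

Each hop only compares vectors that live in the same embedding space, so neither the Document nor the Image data has to change.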

hadfield · 2024-06-24T20:23:02Z

Migrating some comments/questions from:
#4955

For the parallel N vector query case, is there a concept of optimizing the ordering, such that the vector with the fewest nearby results can act as a gating factor on the others? In document search, if you were querying for "happy" AND "aardvark", you would search for "aardvark" first, which presumably would be less frequent and would help filter the "happy" results. The situation with vectors is not exactly the same, but I thought a similar process might help.

In a query I would use this for, one of the vectors would have something like 1000x as many nearby vectors as the others, so it could be bad performance-wise to enumerate them all only to intersect them with the other, much smaller sets.
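
The document-search analogy can be sketched as follows: start from the rarest term's posting set and only probe the larger sets for membership, so the work is proportional to the smallest set. This is a generic illustration of the idea, not something Weaviate is known to do for vectors; the function name and the sample data are hypothetical.

```python
def intersect_rarest_first(postings):
    """Intersect posting sets, starting with the smallest one.

    postings maps a term to the set of ids that match it.
    """
    ordered = sorted(postings.values(), key=len)
    if not ordered:
        return set()
    survivors = set(ordered[0])
    for larger in ordered[1:]:
        # Probe the larger set instead of enumerating it.
        survivors = {doc_id for doc_id in survivors if doc_id in larger}
        if not survivors:
            break  # nothing left to filter
    return survivors


postings = {
    "happy": set(range(10_000)),     # very common term
    "aardvark": {7, 42, 20_000},     # rare term gates the result
}
matches = intersect_rarest_first(postings)
```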

dirkkul · 2024-06-27T12:24:16Z

Hi @hadfield

> I would like to suggest the ability to configure scoring/ranking. For a nearText query, for example, results could be sorted by the minimum average distance under a distance metric (such as cosine), with optional weighting: with 3 vectors, weights of [0.4, 0.3, 0.3] would weight the first vector more heavily. Depending on the distance metric, some normalization may be needed, especially if the vectors come from different embedding models.

This will be included. The options will be:

sum
minimum
average
manual weights
relative scores (same as hybrid)
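
A minimal sketch of how these options could collapse one object's per-target distances into a single ranking value. It is an illustration, not Weaviate's implementation: the function names are hypothetical, reading the first option as a plain sum of distances is an assumption, and so is the use of min-max normalization per target for relative scores.

```python
def combine(distances, strategy="sum", weights=None):
    """Collapse one object's per-target distances into a single value."""
    if strategy == "sum":
        return sum(distances)
    if strategy == "minimum":
        return min(distances)
    if strategy == "average":
        return sum(distances) / len(distances)
    if strategy == "manual_weights":
        if weights is None or len(weights) != len(distances):
            raise ValueError("need one weight per target vector")
        return sum(w * d for w, d in zip(weights, distances))
    raise ValueError(f"unknown strategy: {strategy}")


def min_max_normalize(values):
    """Rescale one target's distances across all candidates to [0, 1].

    Assumed stand-in for "relative scores": it makes distances from
    different embedding models comparable before they are combined.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical distances of one object to three target vectors,
# combined with the weights suggested earlier in the thread.
per_target = [0.2, 0.5, 0.1]
weighted = combine(per_target, "manual_weights", weights=[0.4, 0.3, 0.3])
```

For relative scores, each target's column of distances would be normalized across all candidates first, and the normalized values then combined per object.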

dirkkul · 2024-06-27T12:25:06Z

> Related to this, but a different usage scenario, is a query that extends across collections that involves more than one vector.

This won't be added in the near future. It is more complex to add and would need more work.

dirkkul · 2024-06-27T12:26:06Z

> For the parallel N vector query case, is there a concept of optimizing the ordering, such that the vector with the fewest nearby results can act as a gating factor on the others? In document search, if you were querying for "happy" AND "aardvark", you would search for "aardvark" first, which presumably would be less frequent and would help filter the "happy" results. The situation with vectors is not exactly the same, but I thought a similar process might help.

All searches run concurrently, so there is no explicit order. In my testing, multi-target vector search is less than 10% slower than single-target vector search.
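
A rough sketch of the "all searches run concurrently" idea, under stated assumptions: each target vector is searched by brute force in its own thread, the per-target results are merged into one candidate set, and candidates are ranked by the sum of their distances. The function names and toy data are hypothetical, and this is not a description of Weaviate's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_one_target(objects, target, query_vec, limit):
    """Brute-force top-`limit` object ids for a single target vector."""
    ranked = sorted(objects, key=lambda oid: sq_dist(objects[oid][target], query_vec))
    return ranked[:limit]


def multi_target_search(objects, queries, limit):
    """Run one search per target concurrently, then rank the merged candidates."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_one_target, objects, target, vec, limit)
            for target, vec in queries.items()
        ]
        candidates = set()
        for future in futures:
            candidates.update(future.result())

    def total(oid):
        # Sum of distances over all targets (one of several possible joins).
        return sum(sq_dist(objects[oid][t], vec) for t, vec in queries.items())

    return sorted(candidates, key=lambda oid: (total(oid), oid))[:limit]


objects = {
    "a": {"title": [0, 0], "body": [0, 0]},
    "b": {"title": [1, 0], "body": [5, 5]},
    "c": {"title": [4, 4], "body": [0, 1]},
}
top = multi_target_search(objects, {"title": [0, 0], "body": [0, 0]}, limit=2)
```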

hadfield · 2024-06-27T14:39:31Z

> > Related to this, but a different usage scenario, is a query that extends across collections that involves more than one vector.

> This won't be added in the near future. It is more complex to add and would need more work.

Ok, I'll open a new issue specific to this to track it.

hadfield · 2024-06-27T14:51:07Z

> > For the parallel N vector query case, is there a concept of optimizing the ordering, such that the vector with the fewest nearby results can act as a gating factor on the others? In document search, if you were querying for "happy" AND "aardvark", you would search for "aardvark" first, which presumably would be less frequent and would help filter the "happy" results. The situation with vectors is not exactly the same, but I thought a similar process might help.

> All searches run concurrently, so there is no explicit order. In my testing, multi-target vector search is less than 10% slower than single-target vector search.

I would suggest including wildly imbalanced per-vector result sets in your tests to explore the performance of such cases: for example, a ratio of 10,000+ to 1, where an Object O with vectors A and B is near very many objects in A and very few objects in B.
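
One way such a test dataset could be generated, as a sketch only: every object's A vector is clustered around the origin, while only a handful of B vectors are. The function names, cluster ranges, and radius are hypothetical choices made for illustration.

```python
import math
import random


def make_imbalanced_dataset(n_objects, n_near_in_b, seed=0):
    """Objects whose A vectors all cluster together, but whose B vectors mostly do not."""
    rng = random.Random(seed)
    data = []
    for i in range(n_objects):
        # Every A vector lies close to the origin.
        a = [rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)]
        if i < n_near_in_b:
            # Only a few B vectors lie close to the origin.
            b = [rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)]
        else:
            b = [rng.uniform(10.0, 20.0), rng.uniform(10.0, 20.0)]
        data.append({"A": a, "B": b})
    return data


def count_within(data, target, query_vec, radius):
    """Number of objects whose `target` vector is within `radius` of query_vec."""
    return sum(1 for obj in data if math.dist(obj[target], query_vec) <= radius)


data = make_imbalanced_dataset(n_objects=10_000, n_near_in_b=1)
near_a = count_within(data, "A", [0.0, 0.0], radius=0.1)
near_b = count_within(data, "B", [0.0, 0.0], radius=0.1)
```

With these settings, a query at the origin is near all 10,000 objects in A and a single object in B, which gives the 10,000 to 1 ratio described above.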

dirkkul added the feature request label Jun 11, 2024

dirkkul mentioned this issue Jun 11, 2024

Named vector multi search #5099

Merged

4 tasks

dirkkul self-assigned this Jun 11, 2024

hadfield mentioned this issue Jun 27, 2024

Query a chain of near Vector/Object across named vectors #5235

Open

1 task
