`get_relevant_chunks` gathers sets of relevant passages via several retrieval methods (semantic/dense, sparse-embedding based, lexical/keyword, fuzzy); most of these produce scores (fuzzy may not), but all of them are at least ordered by match quality.
If the config has a cross-encoder model set, the union of these passages is re-ranked and the top $k$ passages are picked.
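For context, a minimal sketch of what that re-ranking step could look like, assuming a sentence-transformers style cross-encoder; the model name and `rerank` helper are illustrative, not the repo's actual code:

```python
from sentence_transformers import CrossEncoder

def rerank(query: str, passages: list[str], k: int) -> list[str]:
    """Hypothetical sketch: re-rank a pooled set of passages with a cross-encoder."""
    # Illustrative checkpoint; any cross-encoder model would do here.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    # Score each (query, passage) pair jointly; higher means more relevant.
    scores = model.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in ranked[:k]]
```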
However, if there is no cross-encoder, the top $k$ are picked from this union of passages, which has no particular order, so we can end up with a bad set of passages. In that case we should make use of the scores (and implicit ranks) produced by the retrievals and blend them with a suitable rank-fusion strategy. The first step is to normalize the scores/ranks within each retrieval set, putting them into the same "score space" so that picking the top $k$ is meaningful (a sketch of both options follows the list). The normalization could be:
- reciprocal-rank based: score each passage by $1/(\mathrm{rank} + c)$,
- min-max normalized.
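For concreteness, a minimal sketch of the fusion step with both normalizations, assuming each retriever hands back its passages best-first; all names here (`rrf_scores`, `minmax_scores`, `fuse`) are illustrative, not existing `get_relevant_chunks` internals:

```python
def rrf_scores(ranked: list[str], c: int = 60) -> dict[str, float]:
    # Reciprocal-rank fusion: score each passage by 1/(rank + c);
    # c = 60 is the constant used in the original RRF paper.
    return {p: 1.0 / (rank + c) for rank, p in enumerate(ranked, start=1)}

def minmax_scores(scored: dict[str, float]) -> dict[str, float]:
    # Min-max normalization: map one retriever's raw scores into [0, 1]
    # so that scores from different retrievers become comparable.
    lo, hi = min(scored.values()), max(scored.values())
    if hi == lo:
        return {p: 1.0 for p in scored}
    return {p: (s - lo) / (hi - lo) for p, s in scored.items()}

def fuse(per_retriever: list[dict[str, float]], k: int) -> list[str]:
    # Sum each passage's normalized scores across retrievers and
    # pick the top k of the blended ranking.
    blended: dict[str, float] = {}
    for scores in per_retriever:
        for p, s in scores.items():
            blended[p] = blended.get(p, 0.0) + s
    return sorted(blended, key=blended.get, reverse=True)[:k]
```

e.g. `fuse([rrf_scores(dense), rrf_scores(lexical), rrf_scores(fuzzy)], k)`. One design consequence: RRF only needs the ordering, so it works even for the fuzzy retriever if it emits no scores, whereas min-max fusion (`minmax_scores` over each retriever's raw scores, then `fuse`) requires actual score values.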