
Implement Sentence-Window Retrieval for benchmark evaluation #7843

Open
davidsbatista opened this issue Jun 11, 2024 · 3 comments
Assignees
Labels
2.x Related to Haystack v2.0 P1 High priority, add to the next sprint

Comments

@davidsbatista
Contributor

davidsbatista commented Jun 11, 2024

The sentence-window approach breaks down documents into smaller chunks (sentences) and indexes them separately.

During retrieval, we find the sentences most relevant to the query via similarity search and replace each retrieved
sentence with its full surrounding context, using a static sentence window around the sentence.
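
To make the idea concrete, here is a minimal, dependency-free sketch of the retrieval step. It is not Haystack code: the function names, the `window_size` and `top_k` parameters, the regex sentence splitter, and the bag-of-words cosine similarity (standing in for an embedding-based similarity search) are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sentence_window_retrieve(text, query, window_size=1, top_k=1):
    """Rank individual sentences against the query, then return each hit
    expanded to a static window of `window_size` sentences on either side."""
    sentences = split_sentences(text)
    query_vec = bag_of_words(query)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(bag_of_words(sentences[i]), query_vec),
        reverse=True,
    )
    results = []
    for i in ranked[:top_k]:
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        results.append(" ".join(sentences[start:end]))
    return results
```

In this sketch the window is clipped at document boundaries, so a hit on the first or last sentence returns a shorter context. Overlapping windows from multiple hits are not merged; whether to merge them is a design choice the issue does not settle.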

@davidsbatista davidsbatista self-assigned this Jun 11, 2024
@shadeMe shadeMe added P1 High priority, add to the next sprint 2.x Related to Haystack v2.0 labels Jun 14, 2024
@anakin87
Member

Might be related: #7389

@julian-risch
Member

@davidsbatista Could you please clarify in the issue description and title whether this issue includes adding sentence retrieval to our benchmarks? Or is this issue limited to adding a new feature/component to Haystack? I would expect the latter to keep the issue small.

@davidsbatista
Contributor Author

Initially the idea was to have it benchmarked against other architectures over some dataset and I did an initial naive version:

But then, after talking with Mathis and Stefano, who pointed out the issue, I've been working on extending the DocumentSplitter so that implementing sentence-window retrieval becomes easier.

So, for now, the idea for this issue is just to have the DocumentSplitter provide the functionality described in #7389; later we can add an implementation to the haystack-evaluation repo.
