haystack-evaluation

This repository contains examples of how to use Haystack to evaluate systems built with Haystack, across different tasks and datasets.

This repository is structured as:

Evaluations

Here we provide complete examples of how to use Haystack to evaluate systems that are themselves built with Haystack, for different tasks and datasets.

Name	Dataset	Evaluation Metrics	📚 Article
RAG with parameter search	ARAGOG	ContextRelevance, Faithfulness, Semantic Answer Similarity	Benchmarking Haystack Pipelines for Optimal Performance
Baseline RAG vs HyDE using Harness	ARAGOG	ContextRelevance, Faithfulness, Semantic Answer Similarity	-
Extractive QA with parameter search	SQuAD	Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, Semantic Answer Similarity	-
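The "parameter search" entries above amount to running the same pipeline under several configurations and comparing the resulting metric scores. The following is a minimal sketch of that idea in plain Python, not the code used in this repository. The parameter names (`top_k`, `chunk_size`), their values, and the `evaluate_config` scorer are all hypothetical; a real run would build the pipeline, run it over the dataset, and return a metric such as Semantic Answer Similarity.

```python
from itertools import product

# Hypothetical search space; parameter names and values are illustrative only.
search_space = {
    "top_k": [1, 3, 5],
    "chunk_size": [64, 128, 256],
}


def evaluate_config(top_k: int, chunk_size: int) -> float:
    """Placeholder scorer (an assumption for this sketch).

    A real implementation would build the pipeline with these settings,
    run it over the evaluation dataset, and return a metric value.
    The toy formula below only exists so the sketch runs end to end.
    """
    return 1.0 / (1 + abs(top_k - 3)) + 1.0 / (1 + abs(chunk_size - 128) / 64)


def grid_search(space, scorer):
    """Score every combination in the grid and return results sorted best-first."""
    names = list(space)
    results = []
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        results.append((scorer(**config), config))
    # Sort on the score only, highest first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


results = grid_search(search_space, evaluate_config)
best_score, best_config = results[0]
print(best_config, best_score)
```

An exhaustive grid is the simplest choice; because every configuration means a full evaluation run, the grid is usually kept small.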

Name	Description
Baseline RAG	Retrieval-Augmented Generation (RAG) is a technique that combines a retriever and a generator to answer questions. The retriever finds relevant documents and the generator creates an answer based on the retrieved documents.
HyDE	Hypothetical Document Embeddings (HyDE) generates a hypothetical document from the query and uses its embedding to retrieve similar documents from the document embedding space.
Extractive QA	Extractive Question Answering (QA) is a task where the model is given a question and a document and it has to find the answer to the question in the document. The answer is typically a span of text from the document.
Sentence-Window	Sentence-Window is a technique that uses a sliding window to extract chunks/sentences from a document. The extracted chunks/sentences are then used to generate answers to questions.
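To make the Sentence-Window description concrete, here is a minimal sketch in plain Python, assuming a naive punctuation-based sentence splitter and a retriever that has already matched one sentence. The helper names `split_sentences` and `sentence_window` are hypothetical and are not part of Haystack.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_window(sentences: list[str], hit_index: int, window_size: int = 1) -> str:
    """Return the matched sentence plus `window_size` neighbours on each side."""
    start = max(0, hit_index - window_size)
    end = min(len(sentences), hit_index + window_size + 1)
    return " ".join(sentences[start:end])


text = (
    "Haystack builds pipelines. Retrievers find documents. "
    "Generators write answers. Evaluators score results."
)
sentences = split_sentences(text)

# Pretend the retriever matched the sentence at index 1.
context = sentence_window(sentences, hit_index=1, window_size=1)
print(context)
```

The window is clipped at the document boundaries, so a hit on the first or last sentence simply yields a shorter context.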

Name	Suitable Metrics	Description
ARAGOG	ContextRelevance, Faithfulness, Semantic Answer Similarity	A collection of papers from arXiv covering topics around Transformers and Large Language Models, all in PDF format.
SQuAD 2.0	Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, Semantic Answer Similarity	A collection of questions and answers from Wikipedia articles, typically used for training and evaluating models for extractive question-answering tasks.
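For readers unfamiliar with the retrieval metrics named above, the sketch below shows the per-query quantities behind Answer Exact Match, DocumentMRR, DocumentMAP, and DocumentRecall in plain Python. It illustrates the textbook definitions and is not Haystack's implementation; library evaluators may differ in details such as text normalisation or how average precision is normalised. The document IDs and answers are made up for the example.

```python
def exact_match(predicted: str, expected: str) -> float:
    """1.0 if the predicted answer equals the expected one exactly, else 0.0."""
    return 1.0 if predicted == expected else 0.0


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Mean of precision@rank over relevant hits, divided by the number of
    relevant documents (one common convention; an assumption here)."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    return len(set(retrieved) & relevant) / len(relevant) if relevant else 0.0


# Made-up example: two relevant documents, found at ranks 2 and 4.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

rr = reciprocal_rank(retrieved, relevant)
ap = average_precision(retrieved, relevant)
rec = recall(retrieved, relevant)
em = exact_match("Paris", "Paris")
print(rr, ap, rec, em)
```

MRR and MAP are the means of `reciprocal_rank` and `average_precision` over all queries in the dataset.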
