Using Haystack to benchmark different architectures over different datasets

deepset-ai/haystack-evaluation

haystack-evaluation

This repository contains examples of how to use Haystack to evaluate systems built with Haystack for different tasks and datasets.

This repository is organized into three sections: Evaluations, Techniques/Architectures, and Datasets.

Evaluations

Here we provide full examples of how to use Haystack to evaluate systems, also built with Haystack, for different tasks and datasets.

| Name | Dataset | Evaluation Metrics | 📚 Article |
|------|---------|--------------------|------------|
| RAG with parameter search | ARAGOG | ContextRelevance, Faithfulness, Semantic Answer Similarity | Benchmarking Haystack Pipelines for Optimal Performance |
| Baseline RAG vs HyDE using Harness | ARAGOG | ContextRelevance, Faithfulness, Semantic Answer Similarity | - |
| Extractive QA with parameter search | SQuAD | Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, Semantic Answer Similarity | - |
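
The "parameter search" entries above evaluate the same pipeline under several configurations and compare the resulting scores. A minimal sketch of that idea follows, assuming a plain grid search. The parameter names (`top_k`, `chunk_size`), their values, and the `evaluate_configuration` function are hypothetical placeholders and are not taken from this repository; a real run would build a Haystack pipeline for each configuration and score it with the metrics listed in the table.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
SEARCH_SPACE = {
    "top_k": [1, 3, 5],
    "chunk_size": [64, 128, 256],
}


def evaluate_configuration(top_k: int, chunk_size: int) -> float:
    """Placeholder score (hypothetical).

    A real implementation would build the pipeline with these settings,
    run it over the dataset, and return an evaluation metric.
    The toy formula below only keeps the example runnable and deterministic.
    """
    return round(1.0 - abs(top_k - 3) * 0.1 - abs(chunk_size - 128) / 1000, 3)


def grid_search(space: dict) -> list:
    """Score every parameter combination and return results sorted best-first."""
    names = list(space)
    results = []
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate_configuration(**params), params))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


results = grid_search(SEARCH_SPACE)
best_score, best_params = results[0]
print(f"best score {best_score} with {best_params}")
```

An exhaustive grid is the simplest choice for a small search space like this one; the number of pipeline runs grows multiplicatively with each added parameter.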

Techniques/Architectures

| Name | Description |
|------|-------------|
| Baseline RAG | Retrieval-Augmented Generation (RAG) is a technique that combines a retriever and a generator to answer questions. The retriever finds relevant documents and the generator creates an answer based on the retrieved documents. |
| HyDE | Hypothetical Document Embeddings (HyDE) generates a hypothetical document from the query and uses it to retrieve similar documents from the document embedding space. |
| Extractive QA | Extractive Question Answering (QA) is a task in which the model is given a question and a document and has to find the answer to the question in the document. The answer is typically a span of text from the document. |
| Sentence-Window | Sentence-Window is a technique that uses a sliding window to extract chunks/sentences from a document. The extracted chunks/sentences are then used to generate answers to questions. |
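
To make the difference between baseline retrieval and HyDE concrete, here is a toy sketch in plain Python. It does not use Haystack: the bag-of-words `embed` function stands in for a real embedding model, and `fake_llm_hypothetical_document` is a hypothetical stub for the LLM call that would write the hypothetical document. The three-document corpus is deliberately constructed so that the raw question shares more words with an irrelevant document than with the relevant one.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_text: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by similarity to the query text and keep the top_k."""
    query_vector = embed(query_text)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:top_k]


def fake_llm_hypothetical_document(question: str) -> str:
    """Hypothetical stub: a real HyDE setup would ask an LLM to write this passage."""
    return "Self-attention lets each token weigh every other token in the sequence."


documents = [
    "Self-attention lets every token attend to all other tokens in a sequence.",
    "Convolutional networks slide small filters over an image.",
    "What is a recurrent network? It processes a sequence one step at a time.",
]
question = "What is the core mechanism of a Transformer?"

# Baseline: embed the question itself and search the document space.
baseline_hits = retrieve(question, documents)

# HyDE: embed a generated answer-like passage instead of the question.
hyde_hits = retrieve(fake_llm_hypothetical_document(question), documents)

print("baseline:", baseline_hits[0])
print("HyDE:    ", hyde_hits[0])
```

In this contrived corpus the question matches the recurrent-network document on surface wording, while the hypothetical passage lands next to the self-attention document. Whether HyDE helps on a real dataset is an empirical question, which is what the evaluations in this repository measure.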

Datasets

| Name | Suitable Metrics | Description |
|------|------------------|-------------|
| ARAGOG | ContextRelevance, Faithfulness, Semantic Answer Similarity | A collection of papers from ArXiv covering topics around Transformers and Large Language Models, all in PDF format. |
| SQuAD 2.0 | Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, Semantic Answer Similarity | A collection of questions and answers from Wikipedia articles, typically used for training and evaluating models for extractive question-answering tasks. |
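
The non-LLM metrics named above have short textbook definitions. The sketch below implements them in plain Python to show what each one measures; it is not Haystack's implementation, and the library's evaluators may differ in details such as text normalization or how multiple ground-truth documents are handled. Document IDs and answers are made-up example data. Semantic Answer Similarity, ContextRelevance, and Faithfulness rely on embedding models or LLMs and are not covered here.

```python
def exact_match(predicted: list, ground_truth: list) -> float:
    """Fraction of predicted answers that are string-identical to the reference."""
    matches = sum(p == g for p, g in zip(predicted, ground_truth))
    return matches / len(ground_truth)


def reciprocal_rank(retrieved: list, relevant: set) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list, relevant: set) -> float:
    """Mean of precision@rank taken at each rank where a relevant document appears.

    This version divides by the number of relevant documents that were retrieved;
    another common convention divides by the total number of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def recall(retrieved: list, relevant: set) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    return len(set(retrieved) & relevant) / len(relevant)


def mean(values: list) -> float:
    return sum(values) / len(values)


# Made-up example: two queries, each with a ranked result list and its relevant IDs.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d9", "d8"], {"d5"}),
]

mrr = mean([reciprocal_rank(r, rel) for r, rel in runs])
map_score = mean([average_precision(r, rel) for r, rel in runs])
mean_recall = mean([recall(r, rel) for r, rel in runs])
em = exact_match(["Paris", "Berlin"], ["Paris", "Rome"])

print(f"MRR={mrr} MAP={map_score} Recall={mean_recall} EM={em}")
```

MRR only rewards the position of the first relevant document, MAP also accounts for where the remaining relevant documents appear, and recall ignores ranking entirely, which is why the tables list them together.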
