issue with metrics evaluation in case of any exception #956
Comments
Hello, I recently encountered the same problem, caused by a content-filtering exception from OpenAI. While investigating, I found the origin of the problem here: Line 77 in 2d79365
In Lines 110 to 114 in 2d79365
Lines 144 to 145 in 2d79365
But in the event of an exception and when Lines 77 to 89 in 2d79365
A possible solution would be to move the
@baptiste-pasquier Thanks for the root-cause analysis. Have you tried the possible solution as a workaround?
Here is the fix code: baptiste-pasquier@167b9c5
Code for testing
import ragas
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
)
data = {
    "question": [
        "What is the capital of France?",
        "What is the capital of France?",
        "What is the capital of France?",
        "What is the capital of France?",
    ],
    "answer": [
        "Paris is the capital of France.",
        "",
        "London is the capital of France.",
        "x" * 99999,  # Context length error
    ],
    "contexts": [
        ["Paris is the capital of France."],
        ["Paris is the capital of France."],
        ["Paris is the capital of France."],
        ["Paris is the capital of France."],
    ],
    "ground_truth": [
        "Paris",
        "Paris",
        "Paris",
        "Paris",
    ],
}
dataset = Dataset.from_dict(data)
metrics = [faithfulness, answer_relevancy]
run_config = ragas.RunConfig(timeout=10, max_retries=1, max_wait=10)
Test 1 (without exception)
result = evaluate(
    dataset.select(range(3)),  # Run without the last row
    metrics=metrics,
    llm=llm,
    embeddings=embeddings,
    raise_exceptions=False,
    run_config=run_config,
)
result.to_pandas()
Result before and after fix:
Test 2 (with exception)
A context length exception is created at the last row of the dataset.
result = evaluate(
    dataset,  # Run on the entire dataset
    metrics=metrics,
    llm=llm,
    embeddings=embeddings,
    raise_exceptions=False,
    run_config=run_config,
)
result.to_pandas()
Result before fix:
The NaNs appear in the first row, which is unexpected behavior (the other metrics are shifted in the table).
Result after fix:
The NaNs appear in the last row, which is the expected behavior.
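For readers who want to see how this kind of row shift can arise, here is a minimal, self-contained sketch. It is not ragas code: `score`, `run_buggy`, and `run_fixed` are hypothetical names, and the placeholder index `-1` is an assumption made purely for illustration. The sketch assumes that results are gathered as `(index, value)` pairs and sorted by index afterwards, so a failed job that loses its index sorts to the front.

```python
import asyncio
import math


async def score(value: str) -> float:
    # Hypothetical scoring job: fails on oversized input, like a context length error.
    if len(value) > 10:
        raise ValueError("input too long")
    return float(len(value))


async def run_buggy(values):
    async def indexed(i, v):
        return i, await score(v)

    results = []
    for fut in asyncio.as_completed([indexed(i, v) for i, v in enumerate(values)]):
        r = (-1, math.nan)  # assumed placeholder: the original index is lost
        try:
            r = await fut
        except ValueError:
            pass  # exception swallowed, placeholder kept
        results.append(r)
    # Sorting by index puts the (-1, nan) placeholder first.
    return [x for _, x in sorted(results, key=lambda p: p[0])]


async def run_fixed(values):
    async def indexed(i, v):
        # Catch the exception where the index is still known.
        try:
            return i, await score(v)
        except ValueError:
            return i, math.nan

    results = []
    for fut in asyncio.as_completed([indexed(i, v) for i, v in enumerate(values)]):
        results.append(await fut)
    return [x for _, x in sorted(results, key=lambda p: p[0])]


values = ["a", "bb", "ccc", "x" * 99]  # only the last item fails
buggy = asyncio.run(run_buggy(values))
fixed = asyncio.run(run_fixed(values))
```

In this sketch, `buggy` has its NaN in the first position and the valid scores shifted down by one row, while `fixed` keeps the NaN in the last position, next to the input that failed.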
Yes, I have tried a similar fix, and it seems to work.
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I am evaluating a dataset with multiple metrics using the evaluate function. For now, I call evaluate for each metric separately (because of issue #878).
Observation -
Ragas version: 0.1.7
Python version: 3.10.9
Code to Reproduce
Result of df_eval
Result of df_eval_ex
Error trace
Expected behavior
Additional context