Improve JSON format prompt for large chunks & Handle ZeroDivisionError #982

Open
wants to merge 8 commits into
base: main
from



Manav916 commented May 21, 2024

Description

This PR makes three changes:

Changes

  1. Fixed a typo in the filter_question_prompt instruction: "Asses" → "Assess".
  2. Added a try-except block to the filter method of the NodeFilter class to handle ZeroDivisionError.
  3. Improved JSON_FORMAT_INSTRUCTIONS so that the LLM's output parses more reliably.
    The LLM sometimes seems to lose track of the imposed output-format instruction, especially with long prompts. With a chunk_size of 512 the generated output is well formed, but with a chunk_size of 1024 it contains an extra newline. PydanticOutputParser then fails to parse the response, even though the LLM did generate an answer and the only problem is the extra '\n', as shown in the instances below.
    Screenshot 2024-05-22 093242
    By tuning the prompt and adding "Please output your response in the demanded json format." at the end of the instruction, we get output without the extra '\n'. This output can then be parsed, and the context can be considered.
    Screenshot 2024-05-22 093757
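The prompt tweak in item 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ragas code: the helper names json_format_instructions and build_prompt, the example schema, and all wording other than the quoted reminder sentence are hypothetical, since the actual JSON_FORMAT_INSTRUCTIONS text is not reproduced in this PR. It only shows the idea of ending the format instructions with an explicit restatement of the requirement.

```python
import json

# The reminder sentence is quoted from this PR; all other wording is a hypothetical stand-in.
FORMAT_REMINDER = "Please output your response in the demanded json format."


def json_format_instructions(schema: dict) -> str:
    """Build JSON output instructions that end with an explicit format reminder.

    Illustrative only: the real JSON_FORMAT_INSTRUCTIONS template in ragas
    is not reproduced here.
    """
    return (
        "The output should be a well-formatted JSON instance that conforms to "
        "the JSON schema below.\n\n"
        f"Here is the output JSON schema:\n{json.dumps(schema)}\n\n"
        # Restating the requirement as the final sentence is the tweak the PR describes.
        f"{FORMAT_REMINDER}"
    )


def build_prompt(instruction: str, schema: dict, context: str) -> str:
    """Join the task instruction, the format instructions and the context chunk (hypothetical layout)."""
    return f"{instruction}\n\n{json_format_instructions(schema)}\n\ncontext:\n{context}"


# Usage: print a prompt for an illustrative schema and a placeholder chunk.
example_schema = {"type": "object", "properties": {"score": {"type": "integer"}}}
print(build_prompt("Assess the given context.", example_schema, "A long chunk of text..."))
```

Repeating the requirement as the last sentence of the format block is a prompt-level mitigation: it makes the stray newline less likely but cannot guarantee well-formed output, so parse failures may still need handling downstream.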

Manav916 and others added 4 commits May 21, 2024 12:18
This commit introduces error handling for ZeroDivisionError in the filter method of the NodeFilter class. When a division by zero occurs, the score now defaults to 0 instead of the exception propagating.
Modify JSON_FORMAT_INSTRUCTIONS in output_parser.py so that LLMs return correctly formatted JSON more reliably, particularly for larger chunk sizes. The change keeps extra newlines out of the output, which reduces PydanticOutputParser failures caused by formatting issues in long prompts.
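The ZeroDivisionError handling described in the first commit can be illustrated with a small, self-contained sketch. It assumes, hypothetically, that the node score is the mean of per-criterion values parsed from the LLM output and that an unparseable response yields no values at all; mean_score, passes_filter, and the threshold used below are illustrative names and numbers, not the actual NodeFilter.filter implementation in src/ragas/testset/filters.py.

```python
from typing import Dict, Optional


def mean_score(scores: Optional[Dict[str, float]]) -> float:
    """Average per-criterion scores, defaulting to 0 when there is nothing to average.

    Hypothetical stand-in for the scoring step in NodeFilter.filter: if the LLM
    output could not be parsed there may be no scores, and sum/len would raise
    ZeroDivisionError.
    """
    values = list((scores or {}).values())
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # Mirrors the behaviour described in the commit: the score falls back to 0.
        return 0.0


def passes_filter(scores: Optional[Dict[str, float]], threshold: float) -> bool:
    """Hypothetical threshold check applied after scoring."""
    return mean_score(scores) >= threshold


print(mean_score({"clarity": 2, "depth": 1}))  # 1.5
print(mean_score({}))  # 0.0
print(passes_filter(None, threshold=1.5))  # False
```

An explicit emptiness check before dividing would behave the same way; the try-except form simply mirrors what the PR describes. Either way, a node whose output could not be scored ends up below any positive threshold and is filtered out rather than crashing the run.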
dosubot bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) May 21, 2024
jjmachan requested a review from shahules786 May 22, 2024 12:17
jjmachan (Member) commented:

@shahules786 could you check if we can merge this in?

Review thread on src/ragas/testset/filters.py (outdated, resolved)
dosubot bot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) and removed the size:XS label (This PR changes 0-9 lines, ignoring generated files.) May 31, 2024
Labels: size:S (This PR changes 10-29 lines, ignoring generated files.)
Projects: None yet
Development (issues this PR may close): None yet
3 participants