fix(testset): for generating testset with new docs #1001

Open: wants to merge 1 commit into base branch main

Manav916

fixes #848

This pull request updates the TestsetGenerator class in generator.py and the ComplexEvolution class in evolutions.py. The changes improve how new documents are handled during test dataset generation, so that data from previously supplied documents no longer interferes with the generation process.

Before

Screenshot 2024-05-31 183725 before
Here, rows 2 and 4 were generated from the document used in the previous generation run.

After

Screenshot 2024-05-31 184647 after
Here, all generated rows are based on the newly provided document.

Modifications have been made to ensure that each test dataset generation cycle starts with a clean state, particularly concerning document data. This prevents data leakage from previous documents and ensures that each test dataset is generated from the newly provided documents only.
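
To make the clean-state idea concrete, here is a minimal sketch of the failure mode and the reset that avoids it. It is a toy illustration only: `ToyGenerator`, `reset_between_runs`, and `_store` are hypothetical names invented for this example and are not the ragas classes or the actual diff in this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a testset generator; not the ragas class."""

    reset_between_runs: bool
    _store: List[str] = field(default_factory=list)

    def generate(self, documents: List[str], test_size: int) -> List[str]:
        if self.reset_between_runs:
            # The clean-state behaviour: drop whatever earlier runs left behind.
            self._store.clear()
        self._store.extend(documents)
        # Cycle through the store so stale entries visibly show up in the rows.
        return [
            f"question about {self._store[i % len(self._store)]}"
            for i in range(test_size)
        ]


# Without a reset, the second run still sees the first run's document.
leaky = ToyGenerator(reset_between_runs=False)
leaky.generate(["doc_a"], test_size=2)
leaky_rows = leaky.generate(["doc_b"], test_size=2)

# With a reset, the second run only sees the newly provided document.
clean = ToyGenerator(reset_between_runs=True)
clean.generate(["doc_a"], test_size=2)
clean_rows = clean.generate(["doc_b"], test_size=2)

print(leaky_rows)
print(clean_rows)
```

In the leaky variant, a row about `doc_a` appears in the second run even though only `doc_b` was passed in, which mirrors the "Before" screenshot; the resetting variant yields rows about `doc_b` only, as in the "After" screenshot.
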
dosubot (bot) added the size:S label ("This PR changes 10-29 lines, ignoring generated files.") on May 31, 2024
@AliHaider0343

How can we use this? I am having the exact same issue. How can we get the same results?

@XinTaoIntel
XinTaoIntel commented Jun 5, 2024

When will this be merged? We are having the same issue. After this is merged, do I need to make any code changes?
Currently I'm just using the standard generation method without using evolution. (Do we need to add the evolution part to avoid this issue?)
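# Assumes the ragas 0.1.x layout, where the evolution objects live in ragas.testset.evolutions.
from ragas.testset.evolutions import simple, reasoning, multi_context

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=3,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)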

Development

Successfully merging this pull request may close these issues.

Cannot generate TestDataset more than once
3 participants