feat: update evaluation flow sample for abstractive summarization with g-eval method to enable GPT-4-Turbo #3317

fujikosu · 2024-05-21T09:24:01Z

Description

This PR updates an evaluation flow example that was introduced in #2037. Previously, the example supported only GPT-4, because GPT-4-Turbo performed poorly with the previous approach. With this update, GPT-4-Turbo is introduced and meta-evaluated, and the implementation changes from a sampling-based approach to a probability-weighted average approach. According to the meta-evaluation results, the new implementation outperforms the previous one. In addition, the new approach reduces the estimated evaluation cost from $6.19 to $1.32 per 100 documents.
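A minimal sketch of probability-weighted scoring follows. It assumes that the evaluator prompt asks for a single integer score on a 1-5 scale. It also assumes that the log-probabilities of the candidate tokens at the score position are available, for example from a chat completion request with `logprobs` and `top_logprobs` enabled. The function name `weighted_score` and its input shape are hypothetical and are not the flow's actual code. The score is the expectation of the score value under the renormalized token probabilities.

```python
import math
from typing import Dict, Iterable


def weighted_score(top_logprobs: Dict[str, float],
                   valid_scores: Iterable[int] = range(1, 6)) -> float:
    """Probability-weighted average of the score tokens (hypothetical helper).

    top_logprobs maps candidate tokens at the score position (e.g. "1".."5")
    to their log-probabilities. Tokens that are not valid scores are ignored,
    and the remaining probabilities are renormalized to sum to 1.
    """
    valid = {str(s): s for s in valid_scores}
    probs: Dict[int, float] = {}
    for token, logprob in top_logprobs.items():
        key = token.strip()
        if key in valid:
            # Accumulate in case the same score appears with different whitespace.
            probs[valid[key]] = probs.get(valid[key], 0.0) + math.exp(logprob)
    total = sum(probs.values())
    if total == 0.0:
        raise ValueError("no valid score tokens among the top log-probabilities")
    # Expected score under the renormalized distribution over valid scores.
    return sum(score * p for score, p in probs.items()) / total


if __name__ == "__main__":
    example = {"4": math.log(0.6), "3": math.log(0.3), "5": math.log(0.1)}
    print(round(weighted_score(example), 6))  # 4*0.6 + 3*0.3 + 5*0.1 = 3.8
```

A single request per document produces a continuous score. The sampling-based variant, by contrast, needs several sampled completions to approximate the same expectation.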

The previous approach is still kept under the `sampling_based` directory. This preserves backward compatibility for the GPT-4 evaluator and keeps it available as a reference for the meta-evaluation.
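For contrast, here is a minimal sketch of a sampling-based score. It assumes that several completions are sampled for the same prompt (for example, with a nonzero temperature and `n` greater than 1) and that each completion contains one integer score. The function `sampled_score` is hypothetical and is not the code in the `sampling_based` directory.

```python
import re
from statistics import mean
from typing import List


def sampled_score(completions: List[str], low: int = 1, high: int = 5) -> float:
    """Average the integer scores parsed from sampled completions (hypothetical helper).

    The first integer in each completion is taken as its score. Completions
    with no integer, or with an out-of-range first integer, are skipped.
    """
    scores = []
    for text in completions:
        match = re.search(r"\d+", text)
        if match:
            value = int(match.group())
            if low <= value <= high:
                scores.append(value)
    if not scores:
        raise ValueError("no parsable scores in the sampled completions")
    return float(mean(scores))


if __name__ == "__main__":
    print(sampled_score(["4", "Score: 3", "5", "n/a"]))  # mean of 4, 3, 5
```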

All Promptflow Contribution checklist:

The pull request does not introduce breaking changes.
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.
Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: suggested workflow.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.


github-actions · 2024-06-04T21:32:45Z

Hi, thank you for your interest in helping to improve the prompt flow experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment.

fujikosu added 2 commits May 21, 2024 02:03

update geval with prob based weighted average version. move previous version under sampling based

b7ecb98

updated README to cover updated gpt4-turbo geval

2492e9c

fujikosu requested a review from a team as a code owner May 21, 2024 09:24

fujikosu had a problem deploying to internal May 21, 2024 09:24 — with GitHub Actions Failure

github-actions bot added the examples Improvements on examples label May 21, 2024

fujikosu added 2 commits May 21, 2024 11:40

fixed spelling error. copied the same prompts for gpt4 directory. changed gpt-4-turbo back to gpt-4 to pass CI's model

4b68dbd

Merge branch 'main' into kofuji/improved-geval-with-gpt4turbo

3d58a9c

fujikosu temporarily deployed to internal May 21, 2024 12:07 — with GitHub Actions Inactive

github-actions bot added the no-recent-activity There has been no recent activity on this issue/pull request label Jun 4, 2024

Merge branch 'main' into kofuji/improved-geval-with-gpt4turbo

2f2a570

fujikosu temporarily deployed to internal June 7, 2024 07:37 — with GitHub Actions Inactive

github-actions bot removed the no-recent-activity There has been no recent activity on this issue/pull request label Jun 7, 2024

Merge branch 'main' into kofuji/improved-geval-with-gpt4turbo

7b0288e

fujikosu temporarily deployed to internal June 12, 2024 07:55 — with GitHub Actions Inactive

Merge branch 'main' into kofuji/improved-geval-with-gpt4turbo

da7d421

fujikosu temporarily deployed to internal June 18, 2024 03:32 — with GitHub Actions Inactive
