Docs request: where does histogram come from? #738

Open
jamesbraza opened this issue Apr 30, 2024 · 1 comment

@jamesbraza
Contributor

I have a Python assertion that can score 0, 0.1, or 1, plus two basic assertions:

providers:
  - openai:chat:gpt-4-0613
  - openai:chat:gpt-4-turbo-2024-04-09
  - anthropic:messages:claude-3-sonnet-20240229
defaultTest:
  assert:
    - description: was answered
      type: not-icontains
      value: cannot answer
    - description: has sentences
      type: javascript
      value: output.length > 20
    - description: check value
      type: python
      value: file://assert.py
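For illustration, a hypothetical assert.py that yields those three scores might look like the sketch below. It assumes promptfoo's convention of a module-level get_assert(output, context) function that may return a numeric score, and it assumes the expected answer arrives as a test variable named expected. The matching rule itself is invented and is not the author's actual assertion.

```python
# Hypothetical assert.py sketch; the scoring rule is invented for illustration.
# Assumes promptfoo calls get_assert(output, context) and accepts a float score,
# and that context["vars"]["expected"] holds the expected answer (an assumption).


def get_assert(output: str, context: dict) -> float:
    expected = context.get("vars", {}).get("expected", "")
    if not expected:
        return 0.0  # nothing to compare against
    answer = output.strip().lower()
    if answer == expected.lower():
        return 1.0  # exact answer
    if expected.lower() in answer:
        return 0.1  # answer mentioned, but buried in other text
    return 0.0  # answer missing
```

With expected set to "42", an output of "42" scores 1.0, "The answer is 42." scores 0.1, and "cannot answer" scores 0.0.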

At the top of my promptfoo view, I see histogram bins around 0.6 and 0.7, which doesn't quite make sense to me:

screenshot of histogram

My request is to add a short description so this figure is easy to understand.

  • I have three different model providers. Is that where Prompt 1 (red), Prompt 2 (blue), and Prompt 3 (green) come from?
  • Why does the histogram show scores of 0.6 and 0.7? Is that something like a sum of multiple assertions' scores?
@jamesbraza
Contributor Author

I now understand that I have three assertions:

  • Two binary ones, each scoring 0 or 1
  • One custom assertion, scoring 0, 0.1, or 1

I realized the histogram plots the mean score across assertions: 0.7 = (1 + 1 + 0.1) / 3
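The mean-score arithmetic can be checked with a short Python sketch. It assumes every assertion counts equally (the config above sets no per-assertion weight) and enumerates every mean that two binary assertions and the 0/0.1/1 custom assertion can produce. It illustrates the arithmetic only and is not promptfoo's own scoring code. Note that 2/3 ≈ 0.67 (both binary assertions pass, custom assertion scores 0) also falls between 0.6 and 0.7, which may explain the bin near 0.6, depending on how the histogram bins are drawn.

```python
from itertools import product


def mean_score(scores, weights=None):
    """Weighted mean of assertion scores; every weight defaults to 1.0."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Possible scores for each assertion in the config above
BINARY = (0.0, 1.0)       # not-icontains and javascript assertions
CUSTOM = (0.0, 0.1, 1.0)  # python assertion (file://assert.py)

# Every distinct mean a single test case can get, rounded for display
possible = sorted(
    {round(mean_score([a, b, c]), 3) for a, b, c in product(BINARY, BINARY, CUSTOM)}
)
print(possible)  # [0.0, 0.033, 0.333, 0.367, 0.667, 0.7, 1.0]
```

Only 0.667 and 0.7 land between 0.6 and 0.7, so the two bins in the screenshot are consistent with "both binary assertions pass" combined with a custom score of 0 or 0.1.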

That said, I still think promptfoo could add a small info bubble or hover tooltip that explains this.

Feel free to close this if you're not interested.
