Actions: openai/evals

Showing runs from all workflows
2,117 workflow runs
Updates on existing solvers and bugged tool eval
Run new evals #2251: Pull request #1506 synchronize by ojaffe
March 27, 2024 16:44 3m 37s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run new evals #2250: Pull request #1506 opened by ojaffe
March 27, 2024 16:37 3m 41s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run unit tests #1701: Pull request #1506 opened by ojaffe
March 27, 2024 16:37 10m 44s ojaffe:ollie/updates_270324
Add Gemini Solver (#1503)
Run unit tests #1700: Commit 5a92ac3 pushed by JunShern
March 26, 2024 15:27 4m 12s main
Unified create_retrying for all solvers (#1501)
Run unit tests #1699: Commit 150dcb9 pushed by JunShern
March 26, 2024 15:25 4m 2s main
Add Gemini Solver
Run unit tests #1698: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 56s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2249: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 55s ojaffe:ollie/add_gemini_solver
Unified create_retrying for all solvers
Run unit tests #1697: Pull request #1501 synchronize by ojaffe
March 26, 2024 11:35 10m 15s ojaffe:ollie/unify_retrying
Add info about logging and link to logviz (#1480)
Run unit tests #1696: Commit ac44aae pushed by etr2460
March 25, 2024 15:53 9m 36s main
Log model and usage stats in record.sampling (#1449)
Run unit tests #1695: Commit 9b2e1b1 pushed by etr2460
March 25, 2024 15:52 9m 58s main
Address sporadic hanging of evals on certain samples (#1482)
Run unit tests #1694: Commit bfe3925 pushed by etr2460
March 25, 2024 15:51 9m 5s main
TogetherSolver (#1502)
Run unit tests #1693: Commit 5805c20 pushed by JunShern
March 22, 2024 09:50 8m 40s main
Add Gemini Solver
Run unit tests #1692: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 14m 41s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2248: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 3m 36s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2247: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 46s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run unit tests #1691: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 55s ojaffe:ollie/add_gemini_solver
TogetherSolver
Run unit tests #1690: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 3m 42s thesofakillers:together_solver
TogetherSolver
Run new evals #2246: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 7m 10s thesofakillers:together_solver
Unified create_retrying for all solvers
Run unit tests #1689: Pull request #1501 opened by ojaffe
March 21, 2024 08:49 3m 49s ojaffe:ollie/unify_retrying
AnthropicSolver (#1498)
Run unit tests #1688: Commit e30e141 pushed by JunShern
March 21, 2024 04:15 3m 48s main
Add Human-Relative MLAgentBench (#1496)
Run unit tests #1687: Commit 4f97ce6 pushed by JunShern
March 21, 2024 03:47 4m 56s main
Add Human-Relative MLAgentBench
Run unit tests #1686: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 37s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2245: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 45s danesherbs:dane/add-mlab-v2
Add Multi-Step Web Tasks (#1500)
Run unit tests #1685: Commit 5b84993 pushed by JunShern
March 21, 2024 03:35 2m 27s main
Add Multi-Step Web Tasks
Run unit tests #1684: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:40 2m 21s danesherbs:dane/add-multi-step-web-tasks