Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

Add Redlite tasks for safety benchmarking
#2020 opened Jun 25, 2024 by inno-simon
[Not For Merge] Enable chat-template for vLLM
#2017 opened Jun 25, 2024 by akjindal53244
Fix regexp parsing for bbh_cot_fewshot
#2013 opened Jun 24, 2024 by arkapal3
Added MedConceptsQA Benchmark
#2010 opened Jun 22, 2024 by Ofir408
Refactor API models
#2008 opened Jun 22, 2024 by baberabb
make pytorch an optional dependency
#2004 opened Jun 20, 2024 by dlwh
Fixes scrolls task bug with few_shot examples
#2003 opened Jun 20, 2024 by xksteven
Handle Empty openai response
#1999 opened Jun 19, 2024 by ciaranby
Fix partial caching of openai models
#1997 opened Jun 19, 2024 by ciaranby
Add Gigachat model
#1996 opened Jun 19, 2024 by seldereyy Draft
Add HumanEval
#1992 opened Jun 19, 2024 by hjlee1371
main
#1988 opened Jun 18, 2024 by msamwelmollel
Fix local completion huggingface tokenizer
#1975 opened Jun 17, 2024 by okdshin
mela
#1970 opened Jun 16, 2024 by Geralt-Targaryen
Fix OpenAI API discrepancies
#1969 opened Jun 14, 2024 by chimezie
Mmlu Pro
#1961 opened Jun 13, 2024 by ysjprojects
LMJudge
#1950 opened Jun 11, 2024 by baberabb Draft
4 tasks
Alghafa benchmark
#1946 opened Jun 11, 2024 by khalil-Hennara
Multiprompt
#1922 opened Jun 4, 2024 by lintangsutawika Draft
Confusion matrix metric
#1921 opened Jun 4, 2024 by minaremeli