EleutherAI / lm-evaluation-harness Public

Fork 1.5k
Star 5.7k

Pull requests: EleutherAI/lm-evaluation-harness

69 Open 1,028 Closed

Pull requests list

Add Redlite tasks for safety benchmarking

#2020 opened Jun 25, 2024 by inno-simon

[Not For Merge] Enable chat-template for vLLM

#2017 opened Jun 25, 2024 by akjindal53244

Fix regexp parsing for bbh_cot_fewshot

#2013 opened Jun 24, 2024 by arkapal3

Added MedConceptsQA Benchmark

#2010 opened Jun 22, 2024 by Ofir408

Refactor API models

#2008 opened Jun 22, 2024 by baberabb

Error Correction: Eliminate undefined parameter in function call

#2006 opened Jun 21, 2024 by zhabuye

make pytorch an optional dependency

#2004 opened Jun 20, 2024 by dlwh

Fixes scrolls task bug with few_shot examples

#2003 opened Jun 20, 2024 by xksteven

Handle Empty openai response

#1999 opened Jun 19, 2024 by ciaranby

Fix partial caching of openai models

#1997 opened Jun 19, 2024 by ciaranby

Add Gigachat model

#1996 opened Jun 19, 2024 by seldereyy • Draft

Add HumanEval

#1992 opened Jun 19, 2024 by hjlee1371

[Fix] Replace generic exception classes with a more specific ones

#1989 opened Jun 18, 2024 by LSinev

main

#1988 opened Jun 18, 2024 by msamwelmollel

add persianmmlu benchmark for assessing Persian Language understanding

#1979 opened Jun 17, 2024 by MrzEsma

Fix local completion huggingface tokenizer

#1975 opened Jun 17, 2024 by okdshin

mela

#1970 opened Jun 16, 2024 by Geralt-Targaryen

Fix OpenAI API discrepancies

#1969 opened Jun 14, 2024 by chimezie

Mmlu Pro

#1961 opened Jun 13, 2024 by ysjprojects

LMJudge

#1950 opened Jun 11, 2024 by baberabb • Draft

4 tasks

Alghafa benchmark

#1946 opened Jun 11, 2024 by khalil-Hennara

Easier unitxt tasks loading and removal of unitxt library dependency

#1933 opened Jun 6, 2024 by elronbandel

Prettify lm_eval --tasks list

#1929 opened Jun 5, 2024 by anthony-dipofi • Draft

Multiprompt

#1922 opened Jun 4, 2024 by lintangsutawika • Draft

Confusion matrix metric

#1921 opened Jun 4, 2024 by minaremeli
