[ACL 2024] Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
Official Repo of LangSuitE
Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)
🧙🏻 Code and benchmark for our Findings of ACL 2024 paper "TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models"
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)
RAID is the largest and most challenging benchmark for machine-generated text detectors. (ACL 2024)
Source code of "TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification", ACL2024 (findings)
Source code of "Calibrating Large Language Models Using Their Generations Only", ACL2024
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
From Zero to Hero: Cold-Start Anomaly Detection (ACL 2024)
Code for ACL Shared Task 10, tracks 1, 2, and 3; achieved ranks of 2 and 7 in tracks 2 and 3, respectively
Repository for the ACL 2024 paper "LIEDER: Linguistically-Informed Evaluation Suite for Discourse Entity Recognition"