AudioLLMs/AudioLLM

AudioLLMs

This repository is a curated collection of research papers focused on the development, implementation, and evaluation of language models for audio data. Our goal is to provide researchers and practitioners with a comprehensive resource to explore the latest advancements in AudioLLMs. Contributions and suggestions for new papers are highly encouraged!

Papers on Models

| Date | Model | Key Affiliations | Paper | Link |
| --- | --- | --- | --- | --- |
| 2024-05 | SpeechVerse | AWS | SpeechVerse: A Large-scale Generalizable Audio Language Model | Paper |
| 2024-04 | SALMONN | Tsinghua | SALMONN: Towards Generic Hearing Abilities for Large Language Models | Paper / Code / Demo |
| 2024-03 | WavLLM | CUHK | WavLLM: Towards Robust and Adaptive Speech Large Language Model | Paper / Code |
| 2024-01 | Pengi | Microsoft | Pengi: An Audio Language Model for Audio Tasks | Paper / Code |
| 2023-12 | Qwen-Audio | Alibaba | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Paper / Code / Demo |
| 2023-12 | LTU-AS | MIT | Joint Audio and Speech Understanding | Paper / Code / Demo |
| 2023-10 | UniAudio | CUHK | An Audio Foundation Model Toward Universal Audio Generation | Paper / Code / Demo |
| 2023-09 | LLaSM | LinkSoul.AI | LLaSM: Large Language and Speech Model | Paper / Code |
| 2023-06 | AudioPaLM | Google | AudioPaLM: A Large Language Model that Can Speak and Listen | Paper / Demo |
| 2023-05 | VioLA | Microsoft | VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | Paper |
| 2023-05 | SpeechGPT | Fudan | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | Paper / Code / Demo |
| 2023-04 | AudioGPT | Zhejiang Uni | AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | Paper / Code |