A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
An automatically updated paper list
ms-swift: Use PEFT or full-parameter training to fine-tune 250+ LLMs or 35+ MLLMs (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Phi3-Vision, ...)
Visual Instruction Tuning for Qwen2 Base Model
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, Apple Shortcut. Rust + WASM.
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
A simple open-source SigLIP model fine-tuned on Genshin Impact image-text pairs.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
A toolkit for building AI agents that use devices
Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.
React component library for crafting user-friendly and engaging conversational experiences
[ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs
A paper list on multimodal and large language models, kept to record papers I read from the daily arXiv for personal reference.
Build real-time multimodal AI applications 🤖🎙️📹
A repository of LinkedIn posts on Generative AI: knowledge sharing, learning resources, and research explanations.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision