Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
Updated Jun 26, 2024 · Jupyter Notebook
Virtual assistant to help with administrative procedures
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
Sequence Parallel Attention for Long Context LLM Model Training and Inference
VELOCITI Benchmark Evaluation and Visualisation Code
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Load, pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
The Arcee client for executing domain-adapted language model routines https://pypi.org/project/arcee-py/
Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
One .NET library to consume OpenAI, Anthropic, Cohere, Google, Azure, and self-hosted APIs.
Linux LiveCD for offline AI training and inference.
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inference.