LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Updated Jun 26, 2024 (Python)
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
(REOS) Radar and Electro-Optical Simulation Framework written in Fortran.
CUDA C++ Core Libraries
Safe Rust wrapper around the CUDA toolkit
Spiral's Machine Learning Library
Vector calculations with GPU acceleration using CUDA
Kernel Tuner
Some common CUDA kernel implementations (not the fastest).
🎉 CUDA notes / hand-written CUDA for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
🚀 TensorRT-YOLO: Supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE using TensorRT acceleration with EfficientNMS, CUDA Kernels and CUDA Graphs!
A beginner's guide to CUDA programming
CUDA Kernel Benchmarking Library
A tool for examining GPU scheduling behavior.
From zero to hero: CUDA for accelerating maths and machine learning on the GPU.
Just a few CUDA kernels that can be called from Python as a DLL
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Implement neural networks in CUDA from scratch
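Several entries above (notably the CUDA notes listing warp reduce, block reduce, and dot product) revolve around the same shuffle-based tree reduction. As a minimal sketch, the following is a CPU-side Python model of that pattern, assuming 32-lane warps and a sum reduction. `shfl_down` is a hypothetical stand-in that mimics the semantics of CUDA's `__shfl_down_sync` (lanes whose source falls outside the warp keep their own value); none of these functions come from any listed repository.

```python
# CPU model of warp-level and block-level sum reductions as typically written
# in CUDA. Assumptions: 32-lane warps, sum as the reduction operator, and at
# most 32 warps (1024 threads) per block.

WARP_SIZE = 32


def shfl_down(lanes, offset):
    """Hypothetical model of __shfl_down_sync over one full warp.

    Lane i reads the value held by lane i + offset; lanes whose source would
    fall outside the warp keep their own value (no wrap-around).
    """
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def warp_reduce_sum(lanes):
    """Tree reduction: halve the shuffle offset each step (16, 8, 4, 2, 1).

    After log2(WARP_SIZE) steps, lane 0 holds the sum of all 32 lanes; the
    other lanes hold partial or duplicated values and are ignored.
    """
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


def block_reduce_sum(values):
    """Two-stage block reduction.

    1. Each warp reduces its 32 lanes (missing threads contribute 0).
    2. One warp reduces the per-warp partial sums, which a CUDA kernel would
       normally stage through shared memory.
    """
    n_warps = (len(values) + WARP_SIZE - 1) // WARP_SIZE
    assert n_warps <= WARP_SIZE, "a block holds at most 32 warps (1024 threads)"
    partials = []
    for w in range(n_warps):
        chunk = values[w * WARP_SIZE:(w + 1) * WARP_SIZE]
        chunk = chunk + [0] * (WARP_SIZE - len(chunk))  # pad inactive lanes
        partials.append(warp_reduce_sum(chunk))
    partials = partials + [0] * (WARP_SIZE - len(partials))  # pad unused warps
    return warp_reduce_sum(partials)


def dot(a, b):
    """Dot product: an elementwise multiply followed by a block reduction."""
    assert len(a) == len(b)
    return block_reduce_sum([x * y for x, y in zip(a, b)])


if __name__ == "__main__":
    print(block_reduce_sum(list(range(100))))  # 4950
    print(dot([1, 2, 3], [4, 5, 6]))  # 32
```

The halving offsets are why a warp reduction needs only five shuffle steps for 32 lanes. The padding with zeros stands in for the bounds checks a real kernel performs on threads past the end of the input.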