Pull requests: vllm-project/vllm
- [BugFix] Prevent LLM.encode for non-generation Models (#5184) · opened Jun 1, 2024 by robertgshaw2-neuralmagic
- [Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183) · opened Jun 1, 2024 by tlrmchlsmth · Draft
- bug fixed: cuda out of memory lead to 'AsyncEngineDeadError: Background loop has errored already. (#5173) · opened Jun 1, 2024 by charent
- [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) · opened May 31, 2024 by BlackBird-Coding
- [Kernel] Pass a device pointer into the quantize kernel for the scales (#5159) · opened May 31, 2024 by tlrmchlsmth
- [Core] Optimize gpu_memory_utilization parameter to be applied on the GPU memory available after peak_memory (#5158) · opened May 31, 2024 by alexm-neuralmagic
- [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size (#5157) · opened May 31, 2024 by tlrmchlsmth · Draft
- [KERNEL] int8 quantization kernel refactoring & optimization WIP (#5146) · opened May 31, 2024 by ZelboK
- add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 (#5145) · opened May 30, 2024 by alexm-neuralmagic
- [Bugfix] Fix KV head calculation for MPT models when using GQA (#5142) · opened May 30, 2024 by bfontain
- [Feature][Frontend]: Add support for stream_options in ChatCompletionRequest (#5135) · opened May 30, 2024 by Etelis
- [Speculative Decoding 1/2] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) · opened May 30, 2024 by sroy745
- [Misc] Simplify code and fix type annotations in conftest.py (#5118) · opened May 30, 2024 by DarkLight1337
- [Kernel] Add w4a16 support for compressed_tensors models (#5116) · opened May 30, 2024 by dsikka
- [Speculative Decoding] Enable arbitrary model inputs (#5101) · opened May 29, 2024 by abhigoyal1997 · Draft (1 of 8 tasks)
- [Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074) · opened May 27, 2024 by youkaichao