Pull requests: vllm-project/vllm
- [BugFix] Prevent LLM.encode for non-generation Models (#5184) · opened Jun 1, 2024 by robertgshaw2-neuralmagic
- [Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183) · opened Jun 1, 2024 by tlrmchlsmth · Draft
- bug fixed: cuda out of memory lead to 'AsyncEngineDeadError: Background loop has errored already. (#5173) · opened Jun 1, 2024 by charent
- [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) · opened May 31, 2024 by BlackBird-Coding
- [Kernel] Pass a device pointer into the quantize kernel for the scales (#5159) · opened May 31, 2024 by tlrmchlsmth
- [Core] Optimize gpu_memory_utilization parameter to be applied on the GPU memory available after peak_memory (#5158) · opened May 31, 2024 by alexm-neuralmagic
- [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size (#5157) · opened May 31, 2024 by tlrmchlsmth · Draft
- [KERNEL] int8 quantization kernel refactoring & optimization WIP (#5146) · opened May 31, 2024 by ZelboK
- add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 (#5145) · opened May 30, 2024 by alexm-neuralmagic
- [Bugfix] Fix KV head calculation for MPT models when using GQA (#5142) · opened May 30, 2024 by bfontain
- [Feature][Frontend]: Add support for stream_options in ChatCompletionRequest (#5135) · opened May 30, 2024 by Etelis
- [Speculative Decoding 1/2] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) · opened May 30, 2024 by sroy745
- [Misc] Simplify code and fix type annotations in conftest.py (#5118) · opened May 30, 2024 by DarkLight1337
- [Kernel] Add w4a16 support for compressed_tensors models (#5116) · opened May 30, 2024 by dsikka
- [Speculative Decoding] Enable arbitrary model inputs (#5101) · opened May 29, 2024 by abhigoyal1997 · Draft (1 of 8 tasks)
- [Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074) · opened May 27, 2024 by youkaichao