| | vLLM (high-throughput LLM serving engine) (github.com/vllm-project) |
| 2 points by roody_wurlitzer 14 hours ago |
|
| | vLLM multi-turn conversations design (github.com/vllm-project) |
| 1 point by CCs 34 days ago |
|
| | VLLM-Omni: A framework for efficient model inference with Omni-modality models (github.com/vllm-project) |
| 2 points by zyh888 85 days ago | 1 comment |
|
| | Cost-efficient and pluggable Infrastructure components for GenAI inference (github.com/vllm-project) |
| 1 point by rrampage on Feb 23, 2025 |
|
| | Cost-efficient and pluggable Infrastructure components for GenAI inference (github.com/vllm-project) |
| 1 point by delduca on Feb 22, 2025 |
|
| | LLM compressor: compress models for efficient deployment (github.com/vllm-project) |
| 1 point by hajduksplit on Aug 20, 2024 | 1 comment |
|
| | VLLM Sacrifices Accuracy for Speed (github.com/vllm-project) |
| 1 point by behnamoh on Jan 24, 2024 |
|
| | Easy, fast, and cheap LLM serving for everyone (github.com/vllm-project) |
| 2 points by vincent_s on Dec 17, 2023 |
|
| | vllm (github.com/vllm-project) |
| 1 point by tosh on Dec 15, 2023 |
|
| | Mixtral Expert Parallelism (github.com/vllm-project) |
| 1 point by tosh on Dec 15, 2023 |
|
| | Official PR Reveals the Inference Code for Mixtral 8x7B (github.com/vllm-project) |
| 2 points by georgehill on Dec 11, 2023 |
|
| | Vllm: High-throughput and memory-efficient inference and serving engine for LLMs (github.com/vllm-project) |
| 3 points by tosh on Sept 10, 2023 |
|
| | Vllm (github.com/vllm-project) |
| 3 points by kordlessagain on Aug 6, 2023 |
|
| | VLLM (github.com/vllm-project) |
| 2 points by sherlockxu on June 25, 2023 |
|