20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Lu Fang	8c0d15d5c5	[Misc][Easy] Annotate unused vars in the csrc files (#14798 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-15 12:40:09 +08:00
Isotr0py	97ac781c62	[Misc] Remove misleading message in gemma2 and gemma3 (#14850 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-14 21:35:12 -07:00
Russell Bryant	776dcec8fe	Disable outlines cache by default (#14837 )	2025-03-15 03:57:55 +00:00
Tyler Michael Smith	ccf02fcbae	Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… (#14848 )	2025-03-14 20:45:42 -07:00
DefTruth	acaea3bb07	[Bugfix][V1] Fix flashinfer sampling (#14815 )	2025-03-14 20:42:38 -07:00
Liangfu Chen	9f37422779	[Neuron][CI] update docker run command (#14829 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-03-14 18:51:35 -07:00
yarongmu-google	dd344e0342	[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-15 00:41:15 +00:00
Yuan Tang	54a8804455	[Doc] More neutral K8s deployment guide (#14084 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-14 16:12:36 -07:00
Russell Bryant	bbd94a19fc	[Build/CI] Upgrade aiohttp to incldue CVE fix (#14840 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 23:11:28 +00:00
Russell Bryant	233ffce1eb	[Build/CI] Move ninja to common deps (#14835 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 21:25:28 +00:00
Richard Liu	40677783aa	[CI] Add TPU v1 test (#14834 ) Signed-off-by: Richard Liu <ricliu@google.com>	2025-03-14 17:13:30 -04:00
Michael Goin	14f301b541	Update to torch==2.6.0 (#12721 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-14 16:58:30 -04:00
Russell Bryant	46f98893dd	[V1] Fix model parameterization for structured output tests (#14833 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 20:55:18 +00:00
Chih-Chieh Yang	fe66b34728	[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14778 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-03-14 16:36:18 -04:00
Alexei-V-Ivanov-AMD	270a5da495	Re-enable the AMD Entrypoints Test (#14711 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-03-14 12:18:13 -07:00
Kevin H. Luu	7097b4cc1c	[release] Remove log cleanup commands from TPU job (#14838 )	2025-03-14 11:59:52 -07:00
Yajie Wang	977a16772c	[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430 ) Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>	2025-03-14 09:55:14 -07:00
daniel-salib	73deea2fdb	[Frontend] track server_load (#13950 )	2025-03-14 09:53:17 -07:00
Mark McLoughlin	9d2b4a70f4	[V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-15 00:45:25 +08:00
Russell Bryant	0b0d6421b2	[Frontend] Fix log message to use http vs https (#14774 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 09:21:09 -07:00
Russell Bryant	1140991a7b	[V1] Fix vocab size calculation for structured output (#14826 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 09:18:38 -07:00
Cyrus Leung	613c5bb945	[Bugfix] Fix Aria test loading (#14823 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 09:11:23 -07:00
Guillaume Calmettes	fd8e055ffb	[BugFix]: properly catch templating error when preprocess input (#13976 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-03-14 05:58:34 -07:00
Cyrus Leung	ab93f1360f	[VLM] Various cleanup and fixes (#14806 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 05:58:19 -07:00
DefTruth	40253bab44	[Bugfix][W8A8] fixed cutlass block fp8 binding (#14796 )	2025-03-14 03:32:42 -07:00
Woosuk Kwon	c77620d22d	[V1][Minor] Minor code cleanup for scheduling metrics (#14800 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-14 08:21:28 +00:00
Jee Jee Li	989ecd2007	[Misc] Gemma3ForConditionalGeneration supports LoRA (#14797 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-14 01:07:30 -07:00
WeiCheng	54cc46f3eb	[Bugfix] Fix small typo in the example of Streaming delimiter (#14793 )	2025-03-14 08:05:17 +00:00
Cyrus Leung	601bd3268e	[Misc] Clean up type annotation for `SupportsMultiModal` (#14794 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 00:59:56 -07:00
Li Wang	09269b3127	[BugFix]Fix performance serving benchmark when enable profiling (#14737 ) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-03-14 07:02:05 +00:00
Thien Tran	27b50f1fe6	[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667 ) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-13 23:47:49 -07:00
Lucas Wilkinson	9532c49836	[Attention] MLA get rid of materialization (#14770 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 23:39:02 -07:00
Roger Wang	0c2af17c76	[CI] Fix missing example model id in processor test (#14787 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-14 13:52:15 +08:00
Jennifer Zhao	a6e0d096dd	[Feature] Add visionarena offline support for benchmark_throughput (#14654 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com> Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-14 04:07:54 +00:00
Liangfu Chen	d3d4956261	[Neuron] flatten test parameterization for neuron attention kernels (#14712 )	2025-03-13 20:46:56 -07:00
Nick Hill	4059adc31b	[Misc][Minor] Simplify `SamplingParams.__post_init__()` (#14772 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-14 11:44:20 +08:00
Kevin H. Luu	f1f632d9ec	[ci] Reduce number of tests in fastcheck (#14782 )	2025-03-13 20:43:45 -07:00
Thien Tran	95d680b862	[Bugfix][IPEX] Add `VLLM_CPU_MOE_PREPACK` to allow disabling MoE prepack when CPU does not support it (#14681 ) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-13 20:43:18 -07:00
Thomas Parnell	fb4c7f8ef0	[Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>	2025-03-13 20:42:27 -07:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
Woosuk Kwon	32ef4983cd	[V1] Temporarily disable FlashInfer Rejection Sampler (#14788 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-13 20:40:35 -07:00
Roger Wang	ad19c8a003	[V1] Move OOM check into sampler run (#14728 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-03-13 20:40:23 -07:00
Jeff Daily	2a602b055a	forward fix PR 14245, restore build on ROCm 6.2 (#14709 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-13 20:40:15 -07:00
Alexander Matveev	7888e1d0a3	[V1] TPU - Enable prefix caching by default (#14773 )	2025-03-13 20:40:05 -07:00
Chen Zhang	60c872d4b6	[Doc] Fix small typo in Transformers fallback (#14791 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-03-13 20:33:12 -07:00
yasu52	3fb17d26c8	[Doc] Fix typo in documentation (#14783 ) Signed-off-by: yasu52 <tsuguro4649@gmail.com>	2025-03-13 20:33:09 -07:00
Lucas Wilkinson	d47807ba08	[Attention] Remove slow setattr in MLA (#14769 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-13 21:31:14 +00:00
afeldman-nm	02fcaa3d0a	[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624 ) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>	2025-03-13 19:07:34 +00:00
Aaron Pham	8a4a2efc6f	[V1][Core] using cached vocab_size for Structured Outputs (#14630 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-13 11:39:28 -07:00

5173 Commits