20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Michael Goin	c70cf0fe06	[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-10 15:08:47 +08:00
Cyrus Leung	a5d11a54dc	[Bugfix] Fix validation error for text-only Mllama 3.2 (#16377 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 14:19:42 +08:00
Cyrus Leung	3d4c87758e	[Misc] Update transformers version limits of multi-modal tests (#16381 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-09 23:03:33 -07:00
Aaron Ang	a9bd832fc5	[Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383 ) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>	2025-04-09 23:01:00 -07:00
Chenyaaang	417bcefbae	fix sonnet dataset sample when prefix len is very small (#16379 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 05:35:07 +00:00
Michael Goin	baada0e737	[Bugfix][TPU] Fix TPU validate_request (#16369 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-04-10 12:55:12 +08:00
Benjamin Kitor	82eb61dd4c	[misc] use tqdm.auto where appropriate (#16290 ) Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>	2025-04-09 21:54:54 -07:00
Roger Wang	0d4d06fe2f	[CI][Bugfix] Pin triton version for CPU (#16384 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-04-10 04:35:00 +00:00
Jintao	4aed0ca6a2	[bugfix] Avoid the time consumption caused by creating dummy videos. (#16371 )	2025-04-10 04:30:05 +00:00
Chengji Yao	1621b25288	[TPU] Fix dummy loading OOM (#16372 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-10 04:06:16 +00:00
Aaron Ang	a564797151	[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (#16325 ) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>	2025-04-09 20:07:40 -07:00
Guillaume Calmettes	1da6a09274	[Bugfix]: do not shutdown server if `skip_special_use=False` for MistralTokenizer (#14094 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 19:43:09 -07:00
Yuxuan Zhang	1e44ffc3ff	Add GLM-4-0414 support (#16338 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Ajay Vohra <ajayvohr@amazon.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com> Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-10 09:19:42 +08:00
Chengji Yao	a454748544	[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 18:51:51 -06:00
Reid	1bff42c4b7	[Misc] refactor Structured Outputs example (#16322 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-09 23:32:42 +00:00
Joe Runde	cb391d85dc	[Hardware] add platform-specific request validation api (#16291 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-09 12:50:01 -07:00
Russell Bryant	fee5b8d37f	[Build/CI] Add tracing deps to vllm container image (#15224 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-09 19:14:06 +00:00
Michael Goin	b2ce859bd2	Fix `benchmark_throughput.py --backend=hf` (#16352 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 19:09:28 +00:00
Chendi.Xue	566f10a929	[CI]Fix hpu docker and numpy version for CI (#16355 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-04-09 17:52:26 +00:00
Guillaume Calmettes	c3b5189137	[Bugfix] catch AssertionError in MistralTokenizer as ValueError (#16344 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 17:33:24 +00:00
zh Wang	a25866ac8d	[Bugfix] Fix profiling.py (#16202 ) Signed-off-by: zh Wang <rekind133@outlook.com>	2025-04-09 17:03:34 +00:00
Michael Goin	098900d7c2	Revert "Update label-tpu mergify and remove removal bot" (#16350 )	2025-04-09 07:59:36 -07:00
Guillaume Calmettes	98d01d3ce2	[Bugfix][Frontend] respect provided default guided decoding backend (#15476 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 05:11:10 -07:00
Nicolò Lucchesi	d55244df31	[Model] Add `SupportsMultiModal.get_language_model` interface (#16007 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-09 04:12:54 -07:00
yihong	04149cce27	[BugFix] fix some typos found by typos. (#16314 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 03:43:59 -07:00
ajayvohra2005	24834f4894	update neuron config (#16289 ) Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>	2025-04-09 03:43:22 -07:00
Lucia Fang	ec7da6fcf3	[BugFix] llama4 qknorm should be not shared across head (#16311 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-04-09 00:59:14 -07:00
yihong	819d548e8a	[BugFix] logger is not callable (#16312 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-09 00:59:02 -07:00
Michael Goin	477d2a8aa2	Update label-tpu mergify and remove removal bot (#16298 )	2025-04-09 07:56:25 +00:00
Cyrus Leung	e484e02857	[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (#16273 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-09 00:51:27 -07:00
Accelerator1996	24f6b9a713	[Misc] Fix test_sharded_state_loader.py(#16004 ) (#16005 ) Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>	2025-04-09 14:47:30 +08:00
Luka Govedič	9cdde47289	[BugFix] Fix fusion test and add them to CI (#16287 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-04-08 23:46:45 -07:00
Chengji Yao	b1eb4ca152	[TPU] Update PyTorch/XLA (#16288 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-09 14:46:32 +08:00
Michael Goin	87b4ac56c2	[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-09 04:14:46 +00:00
Russell Bryant	cb84e45ac7	[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 19:13:22 -07:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
rongfu.leng	4e9cf8c1dd	[Bugfix] fix gettid method is not define (#16084 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:44 -07:00
TJian	2976dc27e9	[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-04-08 19:12:34 -07:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
yueshen2016	1f4b09b525	Add support to modelopt quantization of Mixtral model (#15961 ) Signed-off-by: Yue <yueshen@nvidia.com>	2025-04-09 01:53:31 +00:00
Jee Jee Li	86c3369eb8	[CI/Build] Fix CI LoRA failure (#16270 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-09 09:13:56 +08:00
Russell Bryant	2755c34a8f	[V1] Update structured output offline inference example (#15721 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 22:34:09 +00:00
Jinzhen Lin	db10422184	[Bugfix] fix deepseek fp16 scale bug (#14809 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-08 16:56:09 -04:00
Lucas Wilkinson	e1a2c699dd	[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-08 18:56:51 +00:00
Harry Mellor	0115ccd5c0	Add warning that content below line in template will be removed (#16276 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-08 18:18:40 +00:00
Isotr0py	40b4284fe3	[Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (#15328 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 10:02:23 -07:00
Cyrus Leung	4ebc0b9640	[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-08 09:45:21 -07:00
Kero Liang	dc96fd54c6	[Misc] Avoid stripping meaningful whitespace from `nvidia-smi topo -m` output in collect_env.py (#16272 ) Signed-off-by: imkero <kerorek@outlook.com>	2025-04-08 16:08:09 +00:00
wang.yuqi	1f5d13ab9f	[New Model]: jinaai/jina-embeddings-v3 (#16120 )	2025-04-08 08:39:12 -07:00
Harry Mellor	90cb44eb02	Update to transformers==4.51.1 (#16257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-08 06:53:39 -07:00

5902 Commits