20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Michael Yao	26507f8973	[Docs] Fix a link and grammar issue in production-stack.md (#16809 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-18 06:42:58 +00:00
Nathan Weinberg	9c1d5b456d	[Doc] add podman setup instructions for official image (#16796 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-18 06:10:49 +00:00
Lucia Fang	e31045f95c	[Bugfix] fix pp for llama4 (#16746 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-04-18 13:51:30 +08:00
Luka Govedič	aaec845f8e	[ROCm] [Attention] Cleanup ROCm output passing (#16431 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-04-18 05:46:45 +00:00
rongfu.leng	7bdfd29a35	[Misc] add collect_env to cli and docker image (#16759 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 22:13:35 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Lucas Wilkinson	7eb4255628	[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-17 22:13:29 -07:00
Michael Goin	6a0f547561	Add hardware print to TPU V1 test (#16792 )	2025-04-17 22:13:26 -07:00
Shanshan Shen	30ed81b7ca	[V1][Structured Output] Minor modification to `_validate_structured_output()` (#16748 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-18 13:12:54 +08:00
Chauncey	7a4a5de729	[Misc] Update outdated note: LMCache now supports chunked prefill (#16697 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-18 05:12:42 +00:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00
Tarun Kumar	e37073efd7	Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721 ) Signed-off-by: Tarun Kumar <takumar@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-17 21:08:27 -07:00
Lucas Wilkinson	183dad7a85	[Attention] Update to lastest FA3 code (#13111 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-17 15:14:07 -07:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Nick Hill	0377b8310b	[MLA] Simplification to batch P/D reordering (#16673 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 16:12:09 -04:00
Mark McLoughlin	e4755f7fac	[V1][Metrics] Fix http metrics middleware (#15894 )	2025-04-17 19:52:18 +00:00
Sijia(Jackson) Chen	92edf35826	[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674 )	2025-04-17 11:44:34 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
rongfu.leng	5125d72f02	[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 17:48:31 +00:00
Ximingwang-09	a018e555fd	[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753 ) Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com> Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-04-18 00:01:30 +08:00
Robin	6211b92273	[Bugfix]Fix index out of range error in api server log (#16787 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-04-17 09:01:07 -07:00
Nick Hill	05fcd1b430	[V1][Perf] Faster incremental detokenization (#15137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 07:45:24 -07:00
Insu Kim	7c02d6a137	[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784 ) Signed-off-by: insukim1994 <insu.kim@moreh.io>	2025-04-17 14:10:08 +00:00
wang.yuqi	11c3b98491	[Doc] Document Matryoshka Representation Learning support (#16770 )	2025-04-17 13:37:37 +00:00
Cyrus Leung	dbe7f07001	[Doc] Make sure to update vLLM when installing latest code (#16781 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 06:53:31 -06:00
Reid	c69bf4ee06	fix: hyperlink (#16778 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 11:34:20 +00:00
Harry Mellor	d27ea94034	Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 11:19:42 +00:00
Reid	99ed526101	[Misc] refactor examples series - lmcache (#16758 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 11:02:35 +00:00
Michael Yao	207da28186	[Doc] Fix a 404 link in installation/cpu.md (#16773 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-17 10:46:21 +00:00
intervitens	5b1aca2ae3	[Bugfix] Fix GLM4 model (#16618 ) Signed-off-by: intervitens <intervitens@tutanota.com>	2025-04-17 03:35:07 -07:00
Reid	d8e557b5e5	[doc] add open-webui example (#16747 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 18:27:32 +08:00
Cyrus Leung	61a44a0b22	[Doc] Add more tips to avoid OOM (#16765 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 09:54:34 +00:00
DefTruth	a6481525b8	[misc] ignore marlin_moe_wna16 local gen codes (#16760 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-17 17:15:14 +08:00
Richard Liaw	8cac35ba43	[Ray] Improve documentation on batch inference (#16609 ) Signed-off-by: Richard Liaw <rliaw@berkeley.edu>	2025-04-16 22:19:26 -07:00
Russell Bryant	9dbf7a2dc1	[V1] Remove log noise when idle (#16735 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-16 21:34:08 -07:00
David Heineman	607029e515	[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741 ) Signed-off-by: David Heineman <david@davidheineman.com>	2025-04-16 21:33:15 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Divakar Verma	95aca283b4	[rocm][V0] fix selection logic for custom PA in V0 (#16426 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-16 19:52:11 -07:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Aaruni Aggarwal	3c776dcefb	Adding vllm buildkite job for IBM Power (#16679 ) Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>	2025-04-17 10:47:47 +08:00
Bryan Lu	2cbd4d2999	[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-16 19:47:26 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
Harry Mellor	3cd91dc955	Help user create custom model for Transformers backend remote code models (#16719 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 01:05:59 +00:00
Jade Zheng	8a7368e069	[Misc] Remove redundant comment (#16703 ) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-04-17 00:44:52 +00:00
Harry Mellor	93e561ec4d	Improve error for structured output backend selection (#16717 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 00:35:35 +00:00
Joe Runde	e1b004839a	[Hardware] Add processor inputs to platform validation (#16680 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-16 09:28:42 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
DefTruth	e82ee40de3	[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-16 03:31:39 -07:00
Cyrus Leung	facbe2a114	[Doc] Improve OOM troubleshooting (#16704 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-16 18:29:48 +08:00

5902 Commits