20231088/vllm

Author	SHA1	Message	Date
Cyrus Leung	e02883c400	[Misc] Don't run ruff at all on 3rd party libs (#14493) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-08 07:16:40 -08:00
Russell Bryant	9085aabd62	[benchmarks] Add option to use unique jsonschema for each request (#14457) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-08 06:36:39 -08:00
Roger Wang	8d5aa466fb	[V1][Core] Fix memory issue with logits & sampling (#13776) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-08 06:11:04 -08:00
Aaron Pham	0b7f06b447	[Misc] add `use_tqdm_on_load` to reduce logs (#14407) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-08 05:57:46 -08:00
Isotr0py	03fe18ae0f	[VLM] Add TP support for Phi-4-MM (#14453) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-08 05:57:14 -08:00
Alexander Matveev	cb8bdfade2	[V1] TPU - Add tensor parallel support via Ray (#13618) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-08 08:19:38 -05:00
Cyrus Leung	33f227e16b	[CI/Build] Use a fixed seed to avoid flaky tests (#14480) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-08 11:30:09 +00:00
Harry Mellor	cfd0ae8234	Add RLHF document (#14482) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 09:51:39 +00:00
Lucas Wilkinson	7caff01a7b	[Build/BugFix] Fix hopper 12.8 build (#14354) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-08 08:11:56 +00:00
Harry Mellor	be0b399d74	Add training doc signposting to TRL (#14439) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 07:35:07 +00:00
Jee Jee Li	b8b0ccbd2d	[Bugfix] Make the deviceprofiler include LoRA memory. (#14469) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-08 07:12:22 +00:00
Robin	c908a07f57	[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 07:07:32 +00:00
Robin	7b6fd6e486	[Doc]add doc for Qwen models tool calling (#14478) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 06:58:46 +00:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
Roger Meier	3b9c6c6947	[CI/Build] refactor: set timezone of container to UTC (#12888) Signed-off-by: Roger Meier <r.meier@siemens.com>	2025-03-07 22:42:01 -08:00
Aviv Keshet	4aae667668	[core] add `extra_args` to `SamplingParams` (#13300) Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>	2025-03-08 14:41:18 +08:00
Cody Yu	9f3bc0f58c	[MISC][V1] Register process killing handler only in the main thread (#14380) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-07 22:40:06 -08:00
Mathis Felardos	980385f8c1	[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2025-03-07 22:39:31 -08:00
Tyler Michael Smith	ca7a2d5f28	Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471)	2025-03-07 22:18:53 -08:00
Tyler Michael Smith	333681408f	[Bugfix][V1] Handle MLA in kv_cache_interface (#14462) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-07 22:18:25 -08:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)	2025-03-08 01:48:12 +00:00
yarongmu-google	66e16a038e	[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-07 23:17:04 +00:00
Mark McLoughlin	e1f0835ae0	[V1][Metrics] Fix traceback with preemptions+LoRA (#14220) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-07 15:36:16 -05:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
youkaichao	c6359e8ca6	[v1] torch.compile integration explanation (#14437) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-08 01:55:50 +08:00
Jee Jee Li	952a074980	[Misc] Add Phi4-MM example (#14343) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 17:28:52 +00:00
Jinzhen Lin	d0feea31c7	[Kernel] optimize performance of gptq marlin kernel when n is small (#14138) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-03-07 11:53:38 -05:00
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
York-RDWang	f7ebad2307	[Doc] Update prefix_caching.md to match the example image (#14420)	2025-03-07 15:29:00 +00:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
iefgnoix	1e3598edeb	Use the optimized block sizes after tuning the kernel. (#14329)	2025-03-07 13:25:13 +00:00
Harry Mellor	f7a6bd0fa1	Fix missing `kv_caches` and `attn_metadata` in `OpenVINOCausalLM` (#14271) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-07 12:30:42 +00:00
Aleksandr Malyshev	0ca3b8e01c	[BUGFIX] Skip tokenization support for throughput benchmark (#12712) Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-03-07 02:51:47 -08:00
மனோஜ்குமார் பழனிச்சாமி	cc10281498	[Misc] Set default value of seed to None (#14274) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-03-07 10:40:01 +00:00
Cyrus Leung	05fb6718f0	[Bugfix] Clean up multi-modal processors (#14417) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 10:33:38 +00:00
Jee Jee Li	12c29a881f	[Bugfix] Further clean up LoRA test (#14422) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 10:30:55 +00:00
Peng Li	70da0c0748	correct wrong markdown syntax (#14414) Signed-off-by: vincent-pli <justdoit.pli@gmail.com>	2025-03-07 08:01:18 +00:00
Cyrus Leung	c1588a2c94	[GH] Auto-apply multi-modality label to relevant PRs (#14402) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 15:26:32 +08:00
Ilya Lavrenov	8ca7a71df7	OpenVINO: added CPU-like conditions (#14338) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>	2025-03-06 22:24:49 -08:00
Isotr0py	63137cd922	[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-06 22:10:57 -08:00
Jee Jee Li	ddd1ef66ec	[Bugfix] Fix JambaForCausalLM LoRA (#14370) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-06 22:05:47 -08:00
Lucas Wilkinson	e5e03c2c1b	[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)	2025-03-06 21:56:06 -08:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Lucas Wilkinson	dae6896977	[Perf] Reduce MLA CPU overheads in V1 (#14384) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-06 19:59:14 -08:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Daniel Li	ad60bbb2b2	[Doc] Fix a typo (#14385)	2025-03-06 16:31:52 -08:00
Chengji Yao	0578e5a462	[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-06 23:31:05 +00:00
Michael Goin	04222984f8	[Docs] Add nsight guide to profiling docs (#14298) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:19:58 -08:00
Michael Goin	6832707e90	[V1][Bugfix] Standardize quantized kv cache rejection for attention backends (#14221) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:29 -08:00
Michael Goin	6b2ef5cd17	[Bug] Fix Attention when ignored in by quant_method (#14313) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:06 -08:00

5033 Commits