20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Michael Goin	ed37599544	Update supported_hardware.md for TPU INT8 (#16437 )	2025-04-11 12:28:07 +08:00
Driss Guessous	652907b354	Torchao (#14231 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-04-07 19:39:28 -04:00
yihong	95d63f38c0	doc: fix some typos in doc (#16154 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-07 05:32:06 +00:00
Tristan Leclercq	4285e423a6	[Misc] Auto detect bitsandbytes pre-quantized models (#16027 ) Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>	2025-04-04 23:30:45 -07:00
Matthias Matt	cefb9e5a28	[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-04-02 07:45:45 -07:00
chaow-amd	2041c0e360	[Doc] Quark quantization documentation (#15861 ) Signed-off-by: chaow <chaow@amd.com>	2025-04-01 08:32:45 -07:00
shangmingc	239b7befdd	[V1][Spec Decode] Remove deprecated spec decode config params (#15466 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-31 09:19:35 -07:00
Ce Gao	762b424a52	[Docs] Document v0 engine support in reasoning outputs (#15739 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-29 03:46:57 +00:00
Alex Brooks	1711b929b6	[Model] Add Reasoning Parser for Granite Models (#14202 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de>	2025-03-26 14:28:07 +00:00
Jee Jee Li	3892e58ad7	[Misc] Upgrade BNB version (#15183 )	2025-03-24 05:51:42 +00:00
Robin	d6cd59f122	[Frontend] Support tool calling and reasoning parser (#14511 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-23 14:00:07 -07:00
shangmingc	50c9636d87	[V1][Usage] Refactor speculative decoding configuration and tests (#14434 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-22 19:28:10 -10:00
Jee Jee Li	10f55fe6c5	[Misc] Clean up the BitsAndBytes arguments (#15140 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-20 19:17:12 -07:00
Bryan Lu	9ed6ee92d6	[Bugfix] EAGLE output norm bug (#14464 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-03-15 06:50:33 +00:00
yasu52	3fb17d26c8	[Doc] Fix typo in documentation (#14783 ) Signed-off-by: yasu52 <tsuguro4649@gmail.com>	2025-03-13 20:33:09 -07:00
Robin	c908a07f57	[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 07:07:32 +00:00
Robin	7b6fd6e486	[Doc]add doc for Qwen models tool calling (#14478 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 06:58:46 +00:00
Yanyi Liu	0ddc991f5c	[Doc] Update reasoning with stream example to use OpenAI library (#14077 ) Signed-off-by: liuyanyi <wolfsonliu@163.com>	2025-03-06 13:20:37 +00:00
Ce Gao	f5f7f00cd9	[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114 )	2025-03-06 03:49:20 +00:00
Qubitium-ModelCloud	cd1d3c3df8	[Docs] Add GPTQModel (#14056 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-03-03 21:59:09 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
Harry Mellor	f58f8b5c96	Update AutoAWQ docs (#14042 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-28 15:20:29 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167 )	2025-02-27 02:08:35 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Harry Mellor	00b69c2d27	[Misc] Remove dangling references to `--use-v2-block-manager` (#13492 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-19 03:37:26 +00:00
Harry Mellor	2358ca527b	[Doc]: Improve feature tables (#13224 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-18 18:52:39 +08:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Cyrus Leung	51f0b5f7f6	[Bugfix] Clean up and fix multi-modal processors (#13012 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-10 10:45:21 +00:00
Yuan Tang	243137143c	[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 06:09:33 +00:00
Michael Goin	c53dc466b1	[Doc] Remove performance warning for auto_awq.md (#12743 )	2025-02-04 22:43:11 -08:00
Thomas Parnell	bb392af434	[Doc] Replace ibm-fms with ibm-ai-platform (#12709 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-02-04 07:05:04 +00:00
Brian Dellabetta	44bbca78d7	[Doc] int4 w4a16 example (#12585 ) Based on a request by @mgoin , with @kylesayrs we have added an example doc for int4 w4a16 quantization, following the pre-existing int8 w8a8 quantization example and the example available in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py) FIX #n/a (no issue created) @kylesayrs and I have discussed a couple additional improvements for the quantization docs. We will revisit at a later date, possibly including: - A section for "choosing the correct quantization scheme/ compression technique" - Additional vision or audio calibration datasets --------- Signed-off-by: Brian Dellabetta <bdellabe@redhat.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-31 15:38:48 -08:00
Harry Mellor	dd6a3a02cb	[Doc] Convert docs to use colon fences (#12471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-29 11:38:29 +08:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Russell Bryant	c5cffcd0cd	[Docs] Update spec decode + structured output in compat matrix (#12373 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 01:15:52 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Michael Goin	01a55941f5	[Docs] Update FP8 KV Cache documentation (#12238 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-23 11:18:09 +08:00
Kyle Sayers	3f9b7ab9f5	[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-01-15 06:36:01 +00:00
TJian	8a1f938e6f	[Doc] Update Quantization Hardware Support Documentation (#12025 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-01-14 04:37:52 +00:00
Harry Mellor	e8c23ff989	[Doc] Organise installation documentation into categories and tabs (#11935 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-13 12:27:36 +00:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Rafael Vasquez	43f3d9e699	[CI/Build] Add markdown linter (#11857 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2025-01-12 00:17:13 -08:00
Harry Mellor	482cdc494e	[Doc] Rename offline inference examples (#11927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 23:50:29 +08:00
Harry Mellor	d85c47d6ad	Replace "online inference" with "online serving" (#11923 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-10 12:05:56 +00:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
sroy745	973f5dc581	[Doc]Add documentation for using EAGLE in vLLM (#11417 ) Signed-off-by: Sourashis Roy <sroy@roblox.com>	2025-01-07 19:19:12 +00:00

52 Commits