20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Guillaume Calmettes	1da6a09274	[Bugfix]: do not shutdown server if `skip_special_use=False` for MistralTokenizer (#14094) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-09 19:43:09 -07:00
Matthias Matt	cefb9e5a28	[Frontend] Implement Tool Calling with `tool_choice='required'` (#13483) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-04-02 07:45:45 -07:00
Ce Gao	32b14baf8a	[Refactor][Frontend] Keep all logic about reasoning into one class (#14428) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-28 00:23:30 -07:00
Jason (Siyu) Zhu	cec8c7d7f8	Refactor error handling for multiple exceptions in preprocessing (#15650) Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>	2025-03-28 03:27:20 +00:00
Robin	d6cd59f122	[Frontend] Support tool calling and reasoning parser (#14511) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-23 14:00:07 -07:00
Guillaume Calmettes	fd8e055ffb	[BugFix]: properly catch templating error when preprocess input (#13976) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-03-14 05:58:34 -07:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
Benjamin Chislett	32985bed7c	[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-05 06:30:40 +00:00
Harry Mellor	e5b2f1601a	[Frontend] Do `prompt_logprobs` clamping for chat as well as completions (#14225) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-04 20:13:06 +00:00
Harry Mellor	9badee53de	Fix performance when `--generation-config` is not `None` (#14223) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-04 20:59:22 +01:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568)	2025-02-21 21:55:50 -08:00
Rafael Vasquez	314cfade02	[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral` (#12332)	2025-02-12 08:29:56 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit. commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500: Add SPDX license headers to python source files. This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500: Check for SPDX headers using pre-commit. Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Robert Shaw	33fc1e2e86	[Frontend] Improve `StreamingResponse` Exception Handling (#11752)	2025-01-05 16:35:01 -05:00
Joe Runde	4db72e57f6	[Bugfix][Refactor] Unify model management in frontend (#11660) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-01-01 02:21:51 +00:00
Yanyi Liu	5aef49806d	[Feature] Add load generation config from model (#11164) Signed-off-by: liuyanyi <wolfsonliu@163.com> Signed-off-by: Yanyi Liu <wolfsonliu@163.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-19 10:50:38 +00:00
Joe Runde	2d1b9baa8f	[Bugfix] Fix request cancellation without polling (#11190)	2024-12-17 12:26:32 -08:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
Clayton	7439a8b5fc	[Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979) Signed-off-by: cedonley <clayton@donley.io>	2024-12-12 01:10:12 +00:00
Joe Runde	980ad394a8	[Frontend] Use request id from header (#10968) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-10 13:46:29 +08:00
Chauncey	da7e702c6f	[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-21 16:24:32 +00:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Patrick von Platen	11cd1ae6ad	[Tool parsing] Improve / correct mistral tool parsing (#10333)	2024-11-15 00:42:49 +00:00
Guillaume Calmettes	52b48c1ead	[BugFix]: properly deserialize `tool_calls` iterator before processing by mistral-common when MistralTokenizer is used (#9951) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-14 04:48:16 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
zifeitong	47db6ec831	[Frontend] Add per-request number of cached token stats (#10174)	2024-11-12 16:42:28 +00:00
Cyrus Leung	06386a64dd	[Frontend] Chat-based Embeddings API (#9759)	2024-11-01 08:13:35 +00:00
Zhong Qishuai	ef7865b4f9	[Frontend] re-enable multi-modality input in the new beam search implementation (#9427) Signed-off-by: Qishuai Ferdinandzhong@gmail.com	2024-10-29 11:49:47 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Yuhong Guo	434984e665	[Frontend] Support custom request_id from request (#9550) Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2024-10-22 18:07:30 +00:00
Cyrus Leung	390be74649	[Misc] Print stack trace using `logger.exception` (#9461)	2024-10-17 13:55:48 +00:00
Chang Su	ba30942240	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-15 15:40:43 -07:00
Nick Hill	e9d517f276	[BugFix] Fix chat API continuous usage stats (#9357)	2024-10-14 23:19:48 -07:00
Brendan Wong	4d31cd424b	[Frontend] merge beam search implementations (#9296)	2024-10-14 15:05:52 -07:00
Maximilien de Bayser	ec10cb8511	[BugFix] Fix tool call finish reason in streaming case (#9209) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-10-11 18:24:26 -07:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117)	2024-10-08 05:51:43 +00:00
Yanyi Liu	fdf59d30ea	[Bugfix] fix tool_parser error handling when serve a model not support it (#8709)	2024-10-06 12:51:08 +00:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)	2024-10-04 10:36:39 +08:00
Sebastian Schoennenbeck	35bd215168	[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)	2024-10-01 09:58:06 +00:00
Joe Runde	062c89e7c9	[Frontend][Core] Move guided decoding params into sampling params (#8252) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-01 09:34:25 +08:00
danieljannai21	6c9ba48fde	[Frontend] Added support for HF's new `continue_final_message` parameter (#8942)	2024-09-29 17:59:47 +00:00
Maximilien de Bayser	344cd2b6f4	[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-09-26 17:01:42 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)	2024-09-26 16:46:43 -07:00
Pernekhan Utemuratov	93d364da34	[Bugfix] Include encoder prompts len to non-stream api usage response (#8861)	2024-09-26 15:47:00 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00

121 Commits