20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Isotr0py	2ca830dbaa	[Doc] Reorder vision language examples in alphabet order (#11228) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-16 11:23:33 +00:00
Isotr0py	d927dbcd88	[Model] Refactor Ultravox to use merged input processor (#11198) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-16 10:09:53 +00:00
Cyrus Leung	b10609e6a1	[Misc] Clean up multi-modal processor (#11207) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 06:30:28 +00:00
Cyrus Leung	93abf23a64	[VLM] Fully dynamic prompt replacement in merged input processor (#11199) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 17:52:18 +00:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
Jani Monoses	7cd7409142	PaliGemma 2 support (#11142)	2024-12-13 07:40:07 +00:00
Roger Wang	4816d20aa4	[V1] Fix torch profiling for offline inference (#11125) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-12 15:51:53 +00:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Cyrus Leung	39e227c7ae	[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 17:10:05 +00:00
Cyrus Leung	1c768fe537	[Doc] Explicitly state that InternVL 2.5 is supported (#10978) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 16:58:02 +00:00
Travis Johnson	39c89e71a8	[Misc] Update llama 3.2 template to support system prompt with images (#10901) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-05 05:54:06 +00:00
Kuntai Du	0590ec3fd9	[Core] Implement disagg prefill by StatelessProcessGroup (#10502) This PR provides initial support for single-node disaggregated prefill in 1P1D scenario. Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>	2024-12-01 19:01:00 -06:00
Cyrus Leung	d2f058e76c	[Misc] Rename embedding classes to pooling (#10801) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 14:36:51 +08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
zhou fan	b1d920531f	[Model]: Add support for Aria model (#10514) Signed-off-by: xffxff <1247714429@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-11-25 18:10:55 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
Russell Bryant	ebda51968b	[Core] Fix broken log configuration (#10458) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-23 10:23:51 +08:00
Travis Johnson	9195dbdbca	[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-11-23 10:17:38 +08:00
Woosuk Kwon	46fe9b46d8	[Minor] Revert change in offline inference example (#10545) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-21 21:28:16 +00:00
Maximilien de Bayser	a324d3a1a7	Change granite chat template to keep json list formatting for tool calls (#10452) Signed-off-by: Max de Bayser <maxdebayser@gmail.com>	2024-11-19 18:16:54 -07:00
ismael-dm	31894a2155	[Doc] Add documentation for Structured Outputs (#9943) Signed-off-by: ismael-dm <ismaeldm99@gmail.com>	2024-11-18 09:52:12 -08:00
wchen61	d1557e66d3	[Misc] Enhance offline_inference to support user-configurable paramet… (#10392) Signed-off-by: wchen61 <wchen61@foxmail.com>	2024-11-17 11:32:40 +00:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
Austin Veselka	1b886aa104	[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944) Signed-off-by: FurtherAI <austin.veselka@lighton.ai> Co-authored-by: FurtherAI <austin.veselka@lighton.ai>	2024-11-13 08:28:13 +00:00
harrywu	874f551b36	[Metrics] add more metrics (#4464) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-12 00:17:38 +08:00
Isotr0py	1ff4aed5bd	[Model] Expose size to Idefics3 as mm_processor_kwargs (#10146) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-08 09:56:58 +00:00
Russell Bryant	3be5b26a76	[CI/Build] Add shell script linting using shellcheck (#7925) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-07 18:17:29 +00:00
Maximilien de Bayser	ae62fd17c0	[Frontend] Tool calling parser for Granite 3.0 models (#9027) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-11-07 07:09:02 -08:00
Flávia Béo	aa9078fa03	Adds method to read the pooling types from model's files (#9506) Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2024-11-07 08:42:40 +00:00
Jee Jee Li	a5bba7d234	[Model] Add Idefics3 support (#9767) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Co-authored-by: B-201 <Joy25810@foxmail.com>	2024-11-06 11:41:17 +00:00
shanshan wang	54597724f4	[Model] Add support for H2OVL-Mississippi models (#9747) Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-11-04 00:15:36 +00:00
Peter Salas	6c0b7f548d	[Core][VLM] Add precise multi-modal placeholder tracking (#8346) Signed-off-by: Peter Salas <peter@fixie.ai>	2024-11-01 16:21:10 -07:00
Cyrus Leung	ba0d892074	[Frontend] Use a proper chat template for VLM2Vec (#9912)	2024-11-01 14:09:07 +00:00
Alex Brooks	16b8f7a86f	[CI/Build] Add Model Tests for Qwen2-VL (#9846) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-31 09:10:52 -07:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)	2024-10-30 18:15:56 -07:00
Will Eaton	882a1ad0de	[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>	2024-10-29 15:07:37 -07:00
Yunfei Chu	fc6c274626	[Model] Add Qwen2-Audio model support (#9248) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-23 17:54:22 +00:00
Alex Brooks	31a08f5bd2	[Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-23 14:05:18 +00:00
Isotr0py	3ff57ebfca	[Model] Initialize Florence-2 language backbone support (#9555)	2024-10-23 10:42:47 +00:00
Cyrus Leung	831540cf04	[Model] Support E5-V (#9576)	2024-10-23 11:35:29 +08:00
Cody Yu	d11bf435a0	[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510)	2024-10-18 14:30:55 -07:00
Michael Goin	3921a2f29e	[Model] Support Pixtral models in the HF Transformers format (#9036)	2024-10-18 13:29:56 -06:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424)	2024-10-18 11:31:58 -07:00
Tyler Michael Smith	ae8b633ba3	[Bugfix] Fix offline_inference_with_prefix.py (#9505)	2024-10-18 16:59:19 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Lucas Wilkinson	9d30a056e7	[misc] CUDA Time Layerwise Profiler (#8337) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-10-17 10:36:09 -04:00
Roger Wang	59230ef32b	[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-16 11:20:51 +00:00

205 Commits