20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Alexander Matveev	fdea8ec167	[V1] VLM - enable processor cache by default (#11305) Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2024-12-18 18:54:46 -05:00
Joe Runde	ca5f54a9b9	[Bugfix] fix minicpmv test (#11304) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-18 10:34:26 -08:00
Kunshang Ji	f954fe0e65	[FIX] update openai version (#11287) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2024-12-18 10:17:05 -08:00
Simon Mo	362cff1eb3	[CI][Misc] Remove Github Action Release Workflow (#11274)	2024-12-18 10:16:53 -08:00
Isotr0py	996aa70f00	[Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-18 10:16:40 -08:00
Dipika Sikka	60508ffda9	[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995) Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-18 09:57:16 -05:00
Yan Ma	f04e407e6b	[MISC][XPU]update ipex link for CI fix (#11278)	2024-12-17 22:34:23 -08:00
Wallas Henrique	8b79f9e107	[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046)	2024-12-17 22:34:08 -08:00
Konrad Zawora	866fa4550d	[Bugfix] Restore support for larger block sizes (#11259) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2024-12-17 16:39:07 -08:00
Cody Yu	bf8717ebae	[V1] Prefix caching for vision language models (#11187) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-17 16:37:59 -08:00
Michael Goin	c77eb8a33c	[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264)	2024-12-17 16:34:06 -08:00
Joe Runde	2d1b9baa8f	[Bugfix] Fix request cancellation without polling (#11190)	2024-12-17 12:26:32 -08:00
Isotr0py	f9ecbb18bf	[Misc] Allow passing logits_soft_cap for xformers backend (#11252) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-17 00:37:04 -08:00
Roger Wang	02222a0256	[Misc] Kernel Benchmark for `RMSNorm` (#11241) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>	2024-12-17 06:57:02 +00:00
Tyler Michael Smith	2bfdbf2a36	[V1][Core] Use weakref.finalize instead of atexit (#11242) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-16 22:11:33 -08:00
wangxiyuan	e88db68cf5	[Platform] platform agnostic for EngineArgs initialization (#11225) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2024-12-16 22:11:06 -08:00
Roger Wang	59c9b6ebeb	[V1][VLM] Proper memory profiling for image language models (#11210) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: ywang96 <ywang@example.com>	2024-12-16 22:10:57 -08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
Michael Goin	0064f697d3	[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-17 11:39:58 +08:00
youkaichao	35bae114a8	fix gh200 tests on main (#11246) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 17:22:38 -08:00
youkaichao	88a412ed3d	[torch.compile] fast inductor (#11108) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-16 16:15:22 -08:00
youkaichao	c301616ed2	[ci][tests] add gh200 tests (#11244) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 15:53:18 -08:00
bk-TurbaAI	35ffa682b1	[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-16 22:20:39 +00:00
youkaichao	551603feff	[core] overhaul memory profiling and fix backward compatibility (#10511) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 13:32:25 -08:00
Varun Sundar Rabindranath	efbce85f4d	[misc] Layerwise profile updates (#10242) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-16 18:14:57 +00:00
Isotr0py	2ca830dbaa	[Doc] Reorder vision language examples in alphabet order (#11228) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-16 11:23:33 +00:00
Isotr0py	d927dbcd88	[Model] Refactor Ultravox to use merged input processor (#11198) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-16 10:09:53 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203)	2024-12-16 09:56:19 +00:00
cennn	b3b1526f03	WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212) Signed-off-by: drikster80 <ed.sealing@gmail.com> Co-authored-by: drikster80 <ed.sealing@gmail.com>	2024-12-16 09:20:49 +00:00
yansh97	17138af7c4	[Bugfix] Fix the default value for temperature in ChatCompletionRequest (#11219)	2024-12-16 00:15:40 -08:00
chenqianfzh	69ba344de8	[Bugfix] Fix block size validation (#10938)	2024-12-15 16:38:40 -08:00
AlexHe99	da6f409246	Update deploying_with_k8s.rst (#10922)	2024-12-15 16:33:58 -08:00
Woosuk Kwon	25ebed2f8c	[V1][Minor] Cache np arange to reduce input preparation overhead (#11214) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-15 13:33:00 -08:00
shangmingc	d263bd9df7	[Core] Support disaggregated prefill with Mooncake Transfer Engine (#10884) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2024-12-15 21:28:18 +00:00
Kuntai Du	38e599d6a8	[Doc] add documentation for disaggregated prefilling (#11197) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2024-12-15 13:31:16 -06:00
Cyrus Leung	96d673e0f8	[Bugfix] Fix error handling of unsupported sliding window (#11213) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 10:59:42 -07:00
Cyrus Leung	b10609e6a1	[Misc] Clean up multi-modal processor (#11207) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 06:30:28 +00:00
youkaichao	a1c02058ba	[torch.compile] allow tracking forward time (#11081) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-14 19:45:00 -08:00
Jee Jee Li	15859f2357	[[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201)	2024-12-15 03:03:06 +00:00
Sungjae Lee	886936837c	[Performance][Core] Optimize the performance of evictor v1 and v2 by applying a priority queue and lazy deletion (#7209)	2024-12-14 11:38:10 -08:00
Mark McLoughlin	6d917d0eeb	Enable mypy checking on V1 code (#11105) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2024-12-14 09:54:04 -08:00
Cyrus Leung	93abf23a64	[VLM] Fully dynamic prompt replacement in merged input processor (#11199) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 17:52:18 +00:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Jee Jee Li	3cb5769883	[Misc] Minor improvements to the readability of PunicaWrapperBase (#11200) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-14 16:38:27 +00:00
Tyler Michael Smith	ea7bd68d10	[V1][Bugfix] Fix V1 TP trust-remote-code (#11182) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-14 08:21:23 +00:00
Russell Bryant	48259264a4	[Core] Update outlines and increase its threadpool size (#11140) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-14 07:46:18 +00:00
dhuangnm	24a3d12b82	update compressed-tensors to latest version (#11183) Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>	2024-12-14 03:22:44 +00:00
Cody Yu	9855aea21b	[Bugfix][V1] Re-compute an entire block when fully cache hit (#11186) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-13 17:08:23 -08:00
Tyler Michael Smith	4b5b8a6a3b	[V1][Bugfix] Fix EngineCoreProc profile (#11185) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-13 17:02:35 -08:00
Russell Bryant	4863e5fba5	[Core] V1: Use multiprocessing by default (#11074) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-13 16:27:32 -08:00

3847 Commits