20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Daniele	a2c71c5405	[CI/Build] remove .github from .dockerignore, add dirty repo check (#9375)	2024-10-17 10:25:06 -07:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344)	2024-10-17 12:21:04 -04:00
Woosuk Kwon	8e1cddcd44	[TPU] Call torch._sync(param) during weight loading (#9437)	2024-10-17 09:00:11 -07:00
sasha0552	5e443b594f	[Bugfix] Allow prefill of assistant response when using `mistral_common` (#9446)	2024-10-17 15:06:37 +00:00
Lucas Wilkinson	9d30a056e7	[misc] CUDA Time Layerwise Profiler (#8337) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-10-17 10:36:09 -04:00
Cyrus Leung	390be74649	[Misc] Print stack trace using `logger.exception` (#9461)	2024-10-17 13:55:48 +00:00
Lucas Wilkinson	e312e52b44	[Kernel] Add Exllama as a backend for compressed-tensors (#9395)	2024-10-17 09:48:26 -04:00
Yuan Tang	dbfa8d31d5	Add notes on the use of Slack (#9442)	2024-10-17 04:46:46 +00:00
rasmith	92d86da217	[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391)	2024-10-17 01:34:06 +00:00
Tyler Michael Smith	c3fab5f769	[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425)	2024-10-16 23:46:06 +00:00
Russell Bryant	776dbd74f1	[CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-16 22:55:59 +00:00
Lily Liu	8345045833	[Performance][Spec Decode] Optimize ngram lookup performance (#9333)	2024-10-16 13:37:45 -06:00
Junhao Li	5b8a1fde84	[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396)	2024-10-16 16:40:24 +00:00
Mor Zusman	fb60ae9b91	[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189)	2024-10-16 12:12:43 -04:00
Patrick von Platen	415f76a9cb	Support mistral interleaved attn (#9414)	2024-10-16 13:28:30 +00:00
Isotr0py	cf1d62a644	[Model] Support SDPA attention for Molmo vision backbone (#9410)	2024-10-16 11:52:01 +00:00
Roger Wang	59230ef32b	[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-16 11:20:51 +00:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688)	2024-10-16 10:49:37 +00:00
Cyrus Leung	1de76a0e55	[CI/Build] Test VLM embeddings (#9406)	2024-10-16 09:44:30 +00:00
Cyrus Leung	7abba39ee6	[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303)	2024-10-16 14:31:00 +08:00
Cyrus Leung	7e7eae338d	[Misc] Standardize RoPE handling for Qwen2-VL (#9250)	2024-10-16 13:56:17 +08:00
Reza Salehi	ed920135c8	[Bugfix] Molmo text-only input bug fix (#9397) Co-authored-by: sanghol <sanghol@allenai.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-16 04:56:09 +00:00
Lucas Wilkinson	717a5f82cd	[Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386)	2024-10-16 00:15:21 +00:00
Chang Su	ba30942240	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-15 15:40:43 -07:00
Michael Goin	22f8a69549	[Misc] Directly use compressed-tensors for checkpoint definitions (#8909) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-15 15:40:25 -07:00
Grace Ho	5d264f4ab8	pass ignore_eos parameter to all benchmark_serving calls (#9349)	2024-10-15 13:30:44 -07:00
Nick Hill	e9d517f276	[BugFix] Fix chat API continuous usage stats (#9357)	2024-10-14 23:19:48 -07:00
hhzhang16	55e081fbad	[Bugfix] Update InternVL input mapper to support image embeds (#9351)	2024-10-14 21:29:19 -07:00
Michael Goin	8e836d982a	[Doc] Fix code formatting in spec_decode.rst (#9348)	2024-10-14 21:29:11 -07:00
Steve Grubb	44eaa5a5d9	[Frontend] Clarify model_type error messages (#9345)	2024-10-14 21:29:01 -07:00
Tyler Michael Smith	169b530607	[Bugfix] Clean up some cruft in mamba.py (#9343)	2024-10-15 00:24:25 +00:00
Xiang Xu	f0fe4fe86d	[Model] Make llama3.2 support multiple and interleaved images (#9095)	2024-10-14 15:24:26 -07:00
Brendan Wong	4d31cd424b	[Frontend] merge beam search implementations (#9296)	2024-10-14 15:05:52 -07:00
Woosuk Kwon	473e7b3606	[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350)	2024-10-14 15:02:06 -07:00
Simon Mo	fd47e57f4b	[Docs] Remove PDF build from Readtehdocs (#9347)	2024-10-14 11:57:47 -07:00
Daniele	203ab8f80f	[CI/Build] setuptools-scm fixes (#8900)	2024-10-14 11:34:47 -07:00
Kunshang Ji	4141608c6a	[Hardware][intel GPU] add async output process for xpu (#8897)	2024-10-14 12:23:33 -06:00
Reza Salehi	dfe43a2071	[Model] Molmo vLLM Integration (#9016) Co-authored-by: sanghol <sanghol@allenai.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-10-14 07:56:24 -07:00
Tyler Michael Smith	16b24e7dcd	[Bugfix] Bandaid fix for speculative decoding tests (#9327)	2024-10-13 23:02:11 +00:00
Lily Liu	f519902c52	[CI] Fix merge conflict (#9317)	2024-10-13 06:41:23 +00:00
Jee Jee Li	250e26a63e	[Bugfix]Fix MiniCPM's LoRA bug (#9286)	2024-10-12 09:36:47 -07:00
Yunmeng	2b184ddd4f	[Misc][Installation] Improve source installation script and doc (#9309) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-12 09:36:40 -07:00
Xiang Xu	00298e092c	[Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026)	2024-10-12 15:00:43 +08:00
Lily Liu	89feb4c84d	[SpecDec] Remove Batch Expansion (2/3) (#9298)	2024-10-12 05:13:37 +00:00
Maximilien de Bayser	ec10cb8511	[BugFix] Fix tool call finish reason in streaming case (#9209) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-10-11 18:24:26 -07:00
Prashant Gupta	d11b46f3a5	[bugfix] fix f-string for error (#9295) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2024-10-11 17:03:48 -07:00
Allen Wang	c6cf9295e1	[Bugfix] Sets `is_first_step_output` for TPUModelRunner (#9202)	2024-10-11 13:28:10 -07:00
Lucas Wilkinson	de9fb4bef8	[Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected (#9254)	2024-10-11 15:57:39 -04:00
Wallas Henrique	8baf85e4e9	[Doc] Compatibility matrix for mutual exclusive features (#8512) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-11 11:18:50 -07:00

3022 Commits