20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Tyler Michael Smith	02cc3b51a7	[misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263 )	2024-06-05 10:17:51 -07:00
Simon Mo	d5b1eb081e	[CI] Add nightly benchmarks (#5260 )	2024-06-05 09:42:08 -07:00
tomeras91	f0a500545f	[Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False) (#5278 )	2024-06-05 09:32:58 -07:00
Woosuk Kwon	c65146e75e	[Misc] Fix docstring of get_attn_backend (#5271 )	2024-06-05 09:18:59 -07:00
Woosuk Kwon	41ca62cf03	[Misc] Add CustomOp interface for device portability (#5255 )	2024-06-05 09:18:19 -07:00
zifeitong	974fc9b845	[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226 )	2024-06-04 19:37:28 -07:00
youkaichao	fee4dcc33a	[Misc] update collect env (#5261 )	2024-06-04 17:29:09 -05:00
Michael Goin	650a4cc55e	[Misc] Add transformers version to collect_env.py (#5259 )	2024-06-04 12:52:28 -07:00
Simon Mo	9ca62d8668	[CI] mark AMD test as softfail to prevent blockage (#5256 )	2024-06-04 11:34:53 -07:00
Li, Jiang	45c35f0d58	[CI/Build] Reducing CPU CI execution time (#5241 )	2024-06-04 10:26:40 -07:00
Cyrus Leung	9ba093b4f4	[CI/Build] Simplify model loading for `HfRunner` (#5251 )	2024-06-04 10:09:19 -07:00
Woosuk Kwon	27208be66e	[Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242 )	2024-06-04 09:58:47 -07:00
Jie Fu (傅杰)	87d5abef75	[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249 )	2024-06-04 09:57:51 -07:00
Cyrus Leung	ec784b2526	[CI/Build] Add inputs tests (#5215 )	2024-06-03 21:01:46 -07:00
zifeitong	a58f24e590	[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229 )	2024-06-03 20:55:50 -07:00
afeldman-nm	f42a006b15	[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210 )	2024-06-03 20:32:57 -07:00
Woosuk Kwon	3a434b07ed	[Kernel] Enhance MoE benchmarking & tuning script (#4921 )	2024-06-03 20:06:59 -07:00
Zhuohan Li	bd0e7802e0	[Bugfix] Add warmup for prefix caching example (#5235 )	2024-06-03 19:36:41 -07:00
Toshiki Kataoka	06b2550cbb	[Bugfix] Support `prompt_logprobs==0` (#5217 )	2024-06-03 17:59:30 -07:00
Breno Faria	f775a07e30	[FRONTEND] OpenAI `tools` support named functions (#5032 )	2024-06-03 18:25:29 -05:00
Kevin H. Luu	4f0d17c05c	New CI template on AWS stack (#5110 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-03 16:16:43 -07:00
Kaiyang Chen	10c38e3e46	[Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834 )	2024-06-03 13:37:11 -07:00
Yuan	cafb8e06c5	[CI/BUILD] enable intel queue for longer CPU tests (#4113 )	2024-06-03 10:39:50 -07:00
Tyler Michael Smith	cbb2f59cc8	[Kernel] Pass a device pointer into the quantize kernel for the scales (#5159 )	2024-06-03 09:52:30 -07:00
Antoni Baum	0ab278ca31	[Core] Remove unnecessary copies in flash attn backend (#5138 )	2024-06-03 09:39:31 -07:00
Cyrus Leung	7a64d24aad	[Core] Support image processor (#4197 )	2024-06-02 22:56:41 -07:00
Cyrus Leung	dfbe60dc62	[Misc] Simplify code and fix type annotations in `conftest.py` (#5118 )	2024-06-02 16:05:50 -07:00
Divakar Verma	a66cf40b20	[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927 ) This PR enables the fused topk_softmax kernel used in moe layer for HIP	2024-06-02 14:13:26 -07:00
Avinash Raj	f790ad3c50	[Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643 )	2024-06-02 08:06:13 +00:00
Simon Mo	ed59a7ed23	Update test_ignore_eos (#4898 )	2024-06-02 02:21:53 +00:00
Robert Shaw	044793d8df	[BugFix] Prevent `LLM.encode` for non-generation Models (#5184 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-06-01 23:35:41 +00:00
Daniil Arapov	c2d6d2f960	[Bugfix]: Fix issues related to prefix caching example (#5177 ) (#5180 )	2024-06-01 15:53:52 -07:00
Zhuohan Li	8279078e21	[Bugfix] Remove deprecated @abstractproperty (#5174 )	2024-06-01 22:40:25 +00:00
chenqianfzh	b9c0605a8e	[Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776 )	2024-06-01 14:51:10 -06:00
Nadav Shmayovits	37464a0f74	[Bugfix] Fix call to init_logger in openai server (#4765 )	2024-06-01 17:18:50 +00:00
Ye Cao	c354072828	[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py (#5151 ) Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>	2024-06-01 17:11:22 +00:00
Varun Sundar Rabindranath	f081c3ce4b	[Kernel] Update Cutlass fp8 configs (#5144 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-06-01 08:46:07 +00:00
Tyler Michael Smith	260d119e86	[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137 )	2024-06-01 06:45:32 +00:00
Daniele	a360ff80bb	[CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034 )	2024-05-31 22:06:45 -06:00
Tyler Michael Smith	1197e02141	[Build] Guard against older CUDA versions when building CUTLASS 3.x kernels (#5168 )	2024-05-31 17:21:38 -07:00
Nick Hill	657579113f	[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171 )	2024-05-31 17:20:19 -07:00
Cody Yu	e9899fb7a4	[Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039 )	2024-05-31 14:29:19 -07:00
functionxu123	a377f0bd5e	[Misc]: optimize eager mode host time (#4196 ) Co-authored-by: xuhao <xuhao@cambricon.com>	2024-05-31 13:14:50 +08:00
Simon Mo	e9d3aa04f6	Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149 )	2024-05-30 22:00:26 -07:00
SnowDist	a22dea54d3	[Model] Support MAP-NEO model (#5081 ) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-05-30 19:24:41 -07:00
simon-mo	533c217792	Fix cutlass sm_90a version in CMakeList	2024-05-31 02:13:01 +00:00
Alexander Matveev	6d21fa1cad	[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136 )	2024-05-30 21:02:11 -05:00
Robert Shaw	b35be5403f	[Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120 )	2024-05-30 17:04:37 -07:00
Simon Mo	45a1a69b98	[Build] Disable sm_90a in cu11 (#5141 )	2024-05-30 14:37:16 -07:00
Simon Mo	87a658c812	Bump version to v0.4.3 (#5046 )	2024-05-30 11:13:46 -07:00

1485 Commits