20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
TianYu GUO	e67c295b0c	[Bugfix] fix automatic prefix args and add log info (#3608)	2024-03-25 05:35:22 -07:00
Woosuk Kwon	925f3332ca	[Core] Refactor Attention Take 2 (#3462)	2024-03-25 04:39:33 +00:00
少年	b0dfa91dd7	[Model] Add starcoder2 awq support (#3569)	2024-03-24 21:07:36 -07:00
Woosuk Kwon	56a8652f33	[Bugfix] store lock file in tmp directory (#3578)" (#3599) Co-authored-by: youkaichao <youkaichao@126.com>	2024-03-24 20:06:50 -07:00
Kunshang Ji	6d93d35308	[BugFix] tensor.get_device() -> tensor.device (#3604)	2024-03-24 19:01:13 -07:00
youkaichao	837e185142	[CI/Build] fix flaky test (#3602)	2024-03-24 17:43:05 -07:00
youkaichao	42bc386129	[CI/Build] respect the common environment variable MAX_JOBS (#3600)	2024-03-24 17:04:00 -07:00
youkaichao	8b268a46a7	[CI] typo fix: is_hip --> is_hip() (#3595)	2024-03-24 16:03:06 -07:00
Nick Hill	41deac4a3d	[BugFix] 1D query fix for MoE models (#3597)	2024-03-24 16:00:16 -07:00
Woosuk Kwon	af9e53496f	[BugFix] Fix Falcon tied embeddings (#3590) Co-authored-by: 44670 <44670@users.noreply.github.com>	2024-03-24 06:34:01 -07:00
Roger Wang	f8a12ecc7f	[Misc] Bump transformers version (#3592)	2024-03-24 06:32:45 -07:00
Woosuk Kwon	3c5ab9b811	[Misc] Fix BLOOM copyright notice (#3591)	2024-03-23 23:30:56 -07:00
kota-iizuka	743a0b7402	[Bugfix] use SoftLockFile instead of LockFile (#3578)	2024-03-23 11:43:11 -07:00
Antoni Baum	bfdb1ba5c3	[Core] Improve detokenization performance for prefill (#3469) Co-authored-by: MeloYang <meloyang05@gmail.com>	2024-03-22 13:44:12 -07:00
Thomas Parnell	cf2f084d56	Dynamic scheduler delay to improve ITL performance (#3279) Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>	2024-03-22 12:28:14 -07:00
Hanzhi Zhou	f721096d48	[BugFix] Some fixes for custom allreduce kernels (#2760)	2024-03-21 23:02:58 -07:00
Zhuohan Li	e90fc21f2e	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00
Roy	ea5f14e6ff	[Bugfix][Model] Fix Qwen2 (#3554)	2024-03-22 00:18:58 +00:00
Taemin Lee	b7050ca7df	[BugFix] gemma loading after quantization or LoRA. (#3553)	2024-03-21 13:16:57 -07:00
Woosuk Kwon	c188ecb080	[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551) Co-authored-by: Roy <jasonailu87@gmail.com> Co-authored-by: Roger Meier <r.meier@siemens.com>	2024-03-21 07:58:12 -07:00
Roy	865732342b	[Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500)	2024-03-21 18:07:48 +08:00
Lalit Pradhan	4c07dd28c0	[🚀 Ready to be merged] Added support for Jais models (#3183)	2024-03-21 09:45:24 +00:00
SangBin Cho	3bbff9e5ab	Fix 1D query issue from `_prune_hidden_states` (#3539)	2024-03-21 08:49:06 +00:00
ElizaWszola	6ebd02bdef	[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431) Co-authored-by: rsnm2 <rshaw@neuralmagic.com> Co-authored-by: Luka <luka@paperspace>	2024-03-20 23:20:04 -07:00
Zhuohan Li	523e30ea0c	[BugFix] Hot fix in setup.py for neuron build (#3537)	2024-03-20 17:59:52 -07:00
Roy	f1c0fc3919	Migrate `logits` computation and gather to `model_runner` (#3233)	2024-03-20 23:25:01 +00:00
SangBin Cho	6e435de766	[1/n][Chunked Prefill] Refactor input query shapes (#3236)	2024-03-20 14:46:05 -07:00
Antoni Baum	426ec4ec67	[1/n] Triton sampling kernel (#3186) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-03-20 14:45:08 -07:00
James Whedbee	80e254834d	[Bugfix] Fix ROCm support in CMakeLists.txt (#3534)	2024-03-20 21:05:03 +00:00
bnellnm	ba8ae1d84f	Check for _is_cuda() in compute_num_jobs (#3481)	2024-03-20 10:06:56 -07:00
Allen.Dou	84eaa68425	Abort when nvcc command is not found in the PATH (#3527)	2024-03-20 09:28:29 -07:00
Woosuk Kwon	5ee14494e4	[Misc] Remove cache stream and cache events (#3461)	2024-03-20 00:38:53 -07:00
Nick Hill	4ad521d8b5	[Core] Add generic typing to `LRUCache` (#3511)	2024-03-20 00:36:09 -07:00
ElizaWszola	9474e89ba4	[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-20 00:11:11 -07:00
Simon Mo	20478c4d3a	Use lru_cache for some environment detection utils (#3508)	2024-03-19 21:34:15 +00:00
Jim Burtoft	63e8b28a99	[Doc] minor fix of spelling in amd-installation.rst (#3506)	2024-03-19 20:32:30 +00:00
Simon Mo	cc63d03fbb	Revert "[Core] Cache some utils" (#3507)	2024-03-19 13:22:58 -07:00
Jim Burtoft	2a60c9bd17	[Doc] minor fix to neuron-installation.rst (#3505)	2024-03-19 13:21:35 -07:00
ifsheldon	c614cfee58	Update dockerfile with ModelScope support (#3429)	2024-03-19 10:54:59 -07:00
Nick Hill	7341c77d69	[BugFix] Avoid initializing CUDA too early (#3487)	2024-03-18 23:05:20 -07:00
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288)	2024-03-18 22:05:34 -07:00
youkaichao	6a9c583e73	[Core] print error before deadlock (#3459)	2024-03-19 04:06:23 +00:00
Antoni Baum	b37cdce2b1	[Core] Cache some utils (#3474)	2024-03-18 17:14:26 -07:00
Zhuohan Li	b30880a762	[Misc] Update README for the Third vLLM Meetup (#3479)	2024-03-18 15:58:38 -07:00
Antoni Baum	49eedea373	[Core] Zero-copy asdict for InputMetadata (#3475)	2024-03-18 22:56:40 +00:00
bnellnm	9fdf3de346	Cmake based build system (#2830)	2024-03-18 15:38:33 -07:00
Zhuohan Li	c0c17d4896	[Misc] Fix PR Template (#3478)	2024-03-18 15:00:31 -07:00
Robert Shaw	097aa0ea22	[CI/Build] Fix Bad Import In Test (#3473)	2024-03-18 20:28:00 +00:00
Cade Daniel	482b0adf1b	[Testing] Add test_config.py to CI (#3437)	2024-03-18 12:48:45 -07:00
Simon Mo	8c654c045f	CI: Add ROCm Docker Build (#2886)	2024-03-18 19:33:47 +00:00

947 Commits