TianYu GUO
|
e67c295b0c
|
[Bugfix] fix automatic prefix args and add log info (#3608)
|
2024-03-25 05:35:22 -07:00 |
|
Woosuk Kwon
|
925f3332ca
|
[Core] Refactor Attention Take 2 (#3462)
|
2024-03-25 04:39:33 +00:00 |
|
少年
|
b0dfa91dd7
|
[Model] Add starcoder2 awq support (#3569)
|
2024-03-24 21:07:36 -07:00 |
|
Woosuk Kwon
|
56a8652f33
|
Revert "[Bugfix] store lock file in tmp directory (#3578)" (#3599)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-03-24 20:06:50 -07:00 |
|
Kunshang Ji
|
6d93d35308
|
[BugFix] tensor.get_device() -> tensor.device (#3604)
|
2024-03-24 19:01:13 -07:00 |
|
youkaichao
|
837e185142
|
[CI/Build] fix flaky test (#3602)
|
2024-03-24 17:43:05 -07:00 |
|
youkaichao
|
42bc386129
|
[CI/Build] respect the common environment variable MAX_JOBS (#3600)
|
2024-03-24 17:04:00 -07:00 |
|
youkaichao
|
8b268a46a7
|
[CI] typo fix: is_hip --> is_hip() (#3595)
|
2024-03-24 16:03:06 -07:00 |
|
Nick Hill
|
41deac4a3d
|
[BugFix] 1D query fix for MoE models (#3597)
|
2024-03-24 16:00:16 -07:00 |
|
Woosuk Kwon
|
af9e53496f
|
[BugFix] Fix Falcon tied embeddings (#3590)
Co-authored-by: 44670 <44670@users.noreply.github.com>
|
2024-03-24 06:34:01 -07:00 |
|
Roger Wang
|
f8a12ecc7f
|
[Misc] Bump transformers version (#3592)
|
2024-03-24 06:32:45 -07:00 |
|
Woosuk Kwon
|
3c5ab9b811
|
[Misc] Fix BLOOM copyright notice (#3591)
|
2024-03-23 23:30:56 -07:00 |
|
kota-iizuka
|
743a0b7402
|
[Bugfix] use SoftLockFile instead of LockFile (#3578)
|
2024-03-23 11:43:11 -07:00 |
|
Antoni Baum
|
bfdb1ba5c3
|
[Core] Improve detokenization performance for prefill (#3469)
Co-authored-by: MeloYang <meloyang05@gmail.com>
|
2024-03-22 13:44:12 -07:00 |
|
Thomas Parnell
|
cf2f084d56
|
Dynamic scheduler delay to improve ITL performance (#3279)
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2024-03-22 12:28:14 -07:00 |
|
Hanzhi Zhou
|
f721096d48
|
[BugFix] Some fixes for custom allreduce kernels (#2760)
|
2024-03-21 23:02:58 -07:00 |
|
Zhuohan Li
|
e90fc21f2e
|
[Hardware][Neuron] Refactor neuron support (#3471)
|
2024-03-22 01:22:17 +00:00 |
|
Roy
|
ea5f14e6ff
|
[Bugfix][Model] Fix Qwen2 (#3554)
|
2024-03-22 00:18:58 +00:00 |
|
Taemin Lee
|
b7050ca7df
|
[BugFix] gemma loading after quantization or LoRA. (#3553)
|
2024-03-21 13:16:57 -07:00 |
|
Woosuk Kwon
|
c188ecb080
|
[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551)
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
|
2024-03-21 07:58:12 -07:00 |
|
Roy
|
865732342b
|
[Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500)
|
2024-03-21 18:07:48 +08:00 |
|
Lalit Pradhan
|
4c07dd28c0
|
[🚀 Ready to be merged] Added support for Jais models (#3183)
|
2024-03-21 09:45:24 +00:00 |
|
SangBin Cho
|
3bbff9e5ab
|
Fix 1D query issue from _prune_hidden_states (#3539)
|
2024-03-21 08:49:06 +00:00 |
|
ElizaWszola
|
6ebd02bdef
|
[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431)
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Luka <luka@paperspace>
|
2024-03-20 23:20:04 -07:00 |
|
Zhuohan Li
|
523e30ea0c
|
[BugFix] Hot fix in setup.py for neuron build (#3537)
|
2024-03-20 17:59:52 -07:00 |
|
Roy
|
f1c0fc3919
|
Migrate logits computation and gather to model_runner (#3233)
|
2024-03-20 23:25:01 +00:00 |
|
SangBin Cho
|
6e435de766
|
[1/n][Chunked Prefill] Refactor input query shapes (#3236)
|
2024-03-20 14:46:05 -07:00 |
|
Antoni Baum
|
426ec4ec67
|
[1/n] Triton sampling kernel (#3186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-03-20 14:45:08 -07:00 |
|
James Whedbee
|
80e254834d
|
[Bugfix] Fix ROCm support in CMakeLists.txt (#3534)
|
2024-03-20 21:05:03 +00:00 |
|
bnellnm
|
ba8ae1d84f
|
Check for _is_cuda() in compute_num_jobs (#3481)
|
2024-03-20 10:06:56 -07:00 |
|
Allen.Dou
|
84eaa68425
|
Abort when nvcc command is not found in the PATH (#3527)
|
2024-03-20 09:28:29 -07:00 |
|
Woosuk Kwon
|
5ee14494e4
|
[Misc] Remove cache stream and cache events (#3461)
|
2024-03-20 00:38:53 -07:00 |
|
Nick Hill
|
4ad521d8b5
|
[Core] Add generic typing to LRUCache (#3511)
|
2024-03-20 00:36:09 -07:00 |
|
ElizaWszola
|
9474e89ba4
|
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-03-20 00:11:11 -07:00 |
|
Simon Mo
|
20478c4d3a
|
Use lru_cache for some environment detection utils (#3508)
|
2024-03-19 21:34:15 +00:00 |
|
Jim Burtoft
|
63e8b28a99
|
[Doc] minor fix of spelling in amd-installation.rst (#3506)
|
2024-03-19 20:32:30 +00:00 |
|
Simon Mo
|
cc63d03fbb
|
Revert "[Core] Cache some utils" (#3507)
|
2024-03-19 13:22:58 -07:00 |
|
Jim Burtoft
|
2a60c9bd17
|
[Doc] minor fix to neuron-installation.rst (#3505)
|
2024-03-19 13:21:35 -07:00 |
|
ifsheldon
|
c614cfee58
|
Update dockerfile with ModelScope support (#3429)
|
2024-03-19 10:54:59 -07:00 |
|
Nick Hill
|
7341c77d69
|
[BugFix] Avoid initializing CUDA too early (#3487)
|
2024-03-18 23:05:20 -07:00 |
|
Simon Mo
|
ef65dcfa6f
|
[Doc] Add docs about OpenAI compatible server (#3288)
|
2024-03-18 22:05:34 -07:00 |
|
youkaichao
|
6a9c583e73
|
[Core] print error before deadlock (#3459)
|
2024-03-19 04:06:23 +00:00 |
|
Antoni Baum
|
b37cdce2b1
|
[Core] Cache some utils (#3474)
|
2024-03-18 17:14:26 -07:00 |
|
Zhuohan Li
|
b30880a762
|
[Misc] Update README for the Third vLLM Meetup (#3479)
|
2024-03-18 15:58:38 -07:00 |
|
Antoni Baum
|
49eedea373
|
[Core] Zero-copy asdict for InputMetadata (#3475)
|
2024-03-18 22:56:40 +00:00 |
|
bnellnm
|
9fdf3de346
|
Cmake based build system (#2830)
|
2024-03-18 15:38:33 -07:00 |
|
Zhuohan Li
|
c0c17d4896
|
[Misc] Fix PR Template (#3478)
|
2024-03-18 15:00:31 -07:00 |
|
Robert Shaw
|
097aa0ea22
|
[CI/Build] Fix Bad Import In Test (#3473)
|
2024-03-18 20:28:00 +00:00 |
|
Cade Daniel
|
482b0adf1b
|
[Testing] Add test_config.py to CI (#3437)
|
2024-03-18 12:48:45 -07:00 |
|
Simon Mo
|
8c654c045f
|
CI: Add ROCm Docker Build (#2886)
|
2024-03-18 19:33:47 +00:00 |
|