SangBin Cho
b51c1cc9d2
[2/N] Chunked prefill data update (#3538)
2024-03-28 10:06:01 -07:00

Roger Wang
ce567a2926
[Kernel] DBRX Triton MoE kernel H100 (#3692)
2024-03-28 10:05:34 -07:00

wenyujin333
d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346)
2024-03-28 15:19:59 +00:00

Cade Daniel
14ccd94c89
[Core][Bugfix]Refactor block manager for better testability (#3492)
2024-03-27 23:59:28 -07:00

Woosuk Kwon
8267b06c30
[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679)
2024-03-27 22:22:25 -07:00

youkaichao
3492859b68
[CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675)
2024-03-28 00:18:54 -04:00

hxer7963
098e1776ba
[Model] Add support for xverse (#3610)
Co-authored-by: willhe <hexin@xverse.cn>
Co-authored-by: root <root@localhost.localdomain>
2024-03-27 18:12:54 -07:00

Roy
10e6322283
[Model] Fix and clean commandr (#3671)
2024-03-28 00:20:00 +00:00

Woosuk Kwon
6d9aa00fc4
[Docs] Add Command-R to supported models (#3669)
2024-03-27 15:20:00 -07:00

zeppombal
1182607e18
Add support for Cohere's Command-R model (#3433)
Co-authored-by: José Maria Pombal <jose.pombal@unbabel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-03-27 14:19:32 -07:00

Roger Wang
45b6ef6513
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)
2024-03-27 13:39:26 -07:00

AmadeusChan
1956931436
[Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621)
2024-03-27 13:39:05 -07:00

Megha Agarwal
e24336b5a7
[Model] Add support for DBRX (#3660)
2024-03-27 13:01:46 -07:00

youkaichao
d18f4e73f3
[Bugfix] [Hotfix] fix nccl library name (#3661)
2024-03-27 17:23:54 +00:00

Woosuk Kwon
82c540bebf
[Bugfix] More faithful implementation of Gemma (#3653)
2024-03-27 09:37:18 -07:00

youkaichao
8f44facddd
[Core] remove cupy dependency (#3625)
2024-03-27 00:33:26 -07:00

Woosuk Kwon
e66b629c04
[Misc] Minor fix in KVCache type (#3652)
2024-03-26 23:14:06 -07:00

Jee Li
76879342a3
[Doc]add lora support (#3649)
2024-03-27 02:06:46 +00:00

Jee Li
566b57c5c4
[Kernel] support non-zero cuda devices in punica kernels (#3636)
2024-03-27 00:37:42 +00:00

Nick Hill
0dc72273b8
[BugFix] Fix ipv4 address parsing regression (#3645)
2024-03-26 14:39:44 -07:00

liiliiliil
a979d9771e
[Bugfix] Fix ipv6 address parsing bug (#3641)
2024-03-26 11:58:20 -07:00

Jee Li
8af890a865
Enable more models to inference based on LoRA (#3382)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-03-25 18:09:31 -07:00

Nick Hill
dfeb2ecc3a
[Misc] Include matched stop string/token in responses (#2976)
Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>
2024-03-25 17:31:32 -07:00

Antoni Baum
3a243095e5
Optimize _get_ranks in Sampler (#3623)
2024-03-25 16:03:02 -07:00

xwjiang2010
64172a976c
[Feature] Add vision language model support. (#3042)
2024-03-25 14:16:30 -07:00

Simon Mo
f408d05c52
hotfix isort on logprobs ranks pr (#3622)
2024-03-25 11:55:46 -07:00

Dylan Hawk
0b4997e05c
[Bugfix] API stream returning two stops (#3450)
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
2024-03-25 10:14:34 -07:00

Travis Johnson
c13ad1b7bd
feat: implement the min_tokens sampling parameter (#3124)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-03-25 10:14:26 -07:00

Swapnil Parekh
819924e749
[Core] Adding token ranks along with logprobs (#3516)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
2024-03-25 10:13:10 -07:00

SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495)
2024-03-25 07:59:47 -07:00

TianYu GUO
e67c295b0c
[Bugfix] fix automatic prefix args and add log info (#3608)
2024-03-25 05:35:22 -07:00

Woosuk Kwon
925f3332ca
[Core] Refactor Attention Take 2 (#3462)
2024-03-25 04:39:33 +00:00

少年
b0dfa91dd7
[Model] Add starcoder2 awq support (#3569)
2024-03-24 21:07:36 -07:00

Woosuk Kwon
56a8652f33
Revert "[Bugfix] store lock file in tmp directory (#3578)" (#3599)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-03-24 20:06:50 -07:00

Kunshang Ji
6d93d35308
[BugFix] tensor.get_device() -> tensor.device (#3604)
2024-03-24 19:01:13 -07:00

youkaichao
837e185142
[CI/Build] fix flaky test (#3602)
2024-03-24 17:43:05 -07:00

youkaichao
42bc386129
[CI/Build] respect the common environment variable MAX_JOBS (#3600)
2024-03-24 17:04:00 -07:00

youkaichao
8b268a46a7
[CI] typo fix: is_hip --> is_hip() (#3595)
2024-03-24 16:03:06 -07:00

Nick Hill
41deac4a3d
[BugFix] 1D query fix for MoE models (#3597)
2024-03-24 16:00:16 -07:00

Woosuk Kwon
af9e53496f
[BugFix] Fix Falcon tied embeddings (#3590)
Co-authored-by: 44670 <44670@users.noreply.github.com>
2024-03-24 06:34:01 -07:00

Roger Wang
f8a12ecc7f
[Misc] Bump transformers version (#3592)
2024-03-24 06:32:45 -07:00

Woosuk Kwon
3c5ab9b811
[Misc] Fix BLOOM copyright notice (#3591)
2024-03-23 23:30:56 -07:00

kota-iizuka
743a0b7402
[Bugfix] use SoftLockFile instead of LockFile (#3578)
2024-03-23 11:43:11 -07:00

Antoni Baum
bfdb1ba5c3
[Core] Improve detokenization performance for prefill (#3469)
Co-authored-by: MeloYang <meloyang05@gmail.com>
2024-03-22 13:44:12 -07:00

Thomas Parnell
cf2f084d56
Dynamic scheduler delay to improve ITL performance (#3279)
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
2024-03-22 12:28:14 -07:00

Hanzhi Zhou
f721096d48
[BugFix] Some fixes for custom allreduce kernels (#2760)
2024-03-21 23:02:58 -07:00

Zhuohan Li
e90fc21f2e
[Hardware][Neuron] Refactor neuron support (#3471)
2024-03-22 01:22:17 +00:00

Roy
ea5f14e6ff
[Bugfix][Model] Fix Qwen2 (#3554)
2024-03-22 00:18:58 +00:00

Taemin Lee
b7050ca7df
[BugFix] gemma loading after quantization or LoRA. (#3553)
2024-03-21 13:16:57 -07:00

Woosuk Kwon
c188ecb080
[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551)
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
2024-03-21 07:58:12 -07:00