| Author | Commit | Message | Date |
|---|---|---|---|
| afeldman-nm | fd95e026e0 | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942); Co-authored-by: Andrew Feldman <afeld2012@gmail.com>; Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-08-06 16:51:47 -04:00 |
| youkaichao | c8a7e93273 | [core][scheduler] simplify and improve scheduler (#6867) | 2024-07-31 23:51:09 -07:00 |
| Jiaxin Shan | 42c7f66a38 | [Core] Support dynamically loading Lora adapter from HuggingFace (#6234); Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> | 2024-07-22 15:42:40 -07:00 |
| Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| SangBin Cho | e7c46b9527 | [Scheduler] Warning upon preemption and Swapping (#4647); Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-05-13 23:50:44 +09:00 |
| youkaichao | 20cfcdec99 | [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) | 2024-05-08 12:07:05 -07:00 |
| youkaichao | 469f85c782 | [Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) | 2024-05-07 11:06:32 -07:00 |
| youkaichao | 63575bc2e1 | [Core][Optimization] change python dict to pytorch tensor (#4607) | 2024-05-06 21:30:27 -07:00 |
| SangBin Cho | 0f8a91401c | [Core] Ignore infeasible swap requests. (#4557) | 2024-05-02 14:31:20 -07:00 |
| SangBin Cho | 050f285ff6 | [Core] Scheduling optimization 2 (#4280) | 2024-04-23 08:02:11 +00:00 |
| SangBin Cho | ad8d696a99 | [Core] Scheduler perf fix (#4270) | 2024-04-22 21:11:06 +00:00 |
| SangBin Cho | 18de883489 | [Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) | 2024-04-05 10:17:58 -07:00 |
| SangBin Cho | 3dcb3e8b98 | [3/N] Refactor scheduler for chunked prefill scheduling (#3550) | 2024-04-03 14:13:49 -07:00 |
| SangBin Cho | b51c1cc9d2 | [2/N] Chunked prefill data update (#3538) | 2024-03-28 10:06:01 -07:00 |
| Cade Daniel | 14ccd94c89 | [Core][Bugfix]Refactor block manager for better testability (#3492) | 2024-03-27 23:59:28 -07:00 |
| SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| Thomas Parnell | cf2f084d56 | Dynamic scheduler delay to improve ITL performance (#3279); Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com> | 2024-03-22 12:28:14 -07:00 |
| SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00 |
| Cade Daniel | a33ce60c66 | [Testing] Fix core tests (#3224) | 2024-03-06 01:04:23 -08:00 |
| SangBin Cho | 24aecf421a | [Tests] Add block manager and scheduler tests (#3108) | 2024-03-05 18:23:34 -08:00 |