Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 ( #9149 )
2024-10-10 14:17:17 +08:00
sroy745
ee777d9c30
Fix test_schedule_swapped_simple in test_scheduler.py ( #8780 )
2024-09-24 21:26:18 -07:00
sroy745
88577ac928
Fix tests in test_scheduler.py that fail with BlockManager V2 ( #8728 )
2024-09-24 04:43:13 +00:00
afeldman-nm
fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) ( #4942 )
...
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00
youkaichao
c8a7e93273
[core][scheduler] simplify and improve scheduler ( #6867 )
2024-07-31 23:51:09 -07:00
Jiaxin Shan
42c7f66a38
[Core] Support dynamically loading Lora adapter from HuggingFace ( #6234 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-22 15:42:40 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
SangBin Cho
e7c46b9527
[Scheduler] Warning upon preemption and Swapping ( #4647 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
youkaichao
20cfcdec99
[Core][Optimization] change python dict to pytorch tensor for blocks to swap ( #4659 )
2024-05-08 12:07:05 -07:00
youkaichao
469f85c782
[Core][Optimization] change copy-on-write from dict[int, list] to list ( #4648 )
2024-05-07 11:06:32 -07:00
youkaichao
63575bc2e1
[Core][Optimization] change python dict to pytorch tensor ( #4607 )
2024-05-06 21:30:27 -07:00
SangBin Cho
0f8a91401c
[Core] Ignore infeasible swap requests. ( #4557 )
2024-05-02 14:31:20 -07:00
SangBin Cho
050f285ff6
[Core] Scheduling optimization 2 ( #4280 )
2024-04-23 08:02:11 +00:00
SangBin Cho
ad8d696a99
[Core] Scheduler perf fix ( #4270 )
2024-04-22 21:11:06 +00:00
SangBin Cho
18de883489
[Chunked Prefill][4/n] Chunked prefill scheduler. ( #3853 )
2024-04-05 10:17:58 -07:00
SangBin Cho
3dcb3e8b98
[3/N] Refactor scheduler for chunked prefill scheduling ( #3550 )
2024-04-03 14:13:49 -07:00
SangBin Cho
b51c1cc9d2
[2/N] Chunked prefill data update ( #3538 )
2024-03-28 10:06:01 -07:00
Cade Daniel
14ccd94c89
[Core][Bugfix]Refactor block manager for better testability ( #3492 )
2024-03-27 23:59:28 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Thomas Parnell
cf2f084d56
Dynamic scheduler delay to improve ITL performance ( #3279 )
...
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
2024-03-22 12:28:14 -07:00
SangBin Cho
6e435de766
[1/n][Chunked Prefill] Refactor input query shapes ( #3236 )
2024-03-20 14:46:05 -07:00
Cade Daniel
a33ce60c66
[Testing] Fix core tests ( #3224 )
2024-03-06 01:04:23 -08:00
SangBin Cho
24aecf421a
[Tests] Add block manager and scheduler tests ( #3108 )
2024-03-05 18:23:34 -08:00