11 Commits

Author SHA1 Message Date
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
sroy745
fc3afc20df
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752) 2024-09-24 21:26:36 -07:00
Cody Yu
e3580537a4
[Performance] Enable chunked prefill and prefix caching together (#7753) 2024-08-28 00:36:31 -07:00
Megha Agarwal
2eedede875
[Core] Asynchronous Output Processor (#7049)
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
2024-08-26 20:53:20 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
SangBin Cho
847cdcca1c
[CI] Upgrade codespell version. (#5381) 2024-06-12 10:06:14 -07:00
youkaichao
20cfcdec99
[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) 2024-05-08 12:07:05 -07:00
SangBin Cho
0f8a91401c
[Core] Ignore infeasible swap requests. (#4557) 2024-05-02 14:31:20 -07:00
SangBin Cho
67b4221a61
[Core][5/N] Fully working chunked prefill e2e (#3884) 2024-04-10 17:56:48 -07:00
SangBin Cho
18de883489
[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) 2024-04-05 10:17:58 -07:00