SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|
Cyrus Leung
|
cff6a1fec1
|
[CI/Build] Reuse code for checking output consistency (#5988)
|
2024-06-30 11:44:25 +08:00 |
|
youkaichao
|
8ea5e44a43
|
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
|
2024-06-08 08:59:20 +00:00 |
|
youkaichao
|
9fb900f90c
|
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
|
2024-06-07 22:31:32 -07:00 |
|
SangBin Cho
|
e7c46b9527
|
[Scheduler] Warning upon preemption and Swapping (#4647)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
|
2024-05-13 23:50:44 +09:00 |
|
SangBin Cho
|
0f8a91401c
|
[Core] Ignore infeasible swap requests. (#4557)
|
2024-05-02 14:31:20 -07:00 |
|
SangBin Cho
|
0d62fe58db
|
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
|
2024-05-01 19:24:13 -07:00 |
|