7 Commits

Author SHA1 Message Date
Nick Hill
9a3f49ae07
[BugFix] Overhaul async request cancellation (#7111)
2024-08-07 13:21:41 +08:00
Antoni Baum
ff578cae54
Add health check, make async Engine more robust (#3015)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-04 22:01:40 +00:00
Antoni Baum
017d9f1515
Add metrics to RequestOutput (#2876)
2024-02-20 21:55:57 -08:00
Zhuohan Li
9d9072a069
Implement prompt logprobs & Batched topk for computing logprobs (#1328)
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
2023-10-16 10:56:50 -07:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181)
2023-10-02 15:36:09 -07:00
Antoni Baum
ff36139ffc
Remove AsyncLLMEngine busy loop, shield background task (#1059)
2023-09-17 00:29:08 -07:00
Antoni Baum
c07ece5ca4
Make AsyncLLMEngine more robust & fix batched abort (#969)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00