7 Commits

Author SHA1 Message Date
Woosuk Kwon
f1c8520146
[BugFix] Fix input positions for long context with sliding window (#2088) 2023-12-13 12:28:13 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff (#1665) 2023-11-20 11:58:01 -08:00
Zhuohan Li
9d9072a069
Implement prompt logprobs & Batched topk for computing logprobs (#1328) Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com> 2023-10-16 10:56:50 -07:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
Antoni Baum
c9927c1a6a
Use queue for finished requests (#957) 2023-09-05 19:27:23 -07:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon
32b6816e55
Add tests for models (#922) 2023-09-01 11:19:43 +09:00