| Alexander Matveev | e76466dde2 | [Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (#6338) | 2024-07-17 14:30:28 -07:00 |
| sroy745 | ae151d73be | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| Cody Yu | b2c620230a | [Spec Decode] Introduce DraftModelRunner (#5799) | 2024-06-28 09:17:51 -07:00 |
| Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| Nick Hill | faf71bcd4b | [Speculative Decoding] Add ProposerWorkerBase abstract class (#5252) | 2024-06-05 14:53:05 -07:00 |
| Cody Yu | bc8ad68455 | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| Cade Daniel | ab50275111 | [Speculative decoding] Support target-model logprobs (#4378) | 2024-05-03 15:52:01 -07:00 |
| SangBin Cho | 3521ba4f25 | [Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) | 2024-05-03 10:20:12 -07:00 |
| leiwen83 | b38e42fbca | [Speculative decoding] Add ngram prompt lookup decoding (#4237) Co-authored-by: Lei Wen <wenlei03@qiyi.com> | 2024-05-01 11:13:03 -07:00 |
| Cade Daniel | 62b8aebc6f | [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) | 2024-04-23 08:02:36 +00:00 |
| Cade Daniel | e95cd87959 | [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) | 2024-04-16 13:09:21 -07:00 |
| SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00 |
| Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00 |
| Cade Daniel | 8437bae6ef | [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103) | 2024-03-08 23:32:46 -08:00 |