| sroy745 | 14f91fe67c | [Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485) | 2024-07-20 23:58:58 -07:00 |
| sroy745 | ae151d73be | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| sroy745 | 80ca1e6a3a | [Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348) | 2024-07-01 00:33:05 -07:00 |
| Nick Hill | faf71bcd4b | [Speculative Decoding] Add ProposerWorkerBase abstract class (#5252) | 2024-06-05 14:53:05 -07:00 |
| Lily Liu | d5a1697772 | [Dynamic Spec Decoding] Minor fix for disabling speculative decoding (#5000) | 2024-05-25 10:00:14 -07:00 |
| Cody Yu | f942efb5a3 | [Dynamic Spec Decoding] Auto-disable by the running queue size (#4592) Co-authored-by: Cade Daniel <edacih@gmail.com> | 2024-05-08 21:44:00 +00:00 |