8 Commits

Author SHA1 Message Date
Lu Fang
8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-15 12:40:09 +08:00
Elfie Guo
0794e7446e
[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) 2025-01-15 12:47:49 +08:00
Pavani Majety
b6dde33019
[Core] Flashinfer - Remove advance step size restriction (#10282) 2024-11-13 16:29:32 +08:00
Varun Sundar Rabindranath
afb050b29d
[Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-10-02 19:44:39 +00:00
Varun Sundar Rabindranath
c2ec430ab5
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-09-27 13:32:07 -07:00
Tyler Michael Smith
71d21c73ab
[Bugfix] Fixup advance_step.cu warning (#8815) 2024-09-26 16:23:45 -07:00
William Lin
a6c0f3658d
[multi-step] add flashinfer backend (#7928) 2024-09-12 11:16:22 -07:00
Alexander Matveev
e76466dde2
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (#6338) 2024-07-17 14:30:28 -07:00