DefTruth
|
0f9a6e3d22
|
[Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573)
|
2024-05-08 09:19:58 -07:00 |
|
SangBin Cho
|
3521ba4f25
|
[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)
|
2024-05-03 10:20:12 -07:00 |
|
Michał Moskal
|
32881f3f31
|
[kernel] fix sliding window in prefix prefill Triton kernel (#4405)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2024-05-02 11:23:37 -07:00 |
|
Michał Moskal
|
e8cc7967ff
|
[Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128)
|
2024-04-18 00:51:28 -07:00 |
|
Roger Wang
|
45b6ef6513
|
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)
|
2024-03-27 13:39:26 -07:00 |
|
SangBin Cho
|
01bfb22b41
|
[CI] Try introducing isort. (#3495)
|
2024-03-25 07:59:47 -07:00 |
|
Woosuk Kwon
|
925f3332ca
|
[Core] Refactor Attention Take 2 (#3462)
|
2024-03-25 04:39:33 +00:00 |
|
simon-mo
|
ad50bf4b25
|
fix lint
|
2024-03-15 22:23:38 -07:00 |
|
Tao He
|
3123f15138
|
Fixes the incorrect argument in the prefix-prefill test cases (#3246)
|
2024-03-15 20:58:10 -07:00 |
|
Zhuohan Li
|
2f8844ba08
|
Re-enable the 80 char line width limit (#3305)
|
2024-03-10 19:49:14 -07:00 |
|
Woosuk Kwon
|
2daf23ab0c
|
Separate attention backends (#3005)
|
2024-03-07 01:45:50 -08:00 |
|
Tao He
|
71bcaf99e2
|
Enable GQA support in the prefix prefill kernels (#3007)
Signed-off-by: Tao He <sighingnow@gmail.com>
|
2024-02-27 01:14:31 -08:00 |
|
Kunshang Ji
|
96b6f475dd
|
Remove hardcoded device="cuda" to support more devices (#2503)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2024-02-01 15:46:39 -08:00 |
|
Jason Zhu
|
7a0b011dd5
|
Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py (#2553)
|
2024-01-22 14:47:25 -08:00 |
|
shiyi.c_98
|
d10f8e1d43
|
[Experimental] Prefix Caching Support (#1669)
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-01-17 16:32:10 -08:00 |
|