SnowDist | a22dea54d3 | [Model] Support MAP-NEO model (#5081); Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-05-30 19:24:41 -07:00
Eric Xihui Lin | 8e192ff967 | [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799); Co-authored-by: beagleski <yunanzhang@microsoft.com>; Co-authored-by: bapatra <bapatra@microsoft.com>; Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>; Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-05-24 22:00:52 -07:00
Michael Goin | 5f6d10c14c | [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) | 2024-05-22 07:18:41 +00:00
Steve Grubb | dac6a3f6ed | [Misc] Apply a couple g++ cleanups (#4719) | 2024-05-10 13:37:05 +00:00
youkaichao | 20cfcdec99 | [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) | 2024-05-08 12:07:05 -07:00
youkaichao | 63575bc2e1 | [Core][Optimization] change python dict to pytorch tensor (#4607) | 2024-05-06 21:30:27 -07:00
SangBin Cho | 3521ba4f25 | [Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) | 2024-05-03 10:20:12 -07:00
Woosuk Kwon | 498eb5cfa3 | [Bugfix] Add kv_scale input parameter to CPU backend (#3840) | 2024-04-04 04:33:08 +00:00
bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634); Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>; Co-authored-by: Yuan Zhou <yuan.zhou@intel.com> | 2024-04-01 22:07:30 -07:00