Luka Govedič
|
5d73ae49d6
|
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)
|
2024-09-16 11:52:40 -07:00 |
|
bnellnm
|
73202dbe77
|
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2024-09-11 12:52:19 -07:00 |
|
Li, Jiang
|
0b952af458
|
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)
|
2024-09-11 09:46:46 -07:00 |
|
Lucas Wilkinson
|
a8d604ca2a
|
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
|
2024-08-02 13:51:58 -07:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Michael Goin
|
978aed5300
|
[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)
|
2024-07-16 15:31:32 -07:00 |
|
Chip Kerchner
|
38a1674abb
|
Support CPU inference with VSX PowerPC ISA (#5652)
|
2024-06-26 21:53:04 +00:00 |
|
Roger Wang
|
bd620b01fb
|
[Kernel][CPU] Add Quick gelu to CPU (#5717)
|
2024-06-21 06:39:40 +00:00 |
|
Jie Fu (傅杰)
|
cd9c0d65d9
|
[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452)
|
2024-06-13 17:22:24 -06:00 |
|
bnellnm
|
5467ac3196
|
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
|
2024-06-09 16:23:30 -04:00 |
|
Jie Fu (傅杰)
|
6840a71610
|
[Misc] Remove unused cuda_utils.h in CPU backend (#5345)
|
2024-06-07 14:09:13 -07:00 |
|
Yuan
|
cafb8e06c5
|
[CI/BUILD] enable intel queue for longer CPU tests (#4113)
|
2024-06-03 10:39:50 -07:00 |
|
SnowDist
|
a22dea54d3
|
[Model] Support MAP-NEO model (#5081)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-05-30 19:24:41 -07:00 |
|
Eric Xihui Lin
|
8e192ff967
|
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-05-24 22:00:52 -07:00 |
|
Michael Goin
|
5f6d10c14c
|
[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)
|
2024-05-22 07:18:41 +00:00 |
|
Steve Grubb
|
dac6a3f6ed
|
[Misc] Apply a couple g++ cleanups (#4719)
|
2024-05-10 13:37:05 +00:00 |
|
youkaichao
|
20cfcdec99
|
[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)
|
2024-05-08 12:07:05 -07:00 |
|
youkaichao
|
63575bc2e1
|
[Core][Optimization] change python dict to pytorch tensor (#4607)
|
2024-05-06 21:30:27 -07:00 |
|
SangBin Cho
|
3521ba4f25
|
[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)
|
2024-05-03 10:20:12 -07:00 |
|
Woosuk Kwon
|
498eb5cfa3
|
[Bugfix] Add kv_scale input parameter to CPU backend (#3840)
|
2024-04-04 04:33:08 +00:00 |
|
bigPYJ1151
|
0e3f06fe9c
|
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-04-01 22:07:30 -07:00 |
|