| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Michael Goin | 22f8a69549 | [Misc] Directly use compressed-tensors for checkpoint definitions (#8909) — Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-10-15 15:40:25 -07:00 |
| Luka Govedič | 172d1cd276 | [Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271) | 2024-09-27 14:25:10 -04:00 |
| Li, Jiang | 0b952af458 | [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) | 2024-09-11 09:46:46 -07:00 |
| Dipika Sikka | fc911880cc | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766) — Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-27 15:07:09 -07:00 |
| Kyle Sayers | f55a9aea45 | [Misc] Revert compressed-tensors code reuse (#7521) | 2024-08-14 15:07:37 -07:00 |
| Kyle Sayers | 373538f973 | [Misc] compressed-tensors code reuse (#7277) | 2024-08-13 19:05:15 -04:00 |
| Dipika Sikka | 0f7052bc7e | [Misc] Refactor linear layer weight loading; introduce BasevLLMParameter and weight_loader_v2 (#5874) | 2024-08-07 09:17:58 -07:00 |
| Michael Goin | fb3db61688 | [CI/Build] Remove sparseml requirement from testing (#7037) | 2024-08-01 12:00:51 -07:00 |
| Michael Goin | 9e0b558a09 | [Misc] Support FP8 kv cache scales from compressed-tensors (#6528) | 2024-07-23 04:11:50 +00:00 |
| Robert Shaw | b675069d74 | [ Misc ] Refactor Marlin Python Utilities (#6082) — Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-07-11 15:40:11 +00:00 |
| Robert Shaw | abfe705a02 | [ Misc ] Support Fp8 via llm-compressor (#6110) — Co-authored-by: Robert Shaw <rshaw@neuralmagic> | 2024-07-07 20:42:11 +00:00 |
| Robert Shaw | 62963d129e | [ Misc ] Clean Up CompressedTensorsW8A8 (#6113) | 2024-07-03 22:50:08 +00:00 |
| Robert Shaw | af9ad46fca | [ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify Weight Loading) (#5940) — Co-authored-by: Robert Shaw <rshaw@neuralmagic> | 2024-06-30 23:06:27 +00:00 |
| Dipika Sikka | dd248f7675 | [Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) | 2024-06-25 19:23:35 +00:00 |
| Dipika Sikka | 4a30d7e3cc | [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (#5650) | 2024-06-19 18:06:44 -04:00 |
| Dipika Sikka | 95db455e7f | [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (#5542) | 2024-06-18 12:45:05 -04:00 |
| Dipika Sikka | 890d8d960b | [Kernel] compressed-tensors marlin 24 support (#5435) | 2024-06-17 12:32:48 -04:00 |
| Dipika Sikka | c2637a613b | [Kernel] w4a16 support for compressed-tensors (#5385) — Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-06-13 10:19:56 -04:00 |
| Dipika Sikka | 5884c2b454 | [Misc] Update to comply with the new compressed-tensors config (#5350) — Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-06-10 03:49:46 +00:00 |
| youkaichao | 8ea5e44a43 | [CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357) | 2024-06-08 08:59:20 +00:00 |
| Dipika Sikka | ca3ea51bde | [Kernel] Dynamic Per-Token Activation Quantization (#5037) — Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>; Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-06-07 09:36:26 -07:00 |
| Dipika Sikka | a1242324c9 | [Kernel] Initial Activation Quantization Support (#4525) — Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>; Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-05-23 21:29:18 +00:00 |