Dipika Sikka | 23f322297f | [Misc] Remove SqueezeLLM (#8220) | 2024-09-06 16:29:03 -06:00 |
Stas Bekman | 98c12cffe5 | [Doc] fix the autoAWQ example (#7937) | 2024-08-28 12:12:32 +00:00 |
Michael Goin | d4f0f17b02 | [Doc] Update quantization supported hardware table (#7595) | 2024-08-16 13:59:27 -07:00 |
Michael Goin | b3f4e17935 | [Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) | 2024-08-16 13:59:16 -07:00 |
Woosuk Kwon | e20233d361 | Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467) | 2024-08-13 01:37:08 -07:00 |
jon-chuang | a046f86397 | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208); Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-08-12 22:47:41 +00:00 |
Michael Goin | 6d94420246 | [Doc] Update supported_hardware.rst (#7276) | 2024-08-07 14:21:50 -07:00 |
dongmao zhang | 87525fab92 | [bitsandbytes]: support read bnb pre-quantized model (#5753); Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-07-23 23:45:09 +00:00 |
Michael Goin | 47f0954af0 | [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) | 2024-07-03 17:38:00 +00:00 |
Michael Goin | 5b15bde539 | [Doc] Documentation on supported hardware for quantization methods (#5745) | 2024-06-21 12:44:29 -04:00 |
SangBin Cho | 246598a6b1 | [CI] docfix (#5410); Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>; Co-authored-by: ywang96 <ywang@roblox.com> | 2024-06-11 01:28:50 -07:00 |
Michael Goin | 77c87beb06 | [Doc] Add documentation for FP8 W8A8 (#5388) | 2024-06-10 18:55:12 -06:00 |
Adrian Abeyta | 2ff767b513 | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290); Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>; Co-authored-by: HaiShaw <hixiao@gmail.com>; Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>; Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>; Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>; Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>; Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>; Co-authored-by: guofangze <guofangze@kuaishou.com>; Co-authored-by: Michael Goin <mgoin64@gmail.com>; Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>; Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-04-03 14:15:55 -07:00 |
Simon Mo | f964493274 | [CI] Ensure documentation build is checked in CI (#2842) | 2024-02-12 22:53:07 -08:00 |
zhaoyang-star | 9090bf02e7 | Support FP8-E5M2 KV Cache (#2279); Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-28 16:43:54 -08:00 |
Simon Mo | 4cefa9b49b | [Docs] Update the AWQ documentation to highlight performance issue (#1883) | 2023-12-02 15:52:47 -08:00 |
Casper | 8516999495 | Add Quantization and AutoAWQ to docs (#1235) | 2023-11-04 22:43:39 -07:00 |