Pavani Majety
|
efcf946a15
|
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)
|
2024-09-11 00:38:40 -04:00 |
|
Dipika Sikka
|
23f322297f
|
[Misc] Remove SqueezeLLM (#8220)
|
2024-09-06 16:29:03 -06:00 |
|
Gurpreet Singh Dhami
|
b5f49ee55b
|
Update README.md (#6847)
|
2024-07-27 00:26:45 +00:00 |
|
Cyrus Leung
|
0e9164b40a
|
[mypy] Enable type checking for test directory (#5017)
|
2024-06-15 04:45:31 +00:00 |
|
Allen.Dou
|
d74674bbd9
|
[Misc] Fix arg names (#5524)
|
2024-06-14 09:47:44 -07:00 |
|
Allen.Dou
|
55d6361b13
|
[Misc] Fix arg names in quantizer script (#5507)
|
2024-06-13 19:02:53 -07:00 |
|
Antoni Baum
|
69e1d2fb69
|
[Core] Refactor model loading code (#4097)
|
2024-04-16 11:34:39 -07:00 |
|
Cade Daniel
|
e0dd4d3589
|
[Misc] Fix linter issues in examples/fp8/quantizer/quantize.py (#3864)
|
2024-04-04 21:57:33 -07:00 |
|
Adrian Abeyta
|
2ff767b513
|
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-04-03 14:15:55 -07:00 |
|