20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Dipika Sikka	23f322297f	[Misc] Remove `SqueezeLLM` (#8220)	2024-09-06 16:29:03 -06:00
Stas Bekman	98c12cffe5	[Doc] fix the autoAWQ example (#7937)	2024-08-28 12:12:32 +00:00
Michael Goin	d4f0f17b02	[Doc] Update quantization supported hardware table (#7595)	2024-08-16 13:59:27 -07:00
Michael Goin	b3f4e17935	[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444)	2024-08-16 13:59:16 -07:00
Woosuk Kwon	e20233d361	Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467)	2024-08-13 01:37:08 -07:00
jon-chuang	a046f86397	[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-08-12 22:47:41 +00:00
Michael Goin	6d94420246	[Doc] Update supported_hardware.rst (#7276)	2024-08-07 14:21:50 -07:00
dongmao zhang	87525fab92	[bitsandbytes]: support read bnb pre-quantized model (#5753) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-23 23:45:09 +00:00
Michael Goin	47f0954af0	[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975)	2024-07-03 17:38:00 +00:00
Michael Goin	5b15bde539	[Doc] Documentation on supported hardware for quantization methods (#5745)	2024-06-21 12:44:29 -04:00
SangBin Cho	246598a6b1	[CI] docfix (#5410) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-11 01:28:50 -07:00
Michael Goin	77c87beb06	[Doc] Add documentation for FP8 W8A8 (#5388)	2024-06-10 18:55:12 -06:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
Simon Mo	f964493274	[CI] Ensure documentation build is checked in CI (#2842)	2024-02-12 22:53:07 -08:00
zhaoyang-star	9090bf02e7	Support FP8-E5M2 KV Cache (#2279) Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-01-28 16:43:54 -08:00
Simon Mo	4cefa9b49b	[Docs] Update the AWQ documentation to highlight performance issue (#1883)	2023-12-02 15:52:47 -08:00
Casper	8516999495	Add Quantization and AutoAWQ to docs (#1235)	2023-11-04 22:43:39 -07:00

17 Commits