vllm/tests/weight_loading/models.txt

gptq_marlin, robertgshaw2/zephyr-7b-beta-channelwise-gptq, main
gptq_marlin, TheBloke/Llama-2-7B-GPTQ, main
gptq_marlin, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, main
gptq_marlin, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, gptq-8bit--1g-actorder_True
gptq_marlin, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, gptq-8bit-32g-actorder_True
gptq_marlin, TechxGenus/gemma-1.1-2b-it-GPTQ, main
gptq, robertgshaw2/zephyr-7b-beta-channelwise-gptq, main
gptq, TheBloke/Llama-2-7B-GPTQ, main
gptq, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, main
gptq, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, gptq-8bit--1g-actorder_True
gptq, TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ, gptq-8bit-32g-actorder_True
gptq, TechxGenus/gemma-1.1-2b-it-GPTQ, main
compressed-tensors, nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change, main
compressed-tensors, nm-testing/tinyllama-oneshot-w8-channel-a8-tensor, main
compressed-tensors, nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2, main
compressed-tensors, nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2, main
compressed-tensors, nm-testing/tinyllama-oneshot-w4a16-group128-v2, main
compressed-tensors, nm-testing/tinyllama-oneshot-w8a16-per-channel, main
compressed-tensors, nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test, main
compressed-tensors, nm-testing/Phi-3-mini-128k-instruct-FP8, main
compressed-tensors, neuralmagic/Phi-3-medium-128k-instruct-quantized.w4a16, main
compressed-tensors, nm-testing/TinyLlama-1.1B-Chat-v1.0-actorder-group, main
#compressed-tensors, mgoin/DeepSeek-Coder-V2-Lite-Instruct-FP8, main
compressed-tensors, nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-FP8-Dynamic-testing, main, 90
compressed-tensors, nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-W8A8-testing, main, 90
awq, casperhansen/mixtral-instruct-awq, main
awq_marlin, casperhansen/mixtral-instruct-awq, main
fp8, neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV, main
marlin, nm-testing/zephyr-beta-7b-marlin-g128, main
marlin, robertgshaw2/zephyr-7b-beta-channelwise-marlin, main
qqq, HandH1998/QQQ-Llama-3-8b-g128, main
qqq, HandH1998/QQQ-Llama-3-8b, main
hqq, nm-testing/Llama-3.2-1B-Instruct-HQQ, main
None, mgleize/fairseq2-dummy-Llama-3.2-1B, main
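
Each entry above is one comma-separated line: a quantization method, a Hugging Face model ID, a revision, and on some lines a fourth numeric field. The sketch below shows one way such a file could be read. It is not the test harness's actual loader. `ModelEntry` and `parse_models` are hypothetical names. Three things are assumptions made for illustration: a leading `#` disables an entry, the literal `None` means "no quantization", and the fourth field (`90` above) is a minimum GPU compute capability.

```python
from typing import List, NamedTuple, Optional


class ModelEntry(NamedTuple):
    """One line of models.txt (hypothetical container, not part of vLLM)."""
    quantization: Optional[str]    # e.g. "gptq_marlin"; None if the field is "None"
    model: str                     # Hugging Face model ID
    revision: str                  # branch or tag, e.g. "main"
    min_capability: Optional[int]  # ASSUMPTION: optional 4th field, e.g. 90


def parse_models(text: str) -> List[ModelEntry]:
    """Parse the comma-separated entries, skipping blank and '#' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # ASSUMPTION: a leading '#' marks a disabled entry.
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
        quantization = None if fields[0] == "None" else fields[0]
        min_capability = int(fields[3]) if len(fields) == 4 else None
        entries.append(
            ModelEntry(quantization, fields[1], fields[2], min_capability))
    return entries
```

A caller could then filter entries, for example keeping only those whose `min_capability` is `None` or no greater than the capability of the local device.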