vllm/compile at c0c25e25fa93ee7c3f279abbba5597c0fafa74ee - vllm - Luminance Code Repo

20231088/vllm

Luka Govedič e1744502c2

[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)

Signed-off-by: luka <luka@neuralmagic.com>

2025-03-07 05:20:16 +00:00

..

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

__init__.py

[torch.compile] register allreduce operations as custom ops (#8526)

2024-09-16 22:57:57 -07:00

backend.py

[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)

2025-02-28 16:20:11 -07:00

test_basic_correctness.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

test_full_graph.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

test_functionalization.py

[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)

2025-02-28 16:20:11 -07:00

test_fusion.py

[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)

2025-03-07 05:20:16 +00:00

test_pass_manager.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

test_wrapper.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

utils.py

Consolidate Llama model usage in tests (#13094)

2025-02-13 22:18:03 -08:00