vllm/kernels at cea95dfb941878b3370a7c40ca7ab2d549524445 - vllm - Luminance Code Repo

20231088/vllm

Luka Govedič 7937009a7e

[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)

Co-authored-by: Michael Goin <michael@neuralmagic.com>

2024-08-21 20:18:00 -04:00

benchmark_aqlm.py

[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)

2024-06-20 17:00:13 -06:00

benchmark_layernorm.py

[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)

2024-08-21 20:18:00 -04:00

benchmark_machete.py

[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)

2024-08-20 07:09:33 -06:00

benchmark_marlin.py

[Misc] Disambiguate quantized types via a new ScalarType (#6396)

2024-08-02 13:51:58 -07:00

benchmark_moe.py

[Kernel] W8A16 Int8 inside FusedMoE (#7415)

2024-08-16 10:06:51 -07:00

benchmark_paged_attention.py

[Model] H2O Danube3-4b (#6451)

2024-07-26 20:47:50 -07:00

benchmark_quant.py

[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)

2024-08-21 20:18:00 -04:00

benchmark_rope.py

[Model] H2O Danube3-4b (#6451)

2024-07-26 20:47:50 -07:00

benchmark_shapes.py

Add marlin unit tests and marlin benchmark script (#4815)

2024-05-16 09:36:49 -04:00

graph_machete_bench.py

[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)

2024-08-20 07:09:33 -06:00

weight_shapes.py

[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)

2024-08-20 07:09:33 -06:00