20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
Mor Zusman	7fc23be81c	[Kernel] W8A16 Int8 inside FusedMoE (#7415)	2024-08-16 10:06:51 -07:00
Michael Goin	8065a7e220	[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)	2024-06-20 17:00:13 -06:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017)	2024-06-15 04:45:31 +00:00
Philipp Moritz	51a08e7d8f	[Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (#5238)	2024-06-05 10:59:14 -07:00
Woosuk Kwon	27208be66e	[Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242)	2024-06-04 09:58:47 -07:00
Woosuk Kwon	3a434b07ed	[Kernel] Enhance MoE benchmarking & tuning script (#4921)	2024-06-03 20:06:59 -07:00