20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Li, Jiang	550b2801ad	[CPU][Bugfix] Using custom allreduce for CPU backend (#15934) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-04-02 07:46:47 -07:00
Thien Tran	4f044b1d67	[Kernel][CPU] CPU MLA (#14744) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-25 09:34:59 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Lu Fang	4068f4b5b5	[MISC] Replace c10::optional with std::optional (#11730) Signed-off-by: Lu Fang <lufang@fb.com>	2025-01-05 10:20:34 +09:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344)	2024-10-17 12:21:04 -04:00
Luka Govedič	5d73ae49d6	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)	2024-09-16 11:52:40 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Li, Jiang	0b952af458	[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)	2024-09-11 09:46:46 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
Li, Jiang	3bbb4936dc	[Hardware][Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)	2024-07-26 13:50:10 -07:00
Michael Goin	978aed5300	[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale` (#6081)	2024-07-16 15:31:32 -07:00
Roger Wang	bd620b01fb	[Kernel][CPU] Add Quick `gelu` to CPU (#5717)	2024-06-21 06:39:40 +00:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)	2024-06-09 16:23:30 -04:00

13 Commits