20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Rasmus Larsen	ea8489fce2	ROCm: Allow setting compilation target (#2581)	2024-01-29 10:52:31 -08:00
zhaoyang-star	9090bf02e7	Support FP8-E5M2 KV Cache (#2279) Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-01-28 16:43:54 -08:00
Hanzhi Zhou	380170038e	Implement custom all reduce kernels (#2192)	2024-01-27 12:46:35 -08:00
Philipp Moritz	390b495ff3	Don't build punica kernels by default (#2605)	2024-01-26 15:19:19 -08:00
Hongxia Yang	6b7de1a030	[ROCm] add support to ROCm 6.0 and MI300 (#2274)	2024-01-26 12:41:10 -08:00
Antoni Baum	9b945daaf1	[Experimental] Add multi-LoRA support (#1804) Co-authored-by: Chen Shen <scv119@gmail.com> Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-01-23 15:26:37 -08:00
Liangfu Chen	18473cf498	[Neuron] Add an option to build with neuron (#2065)	2024-01-18 10:58:50 -08:00
Simon Mo	6e01e8c1c8	[CI] Add Buildkite (#2355)	2024-01-14 12:37:58 -08:00
kliuae	1b7c791d60	[ROCm] Fixes for GPTQ on ROCm (#2180)	2023-12-18 10:41:04 -08:00
Woosuk Kwon	2acd76f346	[ROCm] Temporarily remove GPTQ ROCm support (#2138)	2023-12-15 17:13:58 -08:00
CHU Tianxiang	0fbfc4b81b	Add GPTQ support (#916)	2023-12-15 03:04:22 -08:00
TJian	6ccc0bfffb	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: Amir Balwel <amoooori04@gmail.com> Co-authored-by: root <kuanfu.liu@akirakan.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com> Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>	2023-12-07 23:16:52 -08:00
Daya Khudia	c8e7eb1eb3	fix typo in getenv call (#1972)	2023-12-07 16:04:41 -08:00
AguirreNicolas	24f60a54f4	[Docker] Adding number of nvcc_threads during build as envar (#1893)	2023-12-07 11:00:32 -08:00
Yanming W	e0c6f556e8	[Build] Avoid building too many extensions (#1624)	2023-11-23 16:31:19 -08:00
Simon Mo	5ffc0d13a2	Migrate linter from `pylint` to `ruff` (#1665)	2023-11-20 11:58:01 -08:00
Woosuk Kwon	fd58b73a40	Build CUDA11.8 wheels for release (#1596)	2023-11-09 03:52:29 -08:00
Stephen Krider	9cabcb7645	Add Dockerfile (#1350)	2023-10-31 12:36:47 -07:00
Jared Roesch	79a30912b8	Add py.typed so consumers of vLLM can get type checking (#1509); Update py.typed. Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2023-10-30 14:50:47 -07:00
chooper1	1f24755bf8	Support SqueezeLLM (#1326) Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2023-10-21 23:14:59 -07:00
Woosuk Kwon	d0740dff1b	Fix error message on `TORCH_CUDA_ARCH_LIST` (#1239) Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>	2023-10-14 14:47:43 -07:00
Antoni Baum	cf5cb1e33e	Allocate more shared memory to attention kernel (#1154)	2023-09-26 22:27:13 -07:00
Woosuk Kwon	a425bd9a9a	[Setup] Enable `TORCH_CUDA_ARCH_LIST` for selecting target GPUs (#1074)	2023-09-26 10:21:08 -07:00
Woosuk Kwon	e3e79e9e8a	Implement AWQ quantization support for LLaMA (#1032) Co-authored-by: Robert Irvine <robert@seamlessml.com> Co-authored-by: root <rirv938@gmail.com> Co-authored-by: Casper <casperbh.96@gmail.com> Co-authored-by: julian-q <julianhquevedo@gmail.com>	2023-09-16 00:03:37 -07:00
Woosuk Kwon	d6770d1f23	Update setup.py (#1006)	2023-09-10 23:42:45 -07:00
Woosuk Kwon	a41c20435e	Add compute capability 8.9 to default targets (#829)	2023-08-23 07:28:38 +09:00
Xudong Zhang	65fc1c3127	set default coompute capability according to cuda version (#773)	2023-08-21 16:05:44 -07:00
Cody Yu	2b7d3aca2e	Update setup.py (#282) Co-authored-by: neubig <neubig@gmail.com>	2023-06-27 14:34:23 -07:00
Woosuk Kwon	570fb2e9cc	[PyPI] Fix package info in setup.py (#158)	2023-06-19 18:05:01 -07:00
Woosuk Kwon	dcda03b4cb	Write README and front page of doc (#147)	2023-06-18 03:19:38 -07:00
Woosuk Kwon	0b98ba15c7	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
Woosuk Kwon	e38074b1e6	Support FP32 (#141)	2023-06-07 00:40:21 -07:00
Woosuk Kwon	376725ce74	[PyPI] Packaging for PyPI distribution (#140)	2023-06-05 20:03:14 -07:00
Woosuk Kwon	d721168449	Improve setup script & Add a guard for bfloat16 kernels (#130)	2023-05-27 00:59:32 -07:00
Woosuk Kwon	7addca5935	Specify python package dependencies in requirements.txt (#78)	2023-05-07 16:30:43 -07:00
Woosuk Kwon	e070829ae8	Support bfloat16 data type (#54)	2023-05-03 14:09:44 -07:00
Woosuk Kwon	436e523bf1	Refactor attention kernels (#53)	2023-05-03 13:40:13 -07:00
Woosuk Kwon	897cb2ae28	Optimize data movement (#20)	2023-04-02 00:30:17 -07:00
Woosuk Kwon	09e9245478	Add custom kernel for RMS normalization (#16)	2023-04-01 00:51:22 +08:00
Woosuk Kwon	88c0268a18	Implement custom kernel for LLaMA rotary embedding (#14)	2023-03-30 11:04:21 -07:00
Woosuk Kwon	0deacbce6e	Implement `single_query_cached_kv_attention` kernel (#3)	2023-03-01 15:02:19 -08:00
Woosuk Kwon	ffad4e1e03	cache_kernel -> cache_kernels	2023-02-16 20:05:45 +00:00
Woosuk Kwon	6f058c7ba8	Implement cache ops	2023-02-16 07:47:03 +00:00
Woosuk Kwon	3be29a1104	Add blank setup file	2023-02-09 11:37:06 +00:00

44 Commits