53 Commits

Author SHA1 Message Date
Woosuk Kwon
b81a6a6bb3
[Docs] Add supported quantization methods to docs (#2135) 2023-12-15 13:29:22 -08:00
Antoni Baum
21d93c140d
Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon
31d2ab4aff
Remove python 3.10 requirement (#2040) 2023-12-11 12:26:42 -08:00
Ram
2eaa81b236
Update README.md to add megablocks requirement for mixtral (#2033) 2023-12-11 11:37:34 -08:00
Pierre Stock
b5f882cc98
Mixtral 8x7B support (#2011)
Co-authored-by: Pierre Stock <p@mistral.ai>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-12-11 01:09:15 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
Woosuk Kwon
e5452ddfd6
Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Zhuohan Li
32c927b53f
[FIX] Update the doc link in README.md (#1730) 2023-11-20 12:46:24 -08:00
Zhuohan Li
415d109527
[Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
maximzubkov
521b35f799
Support Microsoft Phi 1.5 (#1664) 2023-11-16 14:28:39 -08:00
ldwang
6368e777a8
Add Aquila2 to README (#1331)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-12 12:11:16 -07:00
Zhuohan Li
9eed4d1f3e
Update README.md (#1292) 2023-10-08 23:15:50 -07:00
Woosuk Kwon
202351d5bf
Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Woosuk Kwon
8d926e91f1
Announce the First vLLM Meetup (#1148) 2023-09-22 11:37:14 -07:00
Zhuohan Li
c1026311b5
[Community] Add vLLM Discord server (#1086) 2023-09-18 12:23:35 -07:00
Woosuk Kwon
eda1a7cad3
Announce paper release (#1036) 2023-09-13 17:38:13 -07:00
Ikko Eltociear Ashimine
3272d7a0b7
Fix typo in README.md (#1033) 2023-09-13 12:55:23 -07:00
Zhuohan Li
c128d69856
Fix README.md Link (#927) 2023-08-31 17:18:34 -07:00
Zhuohan Li
0080d8329d
Add acknowledgement to a16z grant 2023-08-30 02:26:47 -07:00
ldwang
85ebcda94d
Fix typo of Aquila in README.md (#836) 2023-08-22 20:48:36 -07:00
Zhuohan Li
14f9c72bfd
Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Zhuohan Li
f7389f4763
[Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li
1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li
df5dd3c68e
Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li
6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Zhanghao Wu
f72297562f
Add news for the vllm+skypilot example (#314) 2023-06-29 12:32:37 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference (#261) 2023-06-26 11:34:23 -07:00
Lianmin Zheng
6214dd6ce9
Update README.md (#236) 2023-06-25 16:58:06 -07:00
Woosuk Kwon
665c48963b
[Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Zhuohan Li
033f5c78f5
Remove e.g. in README (#167) 2023-06-20 14:00:28 +08:00
Woosuk Kwon
794e578de0
[Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Zhuohan Li
fc72e39de3
Change image urls (#164) 2023-06-20 11:15:15 +08:00
Woosuk Kwon
b7e62d3454
Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Woosuk Kwon
364536acd1
[Docs] Minor fix (#162) 2023-06-19 19:58:23 -07:00
Zhuohan Li
0b32a987dd
Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Zhuohan Li
a255885f83
Add logo and polish readme (#156) 2023-06-19 16:31:13 +08:00
Woosuk Kwon
dcda03b4cb
Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon
c3442c1f6f
Refactor system architecture (#109) 2023-05-20 13:06:59 -07:00
Woosuk Kwon
7addca5935
Specify python package dependencies in requirements.txt (#78) 2023-05-07 16:30:43 -07:00
Woosuk Kwon
c9d5b6d4a8
Replace FlashAttention with xformers (#70) 2023-05-05 02:01:08 -07:00
Woosuk Kwon
2c5cd0defe
Add ninja to dependency (#21) 2023-04-01 19:00:20 -07:00
Zhuohan Li
e3f00d191e
Modify README to include info on loading LLaMA (#18) 2023-04-01 01:07:57 +08:00
Woosuk Kwon
80a2f812f1
Implement LLaMA (#9)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-03-30 12:25:32 +08:00
Zhuohan Li
721fa3df15
FastAPI-based working frontend (#10) 2023-03-29 14:48:56 +08:00
Zhuohan Li
2f49f15585
Support tensor parallel (#2) 2023-03-21 13:45:42 -07:00
Woosuk Kwon
e9d3f2ff77
Add memory analyzer & automatically configure KV cache size (#6) 2023-03-11 23:23:14 -08:00