90 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Philipp Moritz | 4ca2c358b1 | Add documentation section about LoRA (#2834) | 2024-02-12 17:24:45 +01:00 |
| Hongxia Yang | 0580aab02f | [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) | 2024-02-10 23:14:37 -08:00 |
| Philipp Moritz | 931746bc6d | Add documentation on how to do incremental builds (#2796) | 2024-02-07 14:42:02 -08:00 |
| Massimiliano Pronesti | 5ed704ec8c | docs: fix langchain (#2736) | 2024-02-03 18:17:55 -08:00 |
| Fengzhe Zhou | cd9e60c76c | Add Internlm2 (#2666) | 2024-02-01 09:27:40 -08:00 |
| Zhuohan Li | 1af090b57d | Bump up version to v0.3.0 (#2656) | 2024-01-31 00:07:07 -08:00 |
| zhaoyang-star | 9090bf02e7 | Support FP8-E5M2 KV Cache (#2279) (Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>, Zhuohan Li <zhuohan123@gmail.com>) | 2024-01-28 16:43:54 -08:00 |
| Hongxia Yang | 6b7de1a030 | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| Junyang Lin | 2832e7b9f9 | fix names and license for Qwen2 (#2589) | 2024-01-24 22:37:51 -08:00 |
| LastWhisper | 223c19224b | Fix the syntax error in the doc of supported_models (#2584) | 2024-01-24 11:22:51 -08:00 |
| Erfan Al-Hossami | 9c1352eb57 | [Feature] Simple API token authentication and pluggable middlewares (#1106) | 2024-01-23 15:13:00 -08:00 |
| Junyang Lin | 94b5edeb53 | Add qwen2 (#2495) | 2024-01-22 14:34:21 -08:00 |
| Hyunsung Lee | e1957c6ebd | Add StableLM3B model (#2372) | 2024-01-16 20:32:40 -08:00 |
| Simon | 827cbcd37c | Update quickstart.rst (#2369) | 2024-01-12 12:56:18 -08:00 |
| Zhuohan Li | f745847ef7 | [Minor] Fix the format in quick start guide related to Model Scope (#2425) | 2024-01-11 19:44:01 -08:00 |
| Jiaxiang | 6549aef245 | [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) | 2024-01-11 19:26:49 -08:00 |
| Zhuohan Li | fd4ea8ef5c | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| Shivam Thakkar | 1db83e31a2 | [Docs] Update installation instructions to include CUDA 11.8 xFormers (#2246) | 2023-12-22 23:20:02 -08:00 |
| Ronen Schaffer | c17daa9f89 | [Docs] Fix broken links (#2222) | 2023-12-20 12:43:42 -08:00 |
| avideci | de60a3fb93 | Added DeciLM-7b and DeciLM-7b-instruct (#2062) | 2023-12-19 02:29:33 -08:00 |
| kliuae | 1b7c791d60 | [ROCm] Fixes for GPTQ on ROCm (#2180) | 2023-12-18 10:41:04 -08:00 |
| Suhong Moon | 3ec8c25cd0 | [Docs] Update documentation for gpu-memory-utilization option (#2162) | 2023-12-17 10:51:57 -08:00 |
| Woosuk Kwon | f8c688d746 | [Minor] Add Phi 2 to supported models (#2159) | 2023-12-17 02:54:57 -08:00 |
| Woosuk Kwon | 26c52a5ea6 | [Docs] Add CUDA graph support to docs (#2148) | 2023-12-17 01:49:20 -08:00 |
| Woosuk Kwon | b81a6a6bb3 | [Docs] Add supported quantization methods to docs (#2135) | 2023-12-15 13:29:22 -08:00 |
| Antoni Baum | 21d93c140d | Optimize Mixtral with expert parallelism (#2090) | 2023-12-13 23:55:07 -08:00 |
| Woosuk Kwon | 096827c284 | [Docs] Add notes on ROCm-supported models (#2087) | 2023-12-13 09:45:34 -08:00 |
| Woosuk Kwon | 6565d9e33e | Update installation instruction for vLLM + CUDA 11.8 (#2086) | 2023-12-13 09:25:59 -08:00 |
| TJian | f375ec8440 | [ROCm] Upgrade xformers version for ROCm & update doc (#2079) (Co-authored-by: miloice <jeffaw99@hotmail.com>) | 2023-12-13 00:56:05 -08:00 |
| Ikko Eltociear Ashimine | c0ce15dfb2 | Update run_on_sky.rst (#2025): sharable -> shareable | 2023-12-11 10:32:58 -08:00 |
| Woosuk Kwon | 4ff0203987 | Minor fixes for Mixtral (#2015) | 2023-12-11 09:16:15 -08:00 |
| Simon Mo | c85b80c2b6 | [Docker] Add cuda arch list as build option (#1950) | 2023-12-08 09:53:47 -08:00 |
| TJian | 6ccc0bfffb | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) (Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>, Amir Balwel <amoooori04@gmail.com>, root <kuanfu.liu@akirakan.com>, tjtanaa <tunjian.tan@embeddedllm.com>, kuanfu <kuanfu.liu@embeddedllm.com>, miloice <17350011+kliuae@users.noreply.github.com>) | 2023-12-07 23:16:52 -08:00 |
| AguirreNicolas | 24f60a54f4 | [Docker] Adding number of nvcc_threads during build as envar (#1893) | 2023-12-07 11:00:32 -08:00 |
| gottlike | 42c02f5892 | Fix quickstart.rst typo jinja (#1964) | 2023-12-07 08:34:44 -08:00 |
| Peter Götz | d940ce497e | Fix typo in adding_model.rst (#1947): adpated -> adapted | 2023-12-06 10:04:26 -08:00 |
| Massimiliano Pronesti | c07a442854 | chore(examples-docs): upgrade to OpenAI V1 (#1785) | 2023-12-03 01:11:22 -08:00 |
| Simon Mo | 5313c2cb8b | Add Production Metrics in Prometheus format (#1890) | 2023-12-02 16:37:44 -08:00 |
| Simon Mo | 4cefa9b49b | [Docs] Update the AWQ documentation to highlight performance issue (#1883) | 2023-12-02 15:52:47 -08:00 |
| Woosuk Kwon | e5452ddfd6 | Normalize head weights for Baichuan 2 (#1876) | 2023-11-30 20:03:58 -08:00 |
| Adam Brusselback | 66785cc05c | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| Massimiliano Pronesti | 05a38612b0 | docs: add instruction for langchain (#1162) | 2023-11-30 10:57:44 -08:00 |
| Simon Mo | 0f621c2c7d | [Docs] Add information about using shared memory in docker (#1845) | 2023-11-29 18:33:56 -08:00 |
| Casper | a921d8be9d | [DOCS] Add engine args documentation (#1741) | 2023-11-22 12:31:27 -08:00 |
| Wen Sun | 112627e8b2 | [Docs] Fix the code block's format in deploying_with_docker page (#1722) | 2023-11-20 01:22:39 -08:00 |
| Simon Mo | 37c1e3c218 | Documentation about official docker image (#1709) | 2023-11-19 20:56:26 -08:00 |
| Woosuk Kwon | 06e9ebebd5 | Add instructions to install vLLM+cu118 (#1717) | 2023-11-18 23:48:58 -08:00 |
| liuyhwangyh | edb305584b | Support download models from www.modelscope.cn (#1588) | 2023-11-17 20:38:31 -08:00 |
| Zhuohan Li | 0fc280b06c | Update the adding-model doc according to the new refactor (#1692) | 2023-11-16 18:46:26 -08:00 |
| Zhuohan Li | 415d109527 | [Fix] Update Supported Models List (#1690) | 2023-11-16 14:47:26 -08:00 |