Sherlock Xu
b0925b3878
docs: Add BentoML deployment doc ( #3336 )
...
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
2024-03-12 10:34:30 -07:00
Zhuohan Li
4c922709b6
Add distributed model executor abstraction ( #3191 )
2024-03-11 11:03:45 -07:00
Philipp Moritz
657061fdce
[docs] Add LoRA support information for models ( #3299 )
2024-03-11 00:54:51 -07:00
Roger Wang
99c3cfb83c
[Docs] Fix Unmocked Imports ( #3275 )
2024-03-08 09:58:01 -08:00
Jialun Lyu
27a7b070db
Add document for vllm paged attention kernel. ( #2978 )
2024-03-04 09:23:34 -08:00
Liangfu Chen
d0fae88114
[DOC] add setup document to support neuron backend ( #2777 )
2024-03-04 01:03:51 +00:00
Sage Moore
ce4f5a29fb
Add Automatic Prefix Caching ( #2762 )
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Yuan Tang
49d849b3ab
docs: Add tutorial on deploying vLLM model with KServe ( #2586 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-03-01 11:04:14 -08:00
Ganesh Jagadeesan
a8683102cc
multi-lora documentation fix ( #3064 )
2024-02-27 21:26:15 -08:00
Woosuk Kwon
8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM ( #3046 )
2024-02-26 20:23:50 -08:00
张大成
48a8f4a7fd
Support Orion model ( #2539 )
...
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Harry Mellor
ef978fe411
Port metrics from aioprometheus to prometheus_client ( #2730 )
2024-02-25 11:54:00 -08:00
Zhuohan Li
a9c8212895
[FIX] Add Gemma model to the doc ( #2966 )
2024-02-21 09:46:15 -08:00
Isotr0py
ab3a5a8259
Support OLMo models. ( #2832 )
2024-02-18 21:05:15 -08:00
jvmncs
8f36444c4f
multi-LoRA as extra models in OpenAI server ( #2775 )
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server lists three separate models when the user queries `/v1/models`: one for the base served model, and one for each of the specified LoRA modules. In this case, sql-lora and sql-lora2 point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in EngineArgs.
No work has been done here to scope client permissions to specific models.
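As a rough illustration of the naming behaviour described above, the sketch below shows how `name=path` pairs such as those passed to `--lora-modules` could be turned into the list of served model names. The helpers `parse_lora_modules` and `served_model_names` are hypothetical and are not vLLM's actual implementation; the paths are placeholders.

```python
from typing import Dict, List


def parse_lora_modules(pairs: List[str]) -> Dict[str, str]:
    """Hypothetical helper: turn ["name=path", ...] into {name: path}."""
    modules: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so paths containing "=" survive.
        name, sep, path = pair.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {pair!r}")
        modules[name] = path
    return modules


def served_model_names(base_model: str, pairs: List[str]) -> List[str]:
    """Hypothetical helper: the base model first, then one entry per LoRA name."""
    return [base_model, *parse_lora_modules(pairs)]


if __name__ == "__main__":
    # Two names pointing at the same (placeholder) LoRA path, as in the command above.
    names = served_model_names(
        "meta-llama/Llama-2-7b-hf",
        ["sql-lora=/path/to/lora", "sql-lora2=/path/to/lora"],
    )
    print(names)
```

Two names may map to the same path, which matches the note that sql-lora and sql-lora2 share one underlying LoRA.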
2024-02-17 12:00:48 -08:00
Philipp Moritz
317b29de0f
Remove Yi model definition, please use LlamaForCausalLM instead ( #2854 )
...
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Simon Mo
f964493274
[CI] Ensure documentation build is checked in CI ( #2842 )
2024-02-12 22:53:07 -08:00
Philipp Moritz
4ca2c358b1
Add documentation section about LoRA ( #2834 )
2024-02-12 17:24:45 +01:00
Hongxia Yang
0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention ( #2768 )
2024-02-10 23:14:37 -08:00
Philipp Moritz
931746bc6d
Add documentation on how to do incremental builds ( #2796 )
2024-02-07 14:42:02 -08:00
Massimiliano Pronesti
5ed704ec8c
docs: fix langchain ( #2736 )
2024-02-03 18:17:55 -08:00
Fengzhe Zhou
cd9e60c76c
Add Internlm2 ( #2666 )
2024-02-01 09:27:40 -08:00
Zhuohan Li
1af090b57d
Bump up version to v0.3.0 ( #2656 )
2024-01-31 00:07:07 -08:00
zhaoyang-star
9090bf02e7
Support FP8-E5M2 KV Cache ( #2279 )
...
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Hongxia Yang
6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 ( #2274 )
2024-01-26 12:41:10 -08:00
Junyang Lin
2832e7b9f9
fix names and license for Qwen2 ( #2589 )
2024-01-24 22:37:51 -08:00
LastWhisper
223c19224b
Fix the syntax error in the doc of supported_models ( #2584 )
2024-01-24 11:22:51 -08:00
Erfan Al-Hossami
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares ( #1106 )
2024-01-23 15:13:00 -08:00
Junyang Lin
94b5edeb53
Add qwen2 ( #2495 )
2024-01-22 14:34:21 -08:00
Hyunsung Lee
e1957c6ebd
Add StableLM3B model ( #2372 )
2024-01-16 20:32:40 -08:00
Simon
827cbcd37c
Update quickstart.rst ( #2369 )
2024-01-12 12:56:18 -08:00
Zhuohan Li
f745847ef7
[Minor] Fix the format in quick start guide related to Model Scope ( #2425 )
2024-01-11 19:44:01 -08:00
Jiaxiang
6549aef245
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine ( #1011 )
2024-01-11 19:26:49 -08:00
Zhuohan Li
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead ( #2221 )
2024-01-03 11:30:22 -08:00
Shivam Thakkar
1db83e31a2
[Docs] Update installation instructions to include CUDA 11.8 xFormers ( #2246 )
2023-12-22 23:20:02 -08:00
Ronen Schaffer
c17daa9f89
[Docs] Fix broken links ( #2222 )
2023-12-20 12:43:42 -08:00
avideci
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct ( #2062 )
2023-12-19 02:29:33 -08:00
kliuae
1b7c791d60
[ROCm] Fixes for GPTQ on ROCm ( #2180 )
2023-12-18 10:41:04 -08:00
Suhong Moon
3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option ( #2162 )
2023-12-17 10:51:57 -08:00
Woosuk Kwon
f8c688d746
[Minor] Add Phi 2 to supported models ( #2159 )
2023-12-17 02:54:57 -08:00
Woosuk Kwon
26c52a5ea6
[Docs] Add CUDA graph support to docs ( #2148 )
2023-12-17 01:49:20 -08:00
Woosuk Kwon
b81a6a6bb3
[Docs] Add supported quantization methods to docs ( #2135 )
2023-12-15 13:29:22 -08:00
Antoni Baum
21d93c140d
Optimize Mixtral with expert parallelism ( #2090 )
2023-12-13 23:55:07 -08:00
Woosuk Kwon
096827c284
[Docs] Add notes on ROCm-supported models ( #2087 )
2023-12-13 09:45:34 -08:00
Woosuk Kwon
6565d9e33e
Update installation instruction for vLLM + CUDA 11.8 ( #2086 )
2023-12-13 09:25:59 -08:00
TJian
f375ec8440
[ROCm] Upgrade xformers version for ROCm & update doc ( #2079 )
...
Co-authored-by: miloice <jeffaw99@hotmail.com>
2023-12-13 00:56:05 -08:00
Ikko Eltociear Ashimine
c0ce15dfb2
Update run_on_sky.rst ( #2025 )
...
sharable -> shareable
2023-12-11 10:32:58 -08:00
Woosuk Kwon
4ff0203987
Minor fixes for Mixtral ( #2015 )
2023-12-11 09:16:15 -08:00
Simon Mo
c85b80c2b6
[Docker] Add cuda arch list as build option ( #1950 )
2023-12-08 09:53:47 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main ( #1836 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00