[Doc][4/N] Reorganize API Reference (#11843)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent aba8d6ee00
commit 6cd40a5bfe
@@ -38,7 +38,7 @@ steps:
   - pip install -r requirements-docs.txt
   - SPHINXOPTS="-W" make html
   # Check API reference (if it fails, you may have missing mock imports)
-  - grep "sig sig-object py" build/html/dev/sampling_params.html
+  - grep "sig sig-object py" build/html/api/params.html

 - label: Async Engine, Inputs, Utils, Worker Test # 24min
   fast_check: true
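The grep in the CI step above passes only if Sphinx actually rendered Python signatures into the built page (Sphinx marks each one with the CSS classes `sig sig-object py`). A minimal sketch of the same assertion, using a hypothetical HTML snippet in place of the built file:

```shell
# Sphinx emits 'sig sig-object py' on every rendered signature, so a
# successful grep means the autoclass directives resolved (no missing
# mock imports). The snippet below stands in for build/html/api/params.html.
html='<dt class="sig sig-object py" id="vllm.SamplingParams">'
if printf '%s\n' "$html" | grep -q "sig sig-object py"; then
    echo "signatures rendered"
fi
```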
@@ -2,8 +2,8 @@
 # to run the OpenAI compatible server.

 # Please update any changes made here to
-# docs/source/dev/dockerfile/dockerfile.md and
-# docs/source/assets/dev/dockerfile-stages-dependency.png
+# docs/source/contributing/dockerfile/dockerfile.md and
+# docs/source/assets/contributing/dockerfile-stages-dependency.png

 ARG CUDA_VERSION=12.4.1
 #################### BASE BUILD IMAGE ####################
@@ -11,18 +11,8 @@ vLLM provides experimental support for multi-modal models through the {mod}`vllm
 Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
 via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.

-Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities
-by following [this guide](#adding-multimodal-plugin).
-
 Looking to add your own multi-modal model? Please follow the instructions listed [here](#enabling-multimodal-inputs).

-## Guides
-
-```{toctree}
-:maxdepth: 1
-
-adding_multimodal_plugin
-```
-
 ## Module Contents
docs/source/api/params.md (new file, 22 lines)
@@ -0,0 +1,22 @@
+# Optional Parameters
+
+Optional parameters for vLLM APIs.
+
+(sampling-params)=
+
+## Sampling Parameters
+
+```{eval-rst}
+.. autoclass:: vllm.SamplingParams
+    :members:
+```
+
+(pooling-params)=
+
+## Pooling Parameters
+
+```{eval-rst}
+.. autoclass:: vllm.PoolingParams
+    :members:
+```
(image file renamed; 115 KiB, contents unchanged)
@@ -17,7 +17,7 @@ The edges of the build graph represent:
 - `RUN --mount=(.\*)from=...` dependencies (with a dotted line and an empty diamond arrow head)

-> ```{figure} ../../assets/dev/dockerfile-stages-dependency.png
+> ```{figure} /assets/contributing/dockerfile-stages-dependency.png
 > :align: center
 > :alt: query
 > :width: 100%
@@ -53,7 +53,7 @@ for output in outputs:
 ```

 More API details can be found in the {doc}`Offline Inference
-</dev/offline_inference/offline_index>` section of the API docs.
+</api/offline_inference/index>` section of the API docs.

 The code for the `LLM` class can be found in <gh-file:vllm/entrypoints/llm.py>.
@@ -1,16 +0,0 @@
-(adding-multimodal-plugin)=
-
-# Adding a Multimodal Plugin
-
-This document teaches you how to add a new modality to vLLM.
-
-Each modality in vLLM is represented by a {class}`~vllm.multimodal.MultiModalPlugin` and registered to {data}`~vllm.multimodal.MULTIMODAL_REGISTRY`.
-For vLLM to recognize a new modality type, you have to create a new plugin and then pass it to {meth}`~vllm.multimodal.MultiModalRegistry.register_plugin`.
-
-The remainder of this document details how to define custom {class}`~vllm.multimodal.MultiModalPlugin` s.
-
-```{note}
-This article is a work in progress.
-```
-
-% TODO: Add more instructions on how to add new plugins once embeddings is in.
@@ -1,6 +0,0 @@
-# Pooling Parameters
-
-```{eval-rst}
-.. autoclass:: vllm.PoolingParams
-    :members:
-```
@@ -1,6 +0,0 @@
-# Sampling Parameters
-
-```{eval-rst}
-.. autoclass:: vllm.SamplingParams
-    :members:
-```
@@ -42,7 +42,7 @@ The first line of this example imports the classes {class}`~vllm.LLM` and {class
 from vllm import LLM, SamplingParams
 ```

-The next section defines a list of input prompts and sampling parameters for text generation. The [sampling temperature](https://arxiv.org/html/2402.05201v1) is set to `0.8` and the [nucleus sampling probability](https://en.wikipedia.org/wiki/Top-p_sampling) is set to `0.95`. You can find more information about the sampling parameters [here](https://docs.vllm.ai/en/stable/dev/sampling_params.html).
+The next section defines a list of input prompts and sampling parameters for text generation. The [sampling temperature](https://arxiv.org/html/2402.05201v1) is set to `0.8` and the [nucleus sampling probability](https://en.wikipedia.org/wiki/Top-p_sampling) is set to `0.95`. You can find more information about the sampling parameters [here](#sampling-params).

 ```python
 prompts = [
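The two knobs referenced in the quickstart hunk above, temperature `0.8` and top-p `0.95`, can be sketched in plain Python. This is an illustrative implementation of temperature scaling plus nucleus (top-p) truncation, not vLLM's actual sampling code:

```python
import math
import random

def sample_token(logits, temperature=0.8, top_p=0.95):
    """Temperature-scaled softmax followed by nucleus (top-p) sampling."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: keep the smallest set of highest-probability tokens
    # whose cumulative mass reaches top_p, then renormalize over it.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return random.choices(kept, weights=weights)[0]
```

With a strongly peaked distribution and a small `top_p`, only the top token survives truncation, so the sample is deterministic.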
@@ -137,10 +137,10 @@ community/sponsors
 :caption: API Reference
 :maxdepth: 2

-dev/sampling_params
-dev/pooling_params
-dev/offline_inference/offline_index
-dev/engine/engine_index
+api/offline_inference/index
+api/engine/index
+api/multimodal/index
+api/params
 ```

 % Design Documents: Details about vLLM internals
@@ -154,7 +154,6 @@ design/huggingface_integration
 design/plugin_system
 design/kernel/paged_attention
 design/input_processing/model_inputs_index
-design/multimodal/multimodal_index
 design/automatic_prefix_caching
 design/multiprocessing
 ```
@@ -23,7 +23,7 @@ The available APIs depend on the type of model that is being run:
 Please refer to the above pages for more details about each API.

 ```{seealso}
-[API Reference](/dev/offline_inference/offline_index)
+[API Reference](/api/offline_inference/index)
 ```

 ## Configuration Options
@@ -195,7 +195,7 @@ Code example: <gh-file:examples/online_serving/openai_completion_client.py>

 #### Extra parameters

-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
+The following [sampling parameters](#sampling-params) are supported.

 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -226,7 +226,7 @@ Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>

 #### Extra parameters

-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
+The following [sampling parameters](#sampling-params) are supported.

 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -259,7 +259,7 @@ Code example: <gh-file:examples/online_serving/openai_embedding_client.py>

 #### Extra parameters

-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
+The following [pooling parameters](#pooling-params) are supported.

 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -447,7 +447,7 @@ Response:

 #### Extra parameters

-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
+The following [pooling parameters](#pooling-params) are supported.

 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -49,9 +49,6 @@ class MultiModalPlugin(ABC):
     process the same data differently). This registry is in turn used by
     :class:`~MultiModalRegistry` which acts at a higher level
     (i.e., the modality of the data).
-
-    See also:
-        :ref:`adding-multimodal-plugin`
     """

     def __init__(self) -> None:
@@ -99,12 +99,6 @@ class MultiModalDataBuiltins(TypedDict, total=False):
 MultiModalDataDict: TypeAlias = Mapping[str, ModalityData[Any]]
 """
 A dictionary containing an entry for each modality type to input.
-
-Note:
-    This dictionary also accepts modality keys defined outside
-    :class:`MultiModalDataBuiltins` as long as a customized plugin
-    is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
-    Read more on that :ref:`here <adding-multimodal-plugin>`.
 """


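The retained docstring above describes `MultiModalDataDict` as a modality-keyed mapping. A small illustration of that shape (the placeholder value is hypothetical; real entries would be e.g. a `PIL.Image`):

```python
from typing import Any, Mapping

# A modality-keyed mapping in the shape the MultiModalDataDict alias
# describes. "image" is the built-in modality key; the value here is a
# placeholder standing in for real image data.
multi_modal_data: Mapping[str, Any] = {
    "image": "<PIL.Image placeholder>",
}
```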
@@ -125,9 +125,6 @@ class MultiModalRegistry:
     def register_plugin(self, plugin: MultiModalPlugin) -> None:
         """
         Register a multi-modal plugin so it can be recognized by vLLM.
-
-        See also:
-            :ref:`adding-multimodal-plugin`
         """
         data_type_key = plugin.get_data_key()

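The `register_plugin`/`get_data_key` pairing in the hunk above is a simple keyed-registry pattern. A minimal sketch of that pattern (class names echo vLLM's but this is not vLLM's implementation):

```python
from abc import ABC, abstractmethod

class Plugin(ABC):
    @abstractmethod
    def get_data_key(self) -> str:
        """Return the modality key this plugin handles, e.g. 'image'."""

class Registry:
    def __init__(self) -> None:
        self._plugins: dict[str, Plugin] = {}

    def register_plugin(self, plugin: Plugin) -> None:
        # Index the plugin by its modality key so later inputs with that
        # key can be dispatched to it.
        self._plugins[plugin.get_data_key()] = plugin

class ImagePlugin(Plugin):
    def get_data_key(self) -> str:
        return "image"

registry = Registry()
registry.register_plugin(ImagePlugin())
```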
@@ -7,7 +7,7 @@ class PoolingParams(
         msgspec.Struct,
         omit_defaults=True,  # type: ignore[call-arg]
         array_like=True):  # type: ignore[call-arg]
-    """Pooling parameters for embeddings API.
+    """API parameters for pooling models. This is currently a placeholder.

     Attributes:
         additional_data: Any additional data needed for pooling.