vllm/models at 7dbe738d653b563c646883c1ae6f6df927436d01 - vllm

20231088/vllm

[Core] Deprecating block manager v1 and make block manager v2 default (#8704)

Removing the block manager v1. This is the first piece of a prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (it has much higher performance on prefix caching).

2024-10-17 11:38:15 -05:00

adding_model.rst

[Misc] Collect model support info in a single process per model (#9233)

2024-10-11 11:08:11 +00:00

enabling_multimodal_inputs.rst

[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)

2024-08-14 17:55:42 +00:00

engine_args.rst

[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)

2024-07-17 07:43:21 +00:00

lora.rst

[Core] Support Lora lineage and base model metadata management (#6315)