Harry Mellor
e78587a64c
Improve-mm-and-pooler-and-decoding-configs ( #16789 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 22:13:32 -07:00
Tarun Kumar
e37073efd7
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema ( #16721 )
...
Signed-off-by: Tarun Kumar <takumar@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-17 21:08:27 -07:00
Angky William
fdcb850f14
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server ( #10546 )
...
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
2025-04-15 22:31:38 +00:00
Ryan McConville
6c11ecf8d3
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine ( #16529 )
...
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
2025-04-12 20:19:19 +00:00
Cyrus Leung
d9fc8cd9da
[V1] Enable multi-input by default ( #15799 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 08:52:39 +00:00
wang.yuqi
fbf722c6e6
[Frontend] support matryoshka representation / support embedding API dimensions ( #16331 )
2025-04-11 23:23:10 -07:00
Russell Bryant
9665313c39
[V1] Set structured output backend to auto
by default ( #15724 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00
Michael Goin
87b4ac56c2
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding ( #16221 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-09 04:14:46 +00:00
Cyrus Leung
4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models ( #16156 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-08 09:45:21 -07:00
Matthias Matt
cefb9e5a28
[Frontend] Implement Tool Calling with tool_choice='required'
( #13483 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-02 07:45:45 -07:00
Mark McLoughlin
98d7367b61
[Metrics] Hide deprecated metrics ( #15458 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-02 07:37:19 -07:00
Chauncey
594a8b9030
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. ( #15938 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 06:33:52 -07:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() ( #15500 )
...
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
Varun Sundar Rabindranath
79455cf421
[Misc] Enable V1 LoRA by default ( #15320 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
pansicheng
7fd8c0f85c
fix test_phi3v ( #15321 )
...
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-03-30 02:01:34 -07:00
Varun Sundar Rabindranath
1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests ( #15715 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-28 21:10:41 -07:00
Lize Cai
a10314c6b3
[Misc] Fix test_sleep to use query parameters ( #14373 )
...
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 18:00:14 +08:00
Ce Gao
32b14baf8a
[Refactor][Frontend] Keep all logic about reasoning into one class ( #14428 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 00:23:30 -07:00
Alex Brooks
1711b929b6
[Model] Add Reasoning Parser for Granite Models ( #14202 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
2025-03-26 14:28:07 +00:00
Cyrus Leung
cbcdf2c609
[Bugfix] Fix chat template loading ( #15143 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 13:50:09 +00:00
Robin
d6cd59f122
[Frontend] Support tool calling and reasoning parser ( #14511 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults ( #14858 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00
Sibi
a73e183e36
[Misc] Replace os environ to monkeypatch in test suite ( #14516 )
...
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
Jun Duan
74bc397b0a
[Core] Expose API endpoint /is_sleeping
( #14312 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
daniel-salib
73deea2fdb
[Frontend] track server_load ( #13950 )
2025-03-14 09:53:17 -07:00
Cyrus Leung
33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests ( #14480 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 11:30:09 +00:00
Harry Mellor
47512b3200
Default to generation_config
from model ( #12622 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 14:46:15 +08:00
மனோஜ்குமார் பழனிச்சாமி
cc10281498
[Misc] Set default value of seed to None ( #14274 )
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-03-07 10:40:01 +00:00
Nicolò Lucchesi
fa82b93853
[Frontend][Docs] Transcription API streaming ( #13301 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-06 10:39:35 +00:00
Benjamin Chislett
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param ( #14066 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-05 06:30:40 +00:00
Mark McLoughlin
ae122b1cbd
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics ( #14055 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 19:04:45 +00:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Harry Mellor
4be4b26cb7
Fix entrypoint tests for embedding models ( #14052 )
2025-02-28 08:56:44 -08:00
Harry Mellor
76c89fcadd
Use smaller embedding model when not testing model specifically ( #13891 )
2025-02-28 00:50:43 -08:00
Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions ( #13169 )
2025-02-26 20:04:59 -08:00
Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests ( #13895 )
2025-02-26 04:56:50 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
Mark McLoughlin
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info
( #13299 )
2025-02-22 00:20:00 -08:00
Keyun Tong
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models ( #12846 )
2025-02-20 22:09:47 -08:00
youkaichao
ba81163997
[core] add sleep and wake up endpoint and v1 support ( #12987 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00
Kevin H. Luu
d5d214ac7f
[1/n][CI] Load models in CI from S3 instead of HF ( #13205 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-19 07:34:59 +00:00
Mark McLoughlin
2ad1bc7afe
[V1][Metrics] Add iteration_tokens_total histogram from V0 ( #13288 )
2025-02-15 03:56:19 -08:00
Alexander Matveev
45f90bcbba
[WIP] TPU V1 Support Refactored ( #13049 )
2025-02-14 00:21:53 -08:00
Harry Mellor
f2b20fe491
Consolidate Llama model usage in tests ( #13094 )
2025-02-13 22:18:03 -08:00
Nicolò Lucchesi
d84cef76eb
[Frontend] Add /v1/audio/transcriptions
OpenAI API endpoint ( #12909 )
2025-02-13 07:23:45 -08:00
Vaibhav Jain
37dfa60037
[Bugfix] Missing Content Type returns 500 Internal Server Error ( #13193 )
2025-02-13 06:52:22 -08:00
LikeSundayLikeRain
04f50ad9d1
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case ( #13097 )
2025-02-12 23:11:26 -08:00
Mark McLoughlin
75e6e14516
[V1][Metrics] Add several request timing histograms ( #12644 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-11 10:14:00 -05:00