20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Isotr0py	a811dd6608	[Model] merged input processor for Phi-3-Vision models (#10977) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-09 12:55:10 -08:00
tomeras91	395b1c7454	[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-27 13:21:10 -08:00
Chauncey	d04b13a380	[Bug]: Authorization ignored when root_path is set (#10606) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-25 16:21:41 +00:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
Varun Vinayak Shenoy	7d8ffb344f	[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567) Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>	2024-11-22 21:13:29 -08:00
Chauncey	da7e702c6f	[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-21 16:24:32 +00:00
Guillaume Calmettes	c68f7ede6a	[Bugfix]: allow extra fields in requests to openai compatible server (#10463) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2024-11-20 16:42:21 -05:00
Cyrus Leung	32e46e000f	[Frontend] Automatic detection of chat content format from AST (#9919) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-16 13:35:40 +08:00
Mike Depinet	f67ce05d0b	[Frontend] Pythonic tool parser (#9859) Signed-off-by: Mike Depinet <mike@fixie.ai>	2024-11-14 04:14:34 +00:00
Robert Shaw	6ace6fba2c	[V1] `AsyncLLM` Implementation (#9826) Signed-off-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-11-11 23:05:38 +00:00
litianjian	28b2877d30	Online video support for VLMs (#10020) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 20:25:59 +00:00
tomeras91	ac04a97a9f	[Frontend] Add max_tokens prometheus metric (#9881) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-04 22:53:24 +00:00
Robert Shaw	1c45f4c385	[CI] Basic Integration Test For TPU (#9968) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>	2024-11-04 11:34:26 -08:00
Cyrus Leung	ba0d892074	[Frontend] Use a proper chat template for VLM2Vec (#9912)	2024-11-01 14:09:07 +00:00
Cyrus Leung	06386a64dd	[Frontend] Chat-based Embeddings API (#9759)	2024-11-01 08:13:35 +00:00
Joe Runde	031a7995f3	[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-01 01:09:46 +00:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)	2024-10-30 18:15:56 -07:00
Joe Runde	67bdf8e523	[Bugfix][Frontend] Guard against bad token ids (#9634) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-29 14:13:20 -07:00
Zhong Qishuai	ef7865b4f9	[Frontend] re-enable multi-modality input in the new beam search implementation (#9427) Signed-off-by: Qishuai Ferdinandzhong@gmail.com	2024-10-29 11:49:47 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Wallas Henrique	c0292211ce	[CI/Build] Replaced some models on tests for smaller ones (#9570) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-22 04:52:14 +00:00
youkaichao	76a5e13270	[core] move parallel sampling out from vllm core (#9302)	2024-10-22 00:31:44 +00:00
Chen Zhang	5b59fe0f08	[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530)	2024-10-20 00:05:02 +00:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424)	2024-10-18 11:31:58 -07:00
Chang Su	ba30942240	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-15 15:40:43 -07:00
Nick Hill	e9d517f276	[BugFix] Fix chat API continuous usage stats (#9357)	2024-10-14 23:19:48 -07:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537)	2024-10-08 09:38:40 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117)	2024-10-08 05:51:43 +00:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Joe Runde	062c89e7c9	[Frontend][Core] Move guided decoding params into sampling params (#8252) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-01 09:34:25 +08:00
danieljannai21	6c9ba48fde	[Frontend] Added support for HF's new `continue_final_message` parameter (#8942)	2024-09-29 17:59:47 +00:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)	2024-09-26 16:46:43 -07:00
Alexander Matveev	1a2aef3e59	Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)	2024-09-23 15:38:04 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315)	2024-09-20 06:20:56 +00:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Pooya Davoodi	cea95dfb94	[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)	2024-09-11 05:30:11 +00:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049)	2024-08-31 16:35:53 -07:00
Nick Hill	39178c7fbc	[Tests] Disable retries and use context manager for openai client (#7565)	2024-08-26 21:33:17 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654)	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641)	2024-08-23 23:04:22 -07:00
Maximilien de Bayser	e25fee57c2	[BugFix] Fix server crash on empty prompt (#7746) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-08-23 13:12:44 +00:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00
Joe Runde	cde9183b40	[Bug][Frontend] Improve ZMQ client robustness (#7443) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 02:18:11 +00:00
Peter Salas	1ca0d4f86b	[Model] Add UltravoxModel and UltravoxConfig (#7615)	2024-08-21 22:49:39 +00:00

96 Commits