316 Commits

Author SHA1 Message Date
Cyrus Leung
ac5bc615b0
[Model] MiniCPM-V/O supports V1 (#15487)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 06:07:29 -07:00
Harry Mellor
cf5c8f1686
Separate base model from TransformersModel (#15467)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-03-26 18:13:38 +08:00
Cyrus Leung
997c8811d6
[Model] Support multi-image for Molmo (#15438)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-26 11:26:33 +08:00
Harry Mellor
97cfa65df7
Add pipeline parallel support to TransformersModel (#12832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-25 10:41:45 +08:00
Manish Sethi
761702fd19
[Core] Integrate fastsafetensors loader for loading model weights (#10647)
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>
2025-03-24 08:08:02 -07:00
Roger Wang
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config by default (#15281)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-23 14:00:55 -07:00
Naitong Yu
2f4bd358f1
[Model] Support Tele-FLM Model (#15023)
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
2025-03-22 02:04:44 -07:00
Cyrus Leung
61f412187d
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-18 23:58:22 -07:00
yury-tokpanov
452e8fd968
[MODEL] Add support for Zamba2 models (#13185)
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-18 08:56:21 -07:00
Patrick von Platen
f863ffc965
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-18 03:29:42 -07:00
Roger Wang
37e3806132
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-17 10:04:21 -07:00
Chen Zhang
60c872d4b6
[Doc] Fix small typo in Transformers fallback (#14791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-13 20:33:12 -07:00
Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00
Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00
Woosuk Kwon
c0c25e25fa
[Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00
Cyrus Leung
af295e9b01
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-11 07:59:43 -07:00
Cyrus Leung
001a9c7b0d
[Doc] Update PaliGemma note to a warning (#14565)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 15:02:28 +00:00
Harry Mellor
60a98b2de5
[Docs] Mention model_impl arg when explaining Transformers fallback (#14552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 12:13:10 +00:00
Irina Yuryeva
4f27044aab
[Doc] Correct beam_search using in generative_models.md (#14363)
2025-03-06 15:37:10 +00:00
lkchen
5d802522a7
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-03-06 08:58:41 +00:00
kYLe
1769928079
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-06 08:31:38 +00:00
Congcong Chen
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct (#14119)
2025-03-04 20:57:01 -08:00
Travis Johnson
c060b71408
[Model] Add support for GraniteMoeShared models (#13313)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-04 08:04:52 +08:00
Harry Mellor
98175b2816
Improve the docs for TransformersModel (#14147)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:03:05 +00:00
Jee Jee Li
cc5e8f6db8
[Model] Add LoRA support for TransformersModel (#13770)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320)
2025-02-27 02:06:41 -08:00
Michael Goin
07c4353057
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models (#12846)
2025-02-20 22:09:47 -08:00
Harry Mellor
992e5c3d34
Merge similar examples in offline_inference into single basic example (#12737)
2025-02-20 04:53:51 -08:00
Jee Jee Li
512368e34a
[Misc] Qwen2.5 VL support LoRA (#13261)
2025-02-19 18:37:55 -08:00
Roger Wang
fd84857f64
[Doc] Add clarification note regarding paligemma (#13511)
2025-02-18 22:24:03 -08:00
Harry Mellor
2358ca527b
[Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-18 18:52:39 +08:00
Isotr0py
67ef8f666a
[Model] Enable quantization support for transformers backend (#12960)
2025-02-17 19:52:47 -08:00
Roger Wang
b7d309860e
[V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-16 10:35:54 +00:00
Nicolò Lucchesi
579d7a63b2
[Bugfix][Docs] Fix offline Whisper (#13274)
2025-02-14 21:32:37 -08:00
Cyrus Leung
1bc3b5e71b
[VLM] Separate text-only and vision variants of the same model architecture (#13157)
2025-02-13 06:19:15 -08:00
Cyrus Leung
c9d3ecf016
[VLM] Merged multi-modal processor for Molmo (#12966)
2025-02-13 04:34:00 -08:00
Farzad Abdolhosseini
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
Jee Jee Li
86222a3dab
[VLM] Merged multi-modal processor for GLM4V (#12449)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-08 20:32:16 +00:00
Jun Duan
256a2d29dc
[Doc] Correct HF repository for TeleChat2 models (#12949)
2025-02-08 01:42:15 -08:00
Sumit Vij
d88506dda4
[Model] LoRA Support for Ultravox model (#11253)
2025-02-05 19:54:13 -08:00
Cyrus Leung
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
Roger Wang
bf3b79efb8
[VLM] Qwen2.5-VL
2025-02-05 13:31:38 -08:00
Isotr0py
815079de8e
[VLM] merged multimodal processor and V1 support for idefics3 (#12660)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-04 20:00:51 +08:00
Cyrus Leung
d1ca7df84d
[VLM] Merged multi-modal processor for InternVL-based models (#12553)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-02-04 16:44:52 +08:00
Arthur
a1a2aaadb9
[Model]: Add transformers backend support (#11330)
# Adds support for `transformers` as a backend

Following https://github.com/huggingface/transformers/pull/35235, a
bunch of models should already be supported; we are ramping up support
for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the hub, if it
implements attention the correct way, can be natively supported!!
- tensor parallel support

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-02-03 21:30:38 +08:00
Alphi
d93bf4da85
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069)
Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-29 09:24:59 +00:00
Harry Mellor
dd6a3a02cb
[Doc] Convert docs to use colon fences (#12471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-29 11:38:29 +08:00
Cyrus Leung
8f58a51358
[VLM] Merged multi-modal processor and V1 support for Qwen-VL (#12504)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-28 16:25:05 +00:00
Isotr0py
2cbeedad09
[Docs] Document Phi-4 support (#12362)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-23 19:18:51 +00:00