Russell Bryant
|
3be5b26a76
|
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-07 18:17:29 +00:00 |
|
Russell Bryant
|
de0e61a323
|
[CI/Build] Always run mypy (#10122)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-07 16:43:16 +00:00 |
|
Nicolò Lucchesi
|
9d43afcc53
|
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2024-11-07 08:15:14 -08:00 |
|
Maximilien de Bayser
|
ae62fd17c0
|
[Frontend] Tool calling parser for Granite 3.0 models (#9027)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-11-07 07:09:02 -08:00 |
|
Atlas
|
a62bc0109c
|
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105)
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-11-07 11:20:30 +00:00 |
|
Jiahao Li
|
999df95b4e
|
[Bugfix] Make image processor respect mm_processor_kwargs for Qwen2-VL (#10112)
Signed-off-by: Jiahao Li <liplus17@163.com>
|
2024-11-07 10:50:44 +00:00 |
|
Li, Jiang
|
a6f332d0d9
|
[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-07 18:42:50 +08:00 |
|
Lei Yang
|
0dfba97b42
|
[Frontend] Fix multiple values for keyword argument error (#10075) (#10076)
Signed-off-by: Lei <ylxx@live.com>
|
2024-11-07 09:07:19 +00:00 |
|
Flávia Béo
|
aa9078fa03
|
Adds method to read the pooling types from model's files (#9506)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-11-07 08:42:40 +00:00 |
|
Russell Bryant
|
e036e527a0
|
[CI/Build] Improve mypy + python version matrix (#10041)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-07 07:54:16 +00:00 |
|
Hanzhi Zhou
|
6192e9b8fe
|
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030)
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
|
2024-11-06 23:50:47 -08:00 |
|
Rafael Vasquez
|
d7263a1bb8
|
Doc: Improve benchmark documentation (#9927)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-11-06 23:50:35 -08:00 |
|
Russell Bryant
|
104d729656
|
[CI/Build] re-add codespell to CI (#10083)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-06 22:54:46 -08:00 |
|
Cyrus Leung
|
db7db4aab9
|
[Misc] Consolidate ModelConfig code related to HF config (#10104)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-07 06:00:21 +00:00 |
|
Nick Hill
|
1fa020c539
|
[V1][BugFix] Fix Generator construction in greedy + seed case (#10097)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2024-11-07 05:06:57 +00:00 |
|
youkaichao
|
e7b84c394d
|
[doc] add back Python 3.8 ABI (#10100)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-06 21:06:41 -08:00 |
|
Li, Jiang
|
a4b3e0c1e9
|
[Hardware][CPU] Update torch 2.5 (#9911)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-07 04:43:08 +00:00 |
|
Nick Hill
|
29862b884b
|
[Frontend] Adjust try/except blocks in API impl (#10056)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2024-11-06 20:07:51 -08:00 |
|
Yan Ma
|
d3859f1891
|
[Misc][XPU] Upgrade to Pytorch 2.5 for xpu backend (#9823)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2024-11-06 17:29:03 -08:00 |
|
Michael Goin
|
4ab3256644
|
[Bugfix] Fix FP8 torch._scaled_mm fallback for torch>2.5 with CUDA<12.4 (#10095)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-07 00:54:13 +00:00 |
|
youkaichao
|
719c1ca468
|
[core][distributed] add stateless_init_process_group (#10072)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-06 16:42:09 -08:00 |
|
Russell Bryant
|
74f2f8a0f1
|
[CI/Build] Always run the ruff workflow (#10092)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-06 22:25:23 +00:00 |
|
Joe Runde
|
d58268c56a
|
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-11-06 11:57:35 -08:00 |
|
Russell Bryant
|
87bd7e0515
|
[CI/Build] change conflict PR comment from mergify (#10080)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-06 10:15:42 -08:00 |
|
Russell Bryant
|
098f94de42
|
[CI/Build] Drop Python 3.8 support (#10038)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-06 14:31:01 +00:00 |
|
Michael Goin
|
399c798608
|
Remove ScaledActivation for AWQ (#10057)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-06 14:27:06 +00:00 |
|
Eric
|
406d4cc480
|
[Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022)
Signed-off-by: ericperfect <ericperfectttt@gmail.com>
|
2024-11-06 14:13:15 +00:00 |
|
Jee Jee Li
|
a5bba7d234
|
[Model] Add Idefics3 support (#9767)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
|
2024-11-06 11:41:17 +00:00 |
|
Jee Jee Li
|
2003cc3513
|
[Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-06 09:49:19 +00:00 |
|
Woosuk Kwon
|
6a585a23d2
|
[Hotfix] Fix ruff errors (#10073)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-06 01:24:28 -08:00 |
|
Konrad Zawora
|
a02a50e6e5
|
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
|
2024-11-06 01:09:10 -08:00 |
|
Isotr0py
|
a5fda50a10
|
[CI/Build] Fix large_gpu_mark reason (#10070)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-06 08:50:37 +00:00 |
|
Aaron Pham
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
youkaichao
|
4be3a45158
|
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 22:35:03 -08:00 |
|
Woosuk Kwon
|
4089985552
|
[V1] Integrate Piecewise CUDA graphs (#10058)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-05 22:16:04 -08:00 |
|
zifeitong
|
9d59b75593
|
[Bugfix] Remove CustomChatCompletionContentPartParam multimodal input type (#10054)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2024-11-06 05:13:09 +00:00 |
|
arakowsk-amd
|
ea928f608c
|
[Bugfix] Gpt-j-6B patch kv_scale to k_scale path (#10063)
Signed-off-by: Alex Rakowski <alex.rakowski@amd.com>
Signed-off-by: Alex Rakowski <182798202+arakowsk-amd@users.noreply.github.com>
|
2024-11-06 05:10:40 +00:00 |
|
Travis Johnson
|
2bcbae704c
|
[Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-11-06 04:28:29 +00:00 |
|
Peter Salas
|
ffc0f2b47a
|
[Model][OpenVINO] Fix regressions from #8346 (#10045)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2024-11-06 04:19:15 +00:00 |
|
Cyrus Leung
|
82bfc38d07
|
[Misc] Sort the list of embedding models (#10037)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-06 04:05:05 +00:00 |
|
youkaichao
|
c4cacbaa7f
|
[v1] reduce graph capture time for piecewise cudagraph (#10059)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 18:19:50 -08:00 |
|
Sungjae Lee
|
0c63c34f72
|
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode (#9730)
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2024-11-06 01:45:45 +00:00 |
|
Wallas Henrique
|
966e31697b
|
[Bugfix] Fix pickle of input when async output processing is on (#9931)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-11-06 00:39:26 +00:00 |
|
zifeitong
|
43300bd98a
|
[Bugfix] Properly propagate trust_remote_code settings (#10047)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2024-11-05 16:34:40 -08:00 |
|
youkaichao
|
ca9844b340
|
[bugfix] fix weak ref in piecewise cudagraph and tractable test (#10048)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 14:49:20 -08:00 |
|
Michael Goin
|
235366fe2e
|
[CI] Prune back the number of tests in tests/kernels/* (#9932)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 16:02:32 -05:00 |
|
Michael Goin
|
02462465ea
|
[CI] Prune tests/models/decoder_only/language/* tests (#9940)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 16:02:23 -05:00 |
|
Jee Jee Li
|
b9c64c0ca7
|
[Misc] Modify BNB parameter name (#9997)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-05 14:40:08 -05:00 |
|
lkchen
|
d2e80332a7
|
[Feature] Update benchmark_throughput.py to support image input (#9851)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
|
2024-11-05 19:30:02 +00:00 |
|
Michael Goin
|
a53046b16f
|
[Model] Support quantization of PixtralHFTransformer for PixtralHF (#9921)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 10:42:20 -08:00 |
|