Varun Sundar Rabindranath
|
79455cf421
|
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-04-01 16:53:56 +08:00 |
|
Wei Zeng
|
30d6a015e0
|
[Feature] specify model in config.yaml (#15798)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-04-01 01:20:06 -07:00 |
|
Chen Zhang
|
3a5f0afcd2
|
[V1] Implement sliding window attention in kv_cache_manager (#14097)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-01 00:33:17 -07:00 |
|
Yan Ma
|
ff6473980d
|
[Bugfix][Model] fix mllama multi-image (#14883)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-31 22:53:37 -07:00 |
|
Harry Mellor
|
a76f547e11
|
Rename fallback model and refactor supported models section (#15829)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 22:49:41 -07:00 |
|
Ilya Markov
|
b7b7676d67
|
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-03-31 22:49:12 -07:00 |
|
Mark McLoughlin
|
f98a4920f9
|
[V1][Core] Remove unused speculative config from scheduler (#15818)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-31 19:15:21 +00:00 |
|
Alexander Matveev
|
9a2160fa55
|
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-31 13:25:20 -04:00 |
|
shangmingc
|
239b7befdd
|
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-31 09:19:35 -07:00 |
|
Cyrus Leung
|
09e974d483
|
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-31 09:01:35 -07:00 |
|
Harry Mellor
|
e5ef4fa99a
|
Upgrade transformers to v4.50.3 (#13905)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 08:59:37 -07:00 |
|
Alex Brooks
|
c2e7507ad4
|
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-03-31 13:23:53 +00:00 |
|
Naveassaf
|
3aa2b6a637
|
[Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
|
2025-03-31 20:35:14 +08:00 |
|
youkaichao
|
555aa21905
|
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-31 20:22:34 +08:00 |
|
Charlie Fu
|
e85829450d
|
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-03-31 04:42:18 -07:00 |
|
yihong
|
248e76c4df
|
fix: lint fix a ruff checkout syntax error (#15767)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-30 03:36:02 -07:00 |
|
Cyrus Leung
|
803d5c35f3
|
[V1] Override mm_counts for dummy data creation (#15703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-30 03:20:42 -07:00 |
|
pansicheng
|
7fd8c0f85c
|
fix test_phi3v (#15321)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
|
2025-03-30 02:01:34 -07:00 |
|
Julien Denize
|
6909a76201
|
[Bugfix] Fix Mistral guided generation using xgrammar (#15704)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-03-29 20:20:19 -07:00 |
|
Chauncey
|
045533716b
|
[CI] xgrammar structured output supports Enum. (#15757)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-29 20:20:02 -07:00 |
|
Roger Wang
|
c67abd614f
|
[V1] Support interleaved modality items (#15605)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-29 06:30:09 -07:00 |
|
TJian
|
4965ec42d2
|
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-29 03:33:56 -07:00 |
|
Russell Bryant
|
7a7992085b
|
[CI] Speed up V1 structured output tests (#15718)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 21:10:45 -07:00 |
|
Varun Sundar Rabindranath
|
1286211f57
|
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-28 21:10:41 -07:00 |
|
pengyuange
|
de1cb38769
|
[Model] Support Skywork-R1V (#15397)
Signed-off-by: jiacai.liu <932997367@qq.com>
Co-authored-by: jiacai.liu <932997367@qq.com>
|
2025-03-28 20:39:21 -07:00 |
|
Alexander Matveev
|
c3f687ac22
|
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-28 20:19:04 +00:00 |
|
Luka Govedič
|
04437e313d
|
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-28 14:01:09 -06:00 |
|
Cyrus Leung
|
c6bc0034d0
|
[Misc] Remove unused utils and clean up imports (#15708)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-28 09:41:16 -07:00 |
|
Michael Goin
|
47e9038d23
|
Fix cpu offload testing for gptq/awq/ct (#15648)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-29 00:29:32 +08:00 |
|
Russell Bryant
|
7329ff5468
|
[V1] Support disable_any_whtespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 23:46:45 +08:00 |
|
Chauncey
|
3b00ff9138
|
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-28 06:14:53 -07:00 |
|
Ce Gao
|
3bbaacbe15
|
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 11:20:35 +00:00 |
|
Lize Cai
|
a10314c6b3
|
[Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 18:00:14 +08:00 |
|
Ce Gao
|
32b14baf8a
|
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 00:23:30 -07:00 |
|
Robert Shaw
|
2d9045fce8
|
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 00:01:26 -07:00 |
|
Cyrus Leung
|
355f66348c
|
[V1] Remove legacy input registry (#15673)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 23:34:34 -07:00 |
|
Robert Shaw
|
8a49eea74b
|
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-27 19:45:05 -07:00 |
|
Jee Jee Li
|
726efc6a32
|
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 10:12:47 +08:00 |
|
Nick Hill
|
15dac210f0
|
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-27 16:14:41 -07:00 |
|
Nicolò Lucchesi
|
4098b72210
|
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-27 19:15:06 +00:00 |
|
Cyrus Leung
|
247181536f
|
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 17:36:32 +00:00 |
|
Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
Varun Sundar Rabindranath
|
8095341a01
|
[misc] LoRA: Remove unused long context test data (#15558)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-27 10:04:51 +08:00 |
|
ElizaWszola
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
Matthew Vine
|
7a6d45bc8a
|
Support FIPS enabled machines with MD5 hashing (#15299)
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>
|
2025-03-26 20:19:46 -04:00 |
|
Alexander Matveev
|
9d119a86ae
|
[V1] TPU CI - Fix test_compilation.py (#15570)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-26 21:51:54 +00:00 |
|
marko
|
27df5199d9
|
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-03-26 11:11:28 -07:00 |
|
Nick Hill
|
35fad35a48
|
[V1][Sampler] Faster top-k only implementation (#15478)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-26 10:56:47 -07:00 |
|
Alex Brooks
|
1711b929b6
|
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
|
2025-03-26 14:28:07 +00:00 |
|
Harry Mellor
|
cf5c8f1686
|
Separate base model from TransformersModel (#15467)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-03-26 18:13:38 +08:00 |
|