Alexander Matveev
|
fdea8ec167
|
[V1] VLM - enable processor cache by default (#11305)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
|
2024-12-18 18:54:46 -05:00 |
|
Joe Runde
|
ca5f54a9b9
|
[Bugfix] fix minicpmv test (#11304)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-12-18 10:34:26 -08:00 |
|
Kunshang Ji
|
f954fe0e65
|
[FIX] update openai version (#11287)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2024-12-18 10:17:05 -08:00 |
|
Simon Mo
|
362cff1eb3
|
[CI][Misc] Remove Github Action Release Workflow (#11274)
|
2024-12-18 10:16:53 -08:00 |
|
Isotr0py
|
996aa70f00
|
[Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-18 10:16:40 -08:00 |
|
Dipika Sikka
|
60508ffda9
|
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-12-18 09:57:16 -05:00 |
|
Yan Ma
|
f04e407e6b
|
[MISC][XPU]update ipex link for CI fix (#11278)
|
2024-12-17 22:34:23 -08:00 |
|
Wallas Henrique
|
8b79f9e107
|
[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046)
|
2024-12-17 22:34:08 -08:00 |
|
Konrad Zawora
|
866fa4550d
|
[Bugfix] Restore support for larger block sizes (#11259)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-17 16:39:07 -08:00 |
|
Cody Yu
|
bf8717ebae
|
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-17 16:37:59 -08:00 |
|
Michael Goin
|
c77eb8a33c
|
[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264)
|
2024-12-17 16:34:06 -08:00 |
|
Joe Runde
|
2d1b9baa8f
|
[Bugfix] Fix request cancellation without polling (#11190)
|
2024-12-17 12:26:32 -08:00 |
|
Isotr0py
|
f9ecbb18bf
|
[Misc] Allow passing logits_soft_cap for xformers backend (#11252)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-17 00:37:04 -08:00 |
|
Roger Wang
|
02222a0256
|
[Misc] Kernel Benchmark for RMSNorm (#11241)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>
|
2024-12-17 06:57:02 +00:00 |
|
Tyler Michael Smith
|
2bfdbf2a36
|
[V1][Core] Use weakref.finalize instead of atexit (#11242)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-16 22:11:33 -08:00 |
|
wangxiyuan
|
e88db68cf5
|
[Platform] platform agnostic for EngineArgs initialization (#11225)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-16 22:11:06 -08:00 |
|
Roger Wang
|
59c9b6ebeb
|
[V1][VLM] Proper memory profiling for image language models (#11210)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-16 22:10:57 -08:00 |
|
kYLe
|
66d4b16724
|
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-16 22:09:58 -08:00 |
|
Michael Goin
|
0064f697d3
|
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-17 11:39:58 +08:00 |
|
youkaichao
|
35bae114a8
|
fix gh200 tests on main (#11246)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 17:22:38 -08:00 |
|
youkaichao
|
88a412ed3d
|
[torch.compile] fast inductor (#11108)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-16 16:15:22 -08:00 |
|
youkaichao
|
c301616ed2
|
[ci][tests] add gh200 tests (#11244)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 15:53:18 -08:00 |
|
bk-TurbaAI
|
35ffa682b1
|
[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-12-16 22:20:39 +00:00 |
|
youkaichao
|
551603feff
|
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 13:32:25 -08:00 |
|
Varun Sundar Rabindranath
|
efbce85f4d
|
[misc] Layerwise profile updates (#10242)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-16 18:14:57 +00:00 |
|
Isotr0py
|
2ca830dbaa
|
[Doc] Reorder vision language examples in alphabet order (#11228)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-16 11:23:33 +00:00 |
|
Isotr0py
|
d927dbcd88
|
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-16 10:09:53 +00:00 |
|
Jani Monoses
|
bddbbcb132
|
[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203)
|
2024-12-16 09:56:19 +00:00 |
|
cennn
|
b3b1526f03
|
WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212)
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
|
2024-12-16 09:20:49 +00:00 |
|
yansh97
|
17138af7c4
|
[Bugfix] Fix the default value for temperature in ChatCompletionRequest (#11219)
|
2024-12-16 00:15:40 -08:00 |
|
chenqianfzh
|
69ba344de8
|
[Bugfix] Fix block size validation (#10938)
|
2024-12-15 16:38:40 -08:00 |
|
AlexHe99
|
da6f409246
|
Update deploying_with_k8s.rst (#10922)
|
2024-12-15 16:33:58 -08:00 |
|
Woosuk Kwon
|
25ebed2f8c
|
[V1][Minor] Cache np arange to reduce input preparation overhead (#11214)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-15 13:33:00 -08:00 |
|
shangmingc
|
d263bd9df7
|
[Core] Support disaggregated prefill with Mooncake Transfer Engine (#10884)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2024-12-15 21:28:18 +00:00 |
|
Kuntai Du
|
38e599d6a8
|
[Doc] add documentation for disaggregated prefilling (#11197)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2024-12-15 13:31:16 -06:00 |
|
Cyrus Leung
|
96d673e0f8
|
[Bugfix] Fix error handling of unsupported sliding window (#11213)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-15 10:59:42 -07:00 |
|
Cyrus Leung
|
b10609e6a1
|
[Misc] Clean up multi-modal processor (#11207)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-15 06:30:28 +00:00 |
|
youkaichao
|
a1c02058ba
|
[torch.compile] allow tracking forward time (#11081)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-14 19:45:00 -08:00 |
|
Jee Jee Li
|
15859f2357
|
[Misc] Upgrade bitsandbytes to the latest version 0.45.0 (#11201)
|
2024-12-15 03:03:06 +00:00 |
|
Sungjae Lee
|
886936837c
|
[Performance][Core] Optimize the performance of evictor v1 and v2 by applying a priority queue and lazy deletion (#7209)
|
2024-12-14 11:38:10 -08:00 |
|
Mark McLoughlin
|
6d917d0eeb
|
Enable mypy checking on V1 code (#11105)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2024-12-14 09:54:04 -08:00 |
|
Cyrus Leung
|
93abf23a64
|
[VLM] Fully dynamic prompt replacement in merged input processor (#11199)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-14 17:52:18 +00:00 |
|
Brad Hilton
|
9c3dadd1c9
|
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
|
2024-12-14 16:46:42 +00:00 |
|
Jee Jee Li
|
3cb5769883
|
[Misc] Minor improvements to the readability of PunicaWrapperBase (#11200)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-14 16:38:27 +00:00 |
|
Tyler Michael Smith
|
ea7bd68d10
|
[V1][Bugfix] Fix V1 TP trust-remote-code (#11182)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-14 08:21:23 +00:00 |
|
Russell Bryant
|
48259264a4
|
[Core] Update outlines and increase its threadpool size (#11140)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-14 07:46:18 +00:00 |
|
dhuangnm
|
24a3d12b82
|
update compressed-tensors to latest version (#11183)
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
|
2024-12-14 03:22:44 +00:00 |
|
Cody Yu
|
9855aea21b
|
[Bugfix][V1] Re-compute an entire block when fully cache hit (#11186)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-13 17:08:23 -08:00 |
|
Tyler Michael Smith
|
4b5b8a6a3b
|
[V1][Bugfix] Fix EngineCoreProc profile (#11185)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-13 17:02:35 -08:00 |
|
Russell Bryant
|
4863e5fba5
|
[Core] V1: Use multiprocessing by default (#11074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-13 16:27:32 -08:00 |
|