Author | Commit | Message | Date
Russell Bryant | 813f249f02 | [Docs] Fix broken link in SECURITY.md (#12175); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-01-18 04:35:21 +00:00
youkaichao | da02cb4b27 | [core] further polish memory profiling (#12126); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-18 12:25:08 +08:00
Hongxia Yang | c09503ddd6 | [AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172); Signed-off-by: hongxyan <hongxyan@amd.com> | 2025-01-18 11:15:53 +08:00
youkaichao | 2b83503227 | [misc] fix cross-node TP (#12166); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-18 10:53:27 +08:00
youkaichao | 7b98a65ae6 | [torch.compile] disable logging when cache is disabled (#12043); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-17 20:29:31 +00:00
Gregory Shtrasberg | b5b57e301e | [AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134); Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> | 2025-01-17 17:12:26 +00:00
Kunshang Ji | 54cacf008f | [Bugfix] Mistral tokenizer encode accept list of str (#12149); Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> | 2025-01-17 16:47:53 +00:00
Wallas Henrique | 58fd57ff1d | [Bugfix] Fix score api for missing max_model_len validation (#12119); Signed-off-by: Wallas Santos <wallashss@ibm.com> | 2025-01-17 16:24:22 +00:00
youkaichao | 87a0c076af | [core] allow callable in collective_rpc (#12151); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-17 20:47:01 +08:00
Li, Jiang | d4e6194570 | [CI/Build][CPU][Bugfix] Fix CPU CI (#12150); Signed-off-by: jiang1.li <jiang1.li@intel.com> | 2025-01-17 19:39:52 +08:00
Jee Jee Li | 07934cc237 | [Misc][LoRA] Improve the readability of LoRA error messages (#12102); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-01-17 19:32:28 +08:00
Chen Zhang | 69d765f5a5 | [V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960); Signed-off-by: Chen Zhang <zhangch99@outlook.com>; Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2025-01-17 07:39:35 +00:00
Divakar Verma | 8027a72461 | [ROCm][MoE] moe tuning support for rocm (#12049); Signed-off-by: Divakar Verma <divakar.verma@amd.com> | 2025-01-17 14:49:16 +08:00
Isotr0py | d75ab55f10 | [Misc] Add deepseek_vl2 chat template (#12143); Signed-off-by: Isotr0py <2037008807@qq.com> | 2025-01-17 06:34:48 +00:00
Chen Zhang | d1adb9b403 | [BugFix] add more is not None check in VllmConfig.__post_init__ (#12138); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-01-17 05:33:22 +00:00
Yuan Tang | b8bfa46a18 | [Bugfix] Fix issues in CPU build Dockerfile (#12135); Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> | 2025-01-17 12:54:01 +08:00
Yuan Tang | 1475847a14 | [Doc] Add instructions on using Podman when SELinux is active (#12136); Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> | 2025-01-17 04:45:36 +00:00
Kunshang Ji | fead53ba78 | [CI]add genai-perf benchmark in nightly benchmark (#10704); Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> | 2025-01-17 04:15:09 +00:00
Kuntai Du | ebc73f2828 | [Bugfix] Fix a path bug in disaggregated prefill example script. (#12121); Signed-off-by: Kuntai Du <kuntai@uchicago.edu> | 2025-01-17 11:12:41 +08:00
Chen Zhang | d06e824006 | [Bugfix] Set enforce_eager automatically for mllama (#12127); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-01-16 15:30:08 -05:00
Isotr0py | 62b06ba23d | [Model] Add support for deepseek-vl2-tiny model (#12068); Signed-off-by: Isotr0py <2037008807@qq.com> | 2025-01-16 17:14:48 +00:00
Varun Sundar Rabindranath | 5fd24ec02e | [misc] Add LoRA kernel micro benchmarks (#11579) | 2025-01-16 15:51:40 +00:00
Roger Wang | 874f7c292a | [Bugfix] Fix max image feature size for Llava-one-vision (#12104); Signed-off-by: Roger Wang <ywang@roblox.com> | 2025-01-16 14:54:06 +00:00
youkaichao | 92e793d91a | [core] LLM.collective_rpc interface and RLHF example (#12084); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-16 20:19:52 +08:00
youkaichao | bf53e0c70b | Support torchrun and SPMD-style offline inference (#12071); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-16 19:58:53 +08:00
Isotr0py | dd7c9ad870 | [Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067); Signed-off-by: Isotr0py <2037008807@qq.com> | 2025-01-16 10:11:54 +00:00
Michael Goin | 9aa1519f08 | Various cosmetic/comment fixes (#12089); Signed-off-by: mgoin <michael@neuralmagic.com> | 2025-01-16 09:59:06 +00:00
Cyrus Leung | f8ef146f03 | [Doc] Add documentation for specifying model architecture (#12105) | 2025-01-16 15:53:43 +08:00
Elfie Guo | fa0050db08 | [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651); Signed-off-by: mgoin <michael@neuralmagic.com>; Co-authored-by: Michael Goin <mgoin@redhat.com>; Co-authored-by: mgoin <michael@neuralmagic.com> | 2025-01-16 04:31:27 +00:00
tvirolai-amd | cd9d06fb8d | Allow hip sources to be directly included when compiling for rocm. (#12087) | 2025-01-15 16:46:03 -05:00
Varun Sundar Rabindranath | ebd8c669ef | [Bugfix] Fix _get_lora_device for HQQ marlin (#12090); Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>; Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2025-01-15 19:59:42 +00:00
Roger Wang | 70755e819e | [V1][Core] Autotune encoder cache budget (#11895); Signed-off-by: Roger Wang <ywang@roblox.com> | 2025-01-15 11:29:00 -08:00
Joe Runde | edce722eaa | [Bugfix] use right truncation for non-generative tasks (#12050); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2025-01-16 00:31:01 +08:00
maang-h | 57e729e874 | [Doc]: Update OpenAI-Compatible Server documents (#12082) | 2025-01-15 16:07:45 +00:00
kewang-xlnx | de0526f668 | [Misc][Quark] Upstream Quark format to VLLM (#10765); Signed-off-by: kewang-xlnx <kewang@xilinx.com>; Signed-off-by: kewang2 <kewang2@amd.com>; Co-authored-by: kewang2 <kewang2@amd.com>; Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2025-01-15 11:05:15 -05:00
Yuan | 5ecf3e0aaf | Misc: allow to use proxy in HTTPConnection (#12042); Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> | 2025-01-15 13:16:40 +00:00
RunningLeon | 97eb97b5a4 | [Model]: Support internlm3 (#12037) | 2025-01-15 11:35:17 +00:00
wangxiyuan | 3adf0ffda8 | [Platform] Do not raise error if _Backend is not found (#12023); Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>; Signed-off-by: Mengqing Cao <cmq0113@163.com>; Co-authored-by: Mengqing Cao <cmq0113@163.com> | 2025-01-15 10:14:15 +00:00
Keyun Tong | ad388d25a8 | Type-fix: make execute_model output type optional (#12020) | 2025-01-15 09:44:56 +00:00
Rahul Tuli | cbe94391eb | Fix: cases with empty sparsity config (#12057); Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> | 2025-01-15 17:41:24 +08:00
Chen Zhang | 994fc655b7 | [V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) | 2025-01-15 07:55:30 +00:00
Kyle Sayers | 3f9b7ab9f5 | [Doc] Update examples to remove SparseAutoModelForCausalLM (#12062); Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> | 2025-01-15 06:36:01 +00:00
youkaichao | ad34c0df0f | [core] platform agnostic executor via collective_rpc (#11256); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2025-01-15 13:45:21 +08:00
Rui Qiao | f218f9c24d | [core] Turn off GPU communication overlap for Ray executor (#12051); Signed-off-by: Rui Qiao <ruisearch42@gmail.com> | 2025-01-15 05:19:55 +00:00
Elfie Guo | 0794e7446e | [Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) | 2025-01-15 12:47:49 +08:00
Woosuk Kwon | b7ee940a82 | [V1][BugFix] Fix edge case in VLM scheduling (#12065); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-01-14 20:21:28 -08:00
Shanshan Shen | 9ddac56311 | [Platform] move current_memory_usage() into platform (#11369); Signed-off-by: Shanshan Shen <467638484@qq.com> | 2025-01-15 03:38:25 +00:00
Konrad Zawora | 1a51b9f872 | [HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046); Signed-off-by: Konrad Zawora <kzawora@habana.ai> | 2025-01-15 02:59:18 +00:00
Jee Jee Li | 42f5e7c52a | [Kernel] Support MulAndSilu (#11624); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-01-15 02:29:53 +00:00
Jee Jee Li | a3a3ee4e6f | [Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-01-15 07:49:49 +08:00