Yang Chen
58d1b2aa77
[Attention] MLA support for V1 ( #13789 )
...
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-27 13:14:17 -05:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration ( #13747 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
Mengqing Cao
238dfc8ac3
[MISC] tiny fixes ( #13378 )
2025-02-17 00:57:13 -08:00
youkaichao
a0231b7c25
[platform] add base class for communicators ( #13208 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:14:22 +08:00
Alexander Matveev
45f90bcbba
[WIP] TPU V1 Support Refactored ( #13049 )
2025-02-14 00:21:53 -08:00
wangxiyuan
2e3b969ec0
[Platform] add pre_register_and_update function ( #12432 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 22:06:46 +08:00
மனோஜ்குமார் பழனிச்சாமி
2ae889052c
Fix seed parameter behavior in vLLM ( #13007 )
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-10 23:26:50 +08:00
youkaichao
bc1bdecebf
[core][distributed] exact ray placement control ( #12732 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Lucas Wilkinson
cabaf4eff3
[Attention] MLA decode optimizations ( #12528 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-01-30 23:49:37 -08:00
Shanshan Shen
9ddac56311
[Platform] move current_memory_usage() into platform ( #11369 )
...
Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-01-15 03:38:25 +00:00
Chen Zhang
a2d2acb4c8
[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention ( #12040 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-14 15:45:05 +00:00
Shanshan Shen
a7d59688fb
[Platform] Move get_punica_wrapper() function to Platform ( #11516 )
...
Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-01-13 13:12:10 +00:00
youkaichao
458e63a2c6
[platform] add device_control env var ( #12009 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-13 20:59:09 +08:00
youkaichao
89ce62a316
[platform] add ray_device_key ( #11948 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-13 16:20:52 +08:00
wangxiyuan
20410b2fda
[platform] support custom torch.compile backend key ( #11318 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-01-10 23:46:51 +08:00
wangxiyuan
ef725feafc
[platform] support pytorch custom op pluggable ( #11328 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-10 10:02:38 +00:00
wangxiyuan
405eb8e396
[platform] Allow platform specify attention backend ( #11609 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-09 21:46:50 +08:00
Mengqing Cao
6c6f7fe8a8
[Platform] Move model arch check to platform ( #11503 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2024-12-27 08:45:25 +00:00
Chenguang Li
d1fa714cb1
[Refactor]A simple device-related refactor ( #11163 )
...
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-12-13 13:39:00 +00:00
wangxiyuan
aea2fc38c3
[Platform] Move async output
check to platform ( #10768 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-12-09 17:24:46 +00:00
Michael Goin
7090c27bb2
[Bugfix] Only require XGrammar on x86 ( #10865 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-03 10:32:21 -08:00
wangxiyuan
661175bc82
[platform] Add verify_quantization in platform. ( #10757 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-11-29 15:22:21 +00:00
youkaichao
eebad39f26
[torch.compile] support all attention backends ( #10558 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 14:04:42 -08:00
youkaichao
aaddce5d26
[platforms] improve error message for unspecified platforms ( #10520 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 23:07:56 -08:00
Mengqing Cao
9d827170a3
[Platforms] Add device_type
in Platform
( #10508 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-21 04:44:20 +00:00
Mengqing Cao
8c1fb50705
[Platform][Refactor] Extract func get_default_attn_backend
to Platform
( #10358 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2024-11-19 11:22:26 +08:00
youkaichao
4fd9375028
[2/N][torch.compile] make compilation cfg part of vllm cfg ( #10383 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-16 18:02:14 -08:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend ( #6143 )
...
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
wangshuai09
622b7ab955
[Hardware] using current_platform.seed_everything ( #9785 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
Mengqing Cao
5cbdccd151
[Hardware][openvino] is_openvino --> current_platform.is_openvino ( #9716 )
2024-10-26 10:59:06 +00:00
xendo
9dbcce84a7
[Neuron] [Bugfix] Fix neuron startup ( #9374 )
...
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-10-22 12:51:41 +00:00
Cyrus Leung
26a68d5d7e
[CI/Build] Add test decorator for minimum GPU memory ( #8925 )
2024-09-29 02:50:51 +00:00
Tyler Titsworth
260024a374
[Bugfix][Intel] Fix XPU Dockerfile Build ( #7824 )
...
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-27 23:45:50 -07:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend ( #7257 )
2024-09-11 09:46:46 -07:00
youkaichao
eed020f673
[misc] use nvml to get consistent device name ( #7582 )
2024-08-16 21:15:13 -07:00
Woosuk Kwon
42de2cefcb
[Misc] Add a wrapper for torch.inference_mode ( #6618 )
2024-07-21 18:43:11 -07:00
youkaichao
482045ee77
[hardware][misc] introduce platform abstraction ( #6080 )
2024-07-02 20:12:22 -07:00