Author | Commit | Message | Date |
Woosuk Kwon | 8bab4959be | [Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389) | 2024-06-11 00:37:56 -07:00 |
bnellnm | 5467ac3196 | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
Divakar Verma | a66cf40b20 | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927); This PR enables the fused topk_softmax kernel used in moe layer for HIP | 2024-06-02 14:13:26 -07:00 |
Daniele | a360ff80bb | [CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034) | 2024-05-31 22:06:45 -06:00 |
youkaichao | 5bd3c65072 | [Core][Optimization] remove vllm-nccl (#5091) | 2024-05-29 05:13:52 +00:00 |
Sanger Steel | 8bc68e198c | [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) | 2024-05-13 14:57:07 -07:00 |
kliuae | ff5abcd746 | [ROCm] Add support for Punica kernels on AMD GPUs (#3140); Co-authored-by: miloice <jeffaw99@hotmail.com> | 2024-05-09 09:19:50 -07:00 |
Woosuk Kwon | 89579a201f | [Misc] Use vllm-flash-attn instead of flash-attn (#4686) | 2024-05-08 13:15:34 -07:00 |
youkaichao | 344bf7cd2d | [Misc] add installation time env vars (#4574) | 2024-05-03 15:55:56 -07:00 |
Hu Dong | 5ad60b0cbd | [Misc] Exclude the tests directory from being packaged (#4552) | 2024-05-02 10:50:25 -07:00 |
Travis Johnson | 8b798eec75 | [CI/Build][Bugfix] VLLM_USE_PRECOMPILED should skip compilation (#4534); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2024-05-01 18:01:50 +00:00 |
Alpay Ariyak | 715c2d854d | [Frontend] [Core] Tensorizer: support dynamic num_readers , update version (#4467) | 2024-04-30 16:32:13 -07:00 |
SangBin Cho | a88081bf76 | [CI] Disable non-lazy string operation on logging (#4326); Co-authored-by: Danny Guinther <dguinther@neuralmagic.com> | 2024-04-26 00:16:58 -07:00 |
Liangfu Chen | cd2f63fb36 | [CI/CD] add neuron docker and ci test scripts (#3571) | 2024-04-18 15:26:01 -07:00 |
Nick Hill | 563c54f760 | [BugFix] Fix tensorizer extra in setup.py (#4072) | 2024-04-14 14:12:42 -07:00 |
Sanger Steel | 711a000255 | [Frontend] [Core] feat: Add model loading using tensorizer (#3476) | 2024-04-13 17:13:01 -07:00 |
Michael Feil | c2b4a1bce9 | [Doc] Add typing hints / mypy types cleanup (#3816); Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> | 2024-04-11 17:17:21 -07:00 |
Woosuk Kwon | cfaf49a167 | [Misc] Define common requirements (#3841) | 2024-04-05 00:39:17 -07:00 |
youkaichao | ca81ff5196 | [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) | 2024-04-04 10:26:19 -07:00 |
bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634); Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>; Co-authored-by: Yuan Zhou <yuan.zhou@intel.com> | 2024-04-01 22:07:30 -07:00 |
youkaichao | 3492859b68 | [CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675) | 2024-03-28 00:18:54 -04:00 |
youkaichao | 8f44facddd | [Core] remove cupy dependency (#3625) | 2024-03-27 00:33:26 -07:00 |
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
youkaichao | 42bc386129 | [CI/Build] respect the common environment variable MAX_JOBS (#3600) | 2024-03-24 17:04:00 -07:00 |
Zhuohan Li | 523e30ea0c | [BugFix] Hot fix in setup.py for neuron build (#3537) | 2024-03-20 17:59:52 -07:00 |
bnellnm | ba8ae1d84f | Check for _is_cuda() in compute_num_jobs (#3481) | 2024-03-20 10:06:56 -07:00 |
bnellnm | 9fdf3de346 | Cmake based build system (#2830) | 2024-03-18 15:38:33 -07:00 |
Woosuk Kwon | abfc4f3387 | [Misc] Use dataclass for InputMetadata (#3452); Co-authored-by: youkaichao <youkaichao@126.com> | 2024-03-17 10:02:46 +00:00 |
Simon Mo | 6b78837b29 | Fix setup.py neuron-ls issue (#2671) | 2024-03-16 16:00:25 -07:00 |
Simon Mo | 8e67598aa6 | [Misc] fix line length for entire codebase (#3444) | 2024-03-16 00:36:29 -07:00 |
youkaichao | 604f235937 | [Misc] add error message in non linux platform (#3438) | 2024-03-15 21:21:37 +00:00 |
陈序 | 739c350c19 | [Minor Fix] Use cupy-cuda11x in CUDA 11.8 build (#3256) | 2024-03-13 09:43:24 -07:00 |
Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00 |
Woosuk Kwon | 1cb0cc2975 | [FIX] Make flash_attn optional (#3269) | 2024-03-08 10:52:20 -08:00 |
Woosuk Kwon | 2daf23ab0c | Separate attention backends (#3005) | 2024-03-07 01:45:50 -08:00 |
Robert Shaw | c0c2335ce0 | Integrate Marlin Kernels for Int4 GPTQ inference (#2497); Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>; Co-authored-by: alexm <alexm@neuralmagic.com> | 2024-03-01 12:47:51 -08:00 |
Billy Cao | 2c08ff23c0 | Fix building from source on WSL (#3112) | 2024-02-29 11:13:58 -08:00 |
Philipp Moritz | cfc15a1031 | Optimize Triton MoE Kernel (#2979); Co-authored-by: Cade Daniel <edacih@gmail.com> | 2024-02-26 13:48:56 -08:00 |
James Whedbee | 264017a2bf | [ROCm] include gfx908 as supported (#2792) | 2024-02-19 17:58:59 -08:00 |
Hongxia Yang | 0580aab02f | [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) | 2024-02-10 23:14:37 -08:00 |
Philipp Moritz | 931746bc6d | Add documentation on how to do incremental builds (#2796) | 2024-02-07 14:42:02 -08:00 |
Woosuk Kwon | f0d4e14557 | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
Douglas Lehr | 2ccee3def6 | [ROCm] Fixup arch checks for ROCM (#2627) | 2024-02-05 14:59:09 -08:00 |
wangding zeng | 5d60def02c | DeepseekMoE support with Fused MoE kernel (#2453); Co-authored-by: roy <jasonailu87@gmail.com> | 2024-01-29 21:19:48 -08:00 |
Rasmus Larsen | ea8489fce2 | ROCm: Allow setting compilation target (#2581) | 2024-01-29 10:52:31 -08:00 |
zhaoyang-star | 9090bf02e7 | Support FP8-E5M2 KV Cache (#2279); Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-28 16:43:54 -08:00 |
Hanzhi Zhou | 380170038e | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
Philipp Moritz | 390b495ff3 | Don't build punica kernels by default (#2605) | 2024-01-26 15:19:19 -08:00 |
Hongxia Yang | 6b7de1a030 | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
Antoni Baum | 9b945daaf1 | [Experimental] Add multi-LoRA support (#1804); Co-authored-by: Chen Shen <scv119@gmail.com>; Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>; Co-authored-by: Avnish Narayan <avnish@anyscale.com> | 2024-01-23 15:26:37 -08:00 |