| Author | Commit | Message | Date |
|---|---|---|---|
| Tyler Michael Smith | 28b3a1c7e5 | [V1] Multiprocessing Tensor Parallel Support for v1 (#9856); Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-12-10 06:28:14 +00:00 |
| youkaichao | 1b62745b1d | [core][executor] simplify instance id (#10976); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-07 09:33:45 -08:00 |
| youkaichao | a111d0151f | [platforms] absorb worker cls difference into platforms folder (#10555); Signed-off-by: youkaichao <youkaichao@gmail.com>; Co-authored-by: Nick Hill <nhill@redhat.com> | 2024-11-21 21:00:32 -08:00 |
| Mengqing Cao | 7371749d54 | [Misc] Fix ImportError causing by triton (#9493) | 2024-11-08 05:08:51 +00:00 |
| Russell Bryant | d1537039ce | [Core] Improve choice of Python multiprocessing method (#8823); Signed-off-by: Russell Bryant <rbryant@redhat.com>; Co-authored-by: youkaichao <youkaichao@126.com> | 2024-09-29 09:17:07 +08:00 |
| Nick Hill | acd5511b6d | [BugFix] Fix clean shutdown issues (#8492) | 2024-09-16 09:33:46 -07:00 |
| afeldman-nm | 428dd1445e | [Core] Logprobs support in Multi-step (#7652) | 2024-08-29 19:19:08 -07:00 |
| youkaichao | f52a43a8b9 | [ci][test] fix pp test failure (#7945) | 2024-08-28 01:27:07 -07:00 |
| Kunshang Ji | 076169f603 | [Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810) | 2024-08-27 10:07:02 -07:00 |
| youkaichao | 660dea1235 | [cuda][misc] remove error_on_invalid_device_count_status (#7069) | 2024-08-02 00:14:21 -07:00 |
| Travis Johnson | 593e79e733 | [Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802); [Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2024-07-26 22:15:20 -07:00 |
| Anthony Platanios | 084a01fd35 | [Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) | 2024-07-25 21:25:35 -07:00 |
| Antoni Baum | 7bd82002ae | [Core] Allow specifying custom Executor (#6557) | 2024-07-20 01:25:06 +00:00 |
| Nick Hill | b5672a112c | [Core] Multiprocessing Pipeline Parallel support (#6130); Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> | 2024-07-18 19:15:52 -07:00 |
| youkaichao | 09c2eb85dd | [ci][distributed] add pipeline parallel correctness test (#6410) | 2024-07-16 15:44:22 -07:00 |
| Thomas Parnell | eaec4b9153 | [Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>; Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com> | 2024-07-15 10:12:47 -07:00 |
| Travis Johnson | 1dab9bc8a9 | [Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing (#6109); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>; Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-07-03 16:56:59 -07:00 |
| youkaichao | f666207161 | [misc][distributed] error on invalid state (#6092) | 2024-07-02 23:37:29 -07:00 |
| Murali Andoorveedu | c5832d2ae9 | [Core] Pipeline Parallel Support (#4412); Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> | 2024-07-02 10:58:08 -07:00 |
| Stephanie Wang | dda4811591 | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408); Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>; Signed-off-by: Stephanie <swang@anyscale.com>; Co-authored-by: Stephanie <swang@anyscale.com> | 2024-06-25 20:30:03 -07:00 |
| Matt Wong | dd793d1de5 | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
| youkaichao | 3eea74889f | [misc][distributed] use 127.0.0.1 for single-node (#5619) | 2024-06-19 08:05:00 +00:00 |
| Antoni Baum | 50eed24d25 | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00 |
| Nick Hill | 99dac099ab | [Core][Doc] Default to multiprocessing for single-node distributed case (#5230); Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> | 2024-06-11 11:10:41 -07:00 |
| Junichi Sato | 2e02311a1b | [Bugfix] Fix MultiprocessingGPUExecutor.check_health when world_size == 1 (#5254) | 2024-06-11 10:38:07 -07:00 |
| zifeitong | a58f24e590 | [Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229) | 2024-06-03 20:55:50 -07:00 |
| Nick Hill | eb6d3c264d | [Core] Eliminate parallel worker per-step task scheduling overhead (#4894) | 2024-05-23 06:17:27 +09:00 |
| Nick Hill | 676a99982f | [Core] Add MultiprocessingGPUExecutor (#4539); Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com> | 2024-05-14 10:38:59 -07:00 |