Sage Moore | 9a88f89799 | custom allreduce + torch.compile (#10121); Signed-off-by: youkaichao <youkaichao@gmail.com>; Co-authored-by: youkaichao <youkaichao@gmail.com> | 2024-11-25 22:00:16 -08:00
youkaichao | 29f3ef26a3 | [ci][distributed] disable hanging tests (#10317); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-11-14 00:23:39 -08:00
youkaichao | 0d4ea3fb5c | [core][distributed] use tcp store directly (#10275); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-11-12 17:36:08 -08:00
youkaichao | 8a7fe47d32 | [misc][distributed] auto port selection and disable tests (#10226); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-11-11 11:54:59 -08:00
youkaichao | e6de9784d2 | [core][distributed] add stateless process group (#10216); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-11-11 09:02:14 -08:00
youkaichao | 719c1ca468 | [core][distributed] add stateless_init_process_group (#10072); Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-11-06 16:42:09 -08:00
Hongxia Yang | b6c16cf8ff | [ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352) | 2024-07-11 21:30:46 -07:00
Matt Wong | dd793d1de5 | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00
Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00
youkaichao | 48f589e18b | [mis] fix flaky test of test_cuda_device_count_stateless (#5546) | 2024-06-14 10:02:23 -07:00
Antoni Baum | 50eed24d25 | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00