[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
[Core][Test] move local_rank to the last arg with default value to keep the API compatible (#3711)