[core] occasional Ray port conflict issue
What is the problem?
We occasionally hit a Ray port conflict: a Ray component tries to use a port number that is already used by another component. Ray version: 1.3.0.
Reproduction (REQUIRED)
Start the head node as normal, then start the worker node in a cloud environment with the command below:
ray start --address=agent10909-phx4.prod.uber.internal:31014 --object-manager-port=31009 --worker-port-list=31034,31035,31046,31047,31048,31049,31061,31062,31063,31064,31065,31066 --num-cpus=10 --num-gpus=1 --block
We estimate that this issue occurs in roughly 1 out of 100 runs.
When it happens, the worker node fails to start. The log is shown below. Ray itself picked the same port for dashboard_agent and metrics_export, neither of which we specified in our ray start command.
2021-08-22 07:50:43,072 INFO : worker_ports_str is 31034,31035,31046,31047,31048,31049,31061,31062,31063,31064,31065,31066
2021-08-22 07:50:43,073 INFO : Running ray worker with ray start --address=agent10909-phx4.prod.uber.internal:31014 --object-manager-port=31009 --worker-port-list=31034,31035,31046,31047,31048,31049,31061,31062,31063,31064,31065,31066 --num-cpus=10 --num-gpus=1 --block
/usr/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
"update your install command.", FutureWarning)
Traceback (most recent call last):
File "/usr/bin/ray", line 8, in <module>
sys.exit(main())
File "/usr/lib/python3.6/site-packages/ray/scripts/scripts.py", line 1706, in main
return cli()
File "/usr/lib/python3.6/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.6/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/ray/scripts/scripts.py", line 657, in start
ray_params, head=False, shutdown_at_exit=block, spawn_reaper=block)
File "/usr/lib/python3.6/site-packages/ray/node.py", line 223, in __init__
self._ray_params.update_pre_selected_port()
File "/usr/lib/python3.6/site-packages/ray/_private/parameter.py", line 297, in update_pre_selected_port
ValueError: Ray component metrics_export is trying to use a port number 61240 that is used by other components.
Port information: {'gcs': [], 'object_manager': [31009], 'node_manager': [], 'gcs_server': [], 'client_server': [10001], 'dashboard': [8265], 'dashboard_agent': [61240], 'metrics_export': [61240], 'redis_shards': [], 'worker_ports': [31034, 31035, 31046, 31047, 31048, 31049, 31061, 31062, 31063, 31064, 31065, 31066]}
If you allocate ports, please make sure the same port is not used by multiple components.
I0822 07:50:44.000429 9 executor.cpp:1015] Command exited with status 0 (pid: 73)
I0822 07:50:45.002389 72 process.cpp:927] Stopped the socket accept loop
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
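To make the conflict in the "Port information" line easier to see, here is a minimal sketch that scans such a dictionary for ports claimed by more than one component. It is not Ray's own check; `find_port_conflicts` is a hypothetical helper name, and the dictionary is copied from the error message above.

```python
from collections import defaultdict


def find_port_conflicts(port_info):
    """Return {port: [components]} for each port listed under more than one entry.

    Hypothetical helper for illustration only; this is not part of Ray.
    """
    owners = defaultdict(list)
    for component, ports in port_info.items():
        for port in ports:
            owners[port].append(component)
    return {port: comps for port, comps in owners.items() if len(comps) > 1}


# Port information copied from the ValueError in the log above.
port_info = {
    'gcs': [], 'object_manager': [31009], 'node_manager': [],
    'gcs_server': [], 'client_server': [10001], 'dashboard': [8265],
    'dashboard_agent': [61240], 'metrics_export': [61240],
    'redis_shards': [],
    'worker_ports': [31034, 31035, 31046, 31047, 31048, 31049,
                     31061, 31062, 31063, 31064, 31065, 31066],
}

print(find_port_conflicts(port_info))
# → {61240: ['dashboard_agent', 'metrics_export']}
```

The only collision is on 61240, between the two components whose ports were not passed on the command line, which matches the observation that Ray picked both values itself.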
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (5 by maintainers)
Let me check this soon
Hi @rkooo567, it seems the ticket is automatically closed, and I wonder if this has been fixed. Thanks!