[Ray Serve] Unable to connect to GCS with ray start --head, but works from inside python
What happened + What you expected to happen
When I try to follow this tutorial for deploying on a single node and start a Ray head node using ray start --head, it fails to start up (see the error below).
However, when I start a server from inside a Python script, it works as expected (see below). I want to be able to do it the former way so that I can make use of Serve’s ability to dynamically update running deployments.
Versions / Dependencies
ray, version 1.12.1
Redis server v=6.0.15 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=d583da279d383435
Reproduction script
ray start --head
Observe the following
2022-05-18 10:13:12,091 WARNING utils.py:1254 -- Unable to connect to GCS at 10.0.0.105:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
However, this works:
import ray
from ray import serve

serve.start()
# Keep the driver process alive so the Serve instance stays up.
while True:
    pass
Observe:
2022-05-18 10:34:37,061 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=10417) 2022-05-18 10:34:40,010 INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=10417) 2022-05-18 10:34:40,118 INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:yZdKhI:SERVE_PROXY_ACTOR-node:10.0.0.105-0' on node 'node:10.0.0.105-0' listening on '127.0.0.1:8000'
2022-05-18 10:34:40,986 INFO api.py:794 -- Started Serve instance in namespace 'serve'.
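The warning from ray start --head suggests checking whether GCS actually started at the given address and whether a firewall blocks access. As a rough first check, the following sketch tests whether anything accepts TCP connections at that host and port. It uses only the Python standard library; the helper name can_reach is hypothetical and not part of Ray. A successful connection only shows that some process is listening there, not that it is a Ray GCS of a matching version.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak the GCS protocol,
    it only checks that the port accepts connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the address from the warning above (adjust as needed):
#   can_reach("10.0.0.105", 6379)
```

If this returns False right after ray start --head, the process behind that port most likely did not come up or is not reachable on that interface; if it returns True, the problem is more likely on the Ray side (for example, a version mismatch as the warning mentions).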
Issue Severity
High: It blocks me from completing my task.
Issue Analytics
- State:
- Created a year ago
- Comments:21 (11 by maintainers)
Great that it works now!
@simon-mo perfect. Exactly what I was looking for. Thank you.