[Core] Ray may hang if workers fail to start due to limited ports
What happened + What you expected to happen
- If --min-worker-port and --max-worker-port are used on a cluster, ray stops running any remote functions.
- I expected that ray would be able to find a node that has an available port and run on it. (As in the sample given below, I am asking for 4 remotes to run on 6 nodes and have given each node 8 ports, and the run completely stops.) See #27499 for why I am using worker-ports. (From the Ray Architecture document: “Ray worker nodes are designed to be homogeneous, so that any single node may be lost without bringing down the entire cluster.” Basically, this is a single node failure that stops progress in the entire cluster.)
- Relevant output is below.
Basically, ray status seems to think there are available cpus (1/6 used) but the ray_test python program is unable to find them.
ray status --address=10.159.0.79:58391
======== Autoscaler status: 2022-08-23 12:36:28.035691 ========
Node status
---------------------------------------------------------------
Healthy:
1 node_991fc91accbb9cc5e9294c23d505be10da58c11a6f679c5562fab5b8
1 node_c3238c760e1c81564e6d0dbff838b3678bba29bdd221c1dcaabfb19b
1 node_a4b36956a2fed8fa17ca63ee3155b0c68bb2e04c094d5366eff0b9e5
1 node_49970fd58f7f63a57aa2218156c24919dd6755b721a69260f2db7992
1 node_50b4b421ab67619fd61ac967de3f72ebce9a3d1826755bd0c0b9ce86
1 node_58f9c045ad2a1bda3ac013012ef758a00c05b93b88688051ca4c67d9
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
1.0/6.0 CPU
0.00/571.893 GiB memory
0.00/249.089 GiB object_store_memory
Demands:
{'CPU': 1.0}: 1+ pending tasks/actors
$ python ray_test.py
starting remote functions
remotes started
(raylet, ip=10.159.0.92) [2022-08-23 12:33:11,849 E 197968 197968] core_worker.cc:137: Failed to register worker 9389d63fbf6022579d3f4f76962f5e43aecd706984ed2535e68e4f88 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
(raylet, ip=10.159.0.94) [2022-08-23 12:33:48,045 E 46719 46719] (raylet) worker_pool.cc:502: Some workers of the worker process(55142) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.81) [2022-08-23 12:33:48,163 E 101832 101832] (raylet) worker_pool.cc:502: Some workers of the worker process(109680) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:34:10,844 E 190127 190127] (raylet) worker_pool.cc:502: Some workers of the worker process(197968) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:34:11,385 E 198185 198185] core_worker.cc:137: Failed to register worker c547d2a09340527eb29e316cbdbf34cdbb93fdda3dbdf8124902bc83 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
(raylet, ip=10.159.0.92) [2022-08-23 12:35:10,848 E 190127 190127] (raylet) worker_pool.cc:502: Some workers of the worker process(198185) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:35:11,634 E 198353 198353] core_worker.cc:137: Failed to register worker 8e47343dfaf4ddd666f2c9b7e1767fc822b09a6e908d575e620e8df8 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
...
#running ps -u fred -o pcpu,pid,ppid,thcount,user,size,stat,comm,args | grep 'ray::'
#start:
93.1 94252 86552 21 fred 7812360 Rl ray::f() ray::f()
90.6 159525 152200 21 fred 7840232 Rl ray::f() ray::f()
96.3 136620 122245 21 fred 1914708 Rl ray::f() ray::f()
#when making no progress
23.5 136620 122245 21 fred 1398696 Sl ray::IDLE ray::IDLE
23.4 94252 86552 21 fred 7357788 Sl ray::IDLE ray::IDLE
23.3 159525 152200 21 fred 7357852 Sl ray::IDLE ray::IDLE
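To see which nodes are running out of worker ports, the raylet messages above can be tallied per IP. The following is a minimal sketch, not part of Ray: the helper name count_port_failures and the regular expression are assumptions based only on the log format shown in this report, and the sample lines are abbreviated copies of the output above.

```python
import re
from collections import Counter

# Assumption: forwarded raylet lines start with "(raylet, ip=<address>)",
# as in the output pasted in this report.
RAYLET_IP = re.compile(r"\(raylet, ip=(?P<ip>[\d.]+)\)")


def count_port_failures(log_lines):
    """Count 'No available ports' errors per node IP (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        if "No available ports" not in line:
            continue  # ignore registration timeouts and other messages
        match = RAYLET_IP.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Abbreviated lines taken from the log in this report.
sample = [
    "(raylet, ip=10.159.0.92) [2022-08-23 12:33:11,849 E 197968 197968] "
    "core_worker.cc:137: Failed to register worker. Invalid: No available ports.",
    "(raylet, ip=10.159.0.94) [2022-08-23 12:33:48,045 E 46719 46719] (raylet) "
    "worker_pool.cc:502: Some workers of the worker process(55142) have not registered",
    "(raylet, ip=10.159.0.92) [2022-08-23 12:34:11,385 E 198185 198185] "
    "core_worker.cc:137: Failed to register worker. Invalid: No available ports.",
]

for ip, n in count_port_failures(sample).items():
    print(ip, n)
```

In the full log above, every "No available ports" message comes from 10.159.0.92, which is consistent with a single node blocking the whole run.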
Versions / Dependencies
(Note that this was tested with ray 2.0 and still exists: https://github.com/ray-project/ray/issues/28071#issuecomment-1227623743 )
$ conda list
# packages in environment at /home/fred/miniconda3/envs/raven_libraries_heron_newer_ray:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_tflow_select 2.3.0 mkl
absl-py 0.15.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.1 py37h540881e_1 conda-forge
aiohttp-cors 0.7.0 pypi_0 pypi
aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge
ampl-mp 3.1.0 h2cc385e_1006 conda-forge
astor 0.8.1 pyh9f0ad1d_0 conda-forge
astroid 2.11.6 py37h89c1867_0 conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
asynctest 0.13.0 py_0 conda-forge
attrs 22.1.0 pyh71513ae_1 conda-forge
blessed 1.19.1 pypi_0 pypi
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py37h540881e_1004 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.2.0 pyhd8ed1ab_0 conda-forge
certifi 2022.6.15 py37h89c1867_0 conda-forge
cffi 1.15.1 py37h43b0acd_0 conda-forge
cftime 1.6.0 py37h6c7ee08_0 conda-forge
charset-normalizer 2.1.0 pyhd8ed1ab_0 conda-forge
click 8.0.4 pypi_0 pypi
cloudpickle 1.6.0 py_0 conda-forge
coin-or-cbc 2.10.8 h3786ebc_0 conda-forge
coin-or-cgl 0.60.6 h6f57e76_1 conda-forge
coin-or-clp 1.17.7 hc56784d_1 conda-forge
coin-or-osi 0.108.7 h2720bb7_1 conda-forge
coin-or-utils 2.11.6 h202d8b1_1 conda-forge
coincbc 2.10.8 0_metapackage conda-forge
colorama 0.4.5 pyhd8ed1ab_0 conda-forge
colorful 0.5.4 pypi_0 pypi
coverage 6.4.4 py37h540881e_0 conda-forge
cryptography 37.0.4 py37h38fbfac_0 conda-forge
curl 7.83.1 h7bff187_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
dill 0.3.5.1 pyhd8ed1ab_0 conda-forge
distlib 0.3.5 pypi_0 pypi
elementpath 3.0.2 pypi_0 pypi
expat 2.4.8 h27087fc_0 conda-forge
filelock 3.8.0 pypi_0 pypi
fontconfig 2.14.0 h8e229c2_0 conda-forge
freetype 2.12.1 hca18f0e_0 conda-forge
frozenlist 1.3.1 py37h540881e_0 conda-forge
gast 0.2.2 py_0 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
glib 2.72.1 h6239696_0 conda-forge
glib-tools 2.72.1 h6239696_0 conda-forge
glpk 5.0 h445213a_0 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
google-api-core 2.8.2 pypi_0 pypi
google-auth 2.10.0 pyh6c4a22f_0 conda-forge
google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
googleapis-common-protos 1.56.4 pypi_0 pypi
gpustat 1.0.0rc1 pypi_0 pypi
grpc-cpp 1.48.0 hbad87ad_3 conda-forge
grpcio 1.43.0 pypi_0 pypi
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.18.5 h9f60fe5_3 conda-forge
h5py 3.6.0 nompi_py37hd308b1e_100 conda-forge
hdf4 4.2.15 h9772cbc_4 conda-forge
hdf5 1.12.1 nompi_h2386368_104 conda-forge
icu 67.1 he1b5a44_0 conda-forge
idna 3.3 pyhd8ed1ab_0 conda-forge
imageio 2.9.0 py_0 conda-forge
importlib-metadata 4.11.4 py37h89c1867_0 conda-forge
importlib-resources 5.9.0 pypi_0 pypi
importlib_metadata 4.11.4 hd8ed1ab_0 conda-forge
ipopt 3.14.9 hc8a599a_0 conda-forge
isort 5.10.1 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
jsonschema 4.10.0 pypi_0 pypi
keras-applications 1.0.8 py_1 conda-forge
keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py37h7cecad7_0 conda-forge
krb5 1.19.3 h3790be6_0 conda-forge
lazy-object-proxy 1.7.1 py37h540881e_1 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20220623.0 cxx17_h48a1fff_1 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libcurl 7.83.1 h7bff187_0 conda-forge
libdeflate 1.13 h166bdaf_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libglib 2.72.1 h2d90d5f_0 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
liblapacke 3.9.0 16_linux64_openblas conda-forge
libllvm11 11.1.0 hf817b99_3 conda-forge
libnetcdf 4.8.1 nompi_h329d8a1_102 conda-forge
libnghttp2 1.47.0 hdcd2b5c_1 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_1 conda-forge
libpng 1.6.37 h753d276_4 conda-forge
libpq 12.9 h16c4e8d_3
libprotobuf 3.20.1 h6239696_1 conda-forge
libsqlite 3.39.2 h753d276_1 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libtiff 4.4.0 h0e0dad5_3 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.10 h68273f3_2 conda-forge
libxslt 1.1.33 hf705e74_1 conda-forge
libzip 1.9.2 hc869a4a_1 conda-forge
libzlib 1.2.12 h166bdaf_2 conda-forge
lxml 4.8.0 py37h540881e_3 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markdown 3.4.1 pyhd8ed1ab_0 conda-forge
matplotlib 3.2.2 1 conda-forge
matplotlib-base 3.2.2 py37h1d35a4c_1 conda-forge
mccabe 0.7.0 pyhd8ed1ab_0 conda-forge
metis 5.1.0 h58526e2_1006 conda-forge
msgpack 1.0.4 pypi_0 pypi
multidict 6.0.2 py37h540881e_1 conda-forge
mumps-include 5.2.1 ha770c72_11 conda-forge
mumps-seq 5.2.1 h2104b81_11 conda-forge
mysql-common 8.0.30 haf5c9bc_0 conda-forge
mysql-libs 8.0.30 h28c427c_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
netcdf4 1.5.8 nompi_py37hf784469_101 conda-forge
nomkl 1.0 h5ca1d4c_0 conda-forge
nose 1.3.7 py_1006 conda-forge
nspr 4.32 h9c3ff4c_1 conda-forge
nss 3.78 h2350873_0 conda-forge
numexpr 2.8.0 py37hfe5f03c_101 conda-forge
numpy 1.18.5 py37h8960a57_0 conda-forge
numpy-financial 1.0.0 pyhd8ed1ab_0 conda-forge
nvidia-ml-py 11.495.46 pypi_0 pypi
oauthlib 3.2.0 pyhd8ed1ab_0 conda-forge
opencensus 0.11.0 pypi_0 pypi
opencensus-context 0.1.3 pypi_0 pypi
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 1.1.1q h166bdaf_0 conda-forge
opt_einsum 3.3.0 pyhd8ed1ab_1 conda-forge
pandas 1.1.5 py37hdc94413_0 conda-forge
patsy 0.5.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 9.2.0 py37h850a105_2 conda-forge
pip 22.2.2 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
platformdirs 2.5.2 pyhd8ed1ab_1 conda-forge
ply 3.11 py_1 conda-forge
prometheus-client 0.13.1 pypi_0 pypi
protobuf 3.20.1 py37hd23a5d3_0 conda-forge
psutil 5.9.1 py37h540881e_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
py-spy 0.3.12 pypi_0 pypi
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyjwt 2.4.0 pyhd8ed1ab_0 conda-forge
pylint 2.14.5 pyhd8ed1ab_0 conda-forge
pyomo 6.4.2 py37hd23a5d3_0 conda-forge
pyopenssl 22.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyrsistent 0.18.1 pypi_0 pypi
pyside2 5.13.2 py37hfa98aef_4 conda-forge
pysocks 1.7.1 py37h89c1867_5 conda-forge
python 3.7.12 hb7a2778_100_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.7 2_cp37m conda-forge
pytz 2022.2.1 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyutilib 6.0.0 pyh9f0ad1d_0 conda-forge
pyyaml 6.0 pypi_0 pypi
qt 5.12.9 h763d07f_1 conda-forge
ray 1.13.0 pypi_0 pypi
re2 2022.06.01 h27087fc_0 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
requests 2.28.1 pyhd8ed1ab_0 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
scikit-learn 0.24.2 py37hf0f1638_1 conda-forge
scipy 1.5.3 py37h14a347d_0 conda-forge
scotch 6.0.9 hb2e6521_2 conda-forge
setuptools 59.8.0 py37h89c1867_1 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smart-open 6.0.0 pypi_0 pypi
sqlite 3.39.2 h4ff8645_1 conda-forge
statsmodels 0.12.2 py37hb1e94ed_0 conda-forge
swig 4.0.2 hd3c618e_2 conda-forge
tensorboard 2.8.0 pyhd8ed1ab_1 conda-forge
tensorboard-data-server 0.6.0 py37h38fbfac_2 conda-forge
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
tensorflow 2.0.0 mkl_py37h66b46cc_0
tensorflow-base 2.0.0 mkl_py37h9204916_0
tensorflow-estimator 2.5.0 pyh8a188c0_0 conda-forge
termcolor 1.1.0 pyhd8ed1ab_3 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tomlkit 0.11.4 pyha770c72_0 conda-forge
tornado 6.2 py37h540881e_0 conda-forge
typed-ast 1.5.4 py37h540881e_0 conda-forge
typing 3.10.0.0 pyhd8ed1ab_0 conda-forge
typing-extensions 4.3.0 hd8ed1ab_0 conda-forge
typing_extensions 4.3.0 pyha770c72_0 conda-forge
unixodbc 2.3.10 h583eb01_0 conda-forge
urllib3 1.26.11 pyhd8ed1ab_0 conda-forge
virtualenv 20.16.3 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
werkzeug 0.16.1 py_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
wrapt 1.14.1 py37h540881e_0 conda-forge
xarray 0.16.2 pyhd8ed1ab_0 conda-forge
xmlschema 2.0.2 pypi_0 pypi
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yarl 1.7.2 py37h540881e_2 conda-forge
zipp 3.8.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.12 h166bdaf_2 conda-forge
zstd 1.5.2 h8a70e8d_4 conda-forge
$ pip list
Package Version
------------------------ ---------
absl-py 0.15.0
aiohttp 3.8.1
aiohttp-cors 0.7.0
aiosignal 1.2.0
astor 0.8.1
astroid 2.11.6
async-timeout 4.0.2
asynctest 0.13.0
attrs 22.1.0
blessed 1.19.1
blinker 1.4
brotlipy 0.7.0
cached-property 1.5.2
cachetools 5.2.0
certifi 2022.6.15
cffi 1.15.1
cftime 1.6.0
charset-normalizer 2.1.0
click 8.0.4
cloudpickle 1.6.0
colorama 0.4.5
colorful 0.5.4
coverage 6.4.4
cryptography 37.0.4
cycler 0.11.0
DaemonLite 0.0.2
dill 0.3.5.1
distlib 0.3.5
elementpath 3.0.2
filelock 3.8.0
frozenlist 1.3.1
gast 0.2.2
google-api-core 2.8.2
google-auth 2.10.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
googleapis-common-protos 1.56.4
gpustat 1.0.0rc1
grpcio 1.43.0
h5py 3.6.0
h5py-wrapper 1.1.0
hdfdict 0.3.1
idna 3.3
imageio 2.9.0
importlib-metadata 4.11.4
importlib-resources 5.9.0
isort 5.10.1
joblib 1.1.0
jsonschema 4.10.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
kiwisolver 1.4.4
lazy-object-proxy 1.7.1
lxml 4.8.0
Markdown 3.4.1
matplotlib 3.2.2
mccabe 0.7.0
msgpack 1.0.4
multidict 6.0.2
netCDF4 1.5.8
nose 1.3.7
numexpr 2.8.0
numpy 1.18.5
numpy-financial 1.0.0
nvidia-ml-py 11.495.46
oauthlib 3.2.0
opencensus 0.11.0
opencensus-context 0.1.3
opt-einsum 3.3.0
pandas 1.1.5
patsy 0.5.2
Pillow 9.2.0
pip 22.2.2
pkgutil_resolve_name 1.3.10
platformdirs 2.5.2
ply 3.11
prometheus-client 0.13.1
protobuf 3.20.1
psutil 5.9.1
py-spy 0.3.12
pyasn1 0.4.8
pyasn1-modules 0.2.7
pycparser 2.21
PyJWT 2.4.0
pylint 2.14.5
Pyomo 6.4.2
pyOpenSSL 22.0.0
pyparsing 3.0.9
pyrsistent 0.18.1
PySocks 1.7.1
pytest-runner 5.3.1
python-dateutil 2.8.2
pytz 2022.2.1
pyu2f 0.1.5
PyUtilib 6.0.0
PyYAML 6.0
ray 1.13.0
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scikit-learn 0.24.2
scipy 1.5.3
setuptools 59.8.0
six 1.16.0
smart-open 6.0.0
statsmodels 0.12.2
tensorboard 2.8.0
tensorboard-data-server 0.6.0
tensorboard-plugin-wit 1.8.1
tensorflow 2.0.0
tensorflow-estimator 2.5.0
termcolor 1.1.0
threadpoolctl 3.1.0
tomli 2.0.1
tomlkit 0.11.4
tornado 6.2
typed-ast 1.5.4
typing_extensions 4.3.0
urllib3 1.26.11
virtualenv 20.16.3
wcwidth 0.2.5
Werkzeug 0.16.1
wheel 0.37.1
wrapt 1.14.1
xarray 0.16.2
xmlschema 2.0.2
xmltodict 0.12.0
yarl 1.7.2
zipp 3.8.1
$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Reproduction script
Scripts used: creation of environment:
conda install --name raven_libraries_heron_newer_ray -y -c conda-forge h5py numpy=1.18 scipy=1.5 scikit-learn=0.24 pandas=1.1 xarray=0.16 netcdf4=1.5 matplotlib=3.2 statsmodels=0.12 cloudpickle=1.6 tensorflow=2.0 python=3 hdf5 swig pylint coverage lxml psutil pip importlib_metadata pyside2 nomkl numexpr imageio=2.9 setuptools dill pyomo pyutilib glpk ipopt coincbc numpy-financial
pip install ray[default]==1.13.* xmlschema
ray_start.sh
#!/bin/bash
source /home/fred/miniconda3/etc/profile.d/conda.sh
conda activate raven_libraries_heron_newer_ray
NUM_CPUS=$1
HEAD_ADDRESS=$2
ray start --verbose --address=$HEAD_ADDRESS --num-cpus $NUM_CPUS --min-worker-port 10002 --max-worker-port $((10002+8*$NUM_CPUS))
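For reference, the port arithmetic in ray_start.sh can be written out as a small sketch. The function name worker_port_range and its parameter names are hypothetical; only the base port 10002 and the factor 8 come from the script above.

```python
def worker_port_range(num_cpus, base_port=10002, ports_per_cpu=8):
    """Mirror the shell expression $((10002+8*$NUM_CPUS)) from ray_start.sh.

    Returns the (min_worker_port, max_worker_port) pair passed to `ray start`.
    """
    min_port = base_port
    max_port = base_port + ports_per_cpu * num_cpus
    return min_port, max_port


# With NUM_CPUS=1, as used in the loop that starts the worker nodes:
print(worker_port_range(1))
```

With one CPU per node this gives the same 10002 to 10010 range that the head node is started with.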
ray_test.py
import ray
import time

ray.init(address='auto')
start = time.time()

@ray.remote
def f(x):
    a = [2**2**2**2**2 for x in range(100000)]
    return x * x

print("starting remote functions")
futures = [f.remote(i) for i in range(4)]
print("remotes started")
print(ray.get(futures)) # [0, 1, 4, 9]
end = time.time()
print("time: ",end - start)
Head node:
ray start --head --num-cpus=1 --port=0 --min-worker-port 10002 --max-worker-port 10010
...
#adjust RAY_ADDRESS based on output of above
export RAY_ADDRESS=10.159.0.79:60085
#Note that PBS_NODEFILE is a list of nodes hostnames that can be used
for NODE in `cat $PBS_NODEFILE | uniq`; do
    if echo $NODE | grep -q `hostname`; then
        echo skipping $NODE;
    else
        echo $NODE
        ssh $NODE /home/fred/notes-for-various-projects/raven/ray_start.sh 1 $RAY_ADDRESS
    fi
done
python ray_test.py
I checked that ports were available by running: netstat --all | grep ":1000"
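A complementary check to netstat is to try binding each port in the range directly on the node. The sketch below uses only the Python standard library; bindable_ports is a hypothetical helper and is not how Ray itself selects worker ports, and it treats both ends of the range as inclusive, which is an assumption.

```python
import socket


def bindable_ports(min_port, max_port, host="0.0.0.0"):
    """Return the ports in [min_port, max_port] that can currently be bound.

    Hypothetical helper: a port that fails to bind is already in use
    (or not permitted) on this host.
    """
    free = []
    for port in range(min_port, max_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # in use or not allowed, so skip it
            free.append(port)
    return free


if __name__ == "__main__":
    # Range taken from the ray start commands in this report.
    print(bindable_ports(10002, 10010))
```

Running this on a node while the cluster is up shows which ports in the range are already held, for example by existing workers; running it before ray start shows whether anything else occupies the range.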
Issue Severity
High: It blocks me from completing my task.
Sure, it would be great if you want to take a look:
This is still a problem in ray 2.1. I checked with python 3.10.8 and: