[Core] Ray may hang if workers fail to start due to limited ports

What happened + What you expected to happen

  1. What happened: if --min-worker-port and --max-worker-port are used on a cluster, Ray stops running any remote functions.
  2. What I expected: Ray would find a node that has an available port and run the task on it. (In the sample given below, I ask for 4 remote calls to run on 6 nodes, each node has been given 8 ports, and the run stops completely.) See #27499 for why I am using worker ports. (From the Ray Architecture document: “Ray worker nodes are designed to be homogeneous, so that any single node may be lost without bringing down the entire cluster.” In effect, this is a single-node failure that stops progress in the entire cluster.)
  3. Logs and other useful information are below.

ray status reports that CPUs are available (1 of 6 used), but the ray_test.py program is unable to use them.

ray status --address=10.159.0.79:58391
======== Autoscaler status: 2022-08-23 12:36:28.035691 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_991fc91accbb9cc5e9294c23d505be10da58c11a6f679c5562fab5b8
 1 node_c3238c760e1c81564e6d0dbff838b3678bba29bdd221c1dcaabfb19b
 1 node_a4b36956a2fed8fa17ca63ee3155b0c68bb2e04c094d5366eff0b9e5
 1 node_49970fd58f7f63a57aa2218156c24919dd6755b721a69260f2db7992
 1 node_50b4b421ab67619fd61ac967de3f72ebce9a3d1826755bd0c0b9ce86
 1 node_58f9c045ad2a1bda3ac013012ef758a00c05b93b88688051ca4c67d9
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 1.0/6.0 CPU
 0.00/571.893 GiB memory
 0.00/249.089 GiB object_store_memory

Demands:
 {'CPU': 1.0}: 1+ pending tasks/actors
$ python ray_test.py
starting remote functions
remotes started
(raylet, ip=10.159.0.92) [2022-08-23 12:33:11,849 E 197968 197968] core_worker.cc:137: Failed to register worker 9389d63fbf6022579d3f4f76962f5e43aecd706984ed2535e68e4f88 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
(raylet, ip=10.159.0.94) [2022-08-23 12:33:48,045 E 46719 46719] (raylet) worker_pool.cc:502: Some workers of the worker process(55142) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.81) [2022-08-23 12:33:48,163 E 101832 101832] (raylet) worker_pool.cc:502: Some workers of the worker process(109680) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:34:10,844 E 190127 190127] (raylet) worker_pool.cc:502: Some workers of the worker process(197968) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:34:11,385 E 198185 198185] core_worker.cc:137: Failed to register worker c547d2a09340527eb29e316cbdbf34cdbb93fdda3dbdf8124902bc83 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
(raylet, ip=10.159.0.92) [2022-08-23 12:35:10,848 E 190127 190127] (raylet) worker_pool.cc:502: Some workers of the worker process(198185) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet, ip=10.159.0.92) [2022-08-23 12:35:11,634 E 198353 198353] core_worker.cc:137: Failed to register worker 8e47343dfaf4ddd666f2c9b7e1767fc822b09a6e908d575e620e8df8 to Raylet. Invalid: Invalid: No available ports. Please specify a wider port range using --min-worker-port and --max-worker-port.
...
#running ps -u fred -o pcpu,pid,ppid,thcount,user,size,stat,comm,args | grep 'ray::'
#start: 
93.1  94252  86552    21 fred   7812360 Rl ray::f()        ray::f()
90.6 159525 152200    21 fred   7840232 Rl ray::f()        ray::f()
96.3 136620 122245    21 fred   1914708 Rl ray::f()        ray::f()
#when making no progress
23.5 136620 122245    21 fred   1398696 Sl ray::IDLE       ray::IDLE
23.4  94252  86552    21 fred   7357788 Sl ray::IDLE       ray::IDLE
23.3 159525 152200    21 fred   7357852 Sl ray::IDLE       ray::IDLE
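
To see which nodes are running out of ports, the raylet messages in the log above can be tallied per IP address. The following is a minimal sketch, not part of the original report: the helper name count_port_errors is hypothetical, the sample lines are abbreviated copies of the log lines above, and the regular expression only assumes the "(raylet, ip=...)" prefix shown in that log.

```python
import re
from collections import Counter

# Matches the "(raylet, ip=...)" prefix seen in the log excerpt above.
LINE_RE = re.compile(r"\(raylet, ip=(?P<ip>[\d.]+)\)")

def count_port_errors(lines):
    """Count 'No available ports' messages per raylet IP address.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "No available ports" in line:
            counts[match.group("ip")] += 1
    return counts

# Abbreviated copies of two log lines from the report.
sample = [
    "(raylet, ip=10.159.0.92) [2022-08-23 12:33:11,849 E 197968 197968] "
    "core_worker.cc:137: Failed to register worker to Raylet. "
    "Invalid: Invalid: No available ports.",
    "(raylet, ip=10.159.0.94) [2022-08-23 12:33:48,045 E 46719 46719] (raylet) "
    "worker_pool.cc:502: Some workers of the worker process(55142) "
    "have not registered within the timeout.",
]

print(count_port_errors(sample))
```

In the excerpt above, only 10.159.0.92 reports "No available ports"; the other nodes report registration timeouts.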

Versions / Dependencies

(Note that this was also tested with Ray 2.0 and the problem still exists: https://github.com/ray-project/ray/issues/28071#issuecomment-1227623743 )

$ conda list
# packages in environment at /home/fred/miniconda3/envs/raven_libraries_heron_newer_ray:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_tflow_select             2.3.0                       mkl  
absl-py                   0.15.0             pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.1            py37h540881e_1    conda-forge
aiohttp-cors              0.7.0                    pypi_0    pypi
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
ampl-mp                   3.1.0             h2cc385e_1006    conda-forge
astor                     0.8.1              pyh9f0ad1d_0    conda-forge
astroid                   2.11.6           py37h89c1867_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
asynctest                 0.13.0                     py_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
blessed                   1.19.1                   pypi_0    pypi
blinker                   1.4                        py_1    conda-forge
brotlipy                  0.7.0           py37h540881e_1004    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.6.15            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.6.15        py37h89c1867_0    conda-forge
cffi                      1.15.1           py37h43b0acd_0    conda-forge
cftime                    1.6.0            py37h6c7ee08_0    conda-forge
charset-normalizer        2.1.0              pyhd8ed1ab_0    conda-forge
click                     8.0.4                    pypi_0    pypi
cloudpickle               1.6.0                      py_0    conda-forge
coin-or-cbc               2.10.8               h3786ebc_0    conda-forge
coin-or-cgl               0.60.6               h6f57e76_1    conda-forge
coin-or-clp               1.17.7               hc56784d_1    conda-forge
coin-or-osi               0.108.7              h2720bb7_1    conda-forge
coin-or-utils             2.11.6               h202d8b1_1    conda-forge
coincbc                   2.10.8            0_metapackage    conda-forge
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
colorful                  0.5.4                    pypi_0    pypi
coverage                  6.4.4            py37h540881e_0    conda-forge
cryptography              37.0.4           py37h38fbfac_0    conda-forge
curl                      7.83.1               h7bff187_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
dill                      0.3.5.1            pyhd8ed1ab_0    conda-forge
distlib                   0.3.5                    pypi_0    pypi
elementpath               3.0.2                    pypi_0    pypi
expat                     2.4.8                h27087fc_0    conda-forge
filelock                  3.8.0                    pypi_0    pypi
fontconfig                2.14.0               h8e229c2_0    conda-forge
freetype                  2.12.1               hca18f0e_0    conda-forge
frozenlist                1.3.1            py37h540881e_0    conda-forge
gast                      0.2.2                      py_0    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
glib                      2.72.1               h6239696_0    conda-forge
glib-tools                2.72.1               h6239696_0    conda-forge
glpk                      5.0                  h445213a_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
google-api-core           2.8.2                    pypi_0    pypi
google-auth               2.10.0             pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
googleapis-common-protos  1.56.4                   pypi_0    pypi
gpustat                   1.0.0rc1                 pypi_0    pypi
grpc-cpp                  1.48.0               hbad87ad_3    conda-forge
grpcio                    1.43.0                   pypi_0    pypi
gst-plugins-base          1.14.5               h0935bb2_2    conda-forge
gstreamer                 1.18.5               h9f60fe5_3    conda-forge
h5py                      3.6.0           nompi_py37hd308b1e_100    conda-forge
hdf4                      4.2.15               h9772cbc_4    conda-forge
hdf5                      1.12.1          nompi_h2386368_104    conda-forge
icu                       67.1                 he1b5a44_0    conda-forge
idna                      3.3                pyhd8ed1ab_0    conda-forge
imageio                   2.9.0                      py_0    conda-forge
importlib-metadata        4.11.4           py37h89c1867_0    conda-forge
importlib-resources       5.9.0                    pypi_0    pypi
importlib_metadata        4.11.4               hd8ed1ab_0    conda-forge
ipopt                     3.14.9               hc8a599a_0    conda-forge
isort                     5.10.1             pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
jsonschema                4.10.0                   pypi_0    pypi
keras-applications        1.0.8                      py_1    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py37h7cecad7_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
lazy-object-proxy         1.7.1            py37h540881e_1    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h48a1fff_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libclang                  11.1.0          default_ha53f305_1    conda-forge
libcurl                   7.83.1               h7bff187_0    conda-forge
libdeflate                1.13                 h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libglib                   2.72.1               h2d90d5f_0    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
liblapacke                3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               hf817b99_3    conda-forge
libnetcdf                 4.8.1           nompi_h329d8a1_102    conda-forge
libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_1    conda-forge
libpng                    1.6.37               h753d276_4    conda-forge
libpq                     12.9                 h16c4e8d_3  
libprotobuf               3.20.1               h6239696_1    conda-forge
libsqlite                 3.39.2               h753d276_1    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libtiff                   4.4.0                h0e0dad5_3    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
libxslt                   1.1.33               hf705e74_1    conda-forge
libzip                    1.9.2                hc869a4a_1    conda-forge
libzlib                   1.2.12               h166bdaf_2    conda-forge
lxml                      4.8.0            py37h540881e_3    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
matplotlib                3.2.2                         1    conda-forge
matplotlib-base           3.2.2            py37h1d35a4c_1    conda-forge
mccabe                    0.7.0              pyhd8ed1ab_0    conda-forge
metis                     5.1.0             h58526e2_1006    conda-forge
msgpack                   1.0.4                    pypi_0    pypi
multidict                 6.0.2            py37h540881e_1    conda-forge
mumps-include             5.2.1               ha770c72_11    conda-forge
mumps-seq                 5.2.1               h2104b81_11    conda-forge
mysql-common              8.0.30               haf5c9bc_0    conda-forge
mysql-libs                8.0.30               h28c427c_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
netcdf4                   1.5.8           nompi_py37hf784469_101    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
nose                      1.3.7                   py_1006    conda-forge
nspr                      4.32                 h9c3ff4c_1    conda-forge
nss                       3.78                 h2350873_0    conda-forge
numexpr                   2.8.0           py37hfe5f03c_101    conda-forge
numpy                     1.18.5           py37h8960a57_0    conda-forge
numpy-financial           1.0.0              pyhd8ed1ab_0    conda-forge
nvidia-ml-py              11.495.46                pypi_0    pypi
oauthlib                  3.2.0              pyhd8ed1ab_0    conda-forge
opencensus                0.11.0                   pypi_0    pypi
opencensus-context        0.1.3                    pypi_0    pypi
openjpeg                  2.5.0                h7d73246_1    conda-forge
openssl                   1.1.1q               h166bdaf_0    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
pandas                    1.1.5            py37hdc94413_0    conda-forge
patsy                     0.5.2              pyhd8ed1ab_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    9.2.0            py37h850a105_2    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10                   pypi_0    pypi
platformdirs              2.5.2              pyhd8ed1ab_1    conda-forge
ply                       3.11                       py_1    conda-forge
prometheus-client         0.13.1                   pypi_0    pypi
protobuf                  3.20.1           py37hd23a5d3_0    conda-forge
psutil                    5.9.1            py37h540881e_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
py-spy                    0.3.12                   pypi_0    pypi
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.4.0              pyhd8ed1ab_0    conda-forge
pylint                    2.14.5             pyhd8ed1ab_0    conda-forge
pyomo                     6.4.2            py37hd23a5d3_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.1                   pypi_0    pypi
pyside2                   5.13.2           py37hfa98aef_4    conda-forge
pysocks                   1.7.1            py37h89c1867_5    conda-forge
python                    3.7.12          hb7a2778_100_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.7                     2_cp37m    conda-forge
pytz                      2022.2.1           pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pyutilib                  6.0.0              pyh9f0ad1d_0    conda-forge
pyyaml                    6.0                      pypi_0    pypi
qt                        5.12.9               h763d07f_1    conda-forge
ray                       1.13.0                   pypi_0    pypi
re2                       2022.06.01           h27087fc_0    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scikit-learn              0.24.2           py37hf0f1638_1    conda-forge
scipy                     1.5.3            py37h14a347d_0    conda-forge
scotch                    6.0.9                hb2e6521_2    conda-forge
setuptools                59.8.0           py37h89c1867_1    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
smart-open                6.0.0                    pypi_0    pypi
sqlite                    3.39.2               h4ff8645_1    conda-forge
statsmodels               0.12.2           py37hb1e94ed_0    conda-forge
swig                      4.0.2                hd3c618e_2    conda-forge
tensorboard               2.8.0              pyhd8ed1ab_1    conda-forge
tensorboard-data-server   0.6.0            py37h38fbfac_2    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.0.0           mkl_py37h66b46cc_0  
tensorflow-base           2.0.0           mkl_py37h9204916_0  
tensorflow-estimator      2.5.0              pyh8a188c0_0    conda-forge
termcolor                 1.1.0              pyhd8ed1ab_3    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tomlkit                   0.11.4             pyha770c72_0    conda-forge
tornado                   6.2              py37h540881e_0    conda-forge
typed-ast                 1.5.4            py37h540881e_0    conda-forge
typing                    3.10.0.0           pyhd8ed1ab_0    conda-forge
typing-extensions         4.3.0                hd8ed1ab_0    conda-forge
typing_extensions         4.3.0              pyha770c72_0    conda-forge
unixodbc                  2.3.10               h583eb01_0    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
virtualenv                20.16.3                  pypi_0    pypi
wcwidth                   0.2.5                    pypi_0    pypi
werkzeug                  0.16.1                     py_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1           py37h540881e_0    conda-forge
xarray                    0.16.2             pyhd8ed1ab_0    conda-forge
xmlschema                 2.0.2                    pypi_0    pypi
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.7.2            py37h540881e_2    conda-forge
zipp                      3.8.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.12               h166bdaf_2    conda-forge
zstd                      1.5.2                h8a70e8d_4    conda-forge

$ pip list
Package                  Version
------------------------ ---------
absl-py                  0.15.0
aiohttp                  3.8.1
aiohttp-cors             0.7.0
aiosignal                1.2.0
astor                    0.8.1
astroid                  2.11.6
async-timeout            4.0.2
asynctest                0.13.0
attrs                    22.1.0
blessed                  1.19.1
blinker                  1.4
brotlipy                 0.7.0
cached-property          1.5.2
cachetools               5.2.0
certifi                  2022.6.15
cffi                     1.15.1
cftime                   1.6.0
charset-normalizer       2.1.0
click                    8.0.4
cloudpickle              1.6.0
colorama                 0.4.5
colorful                 0.5.4
coverage                 6.4.4
cryptography             37.0.4
cycler                   0.11.0
DaemonLite               0.0.2
dill                     0.3.5.1
distlib                  0.3.5
elementpath              3.0.2
filelock                 3.8.0
frozenlist               1.3.1
gast                     0.2.2
google-api-core          2.8.2
google-auth              2.10.0
google-auth-oauthlib     0.4.6
google-pasta             0.2.0
googleapis-common-protos 1.56.4
gpustat                  1.0.0rc1
grpcio                   1.43.0
h5py                     3.6.0
h5py-wrapper             1.1.0
hdfdict                  0.3.1
idna                     3.3
imageio                  2.9.0
importlib-metadata       4.11.4
importlib-resources      5.9.0
isort                    5.10.1
joblib                   1.1.0
jsonschema               4.10.0
Keras-Applications       1.0.8
Keras-Preprocessing      1.1.2
kiwisolver               1.4.4
lazy-object-proxy        1.7.1
lxml                     4.8.0
Markdown                 3.4.1
matplotlib               3.2.2
mccabe                   0.7.0
msgpack                  1.0.4
multidict                6.0.2
netCDF4                  1.5.8
nose                     1.3.7
numexpr                  2.8.0
numpy                    1.18.5
numpy-financial          1.0.0
nvidia-ml-py             11.495.46
oauthlib                 3.2.0
opencensus               0.11.0
opencensus-context       0.1.3
opt-einsum               3.3.0
pandas                   1.1.5
patsy                    0.5.2
Pillow                   9.2.0
pip                      22.2.2
pkgutil_resolve_name     1.3.10
platformdirs             2.5.2
ply                      3.11
prometheus-client        0.13.1
protobuf                 3.20.1
psutil                   5.9.1
py-spy                   0.3.12
pyasn1                   0.4.8
pyasn1-modules           0.2.7
pycparser                2.21
PyJWT                    2.4.0
pylint                   2.14.5
Pyomo                    6.4.2
pyOpenSSL                22.0.0
pyparsing                3.0.9
pyrsistent               0.18.1
PySocks                  1.7.1
pytest-runner            5.3.1
python-dateutil          2.8.2
pytz                     2022.2.1
pyu2f                    0.1.5
PyUtilib                 6.0.0
PyYAML                   6.0
ray                      1.13.0
requests                 2.28.1
requests-oauthlib        1.3.1
rsa                      4.9
scikit-learn             0.24.2
scipy                    1.5.3
setuptools               59.8.0
six                      1.16.0
smart-open               6.0.0
statsmodels              0.12.2
tensorboard              2.8.0
tensorboard-data-server  0.6.0
tensorboard-plugin-wit   1.8.1
tensorflow               2.0.0
tensorflow-estimator     2.5.0
termcolor                1.1.0
threadpoolctl            3.1.0
tomli                    2.0.1
tomlkit                  0.11.4
tornado                  6.2
typed-ast                1.5.4
typing_extensions        4.3.0
urllib3                  1.26.11
virtualenv               20.16.3
wcwidth                  0.2.5
Werkzeug                 0.16.1
wheel                    0.37.1
wrapt                    1.14.1
xarray                   0.16.2
xmlschema                2.0.2
xmltodict                0.12.0
yarl                     1.7.2
zipp                     3.8.1
$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Reproduction script

Scripts used. Creation of the environment:

conda install --name raven_libraries_heron_newer_ray -y -c conda-forge h5py numpy=1.18 scipy=1.5 scikit-learn=0.24 pandas=1.1 xarray=0.16 netcdf4=1.5 matplotlib=3.2 statsmodels=0.12 cloudpickle=1.6 tensorflow=2.0 python=3 hdf5 swig pylint coverage lxml psutil pip importlib_metadata pyside2 nomkl numexpr imageio=2.9 setuptools dill pyomo pyutilib glpk ipopt coincbc numpy-financial
pip install "ray[default]==1.13.*" xmlschema

ray_start.sh

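#!/bin/bash

# Activate the conda environment that has Ray installed.
source /home/fred/miniconda3/etc/profile.d/conda.sh
conda activate raven_libraries_heron_newer_ray

NUM_CPUS=$1      # number of CPUs to offer on this node
HEAD_ADDRESS=$2  # address of the head node, as host:port

# Join the cluster, limiting workers to ports 10002 through 10002 + 8 * NUM_CPUS.
ray start --verbose --address="$HEAD_ADDRESS" --num-cpus "$NUM_CPUS" --min-worker-port 10002 --max-worker-port $((10002 + 8 * NUM_CPUS))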
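
The port arithmetic in ray_start.sh can be checked with a short sketch. The helper name worker_port_range is hypothetical, and the port count assumes that both --min-worker-port and --max-worker-port are inclusive bounds, which is an assumption here and not something the report states.

```python
def worker_port_range(num_cpus, base_port=10002, ports_per_cpu=8):
    """Mirror the shell arithmetic in ray_start.sh: max = base + 8 * num_cpus.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    min_port = base_port
    max_port = base_port + ports_per_cpu * num_cpus
    # Assumption: both bounds are inclusive, so the count is max - min + 1.
    return min_port, max_port, max_port - min_port + 1

for cpus in (1, 2, 4):
    print(cpus, worker_port_range(cpus))
```

With one CPU per node, as in this report, the range is 10002 to 10010, which matches the range passed to the head node.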

ray_test.py

import ray
import time

# Connect to the already-running cluster.
ray.init(address='auto')

start = time.time()

@ray.remote
def f(x):
    # Busy work so that the task takes a noticeable amount of time.
    a = [2**2**2**2**2 for _ in range(100000)]
    return x * x

print("starting remote functions")
futures = [f.remote(i) for i in range(4)]
print("remotes started")
print(ray.get(futures))  # expected: [0, 1, 4, 9]
end = time.time()
print("time: ", end - start)

Head node:

ray start --head --num-cpus=1 --port=0 --min-worker-port 10002 --max-worker-port 10010
...
#adjust RAY_ADDRESS based on output of above
export RAY_ADDRESS=10.159.0.79:60085

#Note that PBS_NODEFILE is a list of nodes hostnames that can be used
for NODE in `cat $PBS_NODEFILE | uniq`; do
    if echo $NODE | grep -q `hostname`; then
        echo skipping $NODE;
    else
       echo $NODE
       ssh $NODE /home/fred/notes-for-various-projects/raven/ray_start.sh 1 $RAY_ADDRESS
    fi
done

python ray_test.py

I checked that ports were available by running: netstat --all | grep ":1000"
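
As a complement to the netstat check, a port range can also be probed by trying to bind each port. This is a minimal sketch under stated assumptions: the helper name free_ports is hypothetical, it only tests TCP binds on one local address, and a port that binds successfully here may still be taken by another process a moment later.

```python
import socket

def free_ports(min_port, max_port, host="127.0.0.1"):
    """Return the ports in [min_port, max_port] that can currently be bound on host.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    free = []
    for port in range(min_port, max_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # The port is in use, or binding is not permitted.
                continue
            free.append(port)
    return free

if __name__ == "__main__":
    # Same range that the head node in this report was started with.
    print(free_ports(10002, 10010))
```

Running this on each node just before ray start would show whether the whole requested range is actually bindable there.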

Issue Severity

High: It blocks me from completing my task.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 25 (7 by maintainers)

Top GitHub Comments

stephanie-wang commented, Sep 14, 2022 (1 reaction)

Sure, it would be great if you want to take a look:

joshua-cogliati-inl commented, Dec 13, 2022 (0 reactions)

This is still a problem in Ray 2.1. I checked with Python 3.10.8 and the following packages:

$ pip list
Package            Version
------------------ ---------
aiosignal          1.3.1
attrs              22.1.0
certifi            2022.12.7
charset-normalizer 2.1.1
click              8.0.4
distlib            0.3.6
filelock           3.8.2
frozenlist         1.3.3
grpcio             1.51.1
idna               3.4
jsonschema         4.17.3
msgpack            1.0.4
numpy              1.23.5
packaging          22.0
pip                22.3.1
platformdirs       2.6.0
protobuf           4.21.11
pyrsistent         0.19.2
PyYAML             6.0
ray                2.1.0
requests           2.28.1
setuptools         65.5.1
six                1.16.0
urllib3            1.26.13
virtualenv         20.17.1
wheel              0.38.4