[core] ray.kill pending actor doesn't cancel the actor creation task

What is the problem?

Currently, ray.kill silently fails if the actor has not started yet. This appears to be because we try to kill actors directly (via the direct actor transport), but GCS is now responsible for scheduling and creating actors, so the actor’s owner can’t easily cancel the pending lease request.

Here’s a simple reproduction that shows the lease request is still queued as infeasible in the raylet after ray.kill is called.

import ray
from ray._raylet import GlobalStateAccessor
import time

cluster = ray.init()

global_state_accessor = GlobalStateAccessor(
    cluster["redis_address"], ray.ray_constants.REDIS_DEFAULT_PASSWORD)
global_state_accessor.connect()


@ray.remote(resources={"WORKER": 1.0})
class ActorA:
    pass

a = ActorA.remote()
ray.kill(a) # do not wait until it starts

while True:
    message = global_state_accessor.get_all_resource_usage()
    if message is not None:
        resource_usage = ray.gcs_utils.ResourceUsageBatchData.FromString(
            message)
        print(resource_usage)
    else:
        print(message)
    time.sleep(1)
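
To see at a glance whether the killed actor's request is still queued, each parsed message could be reduced to a single count. This is a minimal sketch, assuming the message exposes the `resource_load_by_shape.resource_demands` entries and the `num_infeasible_requests_queued` field that show up in the printed output quoted in the comments; `count_infeasible_requests` is a hypothetical helper, not part of Ray.

```python
def count_infeasible_requests(resource_usage):
    """Sum num_infeasible_requests_queued over all demand shapes.

    Hypothetical helper: it assumes the field layout seen in the printed
    ResourceUsageBatchData output and is not a Ray API.
    """
    demands = resource_usage.resource_load_by_shape.resource_demands
    return sum(d.num_infeasible_requests_queued for d in demands)
```

Inside the loop, `print(count_infeasible_requests(resource_usage))` would be expected to stay above zero while the bug is present and drop to zero once the pending request is actually cancelled.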

cc @ericl

Reproduction (REQUIRED)

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (18 by maintainers)

Top GitHub Comments

1 reaction
wuisawesome commented, Jan 12, 2021

Yes, I think the version with lease_client->CancelWorkerLease looks right.

0 reactions
ffbin commented, Jan 12, 2021

I call lease_client->CancelWorkerLease in the GCS actor scheduler, and it cancels the actor creation request.

Running Alex's script with no_restart=True:

    batch {
      node_id: "\206\363\344\377<3^\350y's\250\264x\214+\035\010<Us\177\255\323\362"\035H"
      resources_available { key: "CPU" value: 8.0 }
      resources_available { key: "memory" value: 99.0 }
      resources_available { key: "node:10.15.246.254" value: 1.0 }
      resources_available { key: "object_store_memory" value: 34.0 }
      resources_available_changed: true
      resources_total { key: "CPU" value: 8.0 }
      resources_total { key: "memory" value: 99.0 }
      resources_total { key: "node:10.15.246.254" value: 1.0 }
      resources_total { key: "object_store_memory" value: 34.0 }
      resource_load_changed: true
      resource_load_by_shape { }
    }
    placement_group_load { }

Running Alex's script with no_restart=False:

    batch {
      node_id: "\273\315\313K\234\272X\264\014xE0\020\006Y\333\367\313^\013\264\324\303#\343\030z\342"
      resources_available { key: "CPU" value: 8.0 }
      resources_available { key: "memory" value: 95.0 }
      resources_available { key: "node:10.15.246.254" value: 1.0 }
      resources_available { key: "object_store_memory" value: 33.0 }
      resources_available_changed: true
      resources_total { key: "CPU" value: 8.0 }
      resources_total { key: "memory" value: 95.0 }
      resources_total { key: "node:10.15.246.254" value: 1.0 }
      resources_total { key: "object_store_memory" value: 33.0 }
      resource_load { key: "CPU" value: 1.0 }
      resource_load { key: "WORKER" value: 1.0 }
      resource_load_changed: true
      resource_load_by_shape {
        resource_demands {
          shape { key: "CPU" value: 1.0 }
          shape { key: "WORKER" value: 1.0 }
          num_infeasible_requests_queued: 1
        }
      }
    }
    resource_load_by_shape {
      resource_demands {
        shape { key: "CPU" value: 1.0 }
        shape { key: "WORKER" value: 1.0 }
        num_infeasible_requests_queued: 1
      }
    }
    placement_group_load { }
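
The idea of cancelling the lease from the scheduler can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not Ray's C++ implementation: `ToyLeaseClient`, `ToyActorScheduler`, and all of their methods are hypothetical names invented here, and the real `CancelWorkerLease` call is an asynchronous RPC to a raylet.

```python
class ToyLeaseClient:
    """Hypothetical stand-in for a raylet lease client."""

    def __init__(self):
        # Actor IDs whose worker lease requests are still queued.
        self.pending = set()

    def request_worker_lease(self, actor_id):
        self.pending.add(actor_id)

    def cancel_worker_lease(self, actor_id):
        # Return True only if a queued request was found and removed.
        if actor_id in self.pending:
            self.pending.remove(actor_id)
            return True
        return False


class ToyActorScheduler:
    """Hypothetical stand-in for the GCS actor scheduler."""

    def __init__(self, lease_client):
        self.lease_client = lease_client
        self.started = set()

    def schedule(self, actor_id):
        self.lease_client.request_worker_lease(actor_id)

    def kill(self, actor_id):
        if actor_id in self.started:
            # The actor is running, so it can be killed directly.
            self.started.remove(actor_id)
            return "killed"
        # The actor is still pending: cancel its lease request rather
        # than leaving it queued in the raylet.
        if self.lease_client.cancel_worker_lease(actor_id):
            return "lease_cancelled"
        return "unknown_actor"
```

In this model, killing an actor that was scheduled but never started empties `pending`, which mirrors the empty `resource_load_by_shape` in the first output above.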
