Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule

Apache Airflow version

2.3.3

What happened

After upgrading from Airflow 2.1.3 to Airflow 2.3.3, we have an issue with our sensors that have mode='reschedule'. Using TimeSensor as an example:

  1. It executes as normal on the first run
  2. It detects that it is not the correct time yet and marks itself "UP_FOR_RESCHEDULE" (usually to be rescheduled 5 minutes in the future)
  3. When the time comes to be rescheduled, it just gets marked as "QUEUED" and is never actually run again. The error in the log is:
     [2022-08-15 00:01:11,027] {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='TestDAG', task_id='testTASK', run_id='scheduled__2022-08-12T04:00:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)

Looking at the relevant code (https://github.com/apache/airflow/blob/2.3.3/airflow/executors/base_executor.py#L215), it seems that the task key was never removed from self.running after the task initially rescheduled itself.
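
To illustrate the suspected failure mode, here is a minimal sketch of an executor that refuses to re-queue a key it still believes is running. This is a simplified toy model, not Airflow's actual code: ToyExecutor, simulate, and the value of QUEUEING_ATTEMPTS are assumptions chosen so that the give-up message matches the "still running after 4 attempts" text in the log above.

```python
from collections import Counter

# Assumption: chosen so that the give-up message reports 4 attempts.
QUEUEING_ATTEMPTS = 5


class ToyExecutor:
    """Hypothetical, heavily simplified stand-in for an executor."""

    def __init__(self):
        self.running = set()        # keys the executor believes are running
        self.queued = []            # keys waiting to be started
        self.attempts = Counter()   # failed queueing attempts per key
        self.errors = []            # collected error messages

    def queue(self, key):
        if key not in self.queued:
            self.queued.append(key)

    def change_state(self, key):
        # Called when the executor learns that the task has finished.
        self.running.discard(key)

    def trigger_tasks(self):
        for key in list(self.queued):
            if key in self.running:
                attempt = self.attempts[key]
                if attempt < QUEUEING_ATTEMPTS - 1:
                    # Still considered running: try again on the next loop.
                    self.attempts[key] = attempt + 1
                    continue
                # Give up and drop the key from the queue.
                self.errors.append(
                    f"could not queue task {key} "
                    f"(still running after {attempt} attempts)"
                )
                del self.attempts[key]
                self.queued.remove(key)
            else:
                self.queued.remove(key)
                self.running.add(key)


def simulate(completion_reported):
    """Run a task once, then try to re-queue it after a reschedule."""
    executor = ToyExecutor()
    key = "testTASK"
    executor.queue(key)
    executor.trigger_tasks()          # first run starts normally
    if completion_reported:
        executor.change_state(key)    # executor hears that the run ended
    executor.queue(key)               # scheduler re-queues the sensor
    for _ in range(QUEUEING_ATTEMPTS):
        executor.trigger_tasks()
    return executor.errors, key in executor.running
```

In this model, simulate(False) ends with one "could not queue task" error and the key dropped from the queue, while simulate(True) restarts the task without errors. The difference is only whether the executor ever hears that the first run ended.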

What you think should happen instead

Rescheduled tasks should be run again when their reschedule time arrives.

How to reproduce

  1. Airflow 2.3.3 from Docker
  2. Celery 5.2.7 with Redis backend
  3. MySQL 8
  4. Airflow Timezone set to America/New_York
  5. Have a normal (non-async) sensor that has mode reschedule and needs to reschedule itself

Operating System

Fedora 29

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

The symptoms in this discussion sound the same, but no one has replied to it yet: https://github.com/apache/airflow/discussions/25651

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
notatallshaw-gts commented, Aug 15, 2022

It appears this was never an issue before 2.3.0 because the CeleryExecutor implemented its own trigger_tasks logic until this PR landed: https://github.com/apache/airflow/pull/23016

0 reactions
notatallshaw-gts commented, Aug 18, 2022

Looks like it was our fault!

It seems the issue was that our scheduler's Celery results backend was pointing to a different database than our workers' Celery results backend 🤦‍♂️.

Thanks for responding earlier, sorry it was on our side.
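
For anyone hitting the same symptom, a quick consistency check along these lines might have caught it. This is only a sketch under assumptions: it assumes the backend is set through the AIRFLOW__CELERY__RESULT_BACKEND environment variable (the environment form of the [celery] result_backend option) rather than in airflow.cfg, and the helper names and connection URLs below are hypothetical.

```python
# Environment form of the [celery] result_backend option.
BACKEND_VAR = "AIRFLOW__CELERY__RESULT_BACKEND"


def result_backends_by_component(component_envs):
    """Map each component name to its configured result backend (or None)."""
    return {
        name: env.get(BACKEND_VAR)
        for name, env in component_envs.items()
    }


def backends_consistent(component_envs):
    """True only if every component sets the same, non-missing backend."""
    backends = list(result_backends_by_component(component_envs).values())
    return None not in backends and len(set(backends)) == 1


# Hypothetical environments, as they might be collected from each container.
envs = {
    "scheduler": {BACKEND_VAR: "db+mysql://airflow@db-a/airflow"},
    "worker": {BACKEND_VAR: "db+mysql://airflow@db-b/airflow"},
}

if not backends_consistent(envs):
    for name, backend in result_backends_by_component(envs).items():
        print(f"{name}: {backend}")
```

If the scheduler and workers disagree, the scheduler polls a backend that the workers never write to, so it would not see that the first sensor run finished.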
