could not queue task - airflow 2.3.4 - kubernetes executor

See original GitHub issue

Apache Airflow version

2.3.4

What happened

Tasks are no longer scheduled since upgrading to Airflow 2.3.4; the scheduler logs:

{base_executor.py:211} INFO - task TaskInstanceKey(dag_id='sys_liveness', task_id='liveness', run_id='scheduled__2022-09-14T12:15:00+00:00', try_number=1, map_index=-1) is still running
{base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='sys_liveness', task_id='liveness', run_id='scheduled__2022-09-14T12:15:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)

This happens even with only one DAG running in a simple deployment, scheduled every 5 minutes, with enough slots in the default pool.
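For context on where this error comes from: the executor refuses to queue a task whose key it still considers running, and after several scheduler loops it gives up with the "still running after 4 attempts" message. The following is a simplified, hypothetical sketch of that retry logic (class and method names are illustrative, not Airflow's actual API):

```python
# Simplified sketch of the executor-side check behind the
# "could not queue task ... (still running after N attempts)" error.
# Names are illustrative; see airflow/executors/base_executor.py for the real code.
class MiniExecutor:
    def __init__(self, max_attempts=4):
        self.running = set()   # task keys the executor believes are running
        self.attempts = {}     # task key -> failed queue attempts
        self.max_attempts = max_attempts

    def try_queue(self, key):
        if key not in self.running:
            self.attempts.pop(key, None)
            return True        # task accepted for execution
        count = self.attempts.get(key, 0) + 1
        self.attempts[key] = count
        if count >= self.max_attempts:
            print(f"ERROR - could not queue task {key} "
                  f"(still running after {count} attempts)")
        else:
            print(f"INFO - task {key} is still running")
        return False
```

The symptom reported here matches a key never leaving the `running` set, so every re-queue attempt is rejected until the scheduler (and its executor state) is restarted.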

What you think should happen instead

Tasks should be executed.

How to reproduce

No response

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

apache-airflow-providers-amazon = "4.1.0"
apache-airflow-providers-cncf-kubernetes = "4.0.2"
apache-airflow-providers-http = "2.1.2"
apache-airflow-providers-mysql = "2.2.3"
apache-airflow-providers-postgres = "4.1.0"
apache-airflow-providers-ssh = "2.4.4"
apache-airflow-providers-sqlite = "2.1.3"

Deployment

Official Apache Airflow Helm Chart

Deployment details

kubernetes executor

[core]
hostname_callable = socket.getfqdn
default_timezone = utc
executor = KubernetesExecutor
parallelism = 512
max_active_tasks_per_dag = 128
dags_are_paused_at_creation = False
max_active_runs_per_dag = 1
load_examples = False
plugins_folder = /home/airflow/plugins
execute_tasks_new_python_interpreter = False
donot_pickle = False
dagbag_import_timeout = 30.0
dagbag_import_error_tracebacks = True
dagbag_import_error_traceback_depth = 2
dag_file_processor_timeout = 300
task_runner = StandardTaskRunner
default_impersonation =
security =
unit_test_mode = False
enable_xcom_pickling = False
killed_task_cleanup_time = 60
dag_run_conf_overrides_params = True
dag_discovery_safe_mode = True
dag_ignore_file_syntax = regexp
default_task_retries = 0
default_task_weight_rule = downstream
default_task_execution_timeout =
min_serialized_dag_update_interval = 30
compress_serialized_dags = False
min_serialized_dag_fetch_interval = 30
max_num_rendered_ti_fields_per_task = 30
check_slas = True
xcom_backend = airflow.models.xcom.BaseXCom
lazy_load_plugins = True
lazy_discover_providers = True
hide_sensitive_var_conn_fields = True
sensitive_var_conn_names =
default_pool_task_slot_count = 128
max_map_length = 1024
daemon_umask = 0o077
secure_mode = False
dag_concurrency = 256
non_pooled_task_slot_count = 512
sql_alchemy_schema = airflow

[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
num_runs = -1
scheduler_idle_sleep_time = 1
min_file_process_interval = 400
deactivate_stale_dags_interval = 60
dag_dir_list_interval = 300
print_stats_interval = 30
pool_metrics_interval = 5.0
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300.0
child_process_log_directory = /home/airflow/logs/scheduler
scheduler_zombie_task_threshold = 300
zombie_detection_interval = 10.0
catchup_by_default = True
ignore_first_depends_on_past_by_default = True
max_tis_per_query = 512
use_row_level_locking = True
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
schedule_after_task_execution = True
parsing_processes = 4
file_parsing_sort_mode = modified_time
standalone_dag_processor = False
max_callbacks_per_loop = 20
use_job_schedule = True
allow_trigger_in_future = False
dependency_detector = airflow.serialization.serialized_objects.DependencyDetector
trigger_timeout_check_interval = 15
max_threads = 5

Anything else

When we restart the scheduler, any queued tasks are executed, but the next task after that is not and gets stuck in queued again.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

2 reactions
Misko commented, Sep 16, 2022

Hi everyone,

I am facing the same issue. After updating to 2.3.4, some of my scheduled tasks would not run: the DAG run itself would be labeled as running, but the first task in the DAG would be set to queued and never run, and the scheduler would report:

INFO - task TaskInstanceKey(dag_id='...', task_id='...', run_id='scheduled__2022-09-16T12:49:00+00:00', try_number=1, map_index=-1) is still running
scheduler [2022-09-16T12:58:17.664+0000] {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='....', task_id='...', run_id='scheduled__2022-09-16T12:47:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)

Running the task manually would succeed, but scheduled tasks would not. If I drop the dag_runs from the DB, the first scheduled DAG run succeeds, but all runs after it halt as described above.
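Before dropping dag_runs wholesale, it can help to see which task instances are actually sitting in the queued state. The sketch below runs the kind of query one might issue against the metadata database, demonstrated against an in-memory SQLite stand-in (the real `task_instance` table has many more columns; verify names against your Airflow version's schema before running anything against production):

```python
import sqlite3

# In-memory stand-in for the Airflow metadata DB; the real task_instance
# table has many more columns, but dag_id/task_id/run_id/state exist in 2.x.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, run_id TEXT, state TEXT, queued_dttm TEXT)""")
conn.execute("""INSERT INTO task_instance VALUES
    ('sys_liveness', 'liveness', 'scheduled__2022-09-14T12:15:00+00:00',
     'queued', '2022-09-14 12:20:00')""")

# List task instances stuck in 'queued' - the symptom described in this issue
stuck = conn.execute(
    "SELECT dag_id, task_id, run_id FROM task_instance WHERE state = 'queued'"
).fetchall()
print(stuck)
```

Clearing only the affected task instances (e.g. via the UI or `airflow tasks clear`) is a narrower workaround than deleting dag_runs, though as reported above the tasks may re-enter the stuck state on the next run.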

0 reactions
eladkal commented, Oct 20, 2022

Cool, closing it as not reproducible.

