Scheduler livenessProbe errors on new helm chart


Official Helm Chart version

1.6.0 (latest released)

Apache Airflow version

v2.1.2

Kubernetes Version

v1.22.10 (GKE version v1.22.10-gke.600)

Helm Chart configuration

The only livenessProbe configuration we set, both before and during the issue:

# Airflow scheduler settings
scheduler:
  livenessProbe:
    initialDelaySeconds: 10
    timeoutSeconds: 15
    failureThreshold: 10
    periodSeconds: 60
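
To double-check which probe command the chart actually rendered, the scheduler StatefulSet can be inspected directly. A minimal sketch, assuming a release named yc-data-airflow (matching the pod name shown below), the chart's default container name "scheduler", and an "airflow" namespace; adjust all three to your deployment:

# Print the rendered livenessProbe of the scheduler container
kubectl -n airflow get statefulset yc-data-airflow-scheduler \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="scheduler")].livenessProbe}'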

Docker Image customisations

Here’s the image we use, based on the official apache/airflow image:

### Main official airflow image
FROM apache/airflow:2.1.2-python3.8

USER root

RUN apt update

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

### Add OS packages here
## GCC compiler in case it's needed for installing python packages
RUN apt install -y -q build-essential

USER airflow

### Changing the default SSL / TLS mode for mysql client to work properly
## https://askubuntu.com/questions/1233186/ubuntu-20-04-how-to-set-lower-ssl-security-level
## https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541
## https://stackoverflow.com/questions/61649764/mysql-error-2026-ssl-connection-error-ubuntu-20-04
RUN echo $'openssl_conf = default_conf\n\
[default_conf]\n\
ssl_conf = ssl_sect\n\
[ssl_sect]\n\
system_default = ssl_default_sect\n\
[ssl_default_sect]\n\
MinProtocol = TLSv1\n\
CipherString = DEFAULT:@SECLEVEL=1' >> /home/airflow/.openssl.cnf
## OS env var to point to the new openssl.cnf file
ENV OPENSSL_CONF=/home/airflow/.openssl.cnf

### Add airflow providers
RUN pip install apache-airflow-providers-apache-beam
### End airflow providers

### Add extra python packages
RUN pip install python-slugify==3.0.3
### End extra python packages
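
A quick, hedged way to sanity-check the image build and the OPENSSL_CONF override before deploying; the tag custom-airflow:2.1.2 is only an illustrative name:

# Build the custom image from the Dockerfile above
docker build -t custom-airflow:2.1.2 .
# Confirm the env var is set and the config file was written
docker run --rm custom-airflow:2.1.2 bash -c 'echo "$OPENSSL_CONF" && cat "$OPENSSL_CONF"'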

What happened

After upgrading to helm chart 1.6.0, the scheduler pod kept restarting because the livenessProbe was failing.

The command used by the new livenessProbe in helm chart 1.6.0, tested directly on our scheduler pod:

airflow@yc-data-airflow-scheduler-0:/opt/airflow$ CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
> airflow jobs check --job-type SchedulerJob --hostname $(hostname)
No alive jobs found.

Removing the --hostname argument works, and an alive job is found:

airflow@yc-data-airflow-scheduler-0:/opt/airflow$ CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
> airflow jobs check --job-type SchedulerJob
Found one alive job.
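
The mismatch can be narrowed down by comparing the identifiers the probe could pass against what the scheduler registered. A rough diagnostic sketch to run inside the scheduler pod (assumptions: the scheduler records its hostname via the hostname_callable / AIRFLOW__CORE__HOSTNAME_CALLABLE setting, and the check only succeeds when --hostname matches that stored value):

# What the failing probe passes
hostname
# Fully-qualified name, if the headless service / pod subdomain resolves
hostname -f
# Pod IP
hostname -i

# Re-run the check with each candidate to see which one matches the
# value the SchedulerJob stored in the metadata database
airflow jobs check --job-type SchedulerJob --hostname "$(hostname -f)"
airflow jobs check --job-type SchedulerJob --hostname "$(hostname -i)"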

What you think should happen instead

The livenessProbe should not fail, and the scheduler pod should not be restarted.

How to reproduce

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!


Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 2
  • Comments: 14 (4 by maintainers)

Top GitHub Comments

3 reactions
ilyadinaburg commented, Aug 17, 2022

Hey @fernhtls, I think the issue might not be directly related to AIRFLOW__CORE__HOSTNAME_CALLABLE. This may be obvious, but per your findings it is related to the pod hostname not being resolved. As a workaround, $(hostname -i) may be used instead of $(hostname) in the liveness probe command.
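
One way to pair that workaround with chart values is to make the scheduler register itself under the pod IP, so a probe that passes $(hostname -i) matches what is stored in the jobs table. This is a hedged sketch, assuming the chart's config block is rendered into airflow.cfg and that airflow.utils.net.get_host_ip_address is available in your Airflow version; verify both against your chart and Airflow docs before relying on it:

# values.yaml (sketch)
config:
  core:
    # Register scheduler jobs under the pod IP instead of the FQDN
    hostname_callable: airflow.utils.net.get_host_ip_address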

2 reactions
VladimirYushkevich commented, Oct 10, 2022

I have the same issue with Airflow 2.4.1 when checking liveness on the dag-processor pod. The default probe is not working for me:

dag-processor-ccb9f9949-7zdtv:/opt/airflow$ sh -c "CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --hostname $(hostname)"
No alive jobs found.

nor does it work with $(hostname -i):

dag-processor-ccb9f9949-7zdtv:/opt/airflow$ sh -c "CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --hostname $(hostname -i)"
No alive jobs found.

It only works after removing the --hostname flag:

dag-processor-ccb9f9949-7zdtv:/opt/airflow$ sh -c "CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check"
Found one alive job.