airflow db clean is unable to delete the table rendered_task_instance_fields


Apache Airflow version

Other Airflow 2 version

What happened

Hi All,

When I run the below command in Airflow 2.3.4:

airflow db clean --clean-before-timestamp '2022-09-18T00:00:00+05:30' --yes

I get an error, reported as a warning, which says:

[2022-09-20 10:33:30,971] {db_cleanup.py:302} WARNING - Encountered error when attempting to clean table 'rendered_task_instance_fields'.

All other tables, such as log, dag, and xcom, are cleaned properly. In my analysis, rendered_task_instance_fields was the fifth-largest table by row count in the DB, so the impact of its data size is significant.

On inspecting the table itself in the PostgreSQL 13 database, I found that rendered_task_instance_fields has no timestamp column that records when an entry was inserted.

https://imgur.com/a/Qys2uwD

Thus, there is no way for the code to filter out older records and delete them.

What you think should happen instead

A timestamp field needs to be added to the table rendered_task_instance_fields, on the basis of which older records can be deleted.
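To illustrate why a timestamp column makes this cleanup possible, here is a minimal sketch using Python's built-in sqlite3 module rather than Airflow's own code or PostgreSQL. The table name rendered_fields_demo, the column name created_at, and the helper delete_older_than are all hypothetical and chosen only for this example; they are not the real Airflow schema or API.

```python
import sqlite3
from datetime import datetime, timezone


def delete_older_than(conn, cutoff_iso):
    """Delete rows whose created_at (epoch seconds) is before the cutoff.

    cutoff_iso is an ISO 8601 string with a UTC offset, in the same form
    as the value passed to --clean-before-timestamp.
    """
    cutoff = datetime.fromisoformat(cutoff_iso).timestamp()
    cur = conn.execute(
        "DELETE FROM rendered_fields_demo WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


# Hypothetical demo table; created_at is the assumed new timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rendered_fields_demo (dag_id TEXT, task_id TEXT, created_at REAL)"
)
rows = [
    ("dag_a", "t1", datetime(2022, 9, 1, tzinfo=timezone.utc).timestamp()),
    ("dag_a", "t2", datetime(2022, 9, 17, 18, 0, tzinfo=timezone.utc).timestamp()),
    ("dag_a", "t3", datetime(2022, 9, 19, tzinfo=timezone.utc).timestamp()),
]
conn.executemany("INSERT INTO rendered_fields_demo VALUES (?, ?, ?)", rows)

deleted = delete_older_than(conn, "2022-09-18T00:00:00+05:30")
remaining = [
    r[0]
    for r in conn.execute("SELECT task_id FROM rendered_fields_demo ORDER BY task_id")
]
```

Without a column like created_at, the WHERE clause has nothing to compare the cutoff against, which is the gap described in this issue.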

How to reproduce

Run the command below in Airflow 2.3.4 and check the output.

airflow db clean --clean-before-timestamp '2022-09-18T00:00:00+05:30' --yes

Operating System

Ubuntu 20.04 LTS

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==5.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-google==8.3.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-mongo==3.0.0
apache-airflow-providers-mysql==3.2.0
apache-airflow-providers-slack==5.1.0
apache-airflow-providers-sqlite==3.2.1

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

1 reaction
sai3563 commented, Sep 23, 2022

@jedcunningham Alright, will do the same tomorrow.

1 reaction
potiuk commented, Sep 22, 2022

The column needs to be added, and a migration needs to be added to create the column via Alembic (and likely set it to the current time for all records). You can see how migrations are done in the `airflow/migrations` folder, and there is a chapter about adding migrations in https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#metadata-database-updates (you can also look at past commits to that folder to get a feel for what such changes should look like).

And we will review the PR, so if anything is missing, we will help with guidance 😃
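As a rough illustration of the two steps described above (add the column, then set it to the current time for all existing records), here is a minimal sketch using Python's built-in sqlite3 module. It is not an Alembic migration and not Airflow's actual change; the table name rtif_demo, the column name created_at, and the helper add_timestamp_column are hypothetical. In a real Alembic migration, the equivalent steps would presumably be written with op.add_column and op.execute.

```python
import sqlite3
import time


def add_timestamp_column(conn, table, column="created_at"):
    """Add a timestamp column if missing and backfill NULLs with the current time.

    Returns the number of rows that were backfilled.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} REAL")
    now = time.time()
    cur = conn.execute(
        f"UPDATE {table} SET {column} = ? WHERE {column} IS NULL", (now,)
    )
    conn.commit()
    return cur.rowcount


# Hypothetical stand-in for a table that has no timestamp column yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtif_demo (dag_id TEXT, task_id TEXT, rendered_fields TEXT)")
conn.executemany(
    "INSERT INTO rtif_demo VALUES (?, ?, ?)",
    [("dag_a", "t1", "{}"), ("dag_a", "t2", "{}")],
)

backfilled = add_timestamp_column(conn, "rtif_demo")
columns = [row[1] for row in conn.execute("PRAGMA table_info(rtif_demo)")]
nulls = conn.execute(
    "SELECT COUNT(*) FROM rtif_demo WHERE created_at IS NULL"
).fetchone()[0]
rerun = add_timestamp_column(conn, "rtif_demo")
```

Backfilling existing rows with the current time means they only become eligible for cleanup once the cutoff passes the migration date, which seems a safe default since their true insertion time is unknown.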
