airflow db clean is unable to delete the table rendered_task_instance_fields
Apache Airflow version
Other Airflow 2 version
What happened
Hi All,
When I run the below command in Airflow 2.3.4:
airflow db clean --clean-before-timestamp '2022-09-18T00:00:00+05:30' --yes
I receive a warning that reports an error:
[2022-09-20 10:33:30,971] {db_cleanup.py:302} WARNING - Encountered error when attempting to clean table 'rendered_task_instance_fields'.
All other tables, such as log, dag, and xcom, are cleaned properly. In my analysis, rendered_task_instance_fields was the 5th largest table by row count in the DB, so the impact of its data size is significant.
Inspecting the table on a PostgreSQL 13 database, I found that rendered_task_instance_fields has no timestamp column recording when each entry was inserted.
Thus, the code has no way to filter out older records and delete them.
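To illustrate the point above, here is a minimal sketch of a timestamp-based cleanup. It is not Airflow's actual db_cleanup.py code. The helper names, the candidate column list, and the simplified table schemas are all assumptions made for the example, and it uses an in-memory SQLite database instead of PostgreSQL so it can run standalone.

```python
import sqlite3

# Hypothetical list of column names a cleanup routine might filter on.
CANDIDATE_COLUMNS = ("created_at", "execution_date", "dttm", "timestamp")


def find_recency_column(conn, table):
    """Return the first candidate timestamp column present in `table`, or None."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name in CANDIDATE_COLUMNS:
        if name in columns:
            return name
    return None


def clean_before(conn, table, cutoff_iso):
    """Delete rows older than `cutoff_iso`.

    Returns the number of deleted rows, or None when the table has no
    timestamp column to filter on.
    """
    column = find_recency_column(conn, table)
    if column is None:
        return None
    cur = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff_iso,))
    conn.commit()
    return cur.rowcount


# Simplified stand-ins for the real tables (assumed schemas).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, dttm TEXT)")
conn.execute(
    "CREATE TABLE rendered_task_instance_fields "
    "(dag_id TEXT, task_id TEXT, rendered_fields TEXT)"
)
conn.executemany(
    "INSERT INTO log (dttm) VALUES (?)",
    [("2022-09-10T00:00:00",), ("2022-09-19T00:00:00",)],
)
conn.execute("INSERT INTO rendered_task_instance_fields VALUES ('d', 't', '{}')")

log_deleted = clean_before(conn, "log", "2022-09-18T00:00:00")
rtif_deleted = clean_before(conn, "rendered_task_instance_fields", "2022-09-18T00:00:00")
```

In this sketch the log table loses its one old row, while the call for rendered_task_instance_fields returns None because there is nothing to compare the cutoff against.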
What you think should happen instead
A timestamp column should be added to the table rendered_task_instance_fields, on the basis of which older records can be deleted.
How to reproduce
Run the command below in Airflow 2.3.4 and check the output.
airflow db clean --clean-before-timestamp '2022-09-18T00:00:00+05:30' --yes
Operating System
Ubuntu 20.04 LTS
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==5.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-google==8.3.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-mongo==3.0.0
apache-airflow-providers-mysql==3.2.0
apache-airflow-providers-slack==5.1.0
apache-airflow-providers-sqlite==3.2.1
Deployment
Virtualenv installation
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:12 (7 by maintainers)
@jedcunningham Alright, will do the same tomorrow.
The column needs to be added, and a migration needs to be written to add it via Alembic (and likely set it to the current time for all existing records). You can see how migrations are done in the `airflow/migrations` folder, and there is a chapter about adding migrations in https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#metadata-database-updates (you can also look at past commits to that folder to get a feeling for what such changes should look like).
And we will review the PR, so if anything is missing, we will help with guidance 😃
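As a rough illustration of the two steps described above (add the column, then set it to the current time for all existing records), here is a standalone sketch using Python's sqlite3 module rather than Alembic. The column name created_at, the upgrade function, and the simplified table schema are assumptions for the example. In an actual Alembic migration the same steps would typically be expressed with op.add_column and an UPDATE issued through op.execute.

```python
import sqlite3
from datetime import datetime, timezone

TABLE = "rendered_task_instance_fields"
NEW_COLUMN = "created_at"  # hypothetical column name, not taken from Airflow


def upgrade(conn):
    """Add a nullable timestamp column and backfill it with the current time."""
    now = datetime.now(timezone.utc).isoformat()
    # Step 1: add the column (nullable, so existing rows remain valid).
    conn.execute(f"ALTER TABLE {TABLE} ADD COLUMN {NEW_COLUMN} TEXT")
    # Step 2: backfill every existing row with the migration time.
    conn.execute(f"UPDATE {TABLE} SET {NEW_COLUMN} = ?", (now,))
    conn.commit()
    return now


# Simplified stand-in for the real table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} (dag_id TEXT, task_id TEXT, rendered_fields TEXT)"
)
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
    [("dag_a", "task_1", "{}"), ("dag_a", "task_2", "{}")],
)

backfilled_at = upgrade(conn)
rows = conn.execute(f"SELECT dag_id, task_id, {NEW_COLUMN} FROM {TABLE}").fetchall()
```

One consequence of backfilling with the current time is that records that existed before the migration all look new, so a timestamp-based cleanup would only start removing them once the cutoff passes the migration time.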