Skip FK-referenced dag_version rows during db clean #68339
Open
ephraimbuddy wants to merge 1 commit into
Pull request overview
This PR improves airflow db clean behavior for dag_version by preventing deletion attempts of rows that are still referenced by task_instance.dag_version_id (an ON DELETE RESTRICT FK), allowing cleanup to make progress by pruning only orphaned, older DAG versions.
Changes:
- Add a generic skip_if_referenced (and referenced_pk_column) option to db cleanup table configuration, implemented as a correlated NOT EXISTS filter in _build_query().
- Apply this option to dag_version to skip versions still referenced by task_instance.dag_version_id.
- Add a unit regression test to ensure pinned dag_version rows are skipped while orphaned old versions are deleted.
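As a rough illustration of the first two changes, the sketch below builds a selection query with an optional correlated NOT EXISTS clause. It is not the code from this PR: `CleanupTable` and `build_select` are hypothetical names, the query is raw SQL run through the standard-library `sqlite3` module rather than whatever `_build_query()` actually emits, and the "keep the latest version per DAG" rule is left out to keep the example short.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupTable:
    """Hypothetical stand-in for one entry of the cleanup table config."""

    table_name: str
    recency_column: str
    # (referencing_table, fk_column) whose rows pin a row in this table, if any.
    skip_if_referenced: tuple[str, str] | None = None
    # Column on this table that the foreign key points at.
    referenced_pk_column: str = "id"


def build_select(cfg: CleanupTable) -> str:
    """Return SQL selecting deletable keys older than the :cutoff bind parameter."""
    sql = (
        f"SELECT base.{cfg.referenced_pk_column} FROM {cfg.table_name} AS base "
        f"WHERE base.{cfg.recency_column} < :cutoff"
    )
    if cfg.skip_if_referenced is not None:
        ref_table, fk_column = cfg.skip_if_referenced
        # Correlated subquery: checked once per candidate row of `base`.
        sql += (
            f" AND NOT EXISTS (SELECT 1 FROM {ref_table} AS ref "
            f"WHERE ref.{fk_column} = base.{cfg.referenced_pk_column})"
        )
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE task_instance (id INTEGER PRIMARY KEY, dag_version_id INTEGER);
    INSERT INTO dag_version VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2025-06-01');
    INSERT INTO task_instance VALUES (10, 1);  -- pins dag_version 1
    """
)

cfg = CleanupTable(
    "dag_version",
    "created_at",
    skip_if_referenced=("task_instance", "dag_version_id"),
)
rows = conn.execute(build_select(cfg), {"cutoff": "2025-01-01"}).fetchall()
deletable_ids = sorted(row[0] for row in rows)
```

In this toy data set, version 1 is old but still referenced, version 2 is old and orphaned, and version 3 is newer than the cutoff, so only version 2 is selected. Keeping the option as a plain `(table, fk_column)` pair is one way to make the filter reusable for other tables, which appears to be the intent of calling it generic.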
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/utils/db_cleanup.py | Adds skip_if_referenced filtering to the cleanup query builder and configures dag_version to use it. |
| airflow-core/tests/unit/utils/test_db_cleanup.py | Adds a regression test verifying dag_version cleanup skips FK-pinned versions but prunes orphaned old ones. |
a9023cd to 5a53021
5a53021 to bec56dc
airflow db clean on the dag_version table selected old, non-latest versions for deletion regardless of whether they were still referenced. Because task_instance.dag_version_id is ON DELETE RESTRICT, deleting a version still referenced by a task instance fails the foreign key, so the command could not prune dag_version at all for any DAG with history.
Add a generic skip_if_referenced option to the cleanup table config that excludes rows still referenced by a given (table, fk_column) via a correlated NOT EXISTS, and apply it to dag_version for task_instance.dag_version_id. Cleanup now prunes only orphaned older versions and makes progress as task instances age out and are cleaned.
related: #66177
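The failure mode and the fix described above can be reproduced in miniature. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database with a simplified two-table schema (the `version_number` column and the `delete_unreferenced` helper are illustrative, not taken from Airflow), and it approximates "old, non-latest" as "version number below the latest".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE dag_version (id INTEGER PRIMARY KEY, dag_id TEXT, version_number INTEGER);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_version_id INTEGER REFERENCES dag_version (id) ON DELETE RESTRICT
    );
    INSERT INTO dag_version VALUES (1, 'example', 1), (2, 'example', 2), (3, 'example', 3);
    INSERT INTO task_instance VALUES (100, 1);  -- still references version 1
    """
)
LATEST = 3

# Behaviour before the change: try to delete every non-latest version.
try:
    with conn:  # rolls back on error
        conn.execute("DELETE FROM dag_version WHERE version_number < ?", (LATEST,))
    naive_delete_failed = False
except sqlite3.IntegrityError:
    # The RESTRICT foreign key rejects the statement, so nothing is pruned.
    naive_delete_failed = True


def delete_unreferenced(connection: sqlite3.Connection, latest: int) -> int:
    """Delete non-latest versions that no task_instance row references."""
    with connection:
        cur = connection.execute(
            "DELETE FROM dag_version WHERE version_number < ? "
            "AND NOT EXISTS (SELECT 1 FROM task_instance AS ti "
            "WHERE ti.dag_version_id = dag_version.id)",
            (latest,),
        )
    return cur.rowcount


first_pass = delete_unreferenced(conn, LATEST)  # removes orphaned version 2 only

# Simulate the referencing task instance aging out and being cleaned.
with conn:
    conn.execute("DELETE FROM task_instance WHERE id = 100")

second_pass = delete_unreferenced(conn, LATEST)  # version 1 is now unpinned
remaining = [row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")]
```

The unfiltered delete fails as a whole because one referenced row is enough to violate the constraint, which matches the "could not prune dag_version at all" symptom. With the NOT EXISTS filter each pass removes whatever has become unreferenced since the previous run.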
Was generative AI tooling used to co-author this PR?
Generated-by: claude opus 4.8 following the guidelines