feat(#1424): self.upstream property for pre-restricted ancestor access#1473
feat(#1424): self.upstream property for pre-restricted ancestor access#1473dimitri-yatsenko wants to merge 2 commits into
Conversation
Implements T2.2.b of the provenance trinity. Inside make(), self.upstream exposes a pre-constructed Diagram.trace(self & key) so users can read declared ancestors with provenance-safe, ergonomic syntax. Branch stacked on feat/1423-diagram-trace (#1471) for Diagram.trace(). What's added: - src/datajoint/autopopulate.py: - AutoPopulate._upstream class attribute (default None) — instance storage for the per-make() trace. - AutoPopulate.upstream property — returns the trace if set, raises DataJointError with a clear "only available inside make()" message otherwise. The error includes the fallback pattern (dj.Diagram.trace(self & key)) so the user knows the escape hatch. - In _populate_one, set self._upstream = Diagram.trace(self & dict(key)) immediately before the make() invocation block. Construction is lazy at this layer (graph copy only); the SQL fetch fires when the user accesses self.upstream[T].fetch(...). - The existing `finally` block (line 716) that resets _allow_insert now also resets _upstream to None, so subsequent attribute access raises a clear error rather than silently returning a stale trace from the previous make() call. What's not changed: - make() signature: unchanged — key remains a dict, make_kwargs work as before. self.upstream is a new attribute on self, not a new parameter. - Tripartite make pattern: self._upstream is set once before all three make() invocations, so all three phases see the same upstream view. Tests in tests/integration/test_autopopulate.py (5 new): - test_upstream_provides_pre_restricted_ancestor — basic case: make() reads self.upstream[Subject].fetch1("name") and the value is correctly pre-restricted to the current key. - test_upstream_rejects_non_ancestor — self.upstream[Unrelated] raises DataJointError ("not in this trace"). Inherited from Diagram.__getitem__. - test_upstream_unset_outside_make — accessing the property outside of make() raises with the helpful "only available inside make()" message. - test_upstream_cleared_after_make — after populate() completes, accessing the property on a fresh instance still raises (verifies the finally cleanup; no stale state). - test_upstream_seen_across_tripartite_make — both make_fetch / make_compute / make_insert see the same self.upstream value. Full regression: 17/17 autopopulate tests pass on MySQL. Slated for DataJoint 2.3. Blocked on #1471 (Diagram.trace) merging; T2.2.c (strict_provenance) is stacked on this PR.
Implements T2.2.c of the provenance trinity, completing the trio (Diagram.trace → self.upstream → strict_provenance). When dj.config["strict_provenance"] = True, runtime gates enforce the upstream-only convention inside make(): - Reads must target a table in the active trace's allowed set (declared ancestors + self + self's Parts). - Writes must target self or self's Parts. - Inserted rows' PK columns that overlap with the current key must equal the key's values (key-consistency rule). Default is False. Existing make() bodies are unaffected. Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace (#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase onto master after the chain merges. What's added: - src/datajoint/provenance.py (new): the runtime context module. - `_active_strict_make` ContextVar holding (target, allowed_tables, key) for the currently-executing make() invocation. ContextVar chosen over threading.local to propagate correctly across contextvars-aware concurrency boundaries. - `push_strict_make_context` / `pop_strict_make_context` — context lifecycle managed by `_populate_one`'s try/finally. - `assert_read_allowed(query_expression)` — read gate. Recursively discovers base tables via the QueryExpression's `_support` chain and checks each against the allowed set. - `assert_write_allowed(target_table, rows)` — write gate. Verifies the target is self or one of self's Part tables, and checks the key-consistency rule on each dict row. - src/datajoint/settings.py: new `strict_provenance: bool` field on Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING entry. - src/datajoint/autopopulate.py: in `_populate_one`, push the strict context (when the flag is on) just before the make() invocation block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name} ∪ {self's Parts}. Pop in the existing `finally` block. - src/datajoint/expression.py: `QueryExpression.cursor` now calls `assert_read_allowed(self)` before issuing SQL. No-op outside make(). - src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)` after the existing `_allow_insert` check. No-op outside make(). Part-table detection uses class `__dict__` traversal (filtered to Part subclasses) instead of `dir/getattr` to avoid triggering the `_JobsDescriptor` (which would lazy-declare ~~table inside the populate transaction — caught by the first test iteration). Documented limitation (deferred): the read gate does not distinguish reads that came through `self.upstream` from reads of the same ancestor via a direct expression. Both are allowed if the table is in the allowed set. The intent is to catch reads from *undeclared* dependencies; tightening the "must come through self.upstream" path requires propagating an attribution marker through QueryExpression composition and is left for a follow-up release. Tests in tests/integration/test_strict_provenance.py (6 new): - test_strict_compliant_make_passes — make() reading via self.upstream and writing self.insert1 with matching key runs cleanly under strict. - test_strict_blocks_read_from_undeclared_table — read from an unrelated table raises with "strict_provenance ... undeclared" message. - test_strict_blocks_write_to_other_table — insert into a non-self, non-Part target raises "not permitted". - test_strict_blocks_write_with_mismatched_key — row PK that disagrees with the current key raises "does not match the current make() key". - test_strict_writes_to_part_table_pass — self.PartName.insert(...) works. - test_strict_off_by_default_no_change — default-off regression check; the canonical "direct (Ancestor & key).fetch1()" pattern still works when strict_provenance is unset. Regression: 17/17 autopopulate tests pass with strict_provenance unset (default). 6/6 new strict tests pass with strict_provenance=True. 8/8 trace tests + 9/9 cascade tests unaffected. Slated for DataJoint 2.3.
86b1ae7 to
7e4130f
Compare
|
The logic here is sound. One gap worth closing: the two guarantees this design most needs to prove are effectively untested.
Suggest a test that captures the populate instance ( |
MilagrosMarin
left a comment
There was a problem hiding this comment.
Verified 7e4130f against the stacked parent (origin/feat/1423-diagram-trace). Isolated diff: +51 in src/datajoint/autopopulate.py, +169 in tests/integration/test_autopopulate.py. Concur with @ttngu207 — logic sound; test coverage under-proves the design.
Verified ✅
- Instance-level
_upstreamwith class-levelNonedefault. Set inside_populate_one(autopopulate.py:658) right beforemake(), cleared in thefinallyblock (autopopulate.py:719). Instance scope (notself.__class__) — no leakage across workers or keys. - Non-mutating seed.
Diagram.trace(self & dict(key))—dict(key)copies the mapping, so the caller'skeyis not touched. Small but correct. - Property gives an actionable error message outside
make()including the escape-hatch pattern (dj.Diagram.trace(self & key)). - Lazy at this layer. Only the graph copy happens in
_populate_one; SQL fetch fires when the user accessesself.upstream[T].fetch(...).make()bodies that never touchself.upstreampay nothing.
Concerns (concurring with @ttngu207 + additions)
-
test_upstream_cleared_after_makeis toothless. It probesDerived().upstreamon a fresh instance whose_upstreamwas never set — falls through to the class default (None) and would pass even if thefinally: self._upstream = Noneline were deleted. To actually exercise the cleanup, capture the populate instance:inst = Derived(); inst.populate(); with pytest.raises(...): inst.upstream. -
Exception-path cleanup has zero coverage. The whole reason
_upstream = Nonelives infinally(rather than aftermake()) is to survive an exception. No test forcesmake()to raise. Add one that raises mid-body and assertsself.upstreamis unset on the populate instance. -
Tripartite test doesn't verify tripartite.
test_upstream_seen_across_tripartite_makereadsself.upstream[Source]only inmake_fetch;make_computeandmake_insertnever touch it. The docstring claims "all three phases see the same self.upstream value" — that's not what the test verifies. Anid(self._upstream)assertion (orischeck) across the three phases would prove the "constructed once, shared across phases" property. -
Non-ancestor rejection tests only the class-based lookup.
test_upstream_rejects_non_ancestorusesself.upstream[Unrelated](class). The string-basedFreeTablepath (self.upstream["schema.Unrelated"]) is a different branch inDiagram.__getitem__— the message overlap betweendiagram.py:609anddiagram.py:619on the parent branch means the current substring assertion doesn't distinguish which path raised. A second string-based assertion would round out the coverage. -
#1471 coverage gaps carry through here.
self.upstreamis a thin wrapper overDiagram.trace, and the four gaps flagged during #1471 review (multi-hop diamond OR-convergence, cross-schema ancestors, upward Part-of-Part chains, isolated non-primary FK) remain uncovered at this layer too. Reasonable to defer to a follow-up.
Bottom line
Implementation is correct as far as the code shows. Items #1 and #2 would be trivial additions and would meaningfully strengthen the two guarantees the design most needs to prove. #3–#5 are nice-to-have. Approving with the same non-blocking stance as @ttngu207.
Summary
T2.2.b of the provenance trinity. Inside `make()`, `self.upstream` exposes a pre-constructed `Diagram.trace(self & key)` so user code can read declared ancestors with provenance-safe, ergonomic syntax.
Closes #1424. Slated for DataJoint 2.3.
Branch dependency
Stacked on `feat/1423-diagram-trace` (#1471). This PR's base is `feat/1423-diagram-trace`, not `master`. Will rebase onto master once #1471 (and its parent #1468) merge.
What's added
Design notes
Tests
Sequencing
Once this PR merges, T2.2.c (`strict_provenance` config flag, the runtime gate) goes next. It branches off this PR.
Test plan