line-log: range-scope stat, check, and -G under -L#2152
Open
mmontalbo wants to merge 7 commits into
Open
Conversation
7977925 to
2ea00b6
Compare
b587ea9 to
a1b3ff0
Compare
The line-range filter that mm/line-log-cleanup added uses names that
obscure its model. The cursors lno_post/lno_pre and the index lno_0
share an lno_ prefix but conflate the pre/post-image axis with the
0-based/1-based axis, the hunk state is a flat set of rhunk_* fields,
and the filter-state pointer is just s.
The filter bridges two layers of diff.c, and its fields already used
each layer's vocabulary, but in cryptic abbreviations. Spell them out
to the form the rest of the file uses, so that the patches that follow
can simplify and fix it with those clearer names in place:
- lno_post/lno_pre -> lno_in_postimage/lno_in_preimage, the
line-number cursors, matching the counters in struct emit_callback
- lno_0 -> idx_in_postimage, the 0-based range index
- the hunk-header geometry stays old/new (old_begin, new_begin, and
counts) to match the xdiff_emit_hunk_fn callback and the
"@@ -<old> +<new> @@" header it feeds, but moves from flat rhunk_*
fields into a "hunk" sub-struct, so accesses read
filter->hunk.old_begin
- flush_rhunk -> flush_range_hunk
- the filter-state pointer in each callback: s -> filter
Also rename the struct line_range_callback to line_range_filter: it is
a filter over xdiff output, not merely a callback.
No behavior change.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
d7b39a5 to
2bc09ca
Compare
The filter buffered '-' lines in a pending_rm strbuf, deferring their classification until a '+' or ' ' line revealed the post-image position. That buffering is unnecessary: a removal occupies no post-image line, so it does not advance lno_in_postimage, and xdiff emits removals before additions within a change. A '-' therefore arrives while lno_in_postimage already holds the index the following '+'/' ' will occupy, and can be classified against the ranges as it arrives. The buffering also hid a bug: flush_range_hunk() drained pending_rm into the range hunk whenever the hunk was active, even after lno_in_postimage had advanced past the tracked range, so a deletion just after the tracked function leaked into the patch. Classifying each line as it arrives removes the pending_rm buffer, the discard_pending_rm() helper, three struct fields, and makes that bug impossible by construction. With every line classified on arrival, the buffered lines are the hunk's single source of truth, so the old/new counts need not be kept alongside them: flush_range_hunk() derives the counts (and whether the hunk holds any change) from the buffer when it builds the header. Drop the per-line counting and the old_count, new_count, and has_changes fields; there is no longer a second tally that could fall out of sync with the buffer. Add begin_range_hunk() to open the accumulator at the first in-range line, seeding both begins from the live image cursors, as the counterpart to flush_range_hunk(). With the counting gone too, line_range_line_fn() now only appends an in-range line. Document the coordinate model: a block comment on struct line_range_filter states it (the pre/post-image cursors, the 0-based idx_in_postimage, removals classified by the following line) with a worked example. Add tests for the leaked trailing deletion this fixes, the symmetric leading-deletion case, and the filter's range boundaries (a change at the first and last line of a range, and a pure in-range deletion). Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
The line-range filter builds its own "@@ -<old> +<new> @@" header for
each range hunk. For a side with no lines (count 0, such as the old
side of a pure insertion), the begin should be the number of the line
before the change, per the convention git diff and xdl_emit_hunk_hdr()
follow. The hand-rolled code's begin was one too high; in t4211 this
produced
@@ -25,0 +18,9 @@
an old begin of 25 in a 24-line file, where git diff would give 24.
Stop hand-rolling the header. flush_range_hunk() now formats it through
xdiff's own emitter: a new xdiff_emit_hunk_header() helper wraps
xdl_emit_hunk_hdr(), the function that produces every other diff's hunk
headers. The count-0 begin is then correct by construction, and as a
side effect -L headers match git diff exactly, including its omission of
a count of 1 ("@@ -22 +22 @@" rather than "@@ -22,1 +22,1 @@").
xdiff's hunk callback already hands line_range_hunk_fn() a count-0 begin
decremented, so undo that when seeding the cursors and let the formatter
re-apply the convention once, at emit time.
The off-by-one predates this series, and the two regenerated fixtures
reach it from different origins: no-assertion-error has carried it since
its test was added in ab60c69 (line-log: fix assertion error,
2025-08-18), while vanishes-early acquired it when 86e986f (line-log:
route -L output through the standard diff pipeline) reshaped its tracked
line into a pure insertion. vanishes-early also drops its count-1
counts.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_diff() open-codes the line-range filter setup and teardown around its xdi_diff_outf() call: zero the struct, point it at the output callback, inflate ctxlen to the largest range span so each range yields a single xdiff hunk, run the diff, flush the trailing range hunk, and release the buffer. The upcoming -L stat and check formats need the same sequence. Extract line_range_filter_init() for the setup and a line_range_filter_diff() helper that prepares the xdiff config the filter needs, runs an initialized filter through xdi_diff_outf(), flushes the final range hunk, and releases it, returning the latched error. The helper inflates ctxlen to the largest range span so each range yields a single xdiff hunk, and clears XDL_EMIT_NO_HUNK_HDR so the hunk headers the filter seeds its position from are always emitted. Folding both into the helper keeps these invariants, which the filter's position tracking relies on, in a single place for every consumer. builtin_diff() now does init + line_range_filter_diff(); the next two patches reuse them in builtin_diffstat() and builtin_checkdiff() instead of repeating the boilerplate. No behavior change: builtin_diff() leaves XDL_EMIT_NO_HUNK_HDR unset, so clearing it is a no-op until the suppressing consumers arrive. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Reuse the line_range_filter in builtin_diffstat() to produce
range-scoped statistics. When a filepair carries line_ranges, the
filter wraps diffstat_consume() as its output callback, forwarding only
in-range lines for counting. flush_range_hunk() replays buffered
content through diffstat_consume(), which ignores synthetic @@ headers
since it only counts '+' and '-' lines.
Expand the output format allowlist in setup_revisions() to accept
--stat, --numstat, and --shortstat with -L.
Leave --dirstat out of the allowlist so it is rejected like any other
unsupported format. Its default mode counts each file's whole-file
byte damage via diffcore_count_changes(), outside the line-based
pipeline that the -L filter scopes, so bare --dirstat cannot honor the
tracked range. The --dirstat=lines mode could: it aggregates the same
per-file line counts as --numstat, which -L already scopes. But
accepting only that sub-mode while bare --dirstat keeps erroring is a
confusing split, so the whole format is deferred to a follow-up;
--numstat already reports the exact range-scoped per-file counts.
Also drop "yet" from the generic -L rejection message ("does not
yet support the requested diff format"). Some rejected formats do
not fit a line range at all, so "yet" wrongly implied they are all
just awaiting support.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_checkdiff() runs its own xdiff pass to detect whitespace errors in newly added lines. When -L is active, the check should be scoped to the tracked line ranges rather than the whole file. Reuse the line_range_filter to wrap checkdiff_consume(), the same pattern already used for patch output and diffstat. The filter forwards only in-range lines for whitespace checking. checkdiff reports the file line number of each error, which it normally learns from the hunk header via checkdiff_consume_hunk(). The filter synthesizes its own hunk headers, so give it an optional hunk callback and route checkdiff_consume_hunk() through it; this sets the post-image position before the in-range lines are replayed. Without it the reported line numbers would count from the start of the range hunk rather than the start of the file. The trailing blank-at-eof check is a second pass that scans the whole file via check_blank_at_eof(), so gate its report on the tracked ranges as well; otherwise a blank line added at end of file is reported even when it lies outside the range. Add DIFF_FORMAT_CHECKDIFF to the -L output format allowlist in setup_revisions() so that -L --check is accepted, and list --check among the supported formats in the documentation. Add tests covering that whitespace errors are reported, scoped to the tracked range, and labeled with the correct file line number, including when two errors in one range are separated by a gap that would otherwise split into multiple xdiff hunks. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
git log -L scopes its diff output to the tracked range, but pickaxe (-S, -G) still runs in diffcore over the whole-file change, so -L -G selects a commit whenever the pattern appears in any added or removed line of the file, even outside the tracked range. Teach -G to honor the range. diff_grep() already runs an xdiff pass and greps the +/- lines; route that pass through the line-range filter so only the tracked range's lines are grepped. Expose the filter as diff_emit_line_ranges(), a line-range-scoped xdi_diff_outf(), thread the filepair's line_ranges through the pickaxe callback, and pass it from pickaxe_match(). Skip scoping under textconv, whose output is not in the original file's line coordinates. -G needs only a hit/no-hit answer, so the line-number concerns the filter handles for patch and check output do not apply here. -S is left matching the whole file: it counts needle occurrences per blob rather than grepping the diff, so scoping it needs a different approach, left to a follow-up. has_changes() takes the range parameter but ignores it for now. Document the resulting -L pickaxe scoping: -G is range-scoped, while -S still matches the whole file. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
2bc09ca to
f69ccfb
Compare
Author
|
/preview |
|
Preview email sent as pull.2152.git.1781800932.gitgitgadget@gmail.com |
Author
|
/submit |
|
Submitted as pull.2152.git.1781806593.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This series extends
git log -Lso that more of its diff output andcommit selection honor the tracked line ranges: the diff stat formats
and --check are supported with range-scoped results, and the -G pickaxe
is scoped to the tracked range.
It builds on top of the mm/line-log-cleanup topic [1], which integrated
-L with the standard log output pipeline and taught it the non-patch
formats --raw, --name-only, --name-status, and --summary.
With these patches the following also honor the tracked range:
--stat, --numstat, --shortstat: counts cover only the lines inside
the tracked range, not the whole file.
--check: whitespace errors are reported only for added lines inside
the tracked range, with the correct file line numbers.
-G: a commit is selected only when the pattern appears on an
added/removed line inside the tracked range, rather than anywhere in
the file.
The --dirstat format is deliberately rejected. Its default mode
reports each directory's share of the total churn as a percentage,
computed from whole-file byte damage (via diffcore_count_changes(),
outside the line-based pipeline that -L scopes), so bare --dirstat
cannot honor the tracked range. The --dirstat=lines mode could: it
aggregates the same per-file line counts as --numstat, which -L
already scopes. But supporting only that sub-mode while bare
--dirstat still errors is a confusing split, so the whole format is
left to a follow-up; --numstat already gives the exact range-scoped
per-file counts.
-S is left matching the whole file. Unlike -G, it counts needle
occurrences per blob rather than grepping the diff, so scoping it to a
range needs a different approach; that is left to a follow-up. Patch 7,
which scopes -G, also updates the -L documentation to note the -S/-G
distinction, so the whole-file behavior of -S is not mistaken for the
range-scoped behavior around it.
Patches 1-3 are independent of the new formats: they fix two bugs in
the existing -L patch output (a leaked deletion and an off-by-one hunk
header), bring its hunk headers in line with git diff's format, and
clarify the line-range filter mm/line-log-cleanup added, whose names
obscured its model (cryptic lno_ cursors conflating the pre/post-image
and 0/1-based axes, a flat hunk-state struct, and a one-letter state
pointer (s)). The two bugs may be a hint that the model could use
clarification, so patch 1 renames and groups the filter state and
patch 2 documents the model, before the fixes that read against it.
Patches 4-7 then build the new formats on top:
Patch 1: rename and group the filter for clarity. Spell the
cryptic names out to the file's own forms: the line-number cursors
to lno_in_preimage/lno_in_postimage (as in struct emit_callback)
and the range index to idx_in_postimage, while the hunk geometry
stays old/new (the xdiff_emit_hunk_fn convention) and moves into a
sub-struct. Name the filter pointer (filter) and rename the struct
to line_range_filter and the flush helper to flush_range_hunk. No
behavior change.
Patch 2: simplify the filter by classifying removals as they arrive,
dropping the pending_rm buffer and a latent flush_range_hunk() bug
that leaked deletions just past the range. Make the buffered lines
the hunk's single source of truth: flush_range_hunk() derives the
counts from them rather than tracking them per line, dropping three
more fields. Document the model with a block comment and worked
example, and add begin_range_hunk() as the counterpart to
flush_range_hunk(). (This simplification was submitted by itself
previously [2] but did not advance, so it is re-included here.)
Patch 3: stop hand-rolling the synthetic hunk header and emit it
through xdiff's own formatter via a new xdiff_emit_hunk_header()
helper. The hand-rolled code put a count-0 side's begin one too
high (the convention is the line before the change); routing
through xdl_emit_hunk_hdr() fixes that by construction and, as a
side effect, makes -L headers match git diff exactly, including its
omission of a count of 1. Regenerate the two affected fixtures.
Patch 4: extract a line_range_filter_diff() helper that folds the
filter's two preconditions into one place: inflate ctxlen to the
largest range span so every change within a range lands in a single
xdiff hunk, and clear XDL_EMIT_NO_HUNK_HDR so the hunk headers the
filter reads are always emitted (its position tracking relies on
both). It then runs an initialized filter through xdiff, flushes
the final range hunk, and releases it; use it in builtin_diff().
The stat, check, and -G patches that reuse it inherit both.
Patch 5: reuse the filter in builtin_diffstat() for the stat
formats, extend the -L output-format allowlist, and reject --dirstat.
Patch 6: reuse the filter in builtin_checkdiff() and extend the
allowlist for --check. The separate blank-at-eof pass scans the
whole file, so scope its report to the tracked ranges too.
Patch 7: scope -G to the tracked range. Expose the filter as
diff_emit_line_ranges() and grep only the tracked range's lines,
threading the filepair's line_ranges through the pickaxe callback.
-S is left whole-file, and the -L documentation is updated to note
that -G is range-scoped while -S still matches the whole file.
[1] https://lore.kernel.org/git/pull.2094.v3.git.1780001267.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.2099.git.1777230630020.gitgitgadget@gmail.com/
Cc: "D. Ben Knoble" ben.knoble@gmail.com