On 21/03/18 05:59, Jeff King wrote: > This patch fixes a corner case where diff-highlight may > scramble some diffs when combined with --graph. > > Commit 7e4ffb4c17 (diff-highlight: add support for --graph > output, 2016-08-29) taught diff-highlight to skip past the > graph characters at the start of each line with this regex: > > ($COLOR?\|$COLOR?\s+)* > > I.e., any series of pipes separated by and followed by > arbitrary whitespace. We need to match more than just a > single space because the commit in question may be indented > to accommodate other parts of the graph drawing. E.g.: > > * commit 1234abcd > | ... > | diff --git ... > > has only a single space, but for the last commit before a > fork: > > | | | > | * | commit 1234abcd > | |/ ... > | | diff --git > > the diff lines have more spaces between the pipes and the > start of the diff. > > However, when we soak up all of those spaces with the > $GRAPH regex, we may accidentally include the leading space > for a context line. That means we may consider the actual > contents of a context line as part of the diff syntax. In > other words, something like this: > > normal context line > -old line > +new line > -this is a context line with a leading dash > > would cause us to see that final context line as a removal > line, and we'd end up showing the hunk in the wrong order: > > normal context line > -old line > -this is a context line with a leading dash > +new line > > Instead, let's a be a little more clever about parsing the > graph. We'll look for the actual "*" line that marks the > start of a commit, and record the indentation we see there. > Then we can skip past that indentation when checking whether > the line is a hunk header, removal, addition, etc. > > There is one tricky thing: the indentation in bytes may be > different for various lines of the graph due to coloring. > E.g., the "*" on a commit line is generally shown without > color, but on the actual diff lines, it will be replaced > with a colorized "|" character, adding several bytes. We > work around this here by counting "visible" bytes. This is > unfortunately a bit more expensive, making us about twice as > slow to handle --graph output. But since this is meant to be > used interactively anyway, it's tolerably fast (and the > non-graph case is unaffected). > > One alternative would be to search for hunk header lines and > use their indentation (since they'd have the same colors as > the diff lines which follow). But that just opens up > different corner cases. If we see: > > | | @@ 1,2 1,3 @@ > > we cannot know if this is a real diff that has been > indented due to the graph, or if it's a context line that > happens to look like a diff header. We can only be sure of > the indent on the "*" lines, since we know those don't > contain arbitrary data (technically the user could include a > bunch of extra indentation via --format, but that's rare > enough to disregard). > > Reported-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx> > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > contrib/diff-highlight/DiffHighlight.pm | 78 +++++++++++++++---- > .../diff-highlight/t/t9400-diff-highlight.sh | 28 +++++++ > 2 files changed, 91 insertions(+), 15 deletions(-) > > diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm > index e07cd5931d..536754583b 100644 > --- a/contrib/diff-highlight/DiffHighlight.pm > +++ b/contrib/diff-highlight/DiffHighlight.pm > @@ -21,34 +21,82 @@ my $RESET = "\x1b[m"; > my $COLOR = qr/\x1b\[[0-9;]*m/; > my $BORING = qr/$COLOR|\s/; > > -# The patch portion of git log -p --graph should only ever have preceding | and > -# not / or \ as merge history only shows up on the commit line. > -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/; > - > my @removed; > my @added; > my $in_hunk; > +my $graph_indent = 0; > > our $line_cb = sub { print @_ }; > our $flush_cb = sub { local $| = 1 }; > > -sub handle_line { > +# Count the visible width of a string, excluding any terminal color sequences. > +sub visible_width { > local $_ = shift; > + my $ret = 0; > + while (length) { > + if (s/^$COLOR//) { > + # skip colors > + } elsif (s/^.//) { > + $ret++; > + } > + } > + return $ret; > +} > + > +# Return a substring of $str, omitting $len visible characters from the > +# beginning, where terminal color sequences do not count as visible. > +sub visible_substr { > + my ($str, $len) = @_; > + while ($len > 0) { > + if ($str =~ s/^$COLOR//) { > + next > + } > + $str =~ s/^.//; > + $len--; > + } > + return $str; > +} > + > +sub handle_line { > + my $orig = shift; > + local $_ = $orig; > + > + # match a graph line that begins a commit > + if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space > + $COLOR?\*$COLOR?[ ] # a "*" with its trailing space > + (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|" > + [ ]* # trailing whitespace for merges Hi Peff, thanks for looking at this. I've only had a quick look through but I wonder if this will be confused by commit messages that contain * bullet points * like this Best Wishes Phillip > + /x) { > + my $graph_prefix = $&; > + > + # We must flush before setting graph indent, since the > + # new commit may be indented differently from what we > + # queued. > + flush(); > + $graph_indent = visible_width($graph_prefix); > + > + } elsif ($graph_indent) { > + if (length($_) < $graph_indent) { > + $graph_indent = 0; > + } else { > + $_ = visible_substr($_, $graph_indent); > + } > + } > > if (!$in_hunk) { > - $line_cb->($_); > - $in_hunk = /^$GRAPH*$COLOR*\@\@ /; > + $line_cb->($orig); > + $in_hunk = /^$COLOR*\@\@ /; > } > - elsif (/^$GRAPH*$COLOR*-/) { > - push @removed, $_; > + elsif (/^$COLOR*-/) { > + push @removed, $orig; > } > - elsif (/^$GRAPH*$COLOR*\+/) { > - push @added, $_; > + elsif (/^$COLOR*\+/) { > + push @added, $orig; > } > else { > flush(); > - $line_cb->($_); > - $in_hunk = /^$GRAPH*$COLOR*[\@ ]/; > + $line_cb->($orig); > + $in_hunk = /^$COLOR*[\@ ]/; > } > > # Most of the time there is enough output to keep things streaming, > @@ -225,8 +273,8 @@ sub is_pair_interesting { > my $suffix_a = join('', @$a[($sa+1)..$#$a]); > my $suffix_b = join('', @$b[($sb+1)..$#$b]); > > - return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ || > - $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ || > + return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ || > + visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ || > $suffix_a !~ /^$BORING*$/ || > $suffix_b !~ /^$BORING*$/; > } > diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh > index bf0c270d60..f6f5195d00 100755 > --- a/contrib/diff-highlight/t/t9400-diff-highlight.sh > +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh > @@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' ' > test_cmp expect actual > ' > > +test_expect_success 'diff-highlight handles --graph with leading dash' ' > + cat >file <<-\EOF && > + before > + the old line > + -leading dash > + EOF > + git add file && > + git commit -m before && > + > + sed s/old/new/ <file >file.tmp && > + mv file.tmp file && > + git add file && > + git commit -m after && > + > + cat >expect <<-EOF && > + --- a/file > + +++ b/file > + @@ -1,3 +1,3 @@ > + before > + -the ${CW}old${CR} line > + +the ${CW}new${CR} line > + -leading dash > + EOF > + git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw && > + trim_graph <actual.raw | sed -n "/^---/,\$p" >actual && > + test_cmp expect actual > +' > + > test_done >