Re: [PATCH 7/7] diff-highlight: detect --graph by indent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21/03/18 05:59, Jeff King wrote:
> This patch fixes a corner case where diff-highlight may
> scramble some diffs when combined with --graph.
> 
> Commit 7e4ffb4c17 (diff-highlight: add support for --graph
> output, 2016-08-29) taught diff-highlight to skip past the
> graph characters at the start of each line with this regex:
> 
>   ($COLOR?\|$COLOR?\s+)*
> 
> I.e., any series of pipes separated by and followed by
> arbitrary whitespace.  We need to match more than just a
> single space because the commit in question may be indented
> to accommodate other parts of the graph drawing. E.g.:
> 
>  * commit 1234abcd
>  | ...
>  | diff --git ...
> 
> has only a single space, but for the last commit before a
> fork:
> 
>  | | |
>  | * | commit 1234abcd
>  | |/  ...
>  | |   diff --git
> 
> the diff lines have more spaces between the pipes and the
> start of the diff.
> 
> However, when we soak up all of those spaces with the
> $GRAPH regex, we may accidentally include the leading space
> for a context line. That means we may consider the actual
> contents of a context line as part of the diff syntax. In
> other words, something like this:
> 
>    normal context line
>   -old line
>   +new line
>    -this is a context line with a leading dash
> 
> would cause us to see that final context line as a removal
> line, and we'd end up showing the hunk in the wrong order:
> 
>   normal context line
>   -old line
>    -this is a context line with a leading dash
>   +new line
> 
> Instead, let's a be a little more clever about parsing the
> graph. We'll look for the actual "*" line that marks the
> start of a commit, and record the indentation we see there.
> Then we can skip past that indentation when checking whether
> the line is a hunk header, removal, addition, etc.
> 
> There is one tricky thing: the indentation in bytes may be
> different for various lines of the graph due to coloring.
> E.g., the "*" on a commit line is generally shown without
> color, but on the actual diff lines, it will be replaced
> with a colorized "|" character, adding several bytes. We
> work around this here by counting "visible" bytes. This is
> unfortunately a bit more expensive, making us about twice as
> slow to handle --graph output. But since this is meant to be
> used interactively anyway, it's tolerably fast (and the
> non-graph case is unaffected).
> 
> One alternative would be to search for hunk header lines and
> use their indentation (since they'd have the same colors as
> the diff lines which follow). But that just opens up
> different corner cases. If we see:
> 
>   | |    @@ 1,2 1,3 @@
> 
> we cannot know if this is a real diff that has been
> indented due to the graph, or if it's a context line that
> happens to look like a diff header. We can only be sure of
> the indent on the "*" lines, since we know those don't
> contain arbitrary data (technically the user could include a
> bunch of extra indentation via --format, but that's rare
> enough to disregard).
> 
> Reported-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  contrib/diff-highlight/DiffHighlight.pm       | 78 +++++++++++++++----
>  .../diff-highlight/t/t9400-diff-highlight.sh  | 28 +++++++
>  2 files changed, 91 insertions(+), 15 deletions(-)
> 
> diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
> index e07cd5931d..536754583b 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
>  my $COLOR = qr/\x1b\[[0-9;]*m/;
>  my $BORING = qr/$COLOR|\s/;
>  
> -# The patch portion of git log -p --graph should only ever have preceding | and
> -# not / or \ as merge history only shows up on the commit line.
> -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
> -
>  my @removed;
>  my @added;
>  my $in_hunk;
> +my $graph_indent = 0;
>  
>  our $line_cb = sub { print @_ };
>  our $flush_cb = sub { local $| = 1 };
>  
> -sub handle_line {
> +# Count the visible width of a string, excluding any terminal color sequences.
> +sub visible_width {
>  	local $_ = shift;
> +	my $ret = 0;
> +	while (length) {
> +		if (s/^$COLOR//) {
> +			# skip colors
> +		} elsif (s/^.//) {
> +			$ret++;
> +		}
> +	}
> +	return $ret;
> +}
> +
> +# Return a substring of $str, omitting $len visible characters from the
> +# beginning, where terminal color sequences do not count as visible.
> +sub visible_substr {
> +	my ($str, $len) = @_;
> +	while ($len > 0) {
> +		if ($str =~ s/^$COLOR//) {
> +			next
> +		}
> +		$str =~ s/^.//;
> +		$len--;
> +	}
> +	return $str;
> +}
> +
> +sub handle_line {
> +	my $orig = shift;
> +	local $_ = $orig;
> +
> +	# match a graph line that begins a commit
> +	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
> +	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
> +	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
> +	                         [ ]*  # trailing whitespace for merges

Hi Peff, thanks for looking at this. I've only had a quick look through
but I wonder if this will be confused by commit messages that contain
  * bullet points
  * like this

Best Wishes

Phillip

> +	    /x) {
> +	        my $graph_prefix = $&;
> +
> +		# We must flush before setting graph indent, since the
> +		# new commit may be indented differently from what we
> +		# queued.
> +		flush();
> +		$graph_indent = visible_width($graph_prefix);
> +
> +	} elsif ($graph_indent) {
> +		if (length($_) < $graph_indent) {
> +			$graph_indent = 0;
> +		} else {
> +			$_ = visible_substr($_, $graph_indent);
> +		}
> +	}
>  
>  	if (!$in_hunk) {
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*\@\@ /;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*\@\@ /;
>  	}
> -	elsif (/^$GRAPH*$COLOR*-/) {
> -		push @removed, $_;
> +	elsif (/^$COLOR*-/) {
> +		push @removed, $orig;
>  	}
> -	elsif (/^$GRAPH*$COLOR*\+/) {
> -		push @added, $_;
> +	elsif (/^$COLOR*\+/) {
> +		push @added, $orig;
>  	}
>  	else {
>  		flush();
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*[\@ ]/;
>  	}
>  
>  	# Most of the time there is enough output to keep things streaming,
> @@ -225,8 +273,8 @@ sub is_pair_interesting {
>  	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
>  	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
>  
> -	return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
> -	       $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
> +	return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ ||
> +	       visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ ||
>  	       $suffix_a !~ /^$BORING*$/ ||
>  	       $suffix_b !~ /^$BORING*$/;
>  }
> diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> index bf0c270d60..f6f5195d00 100755
> --- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
> +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> @@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff-highlight handles --graph with leading dash' '
> +	cat >file <<-\EOF &&
> +	before
> +	the old line
> +	-leading dash
> +	EOF
> +	git add file &&
> +	git commit -m before &&
> +
> +	sed s/old/new/ <file >file.tmp &&
> +	mv file.tmp file &&
> +	git add file &&
> +	git commit -m after &&
> +
> +	cat >expect <<-EOF &&
> +	--- a/file
> +	+++ b/file
> +	@@ -1,3 +1,3 @@
> +	 before
> +	-the ${CW}old${CR} line
> +	+the ${CW}new${CR} line
> +	 -leading dash
> +	EOF
> +	git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
> +	trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> 




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux