Re: [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Phillip,

The commit's oneline has a typo: zerba instead of zebra.

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
>
> When marking moved lines it is possible for a block of potential
> matched lines to extend past a change in sign when there is a sequence
> of added lines whose text matches the text of a sequence of deleted
> and added lines. Most of the time either `match` will be NULL or
> `pmb_advance_or_null()` will fail when the loop encounters a change of
> sign but there are corner cases where `match` is non-NULL and
> `pmb_advance_or_null()` successfully advances the moved block despite
> the change in sign.
>
> One consequence of this is highlighting a short line as moved when it
> should not be. For example
>
> -moved line  # Correctly highlighted as moved
> +short line  # Wrongly highlighted as moved
>  context
> +moved line  # Correctly highlighted as moved
> +short line
>  context
> -short line
>
> The other consequence is coloring a moved addition following a moved
> deletion in the wrong color. In the example below the first "+moved
> line 3" should be highlighted as newMoved not newMovedAlternate.
>
> -moved line 1 # Correctly highlighted as oldMoved
> -moved line 2 # Correctly highlighted as oldMovedAlternate
> +moved line 3 # Wrongly highlighted as newMovedAlternate
>  context      # Everything else is highlighted correctly
> +moved line 2
> +moved line 3
>  context
> +moved line 1
> -moved line 3
>
> These false matches are more likely when using --color-moved-ws with
> the exception of --color-moved-ws=allow-indentation-change which ties
> the sign of the current whitespace delta to the sign of the line to
> avoid this problem. The fix is to check that the sign of the new line
> being matched is the same as the sign of the line that started the
> block of potential matches.
>
> Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
> ---
>  diff.c                     | 17 ++++++----
>  t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+), 6 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 53f0df75329..efba2789354 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  	struct moved_block *pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
>  	int n, flipped_block = 0, block_length = 0;
> -	enum diff_symbol last_symbol = 0;
> +	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;

The exact value does not matter, as long as it is different from whatever
the next line will have, of course.

>
>
>  	for (n = 0; n < o->emitted_symbols->nr; n++) {
> @@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  			flipped_block = 0;
>  		}
>
> -		if (!match) {
> +		if (pmb_nr && (!match || l->s != moved_symbol)) {
>  			int i;
>
>  			if (!adjust_last_block(o, n, block_length) &&
> @@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
>  			pmb_nr = 0;
>  			block_length = 0;
>  			flipped_block = 0;
> -			last_symbol = l->s;
> +		}

This is one of those instances where I dislike having the patch in a
static mail. I so want to have a _convenient_ way to expand the diff
context, to have a look around.

So I went over to
https://github.com/gitgitgadget/git/pull/981/commits/10b11526206d3b515ba08ac80ccf09ecb7a03420
to get the convenience I need for a pleasant reviewing experience.

In this instance, the `continue` that dropped out of that conditional
block gave me pause.

My understanding is that the diff makes it essentially a lot harder to
understand what is done here: this conditional block did two things, it
re-set the possibly-moved-block, and it skipped to the next loop
iteration. With this patch, we now re-set the possibly-moved-block in more
cases, but still skip to the next loop iteration under the same condition
as before:

> +		if (!match) {
> +			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>  			continue;
>  		}

However, after reading the commit message, I would have expected the
condition above to read `if (!match || l->s != moved_symbol)` instead of
`if (!match)`. Could you help me understand what I am missing?

>
>  		if (o->color_moved == COLOR_MOVED_PLAIN) {
> -			last_symbol = l->s;
>  			l->flags |= DIFF_SYMBOL_MOVED_LINE;
>  			continue;
>  		}

I want to make sure that I understand why the `last_symbol` assignment
could be removed without any `moved_symbol` assignment in its place. But I
don't, I still do not see why we do not need a `moved_symbol = l->s;`
assignment here.

Unless, that is, we extended the `!match` condition above to also cover
the case where `l->s != moved_symbol`.

> @@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
>  							    &pmb, &pmb_alloc,
>  							    &pmb_nr);
>
> -			if (contiguous && pmb_nr && last_symbol == l->s)
> +			if (contiguous && pmb_nr && moved_symbol == l->s)
>  				flipped_block = (flipped_block + 1) % 2;

This is totally not your fault, but I really wish we could have the much
simpler and much easier to understand `flipped_block = !flipped_block`
here.

>  			else
>  				flipped_block = 0;
>
> +			if (pmb_nr)
> +				moved_symbol = l->s;
> +			else
> +				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
> +
>  			block_length = 0;
>  		}
>
> @@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
>  			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
>  				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>  		}
> -		last_symbol = l->s;

That makes sense: we only set `moved_symbol` when `pmb_nr` had been 0 now,
and don't want it to be overridden.

As I said, I do not quite understand this patch yet, and am looking for
your guidance to wrap my head around it.

Thank you for working on this!
Dscho

>  	}
>  	adjust_last_block(o, n, block_length);
>
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 4e0fd76c6c5..15782c879d2 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'short lines of opposite sign do not get marked as moved' '
> +	cat >old.txt <<-\EOF &&
> +	this line should be marked as moved
> +	unchanged
> +	unchanged
> +	unchanged
> +	unchanged
> +	too short
> +	this line should be marked as oldMoved newMoved
> +	this line should be marked as oldMovedAlternate newMoved
> +	unchanged 1
> +	unchanged 2
> +	unchanged 3
> +	unchanged 4
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	EOF
> +	cat >new.txt <<-\EOF &&
> +	too short
> +	unchanged
> +	unchanged
> +	this line should be marked as moved
> +	too short
> +	unchanged
> +	unchanged
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	unchanged 1
> +	unchanged 2
> +	this line should be marked as oldMovedAlternate newMoved
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	unchanged 3
> +	this line should be marked as oldMoved newMoved
> +	unchanged 4
> +	EOF
> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
> +		old.txt new.txt >output && cat output &&
> +	grep -v index output | test_decode_color >actual &&
> +	cat >expect <<-\EOF &&
> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
> +	<BOLD>--- a/old.txt<RESET>
> +	<BOLD>+++ b/new.txt<RESET>
> +	<CYAN>@@ -1,13 +1,15 @@<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
> +	<GREEN>+<RESET><GREEN>too short<RESET>
> +	 unchanged<RESET>
> +	 unchanged<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
> +	<GREEN>+<RESET><GREEN>too short<RESET>
> +	 unchanged<RESET>
> +	 unchanged<RESET>
> +	<RED>-too short<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
> +	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	 unchanged 1<RESET>
> +	 unchanged 2<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	 unchanged 3<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
> +	 unchanged 4<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'cmd option assumes configured colored-moved' '
>  	test_config color.diff.oldMoved "magenta" &&
>  	test_config color.diff.newMoved "cyan" &&
> --
> gitgitgadget
>
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux