Re: [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 14, 2017 at 2:31 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> The existing documentation states "If there are fewer than 3 adjacent
> moved lines, they are not marked up as moved", which is ambiguous as to
> whether "adjacent moved lines" must be adjacent both at the source and
> at the destination, or be adjacent merely at the source or merely at the
> destination. The behavior of the current code takes the latter
> interpretation, but the behavior of blocks being conceptually painted as
> blocks and then "unpainted" as lines is confusing to me.
>
> Therefore, clarify the ambiguity in the documentation in the stricter
> direction - a block is completely painted or not at all - and update the
> code accordingly.
>
> This requires a change in the test "detect malicious moved code, inside
> file" in that the malicious change is now marked without the move
> colors (because the blocks involved are too small), contrasting with
> the subsequent test where the non-malicious change is marked with move
> colors.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>

I wonder if these changes ought to be a new mode
(C.f. "mountain zebra" and "imperial zebra" for slight
changes in coloring ;) or if we can settle on one true way.

The 3 lines heuristic is a bad heuristic IMHO (it works reasonable well
for little effort but the fact that we discuss this patch makes it a bad
heuristic as we discuss corner cases that are not relevant. The heuristic
originally wanted to filter out stray single braces that were "moved",
it did not want to suppress small original moved pieces of code),
which this covers up a bit.

Maybe we'll cook this in next for a while to see how people
react to it?

> ---
>  Documentation/diff-options.txt |  6 ++--
>  diff.c                         |  6 +++-
>  t/t4015-diff-whitespace.sh     | 71 +++++++++++++++++++++++++++++++++---------
>  3 files changed, 65 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index bc52bd0b9..1ee3ca3f6 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -257,10 +257,10 @@ zebra::
>         Blocks of moved code are detected greedily. The detected blocks are
>         painted using either the 'color.diff.{old,new}Moved' color or
>         'color.diff.{old,new}MovedAlternative'. The change between
> -       the two colors indicates that a new block was detected. If there
> -       are fewer than 3 adjacent moved lines, they are not marked up
> +       the two colors indicates that a new block was detected. If a block
> +       has fewer than 3 adjacent moved lines, it is not marked up
>         as moved, but the regular colors 'color.diff.{old,new}' will be
> -       used.
> +       used instead.

Thanks for clarifying the docs.

> diff --git a/diff.c b/diff.c
> index f598d8a3a..20b784736 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o,
>                 }
>
>                 l->flags |= DIFF_SYMBOL_MOVED_LINE;
> -               block_length++;
>
>                 if (o->color_moved == COLOR_MOVED_PLAIN)
>                         continue;
> @@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o,
>                         }
>
>                         flipped_block = (flipped_block + 1) % 2;
> +
> +                       adjust_last_block(o, n, block_length);
> +                       block_length = 0;
>                 }
>
> +               block_length++;
> +
>                 if (flipped_block)
>                         l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>         }

This changes the algorithm in a non-obvious way.
When the min-length heuristic is strictly bound to each block,
the function can be simplified more than adding on these tweaks,

1) remove variable block_length, needing to count in the adjust function

2) assign DIFF_SYMBOL_MOVED_LINE either in
    COLOR_MOVED_PLAIN case (and continue) or later (where
    block_length is increased in this patch)

No need to do these, just as thoughts on how to reduce complexity.

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 6f7758e5c..d0613a189 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' '

This test would not 'detect malicious moved code, inside file'  any more,
I think instead we'd rather want to have a more realistic test case,
which has more lines in it? (This test is about the block detection
not about the omission of short blocks, which was an after thought)

> +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' '

Thanks for providing a test here! For testing MIN_BLOCK_LENGTH
for each block I would have imagined the tests would have a block of
length (1,)2,3(,4) lines and then we'd see that the blocks are
highlighted or not.
This only has length=1 blocks?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux