Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 19, 2017 at 1:29 PM, Jeff King <peff@xxxxxxxx> wrote:
> The code for handling whitespace with --color-moved
> represents partial strings as a pair of pointers. There are
> two possible conventions for the end pointer:
>
>   1. It points to the byte right after the end of the
>      string.
>
>   2. It points to the final byte of the string.
>
> But we seem to use both conventions in the code:
>
>   a. we assign the initial pointers from the NUL-terminated
>      string using (1)
>
>   b. we eat trailing whitespace by checking the second
>      pointer for isspace(), which needs (2)
>
>   c. the next_byte() function checks for end-of-string with
>      "if (cp > endp)", which is (2)
>
>   d. in next_byte() we skip past internal whitespace with
>      "while (cp < end)", which is (1)
>
> This creates fewer bugs than you might think, because there
> are some subtle interactions. Because of (a) and (c), we
> always return the NUL-terminator from next_byte(). But all
> of the callers of next_byte() happen to handle that
> gracefully.
>
> Because of the mismatch between (d) and (c), next_byte()
> could accidentally return a whitespace character right at
> endp. But because of the interaction of (a) and (b), we fail
> to actually chomp trailing whitespace, meaning our endp
> _always_ points to a NUL, canceling out the problem.
>
> But that does leave (b) as a real bug: when ignoring
> whitespace only at the end-of-line, we don't correctly trim
> it, and fail to match up lines.
>
> We can fix the whole thing by moving consistently to one
> convention. Since convention (1) is idiomatic in our code
> base, we'll pick that one.
>
> The existing "-w" and "-b" tests continue to pass, and a new
> "--ignore-space-at-eol" shows off the breakage we're fixing.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  diff.c                     | 15 +++++++----
>  t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..09081a207c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
>  {
>         int retval;
>
> -       if (*cp > *endp)
> +       if (*cp >= *endp)
>                 return -1;

This converts (c) from (2) to (1).

>                         while (*cp < *endp && isspace(**cp))
>                                 (*cp)++;
> -                       /* return the first non-ws character via the usual below */
> +                       /*
> +                        * return the first non-ws character via the usual
> +                        * below, unless we ate all of the bytes
> +                        */
> +                       if (*cp >= *endp)
> +                               return -1;

This fixes the mismatch between (d) and (c).

When I wrote the code, I did not follow proper commenting style
by capitalizing the sentence start and putting periods. :(
(Well it was a single line comment, which I have seen to not
follow style occasionally unlike any multi line comment. Anyway
no need to fix it here and now, just pointing out my bad code
in the beginning.)

> -               while (ae > ap && isspace(*ae))
> +               while (ae > ap && isspace(ae[-1]))
>                         ae--;
> -               while (be > bp && isspace(*be))
> +               while (be > bp && isspace(be[-1]))
>                         be--;
...
> -               while (ae > ap && isspace(*ae))
> +               while (ae > ap && isspace(ae[-1]))

These fixes convert (b) to (1), so we're all on (1).
As we check for strict endpointer > firstpointer
(and not >=), the check of -1 is fine, too.


> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
>         test_cmp expected actual
>  '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '
> +       git reset --hard &&
> +       # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> +       q_to_tab <<-\EOF >lines.txt &&
> +       long line 6Q
> +       long line 7Q
> +       long line 8Q
> +       longQline 9Q
> +       line 1
> +       line 2
> +       line 3
> +       line 4
> +       line 5
> +       EOF
> +
> +       # avoid cluttering the output with complaints about our eol whitespace
> +       test_config core.whitespace -blank-at-eol &&

We avoid the eol space change as we want to test the move detection
without interference. Do we want to test it with that as well?

> +       git diff HEAD --no-renames --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&
..
> +       git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&
..
> +       <GREEN>+<RESET><GREEN>long      line 9  <RESET>

ok, we also have no interference with space changes,
which we assume is orthogonal.

The commit message really enlightened me,
Thanks!

Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux