Re: Out of memory with diff.colormoved enabled

On Thu, Oct 12, 2017 at 1:05 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Oct 12, 2017 at 10:53:23PM +0300, Orgad Shaneh wrote:
>
>> There is an infinite loop when colormoved is used with --ignore-space-change:
>>
>> git init
>> seq 20 > test
>> git add test
>> sed -i 's/9/42/' test
>> git -c diff.colormoved diff --ignore-space-change -- test
>
> Thanks for an easy reproduction recipe.

Thanks here as well!

> It looks like the problem is that next_byte() doesn't make any forward
> progress in the buffer with --ignore-space-change. We try to convert
> whitespace into a single space

> (I'm not sure why, but I'm not very
> familiar with this part of the code).

(On why you're not familiar with it)
Because it is quite new, and you weren't one of the main reviewers.
2e2d5ac184 (diff.c: color moved lines differently, 2017-06-30) was also very
large, so details like this are easy to overlook. That said, I remember Junio
and me discussing the next_byte part quite vividly. :/

(On why it is the way it is)
Consider the three strings
    one SP word
    one TAB word
    oneword

With `--ignore-space-change` given, the first two should
compare equal, but the third should be different. To achieve
that, the easiest thing to do (in my mind) was to replace
any sequence of blank characters with a single "standard"
space. That makes all strings that differ only in their
white space sequences compare equal, while the one
without blanks stays different.
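A minimal C sketch of that idea, assuming NUL-terminated lines that fit in a small fixed buffer. `collapse_whitespace` and `equal_after_collapse` are hypothetical names for illustration, not code from diff.c (which works byte by byte through next_byte() instead of building a copy):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, replacing every run of
 * whitespace with a single ' '. The output is never longer than the
 * input, so dst needs room for strlen(src) + 1 bytes.
 */
static void collapse_whitespace(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	while (*src) {
		if (isspace((unsigned char)*src)) {
			/* Skip the whole run, emit one standard space. */
			while (isspace((unsigned char)*src))
				src++;
			*dst++ = ' ';
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}

/* 1 if equal after collapsing, 0 if not, -1 if too long for the sketch. */
static int equal_after_collapse(const char *a, const char *b)
{
	char na[256], nb[256];

	if (strlen(a) >= sizeof(na) || strlen(b) >= sizeof(nb))
		return -1;
	collapse_whitespace(na, a);
	collapse_whitespace(nb, b);
	return strcmp(na, nb) == 0;
}
```

With this, "one word" and "one\tword" both become "one word", while "oneword" has no space to collapse and stays different.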

> But if there's no space, then the
> "cp" pointer never gets advanced.

Right, because of the early return, which skips the increment of *cp.

> This fixes it, but I have no idea if it's doing the right thing:
>
> diff --git a/diff.c b/diff.c
> index 69f03570ad..e8dedc7357 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
>                 return -1;
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
> -               while (*cp < *endp && isspace(**cp))
> +               int saw_whitespace = 0;
> +               while (*cp < *endp && isspace(**cp)) {
>                         (*cp)++;
> +                       saw_whitespace = 1;
> +               }
>                 /*
>                  * After skipping a couple of whitespaces, we still have to
>                  * account for one space.
>                  */
> -               return (int)' ';
> +               if (saw_whitespace)
> +                       return (int)' ';

The "else" is implicit: control falls through to
the default case at the end of the function, which
returns the character *cp points at and then
increments *cp.

That sounds correct.

>         }
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
>
> I guess it would be equally correct to not enter that if-block unless
> isspace(**cp).

This also sounds correct.

>
> -Peff
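A sketch of the fixed behavior, using the alternative guard (only enter the whitespace branch when isspace(**cp)), which behaves the same as the saw_whitespace version. Assumptions: `next_byte_fixed` and `lines_equal` are hypothetical names, only the `--ignore-space-change` collapse is modeled (other whitespace flags are left out), and `end` points one past the last byte:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, simplified next_byte() with the fix applied. */
static int next_byte_fixed(const char **cp, const char *end)
{
	int retval;

	assert(cp != NULL && *cp != NULL);
	if (*cp >= end)
		return -1;

	/* Only take the whitespace branch when there is whitespace. */
	if (isspace((unsigned char)**cp)) {
		while (*cp < end && isspace((unsigned char)**cp))
			(*cp)++;
		return ' ';  /* a whole run counts as one space */
	}

	/* Default case: return the current byte, then advance. */
	retval = (unsigned char)**cp;
	(*cp)++;
	return retval;
}

/* Compare two NUL-terminated lines through the byte streams above. */
static int lines_equal(const char *a, const char *b)
{
	const char *ea = a + strlen(a), *eb = b + strlen(b);

	for (;;) {
		int ca = next_byte_fixed(&a, ea);
		int cb = next_byte_fixed(&b, eb);

		if (ca != cb)
			return 0;
		if (ca == -1)
			return 1;
	}
}
```

Every call that does not return -1 now advances the cursor by at least one byte, so the comparison loop terminates.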


