Re: [PATCH] diff.c: increment buffer pointer in all code path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 12, 2017 at 08:20:57PM -0400, Jeff King wrote:

> On Thu, Oct 12, 2017 at 08:18:37PM -0400, Jeff King wrote:
> 
> > Fix this by entering the conditional only when we actually
> > see whitespace. We can apply this also to the
> > IGNORE_WHITESPACE change. That code path isn't buggy
> > (because it falls through to returning the next
> > non-whitespace byte), but it makes the logic more clear if
> > we only bother to look at whitespace flags after seeing that
> > the next byte is whitespace.
> 
> I think there actually _is_ a bug in that code path, but it's unrelated
> to this one. If you have whitespace at the end of the buffer, then we'd
> advance *cp until it matches *endp, and then return whatever is at *endp
> (which is nonsense, or probably a NUL) rather than returning "-1".
> 
> I'm out of time for tonight and not familiar enough with the color-moved
> code to come up with a reasonable test case quickly, but maybe you can
> see if that can trigger bad behavior?

I found a few moments to follow up on this. I'm not sure if it's a bug
or intentional behavior. It's definitely easy to trigger next_byte()
reading *endp (even without ignoring whitespace, we always dereference
it).

This function is always operating on the buffer from an
emitted_diff_symbol, which is NUL-terminated. So *endp is always NUL. So
we'll return "0" at the end of the string.

There are two callers. The first is get_string_hash(), which does:

  while ((c = next_byte(&ap, &ae, o)) > 0)

so we'll break out on the NUL byte. But it also sometimes shrinks the
end-pointer "ae" to skip trailing whitespace.  Or at least it tries to.
It does:

   const char *ap = es->line, *ae = es->line + es->len;
   ...
   while (ae > ap && isspace(*ae))
           ae--;

but AFAICT that loop will never trigger, since *ae will always point to
NUL to begin with. But it points to the expectation that our end-pointer
does not follow the usual one-past-the-end convention, but rather is
meant to point to the actual last character in the string.

The same problem is repeated in the other caller, moved_entry_cmp(),
which tries to eat trailing whitespace for IGNORE_WHITESPACE_AT_EOL. But
it uses the same loop that starts on NUL and will never trigger.

It then goes on to call next_byte(). The extra NUL there should not
cause any problem, because we are just checking for equality between the
two strings (so if they both return NUL at the same time, good; if not,
then we know they are different).

So. That leaves me with:

  - I'm unclear on whether next_byte() is meant to return that trailing
    NUL or not. I don't think it causes any bugs, but it certainly
    confused me for a function to take a cp/endp pair of pointers, and
    then dereference endp. It might be worth either fixing or clarifying
    with a comment.

  - Those loops to eat trailing whitespace are doing nothing. I'm not
    sure if that all works out because next_byte() eats whitespaces or
    not (I think not, because it doesn't eat whitespace for the
    IGNORE_WHITESPACE_AT_EOL case). But I'm not quite sure what a test
    would look like.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux