On Thu, Oct 19, 2017 at 1:29 PM, Jeff King <peff@xxxxxxxx> wrote: > The code for handling whitespace with --color-moved > represents partial strings as a pair of pointers. There are > two possible conventions for the end pointer: > > 1. It points to the byte right after the end of the > string. > > 2. It points to the final byte of the string. > > But we seem to use both conventions in the code: > > a. we assign the initial pointers from the NUL-terminated > string using (1) > > b. we eat trailing whitespace by checking the second > pointer for isspace(), which needs (2) > > c. the next_byte() function checks for end-of-string with > "if (cp > endp)", which is (2) > > d. in next_byte() we skip past internal whitespace with > "while (cp < end)", which is (1) > > This creates fewer bugs than you might think, because there > are some subtle interactions. Because of (a) and (c), we > always return the NUL-terminator from next_byte(). But all > of the callers of next_byte() happen to handle that > gracefully. > > Because of the mismatch between (d) and (c), next_byte() > could accidentally return a whitespace character right at > endp. But because of the interaction of (a) and (b), we fail > to actually chomp trailing whitespace, meaning our endp > _always_ points to a NUL, canceling out the problem. > > But that does leave (b) as a real bug: when ignoring > whitespace only at the end-of-line, we don't correctly trim > it, and fail to match up lines. > > We can fix the whole thing by moving consistently to one > convention. Since convention (1) is idiomatic in our code > base, we'll pick that one. > > The existing "-w" and "-b" tests continue to pass, and a new > "--ignore-space-at-eol" shows off the breakage we're fixing. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > diff.c | 15 +++++++---- > t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 77 insertions(+), 5 deletions(-) > > diff --git a/diff.c b/diff.c > index 6fd288420b..09081a207c 100644 > --- a/diff.c > +++ b/diff.c > @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp, > { > int retval; > > - if (*cp > *endp) > + if (*cp >= *endp) > return -1; This converts (c) from (2) to (1). > while (*cp < *endp && isspace(**cp)) > (*cp)++; > - /* return the first non-ws character via the usual below */ > + /* > + * return the first non-ws character via the usual > + * below, unless we ate all of the bytes > + */ > + if (*cp >= *endp) > + return -1; This fixes the mismatch between (d) and (c). When I wrote the code, I did not follow proper commenting style by capitalizing the sentence start and putting periods. :( (Well it was a single line comment, which I have seen to not follow style occasionally unlike any multi line comment. Anyway no need to fix it here and now, just pointing out my bad code in the beginning.) > - while (ae > ap && isspace(*ae)) > + while (ae > ap && isspace(ae[-1])) > ae--; > - while (be > bp && isspace(*be)) > + while (be > bp && isspace(be[-1])) > be--; ... > - while (ae > ap && isspace(*ae)) > + while (ae > ap && isspace(ae[-1])) These fixes convert (b) to (1), so we're all on (1). As we check for strict endpointer > firstpointer (and not >=), the check of -1 is fine, too. > @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' ' > test_cmp expected actual > ' > > +test_expect_failure 'move detection ignoring whitespace at eol' ' > + git reset --hard && > + # Lines 6-9 have new eol whitespace, but 9 also has it in the middle > + q_to_tab <<-\EOF >lines.txt && > + long line 6Q > + long line 7Q > + long line 8Q > + longQline 9Q > + line 1 > + line 2 > + line 3 > + line 4 > + line 5 > + EOF > + > + # avoid cluttering the output with complaints about our eol whitespace > + test_config core.whitespace -blank-at-eol && We avoid the eol space change as we want to test the move detection without interference. Do we want to test it with that as well? > + git diff HEAD --no-renames --color-moved --color | > + grep -v "index" | > + test_decode_color >actual && .. > + git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color | > + grep -v "index" | > + test_decode_color >actual && .. > + <GREEN>+<RESET><GREEN>long line 9 <RESET> ok, we also have no interference with space changes, which we assume is orthogonal. The commit message really enlightened me, Thanks! Stefan