On Thu, Oct 19, 2017 at 02:30:08PM +0900, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > it does. It just adjusts our "end pointer" to point to the last valid > > character in the string (rather than one past), which seems to be the > > convention that those loops (and next_byte) expect. > > Yeah I am not sure if I like this comparison at the beginning of the > function: > > static int next_byte(const char **cp, const char **endp, > const struct diff_options *diffopt) > { > int retval; > > if (*cp > *endp) > return -1; > > but it says endp _is_ part of valid input, contrary to my intuition. Actually, I think even this function is confused about its convention. In the line you quote, we clearly treat *endp as part of the input. But later we do: while (*cp < *endp && isspace(**cp)) (*cp)++; meaning that we'd fail to soak up whitespace at *endp. That wouldn't be so bad if not for the other bug which fails to eat whitespace at endp in the first place. :) So I think the right fix is this: diff --git a/diff.c b/diff.c index 6fd288420b..09081a207c 100644 --- a/diff.c +++ b/diff.c @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp, { int retval; - if (*cp > *endp) + if (*cp >= *endp) return -1; if (isspace(**cp)) { @@ -729,7 +729,12 @@ static int next_byte(const char **cp, const char **endp, if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) { while (*cp < *endp && isspace(**cp)) (*cp)++; - /* return the first non-ws character via the usual below */ + /* + * return the first non-ws character via the usual + * below, unless we ate all of the bytes + */ + if (*cp >= *endp) + return -1; } } @@ -750,9 +755,9 @@ static int moved_entry_cmp(const struct diff_options *diffopt, return a->es->len != b->es->len || memcmp(ap, bp, a->es->len); if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) { - while (ae > ap && isspace(*ae)) + while (ae > ap && isspace(ae[-1])) ae--; - while (be > bp && isspace(*be)) + while (be > bp && isspace(be[-1])) be--; } @@ -775,7 +780,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti int c; strbuf_reset(&sb); - while (ae > ap && isspace(*ae)) + while (ae > ap && isspace(ae[-1])) ae--; while ((c = next_byte(&ap, &ae, o)) > 0) strbuf_addch(&sb, c); It's late here, so I'll wait for comments from Stefan and then try to wrap it up with a commit message and test tomorrow. -Peff