On Thu, Oct 12, 2017 at 1:05 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Oct 12, 2017 at 10:53:23PM +0300, Orgad Shaneh wrote:
>
>> There is an infinite loop when colormoved is used with --ignore-space-change:
>>
>>   git init
>>   seq 20 > test
>>   git add test
>>   sed -i 's/9/42/' test
>>   git -c diff.colormoved diff --ignore-space-change -- test
>
> Thanks for an easy reproduction recipe.

Thanks here as well!

> It looks like the problem is that next_byte() doesn't make any forward
> progress in the buffer with --ignore-space-change. We try to convert
> whitespace into a single space
> (I'm not sure why, but I'm not very
> familiar with this part of the code).

(on why you don't feel familiar)
Because it is quite new, and you weren't one of the main reviewers.
2e2d5ac184 (diff.c: color moved lines differently, 2017-06-30) was also
very large, such that it is easy to overlook. Though I remember
Junio and me discussing the next_byte part quite vividly. :/

(On why it is the way it is)
Consider the three strings

    one SP word
    one TAB word
    oneword

The first two are equal, but the third is different with
`--ignore-space-change` given. To achieve that goal, the easiest
thing to do (in my mind) was to replace any sequence of
blank characters by "a standard space" sequence. That will make
all strings with different white space sequences compare equal,
but the one without blanks will be different.

> But if there's no space, then the
> "cp" pointer never gets advanced.

Right, because of the early return, skipping the increment of *cp.

> This fixes it, but I have no idea if it's doing the right thing:
>
> diff --git a/diff.c b/diff.c
> index 69f03570ad..e8dedc7357 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
>                 return -1;
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
> -               while (*cp < *endp && isspace(**cp))
> +               int saw_whitespace = 0;
> +               while (*cp < *endp && isspace(**cp)) {
>                         (*cp)++;
> +                       saw_whitespace = 1;
> +               }
>                 /*
>                  * After skipping a couple of whitespaces, we still have to
>                  * account for one space.
>                  */
> -               return (int)' ';
> +               if (saw_whitespace)
> +                       return (int)' ';

The "else" is implicit and it falls through to the standard case
at the end of the function, incrementing *cp, returning the character
*cp pointed at prior to being incremented.

That sounds correct.

>         }
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
>
> I guess it would be equally correct to not enter that if-block unless
> isspace(**cp).

This also sounds correct.

>
> -Peff
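
To make the intended behavior concrete, here is a small standalone sketch
of the normalization idea. It is not the code in diff.c: the names
sketch_next_byte() and sketch_equal_ignoring_space_change() are made up
for illustration, there is no diffopt argument, and I assume `end` points
one past the last byte of the buffer. The point it tries to show is that
every call either reports the end of the buffer or consumes at least one
byte, so a caller looping over it cannot spin forever.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the helper discussed above.
 * Returns -1 once the buffer is exhausted; otherwise returns the next
 * normalized byte and advances *cp by at least one position.
 * Assumption: `end` points one past the last byte of the buffer.
 */
int sketch_next_byte(const char **cp, const char *end)
{
	int saw_whitespace = 0;

	assert(cp && *cp && end);

	if (*cp >= end)
		return -1;

	/* Collapse a whole run of whitespace into a single ' '. */
	while (*cp < end && isspace((unsigned char)**cp)) {
		(*cp)++;
		saw_whitespace = 1;
	}
	if (saw_whitespace)
		return ' ';

	/* No whitespace at this position: consume exactly one byte. */
	return (unsigned char)*(*cp)++;
}

/*
 * Compare two NUL-terminated strings byte by byte after normalization.
 * Returns 1 if they are equal under this sketch's rules, 0 otherwise.
 */
int sketch_equal_ignoring_space_change(const char *a, const char *b)
{
	const char *a_end = a + strlen(a);
	const char *b_end = b + strlen(b);

	for (;;) {
		int ca = sketch_next_byte(&a, a_end);
		int cb = sketch_next_byte(&b, b_end);

		if (ca != cb)
			return 0;
		if (ca < 0)
			return 1; /* both buffers ended at the same time */
	}
}
```

With this sketch, "one word" and "one\tword" compare equal, while
"one word" and "oneword" do not, which matches the three-string example
above. Dropping the saw_whitespace check and returning ' ' unconditionally
would bring back the behavior from the report: on a non-blank byte the
pointer would never move.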