On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Junio C Hamano <gitster@xxxxxxxxx> writes: > >> Looking at the implementation of get_ws_cleaned_string() that is the >> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong >> things for various "ignore whitespace" options (i.e. there is only >> one implementation, while "git diff" family takes things like >> "ignore space change", "ignore all whitespace", etc.), though. > > This probably deserves a bit more illustration of how I envision the > code should evolve. > > In the longer term, I would prefer to see emitted_symbol_cmp_no_ws() > to go and instead emitted_symbol_cmp() to take the diff options so > that it can change the behaviour of the comparison function based on > the -w/-b/--ignore-space-at-eol/etc. settings. And compare two strings > in place. > > For that, you may need a helper function that takes a pointer to a > character pointer, picks the next byte that matters while advancing > the pointer, and returns that byte. The emitted_symbol_cmp(a, b) > which is not used for real comparison (i.e. ordering to see if a > sorts earlier than b) but for equivalence (i.e. considering various > whitespace-ignoring settings, does a and b matfch?) may become > something like: > > int > emitted_symbol_eqv(struct emitted_diff_symbol *a, > struct emitted_diff_symbol *b, > const void *keydata) { > struct diff_options *diffopt = keydata; > const char *ap = a->line; > const char *bp = b->line; > > while (1) { > int ca, cb; > ca = next_byte(&ap, diffopt); > cb = next_byte(&bp, diffopt); > if (ca != cb) > return 1; /* differs */ > if (!ca) > return 0; > }; > } > > where next_byte() may look like: > > static int > next_byte(const char **cp, struct diff_options *diffopt) > { > int retval; > > again: > retval = **cp; > if (!retval) > return retval; /* end of string */ > if (!isspace(retval)) { > (*cp)++; /* advance */ > return retval; > } > > switch (ignore_space_type(diffopt)) { > case NOT_IGNORING: > (*cp)++; /* advance */ > return retval; > case IGNORE_SPACE_CHANGE: > while (**cp && isspace(**cp)) > (*cp)++; /* squash consecutive spaces */ > return ' '; /* normalize spaces with a SP */ > case IGNORE_ALL_SPACE: > (*cp)++; /* advance */ > goto again; > ... other cases here ... > } > } > > I just rebased the diff series on top of the hashmap series and now I want to implement the compare function based on the new data feature. So I thought I might start off this proposal, as you seem to have good taste for how to approach problems. It turns out that the code here is rather a very loose proposal, not to be copied literally, because of these constraints: * When dealing with user content, we do not have C-strings, but memory + length, such that next_byte also needs to be aware of the end pointer. * The ignore_space_type() as well as these constants do not exist as-is, I think the cleanest for now is to parse diffopt->xdl_opts via DIFF_XDL_TST Thanks, Stefan