As a side remark, this patch makes a good use-case for --patience, and is
not isomorphic to the other edit-and-move examples; rather it's a
delete-and-edit.

Johannes Schindelin wrote:
> Subject: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries

I do not think the term "refactor" is accurate.  Wikipedia roughly defines
it as a code change that preserves all external semantics by some standard
method, and lists methods such as variable renaming, common code
extraction, etc.  You are actually completely replacing the algorithm
"under the hood" with a new one, so no such standard method applies.

And there is also a tiny semantic change: compare

  A: a b  c
  B: x y  z
        ^^

The old version implicitly generated an empty line at the double spaces
(marked ^^), which subsequently became context and caused the words to be
printed as follows, where <..> is old and [..] is new:

  <a b >[x y ]
  <c>[z]

Your patched version does not generate empty lines for any space
whatsoever, not even for newlines.  Thus the result is

  <a b c>[x y z]

I think this is actually a good change, since it results in longer chunks
for "entirely rewritten" parts of the diff.

It also answers Junio's question in the other thread:

Junio C Hamano wrote:
>>
>> What happens if the input "language" does not have any inter-word spacing
>> but its words can still be expressed by regexp patterns?
>>
>> ImagineALanguageThatAllowsYouToWriteSomethingLikeThis.  Does the mechanism
>> help users who want to do word-diff files written in such a language by
>> outputting:
>>
>> ImagineALanguage<red>That</red><green>Which</green>AllowsYou...
>>
>> when '[A-Z][a-z]*' is given by the word pattern?

Your patch handles this as a side-effect *even if the lines are indented*,
since no sequence of spaces whatsoever is special.  (Mine would have given
hard-to-predict results based on the number of newlines between them, and
xdiff's decision whether the newlines or the words are more valuable as
context.)

So I think this is actually an improvement, but the commit message should
point out the change in semantics.

> +static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
> {
> +	if (line[0] != '@' || parse_hunk_header(line, len,
> +			&minus_first, &minus_len, &plus_first, &plus_len))

It would be nice to have a comment here that points out that this method
crucially relies on having context length 0 (just as the old one crucially
relied on having the full text in a single hunk).

> +	for (i = 0; i < buffer->text.size; i++) {
> +		if (isspace(buffer->text.ptr[i]))
> +			continue;

I think it is this coupling of the loops to find a word, and to find a
word _beginning_, that comes back to haunt you in 4/4.  If the outer loop
were strictly about the words, you could use the regex match info to find
the beginning in the regex case (a rough sketch follows at the end of this
mail).  This is probably cleaner than attempting to force an anchored
match, since at least the 'grep' on my system takes '^^foo' to mean
'a "^foo" at the beginning of a line', so you cannot just unconditionally
insert a ^.  (Conditionally inserting one seems even harder.)

These remarks aside (and the last one is the only one of relevance to the
code), this patch would be a vast improvement of the code even if we
weren't discussing it in the context of the regex feature.  So FWIW

Acked-by: Thomas Rast <trast@xxxxxxxxxxxxxxx>

up to here.  I hope we can agree on some sane regex semantics for 4/4...

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
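
P.S.: To make the "use the regex match info" point concrete, here is a
rough sketch (not code from the patch; the helper name and its interface
are invented purely for illustration).  The idea is that regexec()'s match
offsets already tell you where the word begins, so there is no separate
"skip the non-word bytes" loop and no '^' anchoring at all:

#include <regex.h>
#include <stddef.h>

/*
 * Find the next word in buf[ofs..len).  Returns 1 and fills *begin and
 * *end with offsets into buf, or 0 if nothing more matches.  Assumes
 * buf is NUL-terminated (or that REG_STARTEND is available and used
 * instead).
 */
static int find_next_word(const regex_t *word_regex, const char *buf,
			  size_t len, size_t ofs,
			  size_t *begin, size_t *end)
{
	regmatch_t match;

	if (ofs >= len)
		return 0;
	if (regexec(word_regex, buf + ofs, 1, &match, 0))
		return 0;
	/*
	 * rm_so/rm_eo are relative to buf + ofs; the regex engine skips
	 * the leading non-word bytes, so the match info gives the word's
	 * beginning directly.
	 */
	*begin = ofs + match.rm_so;
	*end = ofs + match.rm_eo;
	return *begin < *end;
}

With something like this the outer loop only has to iterate over
(begin, end) pairs, and the regex and plain-whitespace cases can share the
same loop structure.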