Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> Now, you can specify which characters are to be interpreted as word
> characters with "--color-words=A-Za-z", or by setting the config variable
> diff.wordCharacters.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>
> 	I would have preferred an approach like this.

Hmmm...

> diff --git a/README b/README
> index 548142c..0e325e2 100644
> --- a/README
> +++ b/README
> @@ -4,7 +4,7 @@
>
>  ////////////////////////////////////////////////////////////////
>
> -"git" can mean anything, depending on your mood.
> +"git" cann mean anything, depending on your mood.

Heh.

> @@ -456,7 +514,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
>  	plus.ptr = xmalloc(plus.size);
>  	memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
>  	for (i = 0; i < plus.size; i++)
> -		if (isspace(plus.ptr[i]))
> +		if (!word_character[(unsigned char)plus.ptr[i]])
>  			plus.ptr[i] = '\n';
>  	diff_words->plus.current = 0;

I do not think there is much difference between specifying the set of
word characters and the set of non-word characters, especially as long
as your definition of "character" is limited to 8-bit bytes. By
enumerating word characters, your patch lets the user specify the
non-word characters as the remainder of the 256-element set.

By the way, I think you meant to do the same for the "minus" side a few
lines above this hunk.

I commented on the patch from Ping earlier about a quite different
issue. I was wondering if we can avoid losing the non-word character
information. The original code replaces any isspace() byte with LF, and
since a whitespace is a whitespace is a whitespace, not much information
is lost that way. Making the above isspace() configurable, however,
means that you are now going to drop non-space non-word characters from
the output set.
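
To illustrate the loss described above, here is a minimal sketch, not taken from the patch. The table name word_character comes from the quoted hunk; the helpers mark_word_range() and replace_nonword_with_lf() are hypothetical names invented for this example, and the way the table gets filled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 256-entry lookup table: nonzero means "word character". */
static unsigned char word_character[256];

/* Hypothetical helper: mark every byte in [lo, hi] as a word character. */
static void mark_word_range(unsigned char lo, unsigned char hi)
{
	unsigned int c;

	for (c = lo; c <= hi; c++)
		word_character[c] = 1;
}

/* Overwrite every non-word byte with LF, as the quoted hunk does. */
static void replace_nonword_with_lf(char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (!word_character[(unsigned char)buf[i]])
			buf[i] = '\n';
}
```

With only A-Z and a-z marked as word characters, both "a+b" and "a-b" come out as "a\nb", so the '+' versus '-' distinction is no longer visible to the line-based diff that runs afterwards.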

Instead of dropping the original character and replacing it with LF, I
thought a more sensible approach would be to _insert_ a line break
between runs of word characters and non-word characters (while probably
dropping an LF in the original). That is, instead of what the current
implementation of the above loop does to "ab  c d" with two spaces after
"ab" (i.e. rewrite it to "ab\n\nc\nd"), rewrite it to
"ab\n  \nc\n \nd".

Which feels more consistent with the way \b should work.
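
The "insert a line break between runs" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: split_runs() and is_word_byte() are hypothetical names, the word class is assumed to be ASCII letters and digits, and the question of dropping an LF already present in the input is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed word class for this sketch: ASCII letters and digits. */
static int is_word_byte(unsigned char c)
{
	return isalnum(c) != 0;
}

/*
 * Copy src into a newly allocated NUL-terminated buffer, inserting an
 * LF at every boundary between a run of word bytes and a run of
 * non-word bytes. No input byte is dropped. The caller frees the
 * result; NULL is returned if allocation fails.
 */
static char *split_runs(const char *src, size_t len)
{
	char *out;
	size_t i, n = 0;

	assert(src != NULL);
	/* Worst case: len bytes, len - 1 inserted LFs, and the NUL. */
	out = malloc(len ? 2 * len : 1);
	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		if (i > 0 &&
		    is_word_byte((unsigned char)src[i - 1]) !=
		    is_word_byte((unsigned char)src[i]))
			out[n++] = '\n';
		out[n++] = src[i];
	}
	out[n] = '\0';
	return out;
}
```

For the input "a+b" this yields "a\n+\nb": the '+' survives on a line of its own instead of being overwritten, so "a+b" and "a-b" still differ after the split.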