On Thu, Jan 12, 2012 at 5:22 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote: > [snip] > Case in point, consider my patch sent out yesterday > > http://article.gmane.org/gmane.comp.version-control.git/188391 > > It consists of a one-hunk doc update. word-diff is not brilliant: > > -k:: > Usually the program [-'cleans up'-]{+removes email cruft from+} the Subject: > header line to extract the title line for the commit log > [-message,-] > [- among which (1) remove 'Re:' or 're:', (2) leading-] > [- whitespaces, (3) '[' up to ']', typically '[PATCH]', and-] > [- then prepends "[PATCH] ".-]{+message.+} This [-flag forbids-]{+option prevents+} this munging, and is most > useful when used to read back 'git format-patch -k' output. > [snip the rest as it's only {+}] > > But character-diff tries too hard to find common subsequences: > > $ g show HEAD^^ --word-diff-regex='[^[:space:]]' | xsel >[snip] > w-]{+. T+}hi[-te-]s[-paces, (3) '[' up t-] o[-']', ty-]p[ > > is just line noise? The colors don't even help as most of it is removed > (red). You missed the '+' quantifier, as in [^[:space:]]+ Using that regex, that abomination of a word-diff that you mentioned disappears, like this: -k:: Usually the program [-'cleans up'-]{+removes email cruft from+} the Subject: header line to extract the title line for the commit log [-message,-] [- among which (1) remove 'Re:' or 're:', (2) leading-] [- whitespaces, (3) '[' up to ']', typically '[PATCH]', and-] [- then prepends "[PATCH] ".-]{+message.+} This [-flag forbids-]{+option prevents+} this munging, and is most useful when used to read back 'git format-patch -k' output. > [snip] > That being said, I can see some arguments for changing the default to > split punctuation into a separate word. That is, whereas the current > default is semantically equivalent to a wordRegex of > > [^[:space:]]* > > (but has a faster code path) Oh right, there *is* a sensible default implemented in. Somehow I was under the impression that there wasn't. I wonder which is faster, using the non-whitespace regex, or the isspace() calls... > and your proposal is equivalent to > > [^[:space:]]|UTF_8_GUARD > > I think there is a case to be made for a default of > > [^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+ > > or some such. There's a lot of bikeshedding lurking in the (non)extent > of the [[:alnum:]] here, however. Care to explain further? Not to sure what you mean here. -- Cheers, Ray Chuan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html