Tay Ray Chuan <rctay89@xxxxxxxxx> writes: > On Thu, Jan 12, 2012 at 4:05 AM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote: >> Tay Ray Chuan <rctay89@xxxxxxxxx> writes: >> >>> Factor out the comprehensive non-whitespace regex in use by PATTERNS and >>> IPATTERN and use it as the word-diff regex for the default diff driver. >> >> Why? Sorry for distracting you with the performance argument; it was mostly the first thing that came to my mind that I could use to ask for the motivation, and evaluation of tradeoffs, that both were missing from the proposed commit message. > But I think it's worthwhile to trade-off performance for a sensible > default. Something like > > matrix[a,b,c] > matrix[d,b,c] > > gives > > matrix[[-a-]{+d+},b,c] > > and when we have > > ImagineALanguageLikeFoo > ImagineALanguageLikeBar > > we get > > ImagineALanguageLike[-Foo-]{+Bar+} In that case (and I should have read the original patch), I am definitely against this change. It turns the default word-diff into character-diff, which is something entirely different, and frequently useless precisely for the reason you state: > (But I cheated. Foo and Bar have no common characters in common; if > they did, the word diff would be messy.) Case in point, consider my patch sent out yesterday http://article.gmane.org/gmane.comp.version-control.git/188391 It consists of a one-hunk doc update. word-diff is not brilliant: -k:: Usually the program [-'cleans up'-]{+removes email cruft from+} the Subject: header line to extract the title line for the commit log [-message,-] [- among which (1) remove 'Re:' or 're:', (2) leading-] [- whitespaces, (3) '[' up to ']', typically '[PATCH]', and-] [- then prepends "[PATCH] ".-]{+message.+} This [-flag forbids-]{+option prevents+} this munging, and is most useful when used to read back 'git format-patch -k' output. [snip the rest as it's only {+}] But character-diff tries too hard to find common subsequences: $ g show HEAD^^ --word-diff-regex='[^[:space:]]' | xsel -k:: Usually the program [-'cl-]{+remov+}e[-an-]s {+email cr+}u[-p'-]{+ft from+} the Subject: header line to extract the title line for the commit log message[-,-] [- among which (1) remove 'Re:' or 're:', (2) leading-] [- w-]{+. T+}hi[-te-]s[-paces, (3) '[' up t-] o[-']', ty-]p[-ically '[PATCH]', and-]t[-he-]{+io+}n pre[-p-]{+v+}en[-ds "[PATCH] ". This flag forbid-]{+t+}s this munging, and is most useful when used to read back 'git format-patch -k' output. [snip] Wouldn't you agree that w-]{+. T+}hi[-te-]s[-paces, (3) '[' up t-] o[-']', ty-]p[ is just line noise? The colors don't even help as most of it is removed (red). Regarding your examples > [1] http://article.gmane.org/gmane.comp.version-control.git/105896 > [2] http://article.gmane.org/gmane.comp.version-control.git/105237 first please notice that both of them were written before (and actually discussing) the introduction of the wordRegex feature. At this point, we were trying to make up our minds w.r.t. how powerful the feature needs to be. Nowadays (or in fact, starting a few days after those emails) the user can easily achieve everything discussed here by setting the wordRegex to taste. That being said, I can see some arguments for changing the default to split punctuation into a separate word. That is, whereas the current default is semantically equivalent to a wordRegex of [^[:space:]]* (but has a faster code path) and your proposal is equivalent to [^[:space:]]|UTF_8_GUARD I think there is a case to be made for a default of [^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+ or some such. There's a lot of bikeshedding lurking in the (non)extent of the [[:alnum:]] here, however. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html