Johannes Schindelin wrote:
>
> In some applications, words are not delimited by white space.  To
> allow for that, you can specify a regular expression describing
> what makes a word with
>
> 	git diff --color-words='^[A-Za-z0-9]*'
[...]
> > Intuitively, all you would have to do is to replace this part in
> > diff_words_show()
> >
> > 	for (i = 0; i < minus.size; i++)
> > 		if (isspace(minus.ptr[i]))
> > 			minus.ptr[i] = '\n';
> >
> > by a loop finding the next word boundary.
[...]
> > However, as I said, I think it would be much more intuitive to
> > characterize the _words_ instead of the _word boundaries_.

That doesn't work.  You cannot overwrite actual content in the strings
to be diffed with newlines.  The current --color-words exploits the
fact that we don't care about spaces anyway, so we might as well
replace them with newlines, but we _do_ care about the words and in
the regexed version, you have no guarantees about where they might
start.  To wit:

thomas@thomas:~/tmp/foo(master)$ cat >foo
foo_bar_baz
quux
thomas@thomas:~/tmp/foo(master)$ git add foo
thomas@thomas:~/tmp/foo(master)$ git ci -m initial
[master (root-commit)]: created f110c6c: "initial"
 1 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 foo
thomas@thomas:~/tmp/foo(master)$ cat >foo
foo_
ar_
az
quux
thomas@thomas:~/tmp/foo(master)$ git diff
diff --git i/foo w/foo
index 5b34f11..a2762c6 100644
--- i/foo
+++ w/foo
@@ -1,2 +1,4 @@
-foo_bar_baz
+foo_
+ar_
+az
 quux
thomas@thomas:~/tmp/foo(master)$ git diff --color-words
diff --git i/foo w/foo
index 5b34f11..a2762c6 100644
--- i/foo
+++ w/foo
@@ -1,2 +1,4 @@
foo_bar_bafoo_
ar_
az
quux
thomas@thomas:~/tmp/foo(master)$ git diff --color-words='[a-zA-Z]+_?'
diff --git i/foo w/foo
index 5b34f11..a2762c6 100644
--- i/foo
+++ w/foo
@@ -1,2 +1,4 @@
quux

Even without the colours, you can see that it has a blind spot for
changes around a newline.
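
The point that word boundaries must be remembered rather than written into the text can be sketched in C. This is a minimal illustration under stated assumptions, not code from git or from either patch. `struct word_span`, `find_words()` and `words_to_lines()` are hypothetical names. Word matching uses POSIX `<regex.h>`, and words are assumed never to contain a newline. Rather than overwriting characters in place as the quoted `isspace()` loop does, the sketch records the byte offsets of each regex match. It then copies the words into a separate one-word-per-line buffer for the line-based diff.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* A word is a half-open byte range [begin, end) into the ORIGINAL buffer. */
struct word_span {
	size_t begin, end;
};

/*
 * Record up to `max` non-empty matches of `re` in the NUL-terminated
 * `buf`.  Nothing in `buf` is overwritten, so the text between words
 * (including newlines) is still there when the result is displayed.
 * Assumption of this sketch: a word never contains '\n'.
 */
static size_t find_words(const regex_t *re, const char *buf,
			 struct word_span *out, size_t max)
{
	size_t n = 0, pos = 0, len = strlen(buf);
	regmatch_t m;

	while (pos < len && n < max) {
		/* Don't let '^' match at an artificial start inside buf. */
		int flags = pos ? REG_NOTBOL : 0;

		if (regexec(re, buf + pos, 1, &m, flags) != 0)
			break;			/* no more words */
		if (m.rm_eo == m.rm_so) {
			/* Empty match: step past it so the loop terminates. */
			pos += (size_t)m.rm_so + 1;
			continue;
		}
		out[n].begin = pos + (size_t)m.rm_so;
		out[n].end = pos + (size_t)m.rm_eo;
		/* Check the no-newline-in-a-word assumption. */
		assert(!memchr(buf + out[n].begin, '\n',
			       out[n].end - out[n].begin));
		pos = out[n].end;
		n++;
	}
	return n;
}

/*
 * Build the text fed to the line-based diff: one word per line, in a
 * freshly allocated buffer (caller frees).  The spans are kept, so a
 * word index reported by the diff maps back to its exact position in
 * the original text.
 */
static char *words_to_lines(const char *buf, const struct word_span *spans,
			    size_t n)
{
	size_t total = 0, i;
	char *res, *p;

	for (i = 0; i < n; i++)
		total += spans[i].end - spans[i].begin + 1;	/* word + '\n' */
	res = malloc(total + 1);
	if (!res)
		return NULL;
	p = res;
	for (i = 0; i < n; i++) {
		size_t wlen = spans[i].end - spans[i].begin;

		memcpy(p, buf + spans[i].begin, wlen);
		p += wlen;
		*p++ = '\n';
	}
	*p = '\0';
	return res;
}
```

For the post-image in the transcript, each recorded span ends right before a newline that is still present in the original buffer. An output step could therefore print those newlines instead of losing them.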

Perhaps there is an easier way to remember the word boundaries, but we
definitely cannot *forget* about them.

That being said, even though my patch correctly sees the changes, the
above test case also exposes some sort of string overrun :-(

> > And I would like to keep the default as-is (together _with_ the
> > performance.  IOW if the user did not specify a regexp, it should fall
> > back to what it does now, which is slow enough).

That's definitely a valid request.  I'll come up with a fixed patch,
and probably make it both funcname-like (Jeff's idea) and command line
configurable.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch