Hi,

On Fri, 9 Jan 2009, Thomas Rast wrote:

> Johannes Schindelin wrote:
> > On Fri, 9 Jan 2009, Thomas Rast wrote:
> >
> > > Allows for user-configurable word splits when using --color-words.  This
> > > can make the diff more readable if the regex is configured according to
> > > the language of the file.
> > >
> > > For now the (POSIX extended) regex must be set via the environment
> > > GIT_DIFF_WORDS_REGEX.  Each (non-overlapping) match of the regex is
> > > considered a word.  Any characters not matched are considered
> > > whitespace.  For example, for C try
> > >
> > >   GIT_DIFF_WORDS_REGEX='[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|(\+|-|&|\|){1,2}|\S'
> [...]
> > Interesting idea.  However, I think it would be better to do the opposite,
> > have _word_ patterns.  And even better to have _one_ pattern.
>
> I'm not sure I understand.  It _is_ a single pattern.  The examples
> just have several cases to distinguish various semantic groups that
> can occur, as a sort of "half tokenizer".  (The C example isn't very
> complete however.)

Oh, I was fooled by your use of an array of enums whose purpose I did not
understand at all.

> > BTW I think you could do what you intended to do with a _way_ smaller
> > and more intuitive patch.
>
> How?

Intuitively, all you would have to do is to replace this part in
diff_words_show()

	for (i = 0; i < minus.size; i++)
		if (isspace(minus.ptr[i]))
			minus.ptr[i] = '\n';

by a loop finding the next word boundary.  I would suggest making that a
function, say,

	int find_word_boundary(struct diff_words_data *data, char *minus);

This function would also be responsible for initializing the regexp.

However, as I said, I think it would be much more intuitive to
characterize the _words_ instead of the _word boundaries_.

And I would like to keep the default as-is (together _with_ the
performance).  IOW, if the user did not specify a regexp, it should fall
back to what it does now, which is slow enough.

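To make that a bit more concrete, here is a rough, standalone sketch of
the "characterize the words" variant.  It is not git code: split_words()
is a name made up for illustration, it works on a plain NUL-terminated
string rather than on the buffers in diff_words_show(), and it copies
into a new buffer instead of editing in place, because regexp words can
sit next to each other with no whitespace to overwrite.  The regexp is
assumed to be compiled by the caller with REG_EXTENDED (and without
REG_NOSUB).  With no regexp, it falls back to the isspace() splitting.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not part of git):
 * return a newly allocated copy of text with every word on its own line.
 * If word_regex is non-NULL, each non-overlapping match is a word and
 * everything unmatched is dropped like whitespace.  If it is NULL, words
 * are runs of non-whitespace, as in the current default.
 * The caller must free() the result.  Returns NULL if malloc() fails.
 */
static char *split_words(const char *text, const regex_t *word_regex)
{
	size_t len, pos = 0, out_len = 0;
	char *out;

	assert(text != NULL);
	len = strlen(text);

	/* Worst case: every byte is a word of its own, plus its '\n'. */
	out = malloc(2 * len + 1);
	if (!out)
		return NULL;

	while (pos < len) {
		size_t begin, end;

		if (word_regex) {
			regmatch_t m;
			/* Past the start, '^' must not match anymore. */
			int flags = pos ? REG_NOTBOL : 0;

			if (regexec(word_regex, text + pos, 1, &m, flags))
				break;	/* no further word */
			if (m.rm_so == m.rm_eo)
				break;	/* empty match: stop, avoid looping forever */
			begin = pos + (size_t)m.rm_so;
			end = pos + (size_t)m.rm_eo;
		} else {
			/* Default: skip whitespace, then take the non-space run. */
			begin = pos;
			while (begin < len && isspace((unsigned char)text[begin]))
				begin++;
			if (begin == len)
				break;
			end = begin;
			while (end < len && !isspace((unsigned char)text[end]))
				end++;
		}

		memcpy(out + out_len, text + begin, end - begin);
		out_len += end - begin;
		out[out_len++] = '\n';
		pos = end;
	}
	out[out_len] = '\0';
	return out;
}
```

A caller would regcomp() a pattern such as
'[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|[^[:space:]]' once and pass it in, or
pass NULL to get the default behaviour.  Note that I wrote
[^[:space:]] there instead of \S, since \S is not part of POSIX
extended regexps and only works where the regex library supports it as
an extension.
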
Ciao,
Dscho