Re: [RFC PATCH] make diff --color-words customizable

Hi,

On Fri, 9 Jan 2009, Thomas Rast wrote:

> Johannes Schindelin wrote:
> > On Fri, 9 Jan 2009, Thomas Rast wrote:
> > 
> > > Allows for user-configurable word splits when using --color-words. This 
> > > can make the diff more readable if the regex is configured according to 
> > > the language of the file.
> > > 
> > > For now the (POSIX extended) regex must be set via the environment
> > > variable GIT_DIFF_WORDS_REGEX.  Each (non-overlapping) match of the
> > > regex is considered a word.  Any characters not matched are considered
> > > whitespace.  For example, for C try
> > > 
> > >   GIT_DIFF_WORDS_REGEX='[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|(\+|-|&|\|){1,2}|\S'
> [...]
> > Interesting idea.  However, I think it would be better to do the opposite, 
> > have _word_ patterns.  And even better to have _one_ pattern.
> 
> I'm not sure I understand.  It _is_ a single pattern.  The examples
> just have several cases to distinguish various semantic groups that
> can occur, as a sort of "half tokenizer".  (The C example isn't very
> complete however.)

Oh, I was fooled by your use of an array of enums whose purpose I did not 
understand at all.

> > BTW I think you could do what you intended to do with a _way_ smaller 
> > and more intuitive patch.
> 
> How?

Intuitively, all you would have to do is to replace this part in 
diff_words_show()

        for (i = 0; i < minus.size; i++)
                if (isspace(minus.ptr[i]))
                        minus.ptr[i] = '\n';

by a loop finding the next word boundary.  I would suggest making that a 
function, say,

	int find_word_boundary(struct diff_words_data *data, char *minus);

This function would also be responsible for initializing the regexp.
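
To make the suggestion concrete, here is a rough sketch of such a helper, not 
the actual patch.  It assumes a NUL-terminated buffer (the real buffers in 
diff_words_show() carry an explicit size), and "struct word_splitter" and 
split_words() are hypothetical stand-ins for struct diff_words_data and the 
calling loop.  With no pattern set it behaves like the isspace() loop above.

```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for git's struct diff_words_data. */
struct word_splitter {
	const char *pattern;	/* boundary regex; NULL means "split on whitespace" */
	regex_t regex;
	int compiled;		/* set once the regex has been compiled */
};

/*
 * Return the offset of the next word boundary in the NUL-terminated
 * buffer, or -1 if there is none.  The regex is compiled lazily on
 * first use, so the default (no pattern) path never touches it.
 */
static int find_word_boundary(struct word_splitter *ws, const char *buf)
{
	regmatch_t m;
	int i;

	if (!ws->pattern) {
		/* Default behaviour: any whitespace character is a boundary. */
		for (i = 0; buf[i]; i++)
			if (isspace((unsigned char)buf[i]))
				return i;
		return -1;
	}
	if (!ws->compiled) {
		if (regcomp(&ws->regex, ws->pattern, REG_EXTENDED))
			return -1;
		ws->compiled = 1;
	}
	/* Treat "no match" and an empty match alike: no further boundary. */
	if (regexec(&ws->regex, buf, 1, &m, 0) || m.rm_eo == m.rm_so)
		return -1;
	return (int)m.rm_so;
}

/* Replace every boundary character by '\n', as the existing loop does. */
static void split_words(struct word_splitter *ws, char *buf)
{
	int off;

	while ((off = find_word_boundary(ws, buf)) >= 0) {
		buf[off] = '\n';
		buf += off + 1;
	}
}
```

A caller would zero-initialize the struct, set "pattern" only when the user 
configured one, and call regfree() on "regex" afterwards if "compiled" is set.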

However, as I said, I think it would be much more intuitive to 
characterize the _words_ instead of the _word boundaries_.
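
For comparison, a sketch of the word-pattern variant could look like this.  
Again this is only an illustration under assumptions: split_into_words() is 
a made-up name, the input is NUL-terminated, and the output goes into a 
caller-supplied buffer with one word per line.  Everything the regex does 
not match is dropped, i.e. treated like whitespace.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy each non-overlapping match of the POSIX extended regex `pattern`
 * in `text` to `out`, one match per line.  Unmatched characters are
 * skipped.  Returns the number of words, or -1 on a bad pattern or if
 * `out` is too small.
 */
static int split_into_words(const char *text, const char *pattern,
			    char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;
	int count = 0;

	if (!outsz || regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	while (*text && !regexec(&re, text, 1, &m, 0)) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (!len) {
			/* Empty match: step one character to guarantee progress. */
			text++;
			continue;
		}
		/* Need room for the word, its '\n', and the final NUL. */
		if (used + len + 2 > outsz) {
			regfree(&re);
			return -1;
		}
		memcpy(out + used, text + m.rm_so, len);
		used += len;
		out[used++] = '\n';
		text += m.rm_eo;	/* continue right after this word */
		count++;
	}
	out[used] = '\0';
	regfree(&re);
	return count;
}
```

Note that the sketch restarts regexec() on the remainder of the buffer, so 
an anchor such as "^" would match at every restart point; a real 
implementation would probably want REG_NOTBOL after the first match.  It also 
spells the catch-all as [^[:space:]], since \S is not part of POSIX extended 
regexes.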

And I would like to keep the default as-is (together _with_ its 
performance).  IOW if the user did not specify a regexp, it should fall 
back to what it does now, which is slow enough.

Ciao,
Dscho
