Re: [PATCH 0/4] refactor the --color-words to make it more hackable

Thomas Rast <trast@xxxxxxxxxxxxxxx> · Mon, 12 Jan 2009 07:25:28 +0100

Johannes Schindelin wrote:
> On Sun, 11 Jan 2009, Thomas Rast wrote:
> >  However, the final result segfaults and/or prints garbage (on 
> > apparently every commit except very small changes) when using the regex 
> > '\S+', which IMHO should give exactly the same result as not using a 
> > regex at all.
> 
> No, it should not.  The correct regex is '^\S+'.
> 
> As it happens, your regex matches _anything_ + non-whitespace.

It definitely doesn't(*).

Given ' word rest', '^\S+' would not match at all, and '\S+' would
match 'word'.  No space there.  However, at a cursory glance your
patch seems to ignore the rm_so member of match[0], so it'll never
know the difference.

While it might arguably make sense to enforce that only isspace()
characters are whitespace and !isspace() is at least part of _some_
(possibly one-character) word, I do not think it is a good idea to
require the anchoring of the user.  If we need it, we must anchor the
match ourselves.

> Unfortunately, this includes a newline which utterly confuses the
> diff, 

I do agree that matching a newline as part of a word is bad because we
need it for its diff separator semantics.  Consider passing
REG_NEWLINE to regcomp() to reduce the risk of matching newlines via
things like [^\[:space:]].

(*) Well, modulo Junio's objection in the other thread that \S is
actually a PCRE extension.  Substitute [^[:space:]] if your local
flavour doesn't understand it.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

Attachment:
signature.asc

Description: This is a digitally signed message part.