Re: [PATCH 0/4] refactor the --color-words to make it more hackable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Johannes Schindelin wrote:
> On Sun, 11 Jan 2009, Thomas Rast wrote:
> >  However, the final result segfaults and/or prints garbage (on 
> > apparently every commit except very small changes) when using the regex 
> > '\S+', which IMHO should give exactly the same result as not using a 
> > regex at all.
> 
> No, it should not.  The correct regex is '^\S+'.
> 
> As it happens, your regex matches _anything_ + non-whitespace.

It definitely doesn't(*).

Given ' word rest', '^\S+' would not match at all, and '\S+' would
match 'word'.  No space there.  However, at a cursory glance your
patch seems to ignore the rm_so member of match[0], so it'll never
know the difference.

While it might arguably make sense to enforce that only isspace()
characters are whitespace and !isspace() is at least part of _some_
(possibly one-character) word, I do not think it is a good idea to
require the anchoring of the user.  If we need it, we must anchor the
match ourselves.

> Unfortunately, this includes a newline which utterly confuses the
> diff, 

I do agree that matching a newline as part of a word is bad because we
need it for its diff separator semantics.  Consider passing
REG_NEWLINE to regcomp() to reduce the risk of matching newlines via
things like [^\[:space:]].



(*) Well, modulo Junio's objection in the other thread that \S is
actually a PCRE extension.  Substitute [^[:space:]] if your local
flavour doesn't understand it.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch



Attachment: signature.asc
Description: This is a digitally signed message part.


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux