Johannes Schindelin wrote: > On Sun, 11 Jan 2009, Thomas Rast wrote: > > However, the final result segfaults and/or prints garbage (on > > apparently every commit except very small changes) when using the regex > > '\S+', which IMHO should give exactly the same result as not using a > > regex at all. > > No, it should not. The correct regex is '^\S+'. > > As it happens, your regex matches _anything_ + non-whitespace. It definitely doesn't(*). Given ' word rest', '^\S+' would not match at all, and '\S+' would match 'word'. No space there. However, at a cursory glance your patch seems to ignore the rm_so member of match[0], so it'll never know the difference. While it might arguably make sense to enforce that only isspace() characters are whitespace and !isspace() is at least part of _some_ (possibly one-character) word, I do not think it is a good idea to require the anchoring of the user. If we need it, we must anchor the match ourselves. > Unfortunately, this includes a newline which utterly confuses the > diff, I do agree that matching a newline as part of a word is bad because we need it for its diff separator semantics. Consider passing REG_NEWLINE to regcomp() to reduce the risk of matching newlines via things like [^\[:space:]]. (*) Well, modulo Junio's objection in the other thread that \S is actually a PCRE extension. Substitute [^[:space:]] if your local flavour doesn't understand it. -- Thomas Rast trast@{inf,student}.ethz.ch
Attachment:
signature.asc
Description: This is a digitally signed message part.