Re: [PATCH take 3 0/4] color-words improvements

"Boyd Stephen Smith Jr." <bss@xxxxxxxxxxxxxxxxx> · Thu, 15 Jan 2009 19:42:55 -0600

On Thursday 15 January 2009, Santi Béjar <santi@xxxxxxxxxxx> wrote 
about 'Re: [PATCH take 3 0/4] color-words improvements':
>It may be ok and logical, but for me it is not what I want. Maybe I
>don't really understand what I want, or it is a crazy idea, but here it is
>anyway:

The discussion above is mildly theoretical.  I don't imagine someone is 
going to intentionally mark 98% of a file as non-words, which is basically 
what you are doing with a regex of "a+".

>a) primary words are those with alphanumerics (or a regex)

regex: [[:alnum:]]+

example words: matrix ball I a
example non-words: don't haven't

>b) secondary "words" are the other non-whitespace characters (in this
>case "[]{}" and ",")

regex: []{}[,]

example words: [ , }
example non-words: [] ball 147

>c) whitespace is cruft.
>
>(having two regexps to specify what is a word, but they cannot mix).

Combine regex with '|' to get:
[[:alnum:]]+|[]{}[,]
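
As a rough illustration of how that combined regex splits your two lines into words, here is a small Python sketch.  It is not how git does it internally.  Python's re module has no POSIX classes, so [[:alnum:]] is approximated by [A-Za-z0-9], and the names WORD_RE and words are made up for this example.

```python
import re

# Assumption: [[:alnum:]] is approximated by [A-Za-z0-9], because Python's
# re module does not support POSIX character classes.
# The second alternative matches exactly one of: ] { } [ ,
WORD_RE = re.compile(r"[A-Za-z0-9]+|[\]{}\[,]")


def words(line):
    """Return the substrings the regex treats as words, in order.

    Anything that matches no alternative is simply skipped.
    """
    return WORD_RE.findall(line)


old_words = words("matrix[a,b,c]")
new_words = words("matrix{d,b,c}")
print(old_words)
print(new_words)
```

Each bracket and comma comes out as its own word, so a word-level comparison can pair "[" with "{", "a" with "d" and "]" with "}" while leaving "matrix", ",", "b" and "c" alone.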

>If everything works as I think (it's late night :-) with the above two
> lines:
>
>matrix[a,b,c]
>matrix{d,b,c}
>
>the word diff would be
>
>matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>

For this specific case, the regex "[^[:space:]]" by itself should work, 
although it would end up being a character-by-character diff.
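
To see why that is character-by-character, here is a minimal Python sketch.  It assumes that \S is an acceptable stand-in for [^[:space:]] on plain ASCII input like this; CHAR_RE is a name invented for the example.

```python
import re

# Assumption: \S stands in for the POSIX expression [^[:space:]].
# With no "+" quantifier, every match is exactly one character long.
CHAR_RE = re.compile(r"\S")

tokens = CHAR_RE.findall("matrix[a,b,c]")
print(tokens)
```

Every non-whitespace character becomes its own "word", so even "matrix" is compared letter by letter.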

The regex built from your description, "[[:alnum:]]+|[]{}[,]", would also 
give the same diff.  However:
-dont
+don't
gives a word diff of:
don't
not:
don<RED>'<RESET>t
Because "'" is not recognized as part of any word, it is considered 
ignorable.

There was a patch whose documentation recommended that most users add 
"|[^[:space:]]" to the end of their regex, so that every non-whitespace 
character not otherwise part of a word is captured as an individual, 
single-character "word".
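
A small Python sketch of the effect of that trailing alternative, under the same assumptions as before: [A-Za-z0-9] approximates [[:alnum:]], \S approximates [^[:space:]], and the names BASE, WITHOUT_CATCH_ALL and WITH_CATCH_ALL are invented for the example.  It only shows how the line is split into words, not what git prints.

```python
import re

# Assumption: Python equivalents of the POSIX expressions discussed above.
BASE = r"[A-Za-z0-9]+|[\]{}\[,]"
WITHOUT_CATCH_ALL = re.compile(BASE)
# The catch-all goes last, so it only matches what the earlier
# alternatives could not match at that position.
WITH_CATCH_ALL = re.compile(BASE + r"|\S")

line = "don't"
without_tokens = WITHOUT_CATCH_ALL.findall(line)
with_tokens = WITH_CATCH_ALL.findall(line)
print(without_tokens)  # ['don', 't']
print(with_tokens)     # ['don', "'", 't']
```

Without the catch-all the apostrophe belongs to no word and is dropped.  With it, the apostrophe becomes a one-character word of its own and can show up in the diff.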
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@xxxxxxxxxxxxxxxxx                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     