On Thursday 15 January 2009, Santi Béjar <santi@xxxxxxxxxxx> wrote about
'Re: [PATCH take 3 0/4] color-words improvements':
>It may be ok and logical, but for me it is not what I want. Maybe I
>don't really understand what I want or is a crazy idea but here it is
>anyway:

The discussion above is mildly theoretical.  I don't imagine someone is
going to intentionally mark 98% of a file as non-words, which is basically
what you are doing with a regex of "a+".

>a) primary words are those with alphanumerics (or a regex)

regex: [[:alnum:]]+
example words: matrix ball I a
example non-words: don't haven't

>b) secondary "words" are the other non-whitespaces characters (in this
>case "[]{} and ,"

regex: []{}[,]
example words: [ , }
example non-words: [] ball 147

>c) whitespaces are cruft.
>
>(having two regexp to specify what is a words but they cannot mix).

Combine the two regexes with '|' to get:
[[:alnum:]]+|[]{}[,]

>If everything works as I think (it's late night :-) with the above two
>lines:
>
>matrix[a,b,c]
>matrix{d,b,c}
>
>the word diff would be
>
>matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>

For this specific case, the regex "[^[:space:]]" by itself should work,
although it would end up being a character-by-character diff.  The regex
you built from your description, "[[:alnum:]]+|[]}{[,]", would also give
the same diff.

However:
-dont
+don't

gives a word diff of:
don't
not:
don<RED>'<RESET>t

because "'" is not recognized as part of any word, so it is considered
ignorable.  There was a patch that included documentation advising most
users to add "|[^[:space:]]" to the end of their regex, to capture all
non-whitespace characters that are not otherwise part of a word as
individual, single-character "words".
-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
bss@xxxxxxxxxxxxxxxxx                   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy        `-'(. .)`-'
http://iguanasuicide.net/                    \_/
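
P.S. A rough sketch of the word-splitting idea above, in case it helps.
This is not the patch's code: it only models "every regex match is a word,
everything unmatched is ignored" and then compares the two word lists with
Python's difflib. Python's re module has no POSIX classes, so [[:alnum:]]
and [[:space:]] are approximated with ASCII ranges and \s; the helper names
words() and word_changes() are made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumption: ASCII stand-ins for the POSIX classes used in the mail.
PRIMARY = r"[A-Za-z0-9]+"            # roughly [[:alnum:]]+
SECONDARY = r"[\]{}\[,]"             # roughly []{}[,]
COMBINED = PRIMARY + "|" + SECONDARY
WITH_FALLBACK = COMBINED + r"|[^\s]"  # roughly ...|[^[:space:]]


def words(regex, text):
    """Return the substrings treated as words; unmatched text is dropped."""
    return re.findall(regex, text)


def word_changes(regex, old, new):
    """Return (removed_words, added_words) pairs between two lines."""
    a, b = words(regex, old), words(regex, new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


if __name__ == "__main__":
    print(words(COMBINED, "matrix[a,b,c]"))
    print(words(COMBINED, "don't"))        # the apostrophe is not a word
    print(words(WITH_FALLBACK, "don't"))   # the apostrophe is its own word
    print(word_changes(COMBINED, "matrix[a,b,c]", "matrix{d,b,c}"))
```

With the fallback alternative appended, the apostrophe becomes a
single-character word of its own instead of being silently skipped, which
is the reason for suggesting it as a default.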