On Wed, 4 May 2005, Chris Robertson wrote:
> For the record, something like:
>
> ^[^?]*sex[^?]*
>
> should fit the bill(1), but would be an absolute CPU hog. Not to mention that you would have to do something to match other "bad" words (e.g. one regex for each word, or a single really long regex utilizing the "or" operator).
>
> Chris
>
> (1) If my logic is correct, this statement should translate to "From the beginning of the line, match anything but a question mark zero or more times, the letters s, e and x all together and anything but a question mark zero or more times". Not pretty, but it would do what you are asking.
Correct. But you should also realise that the letters s e x are not the same thing as the word sex. Those letters are part of many other words not at all related to sex (the most common example is the place name Sussex).
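To make the false positive concrete, here is a minimal sketch in Python. The pattern is the one quoted above, unchanged; the URLs are made up for illustration, and Python's re module stands in for whatever regex library your proxy is actually linked against.

```python
import re

# The pattern from the quoted message, unchanged.
pattern = re.compile(r"^[^?]*sex[^?]*")

# Hypothetical URLs, invented for this example.
urls = [
    "http://example.com/sex/index.html",     # the intended hit
    "http://example.com/visit/sussex.html",  # unrelated word containing s, e, x
    "http://example.com/search?q=sex",       # letters appear only after the '?'
]

# True where the pattern matches the URL.
matches = [bool(pattern.search(u)) for u in urls]

for url, hit in zip(urls, matches):
    print(hit, url)
```

The second URL is matched even though it has nothing to do with the word being filtered, while the third is skipped because the letters only appear after the question mark.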
GNU regex has magic patterns for matching word boundaries, which can do wonders for the accuracy of word-based patterns.
There are two special cases(!) of bracket expressions: the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at the beginning and end of a word respectively. A word is defined as a sequence of word characters which is neither preceded nor followed by word characters. A word character is an alnum character (as defined by wctype(3)) or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems.
See
man 7 regex
or
info regex
for details on the regex language on your system.
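As a rough illustration of what word boundaries buy you, here is a sketch in Python. Note the assumptions: Python's re module does not support the `[[:<:]]' and `[[:>:]]' syntax, so its \b assertion is used as the nearest equivalent (it uses the same notion of a word character: alphanumerics plus underscore). The word list, the URLs and the function name is_blocked are all hypothetical. Check your own system's regex documentation for the boundary syntax it really accepts.

```python
import re

# Hypothetical word list; a single alternation covers all of the words.
bad_words = ["sex", "porn"]

# Same idea as the quoted pattern (only look before the '?'), but each
# word must be delimited by word boundaries on both sides.
pattern = re.compile(
    r"^[^?]*\b(?:" + "|".join(re.escape(w) for w in bad_words) + r")\b"
)

def is_blocked(url):
    """Return True if a listed word appears as a whole word before any '?'."""
    return pattern.search(url) is not None

# Hypothetical URLs, invented for this example.
urls = [
    "http://example.com/sex/index.html",     # whole word
    "http://example.com/visit/sussex.html",  # part of a longer word
    "http://example.com/sex_ed/",            # '_' counts as a word character
    "http://example.com/free-porn.html",     # whole word, delimited by '-' and '.'
]

results = [is_blocked(u) for u in urls]

for url, hit in zip(urls, results):
    print(hit, url)
```

Sussex no longer matches. Be aware that the underscore is a word character, so a word glued to another with "_" is not treated as a separate word and slips through.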
Regards Henrik