RE: [squid-users] How can I ignore 'form inputs' on a urlpath_regex?

On Wed, 4 May 2005, Chris Robertson wrote:

For the record, something like:

^[^?]*sex[^?]*

should fit the bill(1), but would be an absolute CPU hog.  Not to mention
that you would have to do something to match other "bad" words (e.g. one
regex for each word, or a single really long regex utilizing the "or"
operator).

Chris

(1) If my logic is correct, this statement should translate to "From the
beginning of the line, match anything but a question mark zero or more
times, the letters s, e and x all together and anything but a question mark
zero or more times".  Not pretty, but it would do what you are asking.
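
As a rough illustration of the pattern above, here is a minimal sketch in Python. It assumes Python's re module behaves like Squid's regex library for this simple pattern, which is an assumption and not something tested against Squid itself. The function name and the sample URL paths are made up for the example.

```python
import re

# Chris's pattern, copied from the message. Python's re module stands in
# for Squid's regex library here, which is an assumption of this sketch.
PATH_ONLY = re.compile(r"^[^?]*sex[^?]*")


def matches_before_query(urlpath: str) -> bool:
    """Return True if the letters 'sex' occur before the first '?'."""
    return PATH_ONLY.search(urlpath) is not None


if __name__ == "__main__":
    # Hypothetical URL paths, invented for illustration only.
    for candidate in ("/sex/index.html", "/form.cgi?word=sex", "/sussex/map.html"):
        print(candidate, matches_before_query(candidate))
```

The second path is not matched because the only occurrence of the letters is after the question mark. The third path is matched, since the pattern looks for the letters and not for the word.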

Correct. But you should also realise that the letters s e x are not equal to the word sex. The letters s e x are part of many other words not at all related to sex (the most common example is the city Sussex).


GNU regex has magic patterns for matching word boundaries which can do wonders at improving the accuracy of word-based patterns.

       There  are two special cases(!) of bracket expressions: the
       bracket expressions `[[:<:]]' and `[[:>:]]' match the null
       string at the beginning and end of a word respectively.  A
       word is defined as a sequence of word characters which is
       neither preceded nor followed by word characters. A word
       character is an alnum character (as defined by wctype(3)) or
       an underscore.  This is an extension, compatible with but not
       specified by POSIX 1003.2, and should be used with caution in
       software intended to be portable to other systems.

   man 7 regex

or
   info regex

for details on the regex language on your system.
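
To show the effect of word boundaries, here is a minimal sketch in Python. Python's re module does not accept `[[:<:]]` and `[[:>:]]`, so the sketch uses `\b` as a stand-in, which is an assumption: `\b` matches at either edge of a word and, like the definition quoted above, counts alphanumerics and the underscore as word characters. The function name and sample paths are hypothetical.

```python
import re

# Same idea as the earlier pattern, but 'sex' must stand as a whole word.
# \b is Python's word-boundary assertion, used here in place of the
# [[:<:]] and [[:>:]] bracket expressions, which Python's re lacks.
WORD_ONLY = re.compile(r"^[^?]*\bsex\b")


def matches_whole_word(urlpath: str) -> bool:
    """Return True if 'sex' occurs as a whole word before the first '?'."""
    return WORD_ONLY.search(urlpath) is not None


if __name__ == "__main__":
    # Hypothetical URL paths, invented for illustration only.
    for candidate in ("/sex/index.html", "/sussex/map.html", "/form.cgi?word=sex"):
        print(candidate, matches_whole_word(candidate))
```

With the boundaries in place, a path containing Sussex is no longer matched, while a path component consisting of the word alone still is. Whether your Squid build accepts the bracket-expression form depends on the regex library it was compiled against.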

Regards
Henrik
