Re: link counting

At 4/6/2007 06:01 AM, Sebe wrote:
i thought of an idea of counting the number of links to reduce comment spam.


I do this by counting the number of 'http://' instances in the text. You can use a variety of PHP functions:

- substr_count()
- preg_match_all() then count() the result array
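- explode() then count(), minus one (str_split() cuts a string into fixed-length chunks rather than splitting on a delimiter)
- preg_split() then count(), minus one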
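
As a rough illustration of the options listed above, here is a minimal sketch. The sample message and variable names are made up for the example, and explode() is used for the literal-string split because str_split() divides a string into fixed-length chunks. Note that the split-based approaches return one more piece than there are occurrences.

```php
<?php
// Hypothetical sample message; text and variable names are illustrative only.
$message = "Visit http://example.com and http://example.org today";
$needle  = 'http://';

// 1. Direct substring count.
$viaSubstrCount = substr_count($message, $needle);

// 2. Regular-expression match. preg_match_all() returns the number of
//    full matches; count($matches[0]) gives the same figure.
$viaPregMatchAll = preg_match_all('#http://#', $message, $matches);

// 3. Split on the literal string: N occurrences yield N + 1 pieces.
$viaExplode = count(explode($needle, $message)) - 1;

// 4. Split on a pattern: again, pieces minus one.
$viaPregSplit = count(preg_split('#http://#', $message)) - 1;
```

All four variables should hold the same number for a given message, so the choice mostly comes down to whether you need a pattern or a plain string.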

preg_split() is useful if you want to split the text on more than one string; simply separate the alternatives in the pattern with a pipe: '(http://|<a )'
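
A small sketch of the multi-string case, assuming a made-up message and using '#' as the pattern delimiter with the 'i' modifier for case-insensitivity (both are my choices for the example, not something the approach requires). Bear in mind that an anchor tag with an http:// href matches both alternatives and so counts twice.

```php
<?php
// Hypothetical message containing one anchor tag and one bare URL.
$message = 'See <a href="http://example.com">this</a> and http://example.org';

// Alternatives separated by a pipe; '#' delimiters, 'i' = case-insensitive.
$pattern = '#(http://|<a )#i';

// Splitting on N matches yields N + 1 pieces, so subtract one.
$pieces    = preg_split($pattern, $message);
$linkCount = count($pieces) - 1;

// preg_match_all() returns the number of matches directly.
$matchCount = preg_match_all($pattern, $message, $matches);
```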

However, in my personal experience, contact form spam links always contain 'http://' but they're not always couched in anchor tags, so I've never found the need to search for more than the one pattern.

substr_count() is case-sensitive, so you'll want to lowercase a copy of the message text with strtolower() to catch all variants (http, HTTP, Http, ...). substr_count() is probably also faster than the regular expression functions, although a difference of microseconds or milliseconds need not concern you unless you're executing many iterations.
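
A minimal sketch of the case-insensitive count described above. The function name countLinks() is hypothetical; the lowercase copy is used only for counting, so the original message text is left untouched.

```php
<?php
// Hypothetical helper: counts 'http://' occurrences regardless of case.
function countLinks($text)
{
    // strtolower() returns a lowercased copy; $text itself is not modified.
    return substr_count(strtolower($text), 'http://');
}

$sample = 'HTTP://a.example Http://b.example http://c.example';
$count  = countLinks($sample);
```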

I usually set the limit of permissible links to three. Since it's entirely possible that a genuine correspondent might send more than three links someday, I don't throw away suspect messages but instead send them to my own mailbox coded so they're easy to catch and file on receipt; that way I can monitor the health of the system and watch for false positives while still shielding my clients from spam.
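
The routing decision could be sketched roughly as follows. Everything named here is an assumption for illustration: the function chooseDelivery(), the '[SUSPECT] ' subject prefix standing in for "coded so they're easy to catch", and the example addresses. It only decides where a message should go; actually sending it is left out.

```php
<?php
// Assumed limit from the text: more than three links marks a message suspect.
define('MAX_LINKS', 3);

// Hypothetical helper: returns the address to deliver to and a subject prefix.
function chooseDelivery($messageText, $intendedRecipient, $monitorMailbox)
{
    $links = substr_count(strtolower($messageText), 'http://');

    if ($links > MAX_LINKS) {
        // Suspect: divert to the monitoring mailbox instead of discarding,
        // tagged so it is easy to filter on receipt.
        return array('to' => $monitorMailbox, 'subject_prefix' => '[SUSPECT] ');
    }

    return array('to' => $intendedRecipient, 'subject_prefix' => '');
}

$spammy = chooseDelivery(
    'http://a.example http://b.example http://c.example http://d.example',
    'client@example.com',
    'monitor@example.com'
);
$normal = chooseDelivery(
    'My site is http://mysite.example',
    'client@example.com',
    'monitor@example.com'
);
```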

Typically I'll display an error message when someone fills out a contact form incorrectly, for example asking them to enter a valid email address. Recently, however, I've stopped warning the sender if they try to send a message that looks like spam because I don't want to teach spammers how to circumvent my criteria. I send the suspect message to my monitoring mailbox instead of to the intended recipient and let the spammers think they've succeeded. I feared at first that this would encourage spammers to use my contact forms more, but it doesn't appear to have had that effect.
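
One way to picture the difference between the two kinds of feedback, as a sketch only: the function name, the wording of the messages, and the 'monitor'/'recipient' labels are all hypothetical, and filter_var() is just one way to validate the address. The point is that a suspect message gets the same response as a genuine one.

```php
<?php
// Hypothetical form handler: reports honest mistakes, stays silent about spam.
function handleContactForm($email, $messageText)
{
    // Ordinary validation errors are shown to the sender.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return array(
            'feedback'   => 'Please enter a valid email address.',
            'deliver_to' => null,
        );
    }

    // Assumed limit of three links, counted case-insensitively.
    $suspect = substr_count(strtolower($messageText), 'http://') > 3;

    // Same feedback either way; only the destination differs.
    return array(
        'feedback'   => 'Thank you, your message has been sent.',
        'deliver_to' => $suspect ? 'monitor' : 'recipient',
    );
}

$bad   = handleContactForm('not-an-email', 'Hello');
$spam  = handleContactForm('a@example.com', 'http://a http://b http://c http://d');
$legit = handleContactForm('a@example.com', 'Hello, see http://mysite.example');
```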

Documentation links:
http://php.net/count
http://php.net/pcre.pattern.syntax
http://php.net/preg_match_all
http://php.net/preg_split
http://php.net/strtolower
http://php.net/substr-count

Regards,

Paul
__________________________

Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

