At 4/6/2007 06:01 AM, Sebe wrote:
I thought of an idea of counting the number of links to reduce comment spam.
I do this by counting the number of 'http://' instances in the
text. You can use a variety of PHP functions:
- substr_count()
- preg_match_all() then count() the result array
- explode() then count(), minus one
- preg_split() then count(), minus one
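
As a rough sketch of the counting approaches listed above: the sample message and its URLs are made up, and the plain-string split uses explode(), since that is the PHP function that splits on a delimiter (str_split() cuts a string into fixed-length chunks). The split-based counts subtract one because splitting yields one more piece than there are separators.

```php
<?php
// Hypothetical sample message; the text and URLs are placeholders.
$message = 'See http://example.com and http://example.org, '
         . 'or <a href="http://example.net">this</a>.';
$needle  = 'http://';

// 1. substr_count() returns the number of occurrences directly.
$bySubstrCount = substr_count($message, $needle);

// 2. preg_match_all() fills $matches; count() the full-match array.
//    '#' is the pattern delimiter so the slashes need no escaping.
preg_match_all('#http://#', $message, $matches);
$byPregMatchAll = count($matches[0]);

// 3. explode() yields one more piece than there are separators.
$byExplode = count(explode($needle, $message)) - 1;

// 4. preg_split() works the same way but accepts alternatives,
//    e.g. '#http://|<a #' to split on either string.
$byPregSplit = count(preg_split('#http://#', $message)) - 1;

// For this sample, all four variables hold 3.
```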
preg_split() is useful if you want to split the text by more than one
string; simply separate alternative strings in the pattern with the
pipe: '(http://|<a )'
However, in my personal experience, contact form spam links always
contain 'http://' but they're not always couched in anchor tags, so
I've never found the need to search for more than the one pattern.
substr_count() is case-sensitive, so you'll want to make a lowercase
copy of the message text with strtolower() to catch all variants of
http|HTTP|Http|... substr_count() is probably also faster than the
regular expression functions -- not that a difference of microseconds
or milliseconds need necessarily concern you if you're not executing
many iterations.
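
A minimal sketch of the case-insensitive count, with a regular expression variant for comparison. The helper names count_links() and count_links_regex() are hypothetical, not part of PHP.

```php
<?php
// count_links() is a hypothetical helper name.
function count_links($text)
{
    // strtolower() returns a lowercase copy; the original text is untouched.
    return substr_count(strtolower($text), 'http://');
}

// Regex variant: the 'i' modifier makes the match case-insensitive,
// so no lowercase copy is needed.
function count_links_regex($text)
{
    preg_match_all('#http://#i', $text, $matches);
    return count($matches[0]);
}
```

Both functions should agree on any input; which one to use is mostly a matter of taste unless the check runs many times.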
I usually set the limit of permissible links to three. Since it's
entirely possible that a genuine correspondent might send more than
three links someday, I don't throw away suspect messages but instead
send them to my own mailbox coded so they're easy to catch and file
on receipt; that way I can monitor the health of the system and watch
for false positives while still shielding my clients from spam.
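
One way the routing described above might look. The limit of three comes from the text; the monitoring address, the subject tag, and the function name route_message() are placeholders chosen for illustration.

```php
<?php
// Assumed values: only the limit of three is taken from the text.
define('MAX_LINKS', 3);
define('MONITOR_ADDRESS', 'monitor@example.com'); // placeholder address
define('SUSPECT_TAG', '[SUSPECT] ');              // placeholder marker for mail filters

// route_message() is a hypothetical helper: it decides where a message
// goes and how its subject is coded, without sending anything itself.
function route_message($recipient, $subject, $body)
{
    $links = substr_count(strtolower($body), 'http://');

    if ($links > MAX_LINKS) {
        // Suspect: keep the message, but divert it to the monitoring mailbox.
        return array('to' => MONITOR_ADDRESS, 'subject' => SUSPECT_TAG . $subject);
    }

    return array('to' => $recipient, 'subject' => $subject);
}

// Possible usage (not executed here):
// $route = route_message('client@example.com', $subject, $body);
// mail($route['to'], $route['subject'], $body);
```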
Typically I'll display an error message when someone fills out a
contact form incorrectly, for example asking them to enter a valid
email address. Recently, however, I've stopped warning the sender if
they try to send a message that looks like spam because I don't want
to teach spammers how to circumvent my criteria. I send the suspect
message to my monitoring mailbox instead of to the intended recipient
and let the spammers think they've succeeded. I feared at first that
this would encourage spammers to use my contact forms more, but it
doesn't appear to have had that effect.
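
The feedback policy above can be sketched roughly as follows. The function name handle_submission(), its return shape, and the message wording are hypothetical; filter_var() is used here for the email check as one option among several.

```php
<?php
// handle_submission() and its return array are hypothetical.
function handle_submission($email, $body, $maxLinks = 3)
{
    // Ordinary input mistakes are reported back to the sender.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return array(
            'deliver'  => 'none',
            'feedback' => 'Please enter a valid email address.',
        );
    }

    // Suspected spam is rerouted silently: the sender gets the same
    // confirmation as for a normal message, so the criteria stay hidden.
    $suspect = substr_count(strtolower($body), 'http://') > $maxLinks;

    return array(
        'deliver'  => $suspect ? 'monitor' : 'recipient',
        'feedback' => 'Thank you, your message has been sent.',
    );
}
```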
Documentation links:
http://php.net/count
http://php.net/pcre.pattern.syntax
http://php.net/preg_match_all
http://php.net/preg_split
http://php.net/strtolower
http://php.net/substr-count
Regards,
Paul
__________________________
Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com
--
PHP General Mailing List (http://www.php.net/)