Re: Swear filter ideas

At 12:35 PM +0200 6/28/07, Tijnema wrote:
On 6/28/07, Richard Davey <rich@xxxxxxxxxxxxx> wrote:
Hi all,

Just wanting to pick your collective brains on this one:

How do you go about implementing a swear / bad-word filter in PHP?

Reasons for needing one aside, I'm just wondering if you favour a
regexp, a substr_count, or what? Do you like to *** out the bad words,
or just error back to the user? (I guess that is application
specific).

There are always ways to circumvent them, but I'm still curious to
know how you prefer to handle it.

Cheers,

Rich

This is a really tough thing to implement. Let's say you don't want
the word "ass" in a message coming from the user, and the
message contains this:

Hi, in the archive I attached is a picture of my ass, the password is abcdef.

The word "ass" is not wanted, so you *** it. But if you do that with a
regexp or such, then "password" would become "p***word". You could
check only for real words (so spaces on both sides), but then somebody
would write !ass! or something like that.


Something like that could be taken care of with a regexp like

   $CleansedLine = preg_replace('/\bass\b/i', '***', $Line);

The \b matches a word boundary, i means case insensitive. You'd have to loop through your BadWord list for each line:

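   $CleansedLine = $Line;
   foreach ($BadWords as $BW) {
      // preg_quote() escapes any regex metacharacters in the bad word
      $Pattern = '/\b' . preg_quote($BW, '/') . '\b/i';
      $CleansedLine = preg_replace($Pattern, '***', $CleansedLine);
   }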
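
For what it's worth, here is a minimal sketch of that loop wrapped in a function and run against the example message from the quoted post. The function name cleanse_line() and the two-word list are made up for illustration, and preg_quote() is added on the assumption that list entries are plain words rather than regexp fragments.

```php
<?php
// Hypothetical helper: star out every listed word that appears as a whole word.
function cleanse_line($Line, array $BadWords)
{
    $CleansedLine = $Line;
    foreach ($BadWords as $BW) {
        // preg_quote() keeps characters such as '.' or '$' in a list entry literal
        $Pattern = '/\b' . preg_quote($BW, '/') . '\b/i';
        // Feed the previous result back in so every word's replacement is kept
        $CleansedLine = preg_replace($Pattern, '***', $CleansedLine);
    }
    return $CleansedLine;
}

$BadWords = array('ass', 'a55');   // illustrative list only
$Line = 'Hi, in the archive I attached is a picture of my ass, the password is abcdef.';
echo cleanse_line($Line, $BadWords), "\n";
echo cleanse_line('!ASS! and A55', $BadWords), "\n";
```

With this list, "password" is left alone because there is no word boundary inside it, while "!ASS!" is still caught because "!" is not a word character.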

Your badword list should include variants like A55 for ASS, etc. Recognize that you aren't going to catch all the alternatives, but this should get most of them. And of course, if you're talking about donkeys, ass is legitimate :). If you really need to filter all profanity, the only solution I know of is moderation. If you want to count the number of badwords and reject once a certain threshold is reached, you could do

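   $SwearCount = 0;
   $CleansedLine = $Line;
   foreach ($BadWords as $BW) {
      $Pattern = '/\b' . preg_quote($BW, '/') . '\b/i';
      // $Matches is required as the third argument before PHP 5.4
      $SwearCount  += preg_match_all($Pattern, $Line, $Matches);
      $CleansedLine = preg_replace($Pattern, '***', $CleansedLine);
   }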

#  As of PHP 5.1, preg_replace accepts an optional fifth argument that
#  receives the number of replacements made, eliminating the need for a
#  separate preg_match_all

   if ($SwearCount > SWEAR_THRESHOLD) {
      echo 'You %$#&%@, who do you $#@^ing think you ***@~+-!!ing are ??';
   }
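
A rough sketch of the PHP 5.1 variant mentioned in the note above, letting preg_replace do the counting. The function name cleanse_and_count(), the sample input and the SWEAR_THRESHOLD value of 2 are all assumptions for illustration, not anything prescribed.

```php
<?php
// Hypothetical helper: returns array(cleansed line, number of words replaced).
function cleanse_and_count($Line, array $BadWords)
{
    $SwearCount = 0;
    $CleansedLine = $Line;
    foreach ($BadWords as $BW) {
        $Pattern = '/\b' . preg_quote($BW, '/') . '\b/i';
        // Fifth argument (PHP 5.1+) is filled with the number of replacements made
        $CleansedLine = preg_replace($Pattern, '***', $CleansedLine, -1, $Count);
        $SwearCount += $Count;
    }
    return array($CleansedLine, $SwearCount);
}

define('SWEAR_THRESHOLD', 2);   // assumed value, pick your own

list($Clean, $Count) = cleanse_and_count('ass, A55 and more ass', array('ass', 'a55'));
if ($Count > SWEAR_THRESHOLD) {
    echo "Rejected ($Count bad words)\n";
} else {
    echo $Clean, "\n";
}
```

Because the count comes from the same call that does the replacing, the two can't drift apart the way separate preg_match_all and preg_replace patterns could.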

	- st&$#%!@#eve

--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg                                http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center                            sbedberg@xxxxxxxxxxx |
| Bioinformatics programming/database/sysadmin             (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

