Re: Detecting naughty sites

At 11/28/2006 11:57 AM, Rory Browne wrote:
> I didn't mean something quite that simple, or as an absolute solution.
>
> I meant something slightly more advanced, but based on that idea.
>
> From a robot point of view, what do you think is the difference
> between the php archives and a porn site?


What exactly do you mean by "porn"? Certainly there are websites that 99 people in a room of 100 would label as pornography, but the grey area is the killer -- an enormous volume of material that some people will label "pornographic" and others won't. Whose opinion will you use in crafting your software?

Only when you define "porn site" with sufficient specificity can you attempt to write an algorithm to recognize one. When you've accomplished all that, you can apply similar logic to recognizing truly good movies, really delightful reads, absolutely delicious recipes, and undeniably correct political opinions. You'll be a genius -- provided you can find enough people to agree with you.

If we humans can't reliably define the boundaries of a set, how can we write software to recognize its (pardon the expression) members? How would we know when the software was functioning correctly? Who would judge the accuracy of its findings?

You can work for weeks developing an algorithm to detect "porn" and it will take others an hour or a day to find the gaps in the definition. I think any program you could write would have so many false positives & false negatives that you'd end up having to manually moderate the process anyway.
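To make the false positive / false negative point concrete, here is a rough sketch of how one might score a filter against a human moderator's judgments. The function name and the sample verdicts are invented for illustration; note that the "correct" column is just one person's opinion, which is the whole problem.

```python
def confusion_counts(filter_verdicts, moderator_labels):
    """Tally where a filter agrees and disagrees with a human moderator.

    Both arguments are equal-length sequences of booleans:
    True means "objectionable", False means "acceptable".
    """
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for flagged, objectionable in zip(filter_verdicts, moderator_labels):
        if flagged and objectionable:
            counts["true_pos"] += 1    # filter and moderator both object
        elif flagged and not objectionable:
            counts["false_pos"] += 1   # door slammed on an innocent page
        elif not flagged and objectionable:
            counts["false_neg"] += 1   # unwanted page let through
        else:
            counts["true_neg"] += 1    # both consider the page acceptable
    return counts


# Hypothetical verdicts for six pages (made-up data, not measurements).
filter_verdicts = [True, True, False, False, True, False]
moderator_labels = [True, False, False, True, True, False]

result = confusion_counts(filter_verdicts, moderator_labels)
print(result)
```

Swap in a second moderator's labels and the counts change, even though the filter has not.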

Please understand that I love (if I may use that verb ever so delicately) writing software that parses human expression in search of patterns and specific content. I love bestowing on a program the flexibility and grace it requires to enter that messy jungle and return with a map or a fact. I could write a spider that flagged websites containing certain words (in English, at least, without assistance), but I'm not as comfortable with the prospect of writing a sexual content filter so dependable that I'd be happy to leave it to guard a gate on its own. I'm sure it would slam the door on many undeserving people and would happily let in others my client wouldn't want. For a commercial site hoping to make money from advertisers, it wouldn't pay to have a near-sighted or illiterate gatekeeper.
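As a minimal sketch of the word-flagging idea (without the spider itself), something like the following would do. It is written in Python for brevity; the word list and the function name are placeholders I made up, not a recommended vocabulary.

```python
import re

# Hypothetical word list, chosen only for illustration.
FLAGGED_WORDS = {"breast", "nude"}


def flagged_terms(text, words=FLAGGED_WORDS):
    """Return the sorted list of flagged words that occur in the text."""
    # Lowercase the text and split it into runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted(set(tokens) & set(words))


medical_page = "How to perform a monthly breast self-exam"
list_archive = "PHP general mailing list archive for November"

print(flagged_terms(medical_page))
print(flagged_terms(list_archive))
```

The health page gets flagged and the list archive does not, which is exactly the near-sighted gatekeeper behaviour: the match says nothing about context or intent.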

Perhaps the only way to do what you're suggesting is to write an image pattern recognition algorithm so sophisticated that it can differentiate a photograph of a hand caressing a breast from a photograph of a breast self-exam.

Or are photos of breast self-exams pornographic?

Yikes,
Paul
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

