At 11/28/2006 11:57 AM, Rory Browne wrote:
> I didn't mean something quite that simple, or as an absolute solution.
> I meant something slightly more advanced, but based on that idea.
> From a robot point of view, what do you think is the difference
> between the php archives and a porn site?
What exactly do you mean by "porn"? Certainly there are websites
that 99 people in a room of 100 would label as pornography, but the
grey area is the killer -- an enormous volume of material that
various people will label "pornographic" and others won't. Whose
opinion will you use in crafting your software?
Only when you define "porn site" with sufficient specificity can you
attempt to write an algorithm to recognize one. When you've
accomplished all that, you can apply similar logic to recognizing
truly good movies, really delightful reads, absolutely delicious
recipes, and undeniably correct political opinions. You'll be a
genius -- providing you can find enough people to agree with you.
If we humans can't reliably define the boundaries of a set, how can
we write software to recognize its (pardon the expression)
members? How would we know when the software was functioning
correctly? Who would judge the accuracy of its findings?
You can work for weeks developing an algorithm to detect "porn" and
it will take others an hour or a day to find the gaps in the
definition. I think any program you could write would have so many
false positives & false negatives that you'd end up having to
manually moderate the process anyway.
Please understand that I love (if I may use that verb ever so
delicately) writing software that parses human expression in search
of patterns and specific content. I love bestowing on a program the
flexibility and grace it requires to enter that messy jungle and
return with a map or a fact. I could write a spider that flagged
websites containing certain words (in English, at least, without
assistance), but I'm not as comfortable with the prospect of writing
a sexual content filter so dependable that I'd be happy to leave it
to guard a gate on its own. I'm sure it would slam the door on many
undeserving people and would happily let in others my client wouldn't
want. For a commercial site hoping to make money from advertisers,
it wouldn't pay to have a near-sighted or illiterate gatekeeper.
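To make the word-flagging idea concrete, here is a minimal sketch in
Python of the test such a spider might apply to each page. Everything
in it is an assumption for illustration only: the word list, the
function names, and the three sample page texts are hypothetical, and
a real spider would also need fetching, HTML stripping, and a far
larger vocabulary.

```python
import re

# Hypothetical word list, chosen only for illustration.
FLAGGED_WORDS = {"breast", "nude", "xxx"}


def flagged_words_in(text, flagged=FLAGGED_WORDS):
    """Return the sorted flagged words that occur as whole words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & flagged)


def is_flagged(text, threshold=1):
    """Flag a page when it contains at least `threshold` distinct flagged words."""
    return len(flagged_words_in(text)) >= threshold


# Hypothetical page texts, already stripped of markup.
health_page = "How to perform a monthly breast self-exam at home."
recipe_page = "Grilled chicken breast with lemon and thyme."
evasive_page = "Adults only. Members enter here."

if __name__ == "__main__":
    for name, page in [("health", health_page),
                       ("recipe", recipe_page),
                       ("evasive", evasive_page)]:
        print(name, is_flagged(page), flagged_words_in(page))
```

With this word list the health page and the recipe page are both
flagged, while the page that avoids the listed words passes untouched.
Raising the threshold or growing the list shifts which mistakes get
made, but it does not supply the missing definition.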
Perhaps the only way to do what you're suggesting is to write an
image pattern recognition algorithm so sophisticated that it can
differentiate a photograph of a hand caressing a breast from a
photograph of a breast self-exam.
Or are photos of breast self-exams pornographic?
Yikes,
Paul
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php