Mihamina Rakotomandimby <mihamina@xxxxxxxxx> wrote on 11 June 2012 at 11:12:

> Hi all,
>
> I have a small job ad website, where some posters tend to flood with the
> same ad, just in order to be on top of the recent sort.
>
> To perturb the strict duplication detection (yes it's weak), they add
> one or two words that make a difference.
>
> The result is a duplication of many ads.
>
> I would like to search for duplicates by looking for ads with 80%-90%
> same words and decide they're the same, so that I can group them.
>
> Of course, putting a limiting mechanism or even a moderation is
> scheduled, but I want to process existing first.
>
> I don't want to use MySQL for indexing, I believe text indexers are best
> tools for this: Am I wrong?
>
> What would you suggest me to process and lookup for duplicates in that
> situation?

Maybe take a look at

http://de.php.net/manual/de/function.similar-text.php
http://de.php.net/manual/de/function.levenshtein.php

> --
> RMA.
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

Marco Behnke
Dipl. Informatiker (FH), SAE Audio Engineer Diploma
Zend Certified Engineer PHP 5.3

Tel.: 0174 / 9722336
e-Mail: marco@xxxxxxxxxx

Softwaretechnik Behnke
Heinrich-Heine-Str. 7D
21218 Seevetal

http://www.behnke.biz
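
For illustration only, here is a minimal sketch of how similar_text() from the first link might be used to flag near-duplicate ads. The helper name adsLookAlike(), the 85% default threshold (picked from the 80%-90% range in the question) and the sample texts are assumptions, not something from the thread. Note that similar_text() compares characters rather than words, so this only approximates the "same words" criterion.

```php
<?php
// Hypothetical helper: returns true when two ad texts are at least
// $threshold percent similar according to similar_text().
function adsLookAlike($a, $b, $threshold = 85.0)
{
    // Normalise case and whitespace so trivial edits do not hide a duplicate.
    $normalise = function ($s) {
        return trim(preg_replace('/\s+/', ' ', strtolower($s)));
    };
    $a = $normalise($a);
    $b = $normalise($b);

    // similar_text() stores the similarity percentage in its third argument.
    $percent = 0.0;
    similar_text($a, $b, $percent);

    return $percent >= $threshold;
}

// Made-up sample ads: the second one only appends two words to the first.
$original = 'PHP developer wanted in Antananarivo, full time, send your CV';
$padded   = 'PHP developer wanted in Antananarivo, full time, send your CV now please';
$other    = 'Experienced accountant needed for a part-time position';

var_dump(adsLookAlike($original, $padded)); // the padded copy is flagged
var_dump(adsLookAlike($original, $other));  // the unrelated ad is not
```

levenshtein() from the second link could be swapped in for short texts, but before PHP 8.0 it returns -1 when either string is longer than 255 characters, so it is probably less suited to full ad bodies. Comparing every ad against every other one with similar_text() may also get slow as the number of ads grows.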