On Mon, 2009-09-28 at 12:27 +0200, Merlin Morgenstern wrote:
> Hi there,
>
> I am trying to find out similarity between 2 strings. Somehow the
> similar_text function returns 33% similarity on strings that are not
> even close and on the other hand it returns 21% on strings that have a
> matching word.
>
> E.G:
>
> 'gemütliche sofas'
>
> Wohngemeinschaften - similarity: 33.333333333333
> Sofas & Sessel - similarity: 31.25
>
> I am using this code:
> similar_text($data[txt], $categories[$i], $similarity);
>
> Does anybody have an idea why it gives back 33% similarity on the first
> string?
>
> Thank you for any help,
>
> Merlin
>

If you think about it, it makes sense. Taking your three strings above, 'Wohngemeinschaften' has characters in common with the source string much closer to its start (you only have to go 4 characters in to find a match), whereas 'sofas' doesn't match the source string until the 12th character.

Also, both test strings have the same number of characters that match in order, although the ones that match in 'Wohngemeinschaften' are separated by characters that do not match, so I'm not sure what bearing this will have.

As noted on the manual page for this function, similar_text() compares without regard to string length, and it tends to be accurate enough only for larger excerpts of text.

Thanks,
Ash
http://www.ashleysheridan.co.uk

-- 
PHP General Mailing List (http://www.php.net/)
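For anyone who wants to see where the numbers come from, below is a rough userland sketch of the algorithm the PHP manual credits to Oliver's "Programming Classics": find the longest common substring, then repeat on the pieces to its left and right and add up the matched lengths. The percentage is that count times 200, divided by the combined length of both strings. The names sim_sketch() and percent_sketch() are made up for illustration. The sketch is meant to mirror the built-in similar_text(), but treat it as an approximation, not PHP's actual C source. Note that the comparison is byte-wise and case-sensitive, so the capital 'S' in 'Sofas' does not match the 's' in 'sofas', and a UTF-8 'ü' counts as two bytes.

```php
<?php
// Hypothetical helper: counts matching bytes the way similar_text() is
// documented to, by taking the longest common substring and recursing
// on whatever lies to its left and to its right.
function sim_sketch(string $a, string $b): int
{
    $lenA = strlen($a); // bytes, not characters
    $lenB = strlen($b);
    if ($lenA === 0 || $lenB === 0) {
        return 0;
    }

    // Find the longest common substring; the first one found wins ties.
    $max = 0;
    $posA = 0;
    $posB = 0;
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $max) {
                $max = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    if ($max === 0) {
        return 0;
    }

    // Add the matches found to the left and to the right of that substring.
    return $max
        + sim_sketch(substr($a, 0, $posA), substr($b, 0, $posB))
        + sim_sketch(substr($a, $posA + $max), substr($b, $posB + $max));
}

// Hypothetical helper: the percentage, i.e. matches * 200 / (len(a) + len(b)).
function percent_sketch(string $a, string $b): float
{
    $total = strlen($a) + strlen($b);
    return $total === 0 ? 0.0 : sim_sketch($a, $b) * 200 / $total;
}

// Merlin's example strings, compared side by side with the built-in.
$source = 'gemütliche sofas';
foreach (['Wohngemeinschaften', 'Sofas & Sessel'] as $candidate) {
    similar_text($source, $candidate, $builtinPercent);
    printf(
        "%s: sketch %d (%.2f%%), built-in %d (%.2f%%)\n",
        $candidate,
        sim_sketch($source, $candidate),
        percent_sketch($source, $candidate),
        similar_text($source, $candidate),
        $builtinPercent
    );
}
```

Because only the single longest run is kept at each step, a few scattered short matches (like "gem" and "e" in 'Wohngemeinschaften') can add up to as many points as one whole matching word, which is roughly what you are seeing.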