Re: Text similarity


Ashley Sheridan wrote:
On Mon, 2009-09-28 at 12:27 +0200, Merlin Morgenstern wrote:
Hi there,

I am trying to measure the similarity between two strings. The similar_text() function returns 33% similarity for strings that are not even close, yet only 31% for strings that share a matching word.

E.g., compared against:

'gemütliche sofas'

Wohngemeinschaften - similarity: 33.333333333333
Sofas & Sessel - similarity: 31.25

I am using this code:
similar_text($data['txt'], $categories[$i], $similarity);

Does anybody have an idea why it gives back 33% similarity on the first string?

Thank you for any help,

Merlin


If you think about it, it makes sense.

Taking your three strings above, 'Wohngemeinschaften' has more
characters in common towards the start of the string (you only have to
go 4 characters in to start a match), whereas 'sofas' won't match the
source string until the 12th character in. Also, both test strings have
the same number of characters that match in order, although the ones
that match in 'Wohngemeinschaften' are separated by characters that do
not match, so I'm not sure what bearing this will have.

As noted on the manual page for this function, the similar_text()
function compares without regard to string length, and tends to only
really be accurate enough for larger excerpts of text.
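
To make the numbers easier to reason about, here is a minimal sketch of how the percentage relates to the match count. The variable names are only illustrative, and the formula in the comment is my reading of how PHP derives the value, so treat it as an assumption to check against your PHP version. Note that the comparison works on bytes and is case-sensitive, so 'sofas' and 'Sofas' are not a full match, and a UTF-8 'ü' counts as two bytes.

```php
<?php
// similar_text() returns the number of matching bytes and, through the
// third argument, a percentage based on that count and both lengths.
$a = 'sofas';
$b = 'Sofas';

$matched = similar_text($a, $b, $percent);

// Assumed relationship: percent = matched * 2 * 100 / (len(a) + len(b)).
$expected = $matched * 2.0 * 100.0 / (strlen($a) + strlen($b));

// Case matters: only "ofas" is common to both strings, 4 of 5 bytes.
printf("matched=%d percent=%.2f expected=%.2f\n", $matched, $percent, $expected);
```

Because the divisor is the combined length of both strings, a short shared word inside two otherwise different strings only moves the percentage a little.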

Thanks,
Ash
http://www.ashleysheridan.co.uk




Sounds logical. Is there another function you would suggest? I guess this is a standard problem. I tried it with levenshtein(), but got similar results.

e.g. levenshtein (smaller = better):
Search for : Stellplatz für Wohnwagen gesucht
Stereoanlagen : 23
Wohnwagen, -mobile : 24
Sonstiges für Baby & Kind - : 25
Steuer & Finanzen - : 25

How come 'Stereoanlagen' and the others show up here?

Any idea how I could make this more accurate?

Thank you for any help, Merlin
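
One possible direction, offered as a sketch rather than a definitive answer: both similar_text() and levenshtein() compare the whole strings byte by byte, so a shared word in a different position or with different capitalisation gets little credit. Comparing word against word after lowercasing may rank the categories more sensibly. The helper name bestWordSimilarity() is hypothetical, and the code assumes the mbstring extension is available and that the strings are UTF-8.

```php
<?php
// Hypothetical helper: score a candidate by its best-matching word
// against any word of the search phrase (0..100, higher = better).
function bestWordSimilarity(string $needle, string $candidate): float
{
    // Lowercase, then split on anything that is not a letter or digit.
    $split = function (string $s): array {
        $s = mb_strtolower($s, 'UTF-8'); // assumes mbstring is loaded
        return preg_split('/[^\p{L}\p{N}]+/u', $s, -1, PREG_SPLIT_NO_EMPTY);
    };

    $best = 0.0;
    foreach ($split($needle) as $a) {
        foreach ($split($candidate) as $b) {
            // Compare single words, so length differences between the
            // full strings no longer dominate the score.
            similar_text($a, $b, $pct);
            $best = max($best, $pct);
        }
    }
    return $best;
}

$search = 'Stellplatz für Wohnwagen gesucht';
foreach (['Stereoanlagen', 'Wohnwagen, -mobile', 'Steuer & Finanzen'] as $cat) {
    printf("%-20s %.1f\n", $cat, bestWordSimilarity($search, $cat));
}
```

With this scoring, a category that contains one of the search words exactly reaches 100, while categories that merely share letters stay below that. Taking only the best word pair is a deliberately simple choice; averaging over words or weighting longer words would be reasonable variations.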

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

