Is there a way to take white space into account in tri-grams? That would
allow for better matching of phrases.

For example, currently "one two three" and "three two one" generate the same set of tri-grams ({"  o","  t"," on"," th"," tw","ee ","hre","ne ","one","ree","thr","two","wo "}), so the distance of "one two four" is the same from both of them.

The query:

SELECT phrase

Returns:

phrase | input | similarity | word_similarity |

But surely "one two four" is more similar to "one two three" than
to "three two one".

Any thoughts?

Igal Sapir
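
For reference, here is a rough sketch of why the two phrases collide. It is not the pg_trgm source, only an approximation that assumes each word is lowercased and padded with two leading spaces and one trailing space before tri-grams are extracted, and that similarity is the number of shared tri-grams divided by the size of the union. The function names `trigrams` and `similarity` are made up for illustration, and the word splitting handles ASCII letters and digits only.

```python
import re


def trigrams(text):
    """Approximate pg_trgm-style tri-gram set (an assumption, not the real code).

    Each word is padded with two leading spaces and one trailing space,
    so no tri-gram ever spans two words.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared tri-grams divided by the size of the union of both sets."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Word order is lost because every word is padded on its own.
print(trigrams("one two three") == trigrams("three two one"))  # True

# So "one two four" scores the same against both phrases.
print(similarity("one two four", "one two three")
      == similarity("one two four", "three two one"))  # True
```

Since the padding is applied per word, the set carries no information about which word follows which, and that is what I would like to change.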