On Thu, Feb 6, 2014 at 7:41 AM, Alban Hertroys <haramrae@xxxxxxxxx> wrote: > On 6 February 2014 16:18, alexandros_e <alexandros.ef@xxxxxxxxx> wrote: >> Is there a way in SQL or PostgreSQL in general to group by values than are >> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on >> some distance function (levenshtein for example) if the distance is within >> some threshold (i.e., 1) > > Perhaps there is: You can calculate the levenshtein distance between > those values using a self-join and then GROUP BY the result of that > expression and limit the results with HAVING. > > For example: > SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1) > FROM foo foo1 > INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1) > GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1) > HAVING levenshtein(foo1.T1, foo2.T1) > 1 >From my understanding of the question, probably, adding another levenshtein distance to some base value will make more sense: GROUP BY levenshtein(foo1.T1, foo2.T1), levenshtein(foo1.T1, 'A') Though, it looks like a clusterization task for me, and therefore I would recommend OP to look at the PL/R http://www.joeconway.com/plr/doc/index.html. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@xxxxxxxxx -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general