On Wed, Apr 25, 2012 at 8:34 PM, <hari.fuchs@xxxxxxxxx> wrote: > Andreas <maps.on@xxxxxxx> writes: > >> How would I group the table so that it shows groups that have >> similarity () > x ? >> >> Lets say the table looks like this: >> >> id, txt >> 1, aa1 >> 2, bb1 >> 3, cc1 >> 4, bb2 >> 5, bb3 >> 6, aa2 >> ... >> >> How would a select look like that shows: >> >> id, txt, group_id >> 1, aa1, 1, >> 6, aa2, 1, >> 2, bb1, 2, >> 4, bb2, 2, >> 5, bb3, 2, >> 3, cc1, 3 > Hey guys. I have a similar problem and I tried a couple of ways to solve this including the window function described in the answer to the original poster in this thread. The problem I am having is that even with a trigam index and a table with only 80,000 records the query takes forever to run. In both cases I ended the query manually and have no idea how long it would actually take to run. I have included the two queries below and am hoping somebody can give me a pointer on how to accomplish with a query that runs. Query 1 WITH grp (t1, id, t2) AS ( SELECT t1.raw_data, t1.id, t2.raw_data FROM schema.a t1 LEFT JOIN schema.a t2 ON t2.raw_data > t1.raw_data WHERE t2.raw_data IS NULL OR similarity(t1.raw_data, t2.raw_data) > .75 ) SELECT t1, min(id) FROM ( SELECT t1, id FROM grp UNION ALL SELECT t2, id FROM grp WHERE t2 IS NOT NULL ) dummy GROUP BY t1 ORDER BY t1 query 2 select similarity(a.raw_data,b.raw_data),* from schema.a a, schema.a b where similarity(a.raw_data,b.raw_data) > .75 and a.id != b.id -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general