Hi,

I am looking for an efficient and effective way to eliminate duplicates in a continuously updated "cumulative" transaction table (no deletions are envisioned, as all non-redundant records are important). Here is my situation.

I have a "central" transaction table (A) that is expected to be populated many records at a time by several users simultaneously, and I would like to ensure uniqueness of the tuples in this table. On average, roughly 40% to 60% of the records in each batch presented for persistence to table A are expected to already exist there, and hence will be duplicates.

These batches will arrive via an INSERT query from another "central" transaction table (B), which is itself populated by other data-generating processes and is expected to contain duplicates. Each client's records in table B will share a common value per batch. I intend to prune out duplicates within each of these batches, but that will not guarantee uniqueness of the records across all the batches.

I have thought of the following ideas, but I would welcome further advice, suggestions and additions, including the possibility of using triggers and so on.

1) For each batch of records arriving from a client, perform a join of the entire table A right join table B on ... WHERE A.id IS NULL, and insert only the rows with no match.
Cons: more than one insert transaction is likely to be running at the same time, so each transaction may not see the rows the other(s) are inserting and hence may not be aware of potential duplicates. It may also be too expensive to perform a join against all of table A for every batch insert.
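For concreteness, option 1 might be sketched roughly like this (the column names batch_id and potentially_redundant_field are illustrative, not my actual schema, and I've written the anti-join as a LEFT JOIN from B, which is equivalent to the RIGHT JOIN described above):

```sql
-- Sketch of option 1: insert only rows from the current batch in table_b
-- that do not already exist in table_a. Column names are hypothetical.
INSERT INTO table_a (potentially_redundant_field)
SELECT DISTINCT b.potentially_redundant_field  -- DISTINCT prunes duplicates within the batch
FROM   table_b b
LEFT   JOIN table_a a
       ON  a.potentially_redundant_field = b.potentially_redundant_field
WHERE  b.batch_id = $1        -- the common value shared by this client's batch
AND    a.id IS NULL;          -- anti-join: keep only rows not already in table_a
```

As noted in the cons above, two of these statements running concurrently can still both insert the same value, since neither sees the other's uncommitted rows.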
2) Simply write the records of each batch into table A, then detect duplicates later (perhaps at set time intervals, or perhaps at the end of each client's run), flagging all but the earliest copy, e.g. by using count() like this:

UPDATE table_a AS t
SET    is_redundant = TRUE
FROM  (SELECT min(id) AS id___b,
              potentially_redundant_field
       FROM   table_a
       GROUP  BY potentially_redundant_field
       HAVING count(*) > 1) AS dup
WHERE  t.potentially_redundant_field = dup.potentially_redundant_field
AND    t.id > dup.id___b;

3) A combination of option 1 (if it is not too expensive to perform a join for each batch insert) and option 2. Assuming many of table A's records will already be in memory, as will the incoming batch of records meant for inclusion into table A, performing the right join may well not be expensive, and it will leave fewer records in table A (as potential duplicates of the previously persisted records will have been hived off). That in turn means fewer records for option 2 to process.

Allan.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general