Seems promising but could you provide me a reference to PostgreSQL documentation regarding this "a%8=*" feature? Best Daniel -----Original Message----- From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Marc Mamin Sent: December-12-14 06:41 To: Daniel Begin; 'Tom Lane'; 'Scott Marlowe' Cc: 'Andy Colson'; pgsql-general@xxxxxxxxxxxxxx Subject: Re: Removing duplicate records from a bulk upload (rationale behind selecting a method) >Thank Tom, >I understand that the rationale behind choosing to create a new table >from distinct records is that, since both approaches need full table >scans, selecting distinct records is faster (and seems more straight >forward) than finding/deleting duplicates; Hi, on a large table you may get it faster while using more than one thread. e.g.: select a,b,c into newtable from oldtable where a%8 =0 group by a,b,c; select a,b,c into newtable from oldtable where a%8 =1 group by a,b,c; ... select a,b,c into newtable from oldtable where a%8 =7 group by a,b,c; This will/should use a shared full table scan on oldtable. HTH Marc Mamin > >Best regards, >Daniel > >-----Original Message----- >From: pgsql-general-owner@xxxxxxxxxxxxxx >[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Tom Lane >Sent: December-08-14 21:52 >To: Scott Marlowe >Cc: Andy Colson; Daniel Begin; pgsql-general@xxxxxxxxxxxxxx >Subject: Re: Removing duplicate records from a bulk upload >(rationale behind selecting a method) > >Scott Marlowe <scott.marlowe@xxxxxxxxx> writes: >> If you're de-duping a whole table, no need to create indexes, as it's >> gonna have to hit every row anyway. Fastest way I've found has been: > >> select a,b,c into newtable from oldtable group by a,b,c; > >> On pass, done. > >> If you want to use less than the whole row, you can use select >> distinct on (col1, col2) * into newtable from oldtable; > >Also, the DISTINCT ON method can be refined to control which of a set >of duplicate keys is retained, if you can identify additional columns >that constitute a preference order for retaining/discarding dupes. See >the "latest weather reports" example in the SELECT reference page. > >In any case, it's advisable to crank up work_mem while performing this >operation. > > regards, tom lane > > >-- >Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To >make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general > > > >-- >Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To >make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general > -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general