Re: How to idenity duplicate rows

"Jan Cruz" <malebug@xxxxxxxxx> · Mon, 20 Mar 2006 10:08:09 +0800

As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.

The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.

I believe getting oid and/or ctid is not possible since it would not display/get duplicate records

in a "HAVING CLAUSE" since their oid/ctid wouldn't be the same.

And creating a newtable and use SELECT DISTINCT isn't an acceptable
idea to the audit team when you have millions of records in a table in
production server when you only need to remove let say 70 records from
that table.

I tried another approach where I queried and insert the duplicate record (35 records) into

a new/temporary table. Then I created a stored procedure something like this:

--START

DECLARE

foo table;

BEGIN

       FOR foo IN 

         SELECT * FROM newtable

       LOOP

          DELETE FROM oldtable

          where oldtable.field1 = foo.field1

           and   oldtable.field2 = foo.field2

           and   oldtable.field3 = foo.field3

           and   oldtable.field4 = foo.field4

           ...';

       END LOOP;

END;

-- END

Problem with this approach I got 35 duplicate records (count = 2) 

from the new table and delete only 20 records from the oldtable. 

If I am not mistaken it should delete 70 records.

I wonder