Re: How to remove duplicate lines but save one of the lines?

"Scott Marlowe" <scott.marlowe@xxxxxxxxx> · Mon, 21 Jul 2008 10:29:10 -0600

On Mon, Jul 21, 2008 at 9:51 AM, A B <gentosaker@xxxxxxxxx> wrote:
>> There is probably a more elegant way of doing it, but  a simple way of doing
>> it ( depending on the size of the table ) could be:
>>
>> begin;
>>
>> insert into foo select distinct * from orig_table;
>> delete from orig_table;
>> insert into orig_table select * from foo;
>>
>> commit;
>
> Just to make it clear to me
> Here foo is a table that I have to create  with the command
> CREATE TABLE foo (....same columns as orig_table);
> ?

If this is a live table with that you can't use that method on, you
can use this generic methodology to get rid of dups.

-- Create test table
smarlowe=# create table main (i int, t text);
CREATE TABLE
smarlowe=# insert into main values (1,'A');
INSERT 0 1
smarlowe=# insert into main values (1,'A');
INSERT 0 1
smarlowe=# insert into main values (3,'B');
INSERT 0 1
smarlowe=# insert into main values (3,'B');
INSERT 0 1
smarlowe=# insert into main values (44,'C');
INSERT 0 1
smarlowe=# insert into main values (44,'C');
INSERT 0 1
smarlowe=# select * from main;
 i  | t
----+---
  1 | A
  1 | A
  3 | B
  3 | B
 44 | C
 44 | C
(6 rows)

Add a new field for an int, set it to a sequence of numbers:

smarlowe=# alter table main add uniq int;
ALTER TABLE
smarlowe=# create sequence t
smarlowe-# ;
CREATE SEQUENCE
smarlowe=# update main set uniq=nextval('t');
UPDATE 6
smarlowe=# select * from main;
 i  | t | uniq
----+---+------
  1 | A |    1
  1 | A |    2
  3 | B |    3
  3 | B |    4
 44 | C |    5
 44 | C |    6
(6 rows)

This query will give us a list of "extra" ids:

smarlowe=# select distinct m1.uniq from main m1 join main m2 on
(m1.t=m2.t and m1.i=m2.i) where m1.uniq > m2.uniq;
 uniq
------
    2
    4
    6
(3 rows)

We use that query as a subselect to a delete:

smarlowe=# begin;
BEGIN
smarlowe=# delete from main where uniq in (select m1.uniq from main m1
join main m2 on (m1.t=m2.t and m1.i=m2.i) where m1.uniq > m2.uniq);
DELETE 3
smarlowe=# select * from main;
 i  | t | uniq
----+---+------
  1 | A |    1
  3 | B |    3
 44 | C |    5
(3 rows)
smarlowe=# commit;
COMMIT

> Is it possible to add a unique constraint to the table, with a
> "delete" option so it will delete duplicates?

It is possible to add a unique constraint.  Having it "delete" rows
automagically is not normal operation, but I'm sure some kind of user
defined trigger could be written to do that.  But if you've got a
unique constraint on a unique set of data, new non-unique entries will
fail to enter.

smarlowe=# create unique index main_t_i on main (t,i);
CREATE INDEX
smarlowe=# insert into main (i,t) values (1,'A');
ERROR:  duplicate key violates unique constraint "main_t_i"