Insert or Replace or \copy (bulkload)

Ow Mun Heng <Ow.Mun.Heng@xxxxxxx> · Tue, 14 Aug 2007 17:33:39 +0800

I'm seeing an obstacle in my aim to migrate from mysql to PG mainly from
the manner in which PG handles duplicate entries either from primary
keys or unique entries.

Data is taken from perl DBI into (right now) CSV based files to be used
via psql's \copy command to insert into the table.

In MySql, I was using mysqlimport --replace which essentially provided
the means to load data into the DB, while at the same time, would
provide the necessary logic to replace the entire row if there was a
duplicate instead of dying.

Under PG, I've yet to found something similar to this functionality and
searches via google has uncovered that this is one thing which _still_
has not found its way into PG. (anyone knows why? Standards?)

Anyway, I found a workaround, but, to me, even though it provides a
means to an end, it still looks like it'll end up as a maintenance
nightmare each time a table has any additional columns added.

Solution is taken from this site:

http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html

[..snip..]
For the sake of this example we'll use a very simple table called 'map'
with a two fields: 'key' and 'value'. Not surprisingly, key is the
primary key.

Let's further assume that every insert into this table should actually
be a replace. Then all we need is this simple statement and we're done:

CREATE RULE "replace_map" AS
  ON INSERT TO "map_example"
  WHERE
    EXISTS(SELECT 1 FROM map_example WHERE key=NEW.key)
  DO INSTEAD
     (UPDATE map_example SET value=NEW.value WHERE key=NEW.key)

[...snip...]

Populate with some test data
==========================
XMMS=> insert into map_example(key,value,value2) values (1,1,1);
INSERT 0 1
XMMS=> insert into map_example(key,value,value2) values (2,2,2);
INSERT 0 1
XMMS=> select * from map_example
;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |     2 |      2
(2 rows)

Try to insert some duplicate data
+=================================
XMMS=>insert into map_example(key,value,value2) values (2,20,20);
INSERT 0 0
XMMS=> select * from map_example
;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |    20 | ====>2 <===== NOT Updated
(2 rows)

Add the new rule once we added the new columns
==============================================
XMMS=>CREATE OR REPLACE RULE "replace_map" AS
  ON INSERT TO "map_example"
  WHERE
    EXISTS(SELECT 1 FROM map_example WHERE key=NEW.key)
  DO INSTEAD
     (UPDATE map_example SET value=NEW.value,value2=NEW.value2 WHERE
key=NEW.key)
;

XMMS=> insert into map_example(key,value,value2) values (2,20,20);
INSERT 0 0
XMMS=> select * from map_example
;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |    20 |     20
(2 rows)

Can anyone tell me if this won't turn out to be a maintenance nightmare?
So, the pertinent question is, is there a better mousetrap available?

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend