INSERT OR UPDATE

"Julian Scarfe" <julian@xxxxxxxxxxx> · Mon, 2 Jan 2006 19:07:40 -0000

I'm keeping config information for an application in a series of related 
tables.  I'd like a command that INSERTs data if it's new, or UPDATEs it if 
the key is duplicated.

Copying the config info from one database to another virgin installation is 
easy, of course. I can just use pg_dump on the original db1 to create the 
DDL and run it into the new database db2.

The problem comes when I've made changes in db1 that I want to copy to db2, 
but db2 may itself have additional entries not present in db1.  I'd like to 
update the entries in db2 that have equivalent entries in db1 with their db1 
values, without destroying the rest of the data in db2.

Here's an example:

db1 is created as follows:

create table a (
a_id text PRIMARY KEY,
param_b text,
param_c text
);

create table d (
d_id text PRIMARY KEY,
a_id text references a(a_id) ON UPDATE CASCADE,
param_e text,
param_f text
);

insert into a values ('a1', 'b1', 'c1');
insert into a values ('a2', 'b2', 'c2');

insert into d values ('d1', 'a1', 'e1', 'f1');
insert into d values ('d2', 'a2', 'e2', 'f2');
insert into d values ('d3', 'a1', 'e3', 'f3');

The databases have identical schema:
$ pg_dump -s db1 | psql db2

and provided db2 is empty, I can just copy the contents across:
$ pg_dump -ad db1 > db1.config
$ psql db2 < db1.config

So far so good.

But now db2 gets some additional entries:
db2=# insert into a values ('a3', 'b3', 'c3');
db2=# insert into d values ('d4', 'a1', 'e4', 'f4');
db2=# insert into d values ('d5', 'a2', 'e5', 'f5');

Meanwhile, I make some config changes in db1:
db1=# update a set param_b = 'b1 new' where a_id = 'a1';
db1=# update d set param_e = 'e1 new' where d_id = 'd1';
db1=# update d set param_f = 'f2 new' where d_id = 'd2';

So:
$ pg_dump -ad db1 > db1.config
and db1.config now contains
INSERT INTO a VALUES ('a2', 'b2', 'c2');
INSERT INTO a VALUES ('a1', 'b1 new', 'c1');
INSERT INTO d VALUES ('d3', 'a1', 'e3', 'f3');
INSERT INTO d VALUES ('d1', 'a1', 'e1 new', 'f1');
INSERT INTO d VALUES ('d2', 'a2', 'e2', 'f2 new');

I want to update the data in db2 to reflect the values in db1. I can't 
truncate the tables in db2 because I'd lose the additional entries (a3, d4, 
d5).

But of course when I do...
$ psql db2 < db1.config
... I get ...
ERROR:  duplicate key violates unique constraint "a_pkey"
ERROR:  duplicate key violates unique constraint "d_pkey"
...and the a1, d1, d2 rows are not updated to match db1.

What I'd really like is to be able to do:

INSERT OR UPDATE INTO a VALUES ('a2', 'b2', 'c2');
INSERT OR UPDATE INTO a VALUES ('a1', 'b1 new', 'c1');
INSERT OR UPDATE INTO d VALUES ('d3', 'a1', 'e3', 'f3');
INSERT OR UPDATE INTO d VALUES ('d1', 'a1', 'e1 new', 'f1');
INSERT OR UPDATE INTO d VALUES ('d2', 'a2', 'e2', 'f2 new');

so that the rest of the row is treated as an UPDATE if the primary key is a 
duplicate.

Of course I can write something at the application level to examine each row 
and take appropriate action. But it feels like this may be a commonly 
encountered problem for which there may be a database-level solution.  Am I 
missing something obvious?

Thanks

Julian Scarfe