[std_disclaimer]I'm not a DBA and I'm running PostgreSQL on a quad-core Intel laptop. You may read on after you stop laughing...[/std_disclaimer]

I've written a version of UPSERT designed to import a large number of records. It works in two steps: first it UPDATEs incumbent records that share common keys with the new records, then it INSERTs the new records that do not share common keys with the incumbent records. The gist of it is:

    SAVEPOINT ...;

    -- update records from subquery that share common keys with tbl
    UPDATE tbl
       SET f1 = X.f1, f2 = X.f2, ...
      FROM (subquery) AS X
     WHERE tbl.k1 = X.k1 AND tbl.k2 = X.k2 AND ...;

    -- insert records from subquery that do NOT share common keys with tbl
    INSERT INTO tbl (f1, f2, ...)
    SELECT Y.f1, Y.f2, ...
      FROM (subquery) AS Y
      LEFT OUTER JOIN tbl ON tbl.k1 = Y.k1 AND tbl.k2 = Y.k2 AND ...
     WHERE tbl.id IS NULL;

    RELEASE SAVEPOINT ...;

The (subquery) frequently generates 5000 records for update/insert. Three Unix processes run the import concurrently, all updating/inserting records into a common table. Most of the time this works, but I've observed cases where an import process will:

 * get a "PG::Error: ERROR: current transaction is aborted, commands ignored until end of transaction block"
 * get a "PG::Error: ERROR: deadlock detected"
 * go catatonic for several minutes

This leads me to believe that my SAVEPOINT / RELEASE SAVEPOINT is not the correct way to protect the system from multiprocessing mischief. I've read Tom Lane's presentation on concurrency (http://www.postgresql.org/files/developer/concurrency.pdf), but I haven't been able to figure out the right solution.

[For what it's worth, with the data I'm importing, INSERTs will far outnumber the UPDATEs.]

Since I someday plan to spin this code up on a big system with >> 3 import processes, I'd like to get this right. Hints and solutions warmly accepted. Thanks.

- rdp
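
P.S. In case a self-contained example makes the question clearer, here is a minimal sketch of the pattern above. The table definition, the UNIQUE (k1, k2) constraint, the savepoint name, and the two-row VALUES lists are all made up for illustration; the real import feeds a subquery of roughly 5000 rows per batch.

    -- hypothetical target table, just to make the pattern concrete
    CREATE TABLE tbl (
        id  bigserial PRIMARY KEY,
        k1  integer NOT NULL,
        k2  integer NOT NULL,
        f1  text,
        f2  text,
        UNIQUE (k1, k2)
    );

    BEGIN;
    SAVEPOINT import_batch;

    -- step 1: update incumbent rows that share keys with the incoming batch
    UPDATE tbl
       SET f1 = X.f1,
           f2 = X.f2
      FROM (VALUES (1, 1, 'a', 'b'),
                   (1, 2, 'c', 'd')) AS X (k1, k2, f1, f2)
     WHERE tbl.k1 = X.k1
       AND tbl.k2 = X.k2;

    -- step 2: insert incoming rows that matched no incumbent row in step 1
    INSERT INTO tbl (k1, k2, f1, f2)
    SELECT Y.k1, Y.k2, Y.f1, Y.f2
      FROM (VALUES (1, 1, 'a', 'b'),
                   (1, 2, 'c', 'd')) AS Y (k1, k2, f1, f2)
      LEFT OUTER JOIN tbl
        ON tbl.k1 = Y.k1
       AND tbl.k2 = Y.k2
     WHERE tbl.id IS NULL;

    RELEASE SAVEPOINT import_batch;
    COMMIT;

The real code runs this block once per batch inside a long-lived connection; the deadlocks and the "current transaction is aborted" errors show up only when several such processes hit the same table at once.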