On 2013-09-25, at 09:04, Merlin Moncure wrote:

> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
> <francois@xxxxxxxxxxx> wrote:
>> Hi all!
>>
>> I import many, many rows of data into a table, from three or more
>> computers, 4 times per hour. I have a primary key, and the query I use
>> to import the data is supposed to dedup before inserting, but I still
>> get primary key violations.
>>
>> The import process is:
>>
>> * Load CSV data into temp table
>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>>
>> I assumed (erroneously) that this would guarantee no duplicate data
>> could make it into the database. The primary key violations are
>> proving me wrong.
>
> Right. Transactions A and B are interleaved: they both run the same
> check against the same id at the same time. Both checks pass because
> neither transaction has committed yet. This problem is not solvable by
> adjusting the isolation level.
>
> Typical solutions might be to:
> A. Lock the table while inserting
> B. Retry the transaction following an error.
> C. Import the records to a staging table, then do the deduplication
> check when moving from the staging table.

You mean:

1. COPY temp FROM stdin;
2. INSERT INTO staging SELECT DISTINCT ... FROM temp;
3. INSERT INTO production SELECT DISTINCT ... FROM staging;
4. DELETE FROM staging WHERE inserted into production;

Right? I assume I would not have the primary key constraint on the
staging table. And steps 3 and 4 can be done in a single statement using
a data-modifying (UPDATE/DELETE) CTE.

Thanks!
François
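P.S. For concreteness, here is a rough sketch of the flow I have in mind.
The table and column names (production(pkey, payload), staging, temp_load)
are made up for illustration, and as noted above the single statement by
itself does not close the race between concurrent importers, so it would
still need the table lock (A) or a retry on error (B):

    BEGIN;

    -- Optional: serialize concurrent importers (option A above).
    -- LOCK TABLE production IN EXCLUSIVE MODE;

    -- 1. Load the CSV batch into a session-local temp table.
    CREATE TEMP TABLE temp_load (pkey bigint, payload text) ON COMMIT DROP;
    COPY temp_load FROM stdin WITH (FORMAT csv);

    -- 2. Move the batch into staging, deduplicating within the batch.
    INSERT INTO staging (pkey, payload)
    SELECT DISTINCT ON (pkey) pkey, payload
    FROM temp_load
    ORDER BY pkey;

    -- 3 + 4 in one statement: the data-modifying CTE empties staging and
    -- feeds the deleted rows into production, skipping keys already there.
    WITH moved AS (
        DELETE FROM staging
        RETURNING pkey, payload
    )
    INSERT INTO production (pkey, payload)
    SELECT DISTINCT ON (pkey) pkey, payload
    FROM moved
    WHERE NOT EXISTS (
        SELECT 1 FROM production p WHERE p.pkey = moved.pkey
    )
    ORDER BY pkey;

    COMMIT;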