Search Postgresql Archives

Re: Deduplication and transaction isolation level

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Le 2013-09-25 à 09:04, Merlin Moncure a écrit :

> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
> <francois@xxxxxxxxxxx> wrote:
>> Hi all!
>> 
>> I import many, many rows of data into a table, from three or more computers, 4 times per hour. I have a primary key, and the query I use to import the data is supposed to dedup before inserting, but I still get primary key violations.
>> 
>> The import process is:
>> 
>> * Load CSV data into temp table
>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>> 
>> I assumed (erroneously) that this would guarantee no duplicate data could make it into the database. The primary key violations are proving me wrong.
> 
> Right.  Transaction A and B are interleaved: they both run the same
> check against the same id at the same time.  Both checks pass because
> neither transaction is committed.  This problem is not solvable by
> adjusting the isolation level.
> 
> Typical solutions might be to:
> A. Lock the table while inserting
> B. Retry the transaction following an error.
> C. Import the records to a staging table, then copy the do the
> deduplication check when moving from the staging table

You mean:

COPY TO temp FROM stdin;
INSERT INTO staging SELECT DISTINCT FROM temp;
INSERT INTO production SELECT DISTINCT FROM staging;
DELETE FROM staging WHERE inserted into production;

Right? I assume I would not have the primary key constraint on the staging table.

And steps 3 and 4 can be done in a single statement using UPDATE/DELETE CTE.

Thanks!
François

<<attachment: smime.p7s>>


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux