Csaba Nagy wrote:
> On Tue, 2007-07-24 at 19:06, Stephan Szabo wrote:
>>> Unfortunately I don't think this will work. Multiple backends will
>>> happily pick up the same ctid in their selects and then try to delete
>>> the same records.
>> I'm pretty sure he said that the batch processing (and the delete) would
>> only be happening from one backend at a time, no concurrency on that
>> portion, merely concurrency with the large volume of inserts.
>
> Yes, it's exactly like that... only it also happened accidentally that 2
> batch processes started at the same time, and they should not double-
> process the data, nor lose any of it. The above scheme is OK with that
> too... but the array version from Tom is even better :-)
>
> Regarding the proposed mark/process/delete version, we've done it that
> way, and we always managed to hit some corner case which lost us data...
> so even if it's possible to do it well, it's definitely not easy. The
> delete/copy/process-private-data version is much safer, and it can
> actually be done in one transaction to ensure crash safety.

After reading through this thread, I have an idea that should accomplish
what I believe are your three main goals (avoid any negative performance
impact on the user's inserts, do not lose any data from those inserts, and
do not double-process any data) and possibly improve performance (I'm not
sure what the trigger overhead is, so there may not be any benefit).

One-time changes that must be made before running the batch process:

A. Create a persistent table (I'll refer to this table as "stage") to hold
   records pulled from the table that the user data is inserted into (I'll
   refer to this table as "user"). This table has one extra column (say
   "orig_ctid") of type tid.

B. Remove the delete trigger from the user table.

The actual batch process:

1. Start an explicit transaction with serializable isolation level.

2. Take an exclusive lock on the stage table. This prevents any other
   batch process from running concurrently. The serializable isolation
   level ensures that if a given batch process has to wait for the lock,
   it will see all changes made by the prior batch process run.

3. Select records (including ctid) from user with a LIMIT clause and
   insert them directly into stage ("user" has to be double-quoted, since
   USER is a reserved word):

   insert into stage select *, ctid as orig_ctid from "user" limit 10;

4. Remove the records from user that were just inserted into stage (not
   sure of the performance here, as it's not a self-join):

   delete from "user"
    where ctid = any (array(select orig_ctid from stage));

5. Continue normal processing of the records in the stage table.

6. Truncate the stage table.

7. Commit the transaction.

This ensures all data is processed, and processed only once.

With all that said, this isn't something I've actually done in PG. I've
done similar things in other databases, so the concept should be sound.

Hope this helps.

Andrew
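
P.S. For concreteness, here is a rough, untested sketch of the one-time
setup. It assumes the user table really is named "user" and that your
PostgreSQL version supports CREATE TABLE ... (LIKE ...); otherwise just
spell the columns out. The trigger name is only a placeholder for whatever
your existing delete trigger is called:

   -- A. Staging table: same columns as the user table, plus orig_ctid
   --    to remember which physical row each copy came from.
   create table stage (
       like "user",        -- copies the user table's column definitions
       orig_ctid tid
   );

   -- B. Drop the existing delete trigger (trigger name is illustrative).
   drop trigger user_delete_trg on "user";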
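
The batch run itself would then be one transaction, something like the
following (again untested; the explicit lock mode is just the default that
a plain LOCK TABLE would take anyway, and step 5 is whatever your
application does with the staged rows):

   -- 1. One transaction for the whole batch.
   begin isolation level serializable;

   -- 2. Only one batch run can hold this lock at a time. Taking it before
   --    the first query matters: the transaction's snapshot is frozen at
   --    its first SELECT/DML, so a batch that waited here still sees the
   --    previous run's committed work.
   lock table stage in access exclusive mode;

   -- 3. Copy a bounded chunk of rows into the staging table.
   insert into stage
       select *, ctid as orig_ctid from "user" limit 10;

   -- 4. Delete exactly those rows from the user table.
   delete from "user"
    where ctid = any (array(select orig_ctid from stage));

   -- 5. ... process the rows now sitting in stage ...

   -- 6. Empty the staging table for the next run.
   truncate stage;

   -- 7. Make it all permanent.
   commit;

If anything fails mid-way, the whole transaction rolls back: the deleted
rows reappear in the user table and stage is empty again, so nothing is
lost or double-processed.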