On 13 October 2011 14:20, Ivan Voras <ivoras@xxxxxxxxxxx> wrote: > Hello, > > I have a table with a large number of records (millions), on which the > following should be performed: > > 1. Retrieve a set of records by a SELECT query with a WHERE condition > 2. Process these in the application > 3. Delete them from the table > > Now, in the default read-committed transaction isolation, I can't just > use the same WHERE condition with a DELETE in step 3 as it might delete > more records than are processed in step 1 (i.e. phantom read). I've > thought of several ways around it and would like some feedback on which > would be the most efficient: > > #1: Create a giant DELETE WHERE ... IN (...) SQL command for step #3 > with primary keys of records from step 1 - but will it hit a SQL string > length limitation in the database? Is there such a limit (and what is it?) > > #2: Same as #1 but with batching the records to e.g. 1000 at a time, all > in one transaction > > #2: Use a higher isolation level, probably Repeatable Read (PG 9.0) - > but then the question is will this block other clients from inserting > new data into the table? Also, is Repeatable Read enough? > > Any other ideas? CREATE TABLE to_delete ( job_created timestamp NOT NULL DEFAULT now(), fk_id int NOT NULL ); -- Mark for deletion INSERT INTO to_delete (fk_id) SELECT id FROM table WHERE condition = true; -- Process in app SELECT table.* FROM table INNER JOIN to_delete ON (table.id = to_delete.fk_id); -- Delete them DELETE FROM table WHERE id IN (SELECT fk_id FROM to_delete); -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general