Delete/update with limit

Csaba Nagy <nagy@xxxxxxxxxxxxxx> · Mon, 23 Jul 2007 17:56:58 +0200

Hi all,

This subject was touched a few times in the past, I looked into the
archives... the result is invariably key developers saying such a
feature is unsafe because the result is unpredictable, while the people
requesting is saying it is OK that way, it is expected... but no
compelling use case for it.

OK, I have one...

We have here quite a few processes which collect user input, and put
them in "batch" tables, which in turn are processed regularly and update
other tables with summary data, reject invalid records, etc.

The insertions are unpredictable, they can happen any time and any of
them in parallel, they are user input... and they must be very fast,
it's our user experience at stake. 

The batch processing is done by a single periodical process. Now we had
a few attempts of making this parallelism safe enough so we don't loose
some of the concurrent input while we do the processing step, while
still keeping minimal overhead in the table. The end result was a scheme
where the batch processor deletes from the table and a delete trigger
puts the deleted rows into a temporary table, and then the processor can
do with that private data anything it pleases without interfering with
the inserts (the processing is actually quite complex on occasions).

This works fine in terms of correctness, however it turns out to be a
problem with high bursts of incoming data, or when the processor is not
running for a while and a lot of data is accumulating... then we have
lots of data to process at once, which leads to long running
transactions (the whole thing runs in one transaction) and worse,
connection timeouts.

On other databases, it is possible to limit the delete to a maximum
number of rows to be deleted. This way we can limit the size of one
batch with minimal overhead...

In postgres we're currently not chunking, due to the fact that the code
to do it is simply overly contorted and inefficient compared to the
other DBs we use. At least all the solutions we could think of to do the
chunking in a safe way while the inserts are running in parallel,
without disturbing them, have invariably resulted in overly complicated
code compared to the simple delete with limit + delete trigger solution
we have for the other DBs.

Now I don't put too much hope I can convince anybody that the limit on
the delete/update commands has valid usage scenarios, but then can
anybody help me find a good solution to chunk-wise process such a buffer
table where insert speed is the highest priority (thus no indexes, the
minimum of fields), and batch processing should still work fine with big
table size, while not impacting at all the inserts, and finish in short
time to avoid long running transactions ? Cause I can't really think of
one... other than our scheme with the delete with limit + trigger +
private temp table thing.

Cheers,
Csaba.