Re: how to avoid deadlock on masive update with multiples delete

Merlin Moncure <mmoncure@xxxxxxxxx> · Fri, 5 Oct 2012 10:55:39 -0500

On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Andres Freund <andres@xxxxxxxxxxxxxxx> writes:
>> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>>> There's no guarantee that the planner won't re-sort the rows coming from
>>> the sub-select, unfortunately.
>
>> More often than not you can prevent the planner from doing that by putting a
>> OFFSET 0 in the query. Not 100% but better than nothing.
>
> No, that will accomplish exactly nothing.  The ORDER BY is already an
> optimization fence.  The problem is that of the several ways the planner
> might choose to join the subquery output to the original table, not all
> will produce the join rows in the same order as the subquery's result
> is.  For instance, when I tried his example I initially got
>
>  Delete on test  (cost=400.88..692.85 rows=18818 width=34)
>    ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
>          Merge Cond: (test.g = x.g)
>          ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>                Sort Key: test.g
>                ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
>          ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
>                Sort Key: x.g
>                ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
>                      ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>                            Sort Key: test_1.ctid
>                            ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)
>
> which is going to do the deletes in "g" order, not ctid order;
> and then after an ANALYZE I got
>
>  Delete on test  (cost=90.83..120.58 rows=1000 width=34)
>    ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
>          Hash Cond: (test.g = x.g)
>          ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
>          ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
>                ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
>                      ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
>                            Sort Key: test_1.ctid
>                            ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)
>
> which is going to do the deletes in ctid order, but that's an artifact
> of using a seqscan on the test table; the order of the subquery's output
> is irrelevant, since it got hashed.
>
>> We really need ORDER BY for DML.
>
> Meh.  That's outside the SQL standard (not only outside the letter of
> the standard, but foreign to its very conceptual model) and I don't
> think the problem really comes up that often.  Personally, if I had to
> deal with this I'd use a plpgsql function (or DO command) that does

Can't it be forced like this (assuming it is in fact a vanilla order
by problem)?

EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
ctid FOR UPDATE) x where x.g = test.g;

(emphasis on 'for update')

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance