Re: how to avoid deadlock on masive update with multiples delete

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 05 Oct 2012 11:46:05 -0400

Andres Freund <andres@xxxxxxxxxxxxxxx> writes:
> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>> There's no guarantee that the planner won't re-sort the rows coming from
>> the sub-select, unfortunately.

> More often than not you can prevent the planner from doing that by putting a 
> OFFSET 0 in the query. Not 100% but better than nothing.

No, that will accomplish exactly nothing.  The ORDER BY is already an
optimization fence.  The problem is that of the several ways the planner
might choose to join the subquery output to the original table, not all
will produce the join rows in the same order as the subquery's result
is.  For instance, when I tried his example I initially got

 Delete on test  (cost=400.88..692.85 rows=18818 width=34)
   ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
         Merge Cond: (test.g = x.g)
         ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
               Sort Key: test.g
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
               Sort Key: x.g
               ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
                     ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)

which is going to do the deletes in "g" order, not ctid order;
and then after an ANALYZE I got

 Delete on test  (cost=90.83..120.58 rows=1000 width=34)
   ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
         Hash Cond: (test.g = x.g)
         ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
         ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
               ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
                     ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)

which is going to do the deletes in ctid order, but that's an artifact
of using a seqscan on the test table; the order of the subquery's output
is irrelevant, since it got hashed.

> We really need ORDER BY for DML.

Meh.  That's outside the SQL standard (not only outside the letter of
the standard, but foreign to its very conceptual model) and I don't
think the problem really comes up that often.  Personally, if I had to
deal with this I'd use a plpgsql function (or DO command) that does

	FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
		DELETE FROM table WHERE ctid = c;
	END LOOP;

which is not great but at least it avoids client-to-server traffic.

Having said all that, are we sure this is even a deletion-order
problem?  I was wondering about deadlocks from foreign key references,
for instance.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance