Andres Freund <andres@xxxxxxxxxxxxxxx> writes: > On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote: >> There's no guarantee that the planner won't re-sort the rows coming from >> the sub-select, unfortunately. > More often than not you can prevent the planner from doing that by putting a > OFFSET 0 in the query. Not 100% but better than nothing. No, that will accomplish exactly nothing. The ORDER BY is already an optimization fence. The problem is that of the several ways the planner might choose to join the subquery output to the original table, not all will produce the join rows in the same order as the subquery's result is. For instance, when I tried his example I initially got Delete on test (cost=400.88..692.85 rows=18818 width=34) -> Merge Join (cost=400.88..692.85 rows=18818 width=34) Merge Cond: (test.g = x.g) -> Sort (cost=135.34..140.19 rows=1940 width=10) Sort Key: test.g -> Seq Scan on test (cost=0.00..29.40 rows=1940 width=10) -> Sort (cost=265.53..270.38 rows=1940 width=32) Sort Key: x.g -> Subquery Scan on x (cost=135.34..159.59 rows=1940 width=32) -> Sort (cost=135.34..140.19 rows=1940 width=10) Sort Key: test_1.ctid -> Seq Scan on test test_1 (cost=0.00..29.40 rows=1940 width=10) which is going to do the deletes in "g" order, not ctid order; and then after an ANALYZE I got Delete on test (cost=90.83..120.58 rows=1000 width=34) -> Hash Join (cost=90.83..120.58 rows=1000 width=34) Hash Cond: (test.g = x.g) -> Seq Scan on test (cost=0.00..16.00 rows=1000 width=10) -> Hash (cost=78.33..78.33 rows=1000 width=32) -> Subquery Scan on x (cost=65.83..78.33 rows=1000 width=32) -> Sort (cost=65.83..68.33 rows=1000 width=10) Sort Key: test_1.ctid -> Seq Scan on test test_1 (cost=0.00..16.00 rows=1000 width=10) which is going to do the deletes in ctid order, but that's an artifact of using a seqscan on the test table; the order of the subquery's output is irrelevant, since it got hashed. > We really need ORDER BY for DML. Meh. That's outside the SQL standard (not only outside the letter of the standard, but foreign to its very conceptual model) and I don't think the problem really comes up that often. Personally, if I had to deal with this I'd use a plpgsql function (or DO command) that does FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP DELETE FROM table WHERE ctid = c; END LOOP; which is not great but at least it avoids client-to-server traffic. Having said all that, are we sure this is even a deletion-order problem? I was wondering about deadlocks from foreign key references, for instance. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance