On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Andres Freund <andres@xxxxxxxxxxxxxxx> writes: >> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote: >>> There's no guarantee that the planner won't re-sort the rows coming from >>> the sub-select, unfortunately. > >> More often than not you can prevent the planner from doing that by putting a >> OFFSET 0 in the query. Not 100% but better than nothing. > > No, that will accomplish exactly nothing. The ORDER BY is already an > optimization fence. The problem is that of the several ways the planner > might choose to join the subquery output to the original table, not all > will produce the join rows in the same order as the subquery's result > is. For instance, when I tried his example I initially got > > Delete on test (cost=400.88..692.85 rows=18818 width=34) > -> Merge Join (cost=400.88..692.85 rows=18818 width=34) > Merge Cond: (test.g = x.g) > -> Sort (cost=135.34..140.19 rows=1940 width=10) > Sort Key: test.g > -> Seq Scan on test (cost=0.00..29.40 rows=1940 width=10) > -> Sort (cost=265.53..270.38 rows=1940 width=32) > Sort Key: x.g > -> Subquery Scan on x (cost=135.34..159.59 rows=1940 width=32) > -> Sort (cost=135.34..140.19 rows=1940 width=10) > Sort Key: test_1.ctid > -> Seq Scan on test test_1 (cost=0.00..29.40 rows=1940 width=10) > > which is going to do the deletes in "g" order, not ctid order; > and then after an ANALYZE I got > > Delete on test (cost=90.83..120.58 rows=1000 width=34) > -> Hash Join (cost=90.83..120.58 rows=1000 width=34) > Hash Cond: (test.g = x.g) > -> Seq Scan on test (cost=0.00..16.00 rows=1000 width=10) > -> Hash (cost=78.33..78.33 rows=1000 width=32) > -> Subquery Scan on x (cost=65.83..78.33 rows=1000 width=32) > -> Sort (cost=65.83..68.33 rows=1000 width=10) > Sort Key: test_1.ctid > -> Seq Scan on test test_1 (cost=0.00..16.00 rows=1000 width=10) > > which is going to do the deletes in ctid order, but that's an artifact > of using a seqscan on the test table; the order of the subquery's output > is irrelevant, since it got hashed. > >> We really need ORDER BY for DML. > > Meh. That's outside the SQL standard (not only outside the letter of > the standard, but foreign to its very conceptual model) and I don't > think the problem really comes up that often. Personally, if I had to > deal with this I'd use a plpgsql function (or DO command) that does Can't it be forced like this (assuming it is in fact a vanilla order by problem)? EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY ctid FOR UPDATE) x where x.g = test.g; (emphasis on 'for update') merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance