On Mon, 2016-03-28 at 20:23 +0100, Geoff Winkless wrote:
> So I accept that when using MIN(sc_id) against scdate it makes
> statistical sense to use the sc_id index for a reasonable percentage
> of the full range of scdate, unless we know in advance that scdate is
> closely correlated to sc_id (because using MIN means we can stop
> immediately we hit a value).
>
> However I'm now finding a similar problem when using a multi-table
> DELETE, where the same obviously can't apply.
>
> This query:
>
>   DELETE FROM pa USING legs
>   WHERE legs.scdate BETWEEN 20160220 AND 20160222
>     AND legs.sc_id = pa.sc_id;
>
> does what one would hope:
>
>   Delete on pa  (cost=0.99..705406.24 rows=36570 width=12)
>     ->  Nested Loop  (cost=0.99..705406.24 rows=36570 width=12)
>           ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1171.88 rows=14458 width=10)
>                 Index Cond: ((scdate >= 20160220) AND (scdate <= 20160222))
>           ->  Index Scan using pa_pkey on pa  (cost=0.56..48.33 rows=38 width=10)
>                 Index Cond: (sc_id = legs.sc_id)
>
> However as soon as I add an extra test for field1 IS NULL, it
> apparently goes insane:
>
>   Delete on pa  (cost=577260.90..626296.46 rows=23732 width=12) (actual time=41870.770..41870.770 rows=0 loops=1)
>     ->  Hash Join  (cost=577260.90..626296.46 rows=23732 width=12) (actual time=37886.396..41315.668 rows=44960 loops=1)
>           Hash Cond: (legs.sc_id = pa.sc_id)
>           ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1171.88 rows=14458 width=10) (actual time=0.030..13.667 rows=21281 loops=1)
>                 Index Cond: ((scdate >= 20160220) AND (scdate <= 20160222))
>           ->  Hash  (cost=481691.12..481691.12 rows=5497868 width=10) (actual time=37805.756..37805.756 rows=4875870 loops=1)
>                 Buckets: 131072  Batches: 64  Memory Usage: 4311kB
>                 ->  Seq Scan on pa  (cost=0.00..481691.12 rows=5497868 width=10) (actual time=0.008..35869.304 rows=4875870 loops=1)
>                       Filter: (field1 IS NULL)
>                       Rows Removed by Filter: 2688634
>   Planning time: 0.447 ms
>   Execution time: 41870.832 ms
>
> Running ANALYZE makes no difference.
>
> Table pa has 7522676 rows, 4834042 of which have field1 NULL, so it's
> absolutely not reasonable to expect this to be an optimal strategy.
>
> Any suggestions as to how I can improve this query?
>
> Thanks :)
>
> Geoff

What cost does:

  DELETE FROM pa
  WHERE pa.field1 IS NULL
    AND pa.sc_id IN (SELECT legs.sc_id
                     FROM legs
                     WHERE legs.scdate BETWEEN 20160220 AND 20160222);

give when you run EXPLAIN ANALYZE over it?

HTH

Rob

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
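[Editor's note: alongside the IN rewrite above, a correlated EXISTS form is another shape sometimes worth comparing under EXPLAIN ANALYZE. This is only a sketch using the table and column names from the thread (pa, legs, sc_id, scdate, field1); it has not been run against the poster's schema, and modern PostgreSQL planners often convert both IN and EXISTS subqueries into the same semi-join plan anyway.]

```sql
-- Hypothetical variant, not tested against the original tables:
-- delete the NULL-field1 rows of pa for which a matching legs row
-- exists in the given scdate range.
DELETE FROM pa
WHERE pa.field1 IS NULL
  AND EXISTS (SELECT 1
              FROM legs
              WHERE legs.sc_id = pa.sc_id
                AND legs.scdate BETWEEN 20160220 AND 20160222);
```

If the planner still prefers a seq scan on pa, a partial index such as `CREATE INDEX ON pa (sc_id) WHERE field1 IS NULL` is one assumption-laden option to experiment with, since roughly two thirds of pa satisfies the NULL predicate.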