What does:-
DELETE FROM pa
WHERE pa.field1 IS NULL
AND pa.sc_id IN (SELECT legs.sc_id FROM legs
WHERE legs.scdate BETWEEN 20160220 AND > 20160222)
give as a cost when you run ANALYZE over it?
Thanks for the suggestion.
It's a
pproximately the same.
Delete on pa (cost=1463.31..493321.89 rows=187833 width=12) (actual time=41539.174..41539.174 rows=0 loops=1)
-> Hash Join (cost=1463.31..493321.89 rows=187833 width=12) (actual time=41539.172..41539.172 rows=0 loops=1)
Hash Cond: (pa.sc_id = legs.sc_id)
-> Seq Scan on pa (cost=0.00..480888.83 rows=2899078 width=10) (actual time=0.010..40866.049 rows=2591264 loops=1)
Filter: (field1 IS NULL)
Rows Removed by Filter: 4931412
-> Hash (cost=1321.48..1321.48 rows=11346 width=10) (actual time=29.481..29.481 rows=20940 loops=1)
Buckets: 32768 (originally 16384) Batches: 1 (originally 1) Memory Usage: 1156kB
-> HashAggregate (cost=1208.02..1321.48 rows=11346 width=10) (actual time=20.446..25.028 rows=20940 loops=1)
Group Key: legs.sc_id
-> Index Scan using legs_scdate_idx on legs (cost=0.43..1171.88 rows=14458 width=10) (actual time=0.025..13.133 rows=21281 loops=1)
Index Cond: ((scdate >= 20160220) AND (scdate <= 20160222))
I've tried creating a subquery out of the legs dataset in the hope that that would help but that made no difference either.
Geoff