Hi, I'm running PostgreSQL 8.3 and I have a query with a couple of NOT IN subqueries: DELETE FROM foo WHERE type = 'o' AND b NOT IN (SELECT cqc.b FROM bar cqc) AND b NOT IN (SELECT car.b FROM foo car WHERE car.type != 'o'); The plan produced for this is: QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Index Scan using foo_type_index on foo (cost=17851.93..1271633830.75 rows=66410 width=6) Index Cond: (type = 'o'::bpchar) Filter: ((NOT (subplan)) AND (NOT (subplan))) SubPlan -> Materialize (cost=6077.87..10238.57 rows=299170 width=8) -> Seq Scan on bar cqc (cost=0.00..4609.70 rows=299170 width=8) -> Materialize (cost=11774.06..15728.45 rows=284339 width=8) -> Seq Scan on foo car (cost=0.00..10378.73 rows=284339 width=8) Filter: (type <> 'o'::bpchar) (9 rows) Unfortunately, when these tables get large-ish, the materilzations get really expensive to re-scan for every tuple (see cost above). At the moment, I have ~500k rows in foo and ~300k rows in bar. The selectivity of type = 'o' is ~50%. I've tried to re-write the query as follows: DELETE FROM foo WHERE b IN (SELECT candidate_run.type_o_run as b FROM (SELECT cqar1.b AS type_o_run, cqar2.b AS non_type_o_run FROM foo cqar1 LEFT OUTER JOIN foo cqar2 ON (cqar1.b = cqar2.b AND cqar2.type != 'o') WHERE cqar1.type = 'o') candidate_run LEFT OUTER JOIN bar ON (candidate_run.type_o_run = bar.b) WHERE non_type_o_run IS NULL AND bar.b IS NULL); This gives the more sensible plan: QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Hash IN Join (cost=48999.81..71174.41 rows=66410 width=6) Hash Cond: (foo.b = cqar1.b) -> Seq Scan on foo (cost=0.00..9003.78 rows=549978 width=14) -> Hash (cost=47909.68..47909.68 rows=66410 width=8) -> Hash Left Join (cost=24562.29..47909.68 rows=66410 width=8) Hash Cond: (cqar1.b = bar.b) Filter: (bar.b IS NULL) -> Hash Left Join (cost=15043.96..33635.58 rows=132820 width=8) Hash Cond: (cqar1.b = cqar2.b) Filter: (cqar2.b IS NULL) -> Seq Scan on foo cqar1 (cost=0.00..10378.73 rows=265639 width=8) Filter: (type = 'o'::bpchar) -> Hash (cost=10378.73..10378.73 rows=284339 width=8) -> Seq Scan on foo cqar2 (cost=0.00..10378.73 rows=284339 width=8) Filter: (type <> 'o'::bpchar) -> Hash (cost=4609.70..4609.70 rows=299170 width=8) -> Seq Scan on bar (cost=0.00..4609.70 rows=299170 width=8) (17 rows) As far as I can tell, the results are identical. My questions 1. Is my rewrite valid? 2. Any way to reliably achieve the second plan (or really, any plan that doesn't rescan ~~500k tuples per each of ~250k tuples) by tweaking (per-session) planner constants? I've played with this a little, but without much success. As with any rewrite situation, I'd prefer to stick with the simpler, more explicit original query. Here is a SQL script to reproduce the problem: \set ON_ERROR_STOP drop schema if exists not_in_test cascade; create schema not_in_test; set search_path to not_in_test; create table foo ( a oid not null, b bigint not null, type char not null, ts timestamp without time zone not null ); create index "foo_b_a_type_index" on foo (b, a, type); create index "foo_a_index" on foo (a); create index "foo_type_index" on foo(type); create table bar ( b bigint unique not null, c timestamp with time zone not null ); create index "bar_b_index" on bar(b); insert into foo select (random()*10)::integer, generate_series(1,550000), case when random() > 0.5 then 'o' else 'x' end, now(); insert into bar select val, now() from generate_series(1,1200000) as vals(val) where random() > 0.75; analyze foo; analyze bar; EXPLAIN DELETE FROM foo WHERE type = 'o' AND b NOT IN (SELECT cqc.b FROM bar cqc) AND b NOT IN (SELECT car.b FROM foo car WHERE car.type != 'o'); EXPLAIN DELETE FROM foo WHERE b IN (SELECT candidate_run.type_o_run as b FROM (SELECT cqar1.b AS type_o_run, cqar2.b AS non_type_o_run FROM foo cqar1 LEFT OUTER JOIN foo cqar2 ON (cqar1.b = cqar2.b AND cqar2.type != 'o') WHERE cqar1.type = 'o') candidate_run LEFT OUTER JOIN bar ON (candidate_run.type_o_run = bar.b) WHERE non_type_o_run IS NULL AND bar.b IS NULL); Thanks, --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance