On Thu, 31 Jul 2008 21:37:39 -0400
Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> writes:
> > I'm doing something like:
> > delete from table1 where id not in (select id from table2).
> > table1 contains ~1M records, table2 contains ~600K records, and
> > id is unique.

> That's going to pretty much suck unless you've got work_mem set
> high enough to allow a "hashed subplan" plan --- which is likely
> to require tens of MB for this case, I don't recall exactly what
> the per-row overhead is.  Experiment until EXPLAIN tells you it'll
> use a hashed subplan.

Thanks.

explain delete from catalog_categoryitem
  where ItemID not in (select ItemID from catalog_items);

Well, I went as high as 3GB of work_mem and still got:

"Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13 rows=475532 width=6)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)"
"          ->  Seq Scan on catalog_items  (cost=0.00..31071.67 rows=676167 width=8)"

I also have this:

alter table catalog_items cluster on catalog_items_pkey;

Should I drop it?

This is just a dev box. I loaded the two tables with two incoherent
sets of data just to play with, before adding all the pk/fk I need,
so I could simply truncate the tables and reload them from coherent
sources.

But what if I *really* had to execute that query? Is there any other
magic I could play to speed it up?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
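
[Editor's note, for readers of the archive: when the NOT IN subplan
won't hash, the usual rewrite is an anti-join via NOT EXISTS, which
the planner can turn into a join rather than a per-row subplan scan.
A sketch against the tables named in this thread, assuming
catalog_items.ItemID is declared NOT NULL -- the two forms are only
equivalent under that assumption, since a single NULL returned by the
subquery makes NOT IN match no rows at all:]

```sql
-- Anti-join rewrite of:
--   delete from catalog_categoryitem
--     where ItemID not in (select ItemID from catalog_items);
-- Assumes catalog_items.ItemID contains no NULLs.
DELETE FROM catalog_categoryitem c
WHERE NOT EXISTS (
    SELECT 1
    FROM catalog_items i
    WHERE i.ItemID = c.ItemID
);
```

[Running EXPLAIN on this form should show a join between the two
relations instead of the Materialize-under-SubPlan shape quoted
above.]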