Re: Query plan for NOT IN

Grzegorz Jaśkiewicz <gryzman@xxxxxxxxx> · Mon, 5 Oct 2009 14:56:05 +0100

On Mon, Oct 5, 2009 at 2:52 PM, Matthew Wakeling <matthew@xxxxxxxxxxx> wrote:

mnw21-modmine-r13features-copy=# select count(*) from project;

 count

-------

    10

(1 row)

mnw21-modmine-r13features-copy=# select count(*) from intermineobject;

  count

----------

 26344616

(1 row)

mnw21-modmine-r13features-copy=# \d intermineobject;

Table "public.intermineobject"

 Column |  Type   | Modifiers

--------+---------+-----------

 object | text    |

 id     | integer | not null

 class  | text    |

Indexes:

    "intermineobject_pkey" UNIQUE, btree (id)

mnw21-modmine-r13features-copy=# explain select * from project where id NOT IN (SELECT id FROM intermineobject);

                                     QUERY PLAN

------------------------------------------------------------------------------------

 Seq Scan on project  (cost=1476573.93..1476575.05 rows=5 width=183)

   Filter: (NOT (hashed SubPlan 1))

   SubPlan 1

     ->  Seq Scan on intermineobject  (cost=0.00..1410720.74 rows=26341274 width=4)

(4 rows)

This query plan seems to me to be a little slow. Surely it could iterate through the ten project rows and perform ten index lookups in the big table?

try using join instead of 'not in'..

select p.* from project p left join intermineobject i on i.id=p.id where i.id is null;

-- 
GJ