I am trying to rewrite a query that currently takes 1.5 minutes to finish. I have it down to 20s and am still trying to pin it down.
Basically, the query currently looks something like this:
select a.*, b.*
from foo a
join bar b on a.id = b.a_id and a.banned <> true
where
  a.start <= now()
  and b."end" > now();
That's the 20s query, and now I have it down to 10s by using something that in my eyes should always be wrong, and against all logic. So could someone please explain to me why this is faster:
select a.*, b.*
from foo a
join bar b on a.id = b.a_id
where
  not exists (
    select id from foo where foo.id = b.a_id and foo.banned <> true
  )
  and a.start <= now()
  and b."end" > now();
The plans differ, obviously: the second one uses an index to look up .banned in foo, while the first one goes for a seq scan.
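For reference, the plans come from running both variants under EXPLAIN ANALYZE, along these lines (using the simplified table names from above):

```sql
-- first variant, with the banned check in the join condition
EXPLAIN ANALYZE
select a.*, b.*
from foo a
join bar b on a.id = b.a_id and a.banned <> true
where
  a.start <= now()
  and b."end" > now();
```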
The result set is the same, but I was actually expecting quite the opposite. So is a join on 1-2M rows a bad idea?
The effect can be seen on both 8.1 and CVS HEAD.
I would be grateful if someone could clarify this for me.
--
GJ