Give me a HINT or I'll got crazy ;)

wstrzalka <wstrzalka@xxxxxxxxx> · Thu, 8 Oct 2009 09:11:18 -0700 (PDT)

So the query is:

SELECT m.messageid, mad.destination
FROM messages AS m
	  LEFT JOIN message_address_link AS mal ON (mal.message_id =
m.messageid)
	  JOIN message_address_data AS mad ON (mad.id = mal.address_id)
WHERE  delete_status <> 1
AND  folderid=E'200702210742181172061738846603000'

the structure doesn't really matter - it's intuitive m:n relation with
primary key's indexes on join columns and on 'folderid' column
and the plan is like:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=317.55..17771.38 rows=95249 width=36) (actual
time=1116.358..1356.393 rows=152 loops=1)
   Hash Cond: (mal.address_id = mad.id)
   ->  Hash Join  (cost=101.53..15650.39 rows=95249 width=8) (actual
time=1102.977..1342.675 rows=152 loops=1)
         Hash Cond: (mal.message_id = m.messageid)
         ->  Seq Scan on message_address_link mal
(cost=0.00..11738.90 rows=761990 width=8) (actual time=0.059..666.597
rows=761990 loops=1)
         ->  Hash  (cost=101.22..101.22 rows=25 width=4) (actual
time=0.207..0.207 rows=39 loops=1)
               ->  Bitmap Heap Scan on messages m  (cost=4.46..101.22
rows=25 width=4) (actual time=0.064..0.163 rows=39 loops=1)
                     Recheck Cond: (folderid =
'200702210742181172061738846603000'::text)
                     Filter: (delete_status <> 1)
                     ->  Bitmap Index Scan on messages_folderid_idx
(cost=0.00..4.46 rows=25 width=0) (actual time=0.047..0.047 rows=39
loops=1)
                           Index Cond: (folderid =
'200702210742181172061738846603000'::text)
   ->  Hash  (cost=133.23..133.23 rows=6623 width=36) (actual
time=13.353..13.353 rows=6623 loops=1)
         ->  Seq Scan on message_address_data mad  (cost=0.00..133.23
rows=6623 width=36) (actual time=0.008..6.443 rows=6623 loops=1)
 Total runtime: 1356.600 ms

Prior to the playing with statistics target (it was 100 by default) I
was able to go with the time to 30ms by adding to the query such a
condition:

--------- AND m.messageid BETWEEN 1 AND 1000000000 -----------

which was more like a hint then a real condition in this case.
After some stats playing i'm not able to reproduce this, however I
remember the messages table was scanned with index first ( it returns
only several tuples) and the joined with message_address_link &
message_address_data - while the current version performs full scan on
the message_address_link table where there is over 700k tuples and
then plays with it.

messages 75k tuples
message_address_link 750k tuples
message_address_data - 6k tuples

I know PG community don't want hints - so how would you bite this
case. I saw it's possible that this query will be fast. The question
is - how ?

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general