Re: Query plan choice issue

Yaroslav Tykhiy <yar@xxxxxxxxxxxxx> · Mon, 13 Sep 2010 19:06:59 +1000

Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

a cursory look of the plan details a FTS on dbmail_headername  
invoked by the JOIN clause
JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both  
participating columns
v.headername_id
n.id

Granted, there was no index on v.headername_id but creating one just  
slowed the query down, with a different plan:

CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue  
(headername_id);
EXPLAIN ANALYSE ...
                                                                                   QUERY 
 PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=222020.81..222020.81 rows=1 width=8) (actual  
time=28636.426..28636.426 rows=0 loops=1)
   Sort Key: m.message_idnr
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=1409.82..222020.80 rows=1 width=8) (actual  
time=28636.409..28636.409 rows=0 loops=1)
         ->  Nested Loop  (cost=1409.82..222012.27 rows=1 width=24)  
(actual time=28636.405..28636.405 rows=0 loops=1)
               ->  Nested Loop  (cost=1409.82..221959.94 rows=6  
width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
                     ->  Seq Scan on dbmail_headername n   
(cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1  
loops=1)
                           Filter: ((headername)::text ~~* 'MESSAGE- 
ID'::text)
                     ->  Bitmap Heap Scan on dbmail_headervalue v   
(cost=1409.82..221813.70 rows=2805 width=16) (actual  
time=28543.411..28623.623 rows=1 loops=1)
                           Recheck Cond: (v.headername_id = n.id)
                           Filter: ("substring"(v.headervalue, 0,  
255) ~~* '%<...@xxxxxxxxxxxxxx>%'::text)
                           ->  Bitmap Index Scan on  
dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)  
(actual time=17555.572..17555.572 rows=1877009 loops=1)
                                 Index Cond: (v.headername_id = n.id)
               ->  Index Scan using dbmail_messages_physmessage_idx  
on dbmail_messages m  (cost=0.00..8.71 rows=1 width=16) (actual  
time=11.646..11.646 rows=0 loops=1)
                     Index Cond: (m.physmessage_id = v.physmessage_id)
                     Filter: ((m.status = ANY ('{0,1}'::integer[]))  
AND (m.mailbox_idnr = 12345))
         ->  Index Scan using dbmail_physmessage_pkey on  
dbmail_physmessage p  (cost=0.00..8.52 rows=1 width=8) (never executed)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 28636.517 ms
(19 rows)

I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id

Both columns already indexed here:

On public.dbmail_headervalue (alias v):
    "dbmail_headervalue_2" btree (physmessage_id)

On public.dbmail_physmessage (alias p):
    "dbmail_physmessage_pkey" PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes.

dbmail_headervalue is the largest table with respect to its record  
count: 36 million records.
dbmail_headername is small: 5640 records.
dbmail_physmessage and dbmail_messages are of an average size: ~2  
million records each.

Sorry for my cluelessness on this issue.  But Postgresql's ability to  
build a fast query plan for this query type at least occasionally is  
encouraging. :-)

Yar

the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id  
uses indexed for both participants

Martin
______________________________________________
Verzicht und Vertraulichkeitanmerkung

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene  
Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede  
unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig.  
Diese Nachricht dient lediglich dem Austausch von Informationen und  
entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten  
Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den  
Inhalt uebernehmen.

From: yar@xxxxxxxxxxxxx
To: pgsql-general@xxxxxxxxxxxxxx
Subject:  Query plan choice issue
Date: Mon, 13 Sep 2010 09:36:35 +1000

Hi all,

I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
be executed using either of two different query plans, one taking a
few milliseconds and the other, tens of seconds. The work_mem setting
doesn't seem to affect it -- tried to increase or decrease it by 2 or
4 times, but it didn't seem to favour the fast plan choice. Honestly,
I have no idea what affects the plan choice, but I saw Postgresql
change it at random.

The query in question looks like this -- sorry, it's rather complex:

SELECT message_idnr
FROM dbmail_messages m
JOIN dbmail_physmessage p ON m.physmessage_id=p.id
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id
WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
ILIKE '%<...@xxxxxxxxxxxxxx>%'
ORDER BY message_idnr;

It comes from DBMail. That said, I don't think DBMail can be blamed
here because at least sometimes Postgresql is able to do the right
thing with respect to that query.

Here is the slow plan:
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual
time=20933.166..20933.168 rows=1 loops=1)
Sort Key: m.message_idnr
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual
time=3327.658..20933.122 rows=1 loops=1)
-> Nested Loop (cost=0.00..1234601.59 rows=1 width=24)
(actual time=3327.599..20933.056 rows=1 loops=1)
-> Nested Loop (cost=0.00..1234549.74 rows=6 width=8)
(actual time=3327.503..20932.944 rows=1 loops=1)
Join Filter: (v.headername_id = n.id)
-> Seq Scan on dbmail_headername n
(cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1
loops=1)
Filter: ((headername)::text ~~* 'MESSAGE-
ID'::text)
-> Seq Scan on dbmail_headervalue v
(cost=0.00..1234407.96 rows=2666 width=16) (actual
time=3327.465..20931.942 rows=1 loops=1)
Filter: ("substring"(v.headervalue, 0,
255) ~~* '%<...@xxxxxxxxxxxxxx>%'::text)
-> Index Scan using dbmail_messages_physmessage_idx on
dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual
time=0.088..0.100 rows=1 loops=1)
Index Cond: (m.physmessage_id = v.physmessage_id)
Filter: ((m.status = ANY ('{0,1}'::integer[]))
AND (m.mailbox_idnr = 12345))
-> Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
time=0.048..0.050 rows=1 loops=1)
Index Cond: (p.id = m.physmessage_id)
Total runtime: 20933.241 ms
(17 rows)

And here is the fast plan:
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual
time=14.163..14.165 rows=1 loops=1)
Sort Key: m.message_idnr
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual
time=4.272..14.152 rows=1 loops=1)
-> Nested Loop (cost=0.00..1175276.41 rows=1 width=24)
(actual time=4.261..14.135 rows=1 loops=1)
-> Nested Loop (cost=0.00..1175268.46 rows=1 width=32)
(actual time=4.249..14.117 rows=1 loops=1)
-> Index Scan using dbmail_messages_mailbox_idx
on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual
time=0.043..2.810 rows=358 loops=1)
Index Cond: (mailbox_idnr = 12345)
Filter: (status = ANY ('{0,1}'::integer[]))
-> Index Scan using dbmail_headervalue_2 on
dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual
time=0.028..0.029 rows=0 loops=358)
Index Cond: (v.physmessage_id =
m.physmessage_id)
Filter: ("substring"(v.headervalue, 0,
255) ~~* '%<...@xxxxxxxxxxxxxx>%'::text)
-> Index Scan using dbmail_headername_pkey on
dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual
time=0.007..0.008 rows=1 loops=1)
Index Cond: (n.id = v.headername_id)
Filter: ((n.headername)::text ~~* 'MESSAGE-
ID'::text)
-> Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
time=0.006..0.007 rows=1 loops=1)
Index Cond: (p.id = m.physmessage_id)
Total runtime: 14.231 ms
(18 rows)

Do you think this query plan choice can be affected by  
postgresql.conf
settings? Or shall I file a bug report?

I'll be happy to provide the table and index structure details if
needed.

Thanks!

Yar

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general