Search Postgresql Archives

Re: Query plan choice issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:

Yaroslav Tykhiy <yar@xxxxxxxxxxxxx> writes:
                     ->  Bitmap Heap Scan on dbmail_headervalue v
(cost=1409.82..221813.70 rows=2805 width=16) (actual
time=28543.411..28623.623 rows=1 loops=1)
                           Recheck Cond: (v.headername_id = n.id)
                           Filter: ("substring"(v.headervalue, 0,
255) ~~* '%<...@xxxxxxxxxxxxxx>%'::text)
                           ->  Bitmap Index Scan on
dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)
(actual time=17555.572..17555.572 rows=1877009 loops=1)
                                 Index Cond: (v.headername_id = n.id)

I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
	"substring"(v.headervalue, 0,  255) ~~* '%<...@xxxxxxxxxxxxxx>%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.

Well, that substring() and ILIKE combo looked suspicious to me, too. However, there already was an index on substring(v.headervalue, 0, 255) but the fast query plan didn't seem to use it, it used a different index instead:

mail=# \d dbmail_headervalue
                              Table "public.dbmail_headervalue"
     Column     |  Type  |                             Modifiers
----------------+-------- +-------------------------------------------------------------------
 headername_id  | bigint | not null
 physmessage_id | bigint | not null
id | bigint | not null default nextval('dbmail_headervalue_idnr_seq'::regclass)
 headervalue    | text   | not null default ''::text
Indexes:
    "dbmail_headervalue_pkey" PRIMARY KEY, btree (id)
    "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id)
    "dbmail_headervalue_2" btree (physmessage_id)
    "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255))
...
EXPLAIN ANALYZE...
-> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358) Index Cond: (v.physmessage_id = m.physmessage_id) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@xxxxxxxxxxxxxx>%'::text)
...

Meanwhile, a mate of mine lurking on this list pointed out that reducing random_page_cost might help here and it did: random_page_cost of 2 made the fast query favourable.

Can it mean that the default planner configuration slightly overfavours seq scans?

Thank you all guys!

Yar

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux