Re: Query performance with disabled hashjoin and mergejoin

Robert Haas <robertmhaas@xxxxxxxxx> · Tue, 8 Mar 2011 15:57:46 -0500

On Mon, Mar 7, 2011 at 3:40 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Tue, Feb 22, 2011 at 9:07 PM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
>> On Fri, Feb 4, 2011 at 7:08 AM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>>>                                 ->  BitmapAnd  (cost=1282.94..1282.94
>>> rows=1430 width=0) (actual time=5.508..5.508 rows=0 loops=1)
>>>                                       ->  Bitmap Index Scan on
>>> news_index_layout_id_state  (cost=0.00..150.14 rows=2587 width=0) (actual
>>> time=0.909..0.909 rows=3464 loops=1)
>>>                                             Index Cond: ((layout_id = 8980)
>>> AND (state = 2))
>>>                                       ->  BitmapOr (cost=1132.20..1132.20
>>> rows=20127 width=0) (actual time=4.136..4.136 rows=0 loops=1)
>>>                                             ->  Bitmap Index Scan on
>>> news_visible_from  (cost=0.00..1122.09 rows=19976 width=0) (actual
>>> time=3.367..3.367 rows=19932 loops=1)
>>>                                                   Index Cond: (visible_from
>>> IS NULL)
>>>                                             ->  Bitmap Index Scan on
>>> news_visible_to  (cost=0.00..9.40 rows=151 width=0) (actual
>>> time=0.766..0.766 rows=43 loops=1)
>>>                                                   Index Cond: (1296806570 <=
>>> visible_to)
>>
>> I think this part of the query is the problem.  Since the planner
>> doesn't support cross-column statistics, it can't spot the correlation
>> between these different search conditions, resulting in a badly broken
>> selectivity estimate.
>>
>> Sometimes you can work around this by adding a single column, computed
>> with a trigger, that contains enough information to test the whole
>> WHERE-clause condition using a single indexable test against the
>> column value.  Or sometimes you can get around it by partitioning the
>> data into multiple tables, say with the visible_from IS NULL rows in a
>> different table from the rest.
>
> Why should you need cross column statistics for this case?  You should
> be able to multiple selectivity from left to right as long as you are
> doing equality comparisons, yes?
>
> Right now the planner is treating
> select * from foo where (a,b,c) between (1,1,1) and (9,9,9) the same
> (using selectivity on a) as
> select * from foo where (a,b,c) between (1,1,5) and (1,1,7)
>
> but they are not the same. since in the second query terms a,b are
> equal, shouldn't you able to multiply the selectivity through?

I'm not quite following that...

The reason I thought cross-column correlations might be relevant is
that the bitmap index scan on news_visible_from is quite accurate
(19976 estimated vs. 19932 actual) and the bitmap index scan on
news_visible_to is tolerably accurate (151 estimated vs. 41 actual)
but the estimate on the BitmapOr is somehow totally wrong (20127
estimated vs. 0 actual).  But on further reflection that doesn't make
much sense.  How can the BitmapOr produce fewer rows than the sum of
its constituent inputs?

/me scratches head.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance