
Re: Slight change in query leads to unexpected change in query plan

Sam Mason wrote:
On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
ris-# select *
ris-# from T
ris-# where pk > 1000000000
ris-# and value = 'asdf'::bytea
ris-# order by pk
ris-# limit 100;

PG thinks that you're going to get 16 rows back matching those
conditions. Bitmap heap scans are faster in some cases, and this is
likely to be one of those cases, so PG is optimizing things correctly.

But those 16 rows are the final output. I'm concerned about the earlier processing. Here's the plan:

  Limit  (cost=78352.20..78352.24 rows=16 width=451)
    ->  Sort  (cost=78352.20..78352.24 rows=16 width=451)
          Sort Key: pk
          ->  Bitmap Heap Scan on t  (cost=2091.60..78351.88 rows=16 width=451)
                Recheck Cond: (pk > 1000000000)
                Filter: (value = 'asdf'::bytea)
                ->  Bitmap Index Scan on t_pkey  (cost=0.00..2091.60 rows=91088 width=0)
                      Index Cond: (pk > 1000000000)

If I'm reading this right, the optimizer estimates that the bitmap index scan will identify 91088 rows, all of which have to be fetched from the heap to evaluate the restriction on value. The row width is 451 bytes, so even if the rows were packed into 8k pages perfectly, that's roughly 5000 pages that have to be read.
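One way to sanity-check that page arithmetic (assuming the table really is named t, as in the plans above) is to ask the system catalog how big the table is. relpages and reltuples are real pg_class columns maintained by ANALYZE/VACUUM:

```sql
-- Refresh the statistics first, in case they are stale.
ANALYZE t;

-- relpages: number of 8k heap pages; reltuples: the planner's row-count estimate.
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 't';
```

If relpages comes back far from the ~5000 estimated above, the planner's cost model and my back-of-envelope arithmetic are working from different table sizes.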

A plain (non-bitmap) index scan that checks the value restriction as a final filter should cost about the same as the plan produced when the value restriction is dropped entirely, and we know the optimizer thinks that plan is far cheaper. Here is the plan without the value restriction:

  Limit  (cost=0.00..324.99 rows=100 width=451)
    ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
          Index Cond: (pk > 1000000000)

Adding the value restriction at the top of this query plan wouldn't increase the cost very much.

So given that estimate of 91088 rows coming out of the bitmap index scan, I still don't understand the optimizer's reasoning. Am I reading the plan incorrectly?
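One way to test this reasoning directly (a diagnostic sketch; enable_bitmapscan is a session-level planner setting, meant for experiments rather than production) is to forbid bitmap scans and see what cost the optimizer assigns to the plain index-scan plan for the same query:

```sql
-- Temporarily discourage bitmap scans so the planner must cost an index scan.
SET enable_bitmapscan = off;

EXPLAIN
SELECT *
FROM T
WHERE pk > 1000000000
  AND value = 'asdf'::bytea
ORDER BY pk
LIMIT 100;

-- Restore the default planner behavior.
RESET enable_bitmapscan;
```

If the index-scan plan's estimated cost really is lower than 78352, that would confirm the planner is not choosing the plan it itself costs as cheapest, which would point to a costing or statistics problem.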

Jack

ris-# select *
ris-# from T
ris-# where pk > 1000000000
ris-# order by pk
ris-# limit 100;

With this query, PG thinks that you may get 91088 rows back, but because
you've got a LIMIT in there it only needs the first 100 of them.  It
will therefore prefer a plan that can stop short, and is thus preferring
an index scan.

 Limit  (cost=0.00..324.99 rows=100 width=451)
   ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)


Why does adding the value restriction so radically change the execution plan?

PG doesn't have any cross-column statistics, and hence it assumes that pk
and value are uncorrelated.  You may get better results by increasing
the statistics target[1] for those columns, as that will give PG more
information; but if the columns are indeed correlated then that's not
going to help.
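For example (the target value 1000 here is an arbitrary illustration; the table and column names are taken from the plans above), the per-column statistics target can be raised like this:

```sql
-- Raise the per-column statistics target, i.e. how many most-common values
-- and histogram buckets ANALYZE collects for this column.
ALTER TABLE t ALTER COLUMN value SET STATISTICS 1000;

-- The new target takes effect the next time the table is analyzed.
ANALYZE t;
```

A larger target makes ANALYZE sample more rows, which can sharpen the row-count estimate for the value = '...' condition, at the cost of slower ANALYZE and slightly more planning work.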



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
