Re: Slight change in query leads to unexpected change in query plan

Sam Mason <sam@xxxxxxxxxxxxx> · Tue, 23 Jun 2009 00:16:16 +0100

On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
> ris-# select *
> ris-# from T
> ris-# where pk > 1000000000
> ris-# and value = 'asdf'::bytea
> ris-# order by pk
> ris-# limit 100;

PG thinks that you're going to get 16 rows back matching those
conditions, bitmap heap scans are faster in some cases and this is
likely to be one of those cases so PG is optimizing things correctly.

>  Limit  (cost=78352.20..78352.24 rows=16 width=451)

> ris-# select *
> ris-# from T
> ris-# where pk > 1000000000
> ris-# order by pk
> ris-# limit 100;

With this query, PG thinks that you may get 91088 rows back but because
you've got a LIMIT in there you only needs the first 100 of them.  It
will therefore prefer a plan that will stop short and thus is preferring
an index scan.

>  Limit  (cost=0.00..324.99 rows=100 width=451)
>    ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)

> Why does adding the value restriction so radically change the execution 
> plan?

PG doesn't have any cross column statistics and hence it assumes that pk
and value are uncorrelated.  You may get better results with increasing
the statistics target[1] for those columns as that will give PG more
information, but if the columns are indeed correlated then that's not
going to help.

-- 
  Sam  http://samason.me.uk/

 [1] http://www.postgresql.org/docs/current/static/sql-altertable.html

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general