Re: planner with index scan cost way off actual cost, advices to tweak cost constants?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
> 
> [...]
> 
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't use the best
> > > plan. Even if the index returns so much rows, actual cost of the
> > > query is so that index scan (worst case, all disk cache flushed)
> > > is still better than seq scan but the planner uses seq scan.
> > 
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
> 
> Interesting.
> 
> It would be nice to investigate the arguments behind the choice
> you describe for the formula used to perform the interpolation. I
> have absolutely no knowledge on pg internals so this is rather
> new/fresh for me, I have no idea how smart that choice is (but
> based on my general feeling about pg, I'm suspecting this is
> actually smart but I am not smart enough to see why ;p).

If you feel like running some tests, you need to change

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

in src/backend/optimizer/path/costsize.c to something like

    run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);

That might not produce a perfect cost estimate, but I'll wager that it
will be substantially better than what's in there now. FYI, see also
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux