Re: planner with index scan cost way off actual cost, advices to tweak cost constants?

"Jim C. Nasby" <jnasby@xxxxxxxxxxxxx> · Tue, 21 Mar 2006 03:57:00 -0600

On Mon, Mar 20, 2006 at 09:14:32AM +0100, Guillaume Cottenceau wrote:
> Guillaume,
> 
> Thanks for your answer.
> 
> > On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau
> > wrote:
> > > Reading the documentation and postgresql list archives, I have
> > > run ANALYZE right before my tests, I have increased the
> > > statistics target to 50 for the considered table; my problem is
> > > that the index scan cost reported by EXPLAIN seems to be around
> > > 12.7 times higher that it should, a figure I suppose incompatible
> > > (too large) for just random_page_cost and effective_cache_size
> > > tweaks.
> > 
> > It's not surprising you have a high cost for an index scan which is
> > planned to return and returns so much rows. I really don't think the
> > planner does something wrong on this one.
> 
> My point is that the planner's cost estimate is way above the
> actual cost of the query, so the planner doesn't use the best
> plan. Even if the index returns so much rows, actual cost of the
> query is so that index scan (worst case, all disk cache flushed)
> is still better than seq scan but the planner uses seq scan.

Yes. The cost estimator for an index scan supposedly does a linear
interpolation between a minimum cost and a maximum cost depending on the
correlation of the first field in the index. The problem is that while
the comment states it's a linear interpolation, the actual formula
squares the correlation before interpolating. This means that unless the
correlation is very high, you're going to get an unrealistically high
cost for an index scan. I have data that supports this at
http://stats.distributed.net/~decibel/, but I've never been able to get
around to testing a patch to see if it improves things.

<snip>
> > thinking the problem is that your test case is not accurate.
> 
> Ok.

Actually, I suspect your test case was probably fine, but take a look at
the data I've got and see what you think. If you want to spend some time
on this it should be possible to come up with a test case that uses
either pgbench or dbt2/3 to generate data, so that others can easily
reproduce (I can't really make the data I used for my testing
available).
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461