"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: [...] > > My point is that the planner's cost estimate is way above the > > actual cost of the query, so the planner doesn't use the best > > plan. Even if the index returns so much rows, actual cost of the > > query is so that index scan (worst case, all disk cache flushed) > > is still better than seq scan but the planner uses seq scan. > > Yes. The cost estimator for an index scan supposedly does a linear > interpolation between a minimum cost and a maximum cost depending on the > correlation of the first field in the index. The problem is that while > the comment states it's a linear interpolation, the actual formula > squares the correlation before interpolating. This means that unless the > correlation is very high, you're going to get an unrealistically high > cost for an index scan. I have data that supports this at > http://stats.distributed.net/~decibel/, but I've never been able to get > around to testing a patch to see if it improves things. Interesting. It would be nice to investigate the arguments behind the choice you describe for the formula used to perform the interpolation. I have absolutely no knowledge on pg internals so this is rather new/fresh for me, I have no idea how smart that choice is (but based on my general feeling about pg, I'm suspecting this is actually smart but I am not smart enough to see why ;p). -- Guillaume Cottenceau