"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: [...] > > My point is that the planner's cost estimate is way above the > > actual cost of the query, so the planner doesn't use the best > > plan. Even if the index returns so much rows, actual cost of the > > query is so that index scan (worst case, all disk cache flushed) > > is still better than seq scan but the planner uses seq scan. > > Yes. The cost estimator for an index scan supposedly does a linear > interpolation between a minimum cost and a maximum cost depending on the > correlation of the first field in the index. The problem is that while > the comment states it's a linear interpolation, the actual formula > squares the correlation before interpolating. This means that unless the > correlation is very high, you're going to get an unrealistically high > cost for an index scan. I have data that supports this at > http://stats.distributed.net/~decibel/, but I've never been able to get > around to testing a patch to see if it improves things. Interesting. It would be nice to investigate the arguments behind the choice you describe for the formula used to perform the interpolation. I have absolutely no knowledge on pg internals so this is rather new/fresh for me, I have no idea how smart that choice is (but based on my general feeling about pg, I'm suspecting this is actually smart but I am not smart enough to see why ;p). -- Guillaume Cottenceau