On Mon, Mar 20, 2006 at 09:14:32AM +0100, Guillaume Cottenceau wrote: > Guillaume, > > Thanks for your answer. > > > On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau > > wrote: > > > Reading the documentation and postgresql list archives, I have > > > run ANALYZE right before my tests, I have increased the > > > statistics target to 50 for the considered table; my problem is > > > that the index scan cost reported by EXPLAIN seems to be around > > > 12.7 times higher that it should, a figure I suppose incompatible > > > (too large) for just random_page_cost and effective_cache_size > > > tweaks. > > > > It's not surprising you have a high cost for an index scan which is > > planned to return and returns so much rows. I really don't think the > > planner does something wrong on this one. > > My point is that the planner's cost estimate is way above the > actual cost of the query, so the planner doesn't use the best > plan. Even if the index returns so much rows, actual cost of the > query is so that index scan (worst case, all disk cache flushed) > is still better than seq scan but the planner uses seq scan. Yes. The cost estimator for an index scan supposedly does a linear interpolation between a minimum cost and a maximum cost depending on the correlation of the first field in the index. The problem is that while the comment states it's a linear interpolation, the actual formula squares the correlation before interpolating. This means that unless the correlation is very high, you're going to get an unrealistically high cost for an index scan. I have data that supports this at http://stats.distributed.net/~decibel/, but I've never been able to get around to testing a patch to see if it improves things. <snip> > > thinking the problem is that your test case is not accurate. > > Ok. Actually, I suspect your test case was probably fine, but take a look at the data I've got and see what you think. If you want to spend some time on this it should be possible to come up with a test case that uses either pgbench or dbt2/3 to generate data, so that others can easily reproduce (I can't really make the data I used for my testing available). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@xxxxxxxxxxxxx Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461