Re: Bad Planner Statistics for Uneven distribution.

"Guillaume Smet" <guillaume.smet@xxxxxxxxx> · Sat, 22 Jul 2006 00:00:01 +0200

Tom,

On 7/21/06, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> It's really not possible for a full-table indexscan to be faster than a
> seqscan, and not very credible for it even to be approximately as fast.
> I suspect your second query here is the beneficiary of the first query
> having fetched all the pages into cache.  In general, if you want to
> optimize for a mostly-cached database, you need to reduce
> random_page_cost below its default value ...

We discussed this case on IRC, and the problem was not the first set of
queries but the second one:
select brand_id from brands where exists
  (select 1 from models_brands where brand = brands.brand_id);

Is there any way to make PostgreSQL produce a better estimate here?

->  Index Scan using models_brands_brand on models_brands  (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
      Index Cond: (brand = $0)

I suppose the planner chooses the seqscan instead of the index scan
because it estimates 92372 result rows here, while the actual count is 0.
Raising the statistics target (ALTER TABLE ... ALTER COLUMN ... SET
STATISTICS) didn't change anything.
IIRC, there have already been a few threads about the same sort of
estimation problem, and none of them reached a solution. Do you have
any hints or ideas?
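To make the problem concrete: my assumption (not verified against the planner source) is that when the compared value is a parameter like $0, unknown at plan time, the planner falls back on an average-based equality estimate of roughly reltuples * (1 - null_frac) / n_distinct, capped at the frequency of the most common value. A minimal Python sketch of that assumed formula; the function name and the numbers are hypothetical, not taken from our tables:

```python
def generic_eq_rows(reltuples, null_frac, n_distinct, top_mcv_freq=None):
    """Rough row estimate for `col = $param` when the value is unknown at plan time.

    Assumed model (hypothetical helper, not PostgreSQL code): every non-null
    distinct value is treated as equally common.
    """
    if n_distinct <= 0:
        raise ValueError("n_distinct must be positive")
    # Start from the fraction of non-null rows.
    selectivity = 1.0 - null_frac
    # Spread it evenly over the distinct values.
    if n_distinct > 1:
        selectivity /= n_distinct
    # No value is assumed to be more frequent than the most common one.
    if top_mcv_freq is not None and selectivity > top_mcv_freq:
        selectivity = top_mcv_freq
    return reltuples * selectivity


# Hypothetical skewed table: 1,000,000 rows over 10 brands, no NULLs.
# The estimate is the same for every brand, including brands with 0 rows.
print(generic_eq_rows(1_000_000, 0.0, 10))
```

If this model is right, the estimate is identical whatever brand $0 turns out to be, so brands with no rows at all (rows=0 in the plan above) still get the average, and no statistics target can change that.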

--
Guillaume