Stuart,

> I think random_page_cost causes problems because I need to combine
> disk random access time, which I can measure, with a guesstimate of
> the disk cache hit rate.

See, that's wrong.  Disk cache hit rate is what effective_cache_size
(ECS) is for.

Really, there are several factors which should be going into the
planner's estimates to determine a probability of a table being cached:

* ratio between total database size and ECS
* ratio between table size and ECS
* ratio between index size and ECS
* whether the table is "hot" or not
* whether the index is "hot" or not

The last two statistics are critically important for good estimation,
and they are not things we currently collect.  By "hot" I mean: is this
a relation which is accessed several times per minute/hour and is thus
likely to be in the cache when we need it?  Currently, we have no way
of knowing that.

Without "hot" statistics, we're left with guessing based on size, which
results in bad plans for small tables in large databases which are
accessed infrequently.

Mind you, for large tables it would be even better to go beyond that
and actually have some knowledge of which disk pages might be in cache.
However, I think that's beyond feasibility for current software/OSes.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
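
For anyone who wants to eyeball those ratios and a crude "hot or not"
signal by hand, here's a minimal sketch against the catalogs and the
stats views.  The table name 'orders' is just a placeholder, it assumes
a release new enough to have pg_size_bytes() (9.5+), and the pg_statio_*
counters only see shared_buffers hits, not the OS page cache, so treat
it as a rough proxy rather than the per-relation statistic described
above:

  -- relation sizes vs. effective_cache_size
  SELECT
      pg_size_bytes(current_setting('effective_cache_size')) AS ecs_bytes,
      pg_database_size(current_database())                   AS db_bytes,
      pg_table_size('orders')                                AS table_bytes,
      pg_indexes_size('orders')                               AS index_bytes;

  -- crude "hotness" proxy from shared_buffers hit counters
  SELECT
      relname,
      heap_blks_hit,
      heap_blks_read,
      round(heap_blks_hit::numeric
            / nullif(heap_blks_hit + heap_blks_read, 0), 2)  AS heap_hit_ratio,
      idx_blks_hit,
      idx_blks_read
  FROM pg_statio_user_tables
  WHERE relname = 'orders';

A heap_hit_ratio near 1.0 over a busy interval at least suggests the
relation stays resident; tracking how that number moves over time is
closer to what the planner would actually need.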