On Tue, Apr 18, 2006 at 12:51:59PM +0200, Markus Schaber wrote:
> > In my mind this is tied into another issue, which is that the planner
> > always costs on the basis of each query starting from zero. In a real
> > environment it's much cheaper to use heavily-used indexes than this
> > cost model suggests, because they'll already be swapped in due to use
> > by previous queries. But we haven't got any infrastructure to keep
> > track of what's been heavily used, let alone a cost model that could
> > make use of the info.
>
> An easy first approach would be to add a user-tunable cache probability
> value to each index (and possibly table) between 0 and 1. Then simply
> multiply random_page_cost with (1 - that value) for each scan.
>
> Later, this value could be automatically tuned by stats analysis or
> other means.

Actually, if you run with stats_block_level turned on you have a
first-order approximation of what is and isn't cached. Perhaps the
planner could make use of this information if it's available.

> > I think part of the reason that people commonly reduce
> > random_page_cost to values much lower than physical reality would
> > suggest is that it provides a crude way of partially compensating for
> > this basic problem.
>
> I totally agree with this; it's just what we did here from time to
> time. :-)
>
> Hmm, how does effective_cache_size correspond with it? Shouldn't a high
> effective_cache_size have a similar effect?

Generally, effective_cache_size is used to determine the likelihood that
something will be in cache, while random_page_cost tells us how expensive
it will be to get that information if it isn't in cache.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
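
As a rough illustration of the stats_block_level point above (the view and
column names assumed here are the standard pg_statio catalog views), a
per-index first-order cache hit ratio can be pulled out with something like:

    -- Sketch only: buffer-cache hit ratio per index, from the block-level
    -- statistics gathered when stats_block_level is on.
    SELECT indexrelname,
           idx_blks_hit,
           idx_blks_read,
           round(idx_blks_hit::numeric
                 / nullif(idx_blks_hit + idx_blks_read, 0), 2) AS hit_ratio
    FROM pg_statio_user_indexes
    ORDER BY idx_blks_hit + idx_blks_read DESC;

Keep in mind this only reflects shared_buffers hits; pages served from the
OS cache still count as reads, so it's a lower bound on what is "cached".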
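
And to put illustrative numbers on the proposal and the two settings
discussed above (values made up; in current releases effective_cache_size
is measured in 8kB disk pages):

    -- Session-level experiment with the two knobs.
    SET effective_cache_size = 131072;  -- ~1GB: how much data the planner
                                        -- assumes may already be cached
    SET random_page_cost = 4.0;         -- cost charged per page that has
                                        -- to come from disk
    -- Under the proposal quoted above, an index with a cache probability
    -- of 0.9 would instead be charged roughly 4.0 * (1 - 0.9) = 0.4 per
    -- random page.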