2010/11/11 Robert Haas <robertmhaas@xxxxxxxxx>
-- But thinking over what you've written here, I'm reminded of something
Peter said years ago, also about the optimizer. He was discussed the
ratio of the estimated cost to the actual cost and made an off-hand
remark that efforts had been made over the years to make that ratio
more consistent (i.e. improve the quality of the cost estimates) but
that they'd been abandoned because they didn't necessarily produce
better plans. Applying that line of thinking to this problem, maybe
we should give up on trying to make the estimates truly model reality,
and focus more on assigning them values which work well in practice.
For example, in your case, it would be sufficient to estimate the
amount of data that a given query is going to grovel through and then
applying some heuristic to choose values for random_page_cost and
seq_page_cost based on the ratio of that value to, I don't know,
effective_cache_size.
As for me, the simplest solution would be to allow to set costs on per-relation basis. E.g. I know that this relation is most time in memory and other one (archive) is on the disk. This could work like charm along with buffer pools (portions of shared cache) - tables (or indexes) that are required to be cached can be assigned to bufferpool that has enough size to hold all the data, archive ones - to small bufferpool. This can guarantie that after query on the archive data, cached tables are still cached.
This solutions however, does not help on tables where only some portion of table is activelly used. The solution can be to allow set costs via partial indexes - e.g. "for any table access using this index, use this cost values". This, BTW, will make table access via given index more preferable.
Best regards,
Vitalii Tymchyshyn