Robert Haas <robertmhaas@xxxxxxxxx> wrote:

> Just handling better the case where we pick a straight nested loop
> rather than a hash join would help a lot of people. Some basic
> conservatism about the number of outer rows would be useful here (in
> particular, we should probably assume that there will be at least 2
> when costing a nest loop, unless the outer side is known unique),
> and it's also worth thinking about the fact that a hash join won't
> build the table unless there is at least 1 outer row, which I don't
> think the current costing algorithm takes into account. Or maybe we
> should estimate the amount by which the nested loop figures to beat
> out the hash join and only accept the nested loop plan if the win
> exceeds some threshold percentage.

But in our environment the most common cause of a sub-optimal planning choice is over-estimating the cost of a nested loop. We've never been able to get good plans overall without dropping random_page_cost to twice the seq_page_cost or less -- in highly cached situations we routinely drop both to 0.1. Creating further bias against nested loop plans just means we'll have to push those numbers even further from what they purportedly represent.

It seems to me that a significant unsolved problem is properly accounting for caching effects. That said, I have no really clear idea of how best to solve it. The only ideas which recur when facing these issues are:

(1) The planner refuses to deal with fractional estimates of how many rows will be returned in a loop -- it treats 0.01 as 1, on the basis that you can't read a fractional row, rather than as a 1% chance that you will read a row and need to do the related work. I have always thought that changing this might allow more accurate estimates; perhaps I should hack a version which behaves that way and test it as a "proof of concept." (A back-of-the-envelope sketch of the difference is in the postscript below.) Note that this is diametrically opposed to your suggestion that we always assume at least two rows in the absence of a unique index.

(2) Somehow use effective_cache_size in combination with some sort of current-activity metrics to dynamically adjust random access costs. (I know, that one's total hand-waving, but it seems to have some possibility of modeling reality better than what we currently do.)

-Kevin
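
P.S. To make point (1) concrete, here is a minimal C sketch of the arithmetic. It is not planner code (the actual rounding happens in clamp_row_est() in costsize.c), and the inner-scan cost is an invented placeholder; it just contrasts "round 0.01 up to 1" with "treat 0.01 as a 1% chance of running the inner scan":

/*
 * Illustrative sketch only -- not the planner's costing code.  It
 * contrasts two ways of folding a fractional outer-row estimate into a
 * nested-loop cost: the current behavior (estimates below 1.0 are
 * clamped up to 1.0) versus reading 0.01 as a 1% probability that the
 * inner scan runs at all.  The cost figure is a made-up placeholder.
 */
#include <stdio.h>

/* Mimics the current behavior: never let a row estimate fall below 1. */
static double
clamp_row_est_like(double nrows)
{
    return (nrows < 1.0) ? 1.0 : nrows;
}

int
main(void)
{
    double outer_rows_est = 0.01;   /* e.g. a very selective outer filter */
    double inner_scan_cost = 250.0; /* hypothetical cost of one inner scan */

    /* Today: 0.01 becomes 1, so we charge one full inner scan. */
    double cost_clamped = clamp_row_est_like(outer_rows_est) * inner_scan_cost;

    /* Probabilistic reading: charge 1% of an inner scan. */
    double cost_fractional = outer_rows_est * inner_scan_cost;

    printf("clamped estimate:    %.2f\n", cost_clamped);    /* 250.00 */
    printf("fractional estimate: %.2f\n", cost_fractional); /*   2.50 */
    return 0;
}

With these made-up numbers the clamped approach charges a full inner scan (250) while the probabilistic reading charges 2.5 -- a 100x difference, which is easily enough to flip a plan choice away from the nested loop.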