On Fri, Aug 7, 2009 at 5:09 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
> On 8/7/09 5:53 AM, "Robert Haas" <robertmhaas@xxxxxxxxx> wrote:
>
>> On Fri, Aug 7, 2009 at 4:00 AM, Kees van Dieren <keesvandieren@xxxxxxxxx>
>> wrote:
>>> Would it get attention if I submit this to
>>> http://www.postgresql.org/support/submitbug ? (in fact it is not really
>>> a bug, but an improvement request).
>>
>> I think that many of the people who read that mailing list also read
>> this one, including, critically, Tom Lane, and you're not the first
>> person to run into a problem caused by lack of cross-column statistics.
>
> Critically, it should be understood that this general problem is not just
> born from a lack of cross-column statistics.
>
> It is also one of using the statistical expected value to calculate cost
> without consideration of the variance. Often, the cost of a plan varies
> widely and nonlinearly with a small change in the expected value of the
> stats used to estimate cost.
>
> The problem is acute with LIMIT and various other boundary conditions
> where there is a steep 'cost cliff' for certain plan types. When LIMIT is
> applied, the planner changes its estimates, but does not take into
> account the _greatly_ increased uncertainty of those estimates.
>
> Imagine a case where the planner's estimate is 100% correct, and on
> average one side of a join will have 2 tuples. The planner chooses nested
> loops to do that join. But the distribution of the number of tuples at
> this node is skewed, so although the expected value is 2, values of 10
> and 0 are both common. When a value of 10 occurs, the execution time goes
> up significantly. An alternate plan might be slower for the case where
> the actual values in execution equal the expected values, but faster in
> the average case!
>
> The average cost of a plan is NOT the cost of the query with average
> statistics, due to variance, nonlinearity, and skew. And even if they
> were equal, it might not be preferable to choose the plan that is best on
> average over the one that is best at the 90th percentile.
>
> Getting back to the case above with the nestloop -- if the planner
> estimated the cost of the nestloop join versus other joins with some idea
> of the variance in the estimations, it could favor more 'stable'
> execution plans.
>
> So in short, cross-column statistics don't have to be gathered and used
> to make this problem less acute. The planner just has to be more aware of
> variance and the things that lead to it, such as column correlation.
>
> Thus with a LIMIT applied, the expected value for the number of tuples
> scanned before a match will shrink, but the uncertainty of this estimate
> grows significantly as well, so the right plan to choose is one that
> hedges against the uncertainty, not the one that assumes the expected
> value is correct.

This is a good analysis. I think I proposed some kind of idea about
tracking uncertainty in the planner a while back, but I never got very far
with it. The problem is, first, figuring out how to estimate the
uncertainty, and second, figuring out what to do with the result once
you've estimated it. The concept of hedging against uncertainty is the
right one, I think, but it's not obvious how to fit that into the
cost-comparison algorithm that the planner uses.
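To put a rough number on that nonlinearity, here is a toy simulation in
Python (the cost model and row counts are invented for illustration; none
of this comes from the planner). With a LIMIT, the number of rows scanned
before enough matches turn up is roughly inversely proportional to the
number of matching rows, so the cost evaluated at the average match count
can sit far below the average cost over a skewed distribution of match
counts:

# A toy simulation (not planner code; every number here is invented) of the
# LIMIT "cost cliff": scanning until `limit` matches are found costs roughly
# total_rows * limit / matching_rows, which is convex in matching_rows, so
# the cost at the *average* match count understates the *average* cost.
import random

random.seed(42)

TOTAL_ROWS = 1_000_000
LIMIT = 1

def rows_scanned(matching_rows, total_rows=TOTAL_ROWS, limit=LIMIT):
    """Rows read before `limit` matches are found, assuming matches are
    spread uniformly through the scan; zero matches forces a full scan."""
    if matching_rows == 0:
        return total_rows
    return min(total_rows, total_rows * limit / matching_rows)

# Skewed distribution of how many rows actually match: often none at all,
# sometimes a couple, occasionally many.  The mean works out to about 13.
def draw_matches():
    return random.choices([0, 2, 20, 100], weights=[0.40, 0.30, 0.22, 0.08])[0]

samples = [draw_matches() for _ in range(200_000)]
avg_matches = sum(samples) / len(samples)

cost_at_avg_matches = rows_scanned(avg_matches)
avg_cost = sum(rows_scanned(m) for m in samples) / len(samples)

print(f"average matching rows          : {avg_matches:.1f}")
print(f"rows scanned at average matches: {cost_at_avg_matches:,.0f}")
print(f"average rows actually scanned  : {avg_cost:,.0f}")

With those invented numbers, the average number of rows actually scanned
comes out several times higher than the number scanned at the average
match count, which is exactly the gap a point estimate hides.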
A sticking point here too is that the comparison of path costs is already
a hot spot; making it more complex will likely have a noticeable negative
impact on query planning time. For queries against huge databases that may
not matter much, but for OLTP queries it certainly does.

There are a couple of methods that have been proposed to deal with this in
the past. The most obvious one, which has been talked about a couple of
times, is switching a nested loop to a hash join if the number of
iterations exceeds some bound; that would require some executor support.

Empirically, it seems to me that the planner generally follows a pretty
consistent pattern. If the inner relation is tiny or the number of loops
is estimated to be very small, it uses a nested loop. When the inner rel
is a bit bigger, or the number of iterations is nontrivial, it switches to
a hash join. When the inner relation gets too big to fit in work_mem, or
just big enough that hashing it looks too slow, it switches to a nested
loop with an inner index scan or, especially if a useful sort is
available, a merge join.

Just improving how we handle the case where we pick a straight nested loop
rather than a hash join would help a lot of people. Some basic conservatism
about the number of outer rows would be useful here (in particular, we
should probably assume that there will be at least 2 when costing a nested
loop, unless the outer side is known unique), and it's also worth thinking
about the fact that a hash join won't build its hash table unless there is
at least 1 outer row, which I don't think the current costing algorithm
takes into account. Or maybe we should estimate the amount by which the
nested loop figures to beat out the hash join and only accept the nested
loop plan if the win exceeds some threshold percentage.

...Robert
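To make the last two suggestions concrete -- clamping the assumed outer
row count and requiring the nested loop to win by a margin -- here is a
rough sketch in Python (purely hypothetical: the function names, cost
model, and numbers are invented, and this is not how the actual costing
code is structured):

def nestloop_cost(outer_rows, inner_rescan_cost):
    # Each outer row triggers another pass over the inner side.
    return outer_rows * inner_rescan_cost

def hashjoin_cost(outer_rows, build_cost, per_probe_cost):
    # The hash table is only built once the first outer row shows up.
    return 0.0 if outer_rows == 0 else build_cost + outer_rows * per_probe_cost

def choose_join(est_outer_rows, outer_known_unique,
                inner_rescan_cost, build_cost, per_probe_cost,
                margin=0.25):
    # Conservatism: unless the outer side is provably unique, assume at
    # least 2 outer rows when costing the nested loop.
    nl_rows = est_outer_rows if outer_known_unique else max(est_outer_rows, 2)
    nl = nestloop_cost(nl_rows, inner_rescan_cost)
    hj = hashjoin_cost(est_outer_rows, build_cost, per_probe_cost)
    # Only take the riskier nested loop if it beats the hash join by more
    # than `margin`; otherwise fall back to the safer plan.
    return "nestloop" if nl < hj * (1.0 - margin) else "hashjoin"

# The planner thinks there is 1 outer row, but the estimate is shaky.
print(choose_join(est_outer_rows=1, outer_known_unique=False,
                  inner_rescan_cost=55.0, build_cost=120.0, per_probe_cost=1.0))

With these made-up numbers the nested loop is nominally cheaper (110 vs.
121) and would win a straight cost comparison, but it does not clear the
25% margin, so the safer hash join is chosen; that is the hedge against a
bad outer-row estimate.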