On Thu, Mar 24, 2011 at 5:30 PM, Nathan Boley <npboley@xxxxxxxxx> wrote: > Another approach, that hasn't been suggested yet, is some Bayesian > update method. There, rather than calculating a specific parameter > value ( like ndistinct ), you try to store the entire distribution and > choose the plan that minimizes cost averaged over all of the possible > parameter values. I've done similar stuff for work, you don't have to go all the way to storing complete probability distributions, usually a simple likelihood range is enough. In essence, instead of having a scalar MLE for plan cost, you implement a "ranged" estimator, that estimates the most-likely range of plan costs, with mean and standard deviation from mean. This essentially gives a risk value, since risky plans will have very large standard deviations from the mean. > Also, ( not that I have even close to the experience / expertise to > make this claim - so take this with a grain of salt ) it seems that > the code changes would be substantial but pretty straightforward and > easy to localize. Rather than passing a selectivity, pass a pair of > arrays with selectivities and probabilities. If you approximage the probability distributions as I outlined above, it's even simpler. Approximate, but simpler - and since you retain the original cost estimations in the form of mean cost values, you can easily tune the GEQO to perform as it currently does (ignore variance) or with a safety margin (account for variance). About issues like these being uncommon - I disagree. I routinely have to work around query inefficiencies because GEQO does something odd - and since postgres gives me too few tools to tweak plans (increase statistics, use subqueries, rephrase joins, no direct tool before CTEs which are rather new), it becomes an art form, and it becomes very unpredictable and an administrative burden. Out of the blue, statistics change, queries that worked fine start to perform poorly, and sites go down. If GEQO could detect unsafe plans and work around them automatically, it would be a major improvement. Granted, though, this should be approached with care. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance