Re: A query planner that learns

On Oct 13, 2006, at 11:47, John D. Burger wrote:

Erik Jones wrote:

Forgive me if I'm way off here, as I'm not all that familiar with the internals of Postgres, but isn't this what the genetic query optimizer discussed in one of the manual's appendixes is supposed to do?

No, it's not an "optimizer" in that sense. When a small enough set of tables is involved, the planner uses a dynamic programming algorithm to explore the entire space of possible plans. But that space grows exponentially (I think) with the number of tables; when an exhaustive search would take too long, the planner switches to a genetic algorithm, which explores a small fraction of the plan space in a guided manner.
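
To illustrate the exhaustive search described above, here is a toy sketch of dynamic programming over join orders: the cheapest plan for every subset of tables is built from the cheapest plans of its two halves. The function name and the cost model (input rows plus output rows per join, with a caller-supplied selectivity) are hypothetical and much simpler than PostgreSQL's, which also weighs access paths and join methods and generally avoids clauseless joins. Because every split of every subset is examined, the work grows roughly as 3^n for n tables, which is why PostgreSQL hands queries with many FROM items (at least `geqo_threshold` of them, 12 by default) to the genetic optimizer instead.

```python
from itertools import combinations


def best_join_order(tables, rows, selectivity):
    """Exhaustive DP search over (bushy) join orders under a toy cost model.

    tables: list of table names
    rows: dict mapping table name -> estimated row count
    selectivity: function(left_set, right_set) -> fraction of the cross product kept
    Returns (cost, output_rows, plan) where plan is a nested tuple of table names.
    """
    # best[S] holds the cheapest (cost, output_rows, plan) found for subset S
    best = {frozenset([t]): (0.0, float(rows[t]), t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            s = frozenset(subset)
            # try every way of splitting s into two non-empty halves
            for k in range(1, size):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    lcost, lrows, lplan = best[l]
                    rcost, rrows, rplan = best[r]
                    out = lrows * rrows * selectivity(l, r)
                    # hypothetical cost: children's costs + rows read + rows produced
                    cost = lcost + rcost + lrows + rrows + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, out, (lplan, rplan))
    return best[frozenset(tables)]


print(best_join_order(["a", "b", "c"], {"a": 1000, "b": 10, "c": 10}, lambda l, r: 0.5))
```

With these made-up numbers the search joins the two small tables first and the large one last.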

But with both approaches, the planner is just using the static statistics gathered by ANALYZE to estimate the cost of each candidate plan, and these statistics are based on sampling your data, so they may be wrong, or at least misleading. (In particular, the estimate of the total number of distinct values is frequently =way= off, per a recent thread here. I have been reading about this and idly thinking about how to improve the estimate.)
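
The distinct-count problem is easy to see in code. Below is a sketch of the Haas and Stokes "Duj1" estimator, D = n*d / (n - f1 + f1*n/N), which, as far as I know, is the formula ANALYZE applies to its sample; PostgreSQL adds special cases and clamping not shown here, and the function name is illustrative. The estimator sees only the sample size n, the number d of distinct values in the sample, the number f1 of values seen exactly once, and the table size N.

```python
from collections import Counter


def estimate_ndistinct(sample, total_rows):
    """Haas-Stokes Duj1 estimate of the number of distinct values in a column.

    sample: values drawn from the column (n rows)
    total_rows: N, the number of rows in the whole table
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                 # distinct values in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    # Duj1: n*d / (n - f1 + f1*n/N); the denominator is always positive when n > 0
    return n * d / (n - f1 + f1 * n / total_rows)


# 100 values seen ten times each plus 1,000 singletons, from a 1,000,000-row table
sample = list(range(100)) * 10 + list(range(1000, 2000))
print(round(estimate_ndistinct(sample, 1_000_000)))
```

Because the result depends only on n, d, f1 and N, the sample above yields about 2,196 no matter how many distinct values the table really holds, which is one way the estimate can end up far from the truth.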

A learning planner, I suppose, would be one that examines cases where these statistics led to badly misguided expectations. The simplest version of a "learning" planner could just bump up the statistics targets on certain columns. A slightly more sophisticated idea would be for some statistics to optionally use parametric models (this column is Gaussian, so let's estimate its mean and variance; this one follows a Beta distribution; ...). A smarter planner could then spend some cycles applying more sophisticated statistical modeling to problematic tables and columns.
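
A minimal sketch of the parametric idea, assuming a hypothetical per-column model: a Gaussian summarized by its mean and variance answers range selectivities through the normal CDF, and a Beta distribution for values in (0, 1) can be fitted by the method of moments. None of these names exist in PostgreSQL, which stores most-common-value lists and histogram bounds instead.

```python
import math
from statistics import fmean, pvariance


class GaussianColumnModel:
    """Hypothetical parametric statistic: a column summarized by mean and variance."""

    def __init__(self, values):
        values = list(values)
        self.mean = fmean(values)
        self.var = float(pvariance(values))

    def selectivity_lt(self, x):
        """Estimated fraction of rows with value < x, via the normal CDF."""
        if self.var == 0:
            # constant column: everything is below x or nothing is
            return 1.0 if x > self.mean else 0.0
        z = (x - self.mean) / math.sqrt(2 * self.var)
        return 0.5 * (1 + math.erf(z))

    def selectivity_between(self, lo, hi):
        """Estimated fraction of rows with lo <= value < hi."""
        return max(0.0, self.selectivity_lt(hi) - self.selectivity_lt(lo))


def fit_beta_moments(values):
    """Method-of-moments Beta(alpha, beta) fit for values strictly inside (0, 1)."""
    values = list(values)
    m = fmean(values)
    v = float(pvariance(values))
    if v == 0:
        raise ValueError("zero variance: cannot fit a Beta distribution")
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("variance too large for a Beta distribution")
    return m * common, (1 - m) * common


model = GaussianColumnModel([1, 2, 3, 4, 5])
print(model.selectivity_between(2, 4))
```

The appeal is that two numbers per column replace a histogram; the risk is that a column which is not actually Gaussian gets confidently wrong estimates, so the planner would need some check of fit before trusting such a model.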

One simple first step would be to run an ANALYZE whenever a sequential scan is executed. Is there a reason not to do this? It could be controlled by a GUC variable in case someone wants repeatable plans.
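
One argument for this proposal is that a sequential scan already touches every row, so exact statistics could be collected almost for free. The sketch below assumes a hypothetical `ANALYZE_ON_SEQSCAN` switch standing in for the proposed GUC (no such setting exists) and a plain dictionary standing in for the statistics catalog; it records row count, exact distinct count and null fraction for one column while the scan runs.

```python
from dataclasses import dataclass

ANALYZE_ON_SEQSCAN = True  # hypothetical stand-in for the proposed GUC


@dataclass
class ColumnStats:
    row_count: int
    n_distinct: int
    null_frac: float


catalog = {}  # hypothetical stand-in for the statistics catalog: (table, column) -> ColumnStats


def seq_scan(table_name, rows, column, predicate):
    """Yield rows matching predicate; if enabled, refresh stats for `column` as a side effect."""
    seen, nulls, total = set(), 0, 0
    for row in rows:
        if ANALYZE_ON_SEQSCAN:
            total += 1
            value = row.get(column)
            if value is None:
                nulls += 1
            else:
                seen.add(value)
        if predicate(row):
            yield row
    # reached only if the caller consumed the whole scan
    if ANALYZE_ON_SEQSCAN:
        catalog[(table_name, column)] = ColumnStats(
            total, len(seen), nulls / total if total else 0.0
        )


rows = [{"id": 1, "c": "x"}, {"id": 2, "c": "y"}, {"id": 3, "c": None}, {"id": 4, "c": "x"}]
print([r["id"] for r in seq_scan("t", rows, "c", lambda r: r["id"] > 2)], catalog[("t", "c")])
```

Statistics are written only after the generator is exhausted, so a scan cut short (by a LIMIT, say) records nothing rather than partial numbers. Keeping an exact set of values costs memory proportional to the distinct count; a real implementation would more likely use a fixed-size approximate sketch.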

Further down the line, statistics could be collected during the execution of any query, with histograms updated on DELETE and UPDATE as well.
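
Collecting statistics during writes could look something like this sketch: hypothetical hooks called on each INSERT, DELETE and UPDATE adjust bucket counts so the histogram never goes stale. PostgreSQL's histograms are equal-frequency bounds, which are hard to maintain incrementally, so this sketch swaps in an equal-width histogram over a fixed, assumed value range.

```python
class IncrementalHistogram:
    """Equal-width histogram whose bucket counts are adjusted on every write."""

    def __init__(self, lo, hi, nbuckets):
        self.lo = lo
        self.width = (hi - lo) / nbuckets
        self.counts = [0] * nbuckets

    def _bucket(self, v):
        # values outside [lo, hi) are clamped into the first or last bucket
        i = int((v - self.lo) // self.width)
        return min(max(i, 0), len(self.counts) - 1)

    def on_insert(self, v):
        self.counts[self._bucket(v)] += 1

    def on_delete(self, v):
        self.counts[self._bucket(v)] -= 1

    def on_update(self, old, new):
        self.on_delete(old)
        self.on_insert(new)

    def selectivity_lt(self, x):
        """Estimated fraction of rows with value < x."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        b = self._bucket(x)
        below = sum(self.counts[:b])
        # assume values are spread uniformly within bucket b
        frac = (x - (self.lo + b * self.width)) / self.width
        frac = min(max(frac, 0.0), 1.0)
        return (below + frac * self.counts[b]) / total


h = IncrementalHistogram(0, 100, 10)
for v in (5, 15, 25, 35):
    h.on_insert(v)
h.on_update(35, 95)
h.on_delete(5)
print(h.counts, h.selectivity_lt(20))
```

The trade-off is write overhead on every row change in exchange for estimates that track the data without waiting for the next ANALYZE.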

-M

