Re: Yet another abort-early plan disaster on 9.3

On 01/10/14 05:54, Jeff Janes wrote:
On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower <GavinFlower@xxxxxxxxxxxxxxxxx> wrote:

Would it be feasible to get a competent statistician to advise what data to collect, and to analyze it?  Maybe it is possible to get a better estimate on how much of a table needs to be scanned, based on some fairly simple statistics.  But unless research is done, it is probably impossible to determine what statistics might be useful, and how effective a better estimate could be.
I have a nasty feeling that assuming a uniform distribution may still end up being the best we can do, but I may be being unduly pessimistic!

As a semi-competent statistician, my gut feeling is that our best bet would be not to rely on the competence of statisticians for too much, and instead try to give the executor the ability to abandon a fruitless path and pick a different plan instead. Of course this option is foreclosed once a tuple is returned to the client (unless the ctid is also cached, so we can make sure not to send it again on the new plan).
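To make the idea concrete, here is a toy sketch in Python of "abandon the fruitless path, switch plans, and use cached ctids to avoid resending rows". It is not PostgreSQL executor code: the function run_with_fallback, the work_budget parameter, and the two example plans are all hypothetical names invented for illustration, and a plan is modelled simply as an iterator that yields either a matching (ctid, row) pair or None for a tuple that was examined and rejected.

```python
from typing import Callable, Iterator, List, Optional, Set, Tuple

Ctid = Tuple[int, int]             # (block, offset), modelled on PostgreSQL's ctid
Step = Optional[Tuple[Ctid, str]]  # None: a tuple was examined but did not match
Plan = Callable[[], Iterator[Step]]


def run_with_fallback(primary: Plan, fallback: Plan,
                      limit: int, work_budget: int) -> List[Tuple[Ctid, str]]:
    """Run `primary`; if it examines `work_budget` tuples without
    satisfying `limit`, abandon it and finish with `fallback`.

    Assumption (hypothetical): both plans emit rows in the same order,
    so rows already sent are exactly the ones the fallback yields first.
    """
    sent: Set[Ctid] = set()        # ctids already returned to the client
    out: List[Tuple[Ctid, str]] = []
    work = 0
    for step in primary():
        work += 1
        if step is not None:
            sent.add(step[0])
            out.append(step)
            if len(out) >= limit:
                return out         # the abort-early plan paid off
        if work >= work_budget:
            break                  # fruitless path: give up on this plan
    else:
        return out                 # primary ran to completion on its own

    for step in fallback():
        if step is None or step[0] in sent:
            continue               # skip non-matches and rows already sent
        sent.add(step[0])
        out.append(step)
        if len(out) >= limit:
            break
    return out


# Hypothetical plans: the first finds one row quickly, then grinds through
# non-matching tuples; the second is slower to start but finds everything.
def hopeful_index_scan() -> Iterator[Step]:
    yield ((0, 1), "a")
    for _ in range(1000):
        yield None


def bitmap_then_sort() -> Iterator[Step]:
    yield ((0, 1), "a")
    yield ((5, 2), "b")
    yield ((9, 3), "c")


rows = run_with_fallback(hopeful_index_scan, bitmap_then_sort,
                         limit=3, work_budget=10)
```

The set of sent ctids is what keeps the switch safe after the first tuple has gone out; how to choose the budget is the hard part, and this sketch does not address it.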

I think that the exponential explosion of possibilities is going to be too great to analyze in any rigorous way.

Cheers,

Jeff
Many moons ago, I passed several 300-level statistics papers.

I looked at this problem and found it was too hard to even properly characterise the problem (looks 'simple' - if you don't look too closely), and ended up feeling it was definitely 'way above my pay grade'!  :-)

It might be possible to tackle it more pragmatically: instead of trying to be all analytic and rigorously list all the possible influences, have a look at queries of this nature that are taking far too long.  Then get a feel for the combinations of issues involved and how they contribute.  If you have enough data, you might be able to use something like Principal Component Analysis (I was fortunate to meet a scientist who had got heavily into this area of statistics).  Such an approach might yield valuable insights, even if the problem is not fully characterised, let alone 'solved'.
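As a rough illustration of what that could look like, here is a minimal pure-Python sketch that extracts the first principal component from a table of per-query measurements using power iteration. Everything about the input is an assumption: which columns to collect (estimated versus actual rows, LIMIT value, column correlation, and so on) is exactly the open question, and the function names below are made up for this example. A real analysis would more likely use an established statistics package.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(rows: Sequence[Sequence[float]]) -> Matrix:
    """Centre each column and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)   # leave constant columns alone
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows: Matrix) -> Matrix:
    """Covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]


def first_component(cov: Matrix, iterations: int = 500) -> Tuple[List[float], float]:
    """Power iteration: returns (unit eigenvector, eigenvalue) for the
    largest eigenvalue. Assumes the all-ones start vector is not
    orthogonal to that eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return v, eigenvalue


def explained_fraction(cov: Matrix, eigenvalue: float) -> float:
    """Share of total variance captured by the component."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / trace if trace else 0.0
```

Given one row of measurements per slow query, first_component(covariance(standardize(rows))) returns the loadings of the dominant combination of factors, and explained_fraction says how much of the variation that combination accounts for.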


Cheers,
Gavin
