Re: Thinking About Correlated Columns (again)

Andrew Dunstan <andrew@xxxxxxxxxxxx> · Wed, 15 May 2013 12:39:46 -0400

On 05/15/2013 12:23 PM, Craig James wrote:
On Wed, May 15, 2013 at 8:31 AM, Shaun Thomas 
<sthomas@xxxxxxxxxxxxxxxx <mailto:sthomas@xxxxxxxxxxxxxxxx>> wrote:

    [Inefficient plans for correlated columns] has been a pain point
    for quite a while. While we've had several discussions in the
    area, it always seems to just kinda trail off and eventually
    vanish every time it comes up.

    ...
    Since there really is no fix for this aside from completely
    rewriting the query or horribly misusing CTEs (to force sequence
    scans instead of bloated nested loops, for example)...
    I'm worried that without an easy answer for cases like this, more
    DBAs will abuse optimization fences to get what they want and
    we'll end up in a scenario that's actually worse than query hints.
    Theoretically, query hints can be deprecated or have GUCs to
    remove their influence, but CTEs are forever, or until the next
    code refactor.

    I've seen conversations on this since at least 2005. There were
    even proposed patches every once in a while, but never any
    consensus. Anyone care to comment?

It's a very hard problem.  There's no way you can keep statistics 
about all possible correlations since the number of possibilities is 
O(N^2) with the number of columns.

I don't see why we couldn't allow the DBA to specify some small subset 
of the combinations of columns for which correlation stats would be 
needed. I suspect in most cases no more than a handful for any given 
table would be required.

cheers

andrew

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance