Re: Thinking About Correlated Columns (again)

Gavin Flower <GavinFlower@xxxxxxxxxxxxxxxxx> · Thu, 16 May 2013 08:22:33 +1200



    On 16/05/13 03:52, Heikki Linnakangas wrote:

    
      On 15.05.2013 18:31, Shaun Thomas wrote:
      

        I've seen conversations on this since at least 2005. There were even
        proposed patches every once in a while, but never any consensus.
        Anyone care to comment?
        

      Well, as you said, there has never been any consensus.
      

      There are basically two pieces to the puzzle:
      

      1. What metric do you use to represent correlation between
      columns?
      

      2. How do you collect that statistic?
      

      Based on the prior discussions, collecting the stats seems to be
      tricky. It's not clear for which combinations of columns it should
      be collected (all possible combinations? That explodes
      quickly...), or how it can be collected without scanning the whole
      table.
      

      I think it would be pretty straightforward to use such a
      statistic, once we have it. So perhaps we should get started by
      allowing the DBA to set a correlation metric manually, and use
      that in the planner.
      

      - Heikki
      

    How about pg comparing the actual number of rows delivered with the
    predicted number and, if the difference exceeds a specified threshold,
    then maintaining correlation statistics? There is obviously more to
    it, such as whether this is a relevant query to consider, and the size
    of the tables (there is no point in attempting to optimise tables with
    only 10 rows, for example).
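
    To make the idea concrete, here is a rough sketch in Python of the
    check I have in mind. It is not PostgreSQL code: the function names,
    the factor-of-10 threshold and the 1000-row minimum table size are all
    made up for illustration, and a real version would need to decide
    which queries are relevant at all.

```python
def misestimate_ratio(estimated_rows: float, actual_rows: float) -> float:
    """Return how far off the estimate was, as a factor >= 1.0.

    Both counts are clamped to at least 1 so that a zero-row estimate
    or result does not cause a division by zero.
    """
    est = max(float(estimated_rows), 1.0)
    act = max(float(actual_rows), 1.0)
    return max(est / act, act / est)


def should_maintain_stats(
    estimated_rows: float,
    actual_rows: float,
    table_rows: int,
    ratio_threshold: float = 10.0,  # hypothetical: off by 10x or more
    min_table_rows: int = 1000,     # hypothetical: ignore tiny tables
) -> bool:
    """Decide whether a misestimate is large enough to act on."""
    # Tiny tables are cheap to scan however bad the estimate is.
    if table_rows < min_table_rows:
        return False
    return misestimate_ratio(estimated_rows, actual_rows) >= ratio_threshold
```

    With these made-up defaults, a plan that predicted 100 rows but
    delivered 5000 from a million-row table would qualify, while the same
    miss on a 10-row table would be ignored.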

    
    Cheers,

    Gavin