slow query plans caused by under-estimation of CTE cardinality

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2012-10-09 23:09:21
Tom Lane wrote:
 > 


> re subject Why am I getting great/terrible estimates with these CTE queries?
 > You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck.  The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column. 
> 

I am also struggling with a problem involving CTEs although in my case
it is caused by huge *under*-estimation of cardinality rather then *over*-estimation.
The statement is quite complex and the problem arises because there is a chain of
RECURSIVE CTEs each defined as a query involving an earlier CTE and more tables.
Eventually there is no hope for making a good cardinality estimate.

One CTE in particular has a cardinality estimate of 1  (I guess the actual
estimate is nearer zero and rounded up) but actual count is over 100000.
The planner puts this CTE as inner of a nested loop accessed by simple linear CTE scan
and the full query then takes over 20 minutes.

   ->  Nested Loop  (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
          Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
          Rows Removed by Join Filter: 7001612448
          Buffers: shared hit=2290941
          ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
                Buffers: shared hit=2290941
          ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)

Whereas,  (by altering various statistics to be very wrong) the entire query runs in 21 seconds.

There have been several debates about how to address situations like this where
no practical non-query-specific statistics-gathering scheme can ever hope to
gather enough statistics to model the later derived tables.     E.g. the frowned-on
SELECTIVITY clause and ideas for query-specific statistics.

Meanwhile,   I have one other suggestion aimed specifically at problematic CTEs:
Would it be reasonable to provide a new Planner Configuration option  :

  enable_nestloop_cte_inner (boolean)
  Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner.
  It is impossible to suppress such nested-loop joins entirely,
  but turning this variable off discourages the planner from using one
  if there are other methods available,  such as sorting the CTE for merge-join
  or hashing it for hash-join.
  The default is on.

John




 		 	   		  

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux