On 2012-10-09 23:09:21 Tom Lane wrote: > > re subject Why am I getting great/terrible estimates with these CTE queries? > You're assuming the case where the estimate is better is better for a > reason ... but it's only better as a result of blind dumb luck. The > outer-level query planner doesn't know anything about the CTE's output > except the estimated number of rows --- in particular, it doesn't drill > down to find any statistics about the join column. > I am also struggling with a problem involving CTEs although in my case it is caused by huge *under*-estimation of cardinality rather then *over*-estimation. The statement is quite complex and the problem arises because there is a chain of RECURSIVE CTEs each defined as a query involving an earlier CTE and more tables. Eventually there is no hope for making a good cardinality estimate. One CTE in particular has a cardinality estimate of 1 (I guess the actual estimate is nearer zero and rounded up) but actual count is over 100000. The planner puts this CTE as inner of a nested loop accessed by simple linear CTE scan and the full query then takes over 20 minutes. -> Nested Loop (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1) Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level)) Rows Removed by Join Filter: 7001612448 Buffers: shared hit=2290941 -> CTE Scan on winning_option_nums winnum (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1) Buffers: shared hit=2290941 -> CTE Scan on subnet_inhrt_options_asc binoptasc (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904) Whereas, (by altering various statistics to be very wrong) the entire query runs in 21 seconds. There have been several debates about how to address situations like this where no practical non-query-specific statistics-gathering scheme can ever hope to gather enough statistics to model the later derived tables. E.g. the frowned-on SELECTIVITY clause and ideas for query-specific statistics. Meanwhile, I have one other suggestion aimed specifically at problematic CTEs: Would it be reasonable to provide a new Planner Configuration option : enable_nestloop_cte_inner (boolean) Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner. It is impossible to suppress such nested-loop joins entirely, but turning this variable off discourages the planner from using one if there are other methods available, such as sorting the CTE for merge-join or hashing it for hash-join. The default is on. John -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance