Re: Weird plan variation with recursive CTEs

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 26 Apr 2012 16:58:20 -0400

Claudio Freire <klaussfreire@xxxxxxxxx> writes:
> On Thu, Apr 26, 2012 at 2:37 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
>> Fun thing is, nothing in the CTE's execution really changed. The only
>> change, is that now a sequential scan of overview was chosen instead
>> of the index.
>> Why could this be? The output (number of search values, even the
>> values themselves and their order) is the same between both plans.

The estimated size of the UNION output is a lot different, thus
discouraging use of a nestloop for the outer query's join.

> I just noticed it's misestimating the output of the union distinct,
> but not of the select distinct.

> One would expect the estimation procedure to be the same in both cases.

No, I don't think that follows.  The UNION code is aware that people
frequently write UNION rather than UNION ALL even when they're not
expecting any duplicates, so it shies away from assuming that UNION
will reduce the number of rows at all.  On the other hand, it's not
at all common to write SELECT DISTINCT unless you actually expect
some duplicate removal to happen, so the default assumption in the
absence of any stats is different --- looks like it's assuming 10X
compression by the DISTINCT operation.

The real issue here of course is that we have no useful idea how many
rows will be produced by the recursive-union CTE, let alone what their
statistics will be like.  So the planner is falling back on rules of
thumb that are pretty much guaranteed to be wrong in any particular
case.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance