Re: issue with query optimizer when joining two partitioned tables

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 09 Jul 2011 13:43:50 -0400

Heikki Linnakangas <heikki.linnakangas@xxxxxxxxxxxxxxxx> writes:
> On 09.07.2011 00:36, Anish Kejariwal wrote:
>> My guess as to what happened:
>> -because the icecream parent table has zero records, the query optimizer
>> chooses the incorrect execution plan
>> -when I do select * from icecream, the optimizer now knows how many records
>> are really in the icecream table, by knowing that the icecream table has
>> partitions.

> "select * from icecream" won't have any direct effect on the 
> optimization of subsequent queries. What probably happened is that 
> autoanalyze ran in the background while you ran that select, and 
> analyzed some of the partitions. Simply waiting a while would've had the 
> same effect.

Yeah.  Also, the reason that a manual vacuum on icecream changes things
yet again is that in 9.0 and up, we have a notion of summary stats
across the whole inheritance tree, but autoanalyze hasn't been taught to
gather those.  The manual command on the parent table does gather them,
though.

So what's happening here is that we suddenly have an accurate idea of
the size of the join product as a result of having inheritance summary
stats to estimate with, and that drives the estimated cost of the merge
or hash join down out of the stratosphere.  The estimated cost of the
nestloop goes down a lot too, but not as much.

I experimented with a similar case here, and it seems like a lot of the
remaining error in the nestloop estimate comes from this:

>> ->   Seq Scan on icecream rc  (cost=0.00..23.00 rows=5 width=24) (actual time=0.002..0.002 rows=0 loops=1)
>>        Filter: (dataset_id = 281)

The indexscan on the nonempty child partition is estimated at less than
10 cost units, so this is a *large* fraction of what the planner sees as
the per-outer-row cost of a nestloop.  And with more than 11000 rows on
the other side of the join, that discourages it from using the nestloop.
In reality of course this takes negligible time compared to examining
the child partition.

Now why is the seqscan cost estimate so large, when actually the parent
icecream table is totally empty?  It's because the planner has been
taught to never believe that an empty table is empty.  If memory serves,
it's really estimating on an assumption that the table contains 10 pages
and some corresponding number of rows.  This is a reasonable defensive
posture when dealing with ordinary tables, I think, since most likely
if the catalogs say the table is empty that's just a leftover from when
it was created.  But maybe we should reconsider the heuristic for tables
that are members of inheritance trees --- particularly parents of
inheritance trees.

I was able to defeat the empty-table heuristic here by doing

update pg_class set relpages = 1 where relname = 'icecream';

and then I started getting much more realistic estimates in my test
case.  (It still wanted to use a merge join initially, but after
knocking down random_page_cost it went to the nestloop.)  It would
be interesting to see what sorts of results Anish gets with that.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance