Scara Maccai wrote:
> Thank you for your reply. This makes partitions unusable for me...
> hope someone explains why this happens... this still looks like a bug
> to me... BTW the problem arises when adding the second "left outer
> join": when using only 1 partitioned table (that is, only 1 "left
> outer join") the 2 plans are identical.
> My answers follow.
>
>> Well, the first version was expecting about 400 rows, the second
>> 15000, so it's not surprising that they have different plans.
>
> Well, they're using exactly the same tables; I don't understand why
> one expects 400 rows and the other 15000...
Well, let's see. It's difficult to be certain without the table
definitions and sizes, but...
Both plans sequentially scan "cell_bsc_60_0610" because it will return
fewer than 300 rows. That seems sensible, and the row estimate is right.
The fast one then uses the index "teststscell73_0610_1_pkey" to probe
for matches, and then does the same via "teststscell13_0610_1_pkey".
Two nested loops only make sense when a small number of rows match;
otherwise it gets expensive going back and forth between the index and
the table all the time. In this case we end up matching more rows than
expected (rows=60, loops=285 gives us ~17100 rows to check against the
second index), but not enough to slow us down.
The first plan estimated a cost of 33391 for its 408 rows, the second
70402 for 15982 rows. So the second's estimate is roughly double the
first's, yet it takes about four times as long to complete. If the
first query had estimated its row count correctly, its cost would have
risen and it would have *looked* more expensive than the second. In
other words, with your particular setup PostgreSQL thinks index
searches take longer than they really do (or perhaps it thinks sorting
is quicker than it really is, or both).
You might want to adjust your cost estimates (see ch. 18.6 of the
manuals). Start by checking effective_cache_size, then perhaps reduce
random_page_cost. Take small steps though - these settings affect all
your other queries too.
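Something along these lines, say - the values below are illustrative
guesses for your machine, not recommendations, and you'd make any
permanent change in postgresql.conf rather than per-session:

```sql
-- Check the current values first:
SHOW effective_cache_size;
SHOW random_page_cost;

-- Tell the planner how much data the OS is likely to be caching.
-- (On older releases this is an integer in 8kB pages; 131072 ~= 1GB.)
SET effective_cache_size = 131072;

-- Lower the assumed penalty for non-sequential reads (default 4.0):
SET random_page_cost = 2.0;

-- Then re-run EXPLAIN ANALYZE on both queries and compare.
```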
Also if you have the time, try issuing "set enable_mergejoin=off" and
re-issue query #2. That will force it to choose another plan.
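The experiment would look something like this - it's a diagnostic,
not a fix, so put the setting back afterwards:

```sql
-- Disable merge joins for this session only:
SET enable_mergejoin = off;

-- Re-issue your query #2 here under EXPLAIN ANALYZE and see what
-- plan the planner falls back to:
EXPLAIN ANALYZE SELECT ... ;

-- Restore the default when you're done:
RESET enable_mergejoin;
```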
Oh - if the partitions hold historical (unchanging) data it might be
worth CLUSTERing them on the pkey index too.
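For example (table name guessed from the index names in your plans -
adjust to suit; note CLUSTER takes an exclusive lock and rewrites the
table, which is why it only makes sense for static historical data):

```sql
-- 8.x syntax: CLUSTER index_name ON table_name
-- (newer releases also accept: CLUSTER table_name USING index_name)
CLUSTER teststscell73_0610_1_pkey ON teststscell73_0610_1;

-- Refresh the statistics afterwards:
ANALYZE teststscell73_0610_1;
```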
>> I'm not sure whether the planner is smart enough to cope with the
>> multiple tests on time vs the partitioning and realise it can use
>> your index on the partition.
>
> Sorry, didn't understand that...
The partitioning relies on proving that only partition P needs to be
considered. The more complicated the situation, the harder that is to
prove. In this case though, I'm not sure that's what's happening - it
seems to narrow the partition down well enough.
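To illustrate what I mean by "proving": the planner can skip a child
table only when it can show the query's WHERE clause contradicts that
child's CHECK constraint. A rough sketch, with a guessed parent table
"teststscell73" and a guessed column name "period":

```sql
-- Constraint exclusion must be enabled for partitions to be skipped:
SET constraint_exclusion = on;

-- A simple, provable predicate: only the matching child is scanned.
EXPLAIN SELECT * FROM teststscell73
 WHERE period >= '2006-10-01' AND period < '2006-10-06';

-- Wrap the column in an expression and the proof fails, so the
-- planner has to scan every child partition:
EXPLAIN SELECT * FROM teststscell73
 WHERE period + interval '0 days' >= '2006-10-01';
```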
>> I'm assuming the partition is defined as being "between '2006-10-01
>> 00:00:00' and '2006-10-06 00:00:00'" - strikes me as an odd
>> interval, and you don't say anywhere.
>
> Data is partitioned on a "4 tables per month" basis
But that's from the start of the 1st to the start of the 6th - five
full days rather than the 7+ you'd expect from four partitions per
month. It also *includes* the start of the 6th (<= rather than <),
which suggests the partitions overlap. What does the definition of
your partition say *exactly*?
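What you'd want is half-open ranges (>= start, < end) so that no
timestamp can fall into two partitions. A hypothetical sketch - table
and column names are guesses, as is the exact week split:

```sql
-- In psql, "\d teststscell73_0610_1" will show your actual CHECK
-- constraint so you can compare.
CREATE TABLE teststscell73_0610_1 (
    CHECK (period >= '2006-10-01' AND period < '2006-10-09')
) INHERITS (teststscell73);

CREATE TABLE teststscell73_0610_2 (
    CHECK (period >= '2006-10-09' AND period < '2006-10-16')
) INHERITS (teststscell73);
```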
--
Richard Huxton
Archonet Ltd
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general