Sim Zacks <sim@xxxxxxxxxxxxxx> writes: > I just upgraded my database server from 8.0.1 to 8.2.4 > Most things went very well, but I have a couple of queries that really slowed down with the new server. The core of the problem seems to be the rowcount misestimation here: > -> Merge Left Join (cost=5816.85..5948.10 rows=1245 width=82) (actual time=1169.837..1220.895 rows=3685 loops=1) > Merge Cond: (("outer".batchid = "inner".refid) AND ("outer".partid = "inner".partid)) > Filter: (COALESCE("inner".commited, false) = false) vs in 8.2 > -> Merge Left Join (cost=3984.15..4111.60 rows=1 width=28) (actual time=319.642..348.285 rows=3685 loops=1) > Merge Cond: ((d.batchid = f.refid) AND (e.partid = f.partid)) > Filter: (NOT COALESCE(f.commited, false)) The single-row estimate causes it to go for nestloops at all the higher join levels, and when the actual result size is 3685 rows, of course it takes 3685 times longer than the planner expected :-( I assume that the original query is something along the lines of d left join f on (...) where coalesce(f.commited, false) = false I traced through what would happen here, and found that: * 8.2 changes the "boolvar = false" clause to "NOT boolvar", because it wants to be able to recognize these equivalent forms as equivalent. 8.0 just leaves it as-is. * 8.0 can't figure out anything about a COALESCE, so it uses the fallback DEFAULT_EQ_SEL (0.005) selectivity estimate for what it sees as an equality clause. This is apparently close enough to be within a factor of 3 of reality. * 8.2 sees a NOT clause, which clause_selectivity() figures has a selectivity of 1.0 minus the selectivity of the argument, which is a COALESCE clause, which the recursive call to clause_selectivity() doesn't know anything about and so punts ... returning 1.0 which is its default for unknown clause types. So we end up with a selectivity estimate of exactly 0.0, pinning the estimated join size to the minimum of 1 row. The default 1.0 selectivity estimate seems fairly silly; in other cases where the planner really has no idea about the expected value of a boolean expression, we use 0.5. On studying the CVS history, it looks like I inserted that in revision 1.24 of clausesel.c, and I think that I must have misinterpreted what was happening in the previous state of the code: static Cost compute_selec(Query *root, List *clauses, List *or_selectivities) { Cost s1 = 0; List *clause = lfirst(clauses); if (clause == NULL) s1 = 1.0; else if (IsA(clause, Param)) { /* XXX How're we handling this before?? -ay */ s1 = 1.0; } else if (IsA(clause, Const)) s1 = ((bool) ((Const *) clause)->constvalue) ? 1.0 : 0.0; else if (IsA(clause, Var)) ... The reason for the first bit is that a NIL list of clauses means no WHERE clause, so 1.0 selectivity is actually right in that case. The 1.0 default for a Param is pretty silly (possibly AY made the same mistake as me at some earlier point). If you look further down in the routine you see 0.5 being used in default cases, but if you don't study the whole thing you might well think it was intending 1.0 as default. So I think we should change the default to 0.5 for 8.3, but seeing that it's been this way for 8 years (!) I'm a bit hesitant to back-patch. Comments anyone? In the meantime, Sim would probably have better luck if he restructured this particular clause in some other way, say where f.commited is not true or where f.commited = false or f.commited is null Note also that he really ought to move up to 8.2.6, as 8.2.4 is not very sane about what IS NULL means for a left join's result. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend