Martijn van Oosterhout wrote:
That's because they're not equivalent. IN/NOT IN have special semantics w.r.t. NULLs that make them a bit more difficult to optimise. OUTER JOINs, on the other hand, are easier, since in a join condition anything = NULL evaluates to NULL, which is treated as FALSE.
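To make the non-equivalence concrete, here is a minimal sketch. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in for PostgreSQL; the tables `a` and `b` are hypothetical, and only the standard SQL NULL semantics are being illustrated, not any planner behaviour.

```python
import sqlite3

# Hypothetical tables: b contains a NULL, which is what breaks the equivalence.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE, so the row is dropped.
not_in = conn.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id"
).fetchall()

# Outer-join form: "a.id = NULL" never matches, so unmatched rows survive.
anti_join = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
    "WHERE b.id IS NULL ORDER BY a.id"
).fetchall()

print(not_in)     # []
print(anti_join)  # [(2,), (3,)]
```

A single NULL in the subquery result is enough to make NOT IN return no rows at all, while the outer-join form still returns the unmatched ids.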
Which is why Hash IN Joins were added, presumably. But there's nothing analogous for NOT IN, I guess; perhaps there can't be.
I think there's been some discussion about teaching the planner about columns that cannot be NULL (like primary keys) thus allowing it to perform this transformation safely. I don't know if anyone has done it though...
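As a rough sketch of when such a transformation would be safe, the following compares NOT IN with the outer-join form when the subquery column is declared NOT NULL. It again assumes SQLite through Python's sqlite3 module rather than PostgreSQL, and the tables and the `compare` helper are hypothetical. It suggests that the outer column has to be NULL-free as well, not just the subquery column.

```python
import sqlite3

NOT_IN = "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id"
ANTI_JOIN = ("SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
             "WHERE b.id IS NULL ORDER BY a.id")

def compare(a_rows):
    """Return (NOT IN result, outer-join result) for the given rows of a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER)")
    # b.id cannot be NULL, as with a primary key.
    conn.execute("CREATE TABLE b (id INTEGER NOT NULL)")
    conn.executemany("INSERT INTO a VALUES (?)", [(v,) for v in a_rows])
    conn.execute("INSERT INTO b VALUES (1)")
    result = (conn.execute(NOT_IN).fetchall(),
              conn.execute(ANTI_JOIN).fetchall())
    conn.close()
    return result

# No NULLs on either side: the two forms agree.
both_not_null = compare([1, 2])

# A NULL in the outer column: "NULL NOT IN (1)" is NULL, so NOT IN drops
# the row, while the outer join keeps it as an unmatched row.
outer_has_null = compare([1, 2, None])

print(both_not_null)
print(outer_has_null)
```

With no NULLs on either side the two queries return the same rows; with a NULL in `a.id` they differ again, so a planner would presumably need to know both columns are non-nullable (or account for the outer NULLs separately).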
Yeah, I've noticed cases where I've thought "Ah, the planner doesn't know that column can't be null". Similarly, it has seemed to me that knowing that a column was UNIQUE could have made for a better plan, although I can't think of any examples off-hand. Maybe it was a case where I saw it using a hash aggregate on a unique column and thought it could just use the index, although that may not make sense either.
- John D. Burger MITRE