Re: NOT IN vs. NOT EXISTS performance

Lincoln Swaine-Moore <lswainemoore@xxxxxxxxx> · Fri, 9 Nov 2018 19:06:15 -0500

Thanks, both! 
That's a very interesting thread. I was confident this was a subject that had been discussed--just wasn't sure where--so thank you for forwarding.

I guess the big-picture summary is that NOT IN's definition introduces complexity (the nature of which I now understand better) that is usually unwarranted by the question the querier is asking. So NOT EXISTS will almost always be preferable when a subquery is involved, unless the behavior around NULL values is specifically desired.

On Fri, Nov 9, 2018 at 8:45 AM Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Thu, Nov 8, 2018 at 3:12 PM David Rowley

<david.rowley@xxxxxxxxxxxxxxx> wrote:

>

> On 9 November 2018 at 08:35, Lincoln Swaine-Moore

> <lswainemoore@xxxxxxxxx> wrote:

> > My primary question is: why is this approach only possible (for data too

> > large for memory) when using NOT EXISTS, and not when using NOT IN?

> >

> > I understand that there is a slight difference in the meaning of the two

> > expressions, in that NOT IN will produce NULL if there are any NULL values

> > in the right hand side (in this case there are none, and the queries should

> > return the same COUNT). But if anything, I would expect that to improve

> > performance of the NOT IN operation, since a single pass through that data

> > should reveal if there are any NULL values, at which point that information

> > could be used to short-circuit. So I am a bit baffled.

>

> The problem is that the planner makes the plan and would have to know

> beforehand that no NULLs could exist on either side of the join.

Yeah, the core issue is the SQL rules that define NOT IN behaves as:

postgres=# select 1 not in (select 2);

 ?column?

──────────

 t

(1 row)

postgres=# select 1 not in (select 2 union all select null);

 ?column?

──────────

(1 row)

There's a certain logic to it but it's a death sentence for performance.

merlin

-- 
Lincoln Swaine-Moore