Re: NOT IN vs. NOT EXISTS performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9 November 2018 at 08:35, Lincoln Swaine-Moore
<lswainemoore@xxxxxxxxx> wrote:
> My primary question is: why is this approach only possible (for data too
> large for memory) when using NOT EXISTS, and not when using NOT IN?
>
> I understand that there is a slight difference in the meaning of the two
> expressions, in that NOT IN will produce NULL if there are any NULL values
> in the right hand side (in this case there are none, and the queries should
> return the same COUNT). But if anything, I would expect that to improve
> performance of the NOT IN operation, since a single pass through that data
> should reveal if there are any NULL values, at which point that information
> could be used to short-circuit. So I am a bit baffled.

The problem is that the planner makes the plan and would have to know
beforehand that no NULLs could exist on either side of the join. For
more simple cases it could make use of NOT NULL constaints, but more
complex cases exist, such as:

SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.y WHERE t2.y NOT IN(SELECT
z FROM t3);

There's a bit more reading about the complexity of this in [1]

[1] https://www.postgresql.org/message-id/flat/CAMkU%3D1zPVbez_HWao781L8PzFk%2Bd1J8VaJuhyjUHaRifk6OcUA%40mail.gmail.com#7c6d3178c18103d8508f3ec5982b1b8e

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux