On 1/7/11 1:29 AM, "??????? ????????" <lebiathan@xxxxxxxxx> wrote: > >So my follow-up question on the subject is this: > >Are there any particular semantics for the "NOT IN" statement that cause >the correlated query to execute for every row of the outter query, as >opposed to the "NOT EXISTS" ? => select * from explode_array(ARRAY[1,2,3,4,5]) where explode_array not in (0, 1, 2); explode_array --------------- 3 4 5 => select * from explode_array(ARRAY[1,2,3,4,5]) where explode_array not in (0, 1, 2, null); explode_array --------------- (0 rows) The existence of a single NULL in the "not in" segment causes no results to be returned. Postgres isn't smart enough to analyze whether the contents of the NOT IN() can contain null and choose a more optimal plan, and so it always scans ALL rows. Even if the NOT IN() is on a not null primary key. NOT IN is generally dangerous because of this behavior -- it results from the fact that '1 = null' is null, and 'not null' is equal to 'null': => select (1 = 1); ?column? ---------- t select NOT (1 = 1); ?column? ---------- f => select (1 = null); ?column? ---------- (1 row) => select NOT (1 = null); ?column? ---------- (1 row) NOT EXISTS doesn't have this problem, since NOT EXISTS essentially treats the existence of null as false, where NOT IN treats the existence of null as true. rr=> select * from (select * from explode_array(ARRAY[1,2,3,4,5])) foo where not exists (select 1 where explode_array in (0, 1, 2, null)); explode_array --------------- 3 4 5 (3 rows) Often, the best query plans result from 'LEFT JOIN WHERE right side is NULL' rather than NOT EXISTS however. I often get performance gains by switching NOT EXISTS queries to LEFT JOIN form. Though sometimes it is less performant. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance