I'm getting some unexpected results with a NOT IN query -- this is on 8.2.5.
This is the query in question:
prod_2=> select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
id
----
(0 rows)
This query returns zero rows, but I expect it to return 1 row, because I know that 111102466 *is not* in (select last_feed_download_task_id from subscription) and I know that 1471701504 *is* in that set, as demonstrated below:
Verify that both id values are in the feed_download_task table:
prod_2=> select id from feed_download_task where id in (111102466,141701504);
id
-----------
141701504
111102466
(2 rows)
Verify that 111102466 is NOT in the set of last_feed_download_task_id's, and that 141701504 is in this set:
prod_2=> select last_feed_download_task_id from subscription where last_feed_download_task_id in (111102466,141701504);
last_feed_download_task_id
----------------------------
141701504
(1 row)
Here's the problem query again, with explain analyze.
prod_2=> select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
id
----
(0 rows)
prod_2=> explain analyze select id from feed_download_task where id in (111102466,141701504) and id not in (select last_feed_download_task_id from subscription);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on feed_download_task (cost=45077.24..45083.27 rows=1 width=4) (actual time=601.229..601.229 rows=0 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
Filter: (NOT (hashed subplan))
-> Bitmap Index Scan on feed_download_task_pkey (cost=0.00..30.52 rows=2 width=0) (actual time=0.095..0.095 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
SubPlan
-> Seq Scan on subscription (cost=0.00..44097.78 rows=379578 width=4) (actual time=0.032..488.193 rows=162365 loops=1)
Total runtime: 601.281 ms
(8 rows)
I've tried re-analyzing and re-indexing the tables involved in this query, but I still left scratching my head.
I am also aware that I can use a left join instead of a NOT IN query -- but in this case I need to use this in a DELETE statement, which eliminates the possibility of the left join (I think).
Here is a version using a left outer join, it returns the expected result:
prod_2=# select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
id
-----------
141701504
(1 row)
Here is the explain analyze output for the above query:
prod_2=# explain analyze select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=31.19..51.69 rows=1 width=4) (actual time=0.158..0.210 rows=1 loops=1)
-> Bitmap Heap Scan on feed_download_task (cost=31.19..37.21 rows=2 width=4) (actual time=0.120..0.134 rows=2 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Bitmap Index Scan on feed_download_task_pkey (cost=0.00..31.19 rows=2 width=0) (actual time=0.102..0.102 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Index Scan using index_subscription_on_last_feed_download_task_id on subscription (cost=0.00..7.23 rows=1 width=4) (actual time=0.036..0.037 rows=0 loops=2)
Index Cond: (subscription.last_feed_download_task_id = feed_download_task.id)
Filter: (id IS NOT NULL)
I feel like I must be missing something obvious.
Thanks in advance for the assistance.
cheers,
Mason
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on feed_download_task (cost=45077.24..45083.27 rows=1 width=4) (actual time=601.229..601.229 rows=0 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
Filter: (NOT (hashed subplan))
-> Bitmap Index Scan on feed_download_task_pkey (cost=0.00..30.52 rows=2 width=0) (actual time=0.095..0.095 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
SubPlan
-> Seq Scan on subscription (cost=0.00..44097.78 rows=379578 width=4) (actual time=0.032..488.193 rows=162365 loops=1)
Total runtime: 601.281 ms
(8 rows)
I've tried re-analyzing and re-indexing the tables involved in this query, but I still left scratching my head.
I am also aware that I can use a left join instead of a NOT IN query -- but in this case I need to use this in a DELETE statement, which eliminates the possibility of the left join (I think).
Here is a version using a left outer join, it returns the expected result:
prod_2=# select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
id
-----------
141701504
(1 row)
Here is the explain analyze output for the above query:
prod_2=# explain analyze select feed_download_task.id from feed_download_task left join subscription on (subscription.last_feed_download_task_id = feed_download_task.id) where feed_download_task.id in (111102466,141701504) and subscription.id IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=31.19..51.69 rows=1 width=4) (actual time=0.158..0.210 rows=1 loops=1)
-> Bitmap Heap Scan on feed_download_task (cost=31.19..37.21 rows=2 width=4) (actual time=0.120..0.134 rows=2 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Bitmap Index Scan on feed_download_task_pkey (cost=0.00..31.19 rows=2 width=0) (actual time=0.102..0.102 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Index Scan using index_subscription_on_last_feed_download_task_id on subscription (cost=0.00..7.23 rows=1 width=4) (actual time=0.036..0.037 rows=0 loops=2)
Index Cond: (subscription.last_feed_download_task_id = feed_download_task.id)
Filter: (id IS NOT NULL)
I feel like I must be missing something obvious.
Thanks in advance for the assistance.
cheers,
Mason