I'm having difficulty understanding what I perceive as an inconsistency in how the postgres parser chooses to use indices. We have a query based on NOT IN against an indexed column that the parser executes sequentially, but when we perform the same query as IN, it uses the index. I've created a simplistic example that I believe demonstrates the issue, notice this first query is sequential CREATE TABLE node ( id SERIAL PRIMARY KEY, vid INTEGER ); CREATE INDEX x ON node(vid); INSERT INTO node(vid) VALUES (1),(2); EXPLAIN ANALYZE SELECT * FROM node WHERE NOT vid IN (1); Seq Scan on node (cost=0.00..36.75 rows=2129 width=8) (actual time=0.009..0.010 rows=1 loops=1) Filter: (vid <> 1) Rows Removed by Filter: 1 Total runtime: 0.025 ms But if we invert the query to IN, you'll notice that it now decided to use the index EXPLAIN ANALYZE SELECT * FROM node WHERE vid IN (2); Bitmap Heap Scan on node (cost=4.34..15.01 rows=11 width=8) (actual time=0.017..0.017 rows=1 loops=1) Recheck Cond: (vid = 1) -> Bitmap Index Scan on x (cost=0.00..4.33 rows=11 width=0) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: (vid = 1) Total runtime: 0.039 ms Can anyone shed any light on this? Specifically, is there a way to re-write out NOT IN to work with the index (when obviously the result set is not as simplistic as just 1 or 2). We are using Postgres 9.2 on CentOS 6.6 |