I created the following table (similar to the example from http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability ):
CREATE TABLE scale_data (
section NUMERIC NOT NULL,
id1 NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
id2 NUMERIC NOT NULL -- a kind of Type
);
Populate it with:
INSERT INTO scale_data
SELECT sections.sections, sections.sections*10000 + gen.gen
, CEIL(RANDOM()*100)
FROM GENERATE_SERIES(1, 300) sections,
GENERATE_SERIES(1, 90000) gen
WHERE gen <= sections * 300;
It generated 13545000 records.
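(The count matches the WHERE clause: each section value s keeps s * 300 rows, so the total is 300 * (1 + 2 + ... + 300) = 300 * 45150 = 13,545,000. A quick sanity check, assuming the table was populated as above:
SELECT COUNT(*) FROM scale_data; -- should return 13545000
)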
Composite index on it:
CREATE INDEX id1_id2_idx
ON public.scale_data
USING btree
(id1, id2);
And select #1:
select id2 from scale_data
where id2 in (50)
order by id1 desc
limit 500
Explain analyze:
"Limit (cost=0.56..1177.67 rows=500 width=11) (actual time=0.046..5.124 rows=500 loops=1)"
" -> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)"
" Index Cond: (id2 = '50'::numeric)"
" Heap Fetches: 0"
"Planning time: 0.103 ms"
"Execution time: 5.177 ms"
Select #2 -- with more values in the IN list, the plan has changed:
select id2 from scale_data
where id2 in (50, 52)
order by id1 desc
limit 500
Explain analyze #2:
"Limit (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)"
" -> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)"
" Filter: (id2 = ANY ('{50,52}'::numeric[]))"
" Rows Removed by Filter: 25030"
" Heap Fetches: 0"
"Planning time: 0.153 ms"
"Execution time: 8.771 ms"
Why do the plans differ? Why does plan #1 show an Index Cond, while plan #2 shows a Filter together with the number of rows it removed? Doesn't query #1 traverse the index in the same way that the plan for query #2 shows?
On the real/production DB, query #2 runs much slower, even though searching by each of the two keys separately is fast.
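By "searching by each key separately" I mean something like the following sketch (each single-key query presumably gets the same fast backward index-only scan as select #1):
-- each of these on its own returns quickly
select id2 from scale_data
where id2 = 50
order by id1 desc
limit 500;

select id2 from scale_data
where id2 = 52
order by id1 desc
limit 500;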
PG 9.5, CentOS 6.7