Hi all,
I'm debugging a performance issue that looks like it might actually be an issue/limitation/parameter/bug in the query planner, but since I couldn't find anything authoritative on when exactly postgresql is able to use partial not null indexes I'm not sure that that's the case and I was hopping someone could give some clarity around that or point to an option I could tweak that would change this behavior. Anyways the table in question (with names changed) is below. I'm running postgres 8.4.1
\d
Column | Type | Modifiers
--------------------------+----------+-----------
id | integer | not null
sid | integer |
bid | integer |
m | date | not null
k | integer |
cc | text |
f | integer |
d | smallint |
u | smallint |
f2 | integer |
cm | text |
Indexes:
"scm_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
"index_scm_on_bid" btree (bid) WITH (fillfactor=100) WHERE bid IS NOT NULL
~35 million rows (about 15 million of which have null bid). There are about 1 million distinct bids (with selectivity ranging from 1 to ~100,000 rows).
The end user selects an arbitrary number of bid's then we run several queries one of which I include explain analyze output below. For <= 100 bids the planner uses the index and completes in ~35ms, for 101+ bid's the planner uses a sequence scan and completes in ~45 seconds (3 orders of magnitude slower). My first thought was that there was a problem with the statistics/estimation in the planner, but using "set enable seq_scan=off;" still does not use the index when there's over 100 bid's in the IN clause. Breaking the IN clause into 2 < 100 element groups does however rescue the use of the index and the fast performance as does creating a new non-partial index on bid (i.e. an index "index_scm_on_bid2" btree (bid) WITH (fillfactor=100) will be used with over 100 bid's). My best guess is that this is do to some limit on the number of in clause elements the query planner will check to see if match a partial index before giving up and assuming it doesn't (if there is such a limit I'd definitely like to make it larger, at least for this big of a table...). Is this 100 limit documented anywhere?
Anyways my workarounds are to either split the IN clause into multiple < 100 element groups or switch the index to be across all values rather then not null. The reason I tend to use not null partial indexes is that I have another similarly sized table with 20 separately indexed columns each of which only has about 10000 non-null rows and the disk space savings from the not null partial index there are huge.
# <= 100 bids
=>explain analyze SELECT * FROM "scm" WHERE ((bid in (1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044))) order by m desc limit 100;
Limit (cost=66518.18..66518.43 rows=100 width=229) (actual time=24.665..24.821 rows=100 loops=1)
-> Sort (cost=66518.18..66563.14 rows=17987 width=229) (actual time=24.658..24.698 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=566.83..65830.73 rows=17987 width=229) (actual time=1.863..12.850 rows=16430 loops=1)
Recheck Cond: (bid = ANY ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid (cost=0.00..562.33 rows=17987 width=0) (actual time=1.719..1.719 rows=16430 loops=1)
Index Cond: (bid = ANY ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
Total runtime: 24.908 ms
# > 100 bids
# NOTE - this is the same even after running
# set enable_seqscan = off;
# HOWEVER - this will use an index scan and run fast if we create a non partial index on bid
=>explain analyze SELECT * FROM "scm" WHERE ((bid in (1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045))) order by m desc limit 100;
Limit (cost=3783824.03..3783824.28 rows=100 width=229) (actual time=31229.140..31229.284 rows=100 loops=1)
-> Sort (cost=3783824.03..3783869.06 rows=18014 width=229) (actual time=31229.137..31229.193 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 82kB
-> Seq Scan on scm (cost=0.00..3783135.55 rows=18014 width=229) (actual time=97.582..31217.270 rows=16433 loops=1)
Filter: (bid = ANY ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044,1208045}'::integer[]))
Total runtime: 31229.377 ms
# Splitting the IN into <= 100 element pieces works
=>explain analyze SELECT * FROM "scm" WHERE ((bid in (1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044))) OR ((bid IN (1208045))) order by m desc limit 100;
Limit (cost=66664.94..66665.19 rows=100 width=229) (actual time=24.749..24.902 rows=100 loops=1)
-> Sort (cost=66664.94..66709.98 rows=18014 width=229) (actual time=24.747..24.795 rows=100 loops=1)
Sort Key: m
Sort Method: top-N heapsort Memory: 85kB
-> Bitmap Heap Scan on scm (cost=577.99..65976.46 rows=18014 width=229) (actual time=1.867..12.888 rows=16433 loops=1)
Recheck Cond: ((bid = ANY ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[])) OR (bid = 1208045))
-> BitmapOr (cost=577.99..577.99 rows=18014 width=0) (actual time=1.720..1.720 rows=0 loops=1)
-> Bitmap Index Scan on index_scm_on_bid (cost=0.00..562.33 rows=17987 width=0) (actual time=1.715..1.715 rows=16430 loops=1)
Index Cond: (bid = ANY ('{1000071082,1558141,1261493,1558137,1558166,1622957,1261535,1558191,1885437,2025548,1558144,1261485,1261536,1261539,1261541,1000067964,1558183,1789348,1000090512,1558150,1000096731,1261533,2056107,1875527,1177541,1535923,1558184,1558155,1261538,1261537,1558140,1159311,1558188,1558185,1261529,1558158,1000021460,1558517,1000090515,1558194,1558143,1558153,1261484,1261542,1558156,1201225,1261481,1558157,1891458,1200735,1285621,1702779,1558135,1261540,1579615,1558189,1558154,2053227,1261531,1261488,1558139,1261527,1558192,1261530,1261528,1159310,1558136,1558138,1558164,1261543,1000015605,2053214,1558187,1183258,1184576,1558145,1558159,1208646,1622955,1558161,1558160,1208046,1000060938,1000067963,1000067965,1261487,1828875,1541699,1261491,1210589,1558162,1558151,1558152,1558163,1181201,1186001,1197776,1200734,1208043,1208044}'::integer[]))
-> Bitmap Index Scan on index_scm_on_bid (cost=0.00..6.66 rows=27 width=0) (actual time=0.003..0.003 rows=3 loops=1)
Index Cond: (bid = 1208045)
Total runtime: 24.998 ms
Thanks!
Tim