I would appreciate if someone explained me how exactly prepared parametrized statements are planned, i.e. what kind of assumptions planner makes on param values, selectivity, expected row count etc. that affect in particular whether indexes will be used or not. For instance consider the following case:
create table t1 ( id serial, val int);
insert into t1 (val)
select trunc(100000*random())
from generate_series(1, 1000000);
create index idx_t1 on t1(val);
analyze t1;
------------------------------
1.
prepare stmt (int) as
select * from t1 where val < $1;
explain execute stmt(100000);
-----------------------
QUERY PLAN
Seq Scan on t1 (cost= 0.00..17401.94 rows=333332 width=8)
Filter: (val < $1)
2.
prepare stmt (int, int) as
select * from t1 where val > $1 and val < $2;
explain execute stmt(20000, 30000);
-----------------------
QUERY PLAN
Bitmap Heap Scan on t1 (cost= 151.74..5307.59 rows=5000 width=8)
Recheck Cond: ((val > $1) AND (val < $2))
-> Bitmap Index Scan on idx_t1 (cost=0.00..150.49 rows=5000 width=0)
Index Cond: ((val > $1) AND (val < $2))
Hmm, why does it expect 5000 rows here? What influences this expectation?
3.
prepare stmt (int) as
select * from t1 where val = $1 or $1 is null;
explain execute stmt(20000);
QUERY PLAN
Seq Scan on t1 (cost=0.00..17401.94 rows=5013 width=8)
Filter: ((val = $1) OR ($1 IS NULL))
That's the weirdest behavior: where did 5013 rows assumption came from? Why use seq scan then? I should mention that planner refuses to use anything but seq scan here even if I explicitly disable it with "set enable_seqscan to off".
In general, I wonder if one could get somewhat predictable planner behavior in such cases since we have a lot of code written in plpgsql and the patterns above are pretty common there.
Thanks,
Viatcheslav