Prepared statement's planning

"Vyacheslav Kalinin" <vka@xxxxxxxx> · Tue, 15 Jan 2008 18:36:14 +0300

Hello,

I would appreciate if someone explained me how exactly
prepared parametrized statements are planned, i.e. what kind of
assumptions planner makes on param values, selectivity, expected row
count etc. that affect in particular whether indexes will be used or
not. For instance consider the following case:

create table t1 ( id serial, val int);

insert into t1 (val)
select trunc(100000*random())
  from generate_series(1, 1000000);

create index idx_t1 on t1(val);

analyze t1;
------------------------------

1.
prepare stmt (int) as
 select * from t1 where val < $1;

explain execute stmt(100000); 

-----------------------
QUERY PLAN
Seq Scan on t1  (cost=
0.00..17401.94 rows=333332 width=8)

 Filter: (val < $1)

2.
prepare stmt (int, int) as
 select * from t1 where val > $1 and val < $2;

explain execute stmt(20000, 30000); 

-----------------------
QUERY PLAN
Bitmap Heap Scan on t1  (cost=
151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

3.
prepare stmt (int) as
 select * from t1 where val = $1 or $1 is null;

explain execute stmt(20000); 

QUERY PLAN

Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's
the weirdest behavior: where did 5013 rows assumption came from? Why
use seq scan then? I should mention that planner refuses to use
anything but seq scan here even if I explicitly disable it with "set
enable_seqscan to off".

In general, I wonder if one could get somewhat predictable
planner behavior in such cases since we have a lot of code written in
plpgsql and the patterns above are pretty common there.

Thanks,
Viatcheslav