Stored proc and optimizer question

Antal Attila <antal.attila@xxxxxxxx> · Wed, 15 Feb 2006 13:57:03 +0100

Hi!

I have a question about the query optimizer and the function scan. See 
the next case:

CREATE TABLE a (id SERIAL PRIMARY KEY, userid INT4, col  TEXT);
CREATE TABLE b (id SERIAL PRIMARY KEY, userid INT4, a_id INT4 REFERENCES 
a (id), col  TEXT);
CREATE INDEX idx_a_uid ON a(userid);
CREATE INDEX idx_b_uid ON b(userid);
CREATE INDEX idx_a_col ON a(col);
CREATE INDEX idx_b_col ON b(col);

First solution:

   CREATE VIEW ab_view AS
       SELECT a.id AS id,
                      a.userid AS userid_a, b.userid AS userid_b,
                      a.col AS col_a, b.col AS col_b
       FROM a LEFT JOIN b ON (a.id = b.a_id);

   EXPLAIN ANALYSE SELECT * FROM ab_view
       WHERE userid_a = 23 AND userid_b = 23 AND col_a LIKE 's%'
       ORDER BY col_b
       LIMIT 10 OFFSET 10;

                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=15.70..15.70 rows=1 width=76) (actual time=0.108..0.108 
rows=0 loops=1)
  ->  Sort  (cost=15.69..15.70 rows=1 width=76) (actual 
time=0.104..0.104 rows=0 loops=1)
        Sort Key: b.col
        ->  Nested Loop  (cost=3.32..15.68 rows=1 width=76) (actual 
time=0.085..0.085 rows=0 loops=1)
              Join Filter: ("outer".id = "inner".a_id)
              ->  Bitmap Heap Scan on a  (cost=2.30..6.13 rows=1 
width=40) (actual time=0.082..0.082 rows=0 loops=1)
                    Recheck Cond: (userid = 23)
                    Filter: (col ~~ 's%'::text)
                    ->  BitmapAnd  (cost=2.30..2.30 rows=1 width=0) 
(actual time=0.077..0.077 rows=0 loops=1)
                          ->  Bitmap Index Scan on idx_a_uid  
(cost=0.00..1.02 rows=6 width=0) (actual time=0.075..0.075 rows=0 loops=1)
                                Index Cond: (userid = 23)
                          ->  Bitmap Index Scan on idx_a_col  
(cost=0.00..1.03 rows=6 width=0) (never executed)
                                Index Cond: ((col >= 's'::text) AND 
(col < 't'::text))
              ->  Bitmap Heap Scan on b  (cost=1.02..9.49 rows=5 
width=40) (never executed)
                    Recheck Cond: (userid = 23)
                    ->  Bitmap Index Scan on idx_b_uid  
(cost=0.00..1.02 rows=5 width=0) (never executed)
                          Index Cond: (userid = 23)
Total runtime: 0.311 ms

In the first solution the query optimizer can work on the view and the 
full execution of the query will be optimal. But I have to use 2 
condition for the userid fields (userid_a = 23 AND userid_b = 23 ). If I 
have to eliminate the duplication I can try to use stored function.

Second solution:
   CREATE FUNCTION ab_select(INT4)  RETURNS setof ab_view AS $$
       SELECT a.id AS id,
                      a.userid AS userid_a, b.userid AS userid_b,
                      a.col AS col_a, b.col AS col_b
       FROM a LEFT JOIN b ON (a.id = b.a_id AND b.userid = $1)
       WHERE a.userid = $1;
   $$ LANGUAGE SQL STABLE;

   EXPLAIN ANALYSE SELECT * FROM ab_select(23)
       WHERE col_a LIKE 's%'
       ORDER BY col_b
       LIMIT 10 OFFSET 10;

                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit  (cost=15.07..15.07 rows=1 width=76) (actual time=1.034..1.034 
rows=0 loops=1)
  ->  Sort  (cost=15.06..15.07 rows=5 width=76) (actual 
time=1.030..1.030 rows=0 loops=1)
        Sort Key: col_b
        ->  Function Scan on ab_select  (cost=0.00..15.00 rows=5 
width=76) (actual time=1.004..1.004 rows=0 loops=1)
              Filter: (col_a ~~ 's%'::text)
Total runtime: 1.103 ms

The second solution have 2 advantage:
 1. The second query is more beautiful and shorter.
 2. You can rewrite easier the stored function without modify the query.

But I have heartache, because the optimizer give up the game. It cannot 
optimize the query globally (inside and outside the stored function) in 
spite of the STABLE keyword. It use function scan on the result of the 
stored function.

How can I eliminate the function scan while I want to keep the advantages?

In my opinion the optimizer cannot replace the function scan with a more 
optimal plan, but this feature may be implemented in the next versions 
of  PostgreSQL. I would like to suggest this.

I built this case theoretically, but I have more stored procedure which 
works with bad performance therefore.

Regards,
Antal Attila