Hi!
I have a question about the query optimizer and the function scan. See
the next case:
CREATE TABLE a (id SERIAL PRIMARY KEY, userid INT4, col TEXT);
CREATE TABLE b (id SERIAL PRIMARY KEY, userid INT4, a_id INT4 REFERENCES
a (id), col TEXT);
CREATE INDEX idx_a_uid ON a(userid);
CREATE INDEX idx_b_uid ON b(userid);
CREATE INDEX idx_a_col ON a(col);
CREATE INDEX idx_b_col ON b(col);
First solution:
CREATE VIEW ab_view AS
SELECT a.id AS id,
a.userid AS userid_a, b.userid AS userid_b,
a.col AS col_a, b.col AS col_b
FROM a LEFT JOIN b ON (a.id = b.a_id);
EXPLAIN ANALYSE SELECT * FROM ab_view
WHERE userid_a = 23 AND userid_b = 23 AND col_a LIKE 's%'
ORDER BY col_b
LIMIT 10 OFFSET 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15.70..15.70 rows=1 width=76) (actual time=0.108..0.108
rows=0 loops=1)
-> Sort (cost=15.69..15.70 rows=1 width=76) (actual
time=0.104..0.104 rows=0 loops=1)
Sort Key: b.col
-> Nested Loop (cost=3.32..15.68 rows=1 width=76) (actual
time=0.085..0.085 rows=0 loops=1)
Join Filter: ("outer".id = "inner".a_id)
-> Bitmap Heap Scan on a (cost=2.30..6.13 rows=1
width=40) (actual time=0.082..0.082 rows=0 loops=1)
Recheck Cond: (userid = 23)
Filter: (col ~~ 's%'::text)
-> BitmapAnd (cost=2.30..2.30 rows=1 width=0)
(actual time=0.077..0.077 rows=0 loops=1)
-> Bitmap Index Scan on idx_a_uid
(cost=0.00..1.02 rows=6 width=0) (actual time=0.075..0.075 rows=0 loops=1)
Index Cond: (userid = 23)
-> Bitmap Index Scan on idx_a_col
(cost=0.00..1.03 rows=6 width=0) (never executed)
Index Cond: ((col >= 's'::text) AND
(col < 't'::text))
-> Bitmap Heap Scan on b (cost=1.02..9.49 rows=5
width=40) (never executed)
Recheck Cond: (userid = 23)
-> Bitmap Index Scan on idx_b_uid
(cost=0.00..1.02 rows=5 width=0) (never executed)
Index Cond: (userid = 23)
Total runtime: 0.311 ms
In the first solution the query optimizer can work on the view and the
full execution of the query will be optimal. But I have to use 2
condition for the userid fields (userid_a = 23 AND userid_b = 23 ). If I
have to eliminate the duplication I can try to use stored function.
Second solution:
CREATE FUNCTION ab_select(INT4) RETURNS setof ab_view AS $$
SELECT a.id AS id,
a.userid AS userid_a, b.userid AS userid_b,
a.col AS col_a, b.col AS col_b
FROM a LEFT JOIN b ON (a.id = b.a_id AND b.userid = $1)
WHERE a.userid = $1;
$$ LANGUAGE SQL STABLE;
EXPLAIN ANALYSE SELECT * FROM ab_select(23)
WHERE col_a LIKE 's%'
ORDER BY col_b
LIMIT 10 OFFSET 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit (cost=15.07..15.07 rows=1 width=76) (actual time=1.034..1.034
rows=0 loops=1)
-> Sort (cost=15.06..15.07 rows=5 width=76) (actual
time=1.030..1.030 rows=0 loops=1)
Sort Key: col_b
-> Function Scan on ab_select (cost=0.00..15.00 rows=5
width=76) (actual time=1.004..1.004 rows=0 loops=1)
Filter: (col_a ~~ 's%'::text)
Total runtime: 1.103 ms
The second solution have 2 advantage:
1. The second query is more beautiful and shorter.
2. You can rewrite easier the stored function without modify the query.
But I have heartache, because the optimizer give up the game. It cannot
optimize the query globally (inside and outside the stored function) in
spite of the STABLE keyword. It use function scan on the result of the
stored function.
How can I eliminate the function scan while I want to keep the advantages?
In my opinion the optimizer cannot replace the function scan with a more
optimal plan, but this feature may be implemented in the next versions
of PostgreSQL. I would like to suggest this.
I built this case theoretically, but I have more stored procedure which
works with bad performance therefore.
Regards,
Antal Attila