Query becomes slow when written as view

Jan Strube <js@xxxxxxxxx> · Thu, 14 Feb 2013 14:23:19 +0100

Hi,

I have the following query which runs reasonably fast under PostgreSQL 
9.1.8:

SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND 
cc.cache_time >= b._last_modified
WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)

Here is the query plan:

Nested Loop Left Join (cost=0.08..16.65 rows=1 width=1053)
Join Filter: (cc.cache_time >= b._last_modified)
-> Nested Loop (cost=0.08..8.67 rows=1 width=644)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)
-> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=0.00..8.57 rows=1 
width=644)
Index Cond: (("ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Index Scan using cached_comments_pkey on cached_comments cc 
(cost=0.00..7.71 rows=1 width=425)
Index Cond: ((b."ISIN")::bpchar = ("ISIN")::bpchar)

When I´m trying to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND 
cc.cache_time >= b._last_modified

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

The query plan now is:

Hash Join (cost=79926.52..906644.87 rows=818684 width=45)
Hash Cond: ((b."ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Hash Left Join (cost=79926.42..884049.08 rows=1637368 width=1053)
Hash Cond: ((b."ISIN")::bpchar = (cc."ISIN")::bpchar)
Join Filter: (cc.cache_time >= b._last_modified)
-> Seq Scan on "Z_BASE" b (cost=0.00..106515.68 rows=1637368 width=644)
-> Hash (cost=74620.41..74620.41 rows=77841 width=425)
-> Seq Scan on cached_comments cc (cost=0.00..74620.41 rows=77841 width=425)
-> Hash (cost=0.09..0.09 rows=1 width=13)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)

By the way I get the same behaviour and query plan when I try this:

SELECT *
FROM (
-- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

We already found out that the problem is the Perl function "get_comment" 
which is very expensive. In the first case the function is called at 
most once, but in the second case it is called many times. I believe 
this is because of the hash join which causes the view to fetch 
everything from dtng."Z_BASE" first?
The question is, how to avoid this? We tried to set the functions cost 
from 100 to 10000000 but that did not help. (Because of the architecture 
of the software that uses this query, we have the constraint that 
structure of the final WHERE clause (WHERE "ISIN" IN (...)) must not be 
altered.)

Thanks a lot for any idea,
Jan

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general