Problem: query becomes slow when calling a fast user defined function.

Dan Libby <dan@xxxxxxxxx> · Mon, 5 Jun 2006 18:03:59 -0600

Hi,

Using postgres 8.0.1, I'm having a problem where a user-defined function 
that executes quite quickly on its own slows down the calling query, 
which ignores at least one index.  I don't think this should be happening.

Please forgive the long explanation below; I'm trying to be clear.

So -- I have a function (marked STABLE) that takes 1-2 ms to execute when called 
via a simple select, eg:

\timing
select * from ascend_tree_breadcrumb( category_by_topic( 'World' ) );
 category_id | parent_category_id | topic | num_sub_items | num_sub_cats
-------------+--------------------+-------+---------------+--------------
           1 |                    | World |             0 |            0
(1 row)

Time: 1.311 ms

As you can see, there are actually 2 functions being called, and the top-level 
function returns a set, containing one row.  In practice, it will never return 
more than 5 rows.

For this very simple example, I can return the same data by calling a table directly:

lyrff=# SELECT * from category c where c.topic = 'World';
 category_id | parent_category_id | topic | num_sub_items | num_sub_cats
-------------+--------------------+-------+---------------+--------------
           1 |                    | World |             0 |            0
(1 row)

Time: 2.660 ms

So far, so good.

But now, when I join the set that is returned by the function with another 
table category_lang, which contains about 40k records, using the primary key 
for category_lang, then things become slow.

lyrff=# SELECT cl.category_id, atb.topic, cl.title
lyrff-# FROM ascend_tree_breadcrumb( category_by_topic( 'World' ) ) atb
lyrff-# inner join category_lang cl on (atb.category_id = cl.category_id and cl.lang_code = 'en');
 category_id | topic | title
-------------+-------+-------
           1 | World | World
(1 row)

Time: 308.822 ms

(Okay, so 308 ms is not a super-long time, but this query is supposed to run
on all pages of a website, so it quickly becomes painful.  And anyway, it's
about 300x where it could/should be.)

So now if I remove the function call and substitute the SQL that 
looks directly in the category table, then things are fast again:

lyrff=# SELECT cl.category_id, c.topic, cl.title
lyrff-# FROM category c
lyrff-# inner join category_lang cl on (c.category_id = cl.category_id and cl.lang_code = 'en')
lyrff-# where
lyrff-#  c.topic = 'World';
 category_id | topic | title
-------------+-------+-------
           1 | World | World
(1 row)

Time: 1.914 ms

So clearly the user-defined function is contributing to the slow-down, even though 
the function itself executes quite quickly.  Here's what explain has to say:

lyrff=# explain analyze
lyrff-# SELECT cl.category_id, atb.topic, cl.title
lyrff-# FROM ascend_tree_breadcrumb( category_by_topic( 'World' ) ) atb
lyrff-# inner join category_lang cl on (atb.category_id = cl.category_id and cl.lang_code = 'en');
                                                             QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1791.79..2854.89 rows=1001 width=532) (actual time=350.935..352.317 rows=1 loops=1)
   Hash Cond: ("outer".category_id = "inner".category_id)
   ->  Function Scan on ascend_tree_breadcrumb atb  (cost=0.00..12.50 rows=1000 width=520) (actual time=0.834..0.835 rows=1 loops=1)
   ->  Hash  (cost=1327.33..1327.33 rows=58986 width=16) (actual time=329.393..329.393 rows=0 loops=1)
         ->  Seq Scan on category_lang cl  (cost=0.00..1327.33 rows=58986 width=16) (actual time=0.036..191.442 rows=40603 loops=1)
               Filter: (lang_code = 'en'::bpchar)
 Total runtime: 352.689 ms

As you can see, it is doing a Sequential Scan on the category_lang table, 
which has 40603 rows now, and will grow.  So that's not good.

Now, let's see the non-function version:

lyrff=# explain analyze
lyrff-# SELECT cl.category_id, c.topic, cl.title
lyrff-# FROM category c
lyrff-# inner join category_lang cl on (c.category_id = cl.category_id and cl.lang_code = 'en')
lyrff-# where
lyrff-#  c.topic = 'World';
                                                              QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..29.23 rows=2 width=72) (actual time=0.104..0.112 rows=1 loops=1)
   ->  Index Scan using category_topic_key on category c  (cost=0.00..9.70 rows=2 width=60) (actual time=0.058..0.060 rows=1 loops=1)
         Index Cond: ((topic)::text = 'World'::text)
   ->  Index Scan using table_lang_pk on category_lang cl  (cost=0.00..9.75 rows=1 width=16) (actual time=0.028..0.032 rows=1 loops=1)
         Index Cond: (("outer".category_id = cl.category_id) AND (cl.lang_code = 'en'::bpchar))
 Total runtime: 0.312 ms

This time, it used the index on category_lang, as it should.

I'm not an expert at reading explain output, so I've probably 
missed something important.

I've tried modifying the query in several ways, eg putting 
the function call in a sub-select, and so on.  I also tried
disabling the various query plans, but in the end I've only 
managed to slow it down even further.

So, I'm hoping someone can tell me what the magical cure is.   
Or failing that, I'd at least like to understand why the planner 
is deciding not to use the category_lang index when the result 
set is coming from a function instead of a "regular" table.

Many thanks in advance.

-- 
Dan Libby