My application has a function, call it "foo()", that requires initialization from a table of about 800 values. Rather than build these values into the C code, it seemed like a good idea to put them in a PG table and create a second function, call it "foo_init()", which is called once for each value, like this:
select foo_init(value) from foo_init_table order by value_id;
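For context, the setup looks roughly like this (the column types and the function body are just placeholders -- the real foo_init() hands each value to foo()'s initialization code in C, and its return value is irrelevant):

create table foo_init_table (
    value_id  integer primary key,
    value     text
);

create function foo_init(text) returns integer as $$
begin
    -- placeholder body: the real version loads the value into foo()'s
    -- internal table; only the side effect matters
    return 1;
end;
$$ language plpgsql;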
This works well, but it requires me to actually retrieve the function's value 800 times. So I thought I'd be clever:
select count(1) from (select foo_init(value) from foo_init_table order by value_id) as foo;
And indeed, count() returns 800, as expected. But my function foo_init() never gets called! Apparently the optimizer figures out that foo_init() must return one value for each row, so it doesn't bother to actually call the function.
db=> explain select count(1) from (select foo_init(value) from foo_init_table order by value_id) as foo;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=69.95..69.95 rows=1 width=0)
   ->  Subquery Scan foo  (cost=0.00..67.93 rows=806 width=0)
         ->  Index Scan using foo_init_table_pkey on foo_init_table  (cost=0.00..59.87 rows=806 width=30)
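In case anyone wants to reproduce this, here is a self-contained test along the same lines (noisy_init is just a made-up stand-in for foo_init; if the planner throws away the inner select list the way it seems to above, the wrapped query produces no notices at all):

create function noisy_init(v text) returns integer as $$
begin
    raise notice 'noisy_init called with %', v;
    return 1;
end;
$$ language plpgsql;

-- direct form: one notice per row
select noisy_init(value) from foo_init_table order by value_id;

-- wrapped form: count() still returns 800, but if the inner select
-- list is discarded, no notices appear
select count(1)
  from (select noisy_init(value) from foo_init_table order by value_id) as foo;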
This doesn't seem right to me -- how can the optimizer possibly know that a function has no side effects? Mine certainly does: functions could do all sorts of things, such as logging activity, filling in other tables, etc.
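As far as I know, the volatility classification is the only hint the planner gets about this sort of thing: a function with side effects is supposed to be marked volatile (which I believe is also the default). For example (the log table and function here are purely hypothetical, just to illustrate the declaration):

create table activity_log (msg text);  -- hypothetical log table

create function log_and_return(v text) returns integer as $$
begin
    insert into activity_log (msg) values ('called with ' || v);  -- the side effect
    return 1;
end;
$$ language plpgsql volatile;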
Am I missing something here?
Thanks,
Craig