Re: Trivial function query optimized badly

"Adam Rich" <adam.r@xxxxxxxxxxxxx> · Wed, 3 Jan 2007 22:17:48 -0600

Craig,
What version of postgres are you using?  I just tested this on PG 8.1.2
and was unable to reproduce these results.  I wrote a simple function
that returns the same text passed to it, after sleeping for 1 second.
I use it in a where clause, like your example below, and regardless of
the number of rows in the table, it still takes roughly 1 second,
indicating to me the function is only called once.

Is it possible that your function really isn't immutable? Would PG 
realize this and fall back to treating it as VOLATILE ?

-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx
[mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Craig A.
James
Sent: Wednesday, January 03, 2007 9:11 PM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject: [PERFORM] Trivial function query optimized badly

Well, once again I'm hosed because there's no way to tell the optimizer
the cost for a user-defined function.  I know this issue has already
been raised (by me!) several times, but I have to remind everyone about
this.  I frequently must rewrite my SQL to work around this problem.

Here is the function definition:

  CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
  AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
  LANGUAGE 'C' STRICT IMMUTABLE;

Here is the bad optimization:

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O',
1);
                                               QUERY PLAN

------------------------------------------------------------------------
--------------------------------
 Seq Scan on version  (cost=0.00..23.41 rows=1 width=4) (actual
time=1434.281..1540.253 rows=1 loops=1)
   Filter: (isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O'::text, 1))
 Total runtime: 1540.347 ms
(3 rows)

I've had to break it up into two separate queries.  Ironically, for
large databases, Postgres does the right thing -- it computes the
function, then uses the index on the "isosmiles" column.  It's blazingly
fast and very satisfactory.  But for small databases, it apparently
decides to recompute the function once per row, making the query N times
slower (N = number of rows) than it should be!

In this instance, there are 1000 rows, and factor of 10^4 is a pretty
dramatic slowdown...  To make it work, I had to call the function
separately then use its result to do the select.

db=> explain analyze select cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);
                                     QUERY PLAN

------------------------------------------------------------------------
------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=1.692..1.694
rows=1 loops=1)
 Total runtime: 1.720 ms
(2 rows)

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O';
                                                         QUERY PLAN

------------------------------------------------------------------------
-----------------------------------------------------
 Index Scan using i_version_isosmiles on version  (cost=0.00..5.80
rows=1 width=4) (actual time=0.114..0.117 rows=1 loops=1)
   Index Cond: (isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'::text)
 Total runtime: 0.158 ms
(3 rows)

Craig

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match