Re: Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

Dmitry Koterov <dmitry@xxxxxxxxxx> · Thu, 3 May 2012 00:30:10 +0400

Albe, thanks for detailed explaination.
But it's quite strange that SQL+STABLE function does not recalculate the plan each time it is called. Because when I use a bunch of SQL+STABLE functions in e.g. a sub-select of a complex query, I see in the plan of this complex queries that function calls are "expanded". It looks like PostgreSQL uses SQL code defined in SQL+STABLE functions and merges (injects) it into the main query instead of the function call.

E.g.:

==========================================================================
CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;

INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002) s;
CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");

ANALYZE a;

CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE LANGUAGE sql 
AS 'SELECT $1.t';

explain analyze
select * from a 

where a_get_t(a) = 't';

QUERY PLAN
Index Scan using a_t_idx on a  (cost=0.00..8.29 rows=2 width=5) (actual time=0.041..0.043 rows=2 loops=1)
  Index Cond: ((t)::text = 't'::text)

==========================================================================

You may see that a_get_t() SQL code was merged into the main query plan, so the result is found without a seqscan with t='t' filtering, but the index is used.

That was a very simple example, I use much more complex SQL+STABLE functions in my code and I am practically sure that this SQL extraction+injection is applied by PostgreSQL in other cases too (e.g. sometimes planner initiates a hash join with tables which are referred inside SQL+STABLE functions in sub-queries). If I replace STABLE with VOLATILE in that complex cases, the effect disappears: no more SQL extraction performed.

And more: assume we have a function f(x, y, z) VOLATILE with very complex SQL inside and we call it like:

select * from f(1, 2, 3);

The query may took a long time (during not only the first call, but during all others too). Then I just replace VOLATILE to STABLE for f(), and the same query:

select * from f(1, 2, 3);

suddenly becomes very fast (e.g. 1000 times faster or even more). It's a very common case: I've performed many times. I thought that it was because of re-planning of STABLE functions on each call according to real passed values...

If STABLE functions has frozen plans too (independent to its real passed arguments values), how could we explain so much difference in performance replacing VOLATILE to STABLE?

On Fri, Apr 27, 2012 at 2:34 PM, Albe Laurenz <laurenz.albe@xxxxxxxxxx> wrote:

Dmitry Koterov wrote:

> For example, I have 2 functions like these:

>

> CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS

... AS

> $body$

>     ...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...

> $body$

> LANGUAGE 'sql'

> STABLE

>

>

> and

>

>

> CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS

... AS

> $body$

> DECLARE

>     res ...;

> BEGIN

>     EXECUTE '...the same SELECT, ' ||

>         'but ' || quote_literal(a) || ' args are embedded, plus ' ||

>         'LIMIT ' || quote_literal($3)

>     INTO res;

>     RETURN res;

> END;

> $body$

> LANGUAGE 'plpgsql'

> STABLE

>

> And then I call

>

> EXPLAIN ANALYZE SELECT * FROM first(...);

> EXPLAIN ANALYZE SELECT * FROM second(...);

>

> Should these two queries be executed by the same time usage (i.e. does

PostgreSQL generate same plans

> for inner queries)?

>

> I always thought that the answer is YES: if a function is STABLE and

with language=SQL, its SQL code

> is embedded into outer context after all arguments are expanded into

their values (so the plan is

> built after argument expansion). But some days ago I detected a case

when second() works about 100

> times faster than first(), and the cause is seems that the planner

does not see all of expanded

> arguments in first() (if I replace arguments to constants in first(),

especially in LIMIT clause, it

> begins to work the same speed as second() does). Unfortunately EXPLAIN

ANALYZE does not go into

> functions and shows only overall time, so I have no real information

about what plan is actually used

> in first().

You can get EXPLAIN plans if you use the auto_explain contrib module

with auto_explain.log_nested_statements enabled.

As you suspect, the two functions work differently.

The SQL function will plan a parameterized statement (with $1 etc. in

it)

and execute that statement whenever it is called, while the PL/pgSQL

function will execute an SQL statement with all the constant literals

in it that gets planned and executed when you call the function.

The SQL function will create a statement that cannot benefit from

optimizations that work only for certain constant values (although

there will be improvements in 9.2 for that).  On the down side,

the PL/pgSQL function will have to plan the query every time it is

executed, which does not come for free.

To illustrate that, an example:

I create a table "test" as follows:

CREATE TABLE test(id integer PRIMARY KEY, val text NOT NULL);

CREATE INDEX test_val_ind ON test(val);

Then I fill it with 1000 rows, 11 of which have val='test'

and ANALYZE the table.

CREATE OR REPLACE FUNCTION first(text, integer) RETURNS integer

   STABLE STRICT LANGUAGE sql AS

   'SELECT id FROM test WHERE val=$1 LIMIT $2';

CREATE OR REPLACE FUNCTION second(IN v text, IN l integer, OUT r

integer)

   STABLE STRICT LANGUAGE plpgsql AS

   $$BEGIN

      EXECUTE 'SELECT id FROM test WHERE val=' || quote_literal(v)

                 || ' LIMIT ' || CAST(l AS integer) INTO r;

   END$$;

Then I execute "SELECT first('test', 1)" and "SELECT SECOND('test', 1)".

The respective plans are:

For the SQL function:

Query Text: SELECT id FROM test WHERE val=$1 LIMIT $2

Limit  (cost=0.00..1.75 rows=50 width=4)

  ->  Seq Scan on test  (cost=0.00..17.50 rows=500 width=4)

        Filter: (val = $1)

For the PL/pgSQL function:

Query Text: SELECT id FROM test WHERE val='test' LIMIT 1

Limit  (cost=0.00..0.95 rows=1 width=4)

  ->  Index Scan using test_val_ind on test  (cost=0.00..10.46 rows=11

width=4)

        Index Cond: (val = 'test'::text)

Yours,

Laurenz Albe

--

Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-general