Re: simple functions, huge overhead, no cache

Pavel Stehule <pavel.stehule@xxxxxxxxx> · Sat, 10 Jul 2010 23:46:43 +0200

Hello

2010/7/10 Tom Lane <tgl@xxxxxxxxxxxxx>:
> Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> writes:
>> Your problem boils down to the fact that PL/PgSQL function calls are
>> expensive. PL/PgSQL is good for complex work, but bad for wrapping up
>> simple expressions because setting up / tearing down the function call
>> context is so expensive.
>
> Yes.  What's actually the most expensive part is invoking the main
> executor to process SQL expressions --- that's approximately the same
> cost as starting up/shutting down an entire SQL query.  plpgsql has some
> optimizations to avoid some of that cost when evaluating a "simple"
> expression, but both of the examples here defeat that optimization:
> one because it's an EXISTS() subquery, and the other because of the
> entirely-unnecessary use of a sub-SELECT.

any using a non simple expression is very slow - so there can be some
a warning when people use it. Sometimes people don't know (me too),
when use expensive expression

for example

rowvar := (10,20)

it isn't simple - I am not sure, if it is true still. Maybe we can
have a two GUC for plpgsql - like current parser's plpgsql GUC
variables:

plpgsql.validate_expressions
plpgsql.show_performance_warnings

second GUC can show warning when
a) expression isn't simple
b) expression result isn't binary compatible with variable
c) plpgsql function can be simply replaced by SQL function

c is maybe controversal - SQL/PSM stimulates users to write simple one
statement functions.

Can we inline simple PL/pgSQL functions?

CREATE OR REPLACE FUNCTION foo(a int, b int)
RETURNS int AS $$
BEGIN
  RETURN a + b;
END;
$$ LANGUAGE plpgsql;

Some first idea is to add column "inlined_code" - validator can
returns this value - for this function just SELECT $1 + $2; and
executor doesn't run expensive plpgsql code, but just inlined_code.
Maybe this idea is too simple. This is real problem - I can see a
similar code every month.

Regards

Pavel Stehule

>
> To illustrate the point I did a bit of experimentation using CVS HEAD.
> This is an assert-enabled build so it will have more overhead than a
> production build ... you might care to repeat the tests on your own
> installation.
>
> Base case (to measure the overhead of the test query):
>
> regression=# explain analyze verbose select (x::float8) from generate_series(1,100000) x;
>                                                               QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..12.50 rows=1000 width=4) (actual time=252.595..980.359 rows=100000 loops=1)
>   Output: (x)::double precision
>  Total runtime: 1495.410 ms
> (3 rows)
>
> Add the built-in cos() function to that:
>
> regression=# explain analyze verbose select cos(x::float8) from generate_series(1,100000) x;
>                                                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..15.00 rows=1000 width=4) (actual time=242.764..1140.608 rows=100000 loops=1)
>   Output: cos((x)::double precision)
>  Total runtime: 1652.968 ms
> (3 rows)
>
> So cos() added about 150 ms to the test, or 1.5 microsec per execution.
> Now let's wrap the cos() call in a plpgsql function:
>
> regression=# create function plpgsql_cosine(float8) returns float8 as '
> begin return cos($1); end' language plpgsql;
> CREATE FUNCTION
> regression=# explain analyze verbose select plpgsql_cosine(x::float8) from generate_series(1,100000) x;
>                                                                QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..262.50 rows=1000 width=4) (actual time=254.050..5719.287 rows=100000 loops=1)
>   Output: plpgsql_cosine((x)::double precision)
>  Total runtime: 6296.207 ms
> (3 rows)
>
> The wrapper added about 4650 ms.  If we're incautious enough to make
> the RETURN's expression look not-simple:
>
> regression=# create function plpgsql_cosine2(float8) returns float8 as '
> begin return (select cos($1)); end' language plpgsql;
> CREATE FUNCTION
> regression=# explain analyze verbose select plpgsql_cosine2(x::float8) from generate_series(1,100000) x;
>                                                                QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..262.50 rows=1000 width=4) (actual time=244.735..15557.516 rows=100000 loops=1)
>   Output: plpgsql_cosine2((x)::double precision)
>  Total runtime: 16092.261 ms
> (3 rows)
>
> ... the wrapper's overhead balloons to 14440 ms.
>
>> For such simple expressions, you should use 'SQL' functions. These can
>> often be inlined to allow the query planner to avoid call overheads
>> entirely, and are WAY cheaper even if they can't be inlined. They're
>> less flexible, but much faster.
>
> I think the performance benefit of SQL functions is real only if they
> can be inlined.  Continuing the example:
>
> regression=# create function sql_cosine(float8) returns float8 as '
> regression'# select cos($1)' language sql;
> CREATE FUNCTION
> regression=# explain analyze verbose select sql_cosine(x::float8) from generate_series(1,100000) x;
>                                                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..15.00 rows=1000 width=4) (actual time=243.677..1140.872 rows=100000 loops=1)
>   Output: cos((x)::double precision)
>  Total runtime: 1653.346 ms
> (3 rows)
>
> The runtime is indistinguishable from the bare cos() call, which is not
> surprising since you can see from the EXPLAIN output that what was
> getting executed was just the bare cos() call --- the SQL function was
> inlined.  If I do something to prevent inlining, it's in the same
> ballpark as plpgsql:
>
> regression=# create function sql_cosine2(float8) returns float8 as '
> select cos($1) limit 1' language sql;
> CREATE FUNCTION
> regression=# explain analyze verbose select sql_cosine2(x::float8) from generate_series(1,100000) x;
>                                                                QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Function Scan on pg_catalog.generate_series x  (cost=0.00..262.50 rows=1000 width=4) (actual time=242.401..8647.405 rows=100000 loops=1)
>   Output: sql_cosine2((x)::double precision)
>  Total runtime: 9166.177 ms
> (3 rows)
>
> The SQL-function executor is fairly stupid and hasn't had nearly the
> level of optimization effort put into it that plpgsql has over the
> years.  This test case doesn't really show the effects of that, but
> for example plpgsql can remember plans across queries whereas a SQL
> function will not.  So if you're focused on performance, use SQL
> functions when they can be inlined (and use EXPLAIN to *verify*
> that they're getting inlined) ... but otherwise you're probably as
> well or better off with plpgsql.
>
> Oh, and one more thing: there is no "function cache".
>
>                        regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general