Hello 2010/7/10 Tom Lane <tgl@xxxxxxxxxxxxx>: > Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> writes: >> Your problem boils down to the fact that PL/PgSQL function calls are >> expensive. PL/PgSQL is good for complex work, but bad for wrapping up >> simple expressions because setting up / tearing down the function call >> context is so expensive. > > Yes. What's actually the most expensive part is invoking the main > executor to process SQL expressions --- that's approximately the same > cost as starting up/shutting down an entire SQL query. plpgsql has some > optimizations to avoid some of that cost when evaluating a "simple" > expression, but both of the examples here defeat that optimization: > one because it's an EXISTS() subquery, and the other because of the > entirely-unnecessary use of a sub-SELECT. any using a non simple expression is very slow - so there can be some a warning when people use it. Sometimes people don't know (me too), when use expensive expression for example rowvar := (10,20) it isn't simple - I am not sure, if it is true still. Maybe we can have a two GUC for plpgsql - like current parser's plpgsql GUC variables: plpgsql.validate_expressions plpgsql.show_performance_warnings second GUC can show warning when a) expression isn't simple b) expression result isn't binary compatible with variable c) plpgsql function can be simply replaced by SQL function c is maybe controversal - SQL/PSM stimulates users to write simple one statement functions. Can we inline simple PL/pgSQL functions? CREATE OR REPLACE FUNCTION foo(a int, b int) RETURNS int AS $$ BEGIN RETURN a + b; END; $$ LANGUAGE plpgsql; Some first idea is to add column "inlined_code" - validator can returns this value - for this function just SELECT $1 + $2; and executor doesn't run expensive plpgsql code, but just inlined_code. Maybe this idea is too simple. This is real problem - I can see a similar code every month. Regards Pavel Stehule > > To illustrate the point I did a bit of experimentation using CVS HEAD. > This is an assert-enabled build so it will have more overhead than a > production build ... you might care to repeat the tests on your own > installation. > > Base case (to measure the overhead of the test query): > > regression=# explain analyze verbose select (x::float8) from generate_series(1,100000) x; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------- > Function Scan on pg_catalog.generate_series x (cost=0.00..12.50 rows=1000 width=4) (actual time=252.595..980.359 rows=100000 loops=1) > Output: (x)::double precision > Total runtime: 1495.410 ms > (3 rows) > > Add the built-in cos() function to that: > > regression=# explain analyze verbose select cos(x::float8) from generate_series(1,100000) x; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------- > Function Scan on pg_catalog.generate_series x (cost=0.00..15.00 rows=1000 width=4) (actual time=242.764..1140.608 rows=100000 loops=1) > Output: cos((x)::double precision) > Total runtime: 1652.968 ms > (3 rows) > > So cos() added about 150 ms to the test, or 1.5 microsec per execution. > Now let's wrap the cos() call in a plpgsql function: > > regression=# create function plpgsql_cosine(float8) returns float8 as ' > begin return cos($1); end' language plpgsql; > CREATE FUNCTION > regression=# explain analyze verbose select plpgsql_cosine(x::float8) from generate_series(1,100000) x; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------ > Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=254.050..5719.287 rows=100000 loops=1) > Output: plpgsql_cosine((x)::double precision) > Total runtime: 6296.207 ms > (3 rows) > > The wrapper added about 4650 ms. If we're incautious enough to make > the RETURN's expression look not-simple: > > regression=# create function plpgsql_cosine2(float8) returns float8 as ' > begin return (select cos($1)); end' language plpgsql; > CREATE FUNCTION > regression=# explain analyze verbose select plpgsql_cosine2(x::float8) from generate_series(1,100000) x; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------- > Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=244.735..15557.516 rows=100000 loops=1) > Output: plpgsql_cosine2((x)::double precision) > Total runtime: 16092.261 ms > (3 rows) > > ... the wrapper's overhead balloons to 14440 ms. > >> For such simple expressions, you should use 'SQL' functions. These can >> often be inlined to allow the query planner to avoid call overheads >> entirely, and are WAY cheaper even if they can't be inlined. They're >> less flexible, but much faster. > > I think the performance benefit of SQL functions is real only if they > can be inlined. Continuing the example: > > regression=# create function sql_cosine(float8) returns float8 as ' > regression'# select cos($1)' language sql; > CREATE FUNCTION > regression=# explain analyze verbose select sql_cosine(x::float8) from generate_series(1,100000) x; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------- > Function Scan on pg_catalog.generate_series x (cost=0.00..15.00 rows=1000 width=4) (actual time=243.677..1140.872 rows=100000 loops=1) > Output: cos((x)::double precision) > Total runtime: 1653.346 ms > (3 rows) > > The runtime is indistinguishable from the bare cos() call, which is not > surprising since you can see from the EXPLAIN output that what was > getting executed was just the bare cos() call --- the SQL function was > inlined. If I do something to prevent inlining, it's in the same > ballpark as plpgsql: > > regression=# create function sql_cosine2(float8) returns float8 as ' > select cos($1) limit 1' language sql; > CREATE FUNCTION > regression=# explain analyze verbose select sql_cosine2(x::float8) from generate_series(1,100000) x; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------ > Function Scan on pg_catalog.generate_series x (cost=0.00..262.50 rows=1000 width=4) (actual time=242.401..8647.405 rows=100000 loops=1) > Output: sql_cosine2((x)::double precision) > Total runtime: 9166.177 ms > (3 rows) > > The SQL-function executor is fairly stupid and hasn't had nearly the > level of optimization effort put into it that plpgsql has over the > years. This test case doesn't really show the effects of that, but > for example plpgsql can remember plans across queries whereas a SQL > function will not. So if you're focused on performance, use SQL > functions when they can be inlined (and use EXPLAIN to *verify* > that they're getting inlined) ... but otherwise you're probably as > well or better off with plpgsql. > > Oh, and one more thing: there is no "function cache". > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general