I've run into a few performance problems with an internal database
application, and I'm sure they're related to my misunderstanding of
how and when queries inside a plpgsql function are planned. This is
against Postgres 7.4.
For example, suppose this function is defined as a per-statement
update trigger on "some_table":
CREATE OR REPLACE FUNCTION foo() RETURNS TRIGGER LANGUAGE 'plpgsql' AS
'DECLARE
    my_id int8;
BEGIN
    my_id := 12;  -- some arbitrary value
    UPDATE some_other_table SET field = ''value'' WHERE id = my_id::int8;
    RETURN NULL;
END;';
The above function is (obviously) a stripped-down version of
something real, but it accurately represents the basics.
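For completeness, the trigger itself is attached roughly like this
(the trigger name here is made up; the relevant part is that it's a
per-statement AFTER UPDATE trigger):

CREATE TRIGGER foo_trigger            -- hypothetical name
    AFTER UPDATE ON some_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE foo();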
When is the UPDATE statement inside foo() planned? When the trigger
is first created, when the function is first called in each backend,
or every time it's called? It's got to be one of the first two,
because it sure isn't the last.
I don't know what plan is being generated, but it's got to be a
sequential scan. "some_other_table" contains roughly 2 million rows
and the "id" column is uniquely indexed, yet the UPDATE takes 35-40
seconds when run via the trigger but only milliseconds when the
equivalent UPDATE statement is run directly in psql. The database is
freshly vacuumed and analyzed, and of course an EXPLAIN in psql
shows an index scan.
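For reference, the by-hand version I'm comparing against is just the
same statement with the literal substituted in:

EXPLAIN
UPDATE some_other_table SET field = 'value' WHERE id = 12::int8;
-- this one shows an index scan on the unique "id" index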
If I turn on statement logging, I can plainly see that the embedded
UPDATE statement is the query taking a long time. Additionally, a
gdb stack trace of the backend shows Postgres is somewhere inside
the plpgsql call handler.
Changing the UPDATE to:

EXECUTE ''UPDATE some_other_table SET field = ''''value''''
    WHERE id = '' || my_id || ''::int8'';

seems to "solve" the bad planning problem, but this sure is ugly.
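For what it's worth, quote_literal() makes the value part slightly
less painful, though the statement still has to be assembled as a
string (this is how it reads inside the single-quoted function body):

EXECUTE ''UPDATE some_other_table SET field = ''
    || quote_literal(''value'')    -- quotes the value for me
    || '' WHERE id = '' || my_id || ''::int8'';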
Maybe I missed it in the 7.4 docs, but I can't find any information
on query planning for plpgsql functions. Any insight into how this
works would be greatly appreciated, and any notes on how PG 8.1.3
differs in this regard would be handy too.
thanks!
eric