I have previously discussed my very long PL/pgSQL stored procedure on this list. This time, without getting into too many details, I have another performance-related question.
The procedure currently uses cursors to return multiple result sets to the program executing the procedure. Basically, I do this:
BEGIN;
SELECT * FROM stored_proc();
FETCH ALL FROM cursor1;
FETCH ALL FROM cursor2;
FETCH ALL FROM cursor3;
etc.
COMMIT;
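For reference, the cursors are returned to the client in the usual way: the function returns SETOF refcursor and opens cursors with fixed names. Roughly like this (the table names and queries here are just placeholders):

CREATE OR REPLACE FUNCTION stored_proc() RETURNS SETOF refcursor AS $$
DECLARE
    cursor1 refcursor := 'cursor1';  -- fixed names so the client can FETCH from them
    cursor2 refcursor := 'cursor2';
BEGIN
    OPEN cursor1 FOR SELECT * FROM some_table;     -- placeholder query
    RETURN NEXT cursor1;
    OPEN cursor2 FOR SELECT * FROM another_table;  -- placeholder query
    RETURN NEXT cursor2;
    -- ...and so on for the remaining cursors
END;
$$ LANGUAGE plpgsql;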
However, there are some cases in the stored procedure where some of the result sets returned by these cursors are also needed as inputs to additional queries. To use them, I am currently doing:
FOR temp IN cursorX LOOP
    result_array := array_append(result_array, temp);  -- push each row onto the end of the array
END LOOP;
-- the FOR loop closes cursorX when it exits, so re-open it and reposition it for later use
OPEN cursorX;
MOVE FIRST FROM cursorX;
Then, when I need to use the results in a query, I do something like:
SELECT * FROM table1 INNER JOIN (SELECT * FROM unnest(result_array)) AS table2 ON ( blah blah ) WHERE blah
This seems extremely inefficient to me. First, I'm not sure what the penalty is for unnesting an array into a SETOF record. Second, the set of records returned from unnesting would not be indexed for the join, which means a sequential scan. Third, building the array in the first place with repeated array_append calls seems extremely inefficient. Fourth, opening the cursor twice looks like it would execute the query twice, though given the placement and context, the result is hopefully cached somewhere. I'm sure there are other things I'm overlooking as well.
Instead of doing things this way, I think temporary tables are really what I want. Rather than all this cursor BS, I could do something like:
CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
);
ALTER TABLE table2 ADD PRIMARY KEY (id);
CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
ANALYZE table2;
Then, when I need to use the results in another query, I could do:
SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah
This would use the indexes and the primary key appropriately. I could also make sure the rows in the temporary table are ordered in a way that helps the later joins, WHERE clauses, and ORDER BY clauses (I sketch what I mean by that right after the block below). Finally, to get the results into my application, I would then do:
BEGIN;
SELECT * FROM stored_procedure();
SELECT * FROM temp_table1;
SELECT * FROM temp_table2;
SELECT * FROM temp_table3;
etc.
COMMIT;
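For the ordering point above, I just mean sorting the rows as the temporary table is built, using the same placeholder names as before:

CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
    SELECT * FROM blah blah blah  -- whatever the cursor is defined as doing
    ORDER BY blah                 -- the same column(s) used by the later joins / ORDER BY clauses
);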
However, this is a fairly major re-write of how things are done. Before I spend the time on it, can anyone share some insight on where and how I might expect to gain performance from this conversion, and also speak to the overhead (if any) of using temporary tables like this: building them, creating indexes on them, analyzing them, and then dropping them on commit?

It is worth mentioning that the data stored in these temporary tables is probably <1,000 records for all tables involved; most will probably be <100 records. Some of these temporary tables will be joined to other tables up to 4 more times throughout the rest of the stored procedure. Most will be generated and then retrieved only from outside the stored procedure. Obviously, I would not create indexes on or analyze the temporary tables that are only retrieved from outside the stored procedure; indexes and primary keys will only be created on the tables that are joined to other tables and have WHERE conditions applied to them.
I have done a lot of Googling on temporary tables and cursors in PostgreSQL, but I have found only very limited discussion of the performance differences for the way I'm planning to use them. I am also unsure about the quality of what I did find, since most of it is 4+ years out of date and posted on various expert exchanges rather than on this pgsql-performance list.
One final question:
In this conversion to temporary table use, there are a couple of cases where I would prefer to do something like:
PREPARE blah(blah blah) AS SELECT blah;
Then, I want to call this prepared statement multiple times, passing a different argument value each time. The only reason to do this would be to save writing code and to ensure that updating the SELECT statement in one place covers all the places where it is used. However, I am concerned it might incur a performance hit by re-preparing the query, since I assume that having the query inside the PL/pgSQL procedure means it is already prepared once. Can anyone speak to this? I know I could put it in a separate stored procedure, but then the question becomes: does that add extra overhead? In other words, is it similar to the difference between an inlined function and a non-inlined function in C?
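To make that concrete, the separate-function alternative I have in mind would look something like this (the function name, signature, and query are just placeholders):

CREATE OR REPLACE FUNCTION lookup_blah(p_id integer)
RETURNS SETOF some_table AS $$
    -- the shared SELECT lives in one place; I'm not sure whether the planner
    -- would inline a simple SQL function like this into the calling query
    -- or treat it as a separate call each time
    SELECT * FROM some_table WHERE some_column = p_id;
$$ LANGUAGE sql STABLE;

-- and then, inside the main PL/pgSQL procedure, something like:
-- SELECT * FROM table1 INNER JOIN lookup_blah(42) AS table2 ON ( blah blah ) WHERE blah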
I would greatly appreciate any insights to these questions/issues.
Thanks in advance for any assistance anyone can provide.
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero