On Sat, 20 Jun 2009 12:30:42 +0200 Alban Hertroys <dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> You could add a column to your query as a placeholder for the
> computed value. For example, SELECT *, 0 AS computed_value FROM table.
>
> If you use a scrollable cursor (possible in PL/pgSQL these days,
> although it still has some limitations) you could just loop through
> its results and rewind the cursor until you're done.
>
> It does look like you're implementing some kind of aggregate function
> though. Maybe you can implement it as an actual aggregate function,
> maybe in combination with the windowing stuff in 8.4? Then you could
> just add the aggregate to your query instead of using your query
> results in a function. That's about the inverse of what you're
> attempting now.
>
> You seem to be implementing something that would look like
> SELECT sum(SELECT * FROM table), while you may be better off aiming
> for SELECT SUM(value) FROM table.
>
> Considering you want to loop over the results multiple times, that
> may not be possible; only you can tell.
>
> > Is it really worth it to load the whole record set into an array,
> > loop over it, etc., instead of e.g. building a temp table with the
> > same structure as the input record set plus one field, looping over
> > the table, etc.? And what about all the UPDATEs involved to change
> > field N+1 of the temp table? Will they be expensive?
>
> Neither of those looks like a convenient approach. With the array you
> lose all the meta-data the record set provided (types, column names,
> etc), while the temp table approach looks like it will be kind of
> inefficient.
>
> Is the result of that function volatile, or can you maybe store it
> after computing it once and fire a trigger to update the computed
> value if the underlying data changes?

The function is volatile. What I was thinking about was to fire a
trigger that wipes the temp table whenever the table on which the
computation is based changes.
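A cache-wiping trigger along those lines might be sketched as below. All table and function names here are made up for illustration; the idea is simply to truncate the cached result set whenever the base table changes, so the next call recomputes it:

```sql
-- Minimal sketch (hypothetical names): invalidate the cached record set
-- whenever the underlying table is modified.
CREATE OR REPLACE FUNCTION wipe_computed_cache() RETURNS trigger AS $$
BEGIN
    TRUNCATE computed_cache;   -- hypothetical cache table
    RETURN NULL;               -- return value is ignored in AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER wipe_computed_cache_trg
    AFTER INSERT OR UPDATE OR DELETE ON base_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE wipe_computed_cache();
```

A statement-level trigger is enough here, since the whole cache is wiped regardless of which rows changed.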
Considering I can't make the loop run in parallel, aggregates may be a
way to go... but somehow they look like they would make the
implementation a bit hard to manage. Anyway, it remains to be seen
whether I could actually implement the loops with aggregates, since
every loop has two "side effects": it computes some aggregates for the
whole record set, and it computes an extra field for each row.

> I wonder why you need to re-iterate over your result set multiple
> times? It's kind of rare that once isn't enough.
>
> And lastly, if your current approach really is the only way to
> compute what you're after, then maybe PL/pgSQL isn't the right match
> for the problem; it looks like you'd be better served by a

Yeah. I had a look at Python, but I don't want to add one more
language to the mix. I enjoy the strict type checking of PL/pgSQL,
even if some bit of syntactic sugar would help to make it more
pleasant, and I think it is the most lightweight of the available
options. Still, I don't know how easy it would be with e.g. Python to
load an array with a result set, change it, and put it back into the
table it came from.

> language that can work with arrays of typed structures. As I'm not
> familiar with the other PL languages I can't tell whether they would
> be suitable in that respect, but I suspect Python or Java would be
> able to handle this better.

Your suggestion about cursors could be the way... but I don't know
enough about cursor internals to understand whether updating a field
through a cursor will cause disk writes.

Currently my main concern is making this thing readable and
extensible. I'm not planning to optimise yet. The record set on which
the computations are made is pretty small; it is just taken out of a
large one. But I don't want to cut off my path to optimisation.

So one way could be: select the record set into a temp table, and
build a trigger that wipes the temp table if the tables on which the
record set is built change.
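The scrollable-cursor idea mentioned above could be sketched roughly like this in PL/pgSQL (table and column names are hypothetical, and the "computation" is reduced to a plain sum just to show the rewind-and-loop-again mechanics):

```sql
-- Minimal sketch (hypothetical names): iterate twice over the same
-- result set using a scrollable cursor, rewinding between passes.
CREATE OR REPLACE FUNCTION two_pass_total() RETURNS numeric AS $$
DECLARE
    cur SCROLL CURSOR FOR SELECT amount FROM items;  -- hypothetical table
    rec record;
    total numeric := 0;
    pass int;
BEGIN
    OPEN cur;
    FOR pass IN 1..2 LOOP
        LOOP
            FETCH cur INTO rec;
            EXIT WHEN NOT FOUND;
            -- first pass could compute aggregates, the second the
            -- per-row extra field; here both just accumulate
            total := total + rec.amount;
        END LOOP;
        MOVE ABSOLUTE 0 FROM cur;  -- rewind before the next pass
    END LOOP;
    CLOSE cur;
    RETURN total;
END;
$$ LANGUAGE plpgsql;
```

The cursor must be declared SCROLL for the rewind (MOVE ABSOLUTE 0) to work; whether each pass stays cheap depends on whether the result set fits in memory.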
This may not even be necessary, since everything related to that
record set is going to happen in one session.

Then loop several times over the temp table. Since every loop should
actually correspond to a function, I wonder if I could build the
cursor and pass it along to several functions. Each function would
MOVE FIRST, update the rows of the record set one at a time, and
return some computed fields that will be used by the next function,
and so on.

If this is possible, this approach has the advantage of being
extensible. I wonder if it is efficient, though, since I don't know
whether an UPDATE ... WHERE CURRENT OF cursor on a temp table will
incur any disk writes.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general