On Wed, Aug 8, 2012 at 2:41 PM, Geert Mak <pobox@xxxxxxxxxxxxx> wrote:
> hello everybody,
>
> we are trying to move the data from table1 into table2 using a plpgsql stored procedure which performs a simple data conversion
>
> there are about 50 million rows
>
> the tables are relatively simple, less than a dozen columns, most are integer, a couple are char(32) and one is a varchar holding URLs
>
> what happens is that when we execute the stored procedure, the execution eats up the 17 GB of free space on the server and the server crashes
>
> if we apply a limit of 1 million rows, the execution completes successfully in about a minute
>
> we understand, by now, that plpgsql functions are executed in their own transaction, which commits when the function comes to an end
>
> the question is -
>
> are plpgsql functions suitable for such massive operations on data, and can this work without eating so much disk space for something which should simply be "read-change-write, read-change-write, read-change-write, …"? i.e. is there any way to force a commit inside, or something similar?
>
> or should we rather implement this operation in some external scripting language? (speed is not that important, this is a one-time conversion)

What is the general structure of the procedure? In particular, how are you browsing and updating the rows?

There is (almost) no way to force a commit inside a function -- there has been some discussion about a stored procedure and/or autonomous transaction feature that would get us there. I say 'almost' because you can emulate some aspects of autonomous transactions with dblink, but that may not be a very good fit for your particular case.

merlin
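For the record, a minimal sketch of the dblink batching idea -- assuming a hypothetical integer key "id" on table1, made-up target columns (url_md5, url), a placeholder connection string, and md5(url) standing in for the real conversion, since the actual function wasn't posted:

-- requires the dblink contrib module (CREATE EXTENSION dblink; on 9.1+)
DO $$
DECLARE
  batch bigint := 1000000;   -- rows committed per round trip
  lo    bigint := 0;
  hi    bigint;
BEGIN
  SELECT max(id) INTO hi FROM table1;             -- "id" is an assumed integer key
  PERFORM dblink_connect('conv', 'dbname=mydb');  -- connection string is a placeholder
  WHILE lo < hi LOOP
    -- each dblink_exec runs and commits in the remote session on its own,
    -- so at most one batch worth of work is pending at any time
    PERFORM dblink_exec('conv',
      'INSERT INTO table2 (id, url_md5, url) '
      || 'SELECT id, md5(url), url FROM table1 '  -- stand-in for the real conversion
      || 'WHERE id > ' || lo || ' AND id <= ' || (lo + batch));
    lo := lo + batch;
  END LOOP;
  PERFORM dblink_disconnect('conv');
END $$;

If the outer DO block dies partway through, the batches already pushed through dblink stay committed -- which is both the point of the autonomous-transaction emulation and the cleanup burden that comes with it. An external script issuing the same batched INSERT ... SELECT statements over an ordinary connection would behave the same way.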