Once per quarter, we need to load a lot of data, which causes many updates across the database. We have an online transaction processing-style application, which we really want to stay up during the update job. The programmer coded a stored procedure that does the job well … logically. But as a single PL/pgSQL stored procedure, it is one long-running transaction. At least, that is my interpretation of http://www.postgresql.org/docs/8.0/interactive/plpgsql-porting.html#CO.PLPGSQL-PORTING-COMMIT – and in fact, we do get errors when we try little BEGIN-COMMIT blocks inside a stored procedure.

A single long-running transaction would be bad in production: a long run time is OK, but a long-running transaction means a site outage. So I'm asking for advice on whether I can break this into small transactions without too much of a rewrite.
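For concreteness, here is a minimal sketch (function and table names made up) of the sort of thing we tried – committing per row inside the function. On 8.0 the COMMIT is rejected at runtime with an error along the lines of "cannot begin/end transactions in PL/pgSQL":

    CREATE FUNCTION load_quarterly() RETURNS void AS $$
    DECLARE
        rec RECORD;
    BEGIN
        FOR rec IN SELECT * FROM load_table LOOP
            -- ... inserts and/or updates to many tables ...
            COMMIT;  -- fails: the function runs inside the caller's transaction
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;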
Roughly, the algorithm is:

(1) One job dumps the data from the external source into a load table (see the sketch after this list).

(2) Another job calls the stored procedure, which uses a cursor to traverse the load table. The loop, for each record:

    a. Processes a lot of special cases, with inserts and/or updates to many tables.
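Step (1) is the easy part – something like this (the path and CSV format here are hypothetical; 8.0's COPY supports the CSV option):

    COPY load_table FROM '/path/to/quarterly_extract.csv' WITH CSV;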
Unless this can be done within PL/pgSQL, I will have the programmer refactor job (2) so that the loop is in a Java program, while the "normalization" logic in (a) – the guts of the loop – remains in a smaller stored procedure. The Java loop will call that stored procedure once per row of the load table, each call in a separate transaction. That would both preserve the bulk of the PL/pgSQL code and keep the normalization logic close to the data. The run time should be reasonable, probably somewhat longer than with the single monolithic stored procedure, but the transactions will be short. We don't need anything like SERIALIZABLE transaction isolation between the online system and the entire load job.
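In case it helps to see the shape of what I mean, the smaller per-row procedure might look roughly like this (all names hypothetical). The Java program would fetch the load-table keys, then call the function once per row with autocommit on, so each call is its own transaction:

    -- Hypothetical per-row version: the normalization stays in PL/pgSQL,
    -- but the transaction boundary moves out to the Java caller.
    CREATE FUNCTION process_load_row(p_load_id integer) RETURNS void AS $$
    DECLARE
        rec load_table%ROWTYPE;
    BEGIN
        SELECT * INTO rec FROM load_table WHERE load_id = p_load_id;
        -- ... the special-case inserts and/or updates to many tables ...
    END;
    $$ LANGUAGE plpgsql;

    -- One call per row from the Java side, each in its own transaction:
    -- SELECT process_load_row(42);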
Thanks for any ideas,

David Crane
DonorsChoose.org