Harry Mantheakis <harry.mantheakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> UPDATE
>   table_A
> SET
>   (
>     field_1
>   , field_2
>   ) = (
>     table_B.field_1
>   , table_B.field_2
>   )
> FROM
>   table_B
> WHERE
>   table_B.id = table_A.id
> ;

I would have just done:

  SET field_1 = table_B.field_1, field_2 = table_B.field_2

instead of using row value constructors.  That might be slowing
things down a bit.

> I tested (the logic of) this statement with a very small sample,
> and it worked correctly.

Always a good sign.  :-)

> The statement has been running for 18+ hours so far.
>
> My question is: can I reasonably expect a statement like this to
> complete with such a large data-set, even if it takes several
> days?

If it's not leaking memory, I expect that it will complete.  To get
some sense of what it's doing, you could log on to another
connection and EXPLAIN the statement.  (NOTE: Be careful *not* to
use EXPLAIN ANALYZE.)

Another thing to consider if you run something like this again is
that an UPDATE is an awful lot like an INSERT combined with a
DELETE.  The way PostgreSQL MVCC works, the old version of each row
must remain until the updating transaction completes.  If you were
to divide this update into a series of updates by key range, the
new versions of the rows from later updates could re-use the space
previously occupied by the old version of rows from earlier
updates.

For similar reasons, you might want to add something like this to
your WHERE clause, to prevent unnecessary updates:

  AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
       OR table_B.field_2 IS DISTINCT FROM table_A.field_2);

-Kevin
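
For illustration only, here is a rough sketch of what one such
batched update might look like, combining the plain SET syntax and
the IS DISTINCT FROM filter suggested above.  It assumes table_A.id
is an integer key; the key-range bounds are placeholders to be
adjusted (and repeated) for each batch, and the VACUUM between
batches is an extra step added here, not something stated in the
advice above:

  -- One batch: update a slice of the key range in its own
  -- transaction, skipping rows that would not actually change.
  UPDATE table_A
  SET field_1 = table_B.field_1,
      field_2 = table_B.field_2
  FROM table_B
  WHERE table_B.id = table_A.id
    AND table_A.id >= 1          -- placeholder lower bound
    AND table_A.id <  1000000    -- placeholder upper bound
    AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
         OR table_B.field_2 IS DISTINCT FROM table_A.field_2);

  -- Optional: reclaim the dead row versions left by this batch so
  -- later batches can re-use the space.
  VACUUM table_A;

Each batch commits separately, so the old row versions it leaves
behind can be cleaned up before the next batch writes its new ones.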