
Re: Need to update all my 60 million rows at once without transactional integrity

I'm far from being an expert on Postgres, but here are two ideas, assuming you cannot afford the time it would take to simply UPDATE and wait...

Write a script that updates the rows one at a time. This has the lowest impact on operations but would take a very long time.

Assuming you have a sequential primary key on each row, update in blocks of IDs on the order of tens or hundreds of thousands at a time (or more). This is a balancing act between time to complete and immediate load on the server.
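
For instance, a minimal sketch of the second idea (the table and column names mytable, id, and status, and the batch size, are assumptions), with each statement run in its own transaction from an external script; the first idea is the same pattern with a batch of one row:

-- Reset one block of ids, then repeat with the next range until max(id)
-- is reached. Each statement commits on its own, so a crash or cancel
-- only loses the batch in progress.
UPDATE mytable
   SET status = 0
 WHERE id >= 1 AND id < 100001
   AND status <> 0;   -- skip rows that are already reset

-- next block:
UPDATE mytable
   SET status = 0
 WHERE id >= 100001 AND id < 200001
   AND status <> 0;

The status <> 0 test makes the batches cheap to re-run after an interruption, since rows that are already reset are not rewritten.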

I've used both options but I don't think I've exceeded 15 million rows.

Sent from my iPhone.

On Apr 23, 2008, at 2:15 PM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:

On Mon, 2008-04-21 at 00:19 +0200, christian_behrens@xxxxxxx wrote:

How can I make an UPDATE of a column in a very large table for all rows
without using double the amount of disk space and without any need for
an atomic operation?

I have a very large table with about 60 million rows. I sometimes need
to do a simple update to ALL rows that resets a status-flag to zero.

I don't need to have transactional integrity (but of course if the
system crashes, there should be no data corruption.

No such thing. Without transactions you have no sensible definition of
what constitutes data corruption.

A separate flag in the file system could well record the fact that the
bulk update was in progress) for this; I don't care about or want an
abort or "all or nothing" like SQL mandates. The idea is basically that
either this update succeeds or it succeeds - there is no "or not". It
must succeed. It must be retried until it works. If the update is
halfway finished, that's okay.

Don't reset them to zero; just redefine the meaning of the counter. Take
the max value in the table and then have the app understand that
anything <= the previous max value means the same thing as whatever
"status = 0" means now. The app would need to read the new baseline
value before performing any work.
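
A minimal sketch of that technique, assuming a hypothetical work_items table whose status column holds ever-increasing values (e.g. from a sequence) rather than a plain 0/1 flag, plus a one-row baseline table the app reads before doing any work:

-- One-time setup: a single-row table holding the current baseline.
CREATE TABLE status_baseline (baseline bigint NOT NULL);
INSERT INTO status_baseline VALUES (0);

-- "Reset all 60 million rows" becomes one cheap statement:
UPDATE status_baseline
   SET baseline = (SELECT COALESCE(max(status), 0) FROM work_items);

-- The app treats any row with status <= baseline as if it read "status = 0",
-- so only rows flagged after the reset are still considered set:
SELECT id
  FROM work_items
 WHERE status > (SELECT baseline FROM status_baseline);

Only the single baseline row is ever rewritten, so the reset itself is effectively instantaneous.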

This is roughly the technique used by Slony to avoid needing to update
every row in the log table to show that it has successfully replicated
it. It's also the technique MVCC relies upon internally.

It's less work and crash-safe in all cases.

--
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com

