
Re: Need to update all my 60 million rows at once without transactional integrity

I'm far from being an expert on Postgres, but here are two ideas, assuming you cannot afford the time it would take to simply UPDATE and wait...

Write a script that updates the rows one at a time. This has the lowest impact on operations but would take a very long time.

Assuming you have a sequential primary key on each row, update in blocks of IDs on the order of tens or hundreds of thousands at a time (or more). This is a balancing act between time to complete and immediate load on the server.
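
For instance, a minimal sketch of the second idea (the table and column names mytable, id, and status, and the batch size, are assumptions), with each statement run in its own transaction from an external script; the first idea is the same pattern with a batch of one row:

-- Reset one block of ids, then repeat with the next range until max(id)
-- is reached. Each statement commits on its own, so a crash or cancel
-- only loses the batch in progress.
UPDATE mytable
   SET status = 0
 WHERE id >= 1 AND id < 100001
   AND status <> 0;   -- skip rows that are already reset

-- next block:
UPDATE mytable
   SET status = 0
 WHERE id >= 100001 AND id < 200001
   AND status <> 0;

The status <> 0 test makes the batches cheap to re-run after an interruption, since rows that are already reset are not rewritten.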

I've used both options but I don't think I've exceeded 15 million rows.

Sent from my iPhone.

On Apr 23, 2008, at 2:15 PM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:

On Mon, 2008-04-21 at 00:19 +0200, christian_behrens@xxxxxxx wrote:

How can I make an UPDATE of a column in a very large table for all rows
without using double the amount of disk space and without any need for
an atomic operation?

I have a very large table with about 60 million rows. I sometimes need
to do a simple update to ALL rows that resets a status-flag to zero.

I don't need to have transactional integrity (but of course if the
system crashes, there should be no data corruption.

No such thing. Without transactions you have no sensible definition of
what constitutes data corruption.

A separate flag in the file system could well record the fact that the
bulk update was in progress) for this; I don't care about or want an
abort or "all or nothing" like SQL mandates. The idea is basically that
either this update succeeds or it succeeds - there is no "or not". It
must succeed. It must be retried until it works. If the update is
halfway finished, that's okay.

Don't reset them to zero; just redefine the meaning of the counter. Take
the max value in the table and then have the app understand that
anything <= the previous max value means the same thing as whatever
"status = 0" means now. The app would need to read the new baseline
value before performing any work.
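
A minimal sketch of that technique, assuming a hypothetical work_items table whose status column holds ever-increasing values (e.g. from a sequence) rather than a plain 0/1 flag, plus a one-row baseline table the app reads before doing any work:

-- One-time setup: a single-row table holding the current baseline.
CREATE TABLE status_baseline (baseline bigint NOT NULL);
INSERT INTO status_baseline VALUES (0);

-- "Reset all 60 million rows" becomes one cheap statement:
UPDATE status_baseline
   SET baseline = (SELECT COALESCE(max(status), 0) FROM work_items);

-- The app treats any row with status <= baseline as if it read "status = 0",
-- so only rows flagged after the reset are still considered set:
SELECT id
  FROM work_items
 WHERE status > (SELECT baseline FROM status_baseline);

Only the single baseline row is ever rewritten, so the reset itself is effectively instantaneous.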

This is roughly the technique used by Slony to avoid needing to update
every row in the log table to show that it has successfully replicated
it. It's also the technique MVCC relies upon internally.

It's less work and crash-safe in all cases.

--
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com

