I am glad to report that the 'salami-slice' approach worked nicely - all
done in about 2.5 hours.
Instead of using an all-in-one-go statement, we executed 800 statements,
each updating 100,000 records. On average it took about 10 seconds for
each statement to return.
This is "thinking out of the box" solution, which others might not be
able to emulate.
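In case the approach is useful to anyone else, each "slice" can simply be
the original statement with a range predicate added to it. A minimal
sketch, assuming one slices on the SERIAL id column (the bounds shown are
illustrative, not the actual values we used):

UPDATE
    table_A
SET
    (field_1, field_2) = (table_B.field_1, table_B.field_2)
FROM
    table_B
WHERE
    table_B.id = table_A.id
AND table_A.id BETWEEN 1 AND 100000 ;
-- next slice: BETWEEN 100001 AND 200000, and so on

Each slice is issued as a separate statement, so it runs in its own small
transaction.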
The mystery remains, for me: why could updating 100,000 records complete
in as little as 5 seconds, whereas an attempt to update a million records
was still running after 25 minutes before we killed it?
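If I ever dig into it, one diagnostic I would try - just a guess, not
something we actually ran - is comparing the plans the planner chooses for
a small slice versus a large one:

EXPLAIN
UPDATE
    table_A
SET
    (field_1, field_2) = (table_B.field_1, table_B.field_2)
FROM
    table_B
WHERE
    table_B.id = table_A.id
AND table_A.id BETWEEN 1 AND 100000 ;
-- then repeat with BETWEEN 1 AND 1000000 and compare

If the plan flips, say from index scans to sequential scans or to a
different join strategy, that would at least explain the cliff.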
One thing remains crystal clear: I love PostgreSQL :-)
Kind regards
Harry Mantheakis
London, UK
On 23/06/2011 16:05, Harry Mantheakis wrote:
Hello
I am attempting to run an update statement that copies two fields from
one table to another:
UPDATE
table_A
SET
(
field_1
, field_2
) = (
table_B.field_1
, table_B.field_2
)
FROM
table_B
WHERE
table_B.id = table_A.id
;
Table "table_B" contains almost 75 million records, with IDs that
match those in "table_A".
Both "field_1" and "field_2" are DOUBLE PRECISION. The ID fields are
SERIAL primary-key integers in both tables.
I tested (the logic of) this statement with a very small sample, and
it worked correctly.
The database runs on a dedicated Debian server in our office.
I called both VACUUM and ANALYZE on the database before invoking this
statement.
The statement has been running for 18+ hours so far.
The top, free and vmstat utilities indicate that only about half of the
6GB of memory is being used, so I have no reason to believe that the
server is struggling.
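One sanity check, from a second session, would be to confirm in
pg_stat_activity that the backend is still marked active and not waiting
on a lock (assuming a pre-9.2 server here; the column names changed in
later releases):

SELECT procpid, waiting, current_query
FROM pg_stat_activity ;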
My question is: can I reasonably expect a statement like this to
complete with such a large data-set, even if it takes several days?
We do not mind waiting, but obviously we do not want to wait
unnecessarily.
Many thanks.
Harry Mantheakis
London, UK