I am glad to report that the 'salami-slice' approach worked nicely - all
done in about 2.5 hours.
Instead of using an all-in-one-go statement, we executed 800 statements,
each updating 100,000 records. On average it took about 10 seconds for
each statement to return.
This is "thinking out of the box" solution, which others might not be
able to emulate.
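In case the approach is useful to anyone else, each "slice" can simply be
the original statement with a range predicate added to it. A minimal
sketch, assuming one slices on the SERIAL id column (the bounds shown are
illustrative, not the actual values we used):

UPDATE
    table_A
SET
    (field_1, field_2) = (table_B.field_1, table_B.field_2)
FROM
    table_B
WHERE
    table_B.id = table_A.id
AND table_A.id BETWEEN 1 AND 100000 ;
-- next slice: BETWEEN 100001 AND 200000, and so on

Each slice is issued as a separate statement, so it runs in its own small
transaction.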
The mystery remains, for me: why could updating 100,000 records complete
in as little as 5 seconds, whereas an attempt to update a million records
was still running after 25 minutes before we killed it?
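If I ever dig into it, one diagnostic I would try - just a guess, not
something we actually ran - is comparing the plans the planner chooses for
a small slice versus a large one:

EXPLAIN
UPDATE
    table_A
SET
    (field_1, field_2) = (table_B.field_1, table_B.field_2)
FROM
    table_B
WHERE
    table_B.id = table_A.id
AND table_A.id BETWEEN 1 AND 100000 ;
-- then repeat with BETWEEN 1 AND 1000000 and compare

If the plan flips, say from index scans to sequential scans or to a
different join strategy, that would at least explain the cliff.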
One thing remains crystal clear: I love PostgreSQL :-)
Kind regards
Harry Mantheakis
London, UK
On 23/06/2011 16:05, Harry Mantheakis wrote:
Hello
I am attempting to run an update statement that copies two fields from
one table to another:
UPDATE
table_A
SET
(
field_1
, field_2
) = (
table_B.field_1
, table_B.field_2
)
FROM
table_B
WHERE
table_B.id = table_A.id
;
Table "table_B" contains almost 75 million records, with IDs that
match those in "table_A".
Both "field_1" and "field_2" are DOUBLE PRECISION. The ID fields are
SERIAL primary-key integers in both tables.
I tested (the logic of) this statement with a very small sample, and
it worked correctly.
The database runs on a dedicated Debian server in our office.
I called both VACUUM and ANALYZE on the database before invoking this
statement.
The statement has been running for 18+ hours so far.
The top, free and vmstat utilities indicate that only about half of the
6GB of memory is being used, so I have no reason to believe that the
server is struggling.
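One sanity check, from a second session, would be to confirm in
pg_stat_activity that the backend is still marked active and not waiting
on a lock (assuming a pre-9.2 server here; the column names changed in
later releases):

SELECT procpid, waiting, current_query
FROM pg_stat_activity ;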
My question is: can I reasonably expect a statement like this to
complete with such a large data-set, even if it takes several days?
We do not mind waiting, but obviously we do not want to wait
unnecessarily.
Many thanks.
Harry Mantheakis
London, UK