On 03/29/2011 08:56 AM, Nic Chidu wrote:
Got a situation where a 130 mil rows (137GB) table needs to be brought down in size to 10 mil records (most recent)
with the least amount of downtime.
Doing a full vacuum would be faster on:
- 120 mil rows deleted and 10 mil active (delete most of them then full vacuum)
- 10 mil deleted and 120 mil active. (delete small batches and full vacuum after each delete).
Any other suggestions?
I didn't really ask earlier, but what amount of downtime is acceptable
and how much space is available on your drive?
If a few minutes is acceptable and you aren't completely against the
wall in disk-space, the easiest is probably to delete all the old rows
and cluster the table. Cluster requires an exclusive lock, but so does
vacuum full. And in my experience with larger tables a cluster is an
order or magnitude or more faster (and you get shiny new indexes as
well). Analyze the table immediately afterward.
The exact solution will depend on the nature of your operation (what
depends on this table, what is the nature of data inserts and queries,
etc.). If it is critical that you be able to continuously insert data,
you might copy the live rows up to some recent point in time to a
holding table, then lock the table, copy the recently added rows,
truncate and unlock the table and backfill it from the other table. This
will minimize the unavailability for inserts but may not be feasible if
you have constant query requirements, foreign-key constraints or the like.
If availability is truly critical, be sure to test whatever approach you
take in advance.
Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin