On 01/12/2017 12:06 PM, Jonathan Vanasco wrote:
> I'm just wondering if there's a more efficient way of handling a certain periodic data migration.
>
> We have a pair of tables with this structure:
>
>     table_a__live
>         column_1 INT
>         column_2 INT
>         record_timestamp TIMESTAMP
>
>     table_a__archive
>         column_1 INT
>         column_2 INT
>         record_timestamp TIMESTAMP
>
> Periodically, we must migrate items that are 'stale' from `table_a__live` to `table_a__archive`. The entries are copied over to the archive, then deleted.
>
> Staleness is calculated based on age, so we need to use INTERVAL. The "live" table can have anywhere from 100k to 20MM records.
>
> The primary key on `table_a__live` is a composite of column_1 & column_2.
>
> In order to minimize scanning the table, we opted to hint migrations with a dedicated column:
>
>     ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
>     CREATE INDEX idx_table_a__live_migrate ON table_a__live(is_migrate) WHERE is_migrate IS NOT NULL;
>
> So our migration is then based on that `is_migrate` column:
>
>     BEGIN;
>     UPDATE table_a__live SET is_migrate = TRUE
>         WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
>     INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
>         SELECT column_1, column_2, record_timestamp FROM table_a__live WHERE is_migrate IS TRUE;
>     DELETE FROM table_a__live WHERE is_migrate IS TRUE;
>     COMMIT;
>
> The inserts & deletes are blazing fast, but the UPDATE is a bit slow from Postgres rewriting all the rows.

Maybe I am missing something, but why do the UPDATE at all? Why not:

    BEGIN;
    INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
        SELECT column_1, column_2, record_timestamp FROM table_a__live
        WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
    DELETE FROM table_a__live
        WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';
    COMMIT;

with an index on record_timestamp?
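As a further refinement, since you're on 9.6 you could fold the INSERT and DELETE above into a single statement with a data-modifying CTE, so the timestamp predicate is evaluated only once and the two steps cannot disagree about which rows qualify. A sketch against your tables (untested here):

```sql
-- Move stale rows in one pass: the DELETE's RETURNING clause feeds the
-- INSERT, so both operations act on exactly the same set of rows.
WITH moved AS (
    DELETE FROM table_a__live
    WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC'
                             - INTERVAL '1 month'
    RETURNING column_1, column_2, record_timestamp
)
INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
SELECT column_1, column_2, record_timestamp FROM moved;
```

Being one statement, it is also atomic on its own, so the explicit BEGIN/COMMIT is no longer needed for correctness.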
> Can anyone suggest a better approach?
>
> I considered copying everything to a tmp table then inserting/deleting based on that table -- but there's a lot of disk I/O on that approach too.
>
> FWIW, we're on Postgres 9.6.1.

-- 
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general