On 04/11/2018 10:24 AM, Ron wrote:
On 04/11/2018 11:15 AM, Don Seiler wrote:
Let's say I have two DBs: main (9.6.6) and archive (10.2).
I have a table in main where I want to archive data older then 60
days. For various reasons, the table is not partitioned, so for now
we must use DELETE. The destination table in the archive DB is
partitioned with the new Pg10 partitioning.
My initial plan was to have a stored procedure on the archive DB use
postgres_fdw to do an INSERT INTO / SELECT to select the data
remotely and insert into the local archive table. It would then issue
a single DELETE command to remotely remove the data from the main DB.
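For concreteness, that plan might look like this on the archive DB (names are hypothetical: main_events_fdw standing in for a postgres_fdw foreign table pointing at the table in main, events_archive for the local partitioned table):

```sql
-- Run on the archive (v10) DB inside one transaction.
-- Hypothetical names: main_events_fdw is a postgres_fdw foreign table
-- for the table in main; events_archive is the local partitioned table.
BEGIN;

-- Pull the old rows across the FDW into the local partitioned table.
INSERT INTO events_archive
SELECT * FROM main_events_fdw
WHERE created_at < now() - interval '60 days';

-- What is written as a single DELETE against the remote table.
DELETE FROM main_events_fdw
WHERE created_at < now() - interval '60 days';

COMMIT;
```

Note that postgres_fdw only executes a DELETE as a single remote statement when it can prove the whole operation is safe to push down; otherwise it fetches the matching rows and removes them one at a time by ctid, which is the behavior described next.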
However I found that doing this resulted in the main DB calling
thousands (perhaps millions if it's one-per-row) of individual DELETE
statements based on a ctid column. Aside from WAL behavior concerns,
it is flooding my PostgreSQL server logs, since I log all DML.
On top of that, I'm told that a remote DELETE wouldn't be
transactional, so if I were to compare inserted rows vs deleted rows
and found a mismatch, I couldn't just roll back the DELETE. I plan to
verify this with a small test case later but for now I'll assume this
to be true.
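For what it's worth, postgres_fdw runs remote statements inside a remote transaction that it commits or aborts together with the local one, so a small test case (again using a hypothetical main_events_fdw foreign table) might look like:

```sql
-- On the archive DB: delete a remote row, then roll back locally.
BEGIN;
DELETE FROM main_events_fdw WHERE id = 1;
ROLLBACK;

-- If the remote DELETE was rolled back too, the row should still be there:
SELECT count(*) FROM main_events_fdw WHERE id = 1;
```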
Right now I'm thinking of falling back to the far-less-elegant method
of dumping the data to a flat file via COPY, then running psql to
connect to the archive DB and load the data with another COPY (or
maybe transferring the flat file to the archive host and loading it
there, offloading that part of the work), and finally deleting the
data from the main DB. I could capture the rows dumped in a control
table, compare the rows deleted against that, and roll back the
delete if necessary.
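Sketched out with hypothetical table names (events on main, events_archive on the archive DB) and psql's \copy, that fallback might be:

```sql
-- Capture the cutoff once, so the COPY and the DELETE see the same
-- row set; computing now() twice could select different rows.
\set cutoff '2018-02-10 00:00:00'

-- Step 1, connected to main: dump the candidate rows to a flat file.
\copy (SELECT * FROM events WHERE created_at < :'cutoff') TO '/tmp/archive.csv' CSV

-- Step 2, connected to the archive DB: load the file.
\copy events_archive FROM '/tmp/archive.csv' CSV

-- Step 3, back on main: delete inside a transaction, so a row-count
-- mismatch against the control table can still be rolled back.
BEGIN;
DELETE FROM events WHERE created_at < :'cutoff';
COMMIT;  -- or ROLLBACK on a mismatch
```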
Like I said, not elegant, but I don't want to risk losing data that
wasn't successfully archived to the archive DB. I'm very interested
to hear what others might be doing for tasks like this.
It might not be elegant, but a COPY / DELETE / LOAD is granular, so
you can restart at any point.
I might be inclined to COPY/LOAD/check/DELETE instead, so nothing is
deleted until the load has been verified.
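With that ordering, the check step could be as simple as comparing counts before the final DELETE (control_runs being a hypothetical control table recording what each COPY dumped):

```sql
-- On the archive DB, after the LOAD: how many rows actually arrived?
SELECT count(*) FROM events_archive WHERE created_at < :'cutoff';

-- Against what the COPY step recorded in the control table:
SELECT dumped_rows FROM control_runs WHERE run_id = :run_id;

-- Only when the two numbers match, run the DELETE on main.
```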