On Sep 14, 2021, at 1:42 AM, Tomas Pospisek <tpo2@xxxxxxxxxxxxx> wrote:
I used https://hevodata.com/learn/postgresql-logical-replication/, but I rather suspect there are better options, as I had a number of issues with that one. Essentially, it seems to boil down to the following steps (rough command sketches at the end of this message):

1) Set up a new "slave" database. I used a pg_dump -s command piped through to psql to transfer the schema only to the new DB.

2) Configure both master and slave postgresql servers with wal_level=logical in the postgresql.conf file. This requires a server restart, which stinks if you need absolutely 0 downtime, but ideally it should only take a few seconds.

3) On the master server, run the SQL command

    CREATE PUBLICATION publication_name FOR ALL TABLES;

Presumably this should be issued while connected to the database you wish to replicate, and you can specify specific tables only if desired (though I haven't looked into how). There is a note in that how-to about published tables needing a REPLICA IDENTITY to replicate DELETE and UPDATE operations; I still need to look into that more. For my case, however, I have an insert-only workflow, so it wasn't an issue.

4) Finally, on the slave server, run the command

    CREATE SUBSCRIPTION subscription_name CONNECTION '...<connection string>...' PUBLICATION publication_name;

where the connection string is something like:

    'host=my.remote.host port=5432 dbname=my_big_db'

...And that's it: replication should kick off by syncing up the existing data from master to slave, and after that changes should be sent in real time.

One gotcha that I ran into: my table used the postgis extension, which was set up on the slave DB when I ran the pg_dump command. Creating that extension creates *and populates* a table named "spatial_ref_sys". Since I had created the publication as "FOR ALL TABLES", the replication tried to include this table, which failed due to UNIQUE constraint violations. At that point replication stopped, AS DID ALL INSERTS INTO THE MASTER DB! I deleted all records from said table on the slave DB, replication kicked in and re-populated it, and life went on, but I lost something like 14 hours of data thanks to this mistake. Guess that's what I get for not keeping an eye on the process :-(

Lesson learned: make sure any tables to be replicated are empty in the slave before issuing the CREATE SUBSCRIPTION command.

For cutover, I plan the following; we'll see how it goes:

1) Move all read-only processes over to the new server. That can be done at any point after the above, with little or no downtime depending on your processes.

2) Reconfigure the writing processes to point to the slave, and stop them.

3) Issue a DROP SUBSCRIPTION subscription_name on the slave, thereby "promoting" it to master.

4) Restart the writing processes, now pointed to the new master.

Steps 2-4 should happen in quick succession, resulting in only seconds of downtime. At least, that's the theory. We'll see how it goes (or if anyone else here can offer a better procedure!)
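As promised, some rough sketches of the commands, in case the concrete form is useful to anyone. All host names, users, and object names below are placeholders. Steps 1 and 2 looked roughly like this for me (the restart command will depend on how postgres is installed on your system):

    # Copy the schema only (no data) from the master to the new slave.
    # The target database must already exist on the slave (createdb my_big_db).
    pg_dump -s -h old-master.example.com -U postgres my_big_db \
        | psql -h new-slave.example.com -U postgres my_big_db

    # Enable logical decoding on both servers (requires a restart).
    # You can set wal_level=logical in postgresql.conf by hand, or:
    psql -h old-master.example.com -U postgres -c "ALTER SYSTEM SET wal_level = logical;"
    psql -h new-slave.example.com -U postgres -c "ALTER SYSTEM SET wal_level = logical;"
    # ...then restart each server, e.g. something like:
    # sudo systemctl restart postgresql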
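Step 3, on the master. The FOR TABLE variant and the REPLICA IDENTITY line are from the docs rather than anything I've actually exercised (insert-only workflow here), so treat those as untested:

    # On the master, connected to the database being replicated:
    psql -h old-master.example.com -U postgres -d my_big_db <<'SQL'
    -- Publish every table in the database...
    CREATE PUBLICATION publication_name FOR ALL TABLES;
    -- ...or, apparently, just specific tables:
    -- CREATE PUBLICATION publication_name FOR TABLE my_table, my_other_table;
    -- If a published table has no primary key and you need UPDATE/DELETE
    -- replicated, the docs say it needs a replica identity, e.g.:
    -- ALTER TABLE my_table REPLICA IDENTITY FULL;
    SQL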
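Step 4, on the slave, incorporating the lesson learned above about emptying the tables first. The user/password in the connection string are placeholders (as I understand it, the role needs replication privileges on the master), and the two status queries at the end are the sort of thing I should have been keeping an eye on:

    # On the slave: make sure the tables to be replicated are EMPTY first.
    # In my case the postgis extension had pre-populated spatial_ref_sys:
    psql -h new-slave.example.com -U postgres -d my_big_db \
        -c "DELETE FROM spatial_ref_sys;"

    # Then create the subscription; the initial data sync starts immediately:
    psql -h new-slave.example.com -U postgres -d my_big_db <<'SQL'
    CREATE SUBSCRIPTION subscription_name
        CONNECTION 'host=old-master.example.com port=5432 dbname=my_big_db user=replicator password=secret'
        PUBLICATION publication_name;
    SQL

    # Status views worth checking periodically:
    psql -h old-master.example.com -U postgres -c "SELECT * FROM pg_stat_replication;"
    psql -h new-slave.example.com -U postgres -d my_big_db -c "SELECT * FROM pg_stat_subscription;"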
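And the cutover plan in command form. The only actual database step is the DROP SUBSCRIPTION; the rest is stopping and restarting my own writer processes with new connection settings:

    # 1) readers already moved to the new server
    # 2) stop the writer processes
    # 3) "promote" the slave by dropping the subscription:
    psql -h new-slave.example.com -U postgres -d my_big_db \
        -c "DROP SUBSCRIPTION subscription_name;"
    # 4) restart the writers, now pointed at the new master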
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145