Hi friends,
We have a relatively large database (~5TB) receiving very frequent INSERTs/UPDATEs, and we’d like to create a logical replica of it. To put things in perspective, we often generate up to 10 WAL segments (16MB each) per second, i.e. up to ~160MB of WAL per second (wal_level=replica).
We successfully set up logical replication in our smaller staging DB, but the scale of production is making things difficult.
---
Here are some approaches we considered:
1. `CREATE SUBSCRIPTION … WITH ( copy_data=true )`
This would be ideal, except we’re afraid of accumulating too much WAL on the publishing server while waiting for ~5TB to copy over. We could theoretically increase the size of the drive to something like 10TB, but that’s not currently an option for us.
There’s also the issue of performance. It’s unclear what impact `copy_data=true` would have on our DB read/write throughput, and we’re hesitant to try it.
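For concreteness, the commands we have in mind look roughly like this (publication, subscription, and connection details are placeholders, not our real setup):
```
-- On the publisher (names and connection string are placeholders):
CREATE PUBLICATION big_db_pub FOR ALL TABLES;

-- On the subscriber: copy_data = true does the initial ~5TB table copy
-- and then streams changes via the slot created for the subscription.
CREATE SUBSCRIPTION big_db_sub
    CONNECTION 'host=publisher.internal dbname=big_db user=replication'
    PUBLICATION big_db_pub
    WITH (copy_data = true);
```
Our worry is that the slot created by `CREATE SUBSCRIPTION` pins WAL on the publisher for the whole duration of that initial copy.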
2. Start from a base backup and catch up with `restore_command`:
`SELECT pg_replication_slot_advance('<slot_name>', pg_current_wal_lsn())` could work nicely with a matching `recovery_target_lsn` in `recovery.conf` (rough sketch after this item).
We already use `archive_command` with `wal-g` to ship WAL and base backups to GCS. Unfortunately, by the time a base backup has been downloaded onto the replica, it’s already at least 6 hours old. And replay via `restore_command` runs at roughly the same speed as `archive_command`, so the replica stays ~5-8 hours behind for as long as it’s in standby.
The postgres process on the replica isn’t limited by CPU, memory, network, or disk speed. It doesn’t seem to be working at full capacity, which suggests a configuration issue.
Please look at our configurations below and let us know if there are any options we could tweak in `postgresql.conf` to speed up the WAL ingestion for `restore_command`.
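To make option 2 concrete, here is a rough sketch of the sequence we have in mind. The slot/publication/subscription names and the LSN are placeholders, and the ordering is our best understanding rather than something we have tested at this scale:
```
-- 1. On the publisher: create the logical slot before taking the base
--    backup, so all WAL from this point on is retained by the slot.
SELECT pg_create_logical_replication_slot('big_db_slot', 'pgoutput');

-- 2. Take (or identify) a base backup that starts after the slot
--    exists, then pick a catch-up point past the end of that backup:
SELECT pg_current_wal_lsn();   -- e.g. '2A5B/1C3D4E5F'

-- 3. Restore the base backup on the replica and recover it with
--    recovery_target_lsn = '2A5B/1C3D4E5F' in recovery.conf.

-- 4. On the publisher: advance the slot to the same LSN so logical
--    decoding starts exactly where physical recovery stopped.
SELECT pg_replication_slot_advance('big_db_slot', '2A5B/1C3D4E5F');

-- 5. On the replica: attach a subscription to the existing slot and
--    publication (from the sketch above), skipping the table copy.
CREATE SUBSCRIPTION big_db_sub
    CONNECTION 'host=publisher.internal dbname=big_db user=replication'
    PUBLICATION big_db_pub
    WITH (copy_data = false, create_slot = false, slot_name = 'big_db_slot');
```
The appeal here is that the ~5TB bulk transfer happens via the base backup instead of through the logical copy.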
3. Start from a base backup and catch up via the `pg_wal` directory:
We could write a script that fetches WAL segments from GCS and drops them directly into the `pg_wal` directory. This is essentially what `restore_command` does anyway, but we might be able to squeeze some extra efficiency out of a custom script. Maybe.
---
Our server has 32 cores and 120GB of memory, with the following selected configs:
```
shared_buffers = 30GB
huge_pages = try
maintenance_work_mem = 1GB
max_files_per_process = 2000
effective_io_concurrency = 100
max_worker_processes = 32
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 0
parallel_leader_participation = on
max_parallel_workers = 32
wal_level = archive
wal_buffers = 16MB
checkpoint_timeout = 30min
max_wal_size = 100GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
max_wal_senders = 20
wal_keep_segments = 10
effective_cache_size = 90GB
```
Thanks for all the help!
ts