I have a Production machine which is having objects dropped/created/truncated at all hours of the day (Read: No zero activity window). I have multiple standbys (repmgr streaming replication) for this machine including a cascading standby. Each night, I am attempting to take a logical backup on the standby databases via pg_dump of key schemas.
Recently, due to the activity on the primary, pg_dump is failing on the standby usually with "ERROR: could not obtain lock on relation."
I've had the following settings set in postgresql.conf which gave me successful backups for a while:
hot_standby = on # "off" disallows queries during recovery
max_standby_archive_delay = -1 # max delay before canceling queries
max_standby_streaming_delay = -1 # max delay before canceling queries
hot_standby_feedback = on # send info from standby to prevent
max_standby_streaming_delay = -1 # max delay before canceling queries
hot_standby_feedback = on # send info from standby to prevent
wal_receiver_timeout = 300s # time that receiver waits for
I have it set up this way because I don't mind any replication lag on the standbys during the logical backup. However, recently logical backups have been failing either due to a table dropped/truncated on the master.
Also, I use pg_dump with the parallel option in directory format. However, even single threaded pg_dump fails when a table is truncated on the primary.
Is there any way to guarantee consistent logical backups on a standby server with a master that has constant DDL/activity?
I am on Postgres 10.3; RHEL 7; 128gb RAM
Thanks,
Arjun