On 5/31/20 12:47 PM, Andrus wrote:
Hi!
pg_basebackup takes 8 hours.
After it finishes, the replication slave does not start:
LOG: consistent recovery state reached at 2DE/985A5BE0
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 2DE/99000000 on timeline 1
replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode
replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM
replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E9000000 TIMELIN
replikaator@[unknown] ERROR: requested WAL segment 00000001000002CF000000E9 has already been re
replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode
replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM
replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E9000000 TIMELIN
replikaator@[unknown] ERROR: requested WAL segment 00000001000002CF000000E9 has already been removed
There's your problem ^
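For reference, a WAL segment file name is three 8-hex-digit fields: the timeline, the high 32 bits of the LSN, and the segment number within that 4 GB range. The Python sketch below, assuming the default 16 MB segment size, converts between the two forms and confirms that 00000001000002CF000000E9 is exactly the segment holding START_REPLICATION 2CF/E9000000 on timeline 1. The helper names are illustrative only; on a running primary the same mapping is available as pg_walfile_name().

```python
# Minimal sketch: map between PostgreSQL LSNs and WAL segment file names.
# Assumes the default 16 MB WAL segment size (chosen at initdb time).
from typing import Tuple

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # bytes per segment


def parse_lsn(lsn: str) -> int:
    """Turn an LSN such as '2CF/E9000000' into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Turn a 64-bit integer back into PostgreSQL's 'X/X' notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_name(lsn: str, timeline: int = 1) -> str:
    """Name of the WAL file containing `lsn` on the given timeline."""
    value = parse_lsn(lsn)
    log_id = value >> 32                                   # high 32 bits
    seg_in_log = (value & 0xFFFFFFFF) // WAL_SEGMENT_SIZE  # segment within log_id
    return f"{timeline:08X}{log_id:08X}{seg_in_log:08X}"


def segment_start(filename: str) -> Tuple[int, str]:
    """Return (timeline, first LSN) for a 24-hex-digit WAL file name."""
    if len(filename) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(filename[0:8], 16)
    log_id = int(filename[8:16], 16)
    seg_in_log = int(filename[16:24], 16)
    return timeline, format_lsn((log_id << 32) + seg_in_log * WAL_SEGMENT_SIZE)


if __name__ == "__main__":
    print(segment_name("2CF/E9000000"))               # 00000001000002CF000000E9
    print(segment_start("00000001000002CF000000E9"))  # (1, '2CF/E9000000')
```

So the primary had already removed the very first segment the standby asked for; no setting changed afterwards can bring it back.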
...
I tried it again and the same error occurred.
How to force replication to start?
If the WAL is gone, you can't.
More below.
I increased the WAL parameters on the master to:
wal_compression=on
max_wal_size = 5GB
min_wal_size = 4GB # was 80MB
wal_keep_segments= 360 # was 180
Will this allow replication to start after pg_basebackup?
According to the docs, min_wal_size and wal_keep_segments both keep a
minimum number of WAL segments for replication.
No, it doesn't:
https://www.postgresql.org/docs/12/runtime-config-replication.html
"wal_keep_segments (integer)
Specifies the minimum number of past log file segments kept in the
pg_wal directory, in case a standby server needs to fetch them for
streaming replication. Each segment is normally 16 megabytes. If a
standby server connected to the sending server falls behind by more than
wal_keep_segments segments, the sending server might remove a WAL
segment still needed by the standby, in which case the replication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
connection will be terminated. Downstream connections will also
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
eventually fail as a result. (However, the standby server can recover by
fetching the segment from archive, if WAL archiving is in use.)
...
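To put wal_keep_segments = 360 in perspective: at 16 MB per segment it guarantees only 5760 MB of old WAL, however long the base backup runs, and min_wal_size adds nothing to that guarantee. The sketch below does the arithmetic and compares it with the WAL written during an 8-hour backup, assuming the standby must fetch all of that WAL from the primary. The 2 GB/hour rate is a hypothetical figure; one way to measure your own is to sample pg_current_wal_lsn() twice on the primary and take pg_wal_lsn_diff() of the two values.

```python
# Rough sizing sketch for wal_keep_segments. Assumes 16 MB WAL segments;
# the WAL generation rate used below is hypothetical and must be measured.
import math

SEGMENT_MB = 16


def retained_mb(wal_keep_segments: int) -> int:
    """Minimum WAL (in MB) that wal_keep_segments keeps in pg_wal."""
    return wal_keep_segments * SEGMENT_MB


def segments_needed(wal_mb_per_hour: float, hours: float) -> int:
    """Segments that must survive for a standby `hours` behind the primary."""
    return math.ceil(wal_mb_per_hour * hours / SEGMENT_MB)


if __name__ == "__main__":
    print(retained_mb(180))          # 2880 MB with the old setting
    print(retained_mb(360))          # 5760 MB with the new setting
    # Hypothetical rate: 2 GB of WAL per hour during an 8-hour backup.
    print(segments_needed(2048, 8))  # 1024 segments, far above 360
```

Any fixed wal_keep_segments value is a guess about the worst-case lag; it fails as soon as the primary writes more WAL than that during the backup.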
https://www.postgresql.org/docs/12/runtime-config-wal.html
"min_wal_size (integer)
As long as WAL disk usage stays below this setting, old WAL files
are always recycled for future use at a checkpoint, rather than removed.
This can be used to ensure that enough WAL space is reserved to handle
spikes in WAL usage, for example when running large batch jobs. If this
value is specified without units, it is taken as megabytes. The default
is 80 MB. This parameter can only be set in the postgresql.conf file or
on the server command line.
"
I'm guessing you are looking for:
https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION-SLOTS
"
26.2.6. Replication Slots
Replication slots provide an automated way to ensure that the master
does not remove WAL segments until they have been received by all
standbys, and that the master does not remove rows which could cause a
recovery conflict even when the standby is disconnected.
...
"
This is spelled out here:
https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION
"If you use streaming replication without file-based continuous
archiving, the server might recycle old WAL segments before the standby
has received them. If this occurs, the standby will need to be
reinitialized from a new base backup. You can avoid this by setting
wal_keep_segments to a value large enough to ensure that WAL segments
are not recycled too early, or by configuring a replication slot for the
standby. If you set up a WAL archive that's accessible from the standby,
these solutions are not required, since the standby can always use the
archive to catch up provided it retains enough segments."
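One way to use a replication slot with PostgreSQL 12 is to let pg_basebackup create a physical slot and stream WAL through it, so the primary holds WAL from the start of the backup until the standby connects with the same slot. The Python sketch below only assembles the command line; the host name, data directory and slot name are hypothetical placeholders (replikaator is the replication role from the log above). The options -X stream, -S/--slot, -C/--create-slot and -R/--write-recovery-conf are documented pg_basebackup options; in version 12, -R also records the slot as primary_slot_name in postgresql.auto.conf.

```python
# Sketch: assemble a pg_basebackup invocation that creates and uses a
# physical replication slot. Host, directory and slot name are
# hypothetical placeholders, not values confirmed in this thread.
import shlex
import subprocess
from typing import List


def basebackup_cmd(host: str, user: str, datadir: str, slot: str) -> List[str]:
    return [
        "pg_basebackup",
        "-h", host,      # primary to copy from
        "-U", user,      # role with the REPLICATION attribute
        "-D", datadir,   # empty target directory on the standby
        "-X", "stream",  # stream WAL in parallel with the file copy
        "-S", slot,      # use this replication slot ...
        "-C",            # ... creating it first (PostgreSQL 11+)
        "-R",            # write standby.signal and connection settings
        "-P",            # show progress
    ]


def run_basebackup(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if pg_basebackup fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = basebackup_cmd("primary.example.com", "replikaator",
                         "/var/lib/postgresql/12/main", "standby1")
    print(shlex.join(cmd))  # inspect before calling run_basebackup(cmd)
```

The trade-off is that a slot retains WAL indefinitely: if the standby is abandoned, drop the slot with pg_drop_replication_slot() or pg_wal will keep growing on the primary.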
Why are those parameters duplicated?
Andrus.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx