Re: 'replication checkpoint has wrong magic' on the newly cloned replicas

On 29. Nov 2017, at 18:52, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:

Greetings,

On Wed, Nov 29, 2017 at 12:41 Oleksii Kliukin <oleksii@xxxxxxxxxxxx> wrote:
Hi Stephen,

> On 29. Nov 2017, at 15:54, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
>
> Greetings,
>
> * Alex Kliukin (alexk@xxxxxxxxxxxx) wrote:
>> The cloning itself is done by copying a compressed image via ssh,
>> running the following command from the replica:
>>
>> """ssh {master} 'cd {master_datadir} && tar -lcp --exclude "*.conf" \
>>         --exclude "recovery.done" \
>>         --exclude "pacemaker_instanz" \
>>         --exclude "dont_start" \
>>         --exclude "pg_log" \
>>         --exclude "pg_xlog" \
>>         --exclude "postmaster.pid" \
>>         --exclude "recovery.done" \
>>           * | pigz -1 -p 4' | pigz -d -p 4 | tar -xpmUv -C {slave_datadir}"""
>>
>> WAL archiving starts before the copy starts, as the script that clones
>> the replica checks that WAL archiving is running before the cloning.
>
> Maybe you're doing it and haven't mentioned it, but you have to use
> pg_start/stop_backup

Sorry for not mentioning it, as it seemed obvious, but we are calling pg_start_backup and pg_stop_backup at the right time.

Ah, not something I can assume, heh. 

Then it depends on which version of PG you're on and whether you're able to run start/stop on the replica. If you can't run it on the replica and have to run it on the primary (prior to 9.6), then you need to make sure to wait for things to happen on the primary and for that to be replicated before you can start.

We are using exclusive backups from the master. First, the script checks that WAL files are shipped to the NFS, where the replica expects to find them (we check the md5 checksum of the file in order to make sure that the NFS actually delivers the file that the master has archived). Then pg_start_backup runs on the master and its status is checked. On success, the copy command runs. When the copy command finishes, pg_stop_backup is executed. Once pg_stop_backup finishes successfully, the replica configuration files (postgresql.conf, pg_hba.conf, pg_ident.conf) are linked from their location in the repository and the replica is started.
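
To illustrate the md5 check mentioned above, here is a minimal sketch, not the actual script. It assumes the master's checksum for a WAL segment has already been obtained somehow; the names md5_of, archive_matches, archive_dir and wal_name are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(master_md5, archive_dir, wal_name):
    """True only if the segment exists in the (hypothetical) NFS archive
    directory and its MD5 equals the checksum computed on the master."""
    candidate = Path(archive_dir) / wal_name
    return candidate.is_file() and md5_of(candidate) == master_md5
```

A missing file and a mismatching file are both reported as False, so the caller can simply refuse to start the clone until the check passes.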

This is a fairly typical procedure, which, I believe, is also well described in the docs.
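
For reference, the ordering of the exclusive backup steps can be sketched roughly as follows. This is an illustration under assumptions, not the script discussed in this thread: run_sql_on_master and copy_datadir are hypothetical callables supplied by the caller, and calling pg_stop_backup even when the copy fails is a choice made for this sketch.

```python
def clone_with_exclusive_backup(run_sql_on_master, copy_datadir, label="clone"):
    """Run pg_start_backup, the copy, then pg_stop_backup, in that order.

    run_sql_on_master: hypothetical callable executing one SQL statement
                       on the master and raising on failure.
    copy_datadir:      hypothetical callable performing the tar/pigz copy
                       and raising on failure.
    """
    # Double any single quotes so the label is a valid SQL string literal.
    safe_label = label.replace("'", "''")
    run_sql_on_master("SELECT pg_start_backup('%s')" % safe_label)
    try:
        copy_datadir()
    finally:
        # Assumption of this sketch: always leave backup mode, even if
        # the copy raised, so the master is not stuck in backup mode.
        run_sql_on_master("SELECT pg_stop_backup()")
```

Linking the configuration files and starting the replica would follow only after this function returns without raising.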


If you’re on 9.6 and using non-exclusive backup, you need to be sure to capture the contents of the stop backup and write it into backup_label before you start the system back up.
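
As a rough sketch of that step: in 9.6, pg_stop_backup(false) returns a row with the columns lsn, labelfile and spcmapfile. The helper below is hypothetical and assumes the caller has already fetched that row and passes the two text fields in; it writes them unmodified into the copied data directory.

```python
import os


def write_backup_files(backup_dir, labelfile, spcmapfile):
    """Write the text returned by pg_stop_backup(false) into the backup.

    backup_dir: root of the copied data directory (hypothetical path).
    labelfile:  second column of the result, goes into backup_label.
    spcmapfile: third column, goes into tablespace_map unless empty.
    """
    # newline="" keeps line endings exactly as returned by the server.
    with open(os.path.join(backup_dir, "backup_label"), "w", newline="") as fh:
        fh.write(labelfile)
    if spcmapfile:
        with open(os.path.join(backup_dir, "tablespace_map"), "w", newline="") as fh:
            fh.write(spcmapfile)
```

The files have to be in place before the copied cluster is started for the first time.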

We don’t use non-exclusive backups at all.

Cheers,
Alex
