Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails

Mariya Rampurawala <Mariya.Rampurawala@xxxxxxxxxxx> · Wed, 13 May 2020 04:58:15 +0000

Thank you Kyotaro and Laurenz for your quick responses.
This helped me get my setup working.

Regards,
Mariya

On 13/05/20, 6:51 AM, "Kyotaro Horiguchi" <horikyota.ntt@xxxxxxxxx> wrote:

    At Tue, 12 May 2020 17:29:50 +0200, Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote in 
    > On Tue, 2020-05-12 at 09:40 +0000, Mariya Rampurawala wrote:
    > > > but if the target cluster ran for a long time after the divergence,
    > > > the old WAL files might no longer be present. In that case, they can
    > > > be manually copied from the WAL archive to the pg_wal directory, or
    > > > fetched on startup by configuring primary_conninfo or restore_command.
    > > 
    > > I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
    > > How long is "long time after divergence"? Is there a way I can make some
    > > configuration changes so that I donʼt hit this issue?
    > > Is there anything I must change in my restore command?

    As mentioned in the documentation, pg_rewind uses the WAL records
    startng from the last checkpoint just before the divergence point. The
    divergence point is shown as the follows in your log messages.

    > pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53

    pg_rewind scans backward starting from that location to find a
    checkpoint record, which is the oldest WAL record pg_rewind needs.  As
    you see it is not a matter of calculation.  There's no other way than
    actually restoring WAL segment files to read and try finding.

    > What you can do is to use a higher value for "wal_keep_segments".
    > Then PostgreSQL will keep around that number of old WAL segments,
    > which increases the chance for "pg_rewind" to succeed.

    So this is one effective way to reduce the chance to lose required WAL
    (segment) files.

    On PG12, an easy way to automatically restore all required WAL files
    would be restoring the WAL file every time pg_rewind complains that it
    is missing.

    Or, you could use pg_waldump to find a checkpoint record.

    regards.

    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center