Re: Fwd: standby stop replicating, then picked back up

Michael Paquier <michael.paquier@xxxxxxxxx> · Wed, 8 Nov 2017 09:32:19 +0900

On Wed, Nov 8, 2017 at 5:17 AM, Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
> chris kim wrote:
>> I had a standby hang for a while, not replicating, but then it fixed
>> itself but I'm not sure why it happened in the first place. What would I
>> look into to see why this happened, or any insight into why is greatly
>> appreciated.
>
> You give us precious little information.
>
> If there is nothing suspicious in the log, and hot standby is enabled,
> and the standby is configured appropriately, it could be that a conflicting
> query on the standby blocked WAL application for a while.

As I understand it, the question is: if a standby is stopped for a
long time, can it catch up automatically? This mainly depends on
whether the WAL segments it needs have been recycled on the primary
(or on an upstream standby with cascading streaming). In short, once
the primary completes two checkpoints, it removes or recycles (renames)
older WAL segments in pg_xlog (pg_wal as of PostgreSQL 10), because it
no longer needs them to recover to a consistent state.
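
To see which segment a standby would still need, you can map an LSN to
a WAL file name and check whether that file is still present on the
primary. A minimal sketch in Python of the arithmetic, assuming the
default 16MB segment size and a timeline you already know; the function
names are mine. On the server, pg_xlogfile_name() (pg_walfile_name() as
of PostgreSQL 10) is the authoritative way to get this.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16MB segments


def parse_lsn(lsn):
    """Convert an LSN such as '0/3000060' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn, timeline=1):
    """Return the name of the WAL segment file containing the given LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    # Number of segments per 4GB "log" unit (256 with 16MB segments).
    segs_per_log = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_log, segno % segs_per_log)


if __name__ == "__main__":
    # Hypothetical replay position reported by a standby.
    print(wal_file_name("0/3000060"))
```

If the file printed for the standby's last replayed position is gone
from the primary and there is no archive, the standby cannot catch up
through streaming alone.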

If the standby uses a replication slot, the primary retains the WAL
that the standby still needs, so the standby can reconnect as long as
the primary's pg_xlog does not grow too much in the meantime. If it
does, the partition holding pg_xlog fills up and the primary goes down
because of space exhaustion. A WAL archive is worthwhile if standbys
are taken down for a long time: with a proper restore_command, or by
copying over a range of WAL segments, you can let a standby recover
from an earlier point. Which strategy to adopt mainly depends on
whether taking a full backup is more costly than replaying a range of
WAL segments, so the size of the primary's data folder matters in that
decision.
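
To keep an eye on how much WAL a slot forces the primary to keep, you
can compare the slot's restart_lsn (from pg_replication_slots) with the
primary's current WAL position. A rough sketch in Python, where the LSN
values are made up and the threshold is a hypothetical number you would
choose from the size of the partition holding pg_xlog:

```python
def parse_lsn(lsn):
    """Convert an LSN such as '0/3000000' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn, restart_lsn):
    """Approximate amount of WAL held back between restart_lsn and now."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slot_is_risky(current_lsn, restart_lsn, max_bytes):
    """True if the slot retains more WAL than the chosen budget."""
    return retained_wal_bytes(current_lsn, restart_lsn) > max_bytes


if __name__ == "__main__":
    # Hypothetical values: current position on the primary and a slot's
    # restart_lsn, with a 1GB budget for retained WAL.
    print(retained_wal_bytes("0/5000000", "0/3000000"))
    print(slot_is_risky("0/5000000", "0/3000000", 1024 ** 3))
```

When the retained amount gets close to the free space on that
partition, it is time to either bring the standby back or drop the slot
and rebuild the standby from an archive or a fresh base backup.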
-- 
Michael
