Re: Streaming Replication replay lag

Laurenz Albe <laurenz.albe@xxxxxxxxxxx> · Wed, 25 Nov 2020 15:51:38 +0100

On Wed, 2020-11-25 at 11:02 +0530, Nikhil Shetty wrote:
> 
> On Tue, Nov 24, 2020 at 9:35 PM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
> > On Tue, 2020-11-24 at 19:54 +0530, Nikhil Shetty wrote:
> > > We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.
> > > 
> > > We have configured streaming replication.When there are a lot of write transactions
> > >  in the master, we observed that DR lags behind the master for 5-6 hours. We initially
> > >  thought this could be because of the slow network between DC and DR and this is normal
> > >  but when we saw the replay_lsn and receive_lsn , they had a huge difference.
> > > 
> > > DR standby was receiving the WAL's in time and there was no delay, the delay was in
> > >  replaying the lag. Can anybody point me to where I can start with the investigation?
> > > DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage
> > >  and some database configuration parameters but DR has a little less CPU and RAM.
> > 
> > If there is a delay replaying WAL, the cause is probably a replication conflict.
> > 
> > Set "hot_standby = off" on the standby to avoid the problem totally.
> > 
> > If you want queries on the standby, set "max_standby_streaming_delay" to 0
> > to keep replication from falling behind.
>
> We don't have any queries running on the DR server so setting "hot_standby=off"
> can be done and we will observe the WAL replay but ss there a way to identify
> that replay lag is because of the queries running on DR. I don't see any long
> running queries in pg_stat_activity to
> suggest that queries are not allowing the replay to happen.

You could look for conflicts in "pg_stat_database" on the standby.

If that is not the cause, perhaps the network is too slow for the amount of
WAL, or the standby cannot replay fast enough (the startup process is a
single thread).

Yours,
Laurenz Albe