Re: Queries are failing on standby server

Fernando Hevia <fhevia@xxxxxxxxx> · Fri, 26 Jul 2024 13:56:42 -0300

> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

I definitely messed that one up. Thanks for the correction Laurenz.

Regards,
Fernando.

El vie, 26 jul 2024 a la(s) 2:08 a.m., Laurenz Albe (laurenz.albe@xxxxxxxxxxx) escribió:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:

> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@xxxxxxxxx> wrote:

> > I think you might have misinterpreted the explanation given to you. The cancellation of the

> > query on the standby server isn't related to the load on the primary server. It happens that

> > when you run queries on a hot standby, the replication is temporarily paused in order to not

> > modify data the running queries on the standby server need.

Replication (applying the WAL information) is only paused if there is a conflict.

Even when replay is paused, the WAL is still replicated to the standby and piles up there.

> > Once the queries end, replication resumes.

> > The problem of this behaviour is that the standby server starts to fall behind in relation

> > to the master, a scenario which presents a risky condition: if the master happens to fail

> > while the replica is delayed you end up with data loss.

No, because the WAL is replayed.

What happens is that promoting the standby will take longer if it has to replay a lot of WAL.

> > To avoid having a standby server lagging too far behind Postgres will cancel long running

> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum

> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to

> > allow for longer running queries on the standby server bearing in mind that you could end

> > up with data loss if the master fails at the wrong moment.

Yes, increasing "max_standby_streaming_delay" is the correct solution.

You can set it to -1 to prevent any queries on the standby from bein cancelled.

> > A working alternative is to have one standby server exclusively for replication purposes

> > and another standby for reporting/read-only queries where you can increase the

> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will

> > require additional computing and storage resources.

That is good advice.

> > > 

> This is all true, but the hot_standby_feedback option is the way to get around needing to

> worry about replication delay all together.

No, because there are other kinds of replication conflicts.  The most frequent are:

- lock conflicts

  They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with

  a query on the standby.  The most frequent cause is VACUUM truncation (which can

  be disabled for individual tables).

- buffer pin conflicts

  It depends on the workload if you get them, but you cannot get rid of them.

Yours,

Laurenz Albe