Search Postgresql Archives

Re: Postgresql 11: terminating walsender process due to replication timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola <abhishek.bhola@xxxxxxxxxxxxxxx> wrote in 
> I have found some questions about the same error, but didn't find any of
> them answering my problem.
> 
> The setup is that I have two Postgres11 clusters (A and B) and they are
> making use of publication and subscription features to copy data from A to
> B.
> 
> A (source DB- publication) --------------> B (target DB - subscription)
> 
> This works fine, but often (not always) when the data volume being inserted
> on a table in node A increases, it gives the following error.
> 
> "terminating walsender process due to replication timeout"
> 
> The data volume at the moment being entered is about 30K rows per second
> continuously for hours through COPY command.
> 
> Earlier the wal_sender_timeout was set to 5 sec and I would see this error
> much often. I then increased it to 1 min and the frequency of this error
> reduced. But I don't want to keep increasing it without understanding what
> is causing it. I looked at the code of walsender.c and know the exact lines
> where it's coming from.
> 
> But I am still not clear which parameter is making the sender assume that
> the receiver node is inactive and therefore it should stop the wal_sender.
> 
> Can anyone please suggest what changes I should make to remove this error?

What minor-version is the Postgres server mentioned? PostgreSQL 11
have gotten the following fix at 11.6, which could be related to the
trouble.

https://www.postgresql.org/docs/11/release-11-6.html

> Fix timeout handling in logical replication walreceiver processes
> (Julien Rouhaud)
> 
> Erroneous logic prevented wal_receiver_timeout from working in
> logical replication deployments.

The details of the fix is here.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098
> Fix timeout handling in logical replication worker
> 
> The timestamp tracking the last moment a message is received in a
> logical replication worker was initialized in each loop checking if a
> message was received or not, causing wal_receiver_timeout to be ignored
> in basically any logical replication deployments.  This also broke the
> ping sent to the server when reaching half of wal_receiver_timeout.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux