Hello Everyone!
I can simplify and describe the issue I faced.
I have two nodes in a database cluster: a master and a standby.
I created a simple table on the master node with this command via psql:
CREATE TABLE table1 (a INTEGER);
After this I fill the table with the COPY command from a file that
contains 2000000 (2 million) entries.
Then, when I run a command such as:
UPDATE table1 SET a='1'
or such a command:
DELETE FROM table1;
I see this entry in the PostgreSQL log:
terminating walsender process due to replication timeout.
I suppose this issue is caused by the small value of wal_sender_timeout=1s
combined with the long runtime of the queries (about 15 seconds).
What is the best way to handle this?
How can I avoid it? Is there any additional configuration that can help here?
I have set mine to 15min. No problems for over 7 months, knock on wood.
Regards,
Andrei
From: Andrei Yahorau/IBA
To: Kyotaro HORIGUCHI <horiguchi.kyotaro@xxxxxxxxxxxxx>
Cc: pgsql-general@xxxxxxxxxxxxxx, rene.romero.b@xxxxxxxxx
Date: 17/05/2019 11:04
Subject: Re: terminating walsender process due to replication timeout
Hello.
Thanks for the answer.
Can frequent database operations cause a standby server to fall behind?
Is there a way to avoid this situation?
In my test, walsender works well if I set wal_sender_timeout to at
least 5 seconds.
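A minimal sketch of how the timeout could be changed from psql on the master, assuming superuser access and PostgreSQL 9.4 or later (needed for ALTER SYSTEM). The 5s value is simply the one from the test above, not a recommendation:

```sql
-- Check the current value on the master.
SHOW wal_sender_timeout;

-- Persist a new value in postgresql.auto.conf
-- (assumed value: 5s, taken from the test described above).
ALTER SYSTEM SET wal_sender_timeout = '5s';

-- wal_sender_timeout does not require a restart; reload the configuration.
SELECT pg_reload_conf();
```

The same setting can instead be written in postgresql.conf as wal_sender_timeout = 5s, followed by a reload.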
Best regards,
Andrei Yahorau
From: Kyotaro HORIGUCHI <horiguchi.kyotaro@xxxxxxxxxxxxx>
To: AYahorau@xxxxxxxxxxx
Cc: rene.romero.b@xxxxxxxxx, pgsql-general@xxxxxxxxxxxxxx
Date: 16/05/2019 10:36
Subject: Re: terminating walsender process due to replication timeout
Hello.
At Wed, 15 May 2019 10:04:12 +0300, AYahorau@xxxxxxxxxxx wrote in
<OF99D0D839.6A5BCB70-ON432583FB.0025912E-432583FB.0026D664@xxxxxx>
> Hello,
> Thank You for the response.
>
> Yes that's possible to monitor replication delay. But my questions were
> not about monitoring network issues.
>
> I use exactly wal_sender_timeout=1s because it allows to detect
> replication problems quickly.
Though I don't have an exact idea of your configuration, it seems
to me that your standby is simply falling more than one second
behind the master. If you regard that as a replication problem,
then the configuration is detecting the problem correctly.
Since the keep-alive packet is sent in-band, it cannot reach the
standby ahead of packets that have already been sent but not yet
processed.
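A rough way to see whether the standby is falling behind while the UPDATE runs is to query pg_stat_replication on the master. This is only a sketch, assuming PostgreSQL 10 or later; the function and column names differ in 9.x releases (for example sent_location instead of sent_lsn):

```sql
-- Run on the master while the long UPDATE/DELETE is in progress.
SELECT application_name,
       state,
       sent_lsn,
       replay_lsn,
       -- bytes of WAL generated on the master but not yet replayed by the standby
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes,
       -- time-based lag estimates; may be NULL when the standby is idle and caught up
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;
```

Lag values that approach or exceed the configured wal_sender_timeout during the statement would be consistent with the explanation above.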
> So, I need clarification to the following questions:
> Is it possible to use exactly this configuration and be sure that it will
> work properly?
> What did I do wrong? Should I correct my configuration somehow?
> Is this the same issue as mentioned here:
> https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832510@xxxxxxxxxxxxxxx
> ? If it is so, why do I face this problem again?
It is not the same "problem". What was mentioned there is a fast
network keeping the sender-side loop busy, which prevents the
keepalive packet from being sent.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt