Re: Streaming replication - 11.5

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Fri, 13 Mar 2020 08:15:49 -0700

On 3/13/20 4:11 AM, Nicola Contu wrote:
So in the logs I now see this :

2020-03-13 11:03:42 GMT [10.150.20.22(45294)] [27804]: [1-1] 
db=[unknown],user=replicator LOG:  terminating walsender process due to 
replication timeout

Yeah, that's been showing up in the log snippets you have been posting.

To figure this out you will need to:

1) Make a list of what changed since the last time replication worked 
consistently.

2) Monitor the changed components: start logging or increase the logging level.

3) Monitor the chain of replication as a whole, to catch changes that you 
do not know about. Since you seem to be operating across data centers, 
that would include verifying the network.
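
As a starting point for step 2, here is a minimal Python sketch that pulls the walsender timeout events out of the server log so they can be lined up against network or disk events. It assumes the log lines follow the same prefix as the line quoted above (timestamp, time zone, client address and port, pid); the function names are hypothetical and the pattern will need adjusting if your log_line_prefix differs.

```python
import re
from collections import Counter

# Assumption: each log entry is a single line shaped like the one quoted above:
# "<date> <time> <tz> [<client ip>(<port>)] [<pid>]: ... <message>"
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"\[(?P<ip>[\d.]+)\((?P<port>\d+)\)\] \[(?P<pid>\d+)\]: .*"
    r"terminating walsender process due to replication timeout"
)


def find_walsender_timeouts(lines):
    """Return (timestamp, client_ip, pid) for every walsender timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match.group("ts"), match.group("ip"), int(match.group("pid")))
            )
    return hits


def timeouts_per_hour(hits):
    """Count timeouts per 'YYYY-MM-DD HH' bucket to show when they cluster."""
    return Counter(ts[:13] for ts, _ip, _pid in hits)
```

Feeding it the lines of the primary's log file (for example via open() on whatever path your logging is configured to use) gives a list of events and an hourly count, which makes it easier to see whether the timeouts cluster around particular times.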

So I tried increasing wal_sender_timeout to 300s, but it did not help.

On Thu, 12 Mar 2020 at 15:56, Nicola Contu 
<nicola.contu@xxxxxxxxx> wrote:

    The encryption is at the OS level, so the drives where the db saves
    data are encrypted with a password.

    On Thu, 12 Mar 2020 at 15:51, Adrian Klaver
    <adrian.klaver@xxxxxxxxxxx> wrote:

        On 3/12/20 4:31 AM, Nicola Contu wrote:
         > The replicator is ok and the replicated as well.
         > %Cpu(s):  0.2 us,  1.0 sy,  0.0 ni, 94.8 id,  4.0 wa,  0.0 hi,  0.0 si,  0.0 st
         >
         > CPU is really low on both.
         >
         > I am running pg_basebackup again every time.
         > Any other suggestions?
         >

        I have to believe there is a connection between changing to
        encrypting the disks and your issues. I'm not sure what it is,
        but to help narrow it down: how is the encryption being done,
        and what program is being used?

        -- 
        Adrian Klaver
        adrian.klaver@xxxxxxxxxxx

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx