I was able to make pg_basebackup work using --max-rate=128M.
I still don't understand why. I guess it is related to the encryption and the slowness of the disk.
Do you have any idea?
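For reference, here is how the --max-rate value itself is interpreted, according to the pg_basebackup documentation. Values are kilobytes per second by default, an `M` suffix means megabytes per second (`k` is accepted and changes nothing), and the valid range is 32 kB/s to 1024 MB/s. The minimal Python sketch below converts the value and gives a lower bound on transfer time. `parse_max_rate` and `min_transfer_seconds` are hypothetical helpers, not part of PostgreSQL, and they accept only whole numbers.

```python
import re

# Documented bounds for pg_basebackup --max-rate, in kB/s (32 kB/s .. 1024 MB/s).
MIN_KB = 32
MAX_KB = 1024 * 1024

def parse_max_rate(value: str) -> int:
    """Convert a --max-rate value such as '128M' or '32k' to kB/s (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([kM]?)", value.strip())
    if match is None:
        raise ValueError(f"invalid rate: {value!r}")
    number, suffix = int(match.group(1)), match.group(2)
    # 'M' means megabytes per second; no suffix or 'k' means kilobytes per second.
    kb = number * 1024 if suffix == "M" else number
    if not MIN_KB <= kb <= MAX_KB:
        raise ValueError(f"rate out of range: {value!r}")
    return kb

def min_transfer_seconds(data_bytes: int, max_rate: str) -> float:
    """Lower bound on how long data_bytes takes when throttled to max_rate."""
    return data_bytes / (parse_max_rate(max_rate) * 1024)
```

For example, `min_transfer_seconds(500 * 1024**3, "128M")` returns 4000.0, so a hypothetical 500 GiB cluster would need over an hour at that cap.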
On Fri, 13 Mar 2020 at 16:15, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 3/13/20 4:11 AM, Nicola Contu wrote:
> So in the logs I now see this :
>
> 2020-03-13 11:03:42 GMT [10.150.20.22(45294)] [27804]: [1-1]
> db=[unknown],user=replicator LOG: terminating walsender process due to
> replication timeout
Yeah, that's been showing up in the log snippets you have been posting.
To figure this out you will need to:
1) Make a list of what changed since the last time replication worked
consistently.
2) Monitor the changed components, start logging or increase logging.
3) Monitor the replication chain as a whole, to catch changes that you
do not know about. Since you seem to be operating across data centers,
that would include verifying the network.
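Steps 2 and 3 could start from the server log. Below is a minimal Python sketch that pulls out each walsender timeout so the events can be lined up against network and disk metrics. It assumes a log_line_prefix of `'%t [%r] [%p]: [%l-1] db=%d,user=%u '`, which is inferred from the log line posted in this thread and not confirmed. `walsender_timeouts` and `timeouts_per_client` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: log_line_prefix = '%t [%r] [%p]: [%l-1] db=%d,user=%u '
# (inferred from the log line posted in this thread, not confirmed).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+) "  # %t, e.g. '... GMT'
    r"\[(?P<host>[^\](]+)\((?P<port>\d+)\)\] "            # %r, host(port)
    r"\[(?P<pid>\d+)\]: "                                 # %p, backend PID
    r".*?LOG:\s+terminating walsender process due to replication timeout"
)

def walsender_timeouts(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (timestamp, client_host, pid) for each walsender timeout line."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield match.group("ts"), match.group("host"), int(match.group("pid"))

def timeouts_per_client(lines: Iterable[str]) -> Counter:
    """Count walsender timeouts per client address."""
    return Counter(host for _, host, _ in walsender_timeouts(lines))
```

Reading a log file through `timeouts_per_client(open(path))` (with `path` being wherever the server log lives) shows which standby is affected; the timestamps from `walsender_timeouts` are what you would correlate with other monitoring.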
>
> So I tried increasing wal_sender_timeout to 300s, but it did not help.
>
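Raising wal_sender_timeout only helps if the receiving side's silences are shorter than the new value. Here is a simplified Python model of that rule. It assumes the walsender drops the connection once no message has arrived from the client for wal_sender_timeout seconds, and that 0 disables the check. `first_timeout` and its arguments are hypothetical, and keepalive traffic is ignored.

```python
from typing import Optional, Sequence

def first_timeout(reply_times: Sequence[float], timeout: float,
                  end: float) -> Optional[float]:
    """Return when a walsender would give up, or None if it never does.

    Simplified model (hypothetical): the connection starts at t=0 and is
    dropped once `timeout` seconds pass with no message from the receiver.
    reply_times: sorted arrival times (seconds) of receiver messages.
    timeout: wal_sender_timeout in seconds (0 disables the check).
    end: time at which the observation window closes.
    """
    if timeout <= 0:
        return None
    last = 0.0
    for t in list(reply_times) + [end]:
        # Roughly the server's "now >= last reply + timeout" condition.
        if t - last >= timeout:
            return last + timeout
        last = t
    return None
```

In this model, `first_timeout([50, 400], timeout=300, end=500)` returns 350, meaning a single 350-second gap still ends the connection with a 300s timeout. That would be consistent with the longer timeout not helping if the receiving side stalls for long stretches, though nothing in the thread confirms such stalls.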
> On Thu, 12 Mar 2020 at 15:56, Nicola Contu
> <nicola.contu@xxxxxxxxx> wrote:
>
> The encryption is at the OS level. The drives where the db stores its
> data are encrypted with a password.
>
> On Thu, 12 Mar 2020, 15:51, Adrian Klaver <adrian.klaver@xxxxxxxxxxx>
> wrote:
>
> On 3/12/20 4:31 AM, Nicola Contu wrote:
> > The replicator is OK, and so is the replica.
> > %Cpu(s): 0.2 us, 1.0 sy, 0.0 ni, 94.8 id, 4.0 wa, 0.0 hi, 0.0 si, 0.0 st
> >
> > CPU is really low on both.
> >
> > I am running pg_basebackup again every time.
> > Any other suggestions?
> >
>
> I have to believe there is a connection between switching to
> encrypted disks and your issues. Not sure what, but to help: how is the
> encryption being done, and what program is being used?
>
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx