Re: postgres wal sender replication timeout during pg_basebackup

Peter Brunnengräber wrote:
>    I brought the database files over from the current single production postgres server. By this I
> mean I shut down postgres, tar-ed up the data directory, and copied it over to the cluster's Master
> node. I put the files in place, set the permissions, and was able to start-up postgres on the Master
> via corosync just fine.
> 
>    In preparing the slave, I used the pg_basebackup tool to bring the database over from the Master
> and this is where I keep having issues. As it is transferring, at about 57% I see the error:
> 
> >  $ pg_basebackup -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
> >  pg_basebackup: could not receive data from WAL stream: SSL connection has been closed unexpectedly
> >  176472/176472 kB (100%), 1/1 tablespace
> >  pg_basebackup: child process exited with error 1
> 
> And on the server, I see:
> 
> >  2016-04-06 21:05:31 UTC LOG:  terminating walsender process due to replication timeout
> 
>   But the transfer doesn't stop and keeps going to completion.
> 
>   I found this [http://dba.stackexchange.com/questions/59916/streaming-replication-log-is-puzzling-me]
> question on stackexchange about setting "ssl_renegotiation_limit" to 0, but this didn't make much
> difference.
> 
>   Anyone have any ideas? I didn't find any reference to this problem in the mailing list archives. I
> am completely baffled as to why this would error, but keep on going. Maybe this isn't a problem at
> all?  It is the same procedure I used in the lab setup... the only difference is that the production
> database is much bigger in size.

ssl_renegotiation_limit would also have been my first guess.
What PostgreSQL version are you running?

The server error message means that the client did not send a status update
within "wal_sender_timeout" milliseconds, see
http://www.postgresql.org/docs/current/static/runtime-config-replication.html#GUC-WAL-SENDER-TIMEOUT

Does pg_basebackup succeed if you set "wal_sender_timeout" to zero?
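
For reference, a minimal sketch of that test as a postgresql.conf fragment on the
master. It assumes the setting is changed in the configuration file and the server
is then reloaded (for example with "pg_ctl reload"). Note that the parameter was
named "replication_timeout" in 9.1 and 9.2 and was renamed to "wal_sender_timeout"
in 9.3; the data directory path in the report suggests 9.2, but the version is not
confirmed here, so both spellings are shown.

```
# postgresql.conf on the master (illustrative fragment, not the poster's actual file)
# A value of 0 disables the walsender timeout; the default is 60s.
wal_sender_timeout = 0        # PostgreSQL 9.3 and later
#replication_timeout = 0      # same setting under its 9.1/9.2 name
```

Only one of the two lines should be active for a given server version; an unknown
parameter name will be rejected when the configuration is read.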

Is there a firewall between client and server that could swallow such messages?

Could you try without SSL (e.g. set the environment variable PGSSLMODE to "disable")
and see if that makes the problem go away?
Avoiding SSL will also greatly speed up pg_basebackup.
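
A minimal sketch of the no-SSL test, assuming a POSIX shell on the slave. The
helper name "run_without_ssl" is hypothetical and only there so that PGSSLMODE is
set for a single command rather than for the whole session; libpq-based clients
such as pg_basebackup read PGSSLMODE from the environment. This can only work if
pg_hba.conf on the master also accepts non-SSL connections for the replication
user (a "host" entry rather than only "hostssl").

```shell
# Hypothetical helper: run one command with SSL disabled for libpq.
# The variable is set only in the environment of that command.
run_without_ssl() {
    PGSSLMODE=disable "$@"
}

# Usage, with the same invocation as in the original report:
#   run_without_ssl pg_basebackup -h db-master -U u_repl \
#       -D /db/data/postgresql/9.2/main/ -X stream -P
```

If the base backup completes without the walsender timeout in this mode, that
points at the SSL layer rather than at the network path.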

Yours,
Laurenz Albe

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)