Re: streaming replication timeout error

Adrian Klaver <adrian.klaver@xxxxxxxxx> · Wed, 09 Oct 2013 19:15:46 -0700

On 10/09/2013 05:51 PM, 高健 wrote:
Hello:

Thanks for replying.

The recovery.conf file on the standby (DB2) looks like this:

standby_mode             = 'on'
primary_conninfo         = 'host=DB1 port=5432 application_name=testpg user=postgres connect_timeout=10 keepalives_idle=5 keepalives_interval=1'
recovery_target_timeline = 'latest'
restore_command          = 'scp -o "ConnectTimeout 5" -i /opt/PostgresPlus/9.2AS/.ssh/id_edb DB1:/opt/PostgresPlus/9.2AS/data/arch/%f %p'

I am not familiar with the scp command, but I think scp is used here
to copy archived WAL files from the primary to the standby.

Maybe the ConnectTimeout is too small, and when the network is
unreliable the restore_command fails and returns a FATAL error?

In fact I am a little confused about restore_command. We are using
streaming replication, so why is restore_command still needed to copy
archived WAL files? Isn't that the old warm standby (file shipping)?

Best explanation is in the docs:

http://www.postgresql.org/docs/9.3/static/warm-standby.html
"
At startup, the standby begins by restoring all WAL available in the 
archive location, calling restore_command. Once it reaches the end of 
WAL available there and restore_command fails, it tries to restore any 
WAL available in the pg_xlog directory. If that fails, and streaming 
replication has been configured, the standby tries to connect to the 
primary server and start streaming WAL from the last valid record found 
in archive or pg_xlog. If that fails or streaming replication is not 
configured, or if the connection is later disconnected, the standby goes 
back to step 1 and tries to restore the file from the archive again. 
This loop of retries from the archive, pg_xlog, and via streaming 
replication goes on until the server is stopped or failover is triggered 
by a trigger file.
"

Basically, by setting both restore_command and primary_conninfo you are
telling the standby to do both, following the sequence described above.
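
As a rough illustration of that sequence, here is a simplified model of the retry loop written as a shell function. It is only a sketch of the documented behavior, not how the server implements it; the function name next_wal_source and the ok/fail result labels are made up for this example.

```shell
# next_wal_source CURRENT RESULT
# Simplified model of the standby's retry loop described in the docs.
# CURRENT is the source just tried (archive, pg_xlog, streaming);
# RESULT is "ok" if WAL was obtained from it, "fail" otherwise.
# Prints the source the standby would try next.
next_wal_source() {
    case "$1:$2" in
        archive:ok)     echo archive ;;    # keep restoring while restore_command succeeds
        archive:fail)   echo pg_xlog ;;    # archive exhausted, look in pg_xlog
        pg_xlog:ok)     echo pg_xlog ;;    # keep replaying local WAL
        pg_xlog:fail)   echo streaming ;;  # nothing local, connect to the primary
        streaming:ok)   echo streaming ;;  # stay connected and keep streaming
        streaming:fail) echo archive ;;    # connection failed or lost, back to step 1
        *)              return 2 ;;        # unknown input
    esac
}
```

For example, `next_wal_source streaming fail` prints `archive`, which is why restore_command is still called on a streaming standby whenever the connection to the primary drops.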

FYI, ConnectTimeout is an SSH option passed to scp.

man ssh_config will get you more information.
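
If you want to experiment with a longer timeout, one possible approach is to move the scp call into a small wrapper. The sketch below is an assumption on my part, not something from your setup: the function name restore_wal, the COPY_CMD, ARCHIVE and KEY variables, and the values 30 and 3 are all illustrative. ConnectTimeout and ConnectionAttempts are both documented in ssh_config. The wrapper reports any copy failure as a plain exit status 1, since the docs say recovery aborts if the command is killed by a signal or the shell reports an error such as command not found.

```shell
# Hypothetical restore_command wrapper; names and values are illustrative.
# It could be called from recovery.conf as (path is an assumption):
#   restore_command = '/path/to/restore_wal.sh %f %p'
# with the script ending in:  restore_wal "$1" "$2"

# Copy program; defaults to scp, overridable for testing.
COPY_CMD="${COPY_CMD:-scp}"
# Archive location and key taken from the recovery.conf quoted above.
ARCHIVE="${ARCHIVE:-DB1:/opt/PostgresPlus/9.2AS/data/arch}"
KEY="${KEY:-/opt/PostgresPlus/9.2AS/.ssh/id_edb}"

restore_wal() {
    wal_file="$1"    # %f: name of the WAL file to fetch
    dest_path="$2"   # %p: where the server wants it copied
    # Longer connect timeout and a few attempts (example values only).
    if "$COPY_CMD" -o ConnectTimeout=30 -o ConnectionAttempts=3 \
           -i "$KEY" "$ARCHIVE/$wal_file" "$dest_path"; then
        return 0
    fi
    # Any failure becomes a plain non-zero status so the standby moves on.
    return 1
}
```

Whether a longer timeout actually helps depends on how the network fails, so treat the numbers as a starting point for testing.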

It would seem that both your streaming and your archiving use the same
network. Is that correct?

If so, you have a single point of failure: the network.

Best Regards
jian gao

--
Adrian Klaver
adrian.klaver@xxxxxxxxx
