On 10/09/2013 05:51 PM, 高健 wrote:
Hello:
Thanks for replying.
The recovery.conf file on the standby (DB2) looks like this:
standby_mode = 'on'
primary_conninfo = 'host=DB1 port=5432 application_name=testpg user=postgres connect_timeout=10 keepalives_idle=5 keepalives_interval=1'
recovery_target_timeline = 'latest'
restore_command = 'scp -o "ConnectTimeout 5" -i /opt/PostgresPlus/9.2AS/.ssh/id_edb DB1:/opt/PostgresPlus/9.2AS/data/arch/%f %p'
I am not familiar with the scp command, but I think it is used here
to copy archived WAL files from the primary to the standby.
Maybe the ConnectTimeout is too small, and when the network is
unreliable the restore_command will sometimes fail and return a
FATAL error?
In fact I am a little confused about restore_command. We are using
streaming replication, so why is restore_command still needed to copy
archived WAL? Isn't that the old warm standby (file shipping) method?
The best explanation is in the docs:
http://www.postgresql.org/docs/9.3/static/warm-standby.html
"
At startup, the standby begins by restoring all WAL available in the
archive location, calling restore_command. Once it reaches the end of
WAL available there and restore_command fails, it tries to restore any
WAL available in the pg_xlog directory. If that fails, and streaming
replication has been configured, the standby tries to connect to the
primary server and start streaming WAL from the last valid record found
in archive or pg_xlog. If that fails or streaming replication is not
configured, or if the connection is later disconnected, the standby goes
back to step 1 and tries to restore the file from the archive again.
This loop of retries from the archive, pg_xlog, and via streaming
replication goes on until the server is stopped or failover is triggered
by a trigger file.
"
Basically, by setting both restore_command and primary_conninfo you are
telling the standby to use both archive restore and streaming, following
the sequence described above.
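
To make the sequence in the quoted docs concrete, here is a rough Python simulation of the retry loop. It is only a sketch of the documented order (archive, then pg_xlog, then streaming, then back to the archive), not how the server is actually implemented. The names simulate_standby and scripted are made up for illustration, and the real standby keeps looping until it is stopped or promoted rather than for a fixed number of passes.

```python
from typing import Callable, Dict, List

# Order in which the standby tries WAL sources, per the quoted documentation.
SOURCES = ("archive", "pg_xlog", "stream")


def simulate_standby(fetch: Dict[str, Callable[[], bool]], passes: int) -> List[str]:
    """Simulate `passes` trips through the archive -> pg_xlog -> stream loop.

    fetch[source]() returns True if one more piece of WAL was obtained from
    that source, and False once the source is exhausted or fails.
    """
    log: List[str] = []
    for _ in range(passes):  # assumption: bounded here; the server loops indefinitely
        for source in SOURCES:
            # Keep using the current source for as long as it delivers WAL.
            while fetch[source]():
                log.append(source + ": replayed WAL")
            log.append(source + ": failed, trying next source")
    return log


def scripted(results: List[bool]) -> Callable[[], bool]:
    """Hypothetical stand-in for a WAL source: canned results, then False forever."""
    it = iter(results)
    return lambda: next(it, False)


if __name__ == "__main__":
    fetchers = {
        "archive": scripted([True, True]),  # restore_command succeeds twice, then fails
        "pg_xlog": scripted([]),            # nothing usable in pg_xlog
        "stream": scripted([True]),         # streaming works once, then disconnects
    }
    for line in simulate_standby(fetchers, passes=1):
        print(line)
```

The point to notice is that a failing restore_command is part of the normal cycle: it is simply the signal to move on to pg_xlog and then to streaming.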
FYI, ConnectTimeout is an SSH option, passed to scp with -o.
man ssh_config will give you more information.
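
As an illustration of how such options reach scp, here is a small Python sketch that builds the argument list for the same copy. The key file and archive directory are taken from your recovery.conf. The function name build_scp_argv, the default values, and the use of ConnectionAttempts (another ssh_config option that sets the number of connection tries) are my assumptions, not something from your setup.

```python
from typing import List

# Paths copied from the recovery.conf quoted above.
KEY_FILE = "/opt/PostgresPlus/9.2AS/.ssh/id_edb"
ARCHIVE_DIR = "DB1:/opt/PostgresPlus/9.2AS/data/arch"


def build_scp_argv(wal_file: str, dest_path: str,
                   connect_timeout: int = 5,
                   connection_attempts: int = 1) -> List[str]:
    """Build the scp argument list for fetching one archived WAL file.

    wal_file and dest_path correspond to %f and %p in restore_command.
    Each -o passes one ssh_config option in key=value form.
    """
    return [
        "scp",
        "-o", f"ConnectTimeout={connect_timeout}",          # seconds to wait for the TCP connect
        "-o", f"ConnectionAttempts={connection_attempts}",  # assumed extra option: connection tries
        "-i", KEY_FILE,
        f"{ARCHIVE_DIR}/{wal_file}",
        dest_path,
    ]


if __name__ == "__main__":
    # Example values only; the WAL file name is made up.
    print(" ".join(build_scp_argv("000000010000000000000003",
                                  "pg_xlog/RECOVERYXLOG",
                                  connect_timeout=10,
                                  connection_attempts=3)))
```

Whether a larger timeout or more attempts actually helps depends on how your network misbehaves, so treat the numbers as placeholders.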
It would seem that both your streaming and your archive restore use the
same network; is that correct?
If so, you have a single point of failure: the network.
Best Regards
jian gao
--
Adrian Klaver
adrian.klaver@xxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)