> I see that if no standby is connected to the master when
> synchronous_standby_names = '*', all commits are delayed until a
> standby connects to the master. That is good.

So I think commits are synchronous between the master and the standby. But why does the master delete a WAL segment before the standby has committed it, while the standby is connected?

-----Original Message-----
From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Condor
Sent: April 9, 2012 21:33
To: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: [streaming replication] 9.1.3 streaming replication bug ?

On 09.04.2012 13:33, 乔志强 wrote:
> I use postgresql-9.1.3-1-windows-x64.exe on Windows 2008 R2 x64.
>
> 1 master and 1 standby. The standby is a synchronous standby using
> streaming replication (synchronous_standby_names = '*', archive_mode =
> off).
> The master output:
>     standby "walreceiver" is now the synchronous standby with priority 1
> The standby output:
>     LOG: streaming replication successfully connected to primary
>
> Then I run a test program that writes and commits large blobs (random
> size, 10 to 1000 MB) to the master server in a loop, using 40 threads
> (40 sessions). The master and standby run on the same machine, and the
> client runs on another machine over a 100 Mbps network.
>
>
> But after some minutes the master output:
>     requested WAL segment XXX has already been removed
> The standby output:
>     FATAL: could not receive data from WAL stream: FATAL: requested WAL segment XXX has already been removed
>
>
> Question:
> Why does the master delete the WAL segment before sending it to the
> standby in synchronous mode? Is this a streaming replication bug?
>
>
> I see that if no standby is connected to the master when
> synchronous_standby_names = '*', all commits are delayed until a
> standby connects to the master. That is good.
>
> Use a bigger wal_keep_segments? But I think the master should keep
> all WAL segments not yet sent to an online standby (sync or async).
> wal_keep_segments should be only for offline standbys.
>
> If synchronous_standby_names is used for a sync standby and no standby
> is online, all commits are delayed until a standby connects to the
> master, so wal_keep_segments is actually only for offline async
> standbys.
>
>
>
> ////////////////////////////////////////
>
> master server output:
> LOG: database system was interrupted; last known up at 2012-03-30 15:37:03 HKT
> LOG: database system was not properly shut down; automatic recovery in progress
>
> LOG: redo starts at 0/136077B0
> LOG: record with zero length at 0/17DF1E10
> LOG: redo done at 0/17DF1D98
> LOG: last completed transaction was at log time 2012-03-30 15:37:03.148+08
> FATAL: the database system is starting up
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> ///////////////////// the standby is a synchronous standby
> LOG: standby "walreceiver" is now the synchronous standby with priority 1
> /////////////////////
> LOG: checkpoints are occurring too frequently (16 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (23 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (24 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (20 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (22 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> FATAL: requested WAL segment 000000010000000000000032 has already been removed
> FATAL: requested WAL segment 000000010000000000000032 has already been removed
> FATAL: requested WAL segment 000000010000000000000032 has already been removed
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> FATAL: requested WAL segment 000000010000000000000032 has already been removed
>
>
>
> ////////////////////////
> standby server output:
> LOG: database system was interrupted while in recovery at log time 2012-03-30 14:44:31 HKT
> HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> LOG: entering standby mode
> LOG: redo starts at 0/16E4760
> LOG: consistent recovery state reached at 0/12D984D8
> LOG: database system is ready to accept read only connections
> LOG: record with zero length at 0/17DF1E68
> LOG: invalid magic number 0000 in log file 0, segment 50, offset 6946816
> LOG: streaming replication successfully connected to primary
> FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000000000000032 has already been removed

Well, that is not a bug. Just set archive_mode = on on the master server and also set wal_keep_segments = 1000, for example, to avoid that situation. I had the same problem, and after digging through search engines, those were the recommended settings. I forget the real reason why; maybe sending / receiving data between master and slave was too slow, but this fixed the problem.

Regards,
Condor

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
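
For reference, the settings suggested in Condor's reply might look roughly like the fragment below in postgresql.conf on the master. This is only a sketch for a 9.1 server on Windows. The thread names just archive_mode and wal_keep_segments; the wal_level line, the archive_command line, and the archive directory C:\server\archivedir are assumptions added here for illustration.

```conf
# postgresql.conf on the master (PostgreSQL 9.1) -- illustrative sketch only

# Assumption: not stated in the thread. 9.1 refuses to start with
# archive_mode = on unless wal_level is "archive" or "hot_standby".
wal_level = hot_standby

# From the reply: turn on WAL archiving (takes effect only after a restart).
archive_mode = on

# Assumption: archiving needs a command to copy each finished segment.
# The target directory is a hypothetical path; adjust it to the real one.
archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'

# From the reply: keep at least this many finished WAL segments in pg_xlog
# so that a lagging standby can still fetch them over streaming replication.
wal_keep_segments = 1000
```

With the default 16 MB segment size, wal_keep_segments = 1000 can hold roughly 16 GB of WAL in pg_xlog, so the value should be sized against the available disk space. For the standby to fall back on the archive once a segment is gone from the master, it would also need a matching restore_command in its recovery.conf, which the thread does not show.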