Re: streaming replication does not work across datacenter with 20ms latency?

Yan Chunlu <springrider@xxxxxxxxx> · Mon, 25 Jul 2011 00:20:15 +0800

I ran through the SR setup procedure again, still no luck.

Is it normal that, after starting PostgreSQL on the slave, the first line of the log is:
database system was interrupted; last known up at 2011-07-24 10:53:38 CDT?

4760 2011-07-24 10:55:58 CDT 2011-07-24 10:55:58 CDT @ LOG:  database system was interrupted; last known up at 2011-07-24 10:53:38 CDT
4760 2011-07-24 10:55:58 CDT 2011-07-24 10:55:58 CDT @ LOG:  entering standby mode
4762 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT postgres@postgres [local]FATAL:  the database system is starting up
4761 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT @ LOG:  streaming replication successfully connected to primary
4764 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT postgres@postgres 10.28.53.11(53442)FATAL:  the database system is starting up
4770 2011-07-24 10:56:00 CDT 2011-07-24 10:56:00 CDT postgres@postgres [local]FATAL:  the database system is starting up
4802 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT postgres@postgres [local]FATAL:  the database system is starting up
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG:  redo starts at 57/6B002028
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG:  invalid record length at 57/6B20E010
4761 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ FATAL:  terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG:  invalid magic number 0000 in log file 87, segment 107, offset 2490368
4847 2011-07-24 10:56:02 CDT 2011-07-24 10:56:02 CDT postgres@postgres [local]FATAL:  the database system is starting up
4850 2011-07-24 10:56:02 CDT 2011-07-24 10:56:02 CDT postgres@postgres 10.28.53.11(53443)FATAL:  the database system is starting up
4851 2011-07-24 10:56:03 CDT 2011-07-24 10:56:03 CDT postgres@postgres [local]FATAL:  the database system is starting up
4860 2011-07-24 10:56:04 CDT 2011-07-24 10:56:04 CDT postgres@postgres [local]FATAL:  the database system is starting up
4865 2011-07-24 10:56:05 CDT 2011-07-24 10:56:05 CDT postgres@postgres [local]FATAL:  the database system is starting up
4859 2011-07-24 10:56:05 CDT 2011-07-24 10:56:05 CDT @ LOG:  streaming replication successfully connected to primary
4874 2011-07-24 10:56:06 CDT 2011-07-24 10:56:06 CDT postgres@postgres [local]FATAL:  the database system is starting up
4869 2011-07-24 10:56:06 CDT 2011-07-24 10:56:06 CDT postgres@template1 10.28.53.11(53444)FATAL:  the database system is starting up
4879 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT postgres@postgres [local]FATAL:  the database system is starting up
4760 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ LOG:  invalid record length at 57/6B2BA010
4859 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ FATAL:  terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ LOG:  invalid magic number 0000 in log file 87, segment 107, offset 2883584
4887 2011-07-24 10:56:08 CDT 2011-07-24 10:56:08 CDT postgres@postgres [local]FATAL:  the database system is starting up
4888 2011-07-24 10:56:08 CDT 2011-07-24 10:56:08 CDT @ LOG:  streaming replication successfully connected to primary
4892 2011-07-24 10:56:09 CDT 2011-07-24 10:56:09 CDT postgres@postgres [local]FATAL:  the database system is starting up
4896 2011-07-24 10:56:09 CDT 2011-07-24 10:56:09 CDT postgres@template1 10.28.53.11(53445)FATAL:  the database system is starting up
4901 2011-07-24 10:56:10 CDT 2011-07-24 10:56:10 CDT postgres@postgres [local]FATAL:  the database system is starting up
4906 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT postgres@postgres [local]FATAL:  the database system is starting up
4760 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ LOG:  invalid record length at 57/6B486010
4888 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ FATAL:  terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ LOG:  invalid magic number 0000 in log file 87, segment 107, offset 4849664
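
To relate the two kinds of position in the log above ("57/6B20E010" versus "log file 87, segment 107, offset ..."), here is a rough sketch of the arithmetic. It assumes the default 16 MB WAL segment size and timeline 1, neither of which is confirmed by the log; the function names are made up for illustration.

```python
# Assumption: default WAL segment size of 16 MB (not confirmed by the log).
XLOG_SEG_SIZE = 16 * 1024 * 1024


def locate(lsn):
    """Split an 'X/Y' hex log position into (log file, segment, offset)."""
    log_hex, off_hex = lsn.split("/")
    log_id = int(log_hex, 16)
    rec_off = int(off_hex, 16)
    return log_id, rec_off // XLOG_SEG_SIZE, rec_off % XLOG_SEG_SIZE


def segment_file_name(timeline, log_id, segment):
    """Build the 24-hex-digit WAL file name: timeline, log file, segment."""
    return "%08X%08X%08X" % (timeline, log_id, segment)


if __name__ == "__main__":
    log_id, segment, offset = locate("57/6B20E010")
    print(log_id, segment, offset)  # 87 107 2154512
    # Timeline 1 is an assumption; the log does not show the timeline.
    print(segment_file_name(1, log_id, segment))  # 00000001000000570000006B
```

Under those assumptions, all three "invalid record length" positions and all three "invalid magic number" messages point into the same segment (log file 87, segment 107), at increasing offsets.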

On Sun, Jul 24, 2011 at 8:46 PM, Yan Chunlu <springrider@xxxxxxxxx> wrote:
> checkpoint_segments = 64
> wal_keep_segments = 128
>
> On Sun, Jul 24, 2011 at 8:25 PM, Tomas Vondra <tv@xxxxxxxx> wrote:
>> On 24 July 2011, 6:09, Yan Chunlu wrote:
>>> thanks for all the help!
>>>
>>> @Adrian:  yes, only one instance on each machine
>>>
>>> now the slave finally started and could be connected to, but replication
>>> didn't begin, just the following errors:
>>> https://gist.github.com/1102225
>>
>> These errors just mean the master already removed WAL segments, so the
>> slave can't actually start the replication because there'd be a gap. This
>> usually happens with enough write activity (inserts, updates) when the
>> slave is being set up.
>>
>> What is your wal_keep_segments value? Increase it or set up WAL
>> archiving, so that the slave can get the data.
>>
>> Tomas
>>
>>
>
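
As a rough sanity check on the wal_keep_segments = 128 quoted above, the sketch below estimates how much WAL the master keeps around and how long that lasts at a given write rate. It assumes the default 16 MB segment size; the write rate is a made-up number for illustration, not something measured on these servers.

```python
# Assumption: default WAL segment size of 16 MB.
SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments, segment_bytes=SEGMENT_BYTES):
    """Minimum amount of past WAL the master keeps in pg_xlog."""
    return wal_keep_segments * segment_bytes


def seconds_covered(wal_keep_segments, wal_bytes_per_second):
    """How long the retained WAL lasts at a steady WAL generation rate."""
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


if __name__ == "__main__":
    print(retained_wal_bytes(128))  # 2147483648 bytes, i.e. 2 GiB
    # Hypothetical rate: 1 MiB of WAL generated per second on the master.
    print(seconds_covered(128, 1024 * 1024))  # 2048.0 seconds, about 34 minutes
```

If taking the base backup and copying it to the slave takes longer than that window, the slave would need segments the master has already recycled, which is the gap Tomas describes.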

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)