Re: streaming replication: one problem & several questions

Fujii Masao <masao.fujii@xxxxxxxxx> · Tue, 16 Aug 2011 13:34:28 +0900

On Thu, Aug 11, 2011 at 7:19 AM, Lonni J Friedman <netllama@xxxxxxxxx> wrote:
> First the problem.  On *only* one of the two standby servers, I'm
> seeing errors like the following whenever I issue any SQL commands on
> the master which write (insert, update, etc) to the database:
> LOG:  invalid record length at 8/7A000020
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  invalid record length at 8/7A0000B0
> LOG:  streaming replication successfully connected to primary
> LOG:  invalid record length at 8/7B000020
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  record with zero length at 8/7B0000B0
> LOG:  streaming replication successfully connected to primary
> LOG:  record with incorrect prev-link 8/79000058 at 8/7D0000B0
> LOG:  streaming replication successfully connected to primary

Did you use gcc4.6 or later to build PostgreSQL9.0? If yes, you would
face the same problem reported before;
http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php

This problem was fixed, and the fix will be included in next minor update
(i.e., 9.0.5).
http://archives.postgresql.org/pgsql-committers/2011-06/msg00101.php

Of course, you can avoid the problem by building PostgreSQL with
pre-4.6 gcc.

> 0) I've successfully setup the WAL archiving on the master, and set
> archive_timeout=61.  However, what I'm seeing is that new files are
> not getting generated every 61 seconds, but instead only when some
> kind of SQL is invoked which writes to the database, or every 305
> seconds (whichever comes first).

First of all, you don't need to set archive_timeout. Without archive_timeout,
streaming replication transfers WAL records from the master to the standby
in almost real time.

archive_timeout doesn't always generate new WAL file for each timeout.
If there is no write workload, WAL file generation by archive_timeout is
skipped.

OTOH, checkout generates write workload, so archive_timeout after checkpoint
always creates new WAL file. Since (I guess) you set checkpoint_timeout
to 5min, you observed WAL file generation for each about 5min.

> 1) Both of the wiki links above comment that the restore_command may
> not be necessary if wal_keep_segments is large enough (mine is set to
> 128).  I was going to setup the restore_command anyway, as I'm not yet
> confident enough about streaming replication and failover with
> postgresql to take chances, although the fact that i have two standby
> servers makes this setup a bit more complex.  However, can anyone
> comment about whether its ever truly safe 100% of the time to run
> without a restore_command ?

Specifically, what problem are you concerned about?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general