Re: pg 8.3 replication causing corruption

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 13 Oct 2011 16:20:07 -0500

On Thu, Oct 13, 2011 at 4:07 PM, Bob Hatfield <bobhatfield@xxxxxxxxx> wrote:
>> have you had any power events? hard shutdowns, etc? I wonder if the
>> problem is in the clog files, and not the heap itself.
>
> Nothing unusual for as long as I can tell.  Reminder that as long as I
> don't restart the primary's pg process, everything works fine
> (secondary's data is intact).
>
> It's as if stopping/starting the primary causes a shipped WAL file to
> be corrupt or contain duplicated data, which the secondary then processes.

My money is on clog/visibility related issues.  It's a bit of a bear,
but can you pull the xmin/xmax/ctid for the two duplicate records on
the standby and for the corresponding non-duplicated record on the
master?  I'm curious whether the heap blocks are identical and whether
the standby is incorrectly marking a transaction as valid/invalid.
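
For comparing the heap blocks, something like the following might do as
a rough sketch (Python, not tested against your install).  It assumes a
default build with 8 kB blocks and 1 GB relation segments, and the
function names are mine, purely for illustration.  The block number is
the first half of the ctid, so ctid (1234,5) lives in block 1234.

```python
import hashlib

BLCKSZ = 8192          # assumed default block size
RELSEG_SIZE = 131072   # assumed blocks per 1 GB segment file


def block_location(block_no):
    """Return (segment_no, byte_offset) for a heap block number."""
    segment_no, block_in_seg = divmod(block_no, RELSEG_SIZE)
    return segment_no, block_in_seg * BLCKSZ


def segment_path(relfile_path, segment_no):
    """Segment 0 is the bare relfilenode file; later ones get a .N suffix."""
    if segment_no == 0:
        return relfile_path
    return "%s.%d" % (relfile_path, segment_no)


def digest_block(relfile_path, block_no):
    """MD5 of the raw on-disk page for block_no (page header included)."""
    segment_no, offset = block_location(block_no)
    with open(segment_path(relfile_path, segment_no), "rb") as f:
        f.seek(offset)
        page = f.read(BLCKSZ)
    if len(page) != BLCKSZ:
        raise ValueError("short read: block %d not in file" % block_no)
    return hashlib.md5(page).hexdigest()
```

Run digest_block() on both boxes against the table's file (for a table
in the default tablespace that should be base/<database oid>/<relfilenode>
under the data directory, with relfilenode taken from pg_class).  It
reads the file directly, so it only sees what has been written to disk,
and a mismatch only tells you that some bytes differ, which could be
nothing more than hint bits.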

From there, we need to:
*) figure out the transaction bits in clog on both systems and look
them up there.
*) also, look for differences in clog generally
*) digest (e.g. md5) the heap block containing the records on each
system to see if they are identical
*) double check hint bits?
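
To find the transaction bits, here is a rough sketch of the arithmetic
(Python, untested against your data).  It assumes the stock layout: 2
status bits per transaction, 8 kB pages, 32 pages per pg_clog segment
file.  The helper names are made up for illustration.

```python
import os

CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
BLCKSZ = 8192                  # assumed default block size
SLRU_PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT

STATUS = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def clog_location(xid):
    """Return (segment_file_name, byte_offset, bit_shift) for an xid."""
    segment_no, xid_in_seg = divmod(xid, XACTS_PER_SEGMENT)
    byte_offset = xid_in_seg // CLOG_XACTS_PER_BYTE
    bit_shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return "%04X" % segment_no, byte_offset, bit_shift


def status_from_byte(byte_value, bit_shift):
    """Decode the 2-bit status stored at bit_shift within one clog byte."""
    return STATUS[(byte_value >> bit_shift) & 0x03]


def xid_status(clog_dir, xid):
    """Read the status of xid from the segment files under clog_dir."""
    name, byte_offset, bit_shift = clog_location(xid)
    with open(os.path.join(clog_dir, name), "rb") as f:
        f.seek(byte_offset)
        raw = f.read(1)
    if not raw:
        raise ValueError("xid %d is past the end of %s" % (xid, name))
    return status_from_byte(raw[0], bit_shift)
```

Feed it the xmin/xmax values from the duplicate rows, pointing clog_dir
at pg_clog on each system, and compare the answers.  This reads what is
on disk, so statuses of very recent transactions may not have been
flushed yet.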

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)