Re: replication behind high lag

Lonni J Friedman <netllama@xxxxxxxxx> · Mon, 25 Mar 2013 13:25:48 -0700

On Mon, Mar 25, 2013 at 1:23 PM, AI Rumman <rummandba@xxxxxxxxx> wrote:
>
>
> On Mon, Mar 25, 2013 at 4:03 PM, AI Rumman <rummandba@xxxxxxxxx> wrote:
>>
>>
>>
>> On Mon, Mar 25, 2013 at 4:00 PM, Lonni J Friedman <netllama@xxxxxxxxx>
>> wrote:
>>>
>>> On Mon, Mar 25, 2013 at 12:55 PM, AI Rumman <rummandba@xxxxxxxxx> wrote:
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 3:52 PM, Lonni J Friedman <netllama@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> On Mon, Mar 25, 2013 at 12:43 PM, AI Rumman <rummandba@xxxxxxxxx>
>>> >> wrote:
>>> >> >
>>> >> >
>>> >> > On Mon, Mar 25, 2013 at 3:40 PM, Lonni J Friedman
>>> >> > <netllama@xxxxxxxxx>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Mon, Mar 25, 2013 at 12:37 PM, AI Rumman <rummandba@xxxxxxxxx>
>>> >> >> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I have two 9.2 databases running with hot_standby replication.
>>> >> >> > Today
>>> >> >> > when I
>>> >> >> > was checking, I found that replication has not been working since
>>> >> >> > Mar
>>> >> >> > 1st.
>>> >> >> > There was a large database restored in master on that day and I
>>> >> >> > believe
>>> >> >> > after that the lag went higher.
>>> >> >> >
>>> >> >> > SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0')
>>> >> >> > AS
>>> >> >> > offset
>>> >> >> >
>>> >> >> > 431326108320
>>> >> >> >
>>> >> >> > SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
>>> >> >> > '0/0')
>>> >> >> > AS
>>> >> >> > receive,
>>> >> >> > pg_xlog_location_diff(pg_last_xlog_replay_location(),
>>> >> >> > '0/0')
>>> >> >> > AS replay
>>> >> >> >
>>> >> >> >    receive    |    replay
>>> >> >> > --------------+--------------
>>> >> >> >  245987541312 | 245987534032
>>> >> >> > (1 row)
>>> >> >> >
>>> >> >> > I checked the pg_xlog in both the server. In Slave the last xlog
>>> >> >> > file
>>> >> >> > -rw------- 1 postgres postgres 16777216 Mar  1 06:02
>>> >> >> > 00000001000000390000007F
>>> >> >> >
>>> >> >> > In Master, the first xlog file is
>>> >> >> > -rw------- 1 postgres postgres 16777216 Mar  1 04:45
>>> >> >> > 00000001000000390000005E
>>> >> >> >
>>> >> >> >
>>> >> >> > Is there any way I could sync the slave in quick process?
>>> >> >>
>>> >> >> generate a new base backup, and seed the slave with it.
>>> >> >
>>> >> >
>>> >> > OK. I am getting these error in slave:
>>> >> > LOG:  invalid contrecord length 284 in log file 57, segment 127,
>>> >> > offset
>>> >> > 0
>>> >> >
>>> >> > What is the actual reason?
>>> >>
>>> >> Corruption?  What were you doing when you saw the error?
>>> >
>>> >
>>> > I did not have enough idea about these stuffs. I got the database now
>>> > and
>>> > saw the error.
>>> > Is there any way to recover from this state. The master database is a
>>> > large
>>> > database of 500 GB.
>>>
>>> generate a new base backup, and seed the slave with it.  if the error
>>> persists, then i'd guess that your master is corrupted, and then
>>> you've got huge problems.
>>
>>
>> Master is running fine right now showing only a warning:
>> WARNING:  archive_mode enabled, yet archive_command is not set
>>
>> Do you think the master could be corrupted?
>>
>
> Hi,
>
> I got the info that there was a master db restart on Feb 27th. Could this be
> a reason of this error?
>

restarting the database cleanly should never cause corruption.  again,
you need to create a new base backup, and seed the slave with it.  if
the problem persists, then the master is likely corrupted.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general