The query I used to check the lag time between the primary and the standby shows that the standby server was not even close to falling behind, right?
Paula
On Sat, Jun 20, 2015 at 9:51 AM, Jerry Sievers <gsievers19@xxxxxxxxxxx> wrote:
Adrian Klaver <adrian.klaver@xxxxxxxxxxx> writes:
> On 06/19/2015 01:05 PM, Paula Price wrote:
>
>>
>>
>> On Fri, Jun 19, 2015 at 12:01 PM, Adrian Klaver
>> <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:
>>
>> On 06/18/2015 05:45 PM, Paula Price wrote:
>>
>> I have Postgresql 9.2.10 streaming replication set up with log
>> shipping in
>> case the replication falls behind. I discovered that the
>> log-shipping had
>> been disabled at some point in time. I enabled the log shipping
>> again.
>>
>> If at some point in time the streaming replication fell behind
>> and the
>> standby server was not able to retrieve the necessary WAL
>> file(s) from the
>> primary, would the standby server continue to function
>> normally? Do I need
>> to rebuild the standby server? I have restarted the standby
>> server and it
>> is up and running with no issues.
>>
>>
>> Well, that seems at odds with it being unable to retrieve the WAL
>> files. This leads to these questions:
>>
>> 1) What makes you think it did not retrieve the WAL files via streaming?
>>
>>
>> It _may not_ have fallen behind via replication. We do have standby
>> servers that fall behind, but since we have log-shipping it is not a
>> concern. On this server, I have no idea how long we were running
>> without log-shipping, or how many log files I would have to go
>> through to find out when it stopped.
>> My basic question is:
>> If a standby server falls behind with streaming replication AND the
>> standby server cannot obtain the WAL file needed from the primary, will
>> you get an error from the standby server? Or does it just hiccup and
>> try to carry on?
>
> No, it will fall over:

I wouldn't describe it that way...

To a user, the standby will function and appear normal, unless they
notice that the data is not current.
In the server logs, there will be indications that replication is stuck
waiting for WAL.
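
You can also ask the standby directly. A quick sketch for 9.2 (run on
the standby; if "received" stops advancing while the primary keeps
writing, streaming is stuck):

    -- Compare what the standby has received with what it has replayed.
    SELECT pg_last_xlog_receive_location() AS received,
           pg_last_xlog_replay_location()  AS replayed,
           pg_last_xact_replay_timestamp() AS last_commit_replayed;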
HTH
> http://www.postgresql.org/docs/9.2/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER
>
> wal_keep_segments (integer)
>
> Specifies the minimum number of past log file segments kept in the
> pg_xlog directory, in case a standby server needs to fetch them for
> streaming replication. Each segment is normally 16 megabytes. If a
> standby server connected to the sending server falls behind by more
> than wal_keep_segments segments, the sending server might remove a WAL
> segment still needed by the standby, in which case the replication
> connection will be terminated. Downstream connections will also
> eventually fail as a result. (However, the standby server can recover
> by fetching the segment from archive, if WAL archiving is in use.)
>
> This sets only the minimum number of segments retained in pg_xlog;
> the system might need to retain more segments for WAL archival or to
> recover from a checkpoint. If wal_keep_segments is zero (the default),
> the system doesn't keep any extra segments for standby purposes, so
> the number of old WAL segments available to standby servers is a
> function of the location of the previous checkpoint and status of WAL
> archiving. This parameter can only be set in the postgresql.conf file
> or on the server command line.
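
A minimal postgresql.conf sketch for that, on the primary (256 is only
an example value; size it to how far behind your standbys can
realistically fall):

    # Keep at least 256 * 16MB = ~4GB of WAL in pg_xlog for standbys
    wal_keep_segments = 256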
>
> When you started up, if the necessary WAL files were not on the
> server, you would have seen Postgres throwing errors in the log.
>
> I would check out the below to verify:
>
> http://www.postgresql.org/docs/9.2/interactive/warm-standby.html#STREAMING-REPLICATION
>
> 25.2.5.2. Monitoring
>
>>
>>
>> 2) What does the postgres log show at the time you restarted the
>> standby?
>>
>> 2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: database system was shut down in recovery at
>> 2015-06-18 01:12:14 UTC
>>
>> 2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOCATION: StartupXLOG, xlog.c:6298
>>
>> 2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: entering standby mode
>>
>> 2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOCATION: StartupXLOG, xlog.c:6384
>>
>> 2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: redo starts at 867/FDF32E18
>>
>> 2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOCATION: StartupXLOG, xlog.c:6855
>>
>> 2015-06-18 01:12:42.486
>> UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC: LOG:
>> 00000: connection received: host=[local]
>>
>> 2015-06-18 01:12:42.486
>> UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC:
>> LOCATION: BackendInitialize, postmaster.c:3501
>>
>> 2015-06-18 01:12:42.486
>> UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC:
>> FATAL: 57P03: the database system is starting up
>>
>> 2015-06-18 01:12:42.486
>> UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC:
>> LOCATION: ProcessStartupPacket, postmaster.c:1792
>>
>> 2015-06-18 01:12:43.488
>> UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC: LOG:
>> 00000: connection received: host=[local]
>>
>> 2015-06-18 01:12:43.488
>> UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC:
>> LOCATION: BackendInitialize, postmaster.c:3501
>>
>> 2015-06-18 01:12:43.488
>> UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC:
>> FATAL: 57P03: the database system is starting up
>>
>> 2015-06-18 01:12:43.488
>> UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC:
>> LOCATION: ProcessStartupPacket, postmaster.c:1792
>>
>> 2015-06-18 01:12:44.489
>> UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC: LOG:
>> 00000: connection received: host=[local]
>>
>> 2015-06-18 01:12:44.489
>> UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC:
>> LOCATION: BackendInitialize, postmaster.c:3501
>>
>> 2015-06-18 01:12:44.489
>> UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC:
>> FATAL: 57P03: the database system is starting up
>>
>> 2015-06-18 01:12:44.489
>> UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC:
>> LOCATION: ProcessStartupPacket, postmaster.c:1792
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: consistent recovery state reached at 868/112AF7F8
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOCATION: CheckRecoveryConsistency, xlog.c:7405
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: invalid record length at 868/112AFB00
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
>> LOCATION: ReadRecord, xlog.c:4078
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC:
>> LOG: 00000: database system is ready to accept read only
>> connections
>>
>> 2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC:
>> LOCATION: sigusr1_handler, postmaster.c:4314
>>
>>
>>
>> I need to know if the data integrity has been compromised.
>>
>> I have run this query to determine the lag time for the standby (in
>> case this tells me anything):
>> SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag;
>> RESULT:
>> "2015-06-19 00:40:48.83701+00";"00:00:01.078616"
>>
>>
>> Thank you,
>> Paula P
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@xxxxxxxxxxx
p: 312.241.7800