Re: Postgresql 9.2 has standby server lost data?

On 06/19/2015 01:05 PM, Paula Price wrote:


On Fri, Jun 19, 2015 at 12:01 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx> wrote:

    On 06/18/2015 05:45 PM, Paula Price wrote:

        I have Postgresql 9.2.10 streaming replication set up with log
        shipping in
        case the replication falls behind.  I discovered that the
        log-shipping had
        been disabled at some point in time.  I enabled the log shipping
        again.

        If at some point in time the streaming replication fell behind
        and the
        standby server was not able to retrieve the necessary WAL
        file(s) from the
        primary, would the standby server continue to function
        normally?  Do I need
        to rebuild the standby server?  I have restarted the standby
        server and it
        is up and running with no issues.
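
For reference, a standby set up this way would normally carry both a streaming connection and an archive fallback in its recovery.conf. A minimal sketch of that arrangement, in which the hostname, user, and archive path are purely illustrative assumptions:

    # recovery.conf on the standby (PostgreSQL 9.2) -- illustrative values only
    standby_mode = 'on'
    # stream WAL from the primary while the connection keeps up
    primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
    # fall back to the shipped WAL archive when a needed segment is no longer on the primary
    restore_command = 'cp /mnt/wal_archive/%f %p'

With both lines present, the standby loops between retrying the archive and retrying the streaming connection, which is what makes the log-shipping fallback work.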


    Well that seems at odds with it being unable to retrieve the WAL
    files. This leads to these questions:

    1) What makes you think it did not retrieve the WAL files via streaming?

It _may not_ have fallen behind with streaming replication.  We do have standby
servers that fall behind, but since we have log-shipping it is not a
concern.  On this server, I have no idea how long we were running
without log-shipping.  I have no idea how many log files I would have to
go through to find out when log-shipping stopped.
My basic question is:
If a standby server falls behind with streaming replication AND the
standby server cannot obtain the WAL file it needs from the primary, will
you get an error from the standby server?  Or does it just hiccup and
try to carry on?

No, it will fall over:

http://www.postgresql.org/docs/9.2/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER

wal_keep_segments (integer)

Specifies the minimum number of past log file segments kept in the pg_xlog directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the sending server falls behind by more than wal_keep_segments segments, the sending server might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. Downstream connections will also eventually fail as a result. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)

This sets only the minimum number of segments retained in pg_xlog; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_segments is zero (the default), the system doesn't keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and status of WAL archiving. This parameter can only be set in the postgresql.conf file or on the server command line.
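
For context, the knobs referred to above live in postgresql.conf on the sending server. A hedged sketch of the relevant section (the values are examples, not recommendations, and the archive path is assumed):

    # postgresql.conf on the primary (PostgreSQL 9.2) -- example values only
    wal_level = hot_standby            # required for a hot standby
    max_wal_senders = 3                # allow streaming replication connections
    wal_keep_segments = 128            # retain ~2 GB of WAL in pg_xlog for lagging standbys
    archive_mode = on
    archive_command = 'cp %p /mnt/wal_archive/%f'   # log-shipping target

If wal_keep_segments is left at its default of 0, a standby that falls behind has to rely entirely on the archive.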

When you started up, if the necessary WAL files were not on the server, you would have seen Postgres throwing errors in the log.

I would check out the section below to verify:

http://www.postgresql.org/docs/9.2/interactive/warm-standby.html#STREAMING-REPLICATION

25.2.5.2. Monitoring
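
As a rough illustration of what that monitoring section covers, queries along these lines can be used (the functions and the pg_stat_replication view are standard in 9.2; the interpretation notes are only a sketch):

    -- On the primary: is the standby connected and streaming, and where is it?
    SELECT application_name, state, sent_location, replay_location, sync_state
    FROM pg_stat_replication;

    -- On the primary: current WAL write position, for comparison with the standby
    SELECT pg_current_xlog_location();

    -- On the standby: last WAL position received over streaming and last position replayed
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

If pg_stat_replication shows a row in state 'streaming' and the locations track the primary's, the standby is keeping up.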



    2) What does the postgres log show at the time you restarted the
    standby?

        ​2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOG:  00000: database system was shut down in recovery at
        2015-06-18 01:12:14 UTC

        2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOCATION:  StartupXLOG, xlog.c:6298

        2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOG:  00000: entering standby mode

        2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOCATION:  StartupXLOG, xlog.c:6384

        2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOG:  00000: redo starts at 867/FDF32E18

        2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOCATION:  StartupXLOG, xlog.c:6855

        2015-06-18 01:12:42.486
        UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC: LOG:
          00000: connection received: host=[local]

        2015-06-18 01:12:42.486
        UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC:
        LOCATION:  BackendInitialize, postmaster.c:3501

        2015-06-18 01:12:42.486
        UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC:
        FATAL:  57P03: the database system is starting up

        2015-06-18 01:12:42.486
        UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC:
        LOCATION:  ProcessStartupPacket, postmaster.c:1792

        2015-06-18 01:12:43.488
        UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC: LOG:
          00000: connection received: host=[local]

        2015-06-18 01:12:43.488
        UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC:
        LOCATION:  BackendInitialize, postmaster.c:3501

        2015-06-18 01:12:43.488
        UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC:
        FATAL:  57P03: the database system is starting up

        2015-06-18 01:12:43.488
        UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC:
        LOCATION:  ProcessStartupPacket, postmaster.c:1792

        2015-06-18 01:12:44.489
        UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC: LOG:
          00000: connection received: host=[local]

        2015-06-18 01:12:44.489
        UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC:
        LOCATION:  BackendInitialize, postmaster.c:3501

        2015-06-18 01:12:44.489
        UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC:
        FATAL:  57P03: the database system is starting up

        2015-06-18 01:12:44.489
        UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC:
        LOCATION:  ProcessStartupPacket, postmaster.c:1792

        2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOG:  00000: consistent recovery state reached at 868/112AF7F8

        2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOCATION:  CheckRecoveryConsistency, xlog.c:7405

        2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOG:  00000: invalid record length at 868/112AFB00

        2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC:
        LOCATION:  ReadRecord, xlog.c:4078

        2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC:
        LOG:  00000: database system is ready to accept read only
        connections

        2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC:
        LOCATION:  sigusr1_handler, postmaster.c:4314


        I need to know if the data integrity has been compromised.

        I have run this query to determine the lag time for the
        standby (in case this tells me anything):
        SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag;
        RESULT:
        "2015-06-19 00:40:48.83701+00";"00:00:01.078616"


        Thank you,
        Paula P
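
A sub-second replay lag like that suggests the standby is keeping up. One further sanity check, sketched here rather than prescribed, is to confirm the server is still in recovery and that everything it has received has also been replayed:

    -- On the standby
    SELECT pg_is_in_recovery();                    -- true for a functioning standby
    SELECT pg_last_xlog_receive_location() AS received,
           pg_last_xlog_replay_location()  AS replayed;  -- equal (or nearly so) when caught up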



    --
    Adrian Klaver
    adrian.klaver@xxxxxxxxxxx




--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


