Re: Wal archive way behind in streaming replication


 



Well, having read and understood the source code in pgarch.c, I saw nothing dangerous in performing these steps. Maybe there is something deeper in there, but it just seemed odd that the standby isn't recognizing the new files, considering they arrived by traditional means. I'm going to go back and study the source a little more.

On 6/25/2014 12:42 PM, Jerry Sievers wrote:
John Scalia <jayknowsunix@xxxxxxxxx> writes:

A little examination of the pgarch.c file showed what the archive
process on the primary is doing. Anyway, to ensure that the primary
knows that it has transmitted all the up to date WALs, I went into the
primary's data/pg_xlog/archive_status directory and performed "touch
00000003000000900000036.ready" and repeated this command for the other
WALs up to *44.ready. This really shouldn't have been a problem as the
most recent WAL file in pg_xlog was *45. The archiver then picked up
all those WAL files and transmitted them to the standbys. At least I
saw them appear on the standby in the directory specified in the
recovery.conf file.

Now, what I really don't understand is the standby's behavior. After
the WALs arrived, I saw nothing in today's pg_log/Wed.log file showing
it saw them. I then issued a service postgresql-9.3 restart and this
is what was spit out in the log:

LOG: entering standby mode
LOG: restored log file "00000000300000000900000035" from archive
LOG: unexpected pageaddr 9/1B000000 in log segment 00000000300000000900000035, offset 0
LOG: started streaming WAL from primary at 9/35000000 on timeline 3
FATAL: the database system is starting up
LOG: consistent recovery state reached at 9/350000C8
LOG: redo starts at 9/350000C8
LOG: database system is ready to accept read only connections

Two things stand out here. First, the standby didn't seem to process the newly arrived WAL files, and second, what's with the FATAL: in the logfile?
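
For anyone matching the LSNs in a log like the one above against files in pg_xlog or the archive directory, here is a minimal sketch of how a segment file name is derived from a timeline and an LSN. It assumes the default 16 MB segment size, and the function name wal_file_name is made up for illustration. A segment name built this way is 24 hex digits long, so it may be worth double-checking any hand-typed names against what is actually on disk.

```python
# Assumption: default WAL segment size of 16 MB (the compiled-in default).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline: int, lsn: str) -> str:
    """Return the WAL segment file name that contains the given LSN.

    The LSN is written as "high/low" in hex, e.g. "9/350000C8".
    """
    high, low = (int(part, 16) for part in lsn.split("/"))
    # The segment number is the offset within the "high" part,
    # counted in whole segments.
    segment = low // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, high, segment)


if __name__ == "__main__":
    # Timeline 3 and the redo start location from the log above.
    print(wal_file_name(3, "9/350000C8"))
```

Comparing the printed name with the directory listing shows which file the standby is actually asking for.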
I'd suggest you ...

1. Toss out that standby instance.
2. Re-read all manual sections regarding hot backup/PITR/streaming
    replication etc.
3. Start fresh.
--
I would not trust a standby instance after possibly corrupting it by
having to frob the .ready files on master.

A standby server configured as hybrid streamer/WAL shipper should...

1. Stream and/or read WAL segments from master's xlog directory when
    wal_keep_segments permits it.
2. Fetch WALs from a remote repository when it can't get a feed
    directly from master.

There is no manual touching of .ready files needed and I can imagine
doing so could be harmful in certain situations.

HTH



Jay

On 6/24/2014 2:52 PM, Andrew Krause wrote:
You shouldn’t have to touch the files as long as they aren’t compressed.  You may have to restart the standby instance to get the recovery to begin though.  I’d suggest tailing your instance log and restarting the standby instance.  It should show that the logs from the gap are applying right away at startup.


Andrew Krause
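
To make the "tail the log" step above a bit more systematic, here is a rough sketch that scans log lines for the "restored log file ... from archive" message and reports which expected segments never showed up. The function names and the idea of passing in a list of expected segment names are assumptions for illustration; the message format is taken from the log excerpt quoted elsewhere in this thread.

```python
import re

# Matches lines such as:
#   LOG: restored log file "000000030000000900000035" from archive
RESTORED = re.compile(r'restored log file "([0-9A-Fa-f]+)" from archive')


def restored_segments(log_lines):
    """Return segment names the standby reports restoring, in log order."""
    found = []
    for line in log_lines:
        match = RESTORED.search(line)
        if match:
            found.append(match.group(1))
    return found


def missing_segments(expected, log_lines):
    """Return the expected segment names that never appear as restored."""
    seen = set(restored_segments(log_lines))
    return [name for name in expected if name not in seen]
```

In practice the log lines would come from something like open("pg_log/Wed.log"), and the expected list from the names of the files copied over for the gap.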




On Jun 24, 2014, at 1:19 PM, John Scalia <jayknowsunix@xxxxxxxxx> wrote:

Ok, I did the copy from the pg_xlog directory into the directory specified in recovery.conf. The standby servers seem fine with that; however, just copying does not inform the primary that
the copy has happened. The archive_status directory under pg_xlog on the primary still thinks the last WAL sent was *B7, and yet it's now writing *C9. When I did the copy it was
only up to *C7, and nothing else has shown up in the standby's directory.

Now, the *.done files in archive_status are just zero length, but I'm a bit hesitant to just do a touch for the ones I manually copied, as I don't know if this is from an in-memory
queue or if PostgreSQL reads the contents of this directory regularly in order to decide what to copy.

Is that safe to do?
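
Before changing anything in archive_status, a read-only look at what is there can help. The sketch below just lists which segments have a .ready marker (waiting to be archived) and which have a .done marker (already archived). It does not create or rename any files. The function name and the example path are assumptions, not anything taken from PostgreSQL itself.

```python
import os


def archive_status_summary(status_dir):
    """Group the marker files in archive_status by suffix (read-only)."""
    ready, done = [], []
    for entry in sorted(os.listdir(status_dir)):
        base, ext = os.path.splitext(entry)
        if ext == ".ready":
            ready.append(base)   # segment still waiting for the archiver
        elif ext == ".done":
            done.append(base)    # segment the archiver has finished with
    return {"ready": ready, "done": done}


if __name__ == "__main__":
    # Hypothetical path; adjust to the primary's real data directory.
    summary = archive_status_summary("data/pg_xlog/archive_status")
    print(len(summary["ready"]), "ready,", len(summary["done"]), "done")
```

A long "ready" list on the primary would suggest the archiver is stuck or slow rather than unaware of the files.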

On 6/24/2014 9:56 AM, Andrew Krause wrote:
You can copy all of the WAL logs from your gap to the standby.  If you place them in the correct location (the directory designated for restore), the instance will automatically apply them all.


Andrew Krause



On Jun 23, 2014, at 9:24 AM, John Scalia <jayknowsunix@xxxxxxxxx> wrote:

Came in this morning to numerous complaints from pgpool about the standby servers being behind the primary. Looking into it, no WAL files had been transferred since late Friday. All I did was restart the primary and the WAL archiving resumed; however, looking at the WAL files on the standby servers, this is never going to catch up. I've got archive_timeout on the primary set to 600 (10 minutes), and I see new WAL files in pg_xlog every 10 minutes. As they show up on the standby servers, they're also 10 minutes apart, but the primary is writing *21 and the standbys are only up to *10. As I said, with 10 minutes (600 seconds) between transfers (the same pace as the WALs are generated), it will never catch up. Is this really the intended behavior? How would I get the additional WAL files over to the standbys without waiting 10 minutes to copy them one at a time?
--
Jay
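
The "never catch up" reasoning above can be written out as a small calculation. This is only a sketch under stated assumptions: the *21 and *10 suffixes are read as hexadecimal segment numbers, the rates are expressed in segments per hour, and the function name is invented for illustration.

```python
import math


def minutes_to_catch_up(primary_seg, standby_seg, shipped_per_hour, produced_per_hour):
    """Estimate how long the standby needs to clear its backlog.

    Returns math.inf when segments are shipped no faster than they
    are produced, since the backlog then never shrinks.
    """
    backlog = primary_seg - standby_seg
    if backlog <= 0:
        return 0.0
    net_rate = shipped_per_hour - produced_per_hour
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate * 60.0


if __name__ == "__main__":
    # Assumption: suffixes *21 and *10 are hex, so the backlog is 17 segments.
    # One segment every 10 minutes in both directions is 6 per hour each.
    print(minutes_to_catch_up(0x21, 0x10, 6, 6))
```

With equal rates the backlog of 17 segments stays constant, which matches the observation; the gap only closes if segments are shipped faster than one per archive_timeout interval.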


--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin








