Search Postgresql Archives

Re: WARNINGs after starting backup server created with PITR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Jan 19, 2008, at 1:24 PM, Simon Riggs wrote:

So we definitely have missing data. I think the multi-phase rsync is
definitely suspect and should be avoided until we get to the bottom of
this.

By this, do you mean the way we ran the rsync multiple times before performing the actual base backup? Easy enough.

Can you examine the server logs for that 11 hour period and see if there
are any other messages that might be relevant on both primary and
standby servers?

On the primary, all relevant log entries are just entries for successfully logged WALs.

Can you save the WAL files covering that period also. We may want to
inspect them later to confirm whether the data was actually in them or
not. Can you save the very first WAL files after the recovery started?
We can use those to examine the first data blocks touched, which would
help confirm when the gun was fired.

Sure, I still have all of the WALs that were shipped from just before the beginning of the base back up until about 8 or 9 hours after the missing WAL file. I'm not sure if I'll be able to get to it today as I've got lots of household chores to get to, but either tomorrow of first thing Monday I'll zip them and post them here.

Has rsync produced any messages of note? Might your script be ignoring
errors? What versions of rsync are you using?

Version 2.6.9 on all three servers (1 primary, 2 standbys).

Here's the python script I was using to ship to both servers, right now I'm back to a direct rsync call for my archive_command to sb1. What's really weird, is that for the two WALs that disappeared, or didn't make it, even though the primary's log said they were successfully archived, they weren't on either server.

#####
#!/usr/bin/env python
from commands import getstatusoutput
import sys

source_path = sys.argv[1]
file_name = sys.argv[2]

# db2
res, out = getstatusoutput("ssh db2 'test -f /mnt/xfer/wal_archive/% s'" % file_name)
if res != 0:
res, out = getstatusoutput("rsync -a %s db2:/mnt/xfer/wal_archive/" % source_path)

if res != 0:
	sys.exit(res)

# sb1
res, out = getstatusoutput("test -f /mnt/xfer/wal_archive/%s" % file_name)
if res != 0:
res, out = getstatusoutput("rsync -a %s /mnt/xfer/wal_archive", source_path)

sys.exit(res)
#####

Is the History table Insert only? We might use that fact to examine the
LSNs of the equivalent blocks on the Primary. If the LSNs are prior to
the start of the recovery, as noted in the backup label file of the
original base backup, then we can confirm rsync or postgres as the
cause. Do you still have the base backup label file or base backup?

I still have the backup label but not the base backup. The history tables are about 90-95% insert only, on average, but they do see the occasional update if a message's delivery status is a bounce. If you think this would still be an option, I'd be happy to follow whatever instructions, or supply any files, needed.

Erik Jones

DBA | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux