Re: recovery question

Lee Azzarello <lee@xxxxxxxxxx> · Wed, 25 Feb 2009 10:40:02 -0500

Is 0000000100000C28000000B1 the same size as the other segments?
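
In case it helps, here is a rough sketch of one way to check that, assuming the usual 16 MB segment size (16777216 bytes, the figure your copy.sh already tests for). The function name check_segment_sizes is made up for illustration, and the directory you point it at is up to you; .history and .backup files are skipped because they are not full-size segments.

```shell
#!/bin/sh
# Sketch only: list files in a directory whose size differs from the
# expected WAL segment size. EXPECTED_SIZE matches the 16777216 used in
# the quoted copy.sh; check_segment_sizes is a hypothetical helper name.
EXPECTED_SIZE=16777216

check_segment_sizes() {
    dir=$1
    bad=0
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        case "${f##*/}" in
            *.history|*.backup) continue ;;   # informational files, not 16 MB
        esac
        # wc -c prints the byte count; tr strips padding some platforms add.
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -ne "$EXPECTED_SIZE" ]; then
            echo "${f##*/}: $size bytes (expected $EXPECTED_SIZE)"
            bad=1
        fi
    done
    # Non-zero exit status if any odd-sized file was seen.
    return "$bad"
}
```

You would call it as something like `check_segment_sizes /path/to/segments` (placeholder path) and look for B1 in the output.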

-lee

2009/2/25 Mark Steben <msteben@xxxxxxxxxxxxxxx>:
> Hi listers,
>
>
>
> Here is my problem.  I am running PITR restore on a machine remote from my
> production machine.
>
> I'm shipping logs over there, compressed, then uncompressing them and
> copying them to pg_xlog.
>
> Everything works fine until a network outage creates a gap in my logs.
>
> The recovery terminates at log "0000000100000C28000000B1" and brings the
> database up because it can't find "0000000100000C28000000B2".
>
> Log "0000000100000C28000000B3" is copied over but I wish to restart recovery
> at B2.
>
> So I scp B2 over from my primary machine from a folder that I created for
> just such an occasion.
>
>
>
> Now I rename recovery.done to recovery.conf  (Copied here for your
> convenience)
>
>
>
> restore_command = 'sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log'
>
>
>
> (and copy.sh:)
>
>
>
> REQ_FILE=$1
> DEST=$2
> LF="${REQ_FILE}.lock"
> SUFFIX=${REQ_FILE##*.}
>
> ###############################################################
> ## check if file is transaction log or informational file
> ## if transaction log, cat from archlog and uncompress into unzipped folder
> ## if informational simply copy into unzipped folder (it came over uncompressed)
> #####################################################################################
> if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
>   cat "/logs/var/backups/archlog/${REQ_FILE}" | gzip -dc > "/logs/var/backups/unzipped/${REQ_FILE}"
>   RC=$?   # save gzip's exit status before the echo below overwrites $?
>   if [ "${RC}" = "0" ]; then
>      echo 'successful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
>   else
>      echo 'unsuccessful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
>      echo 'the return code is ' "${RC}" >> /tmp/restore.mavmail.log
>   fi
> else
>   cp "/logs/var/backups/archlog/${REQ_FILE}" "/logs/var/backups/unzipped/${REQ_FILE}"
> fi
>
> #######################################################################################
> ##  check for size.  If not a full size (16777216) trans log, the copy from
> ##   cobra is still in progress. Don't copy this file. Stop recovery here.
> #######################################################################################
> SIZE=$(ls -gG1 "/logs/var/backups/unzipped/${REQ_FILE}" | awk '{ print $3}')
> echo "The size of the log to be restored is " "${SIZE}" >> /tmp/restore.mavmail.log
> if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
>   if [ "${SIZE}" != '16777216' ]; then
>     echo 'partially written log - not restored - finishing recovery' >> /tmp/restore.mavmail.log
>     exit 0
>   fi
> fi
>
> /usr/bin/lockfile "${LF}"
> ################################################################
> ## copy either full sized trans log or informational file
> ## into pg_xlog data cluster.
> ################################################################
> cp "/logs/var/backups/unzipped/${REQ_FILE}"  "${DEST}"
> rm -f "${LF}"
> rm "/logs/var/backups/unzipped/${REQ_FILE}"
>
>
>
> (END)
>
>
>
> Now when I try to restart, hoping to begin recovery with the B2 log, I get an
> invalid checkpoint error:
>
>
>
> : LOG:  starting archive recovery
> Feb 25 10:08:10 ar-db3 postgres[32538]: [3-1] @: LOG:  restore_command = "sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log"
> Feb 25 10:08:11 ar-db3 postgres[32538]: [4-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive
> Feb 25 10:08:11 ar-db3 postgres[32538]: [5-1] @: LOG:  invalid record length at C28/B1FFECA4
> Feb 25 10:08:11 ar-db3 postgres[32538]: [6-1] @: LOG:  invalid primary checkpoint record
> Feb 25 10:08:12 ar-db3 postgres[32538]: [7-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive
> Feb 25 10:08:12 ar-db3 postgres[32538]: [8-1] @: LOG:  invalid record length at C28/B1FFEC5C
> Feb 25 10:08:12 ar-db3 postgres[32538]: [9-1] @: LOG:  invalid secondary checkpoint record
> Feb 25 10:08:12 ar-db3 postgres[32538]: [10-1] @: PANIC:  could not locate a valid checkpoint record
> Feb 25 10:08:12 ar-db3 postgres[32537]: [1-1] @: LOG:  startup process (PID 32538) was terminated by signal 6
> Feb 25 10:08:12 ar-db3 postgres[32537]: [2-1] @: LOG:  aborting startup due to startup process failure
>
>
>
> I remove the recovery.conf file, successfully start the database and issue a
> checkpoint.  I try the restore again and get the same error.
>
>
>
> So, is there a way that I can force the recovery to begin at B2, or am I dead
> in the water and need to bring in another full file copy and start from
> scratch?
>
>
>
> Thanks for your time.
>
>
>
> Mark Steben│Database Administrator│

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin