Re: FW: Setting up of PITR system.

"Rajesh Kumar Mallah" <mallah.rajesh@xxxxxxxxx> · Sat, 1 Apr 2006 09:53:41 +0530

On 3/30/06, Grega Bremec <gregab@xxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> |
> | OK i am posting my full script [ its not heavy programming i guess :) ]
> | shall be grateful if  you/someone could review it . (its well
> commented i think)
> | script also carries sample data.
> |
> | it does following
> | 1. takes base backup to a destined folder by rsync
> | 2. waits for .backup file to arrive in archive folder
> |    after pg_stop_bacup()
> | 3. searches and removes unwanted archived log files.
> |
> | I have run it many times in my server and it seems to
> | be working fine.
>
> Hello, Rajesh.
>
> Just a couple of comments on the script itself, not what it actually
> does - I never tried WAL archiving before, so I can't comment on that. I
> inserted the comments at relevant points in the script. I'm sorting them
> into three categories, one is just improvements in style, the other is
> optimization and the third is correction of an error.

Dear Grega ,

Thanks for the useful tips and error spotting,
i am incorporating some of them and testing
the script in my server . I have concerns regarding
some of your optimisations that makes the script less
generic , below find my comments.

>
> | ------------------------------------ BEGIN
> | --------------------------------------------
> | #!/bin/bash
> |
> | # folder where base_backup is put
> | BACKUPFOLDER=/mnt/disk3/base_backups
> | today=`date +%d-%m-%Y-%H-%M-%S`
> | PSQL=/opt/usr/local/pgsql/bin/psql
> | RSYNC="/usr/bin/rsync  -a"
> | PGDATADIR=/mnt/disk5/pgdatadir
> |
> | # two table spaces.
> |
> | TS1=/mnt/disk4/bigtables
> | TS2=/mnt/disk3/indexspace
>
> (optimization) Since you're using bash, you can use arrays. This could
> be better written as
>
> ~  TS[0]=/mnt/disk5/pgdatadir
> ~  TS[1]=/mnt/disk4/bigtables
> ~  TS[2]=/mnt/disk3/indexspace
>
> or even
>
> ~  TS=(/mnt/disk5/pgdatadir \
> ~      /mnt/disk4/bigtables \
> ~      /mnt/disk3/indexspace)

agreed , already incorporated.

>
> That way, you can add tablespaces at will and just use a while loop to
> back them up, which greatly simplifies adding new tablespaces or moving
> the script somewhere else. See below for how to implement that.
>
> | # folder where *archived* logs are put.
> | WAL_ARCHIVE=/mnt/wal_archive
> |
> | label=base_backup_${today}
> |
> | echo "Executing pg_start_backup with label $label in server ... "
> |
> | # get the checkpoint at which backup starts
> | # the .backup files seems to be bearing this string in it.
> |
> | CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
> | pg_start_backup('$label');" -P tuples_only -P format=unaligned`
> |
> | echo "Begin CheckPoint is $CP" # this contain string like A/681D1214
> |
> | if [ $? -ne 0 ]
> |         then
> |         echo "PSQL pg_start_backup failed"
> |         exit 1;
> | fi
> | echo "pg_start_backup executed successfully"
>
> (style) If you want to capture any error messages pg_start_backup may
> have caused and store them into ${CP}, you should add 2>&1 at the end of
> the psql invocation, see below snippet.

Incorporated it.

>
> (error) Checking for exit status of pg_start_backup using $? at this
> point will never report an error, as you've used echo prior to checking
> what pg_start_backup returned. You should either move the echo below the
> if statement (by adding an "else" clause) or store the exit status of
> pg_start_backup into RVAL like this:
>
> ~  CP="`$PSQL ... 2>&1`"
> ~  RVAL=$?
> ~  echo "Begin CheckPoint says: ${CP}"
> ~  if [ ${RVAL} -ne 0 ]; then
> ~    ...
> ~  fi
>
> | echo "RSYNC begins.."
> |
> | # rsync each of the folders to the backup folder.
> | for i in $TS1 $TS2 $PGDATADIR ;
> | do
> |         echo "Syncing $i .. "
> |         time $RSYNC $i $BACKUPFOLDER
> |         echo "Done"
> | done
>
> (optimization) If you store locations into an array, you could rewrite
> this as follows:
>
> ~  CTR=0
> ~  while [ -n "${TS[${CTR}]}" ]; do
> ~    echo "Syncing ${TS[${CTR}]}..."
> ~    time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER}
> ~    RVAL=$?
> ~    echo "Sync finished with exit status ${RVAL}"
> ~    if [ ${RVAL} -ne 0 ]; then
> ~      <handle errors>
> ~    fi
> ~    CTR=$((CTR + 1))
> ~  done
> ~  unset CTR
>
> | # fortunately rsync does *not* seems to be exitting with non zero exit
> code
> | # for expected file disappearances and modifications.
> | if [ $? -ne 0 ]
> |         then
> |         echo "RSYNC failed"
> |         exit 1;
> | fi
>
> (error) Same error as above - what you're checking here is whether the
> last command in the last for loop run was successful, and this is always
> going to be true as echoing to stdout will never fail until stdout is
> closed for some reason.

Yes it was an error, i am doing repeat runs of the script
to find the non zero exit codes which should be treated
as normal in context of taking base backups.

>
> | echo "RSYNC Done successfully"
> |
> | echo "Executing pg_stop_backup in server ... "
> | $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
> | if [ $? -ne 0 ]
> |         then
> |         echo "PSQL pg_stop_backup failed"
> |         exit 1;
> | fi
> | echo "pg_stop_backup done successfully

I think i should improve the style here.

> | TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214
> |
> | # now remove the unneeded files.
> |
> | # strip off first 4 chars from CP and prefix 00 to the result.
> | # search the file that has the Checkpoint in its filename.
> | # it takes a while to come, so wait till it comes.
> |
> | while true; do
> |         REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH`
> |         if [ ! $REF_FILE ]; then
> |                 echo  "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE
> ... "
> |         else
> |                 break
> |         fi
> |         sleep 1
> | done
>
> (optimization) You could simplify this significantly using the test builtin:
>
> ~  while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do
> ~    echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2"
> ~    sleep 1
> ~  done
> ~  REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`"

I would not like to incorporate this becuase this code assumes WAL archives
are being externally compressed to .bz2. Can you suggest an optimized but
generic alternative ?

>
> | # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2
> |
> | # take only first 24 chars and store.
> | REF_FILE_NUM=${REF_FILE:0:24}
> |
> | # REF_FILE_NUM is 000000010000000A00000068
> |
> | echo "REF_FILE_NUM=$REF_FILE_NUM"
> |
> | # iterate list of files in the WAL_ARCHIVE folder
> | for i in `ls -1 $WAL_ARCHIVE` ;
> | do
> |         # $i is :000000010000000A0000005D.bz2 eg
> |         # get first 24 chars in filename
> |         FILE_NUM=${i:0:24}
> |
> |         # compare if the number is less than the reference
> |         # here string comparison is being used.
> |         if [[ $FILE_NUM  < $REF_FILE_NUM ]]
> |         then
> |                 echo "$FILE_NUM [ $i ] removed"
> |                 rm -f $WAL_ARCHIVE/$i
> |         else
> |                 echo "$FILE_NUM [ $i ] not removed"
> |         fi
> | done
>
> (optimization) Perhaps using find -newer/-anewer/-cnewer could be used
> here to find files older than the reference file:
>
> ~  # "-not -newer" or "\! -newer" will also return REF_FILE
> ~  # so you have to grep it out and use xargs; otherwise you
> ~  # could also use the -delete action
> ~  find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \
> ~    grep -v "^${REF_FILE}$" | \
> ~    xargs rm -f

Nopes , i have gone by the DOCS which tells to numerically compare the filenames
i do not want to assume more recently created files are numerically more that
later as i have not seen anything like that in the docs.

I am concerned if the comparison below used in the script

> |         if [[ $FILE_NUM  < $REF_FILE_NUM ]]

is correct , as it compares strings not numbers , i am assuming that
the results will be same as numerical comparison, as all the filenames
are padded with '0' form the left.

i thank you once again for your comments and shall post the improved version
once my observations are complete.

Regds
Rajesh Kumar Mallah.