On 3/30/06, Grega Bremec <gregab@xxxxxxx> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > Rajesh Kumar Mallah wrote: > | > | OK i am posting my full script [ its not heavy programming i guess :) ] > | shall be grateful if you/someone could review it . (its well > commented i think) > | script also carries sample data. > | > | it does following > | 1. takes base backup to a destined folder by rsync > | 2. waits for .backup file to arrive in archive folder > | after pg_stop_bacup() > | 3. searches and removes unwanted archived log files. > | > | I have run it many times in my server and it seems to > | be working fine. > > Hello, Rajesh. > > Just a couple of comments on the script itself, not what it actually > does - I never tried WAL archiving before, so I can't comment on that. I > inserted the comments at relevant points in the script. I'm sorting them > into three categories, one is just improvements in style, the other is > optimization and the third is correction of an error. Dear Grega , Thanks for the useful tips and error spotting, i am incorporating some of them and testing the script in my server . I have concerns regarding some of your optimisations that makes the script less generic , below find my comments. > > | ------------------------------------ BEGIN > | -------------------------------------------- > | #!/bin/bash > | > | # folder where base_backup is put > | BACKUPFOLDER=/mnt/disk3/base_backups > | today=`date +%d-%m-%Y-%H-%M-%S` > | PSQL=/opt/usr/local/pgsql/bin/psql > | RSYNC="/usr/bin/rsync -a" > | PGDATADIR=/mnt/disk5/pgdatadir > | > | # two table spaces. > | > | TS1=/mnt/disk4/bigtables > | TS2=/mnt/disk3/indexspace > > (optimization) Since you're using bash, you can use arrays. This could > be better written as > > ~ TS[0]=/mnt/disk5/pgdatadir > ~ TS[1]=/mnt/disk4/bigtables > ~ TS[2]=/mnt/disk3/indexspace > > or even > > ~ TS=(/mnt/disk5/pgdatadir \ > ~ /mnt/disk4/bigtables \ > ~ /mnt/disk3/indexspace) agreed , already incorporated. > > That way, you can add tablespaces at will and just use a while loop to > back them up, which greatly simplifies adding new tablespaces or moving > the script somewhere else. See below for how to implement that. > > | # folder where *archived* logs are put. > | WAL_ARCHIVE=/mnt/wal_archive > | > | label=base_backup_${today} > | > | echo "Executing pg_start_backup with label $label in server ... " > | > | # get the checkpoint at which backup starts > | # the .backup files seems to be bearing this string in it. > | > | CP=`$PSQL -q -Upostgres -d template1 -c "SELECT > | pg_start_backup('$label');" -P tuples_only -P format=unaligned` > | > | echo "Begin CheckPoint is $CP" # this contain string like A/681D1214 > | > | if [ $? -ne 0 ] > | then > | echo "PSQL pg_start_backup failed" > | exit 1; > | fi > | echo "pg_start_backup executed successfully" > > (style) If you want to capture any error messages pg_start_backup may > have caused and store them into ${CP}, you should add 2>&1 at the end of > the psql invocation, see below snippet. Incorporated it. > > (error) Checking for exit status of pg_start_backup using $? at this > point will never report an error, as you've used echo prior to checking > what pg_start_backup returned. You should either move the echo below the > if statement (by adding an "else" clause) or store the exit status of > pg_start_backup into RVAL like this: > > ~ CP="`$PSQL ... 2>&1`" > ~ RVAL=$? > ~ echo "Begin CheckPoint says: ${CP}" > ~ if [ ${RVAL} -ne 0 ]; then > ~ ... > ~ fi > > | echo "RSYNC begins.." > | > | # rsync each of the folders to the backup folder. > | for i in $TS1 $TS2 $PGDATADIR ; > | do > | echo "Syncing $i .. " > | time $RSYNC $i $BACKUPFOLDER > | echo "Done" > | done > > (optimization) If you store locations into an array, you could rewrite > this as follows: > > ~ CTR=0 > ~ while [ -n "${TS[${CTR}]}" ]; do > ~ echo "Syncing ${TS[${CTR}]}..." > ~ time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER} > ~ RVAL=$? > ~ echo "Sync finished with exit status ${RVAL}" > ~ if [ ${RVAL} -ne 0 ]; then > ~ <handle errors> > ~ fi > ~ CTR=$((CTR + 1)) > ~ done > ~ unset CTR > > | # fortunately rsync does *not* seems to be exitting with non zero exit > code > | # for expected file disappearances and modifications. > | if [ $? -ne 0 ] > | then > | echo "RSYNC failed" > | exit 1; > | fi > > (error) Same error as above - what you're checking here is whether the > last command in the last for loop run was successful, and this is always > going to be true as echoing to stdout will never fail until stdout is > closed for some reason. Yes it was an error, i am doing repeat runs of the script to find the non zero exit codes which should be treated as normal in context of taking base backups. > > | echo "RSYNC Done successfully" > | > | echo "Executing pg_stop_backup in server ... " > | $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" > | if [ $? -ne 0 ] > | then > | echo "PSQL pg_stop_backup failed" > | exit 1; > | fi > | echo "pg_stop_backup done successfully I think i should improve the style here. > | TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214 > | > | # now remove the unneeded files. > | > | # strip off first 4 chars from CP and prefix 00 to the result. > | # search the file that has the Checkpoint in its filename. > | # it takes a while to come, so wait till it comes. > | > | while true; do > | REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH` > | if [ ! $REF_FILE ]; then > | echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE > ... " > | else > | break > | fi > | sleep 1 > | done > > (optimization) You could simplify this significantly using the test builtin: > > ~ while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do > ~ echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2" > ~ sleep 1 > ~ done > ~ REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`" I would not like to incorporate this becuase this code assumes WAL archives are being externally compressed to .bz2. Can you suggest an optimized but generic alternative ? > > | # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2 > | > | # take only first 24 chars and store. > | REF_FILE_NUM=${REF_FILE:0:24} > | > | # REF_FILE_NUM is 000000010000000A00000068 > | > | echo "REF_FILE_NUM=$REF_FILE_NUM" > | > | # iterate list of files in the WAL_ARCHIVE folder > | for i in `ls -1 $WAL_ARCHIVE` ; > | do > | # $i is :000000010000000A0000005D.bz2 eg > | # get first 24 chars in filename > | FILE_NUM=${i:0:24} > | > | # compare if the number is less than the reference > | # here string comparison is being used. > | if [[ $FILE_NUM < $REF_FILE_NUM ]] > | then > | echo "$FILE_NUM [ $i ] removed" > | rm -f $WAL_ARCHIVE/$i > | else > | echo "$FILE_NUM [ $i ] not removed" > | fi > | done > > (optimization) Perhaps using find -newer/-anewer/-cnewer could be used > here to find files older than the reference file: > > ~ # "-not -newer" or "\! -newer" will also return REF_FILE > ~ # so you have to grep it out and use xargs; otherwise you > ~ # could also use the -delete action > ~ find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \ > ~ grep -v "^${REF_FILE}$" | \ > ~ xargs rm -f Nopes , i have gone by the DOCS which tells to numerically compare the filenames i do not want to assume more recently created files are numerically more that later as i have not seen anything like that in the docs. I am concerned if the comparison below used in the script > | if [[ $FILE_NUM < $REF_FILE_NUM ]] is correct , as it compares strings not numbers , i am assuming that the results will be same as numerical comparison, as all the filenames are padded with '0' form the left. i thank you once again for your comments and shall post the improved version once my observations are complete. Regds Rajesh Kumar Mallah.