On 3/28/06, Andy Shellam <andy.shellam@xxxxxxxxxxxxxxxxx> wrote: > > > > we want to remove archived WAL log files which are no longer needed > so that the size of wal_archive folder is under control. > > The DOCs say that filenames numerically less than the WAL record that > pg_stop_backup() > suggests can be removed. Will an alphabetical sorting be different from > numerically sorted > filename ? Sorting numerically is not easy as the filenames look like > HEX numbers > ut they are are so huge that they cant be stored in normal integers > (32bits). > > The WAL files are all numerical – however the backup file has the checkpoint > appended to it – eg. 00009012514000916.A0AC91.backup. > > You need to examine the contents of this file to find the earliest file > needed (it's usually the one just before it – i.e. 00009012514000916 in > this case) and any numerically later than the .backup file. > > > > Also you wouldn't particular need any heavy programming – I'm sure a simple > shell script could be written in bash to pick out the correct files. OK i am posting my full script [ its not heavy programming i guess :) ] shall be grateful if you/someone could review it . (its well commented i think) script also carries sample data. it does following 1. takes base backup to a destined folder by rsync 2. waits for .backup file to arrive in archive folder after pg_stop_bacup() 3. searches and removes unwanted archived log files. I have run it many times in my server and it seems to be working fine. ------------------------------------ BEGIN -------------------------------------------- #!/bin/bash # folder where base_backup is put BACKUPFOLDER=/mnt/disk3/base_backups today=`date +%d-%m-%Y-%H-%M-%S` PSQL=/opt/usr/local/pgsql/bin/psql RSYNC="/usr/bin/rsync -a" PGDATADIR=/mnt/disk5/pgdatadir # two table spaces. TS1=/mnt/disk4/bigtables TS2=/mnt/disk3/indexspace # folder where *archived* logs are put. WAL_ARCHIVE=/mnt/wal_archive label=base_backup_${today} echo "Executing pg_start_backup with label $label in server ... " # get the checkpoint at which backup starts # the .backup files seems to be bearing this string in it. CP=`$PSQL -q -Upostgres -d template1 -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned` echo "Begin CheckPoint is $CP" # this contain string like A/681D1214 if [ $? -ne 0 ] then echo "PSQL pg_start_backup failed" exit 1; fi echo "pg_start_backup executed successfully" echo "RSYNC begins.." # rsync each of the folders to the backup folder. for i in $TS1 $TS2 $PGDATADIR ; do echo "Syncing $i .. " time $RSYNC $i $BACKUPFOLDER echo "Done" done # fortunately rsync does *not* seems to be exitting with non zero exit code # for expected file disappearances and modifications. if [ $? -ne 0 ] then echo "RSYNC failed" exit 1; fi echo "RSYNC Done successfully" echo "Executing pg_stop_backup in server ... " $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();" if [ $? -ne 0 ] then echo "PSQL pg_stop_backup failed" exit 1; fi echo "pg_stop_backup done successfully" TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214 # now remove the unneeded files. # strip off first 4 chars from CP and prefix 00 to the result. # search the file that has the Checkpoint in its filename. # it takes a while to come, so wait till it comes. while true; do REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH` if [ ! $REF_FILE ]; then echo "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE ... " else break fi sleep 1 done # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2 # take only first 24 chars and store. REF_FILE_NUM=${REF_FILE:0:24} # REF_FILE_NUM is 000000010000000A00000068 echo "REF_FILE_NUM=$REF_FILE_NUM" # iterate list of files in the WAL_ARCHIVE folder for i in `ls -1 $WAL_ARCHIVE` ; do # $i is :000000010000000A0000005D.bz2 eg # get first 24 chars in filename FILE_NUM=${i:0:24} # compare if the number is less than the reference # here string comparison is being used. if [[ $FILE_NUM < $REF_FILE_NUM ]] then echo "$FILE_NUM [ $i ] removed" rm -f $WAL_ARCHIVE/$i else echo "$FILE_NUM [ $i ] not removed" fi done ------------------------------------ END ----------------------------------------------- ---------------------------------- REAL OUTPUT OF A RUN----------------------- Executing pg_start_backup with label base_backup_29-03-2006-17-29-01 in server ... Begin CheckPoint is A/681D1214 pg_start_backup executed successfully RSYNC begins.. Syncing /mnt/disk4/bigtables .. real 6m24.338s user 1m12.831s sys 0m55.295s Done Syncing /mnt/disk3/indexspace .. real 5m45.245s user 0m32.520s sys 0m26.567s Done Syncing /mnt/disk5/pgdatadir .. readlink pgdatadir/global/pgstat.tmp: No such file or directory rsync error: some files could not be transferred (code 23) at main.c(620) real 12m1.844s user 1m43.698s sys 1m24.486s Done RSYNC Done successfully Executing pg_stop_backup in server ... pg_stop_backup ---------------- A/6FA82D74 (1 row) pg_stop_backup done successfully Waitng for file with 001D1214 in /mnt/wal_archive ... REF_FILE_NUM=000000010000000A00000068 000000010000000A0000005D [ 000000010000000A0000005D.003C0B54.backup.bz2 ] removed 000000010000000A0000005D [ 000000010000000A0000005D.bz2 ] removed 000000010000000A0000005E [ 000000010000000A0000005E.bz2 ] removed 000000010000000A0000005F [ 000000010000000A0000005F.bz2 ] removed 000000010000000A00000060 [ 000000010000000A00000060.bz2 ] removed 000000010000000A00000061 [ 000000010000000A00000061.bz2 ] removed 000000010000000A00000062 [ 000000010000000A00000062.bz2 ] removed 000000010000000A00000063 [ 000000010000000A00000063.bz2 ] removed 000000010000000A00000064 [ 000000010000000A00000064.bz2 ] removed 000000010000000A00000065 [ 000000010000000A00000065.bz2 ] removed 000000010000000A00000066 [ 000000010000000A00000066.bz2 ] removed 000000010000000A00000067 [ 000000010000000A00000067.bz2 ] removed 000000010000000A00000068 [ 000000010000000A00000068.001D1214.backup.bz2 ] not removed 000000010000000A00000068 [ 000000010000000A00000068.bz2 ] not removed 000000010000000A00000069 [ 000000010000000A00000069.bz2 ] not removed 000000010000000A0000006A [ 000000010000000A0000006A.bz2 ] not removed 000000010000000A0000006B [ 000000010000000A0000006B.bz2 ] not removed 000000010000000A0000006C [ 000000010000000A0000006C.bz2 ] not removed 000000010000000A0000006D [ 000000010000000A0000006D.bz2 ] not removed 000000010000000A0000006E [ 000000010000000A0000006E.bz2 ] not removed ------------------------------- END----------------------------------------------------- > > > Q2. We are attempting to automate the process of taking base backup and > removal > of the unneeded WAL files. Is there any reliable way of knowing the > WAL file X from > inside the shell script , such that files prior to X can be removed ? > > from a shell script we issue pg_stop_backup() by psql -c , it returns > something > which does not looks like a WAL file. > > > > Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 – > this is purely a guess, but you might be able to find which backup file > contains this checkpoint by taking the digits after the forward-slash in the > checkpoint (i.e. A0AC91 in this case), then finding the filename that > contains this – in my example it's 00009012514000916.A0AC91.backup, I think its not correct , pg_start_backup() returns the checkpoint record which becomes part of the backup filename. and grep > the file for the number after the text "Start WAL Location: " in this file – > then remove anything numerically less. > > > > It'd be a very "bitty" process, but I'm certain it could be done – it would > need heavy testing over a period of backups though to ensure the wrong files > are not being deleted. > > > > Q3. tar exits with non zero status for the same reasons as mentioned in > docs > is there any better archiving tool for this purpose ? can we use cp > -a ? > > > > The WAL archive command can be set to use either cp or mv – then why not > have a regular cronjob that runs a shell script to add any new files to your > tar archive every time a new file is detected in your WAL-archive directory? Hmmm you got me wrong, am talking about taking the base_backup not archival of WAL segments > > > > Regds > Rajesh Kumar Mallah. > > !DSPAM:14,44293ed135042000516834! > > > Regards > > > > Andy Shellam