I've run into a problem with a PITR setup at a client. Whenever the CIFS NAS
device that we mount at /mnt/pgbackup has problems, the current client
connection seems to get blocked, and the blocked connections eventually build
up to a "sorry, too many clients already" error. Is this expected behavior
with the archive command, and should I build some more smarts into my archive
script? Maybe I should fork and waitpid so that I can enforce a manual timeout
shorter than whatever the CIFS timeout is and return an error in a reasonable
amount of time?
Has anyone else seen this problem? Restarting the NAS device fixes the
problem, but it would be much preferable if postgres could soldier along without
the NAS for a little while before we resuscitate it. We don't have an NFS or
rsync server available in this environment currently, though I suppose setting
up an rsync server for Windows on the NAS would be a possibility.
Any suggestions much appreciated.
Currently the script is fairly simple and just does a 'cp' and then a 'gzip',
although we do use cp -f to overwrite any leftover from a previously failed 'cp'.
The script is below:
. /usr/local/lib/includes.sh
FULLPATH="$1"
FILENAME="$2"
#
# Make sure we have pgbackup dir mounted
#
checkpgbackupmount
/bin/cp -f "$FULLPATH" "$PITRDESTDIR/$FILENAME"
if [ $? -ne 0 ]; then
    die "Could not cp $FULLPATH to $PITRDESTDIR/$FILENAME"
fi
/usr/bin/gzip -f "$PITRDESTDIR/$FILENAME"
#
# Make sure it worked, otherwise roll back
#
if [ $? -ne 0 ]; then
    /bin/rm -f "$PITRDESTDIR/$FILENAME"
    die "Could not /usr/bin/gzip $PITRDESTDIR/$FILENAME"
fi
exit 0
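
The fork-and-timeout idea mentioned above could be sketched roughly as follows. This is only an illustration, not a tested fix: run_with_deadline is a hypothetical helper name, and the exit code 124 is an arbitrary choice. It starts the command in the background, polls once per second, and returns an error once the deadline passes without waiting for the stuck child.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): run a command in the
# background and give up after a deadline, so the caller can return an error
# instead of blocking for as long as the command does.
#
# Usage: run_with_deadline SECONDS command [args...]
# Returns the command's exit status, or 124 if the deadline passed first.
run_with_deadline() {
    rwd_deadline="$1"
    shift

    "$@" &                      # start the command as a background child
    rwd_pid=$!
    rwd_elapsed=0

    # Poll once per second while the child is still running.
    while kill -0 "$rwd_pid" 2>/dev/null; do
        if [ "$rwd_elapsed" -ge "$rwd_deadline" ]; then
            # Ask the child to stop, but do NOT wait for it: a process stuck
            # in uninterruptible I/O may not react until the I/O returns.
            kill -TERM "$rwd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        rwd_elapsed=$((rwd_elapsed + 1))
    done

    # The child finished on its own; hand back its exit status.
    wait "$rwd_pid"
}

# Possible use in the archive script (the 30-second limit is an assumption):
#   run_with_deadline 30 /bin/cp -f "$FULLPATH" "$PITRDESTDIR/$FILENAME" \
#       || die "Could not cp $FULLPATH to $PITRDESTDIR/$FILENAME"
```

Because the helper never waits on a child it has given up on, the script can exit non-zero promptly even if the 'cp' stays stuck on the mount; the stuck process may linger until the NAS comes back. Whether that is enough to keep connections from piling up is something I'd still need to verify.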
--
Jeff Frost, Owner <jeff@xxxxxxxxxxxxxxxxxxxxxx>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954