On 02/03/18 03:21, Rui DeSousa wrote:
On Mar 1, 2018, at 12:21 AM, scott ribe <scott_ribe@xxxxxxxxxxxxxxxx> wrote:
The false report of success is not good, but it's not the root problem.
A false success is a problem, especially in this use case, as the
source WAL file will be deleted by Postgres before the archive has
truly succeeded. While monitoring is nice for catching the issue, it is
not a fix for the issue.
I personally cannot recommend the use of rsync in this application for
two reasons.
1. It adds no value; it’s a more complex cp command (no bandwidth
saved, etc., as archiving processes a single file at a time).
2. It lies about success/failure. Period.
I have used “cat” longer than I have used rsync to archive WALs. I can
say that I’ve lost zero WAL files using cat; I cannot say the same
for rsync.
The following code is more reliable than rsync and works across
multiple platforms and filesystems without fail.
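# Default to failure; only report success if the whole pipeline succeeds.
STS=3
# Stream the WAL file to the remote host, write it under a temporary
# .swap name, and rename it into place only after the write succeeded.
OUTPUT=$(cat "$XLOGFILE" | $SSH_CMD "(mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WALFILE.swap) \
    && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE")
# $? is the exit status of the pipeline inside $(...), i.e. of $SSH_CMD.
if [ $? -eq 0 ]; then
    STS=0
fi
exit $STS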
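For anyone who wants to try the write-to-.swap-then-rename pattern from the script above without an SSH target, here is a minimal local sketch. The function name archive_wal and its three arguments are hypothetical, chosen only for illustration, and it assumes the archive directory is a locally mounted path. Like the original it relies on the exit status of cat and mv; it does not fsync.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post):
#   archive_wal SRC DEST_DIR NAME
# Copies SRC to DEST_DIR/NAME via a temporary .swap file, so a partially
# written file never appears under the final name.
archive_wal() {
    src=$1
    dest_dir=$2
    name=$3

    mkdir -p "$dest_dir" || return 1

    # Write under the temporary name first. If cat fails (missing source,
    # full disk, quota exceeded), remove the partial file and report failure.
    if ! cat "$src" > "$dest_dir/$name.swap"; then
        rm -f "$dest_dir/$name.swap"
        return 1
    fi

    # Rename into place; the function returns mv's exit status.
    mv "$dest_dir/$name.swap" "$dest_dir/$name"
}
```

Because the function returns non-zero on any failed step, a wrapper script can pass that status straight back to Postgres, which will then keep the WAL segment and retry.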
If you have a self-contained case that demonstrates rsync returning 0
when it has actually failed, then please do get the rsync authors
involved in investigating it (I'm sure they would be interested).
I've been unable to reproduce any cases of bad return codes or
zero-length files (using an rsync-based archive command + quotas).
However, I'm probably not using the same setup as you (and probably a
different platform as well).
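For anyone wanting to check their own archive for the zero-length files mentioned above, something along these lines should do. The function name list_empty_wals is made up for this sketch, and it assumes the archive directory is readable as a local path.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only):
#   list_empty_wals ARCHIVE_DIR
# Prints the path of every regular file of size zero under ARCHIVE_DIR.
list_empty_wals() {
    find "$1" -type f -size 0
}
```

Empty output means no zero-length files were found. Note that this only catches empty files, not truncated ones.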
regards
Mark