Hi,
I am running streaming replication with the archive.
pg_xlog/
----------------------------------
-rw------- 1 postgres postgres 16777216 Sep 27 23:30 000000010000013300000082
-rw------- 1 postgres postgres 16777216 Sep 27 23:31 000000010000013300000083
-rw------- 1 postgres postgres 16777216 Sep 27 23:31 000000010000013300000084
-rw------- 1 postgres postgres 16777216 Sep 27 23:31 000000010000013300000085
-rw------- 1 postgres postgres 16777216 Sep 27 23:51 000000010000013300000086
-rw------- 1 postgres postgres 16777216 Sep 28 00:47 000000010000013300000087
-rw------- 1 postgres postgres 16777216 Sep 28 01:55 000000010000013300000088
-rw------- 1 postgres postgres 16777216 Sep 28 02:47 000000010000013300000089
-rw------- 1 postgres postgres 16777216 Sep 27 18:04 00000001000001330000008A
-rw------- 1 postgres postgres 16777216 Sep 27 18:02 00000001000001330000008B
-rw------- 1 postgres postgres 16777216 Sep 27 18:02 00000001000001330000008C
-rw------- 1 postgres postgres 16777216 Sep 27 18:05 00000001000001330000008D
-rw------- 1 postgres postgres 16777216 Sep 27 18:03 00000001000001330000008E
-rw------- 1 postgres postgres 16777216 Sep 27 18:02 00000001000001330000008F
-rw------- 1 postgres postgres 16777216 Sep 27 18:02 000000010000013300000090
-rw------- 1 postgres postgres 16777216 Sep 27 18:37 000000010000013300000091
-rw------- 1 postgres postgres 16777216 Sep 27 20:21 000000010000013300000092
-rw------- 1 postgres postgres 16777216 Sep 27 21:00 000000010000013300000093
pg_xlog/archive_status
----------------------------------
-rw------- 1 postgres postgres 0 Sep 27 23:30 000000010000013300000081.done
-rw------- 1 postgres postgres 0 Sep 27 23:30 000000010000013300000082.done
-rw------- 1 postgres postgres 0 Sep 27 23:31 000000010000013300000083.done
-rw------- 1 postgres postgres 0 Sep 27 23:31 000000010000013300000084.done
-rw------- 1 postgres postgres 0 Sep 27 23:31 000000010000013300000085.done
-rw------- 1 postgres postgres 0 Sep 27 23:51 000000010000013300000086.done
-rw------- 1 postgres postgres 0 Sep 28 00:47 000000010000013300000087.done
-rw------- 1 postgres postgres 0 Sep 28 01:55 000000010000013300000088.done
postgresql.conf
#----------------------------------
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = '/opt/wal_archive.sh "%p" "%f" > archive_command.log 2>&1'
wal_archive.sh
------------------------------------
#!/bin/bash -xv
PG_XLOG="$1"
PG_XLOG_FILENAME="$2"
HOST=hostname
ARCHIVE_DIR="/pg_archive/master"
ARCHIVE_TO_KEEP="3" #No of days of archive logs to keep
EMAIL="a@xxxxx"
ERROR_COUNT_FILE="/tmp/replication_archive_error_count.txt"
OLD_COUNT=`cat ${ERROR_COUNT_FILE}`
DNOW=`date +%u`
hour=$(date +%H)
D=`date`
#Do the cleanup if the day is Monday or Thursday and time is between 11 p.m. UTC and 22 hrs UTC
if [ "$DNOW" -eq "1" -o "$DNOW" -eq "4" -a "$hour" -ge 11 -a "$hour" -lt 22 ]; then
find "${ARCHIVE_DIR}"/ -type f -mtime +"${ARCHIVE_TO_KEEP}" -exec rm -f {} +
if [ "$?" -eq "1" ]; then
echo "The wal_archive script could not cleanup the archive directory of $HOST" | mail -s "ERROR - WAL Archive for $HOST" "$EMAIL"
fi
fi
if [ ! -f "${ARCHIVE_DIR}"/"${PG_XLOG_FILENAME}" ]; then
cp "${PG_XLOG}" "${ARCHIVE_DIR}"/"${PG_XLOG_FILENAME}"
/usr/bin/rsync -W -az "${PG_XLOG}" postgres@standby:"${ARCHIVE_DIR}"/"${PG_XLOG_FILENAME}"
if [ "$?" -ne "0" ]; then
#If rsync fails, then remove the copied file from master, increase the error count, and retry.
rm -rf "${ARCHIVE_DIR}"/"${PG_XLOG_FILENAME}"
NEW_COUNT=`expr $OLD_COUNT + 1`
if [ "$NEW_COUNT" -ge "100" ]; then
echo -e "${D}""\n""Rsync could not transfer the WAL file from Master to slave." | mail -s "ALERT - WAL Archive for $HOST" "$EMAIL"
echo "0" > $ERROR_COUNT_FILE
else
echo "$NEW_COUNT" > $ERROR_COUNT_FILE
fi
exit 1
else
echo "0" > $ERROR_COUNT_FILE
exit 0
fi
else
exit 0
fi
On Thu, Sep 28, 2017 at 1:05 AM, Scott Mead <scottm@xxxxxxxxxxx> wrote:
On Wed, Sep 27, 2017 at 2:55 PM, Jerry Sievers <gsievers19@xxxxxxxxxxx> wrote:John Britto <john@xxxxxxxxxxx> writes:
> Hello,
>
> I have a streaming replication setup along with WAL archive.
>
> archive_command = ‘test ! -f /var/pg_archive/%f && cp %p <archive
> location>%f && scp %p postgres@192.168.0.123:<archive location>/%f' Yup Pg is going to handle the unshipped WALs one at a time and it will
>
> When the SCP command fails, the master repeatedly tries to send the
> archived WAL to standby. But during this time, the pg_xlog directly
> grew with newer WAL files.
>
> The streaming replication hasn't had the problem because on my check,
> the WAL write location on the primary was same with the last WAL
> location received/replayed in standby.
>
> Since the pg_xlog in the master had few newer WAL files, the master
> archive is lagging to pick the current pg_xlog WAL file. When a new
> WAL occur in the pg_xlog, Master picks the old WAL file to send to
> the standby.
do them in order, oldest (lowest file name) first.
> How should I force the PostgreSQL to batch copy the lagging WAL files
> to pg_archive and then send to standby. Can I do this manually using
> rsync? I wonder how PostgreSQL knows the changes because it
> maintains info in archive_status with extension as .ready and .done.
I suggest you fix your basic archiving routine to complete and exit with
success to postgres.+1scp %p host:/archive/%fif [ "$?" -ne 0 ]thenecho "`date`:FAILED:%p" >> /var/log/failed_xlog.logcp %p /localarchive/%fexit 0fiEssentially, always make sure that you are returning a 0 to postgres. If there is a failure, either log it or handle it separately. This code snippet is NOT COMPLETE, there's a lot more to do in order to make it production ready and recoverable. The biggest issue I've had with scp is that you have to set and enforce a timeout and trap the timeout. Note, the above only works until your local machine (or the /localarchive partition) runs out of space. It's really important that you have ultra solid logging and alerting around this.
And as for archive command scripts in general, simpler is better.
If you want to manually ship them in bulk, you may do so but then will
need to remove the corresponding archive_status/$foo.ready file so that
postgres won't keep trying to ship the same one.I'm a huge fan of this strategy, especially if you're sending to a remote datacenter.archive_command.sh:cp %p /localarchive/%fif [ "$?" -ne 0 ]thenecho "`date`:FAILED:%p" >> /var/log/failed_xlog.logexit 0fisend_archive_to_remote.shrsync -avzP /localarchive/* host:/archive/Of course, now you have to deal with files that are removed from the slave and making sure they get removed from the master appropriately, but, this is fairly straightforward.--Scott
HTH
--
> Please assist.
>
> Thanks,
>
> John Britto
>
>
>
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@xxxxxxxxxxt
p: 312.241.7800
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--