On Thu, Apr 9, 2009 at 7:33 PM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
>>> 1) Decrease the maximum possible segment backlog so you can never get
>>> this far behind
>>
>> I understand conceptually what you are saying, but I don't know how to
>> practically realize this. :)  Do you mean lower checkpoint_segments?
>
> Theoretically, every time the archive_command writes a new segment out you
> can immediately move that to your standby, and set up the standby to
> regularly look for those and apply them as they come in.  The fact that
> you're getting so many of them queued up suggests there's something in
> that path that isn't moving that pipeline along aggressively enough;
> without knowing more about what you're doing it's hard to say where
> that is.

This is our archiving command:

#!/bin/bash

echo "archiving $2.bz2"
bzip2 -k -9 -c "$1" > "/srv/pg_logs/archive/$2.bz2.tmp" || exit $?
mv "/srv/pg_logs/archive/$2.bz2.tmp" "/srv/pg_logs/archive/$2.bz2" || exit $?
scp "/srv/pg_logs/archive/$2.bz2" "w.x.y.z:/srv/logs/$2.bz2.tmp" || exit $?
ssh w.x.y.z "mv /srv/logs/$2.bz2.tmp /srv/logs/$2.bz2" || exit $?
rm "/srv/pg_logs/archive/$2.bz2" || exit $?

And this is our restoring command:

#!/bin/bash

if [ "$1" == "" ] || [ "$2" == "" ]; then
    echo "dbrestore [source] [destination]"
    exit 1
fi

echo "`date`: restoring $1"

while true
do
    if [ -f "$1.bz2" ]; then
        echo "`date`: restore $1.bz2 -> $2"
        bunzip2 -d -c "$1.bz2" > "$2.tmp"
        mv "$2.tmp" "$2"
        exit 0
    fi

    if [[ "$1" =~ ".history" ]]; then
        echo "`date`: skipping $1"
        exit 1
    fi

    if [ -f "/tmp/golive" ]; then
        echo "`date`: going live"
        rm -f "/tmp/golive"
        exit 2
    fi

    sleep 5s
done

Essentially, what we do is bzip2 the file, scp it to the backup server, and
then rename it over ssh.  The bzip2 step is legacy from when we were
uploading to Amazon over the public internet, and it can go away now.  The
rename could happen in the restore script instead, and that's something I
should probably change anyway: one less thing for the master database to do.
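With the bzip2 step dropped, the remaining archive work reduces to the stage-then-rename pattern the scripts above already use: write the segment under a temporary name, then rename it, so the restore side never picks up a half-copied file.  A minimal sketch of that pattern follows; `archive_segment` and `SPOOL` are hypothetical names (not from the scripts above), and in practice the `cp` would be an `scp` to the standby, with the final rename done by the restore script there:

```shell
#!/bin/bash
# Sketch only: stage a WAL segment atomically into a spool directory.
# $1 = full path to the segment (%p), $2 = segment file name (%f).
# SPOOL is an assumed environment variable naming the staging directory.
archive_segment() {
    local path="$1" name="$2"
    # Copy under a temporary name so a poller never sees a partial file.
    cp "$path" "$SPOOL/$name.tmp" || return $?
    # Rename is atomic on the same filesystem: readers see all or nothing.
    mv "$SPOOL/$name.tmp" "$SPOOL/$name" || return $?
}
```

The same two-step copy/rename appears twice in the original archive script (once locally, once over ssh); moving the second rename to the standby's restore loop would remove the ssh round trip from the master's archive_command.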
We create file system snapshots of the hot spares, and I periodically purge
the old log files after I've verified that we can bring the most recent
snapshot live.

We've used NFS in the past, but we're currently investigating other
distribution alternatives (primarily londiste and pgpool2).  We've used
Slony in the past, but find it introduces too much administrative overhead
and is too brittle for our tastes.

Thanks again!

Bryan

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general