Hi Scott,
Why not replicate this master to the other location(s) with another
method such as Bucardo? That way you can pick only the tables you
actually want replicated there.
For the backups, switch to a hot (physical) backup (tar of $PGDATA)
plus WAL archiving; it's easier, faster and more efficient than a
logical copy with pg_dump.
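Roughly, and assuming WAL archiving is already pointed somewhere safe
(the paths and backup label below are just placeholders):

    # postgresql.conf (9.x): make sure archiving is on
    #   wal_level = archive
    #   archive_mode = on
    #   archive_command = 'cp %p /path/to/wal_archive/%f'

    # take the nightly base backup
    psql -U postgres -c "SELECT pg_start_backup('nightly_base');"
    tar czf /backups/base_$(date +%Y%m%d).tar.gz \
        --exclude=pg_xlog -C "$PGDATA" .      # WAL comes from the archive
    psql -U postgres -c "SELECT pg_stop_backup();"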
A.A
On 04/25/2012 09:11 AM, Scott Whitney wrote:
Hello, everyone. I want to throw a scenario out
there to see what y'all think.
Soon, my cluster backups will be increasing in size
inordinately. They're going to immediately go to 3x as large as
they currently are with the potential to be about 20x within a
year or so.
My current setup uses a single PG 8.x server doing nightly dumps
(not ideal but sufficient for the moment, and one of the main
reasons to move to PG 9) which are then downloaded from my
hosting center to our offices for DR purposes. Each night I pull
down roughly 5GB of compressed pg_dump data. Dumping this takes
about 1.5hrs. Downloading this at 15Mbps takes about an hour.
Soon I'll be looking at somewhere around 7hrs for the dumps to
complete and downloading a 12GB file (which will take about 3
hrs). Oh, and I'll have to pay for significant bandwidth overage,
since I'm billed at the 95th percentile: an hour a day does NOT
push my 95th-percentile usage up to 15Mbps, but 3hrs per night
certainly will, so there's a real cost associated with this
strategy as well.
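(For a sense of scale: 12GB is roughly 96,000 megabits, and 96,000 /
15Mbps is about 6,400 seconds, so even at full line rate that's
close to 2 hours of pure transfer time before any protocol overhead
or other traffic on the link.)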
While the time of the actual dumps is not a huge issue, the time
of the download IS a large concern, especially since my support
folks use that file daily to extract individual customer
databases for restore in assisting customer support issues.
So, while now I have my pg_dumps completed around 2AM and
downloaded to my local network at about 3AM, with the increase
in our database sizes, what will be happening is that my pg_dump
will not be completed until around 7AM, and the download would
not be completed until around 10AM, best-case scenario. Add into
that support trying to restore a database...more on that in a
moment.
My _new_ setup will instead be 2 PG 9.x servers with hot-standby
enabled (at my hosting center) and a 3rd PG 9.x server at my
local office also replicating off of the master. Each one of
those servers will perform his own pg_dumps of the individual
databases for backup/disaster recovery purposes, and while each
dump might not be consistent with one another, each SERVER will
have dumps consistent to itself, which is viable for our
situation, and does not require me to download 12GB (or more)
each night with all of those associated nightmares, costs and
other problems.
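For reference, the standby setup I have in mind is just the stock 9.x
streaming-replication configuration; the host names, user and subnet
below are placeholders:

    # master, postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3

    # master, pg_hba.conf (let the standbys stream WAL)
    host  replication  replicator  10.0.0.0/24  md5

    # each standby, postgresql.conf
    hot_standby = on

    # each standby, recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com port=5432 user=replicator'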
Alright, well, I've got that part all thought out, and it seems
like a good way to do it to me, but I'm _still_ running into the
situation that I've got to take 8hrs-ish to run the pg_dump no
matter where it runs, and when my support folks need it (which
they do daily), this basically means that if they have to have a
customer database up NOW NOW NOW for support reasons, they
simply cannot have it within an hour in many cases.
Specifically, one database takes between 2 and 7.5hrs to pg_dump
depending on which format I use, so if they need a CURRENT copy,
they're at least 4 hours out. Additionally, they can't directly
use the replicating server at my local office, because they need
to test the problems the customers are having which include
pesky things like INSERT, UPDATE and DELETE, so they have to
restore this data to another internal PG backend.
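Concretely, the dump/restore cycle for one customer database looks
something like the following (the custom format plus a parallel
restore is just one way to do it, and the names are examples):

    # on the server doing the nightly dumps
    pg_dump -Fc -f customer_db.dump customer_db

    # on the internal support server
    createdb customer_db_copy
    pg_restore -j 4 -d customer_db_copy customer_db.dump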
Enter my outside-the-box thinking.
I rather assume that you cannot do a start/stop backup on a
hot-standby server. HOWEVER, what if....
I set up a 4th database server internally at my office. Each
night I stop PG on my 3rd server (the local one replicating off
of the master) and rsync my pg_data directory to this new 4th
server. I bring up the 4th server NOT as a standby, but as a
master. They would then have all customer data on an internal,
usable PG system from the time of the rsync, and while it might
not reflect the immediate state of the database, that's pretty
well always true, and they're used to that, since whenever they
"clone" a site, they're using the dumps done around midnight
anyway.
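In rough shell terms, the nightly sequence would be something like
this (paths, host names and the recovery.conf handling are
illustrative):

    # on server #3 (the local standby)
    pg_ctl -D "$PGDATA" stop -m fast
    rsync -a --delete "$PGDATA"/ server4:/var/lib/pgsql/data/
    pg_ctl -D "$PGDATA" start   # comes back as a standby and replays missed WAL

    # on server #4 (the internal copy)
    rm /var/lib/pgsql/data/recovery.conf   # so it starts as a standalone master
    pg_ctl -D /var/lib/pgsql/data start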
I believe, then, that when I restart server #3 (the standby who
is replicating), he'll say "oh, geez, I was down, let me catch
up on all that crap that happened while I was out of the loop,"
he'll replay the WAL files that were written while he was down,
and then he'll catch back up.
Does this sound like a viable option? Or does someone have
additional suggestions?