Re: Looking for help on backing up a large database (500GB+) with LOBs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/06/2012 02:03 PM, Todd Hunt wrote:
Hi,
 
First let me state that I'm not a DBA, but a developer.  I know enough about databases to be dangerous, but not enough to make any money at it. ;-)
 
We are using large objects with OIDs as part of our data model.  One of our processes is to backup the database every night.  Our DBAs told me that pg_dump was taking too long.  So they decided to shut down Postgres, TAR up the directory to a back up location, restart Postgres, then copy the back up TAR to tape.
 
For our smaller sites, 200GB or less (mostly LOBs), it takes less than an hour to shutdown, TAR, and restart (writing to tape is not part of this time frame).  Some of our larger sites have 500GB+ worth of data, which is mostly LOBs.  Our DBAs want to move the LOBs out of the database and store them on the file system and have the record have a path the binary files.  I'd like to come up with a better and faster back up solution that allows the LOBs to stay in the DB.
 
A few things to note is that when a LOB gets inserted into the DB, it is never updated.  It may be deleted on rare occasions, but never updated.  Also, the DBAs are against incremental backups and I don't blame them, sort of. 
 
I'm open to any ideas.  The servers are pretty standard.  They initially come with four 1TB hard drives ran in RAID 10, so they have 2TB available.  There is another controller card and space for 4 more drives.  We want to keep cost down, but uptime is very important.   Even though I'm not a Sys Admin either, I was wondering if there would be a way to replicate the DB on the two different RAID sets, "halt" one to do the backup, then reinitialize it so that it would sync up with the other RAID set.
 
Other than adding a secondary server to do data replication, I'm open to ideas.  Even if it means moving the LOBs onto the file system.  I just need a back up solution that scales well when we exceed 1TB.
 
Thank you,
 
Todd

Howdy,

With a database that size you probably want to consider performing hot backups http://www.postgresql.org/docs/9.0/static/continuous-archiving.html.

That way your system is never down for the reason of backups. Also you can take full backups on the weekends (or during a known slow time) and then deltas (your archived logs) daily. This speeds up your backups to a much more manageable duration.

or, if you're on linux/bsd/etc you could also rsync to another server, you can configure it to only do deltas so it's not bringing over 500GB every day.

If you can get a SAN with snapshot capability, you can throw the database into backup mode, snapshot it, and then take it out, making the impact of your entire backup take ~5 minutes regardless of size on disk.

All that said, I would advocate for taking the LOBs out of the database. A database isn't a filesystem, it's never going to match a filesystem + nginx (or something) for serving up files.  The fact that your LOBs are rarely updated strengthens the case for getting it out of the database. My philosophy is, let the database handle what it's good at: joins, transactions, updates, etc. and let filesystems / webservers handle what they're good at: serving up static content.

Dave

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux