On Fri, 2009-06-12 at 09:07 -0700, Alan Hodgson wrote:
> On Friday 12 June 2009, Scott Ribe <scott_ribe@xxxxxxxxxxxxxxx> wrote:
> > > It's far easier to backup and restore a database than millions of
> > > small files. Small files = random disk I/O.

That depends on how you're backing up. If you want to back a file system
up database style, use the filesystem dump utility.

fs dump tools have gone somewhat out of fashion lately because of space-use
concerns, the inconvenience of partial restores, cross-hardware/version
compatibility issues, etc., but they're actually rather similar to what you
get when backing up a DB like Pg with a fs-level snapshot. If your dump
tool supports incrementals, you also get results rather akin to PITR WAL
logging.

Personally, there's no way I'd back up a filesystem with dump utilities. I
don't trust even dumpe2fs enough, the space requirements are prohibitive,
and restores are nightmarish. I have similar problems (minus the trust
issue) with backups of databases full of LOBs, though.

> > On the contrary, I think backup is one of the primary reasons to move
> > files *out* of the database. Decent incremental backup software greatly
> > reduces the I/O & time needed for backup of files as compared to a pg
> > dump. (Of course this assumes the managed files are long-lived.)
>
> We'll have to just disagree on that. You still have to do level 0 backups
> occasionally. Scanning a directory tree of millions of files to decide
> what to backup for an incremental can take forever.

In particular, if your incremental backup tool is smart enough to track
deleted files, the resource requirements can be astonishing. In addition to
looking for new/altered files, the tool needs to test whether any
previously backed-up file has since vanished - and it can't really even
rely on directory modification times to exclude very static data from
checking. (A rough sketch of what such a scan has to do is appended below.)

I use Bacula at work, and backing up my Cyrus mail spools is a very
I/O-intensive and somewhat CPU-intensive operation even for incrementals,
since the backup server and file daemon are exchanging file lists all the
time, scanning the whole huge directory tree, etc.

The advantage, though, is that the resulting daily backups are only about
1GB instead of 60-70GB. When you have to keep three full monthly backups
plus daily incrementals for audit/history purposes, that matters. I can't
see any way I could reasonably achieve a similar effect if I stored my mail
in an RDBMS - not, at least, and still have acceptable performance in the
RDBMS.

--
Craig Ringer
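
To make the scanning cost concrete, here is a minimal, untested Python
sketch of a manifest-based incremental scan. It is not how Bacula's
director and file daemon actually talk to each other - the
incremental_scan name and the {path: mtime} JSON manifest are made up for
the example - but it shows why every file still has to be stat()ed, and why
deletion tracking forces a comparison against the entire previous file list.

# Minimal sketch of a manifest-based incremental scan, assuming a previous
# backup left a {path: mtime} JSON manifest on disk. Illustrative only -
# not Bacula's actual catalog or wire protocol.
import json
import os

def incremental_scan(root: str, manifest_path: str):
    try:
        with open(manifest_path) as f:
            previous = json.load(f)      # {path: mtime} from the last run
    except FileNotFoundError:
        previous = {}                    # first run behaves like a level 0

    changed, seen = [], set()

    # Every file must be stat()ed even if its directory looks untouched:
    # a directory's mtime only changes when entries are added or removed,
    # not when a file inside it is modified.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen.add(path)
            mtime = os.stat(path).st_mtime
            if previous.get(path) != mtime:
                changed.append(path)     # new or altered since last backup
            previous[path] = mtime

    # Deletion tracking is the expensive part: vanished files can only be
    # discovered by checking the whole old manifest against this walk.
    deleted = [p for p in previous if p not in seen]
    for p in deleted:
        del previous[p]

    with open(manifest_path, "w") as f:
        json.dump(previous, f)
    return changed, deleted

Even this toy version has to walk and stat the full tree on every
incremental, which is roughly the work Craig describes the file daemon
doing against millions of small files.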