>> Is there a better strategy, probably within the Cyrus framework, to
>> take backups efficiently?

We're a large site (400k users with 1GB quotas, and growing) and this has been our biggest problem for years too. Typical backup systems (we run NetBackup), which scan the entire filesystem looking for changes, do not scale. Until Cyrus uses a more efficient means of storing the data than a single file for every message (which does have its own merits), the problem is only going to get worse. Filesystems with 100 million files can't be backed up in a reasonable time span, and yes, the cause is the stat() of every_single_file done during every_freaking_backup.

It's gotten to the point where I've considered writing my own filesystem, doing things more in the Google-filesystem/chunkservers style (with a FUSE layer to avoid actual Cyrus changes), so that backups can be done against nice large chunks. That would be its own mess though, of course.

What Fastmail has done to fix this is really quite slick, but it only applies to IMAP. We have other loads (TB-scale file servers, for example) that will need a more generic solution.

I've hatched this wacky scheme for incrementals: a daemon runs that monitors each filesystem you're concerned about (I just look at /), using Linux's inotify to watch for changes and writing each one to a sqlite3 db. On backup start, that database is consulted for a list of files that have changed, and a file is written that tells the backup agent what files to fetch. If only 10,000 files have changed, only 10,000 files are touched. There's some windowing logic in there to make sure you're only looking at stuff changed since the last backup was started, but that's the basic idea. (Rough sketches of both pieces are below.)

The daemon consumes a lot of memory because each directory has to be monitored individually and each of our servers has about a million of them, so that's a knock against it. It does appear to be pretty efficient otherwise, though.

That still leaves full backups as a big issue (they take days to run), and NetBackup has a solution for that: you run one full backup and store it on disk somewhere, and from then on, fulls are "synthetic fulls," where the incrementals are applied to that stored copy periodically in snapshot fashion and voila, you have a full backup. After that one full backup, the only thing you ever run is incrementals. This takes 2x your disk, but it's manageable. (A toy illustration of that idea is below too.)
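For the curious, here's a rough sketch of the daemon side. This isn't our production code; it's a minimal illustration using the third-party inotify_simple Python module (any inotify binding would do), and the table name and paths are made up:

# watchlog.py: minimal sketch of an inotify-to-sqlite change logger.
# Paths and schema are illustrative only.
import os
import sqlite3
import time

from inotify_simple import INotify, flags

WATCH_ROOT = "/"                               # filesystem to monitor
DB_PATH = "/var/lib/watchlog/changes.db"

MASK = flags.CREATE | flags.MODIFY | flags.DELETE | flags.MOVED_TO

db = sqlite3.connect(DB_PATH)
db.execute("CREATE TABLE IF NOT EXISTS changes (path TEXT, mtime REAL)")

inotify = INotify()
wd_to_dir = {}

# inotify watches are per-directory, so walk the tree and add one watch
# per directory -- this is exactly where the memory cost comes from.
# (Real code would skip /proc, /sys, and friends.)
for dirpath, dirnames, filenames in os.walk(WATCH_ROOT):
    try:
        wd = inotify.add_watch(dirpath, MASK)
        wd_to_dir[wd] = dirpath
    except OSError:
        continue                               # permissions, watch limit, etc.

while True:
    for event in inotify.read():               # blocks until events arrive
        dirpath = wd_to_dir.get(event.wd)
        if dirpath is None or not event.name:
            continue
        path = os.path.join(dirpath, event.name)
        db.execute("INSERT INTO changes VALUES (?, ?)", (path, time.time()))
    db.commit()

Real code also has to add watches for directories created after startup, bump fs.inotify.max_user_watches way up, and coalesce duplicate events, but that's the shape of it.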
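The backup-start side is then just a query against that db; the "window" is the timestamp of when the previous backup started. Again, the file names here are made up:

# filelist.py: at backup start, dump the paths changed since the last
# backup began into a file the backup agent fetches from.
import sqlite3
import time

DB_PATH = "/var/lib/watchlog/changes.db"
STATE_PATH = "/var/lib/watchlog/last_backup"   # timestamp of previous run
LIST_PATH = "/var/lib/watchlog/include.list"   # consumed by the agent

now = time.time()
try:
    with open(STATE_PATH) as f:
        since = float(f.read())
except FileNotFoundError:
    since = 0.0                                # first run: take everything

db = sqlite3.connect(DB_PATH)
rows = db.execute("SELECT DISTINCT path FROM changes WHERE mtime >= ?",
                  (since,))
with open(LIST_PATH, "w") as out:
    for (path,) in rows:
        out.write(path + "\n")

# Record this run's start time so the next window begins here.
with open(STATE_PATH, "w") as f:
    f.write(str(now))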
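NetBackup's synthetic fulls are their own black box, but the general idea is easy enough to approximate yourself: keep the last full on disk, hardlink everything forward into a new snapshot, then lay the incremental's files over the top. A toy version, ignoring deletions and error handling (paths made up):

# synthfull.py: toy synthetic full -- hardlink the previous full into a
# new snapshot, then overwrite just the files from the incremental.
import os
import shutil

PREV_FULL = "/backup/full.prev"
NEW_FULL = "/backup/full.new"
INCREMENTAL = "/backup/incr"       # tree of files changed since PREV_FULL

# 1. Recreate the old full as hardlinks (cheap in both time and space).
for dirpath, dirnames, filenames in os.walk(PREV_FULL):
    rel = os.path.relpath(dirpath, PREV_FULL)
    os.makedirs(os.path.join(NEW_FULL, rel), exist_ok=True)
    for name in filenames:
        os.link(os.path.join(dirpath, name),
                os.path.join(NEW_FULL, rel, name))

# 2. Apply the incremental on top, replacing links with real copies.
for dirpath, dirnames, filenames in os.walk(INCREMENTAL):
    rel = os.path.relpath(dirpath, INCREMENTAL)
    os.makedirs(os.path.join(NEW_FULL, rel), exist_ok=True)
    for name in filenames:
        dst = os.path.join(NEW_FULL, rel, name)
        if os.path.exists(dst):
            os.unlink(dst)         # don't write through the shared hardlink
        shutil.copy2(os.path.join(dirpath, name), dst)

# NEW_FULL is now a current full; rotate it into PREV_FULL's place.

rsync's --link-dest option does the same trick with less code, but this shows why, after that first full, incrementals are all you ever need to run.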
John

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@xxxxxxxxxxx

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html