On Mon, Feb 15, 2010 at 10:28:13AM +0000, Gavin McCullagh wrote:
> Hi,
>
> I'm a relative newbie with cyrus, but I'm interested in this discussion...

Hehe - you should read through the mailing list archives for how FastMail
does backups for a really complex but _FAST_ solution :)

> > Things have been working fine but of late we find that email usage has
> > grown and so our backups take too long to complete .. we use dar to take
> > differential backups and take backups every night, and transfer the
> > backup files to a remote server.
>
> Have you identified the bottleneck?  Is it disk access on the mail server
> itself, bandwidth to your remote server, something else?

I can tell you the major cost we identified - stat on every single file in
the spool.

> > If the backup is still running in the morning people notice a
> > considerable degradation of the server performance
>
> Is this a recent linux server?  In principle, you could use ionice to class
> your dar process "idle" which should mean that users will get a better
> share of disk access.  However, that will also mean your backup takes even
> longer.  Probably not ideal.

Not really, no - and I suspect it still causes stat overload and inode cache
flushing and all that bad stuff.

> > Is there a better strategy , probably within the cyrus framework , to
> > take backups efficiently
>
> I've wondered about the best means of backup myself.  We've been doing
> something similar using rsync to sync the mail spools and other associated
> data to a remote server.  This works, but I'm slightly worried that we
> continue delivering mail throughout the process.  So our mail spool is
> changing as we back it up.  I've considered the possibility of stopping all
> daemons, taking an LVM snapshot, restarting and backing up the snapshot.
> That way you get a consistent spool where everything was backed up at the
> same moment.
> On the other hand, it appears that you can generally
> reconstruct mailboxes, so perhaps I just don't need to worry about that.
> I'd prefer the cosy feeling of knowing the data is in a consistent state
> though.

Our backups are consistent per mailbox - not even per user.  I considered
per-user consistency, but the deadlock risk is too high.

> If you simply can't run an incremental or differential backup in the
> "quiet" time, perhaps it would make more sense to do rolling replication to
> another server.  Then, your backup can stop the replication temporarily,
> backup the replica and start the replication back up -- leaving the live
> server alone.  I imagine this does add load to the main server, but
> distributes it over the whole day.
>
> http://cyrusimap.web.cmu.edu/imapd/install-replication.html

Yes - that's certainly a solution!  I prefer to back up the master rather
than the replica in our particular setup because it's more likely that files
will be "hot" on the master.  Not that much more likely, but hey.  Backing
up the master also keeps the backup up to date when replication is running
behind, and means we don't need to keep track of which replicas might be
"down" for some reason.

We run daily backups during the quiet period - they complete for all users
in a little under 5 hours at the moment, with the bottleneck actually being
CPU on the backup server (we gzip everything on the backup server and it's a
single CPU Sun x4500 - we could buy more CPU if it became an issue).

REALLY BRIEF OVERVIEW:

(I don't mind re-writing this because it keeps it fresh in my mind!)

* every cyrus server runs a backupd which speaks a very simple protocol
* there are 8 "backup threads" running on the backup server.
* there is one "feeder thread" per Cyrus drive unit RAIDset, with a list
  of all users on partitions on that set of drives, meaning we never hit
  a single set of physical drives with multiple concurrent backup requests.
  Just to keep load reasonable.
  This list gets pulled from the list of active users in the database and
  re-filled once per day.
* each backup thread randomly pulls a set of 50 users off a feeder thread
  and backs them up.  50 is a nice balance between thrashing around too
  much and providing easy-to-read feedback :)

FOR EACH USER:

* the backup server contacts the cyrus server's backupd, and:
  a) requests a listing of all folders for this user.
  b) SELECTs each folder - which involves statting each meta file
     (cyrus.index, cyrus.header, cyrus.expunge?)  It also involves getting
     the mailbox UNIQUEID from the cyrus.header file.
  c) compares this stat data with what was recorded for that UNIQUEID at
     the last backup.
  d) if unchanged, just updates the mailbox name -> uniqueid pointer if
     required (to handle mailbox renames efficiently)
  e) if changed, fetches the FULL CONTENTS of each meta file, while holding
     a fcntl lock on each file as well, so the folder is locked from
     changes by Cyrus processes.
  f) parses the cyrus.index and cyrus.expunge files, and checks by GUID
     (sha1) that we already have all the message files.  If any are not
     present, it fetches and checks them individually.
  g) also fetches any sieve scripts, .seen files, .sub files, etc.

So - we get de-duplication both within and across folders (but only within
a single user, each user is its own entity!), we get cheap rename support,
we get super-efficient IO (never have to stat a message file).

Once the backup server has finished backing up a user and is satisfied that
the backup is complete, it updates a "LastBackedUp" timestamp in the
database for that user.

Every day, I get an email with a summary of the number of users per hour of
backup age - just the output of a SQL query, like this:

  Age    COUNT(*)
  NULL        179
  4          3638
  5         25049
  6         50262
  7         51353
  8         50075
  9          1340

Obviously if there's anything over 24 then we have a problem!  (NULL is
users created since the last backup run who haven't had a backup yet...)

As you can see, we can do about 50,000 users per hour with this thing.
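
To make steps c) to f) above concrete, here is a minimal sketch in Python.
It is not the actual Perl code: FolderState, plan_folder and the layout of
the stat data are hypothetical names invented for illustration, and the
network fetch, locking and file parsing are left out.  It only shows the
decision logic: skip a folder whose meta files are unchanged, follow a
rename through the UNIQUEID, and otherwise fetch only the GUIDs this user
does not already have.

```python
from dataclasses import dataclass, field


@dataclass
class FolderState:
    """What the backup remembers about one mailbox (hypothetical layout)."""
    name: str
    meta_stat: dict                          # meta file name -> (size, mtime)
    guids: set = field(default_factory=set)  # message GUIDs seen in the folder


def plan_folder(known, stored_guids, uniqueid, name, meta_stat, index_guids):
    """Decide what to do for one folder.

    known        -- dict: UNIQUEID -> FolderState from the previous run
    stored_guids -- set of every GUID already stored for this user
    index_guids  -- GUIDs parsed from the index/expunge files; only
                    consulted when the meta files have changed
    Returns (action, guids_to_fetch).
    """
    prev = known.get(uniqueid)
    if prev is not None and prev.meta_stat == meta_stat:
        # Meta files untouched: at most the folder name moved.
        if prev.name != name:
            prev.name = name
            return "renamed", set()
        return "unchanged", set()

    # Meta files changed (or new folder): record the new state and work
    # out which messages are missing for this user, in any folder.
    missing = set(index_guids) - stored_guids
    known[uniqueid] = FolderState(name, dict(meta_stat), set(index_guids))
    return "changed", missing
```

The caller would add the returned GUIDs to stored_guids only after fetching
and verifying them.  Because the lookup is against the whole user rather
than one folder, a message copied into a second folder costs nothing extra.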

The backup format is two files per user (plus a lock file while the backup
is running!)

The first file is a sqlite3 database file, containing indexed lookup data on
which files exist, how they're stored, etc - including offsets in a
theoretically unzipped .tar file.

The second is the .tar.gz file itself.  It turns out you can concatenate
.tar.gz files without the empty marker blocks on the end and they just work.
So - every backup run appends new records to the .tar.gz, including (by
abusing a few unused fields) all the metadata we need.  The .sqlite file can
be rebuilt from scratch by streaming the .tar file if need be.

We also calculate what percentage of the file is "dirty" - stuff that no
longer exists on the master and is over 2 weeks old.  When the file gets too
dirty, we stream it through a processing function which only selects files
that we want to keep.

The overhead of the .sqlite file is pretty low - around the 5% mark:

  -rw-r--r-- 1 fmuser 402618048 Feb 14 21:25 backupdata-1265583567.tar.gz
  -rw-r--r-- 1 fmuser  15737856 Feb 14 21:25 backupstate.sqlite3

Those are my own backup files.  The datestamp in the backupdata filename is
the date when the file was created.  Every time it gets cleaned, it gets a
new name.

I do intend to abstract this stuff out at some point.  It's very nice, and
super efficient (well, I could make it more efficient by moving away from
pure Perl to XS, but that would be micro-optimising a bit too much) - it has
its own implementations of a bunch of formats built in!  IndexFile,
HeaderFile and TarStream perl modules that can read and write those files :)

Bron.

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
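
As an illustration of the .tar.gz concatenation point above, here is a
minimal Python sketch.  It is not the TarStream module: tar_record,
append_run and read_all are hypothetical helpers, and the metadata carried
in unused header fields is not modelled.  Each run is written as its own
gzip member holding tar records with no end-of-archive blocks, and the
offsets it returns are positions in the uncompressed tar stream, which is
the kind of value a lookup index could store.

```python
import gzip
import io
import tarfile


def tar_record(name: str, data: bytes) -> bytes:
    """One tar member: 512-byte header plus data padded to a 512-byte
    boundary, with no end-of-archive marker blocks after it."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    return header + data + b"\0" * (-len(data) % 512)


def append_run(archive: bytearray, tar_offset: int, files: dict):
    """Append one backup run to the archive as a new gzip member.

    Returns (new_tar_offset, index), where index maps each file name to
    its offset in the uncompressed tar stream.
    """
    index = {}
    chunk = b""
    for name, data in files.items():
        index[name] = tar_offset + len(chunk)
        chunk += tar_record(name, data)
    archive += gzip.compress(chunk)  # extends the caller's bytearray
    return tar_offset + len(chunk), index


def read_all(archive: bytes) -> dict:
    """Read every member back through the ordinary tarfile reader."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

Usage would be along the lines of starting with an empty bytearray and
offset 0, calling append_run once per backup run, and keeping the returned
offsets.  Python's gzip reader handles multiple concatenated members, and
its tarfile reader simply stops at end of input when the marker blocks are
absent.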