>> Is there a better strategy, probably within the Cyrus framework, to
>> take backups efficiently?

We're a large site (400k users with 1GB quotas, and growing) and this has been our biggest problem for years too. Typical backup systems (we run NetBackup), which scan the entire filesystem looking for changes, do not scale. Until Cyrus uses a more efficient means of storing the data than a single file for every message (which does have its own merits), the problem is only going to get worse. Filesystems with 100 million files can't be backed up in a reasonable time span, and yes, the cause is the stat() of every_single_file done during every_freaking_backup.

It's gotten to the point where I've considered writing my own filesystem, doing things more in the Google-filesystem/chunkservers style (with a FUSE layer to avoid actual Cyrus changes), so that backups can be done against nice large chunks. That would be its own mess though, of course.

What Fastmail has done to fix this is really quite slick, but it only applies to IMAP. We have other loads (TB-scale file servers, for example) that will need a more generic solution.

I've hatched this wacky scheme for incrementals: a daemon runs that monitors each filesystem you're concerned about (I just look at /), using Linux's inotify to watch for changes and writing each one to a sqlite3 db. On backup start, that database is consulted for a list of files that have changed, and a file is written that tells the backup agent what files to fetch. If only 10,000 files have changed, only 10,000 files are touched. There's some windowing logic in there to make sure you're only looking at stuff changed since the last backup was started, but that's the basic idea. (Rough sketches of both pieces are below.)

The daemon consumes a lot of memory because each directory has to be monitored individually and each of our servers has about a million of them, so that's a knock against it. It does appear to be pretty efficient otherwise, though.

That still leaves full backups as a big issue (they take days to run), and NetBackup has a solution for that: you run one full backup and store it on disk somewhere, and from then on, fulls are "synthetic fulls," where the incrementals are applied to that stored copy periodically in snapshot fashion and voila, you have a full backup. After that one full backup, the only thing you ever run is incrementals. This takes 2x your disk, but it's manageable. (A toy illustration of that idea is below too.)
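For the curious, here's a rough sketch of the daemon side. This isn't our production code; it's a minimal illustration using the third-party inotify_simple Python module (any inotify binding would do), and the table name and paths are made up:

# watchlog.py: minimal sketch of an inotify-to-sqlite change logger.
# Paths and schema are illustrative only.
import os
import sqlite3
import time

from inotify_simple import INotify, flags

WATCH_ROOT = "/"                               # filesystem to monitor
DB_PATH = "/var/lib/watchlog/changes.db"

MASK = flags.CREATE | flags.MODIFY | flags.DELETE | flags.MOVED_TO

db = sqlite3.connect(DB_PATH)
db.execute("CREATE TABLE IF NOT EXISTS changes (path TEXT, mtime REAL)")

inotify = INotify()
wd_to_dir = {}

# inotify watches are per-directory, so walk the tree and add one watch
# per directory -- this is exactly where the memory cost comes from.
# (Real code would skip /proc, /sys, and friends.)
for dirpath, dirnames, filenames in os.walk(WATCH_ROOT):
    try:
        wd = inotify.add_watch(dirpath, MASK)
        wd_to_dir[wd] = dirpath
    except OSError:
        continue                               # permissions, watch limit, etc.

while True:
    for event in inotify.read():               # blocks until events arrive
        dirpath = wd_to_dir.get(event.wd)
        if dirpath is None or not event.name:
            continue
        path = os.path.join(dirpath, event.name)
        db.execute("INSERT INTO changes VALUES (?, ?)", (path, time.time()))
    db.commit()

Real code also has to add watches for directories created after startup, bump fs.inotify.max_user_watches way up, and coalesce duplicate events, but that's the shape of it.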
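The backup-start side is then just a query against that db; the "window" is the timestamp of when the previous backup started. Again, the file names here are made up:

# filelist.py: at backup start, dump the paths changed since the last
# backup began into a file the backup agent fetches from.
import sqlite3
import time

DB_PATH = "/var/lib/watchlog/changes.db"
STATE_PATH = "/var/lib/watchlog/last_backup"   # timestamp of previous run
LIST_PATH = "/var/lib/watchlog/include.list"   # consumed by the agent

now = time.time()
try:
    with open(STATE_PATH) as f:
        since = float(f.read())
except FileNotFoundError:
    since = 0.0                                # first run: take everything

db = sqlite3.connect(DB_PATH)
rows = db.execute("SELECT DISTINCT path FROM changes WHERE mtime >= ?",
                  (since,))
with open(LIST_PATH, "w") as out:
    for (path,) in rows:
        out.write(path + "\n")

# Record this run's start time so the next window begins here.
with open(STATE_PATH, "w") as f:
    f.write(str(now))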
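NetBackup's synthetic fulls are their own black box, but the general idea is easy enough to approximate yourself: keep the last full on disk, hardlink everything forward into a new snapshot, then lay the incremental's files over the top. A toy version, ignoring deletions and error handling (paths made up):

# synthfull.py: toy synthetic full -- hardlink the previous full into a
# new snapshot, then overwrite just the files from the incremental.
import os
import shutil

PREV_FULL = "/backup/full.prev"
NEW_FULL = "/backup/full.new"
INCREMENTAL = "/backup/incr"       # tree of files changed since PREV_FULL

# 1. Recreate the old full as hardlinks (cheap in both time and space).
for dirpath, dirnames, filenames in os.walk(PREV_FULL):
    rel = os.path.relpath(dirpath, PREV_FULL)
    os.makedirs(os.path.join(NEW_FULL, rel), exist_ok=True)
    for name in filenames:
        os.link(os.path.join(dirpath, name),
                os.path.join(NEW_FULL, rel, name))

# 2. Apply the incremental on top, replacing links with real copies.
for dirpath, dirnames, filenames in os.walk(INCREMENTAL):
    rel = os.path.relpath(dirpath, INCREMENTAL)
    os.makedirs(os.path.join(NEW_FULL, rel), exist_ok=True)
    for name in filenames:
        dst = os.path.join(NEW_FULL, rel, name)
        if os.path.exists(dst):
            os.unlink(dst)         # don't write through the shared hardlink
        shutil.copy2(os.path.join(dirpath, name), dst)

# NEW_FULL is now a current full; rotate it into PREV_FULL's place.

rsync's --link-dest option does the same trick with less code, but this shows why, after that first full, incrementals are all you ever need to run.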
John

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@xxxxxxxxxxx

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html