May I ask how you are doing the actual replication, technically speaking? Shared FS, DRBD, something over IMAP?

Rob Mueller wrote:
>> As fastmail.fm seems to be a very big setup of Cyrus nodes, I would be
>> interested to know how you organized load balancing and manage disk
>> space.
>>
>> Did you set up servers for a maximum of, let's say, 1000 mailboxes and
>> then use a new server? Or do you use a murder installation so you can
>> move mailboxes to another server once a certain server gets too much
>> load? Or do you have a big SAN storage with good mmap support behind
>> an arbitrary number of Cyrus nodes?
>
> We don't use a murder setup, for two main reasons:
> 1) Murder wasn't very mature when we started.
> 2) The main advantage murder gives you is a set of proxies
> (imap/pop/lmtp) to connect users to the appropriate backends, which we
> ended up using other software for, and a unified mailbox namespace if
> you want to do mailbox sharing, something we didn't really need either.
> Also, the unified namespace needs a global mailboxes.db somewhere. As it
> was, because the skiplist backend mmaps the entire mailboxes.db file
> into memory, and we had multiple machines with 100M+ mailboxes.db files,
> I didn't really like the idea of dealing with a 500M+ mailboxes.db file.
>
> We don't use shared SAN storage. When we started out we didn't have
> that much money, so purchasing an expensive SAN unit wasn't an option.
>
> What we have has evolved over time to our current point. Basically we
> now have a hardware set that is quite nicely balanced with regard to
> spool IO vs metadata IO vs CPU, and a storage configuration that gives
> us replication with good failure handling, but without having to waste
> lots of hardware on idle replica machines.
>
> IMAP/POP frontend - We used to use perdition, but have now changed to
> nginx (http://blog.fastmail.fm/?p=592). As you can read from the linked
> blog post, nginx is great.
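[Editor's note: as an aside, the per-user backend lookup such a frontend proxy performs might look roughly like this. This is a minimal sketch with hypothetical table contents and hostnames; nginx's mail module actually delegates this decision to an external auth_http service, so this stands in for that service's routing logic, not for anything FastMail has published.]

```python
# Sketch of the backend lookup a frontend IMAP/POP proxy performs.
# The user->store table and store->host mapping are hypothetical;
# in nginx's mail module this decision is made by the auth_http service.

USER_STORE = {            # which replicated "store" holds each user
    "alice@example.com": "store07",
    "bob@example.com":   "store12",
}

STORE_MASTER = {          # which backend currently holds the master slot
    "store07": ("imap3.internal", 143),
    "store12": ("imap5.internal", 143),
}

def route_user(username):
    """Return (host, port) of the backend master for this user,
    or None if the user is unknown."""
    store = USER_STORE.get(username)
    if store is None:
        return None
    return STORE_MASTER[store]
```

Keeping the user-to-store mapping separate from the store-to-master mapping means a failover only has to update the second, small table.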
>
> LMTP delivery - We use a custom-written perl daemon that forwards lmtp
> deliveries from postfix to the appropriate backend server. It also does
> the spam scanning, virus checking and a bunch of other in-house stuff.
>
> Servers - We use servers with attached SATA-to-SCSI RAID units with
> battery-backed caches. We have a mix of large drives for the email
> spool, and smaller, faster drives for metadata. That's the reason we
> sponsored the metapartition config options
> (http://cyrusimap.web.cmu.edu/imapd/changes.html).
>
> Replication - We initially started with pairs of machines, half of each
> being a replica and half a master, replicating to each other, but that
> meant on a failure one machine became fully loaded with masters, and
> masters take a much bigger IO hit than replicas. Instead we went with a
> system we call "slots" and "stores". Each machine is divided into a set
> of "slots". "Slots" from different machines are then paired as a
> replicated "store" with a master and a replica. So say you have 20
> slots per machine (half master, half replica) and 10 machines; then if
> one machine fails, on average you only have to distribute one more
> master slot to each of the other machines. Much better on IO. Some more
> details in this blog post on our replication trials:
> http://blog.fastmail.fm/?p=576
>
> Yep, this means we need quite a bit more software to manage the setup,
> but now that it's done, it's quite nice and works well. For maintenance,
> we can safely fail all masters off a server in a few minutes, about
> 10-30 seconds a store. Then we can take the machine down, do whatever we
> want, bring it back up, wait for replication to catch up again, then
> fail any masters we want back onto the server.
>
> Unfortunately most of this software is in-house and quite specific to
> our setup; it's not very "generic" (e.g.
it assumes particular disk
> layouts and sizes, machines, database tables, hostnames, etc. to manage
> and track it all), so it's not something we're going to release.
>
> Rob

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
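[Editor's note: the slot/store failover arithmetic Rob describes can be sketched numerically. This is a minimal illustration using the numbers from his example (10 machines, 20 slots each, half masters); the machine names and the round-robin replica placement are hypothetical, not FastMail's actual layout.]

```python
from collections import Counter

# Sketch of the "slots"/"stores" failover arithmetic described in the
# mail: 10 machines, 20 slots each, half master and half replica.
# Machine names and replica placement are hypothetical.

MACHINES = [f"imap{i}" for i in range(10)]
SLOTS_PER_MACHINE = 20
MASTERS_PER_MACHINE = SLOTS_PER_MACHINE // 2   # 10 master slots each

def redistribute(failed, machines=MACHINES):
    """When `failed` dies, each of its master slots is taken over by the
    surviving machine holding the paired replica slot. With replicas
    spread evenly, each survivor picks up roughly one extra master."""
    survivors = [m for m in machines if m != failed]
    extra = Counter()
    for slot in range(MASTERS_PER_MACHINE):
        # assume replica partners are spread round-robin across survivors
        extra[survivors[slot % len(survivors)]] += 1
    return extra
```

With 10 master slots spread over 9 survivors, no machine takes more than two extra masters, whereas the original paired-machine scheme dumped all 10 onto a single partner.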