Re: load balancing at fastmail.fm

> As fastmail.fm seems to be a very big setup of Cyrus nodes, I would be interested to know how you organize load balancing and manage disk space.

> Did you set up servers for a maximum of, let's say, 1000 mailboxes and then use a new server? Or do you use a murder installation so you can move mailboxes to another server once a certain server gets too much load? Or do you have a big SAN storage with good mmap support behind an arbitrary number of cyrus nodes?

We don't use a murder setup, for two main reasons.
1) Murder wasn't very mature when we started.
2) The main advantages murder gives you are a set of proxies (imap/pop/lmtp) to connect users to the appropriate backends, which we ended up using other software for, and a unified mailbox namespace if you want to do mailbox sharing, which we didn't really need either. Also, the unified namespace needs a global mailboxes.db somewhere. As it was, because the skiplist backend mmaps the entire mailboxes.db file into memory, and we already had multiple machines with 100M+ mailboxes.db files, I didn't really like the idea of dealing with a 500M+ mailboxes.db file.

We don't use shared SAN storage. When we started out we didn't have that much money, so purchasing an expensive SAN unit wasn't an option.

What we have has evolved over time. Basically we now have a hardware set that is quite nicely balanced with regard to spool IO vs metadata IO vs CPU, and a storage configuration that gives us replication with good failover capability, without having to waste lots of hardware on machines that are just replicas.

IMAP/POP frontend - We used to use perdition, but have now changed to nginx (http://blog.fastmail.fm/?p=592). As you can read from the linked blog post, nginx is great.
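
Roughly, the nginx mail proxy side looks something like the sketch below. The ports and the auth service address are placeholders for illustration, not our actual config; nginx asks the auth_http service which backend a user's mailbox lives on, and the service answers with Auth-Status/Auth-Server/Auth-Port headers that tell nginx where to proxy the connection.

    mail {
        # HTTP service that maps username -> backend server
        # (replies with Auth-Status, Auth-Server and Auth-Port headers)
        auth_http  127.0.0.1:8080/mailauth;    # placeholder address

        server {
            listen    143;     # IMAP
            protocol  imap;
        }
        server {
            listen    110;     # POP3
            protocol  pop3;
        }
    }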

LMTP delivery - We use a custom-written perl daemon that forwards lmtp deliveries from postfix to the appropriate backend server. It also does the spam scanning, virus checking and a bunch of other in-house stuff.
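
The daemon itself is perl and tied to our setup, but the core routing idea is simple enough to sketch. Here's a rough Python illustration, not our actual code; the user-to-backend mapping and lookup_backend() are made-up stand-ins for our real database lookup:

    import smtplib

    # Hypothetical mapping of recipient -> (backend host, LMTP port).
    # In reality this would be a database lookup.
    USER_TO_BACKEND = {
        "alice@example.com": ("imap-backend1.internal", 2003),
        "bob@example.com":   ("imap-backend2.internal", 2003),
    }

    def lookup_backend(recipient):
        """Return (host, port) of the backend holding this user's mailbox."""
        return USER_TO_BACKEND[recipient.lower()]

    def forward_lmtp(sender, recipient, message_bytes):
        """Re-deliver one message over LMTP to the correct backend."""
        host, port = lookup_backend(recipient)
        with smtplib.LMTP(host, port) as lmtp:
            lmtp.sendmail(sender, [recipient], message_bytes)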

Servers - We use servers with attached SATA-to-SCSI RAID units with battery-backed caches. We have a mix of large drives for the email spool, and smaller, faster drives for the metadata. That's the reason we sponsored the metapartition config options (http://cyrusimap.web.cmu.edu/imapd/changes.html).
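
In imapd.conf terms, the spool/metadata split looks roughly like this. The paths are invented for illustration, and the exact set of metapartition_files values worth moving should be checked against the documentation for your Cyrus version:

    # Large, slower drives for the message spool
    partition-default: /var/spool/imap

    # Smaller, faster drives for the hot metadata
    metapartition-default: /var/imapmeta
    metapartition_files: header index cache expunge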

Replication - We initially started with pairs of machines, half of each machine being replicas and half masters, replicating to each other, but that meant that on a failure one machine became fully loaded with masters, and masters take a much bigger IO hit than replicas. Instead we went with a system we call "slots" and "stores". Each machine is divided into a set of "slots". "Slots" from different machines are then paired as a replicated "store" with a master and a replica. So say you have 20 slots per machine (half masters, half replicas) and 10 machines; then if one machine fails, on average you only have to distribute one more master slot to each of the other machines. Much better on IO. Some more details in this blog post on our replication trials... http://blog.fastmail.fm/?p=576
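
If it helps to see the failure math, here's a toy model with made-up numbers (hypothetical hostnames, 10 machines, 10 master slots each, each master's replica on some other machine), not our real slot allocator:

    import random
    from collections import Counter

    machines = ["imap%d" % i for i in range(1, 11)]   # hypothetical hostnames
    masters_per_machine = 10

    # Each store pairs a master slot on one machine with a replica slot
    # on a different machine.
    stores = []
    for m in machines:
        for _ in range(masters_per_machine):
            replica = random.choice([x for x in machines if x != m])
            stores.append((m, replica))

    # One machine fails: every store mastered there promotes its replica.
    failed = machines[0]
    promoted = Counter(replica for master, replica in stores if master == failed)
    print(promoted)   # roughly one extra master per surviving machine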

Yep, this means we need quite a bit more software to manage the setup, but now that it's done, it's quite nice and works well. For maintenance, we can safely fail all masters off a server in a few minutes, about 10-30 seconds a store. Then we can take the machine down, do whatever we want, bring it back up, wait for replication to catch up again, then fail any masters we want back on to the server.

Unfortunately most of this software is in-house and quite specific to our setup; it's not very "generic" (e.g. it assumes particular disk layouts and sizes, machines, database tables, hostnames, etc.) in how it manages and tracks it all, so it's not something we're going to release.

Rob

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
