Re: Cyrus IMAP and MySQL mailboxes (Building load-balancing cluster)

Attila Nagy <bra@xxxxxx> · Sun, 26 Nov 2006 22:14:02 +0100

On 2006. 11. 16. 22:46, Bron Gondwana wrote:
Seriously, see the other response, DbMail might be what you want -
personally I'd put blobs in the filesystem (actually, my SHA1 based
VFS system, but that's a different story) and metadata in mysql... if
I was writing my perfect IMAP solution, which I'm not, yet.  Cyrus
does the job just fine, and you work around the wrinkles.  It's better
than anything else out there for a biggish system right now.

I've come to a slightly different conclusion after pondering the
"perfect IMAP solution" topic.

I've thought about the following types/groups of servers:
- "traditional" mailbox servers, which speak protocols like IMAP and POP
to the users
- object servers, which store key/value pairs and basically nothing more
- metadata servers, which store the metadata needed to serve the e-mails
(could be the same group as the object servers)
- directors, which would manage all the data (objects and metadata) on the
storage servers and direct the clients (the mailbox servers) to the
right one
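
To show how little the object server role needs, here is a minimal in-memory sketch in Python. The class and method names (ObjectServer, put, get, report) are hypothetical and not part of Cyrus or any existing server; keys are assumed to be content hashes, and a real object server would persist blobs to disk rather than keep them in a dict. report() returns the kind of status (free space, stored keys, a crude load figure) that the principles below say object servers announce to the directors.

```python
class ObjectServer:
    """A deliberately dumb store: opaque key -> opaque blob, nothing more."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity  # bytes this server is willing to hold
        self.blobs = {}           # key -> blob (in memory for this sketch only)
        self.requests = 0         # crude load figure for the directors

    def used(self) -> int:
        return sum(len(b) for b in self.blobs.values())

    def put(self, key: str, blob: bytes) -> bool:
        """Store a blob; refuse it if it would exceed capacity."""
        self.requests += 1
        if key in self.blobs:
            # Keys are assumed to be content hashes, so an existing key
            # means the identical content is already here.
            return True
        if self.used() + len(blob) > self.capacity:
            return False
        self.blobs[key] = blob
        return True

    def get(self, key: str):
        """Return the blob for key, or None if this server does not hold it."""
        self.requests += 1
        return self.blobs.get(key)

    def report(self) -> dict:
        """Status a server could announce to the directors periodically."""
        return {
            "name": self.name,
            "free": self.capacity - self.used(),
            "keys": sorted(self.blobs),
            "requests": self.requests,
        }
```

Everything else (which server holds what, replication, leveling) is deliberately left to the directors.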

The basic principles would be:
- object servers can be dumb: their only purpose is to store a value (a
blob) under a key (an opaque identifier to them)
- object servers report to the directors regularly, so the directors
know how much space is free there, which keys are stored there and how
loaded the given server is at the moment
- metadata servers store everything that is currently kept in directory
structures or in various databases (BDB, skiplist, others) on Cyrus
backends: they know which keys make up a folder in a user's mailbox,
and they store meta information about the e-mails (headers, maybe
keywords from the e-mail for faster searching, etc.)
- delivery into the system would happen by first splitting the e-mail
into different parts (metadata and data, with the data further split
into MIME parts, etc.). Then each piece of data would be checksummed
(for example with SHA1), with the checksum serving as its key, and
stored on one (or more, depending on the design) object server, while
the metadata goes to the appropriate metadata servers.
- the directors would then learn about the new object (either from the
object servers announcing it, or from the mailbox servers asking them
where to store it) and distribute that knowledge among themselves (for
redundancy)
- fetching e-mails (the equivalent of reading them from the filesystem
now) would involve a metadata lookup, a director lookup (where is the
data?) and finally one or more object store get operations
- the mailbox servers could compress and/or encrypt the contents they
store on the metadata/object servers
- the directors act as a global broker for the data, so every
transaction would flow through them. This makes it possible to implement
storage hierarchies (multiple levels of servers, where a given piece of
information could be kept in geographically different locations for
redundancy, availability or speed) and automatic leveling, so you could
keep each of your object servers equally busy (both in terms of disk
space and CPU/IO capacity) and balance the load among them.
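
The delivery and fetch steps above can be sketched as follows. This is a minimal in-memory sketch in Python, assuming that the SHA1 hex digest of each decoded MIME part serves as its object key. A plain dict stands in for the object servers, the returned record stands in for what the metadata servers would keep, and the names deliver, fetch_parts and make_message are hypothetical. Compression, encryption and the round trip to the directors are left out; only the standard library email and hashlib modules are used.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def deliver(raw: bytes, store: dict) -> dict:
    """Split a message into a metadata record plus SHA1-keyed part blobs.

    `store` stands in for the object servers; the returned dict stands in
    for what would go to the metadata servers.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    part_keys = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers have no payload of their own
        blob = part.get_payload(decode=True) or b""  # undo base64/QP encoding
        key = hashlib.sha1(blob).hexdigest()
        store.setdefault(key, blob)  # identical content is kept only once
        part_keys.append((part.get_content_type(), key))
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "parts": part_keys,
    }


def fetch_parts(meta: dict, store: dict) -> list:
    """Resolve a metadata record back into (content type, blob) pairs."""
    return [(ctype, store[key]) for ctype, key in meta["parts"]]


def make_message(subject: str, body: str, pdf: bytes) -> bytes:
    """Hypothetical helper: build a text message with one PDF attachment."""
    m = EmailMessage()
    m["From"] = "alice@example.com"
    m["Subject"] = subject
    m.set_content(body)
    m.add_attachment(pdf, maintype="application", subtype="pdf",
                     filename="report.pdf")
    return m.as_bytes()


# Demo: two different messages carrying the same PDF attachment.
store = {}
pdf = b"%PDF-1.4 shared attachment bytes"
m1 = deliver(make_message("one", "first body", pdf), store)
m2 = deliver(make_message("two", "second body", pdf), store)
print(len(store))  # 3: two distinct bodies, one shared PDF
```

Because the key is derived from the content, storing the second copy of the attachment is a no-op; that is where the single-copy saving comes from.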

Obvious benefits:
- if a PDF attachment is circulating in 100,000 people's mailboxes, it
will be stored exactly once, regardless of the surrounding e-mail
- you can choose to replicate a given object to any number of servers.
If you combine this with crypto, you can even store information on
untrustworthy computers (for example, using the spare disk space on
your DNS servers)
- you can pull out a storage server or start a new one at any time. To
remove one, you just have to tell the broker not to direct connections
to it (if the data is replicated to at least one other box), or migrate
its data to other servers (if you don't do replication). Installing a
new server is as simple as adding its IP to the directors: they notice
that there is an effectively unused, empty system, so a slow migration
(leveling) starts, maybe guided by the usage statistics of the object
servers, so "hot" objects would be moved first
- you could install an in-memory (or a local disk-backed) object cache 
for each mail frontend, so heavily used objects would remain local to them
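
The placement and leveling behaviour described above (new objects go to the emptiest servers, a freshly added server is filled by a slow migration, hot objects first) can be sketched as a single director's bookkeeping. This is a hypothetical Python sketch, not an existing component: it assumes one director (no director-to-director sync), uses fill ratio as the only load measure, and only updates the location map; copying the blob between object servers is not modelled.

```python
from collections import Counter


class Director:
    """Tracks where each object lives and levels data across object servers."""

    def __init__(self):
        self.capacity = {}     # server name -> bytes offered
        self.used = {}         # server name -> bytes currently assigned
        self.where = {}        # object key -> set of server names holding it
        self.size = {}         # object key -> blob size in bytes
        self.hits = Counter()  # object key -> lookups seen ("hotness")

    def fill(self, server):
        return self.used[server] / self.capacity[server]

    def add_server(self, name, capacity):
        # A new, empty server simply shows up as the least-filled one.
        self.capacity[name] = capacity
        self.used[name] = 0

    def place(self, key, size, replicas=1):
        """Choose the least-filled servers for a new object; known keys are not re-stored."""
        if key not in self.where:
            chosen = sorted(self.capacity, key=self.fill)[:replicas]
            for s in chosen:
                self.used[s] += size
            self.where[key] = set(chosen)
            self.size[key] = size
        return self.where[key]

    def locate(self, key):
        """Answer a mailbox server's 'where is this object?' and count the hit."""
        self.hits[key] += 1
        return self.where[key]

    def level(self, max_moves=100):
        """Move the hottest objects from the fullest to the emptiest server."""
        moves = []
        for _ in range(max_moves):
            ranked = sorted(self.capacity, key=self.fill)
            if len(ranked) < 2:
                break
            empty, full = ranked[0], ranked[-1]
            candidates = [k for k, locs in self.where.items()
                          if full in locs and empty not in locs]
            if not candidates:
                break
            key = max(candidates, key=lambda k: self.hits[k])
            size = self.size[key]
            # Stop if the move would leave the receiver fuller than the donor.
            if ((self.used[empty] + size) / self.capacity[empty]
                    > (self.used[full] - size) / self.capacity[full]):
                break
            self.where[key].discard(full)
            self.where[key].add(empty)
            self.used[full] -= size
            self.used[empty] += size
            moves.append((key, full, empty))
        return moves
```

For example, with servers A and B each holding two 10-byte objects and one of B's objects looked up several times, adding an empty server C and calling level() moves that hot object from B to C and then stops, because any further move would overshoot. Retiring a server would be the mirror image: drop it from the candidate set and re-place objects that have no other replica.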

etc, etc

It seems simple in concept (of course there are a lot more details
inside), a little harder to code, but not impossible. The hardest parts
seem to be the directors and the metadata servers, followed by the
modifications to the mail server (e.g. Cyrus) and the object store
(which is plain and simple).

As for existing components, I think memcached could be used (both the
protocol and the server implementation) for the object storage
(complemented with a disk-based store on the object servers and a
transactional layer, which would ask the directors for servers), and
maybe an SQL-based DB for the metadata store (with some replication).
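
To show how naturally SHA1-keyed blobs fit memcached's text protocol, here is a small Python sketch that builds "set <key> <flags> <exptime> <bytes>" requests and parses the "VALUE <key> <flags> <bytes>" ... "END" reply to a single-key get. The function names are hypothetical and no network I/O is shown; a real client would send these bytes over a socket and expect "STORED" after a set. Keys are limited to 250 characters without spaces or control characters, which a 40-character SHA1 hex digest easily satisfies. Note that stock memcached is memory-only and by default caps items at 1 MB (adjustable with -I), so large attachments would need chunking or the disk-backed store mentioned above.

```python
import hashlib

MAX_KEY_LEN = 250  # memcached text-protocol key length limit


def blob_key(blob: bytes) -> str:
    """Content-derived key: the SHA1 hex digest of the blob."""
    return hashlib.sha1(blob).hexdigest()


def set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request for one blob."""
    if (not key or len(key) > MAX_KEY_LEN
            or any(c.isspace() or ord(c) < 32 or ord(c) == 127 for c in key)):
        raise ValueError("invalid memcached key")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"  # the data block is length-delimited


def get_command(key: str) -> bytes:
    return f"get {key}\r\n".encode("ascii")


def parse_get_reply(reply: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if reply == b"END\r\n":
        return None
    header, rest = reply.split(b"\r\n", 1)
    fields = header.split()
    if len(fields) < 4 or fields[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(fields[3])
    # Slice by the announced length so binary data containing CRLF is safe.
    value, trailer = rest[:length], rest[length:]
    if trailer != b"\r\nEND\r\n":
        raise ValueError("truncated or multi-value reply")
    return value
```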

Any takers? :)
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html