Re: Cyrus IMAP and MySQL mailboxes (Building load-balancing cluster)

On 2006. 11. 16. 22:46, Bron Gondwana wrote:
> Seriously, see the other response, DbMail might be what you want -
> personally I'd put blobs in the filesystem (actually, my SHA1 based
> VFS system, but that's a different story) and metadata in mysql... if
> I was writing my perfect IMAP solution, which I'm not, yet.  Cyrus
> does the job just fine, and you work around the wrinkles.  It's better
> than anything else out there for a biggish system right now.

I've come to a slightly different conclusion after pondering the "perfect IMAP solution" topic.

I've thought about the following types/groups of servers:
- "traditional" mailbox servers, which speak protocols like IMAP and POP to the user
- object servers, which store key/value pairs and basically nothing more
- metadata servers, which store the metadata needed to serve the e-mails (could be the same group as the object servers)
- directors, which would manage all the data (object and meta) on the storage servers and direct the clients (the mailbox servers) to the right one

The basic principles would be:
- object servers can be dumb; their sole purpose is to store a value (a blob) under a key (just an identifier to them)
- object servers report to the directors regularly, so the directors know how much space is left, which keys are stored there and how loaded the given server is at the moment
- metadata servers store everything that is now kept in a directory structure or in various databases (bdb, skiplist, others) on Cyrus backends: they know which keys make up a folder in a user's mailbox, and they store meta information about the e-mails (headers, maybe keywords from the e-mail for faster searching, etc.)
- delivery to the system would start by splitting the e-mail into parts (metadata and data, data into MIME parts, etc.). Each piece of data would then be checksummed (for example with SHA1) and stored on one (or more, depending on the design) object server, while the metadata goes to the appropriate metadata servers.
- the directors would then learn about the new object (either because the object servers announce it, or because the mailbox servers asked them where to store it) and distribute it among themselves (for redundancy)
- fetching e-mails (like reading them from the filesystem now) would involve metadata and director lookups (where is the information?) and finally one or more object store get operations
- the mailbox servers could compress and/or encrypt the contents they store on the metadata/object servers
- the directors act as a global broker for the data, so every transaction would flow through them. This makes it possible to implement storage hierarchies (multiple levels of servers, where a given piece of information could be kept in geographically different locations for redundancy, availability or speed) and automatic leveling, so you could keep each of your object servers busy (in terms of both disk space and CPU/IO capacity) and even out the load among them.
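
To make the delivery and fetch steps above a bit more concrete, here is a rough Python sketch. It is only an illustration under heavy assumptions: the object servers and metadata servers are replaced by two in-process dictionaries, there is no director, and every name (put_object, deliver, fetch_parts, make_message) is made up for this example and does not exist in Cyrus or anywhere else.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

object_store = {}  # stand-in for the object servers: SHA1 key -> blob
metadata = {}      # stand-in for the metadata servers: (user, folder) -> records


def put_object(blob: bytes) -> str:
    """Store a blob under its SHA1 digest; identical blobs share one key."""
    key = hashlib.sha1(blob).hexdigest()
    object_store.setdefault(key, blob)
    return key


def deliver(user: str, folder: str, raw: bytes) -> dict:
    """Split a message into MIME parts, store each body, record the metadata."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        body = part.get_payload(decode=True) or b""
        parts.append({"type": part.get_content_type(), "key": put_object(body)})
    record = {"subject": msg.get("Subject", ""), "parts": parts}
    metadata.setdefault((user, folder), []).append(record)
    return record


def fetch_parts(record: dict) -> list:
    """Metadata lookup first, then one object store get per part."""
    return [object_store[p["key"]] for p in record["parts"]]


def make_message(text: str, attachment: bytes) -> bytes:
    """Hypothetical helper: build a test message with one PDF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "report"
    msg.set_content(text)
    msg.add_attachment(attachment, maintype="application",
                       subtype="pdf", filename="report.pdf")
    return msg.as_bytes()
```

Delivering make_message("hello alice", pdf) to one user and make_message("hello bob", pdf) to another leaves three entries in object_store: two text bodies and a single copy of the attachment. A real design would also have to decide what to do with the headers of each MIME part, which this sketch simply drops.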

Obvious benefits:
- if a PDF attachment is sitting in 100,000 people's mailboxes, it will be stored exactly once, regardless of the surrounding e-mail
- you can choose to replicate a given object to any number of servers. If you combine this with crypto, you can even store information on untrustworthy computers (you can use the spare disk space on your DNS servers, for example)
- you can pull out a storage server or start a new one at any time. To remove one, you just have to tell the broker not to direct connections there (if the data is replicated to at least one other box), or migrate the data to other servers (if you don't do replication). Installing a new server is as simple as adding its IP to the directors: they notice that there is an effectively unused, empty system, so a slow migration (leveling) starts, maybe guided by the usage statistics of the object servers, so that "hot" objects are moved first
- you could install an in-memory (or local disk-backed) object cache on each mail frontend, so heavily used objects would stay local to them
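
The leveling idea (a new, empty server slowly receives objects, hottest first) could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the ObjectServer and Director classes, the hits table of access counts and the "fullest to emptiest" policy are invented, and the sketch ignores replication, network transfer and concurrent access entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectServer:
    name: str
    objects: dict = field(default_factory=dict)  # key -> size in bytes

    @property
    def used(self) -> int:
        return sum(self.objects.values())


class Director:
    def __init__(self):
        self.servers = {}  # name -> ObjectServer
        self.hits = {}     # key -> access count reported by the object servers

    def add_server(self, server: ObjectServer) -> None:
        self.servers[server.name] = server

    def locate(self, key: str) -> list:
        """Return the names of all servers currently holding the key."""
        return [s.name for s in self.servers.values() if key in s.objects]

    def level_once(self):
        """Move the hottest suitable object from the fullest server to the
        emptiest one. Returns the moved key, or None if no move would
        narrow the gap between the two."""
        fullest = max(self.servers.values(), key=lambda s: s.used)
        emptiest = min(self.servers.values(), key=lambda s: s.used)
        gap = fullest.used - emptiest.used
        candidates = [k for k, size in fullest.objects.items()
                      if 0 < size < gap and k not in emptiest.objects]
        if not candidates:
            return None
        key = max(candidates, key=lambda k: self.hits.get(k, 0))
        emptiest.objects[key] = fullest.objects.pop(key)
        return key

    def level(self) -> list:
        """Repeat level_once until the servers are as even as they can get."""
        moved = []
        while (key := self.level_once()) is not None:
            moved.append(key)
        return moved
```

Only objects smaller than the current gap are moved, so each move strictly reduces the imbalance and the loop terminates. A real director would presumably throttle this to keep the migration "slow", and would copy before deleting.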

etc, etc

It seems simple in principle (of course there are a lot more details inside), a little harder to code, but not impossible. The hardest parts seem to be the directors and the metadata servers, then the modifications to the mail server (e.g. Cyrus) and the object store (which is plainly simple).

Speaking of existing components, I think memcached (the protocol and the server implementation) could be used for the object storage (complemented with a disk-based store on the object servers and a transactional layer that would ask the directors for servers), and maybe an SQL-based DB for the metadata store (with some replication).
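
As a rough picture of "memcached-style get/set complemented with a disk-based store", here is a small Python sketch. It does not speak the memcached protocol and is not based on memcached's code; the DiskBackedStore class, its cache_items parameter and the disk_reads counter are all invented for this example. It only shows the shape of the interface: values live in files, and a small in-memory LRU cache sits in front of them.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict


class DiskBackedStore:
    """get/set by key; values live on disk, with a small LRU cache in memory."""

    def __init__(self, root=None, cache_items=2):
        self.root = root or tempfile.mkdtemp()
        self.cache = OrderedDict()   # key -> value, least recently used first
        self.cache_items = cache_items
        self.disk_reads = 0          # counts cache misses served from disk

    def _path(self, key: str) -> str:
        # Hash the key so that arbitrary keys map to safe file names.
        return os.path.join(self.root, hashlib.sha1(key.encode()).hexdigest())

    def set(self, key: str, value: bytes) -> None:
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(value)
        # Rename into place so readers never see a half-written file.
        os.replace(tmp, self._path(key))
        self._remember(key, value)

    def get(self, key: str):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        try:
            with open(self._path(key), "rb") as f:
                value = f.read()
        except FileNotFoundError:
            return None
        self.disk_reads += 1
        self._remember(key, value)
        return value

    def _remember(self, key: str, value: bytes) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_items:
            self.cache.popitem(last=False)  # evict the least recently used
```

The transactional layer and the conversation with the directors are left out completely; those are the hard parts, not this.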

Any takers? :)
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
