Hello,

2009/10/19 Michael Bacon <baconm@xxxxxxxxxxxxx>:
> Hello, list,
>
> Today we're enjoying our first full work day of independence from the old
> monolithic cyrus server installed in 1999 (Sun 6800 -- it's had new CPU
> boards since then, but that's it), and on our new shiny cluster of T5220's
> that are mostly happily operating as a murder.
>
> I say mostly because while most of the time the thing handles our 80,000
> users and 14,000+ simultaneous connections like a champ, some of the time,
> we get some extreme pain, mostly due to syncs between the MUPDATE master
> and the front-end servers.
>
> When we spec'ed out our servers, we didn't put much I/O capacity into the
> front-end servers -- just a pair of mirrored 10k disks doing the OS, the
> logging, the mailboxes.db, and all the webmail action going on in another
> solaris zone on the same hardware.  We thought this was sufficient given
> the fact that no real permanent data lives on these servers, but it turns
> out that while most of the time it's fine, if the mupdate processes ever
> decide they need to re-sync with the master, we've got 6 minutes of trouble
> ahead while it downloads and stores the 800k entries in the mailboxes.db.
>
> During these sync periods, we see two negative impacts.  The first is
> lockup on the mailboxes.db on the front-end servers, which slows down both
> accepting new IMAP/POP connections and the reception of incoming messages.
> (The front-ends also accept LMTP connections from a separate pair of
> queueing hosts, then proxy those to the back-ends.)  The second is that,
> because the front-ends go into a
>
> It's awfully frustrating to have a system that, as my boss says, performs
> like a Camaro most of the time until you hit a little rock in the road, and
> then suddenly turns into a Pinto.  It's also frustrating that this seems like
> one of the less complicated aspects of the system -- publishing replicas of
> a read-only database to a few worker boxes.
>
> I suppose this is why Fastmail and others ripped out the proxyd's and
> replaced them with nginx or perdition.  Currently we still support GSSAPI
> as an auth mechanism, which kept me from going that direction, but given
> the problems we're seeing, I'd be open to architectural suggestions on
> either how to tie perdition or nginx to the MUPDATE master (because we
> don't have the back-ends split along any discernible lines at this point),
> or suggestions on how to make the master-to-frontend propagation faster or
> less painful.
>
> Sorry for the long message, but it's not a simple problem we're fighting.
>
> Michael Bacon
> UNC Chapel Hill
> ----
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>

We had a similar situation here: more than a million mailboxes, and each
MUPDATE sync was veeeeery long (when it succeeded).

We now bypass the problem: we got rid of MUPDATE (and the skiplist
mailboxes.db). We use a home-made MySQL backend for mailboxes. We added
write and read filters to this backend so that front-end and back-end
servers each get the right value from MySQL.

With this configuration, we're no longer in murder mode; we just use
front-end cyrus (proxies), back-end cyrus, and MySQL. We don't need MUPDATE
any more, so we have no sync problems. Cyrus restarts are fast.

--
Cyril
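
For readers trying to picture the "read and write filters" idea, here is a minimal sketch. It is not the actual patch described above: the message gives no schema or code, so the table layout, the function names (`open_store`, `write_entry`, `read_entry`), and the host names are all hypothetical. The sketch also uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The assumed behaviour is that every server reads the same shared row, and the read filter decides whether the mailbox looks local (on the back-end that hosts it) or remote (everywhere else, e.g. on a front-end proxy).

```python
import sqlite3


def open_store():
    # In-memory SQLite database standing in for the shared MySQL server.
    # Hypothetical schema: one row per mailbox, recording which back-end
    # hosts it and on which partition.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailboxes ("
        " name TEXT PRIMARY KEY,"
        " server TEXT NOT NULL,"
        " part TEXT NOT NULL)"
    )
    return conn


def write_entry(conn, name, server, partition):
    # "Write filter": always store the canonical location of the mailbox,
    # regardless of which server performs the write.
    conn.execute(
        "INSERT OR REPLACE INTO mailboxes (name, server, part) VALUES (?, ?, ?)",
        (name, server, partition),
    )
    conn.commit()


def read_entry(conn, name, local_host):
    # "Read filter": translate the shared row into the view that the
    # calling server needs.
    row = conn.execute(
        "SELECT server, part FROM mailboxes WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None  # unknown mailbox
    server, partition = row
    if server == local_host:
        # The hosting back-end sees a local mailbox on its own partition.
        return {"type": "local", "partition": partition}
    # Any other server (e.g. a front-end) sees a pointer to the back-end.
    return {"type": "remote", "server": server, "partition": partition}


if __name__ == "__main__":
    conn = open_store()
    write_entry(conn, "user.alice", "backend1.example.org", "default")
    print(read_entry(conn, "user.alice", "backend1.example.org"))
    print(read_entry(conn, "user.alice", "frontend1.example.org"))
```

Because every server queries the same table directly, there is no per-server copy of the mailbox list to re-synchronise, which is the property the reply relies on.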