Hello, list,

Today we're enjoying our first full work day of independence from the old monolithic Cyrus server installed in 1999 (a Sun 6800 -- it's had new CPU boards since then, but that's it), and on our shiny new cluster of T5220s, which are mostly happily operating as a murder. I say "mostly" because while most of the time the thing handles our 80,000 users and 14,000+ simultaneous connections like a champ, some of the time we get some extreme pain, mostly due to syncs between the MUPDATE master and the front-end servers.

When we spec'ed out our servers, we didn't put much I/O capacity into the front-end servers -- just a pair of mirrored 10k disks handling the OS, the logging, the mailboxes.db, and all the webmail action going on in another Solaris zone on the same hardware. We thought this was sufficient given that no real permanent data lives on these servers, but it turns out that while most of the time it's fine, if the mupdate processes ever decide they need to re-sync with the master, we've got 6 minutes of trouble ahead while they download and store the 800k entries in the mailboxes.db.

During these sync periods, we see two negative impacts. The first is lockup on the mailboxes.db on the front-end servers, which slows down both accepting new IMAP/POP connections and the reception of incoming messages. (The front-ends also accept LMTP connections from a separate pair of queueing hosts, then proxy those to the back-ends.) The second is that, because the front-ends go into a

It's awfully frustrating to have a system that, as my boss says, performs like a Camaro most of the time until you hit a little rock in the road, and then it suddenly turns into a Pinto. It's also frustrating that this seems like one of the less complicated aspects of the system -- publishing replicas of a read-only database to a few worker boxes. I suppose this is why Fastmail and others ripped out the proxyds and replaced them with nginx or perdition.
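For a sense of scale, here is a rough back-of-the-envelope sketch of what the resync figures above imply. The only inputs are the approximate numbers already quoted (800k entries, about 6 minutes). The function names and the 10x scenario are purely illustrative assumptions, not measurements. It works out to roughly 2,200 entries per second on average, or a bit under half a millisecond per entry.

```python
# Back-of-the-envelope numbers for a full mailboxes.db resync on a front-end.
# Inputs are the approximate figures quoted in this message; everything
# derived from them is an estimate, not a measurement.

ENTRIES = 800_000        # approx. number of entries in mailboxes.db
SYNC_SECONDS = 6 * 60    # approx. duration of one full resync


def entries_per_second(entries: int, seconds: float) -> float:
    """Average rate at which entries are downloaded and stored."""
    return entries / seconds


def resync_seconds(entries: int, rate: float) -> float:
    """Time for a full resync at a given sustained rate (entries/sec)."""
    return entries / rate


rate = entries_per_second(ENTRIES, SYNC_SECONDS)
print(f"observed average: {rate:.0f} entries/sec")
print(f"average cost per entry: {1000 / rate:.2f} ms")

# Hypothetical scenario: a store path ten times faster, same database size.
print(f"at 10x that rate: {resync_seconds(ENTRIES, rate * 10):.0f} seconds")
```

The same two functions can be reused with other database sizes or sync durations to see how the pain window would scale.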
Currently we still support GSSAPI as an auth mechanism, which kept me from going in that direction, but given the problems we're seeing, I'd be open to architectural suggestions on either how to tie perdition or nginx to the MUPDATE master (because we don't have the back-ends split along any discernible lines at this point), or how to make the master-to-frontend propagation faster or less painful.

Sorry for the long message, but it's not a simple problem we're fighting.

Michael Bacon
UNC Chapel Hill

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html