On Fri, Oct 30, 2009 at 03:24:25PM -0400, Michael Bacon wrote:
> I haven't had the guts to roll the patched, CVS version into
> production as our primary mupdate server, but I did put it in on a
> test machine in replica mode.  My measurement was on a clean server
> (no pre-existing mailboxes.db), and it didn't appear noticeably
> faster.  I haven't measured hard numbers, but it was still well over
> 10 minutes to complete the sync and write it out to disk.

Sorry - I probably didn't explain what the patch does very well!  It
doesn't actually make things run any faster.  What it does is break
the one big transaction into lots of small transactions, so it doesn't
block everything else from happening while it runs.

> The odd thing is that we see major performance differences depending
> on what disk the client is living on.  For instance, if we put the
> mailboxes.db (and the whole metapartition) on superfast Hitachi
> disks over a 4 GB SAN connection, the sync will finish in just under
> three minutes.  Still, even though we see that big difference, we
> don't see any kind of I/O contention in the iostat output.  The
> k/sec figures are well within what the drives should be able to
> handle, and the % blocking stays in low single digits most of the
> time, while peaking up in the 15-25 range from time to time, but not
> staying there.  It does make me wonder if what we're seeing is
> related to I/O latency.

Hmm, yeah.

> I haven't delved deep into the skiplist code, but I almost wonder if
> at least some of the slowness is the foreach iteration on the
> mupdate master in read mode.  On all systems in the murder, we'll
> see instances where the mupdate process goes into a spin where, in
> truss, it's an endless repeat of fcntl, stat, fstat, fcntl,
> thousands of times over.  These execute extremely quickly, but I do
> wonder if we're assuming that something that takes very little time
> takes an insignificant amount of time, when the time involved
> becomes significant on an 800k mailboxes database.

Almost definitely.  I didn't even look at that end of the operation,
but I suspect this could be made a lot more efficient with
transactional batching as well.  Either read all 800k records into a
linked list in memory, or do something even trickier.

The even trickier bit will be pretty nasty though.  Here's what I
really want to add to the cyrus db layer:

    /* pseudocode */
    db->next_record(char *key, int keylen, db_txn *txn);

Which gets the next record AFTER the (possibly non-existent) record
pointed to by key.  This is what foreach uses internally - but by
having it directly accessible you could implement a partial,
restartable foreach.

> Finally, as to how we get into this situation in the first place, it
> appears to happen when the mupdate master, in our environment and
> configuration, can handle having up to three replicas connected to
> it before it goes into a bad state during high load.  I've never
> caught it at the point of actually going downhill, but my impression
> is that so many processes start demanding responses from the mupdate
> server that the persistent connections that the slave mupdates have
> to the master timeout and disconnect, then reconnect and try to
> re-sync.  (At least that's what it looks like in the logs.)
> Incoming IMAP connections won't do it, but lmtpproxy connections
> seem to have a knack for it, since for whatever reason they appear
> to generate "kicks" at a pretty high rate.
>
> Still looking, but open to suggestions here.

I'll have a look at speeding up the mupdate reads.

Bron.

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
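
To illustrate the next_record idea from the reply above, here is a
minimal sketch in C of a partial, restartable foreach.  It is not
Cyrus code: the sorted in-memory array `toy_db`, the functions
`next_record` and `foreach_batch`, the `MAX_KEY` limit and the
`count_cb` callback are all hypothetical names invented for this
example, and no locking or transactions are modelled.  The only point
it tries to show is that the cursor is a copy of the last key seen, so
a later batch can resume correctly even if that key has been deleted
in the meantime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY 256  /* hypothetical upper bound on key length */

/* Hypothetical stand-in for a database: keys sorted by strcmp(). */
struct toy_db {
    const char **keys;
    size_t count;
};

/* Index of the first key strictly greater than `key`, or db->count
 * if there is none.  `key` does not have to exist in the database;
 * the empty string sorts before every non-empty key. */
size_t next_record(const struct toy_db *db, const char *key)
{
    size_t lo = 0, hi = db->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(db->keys[mid], key) <= 0)
            lo = mid + 1;   /* keys[mid] is not after `key` */
        else
            hi = mid;       /* keys[mid] is a candidate */
    }
    return lo;
}

/* Callback type; `rock` is an opaque pointer passed through. */
typedef void foreach_cb(const char *key, void *rock);

/* Visit at most `batch` records that sort after `cursor`, copying
 * each visited key into `cursor`.  Returns the number of records
 * visited; 0 means the iteration is complete.  In a real database
 * the caller could release its lock between calls. */
size_t foreach_batch(const struct toy_db *db, char cursor[MAX_KEY],
                     size_t batch, foreach_cb *cb, void *rock)
{
    size_t i = next_record(db, cursor);
    size_t n = 0;

    assert(batch > 0);
    for (; i < db->count && n < batch; i++, n++) {
        const char *key = db->keys[i];
        assert(strlen(key) < MAX_KEY);
        cb(key, rock);
        strcpy(cursor, key);  /* remember where to resume */
    }
    return n;
}

/* Example callback: counts the records it is shown. */
void count_cb(const char *key, void *rock)
{
    (void)key;
    ++*(int *)rock;
}
```

A caller would start with an empty cursor (`char cursor[MAX_KEY] =
"";`) and call `foreach_batch` repeatedly until it returns 0, doing
other work between calls.  Copying the key rather than keeping a
pointer or an index is the design choice that matters here: an index
would go stale as soon as another writer inserted or deleted a record.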