On Fri, 30 Oct 2009, Michael Bacon wrote:

> On all systems in the murder, we'll see instances where the mupdate
> process goes into a spin where, in truss, it's an endless repeat of
> fcntl, stat, fstat, fcntl, thousands of times over. These execute
> extremely quickly, but I do wonder if we're assuming that something that
> takes very little time takes an insignificant amount of time, when the
> time involved becomes significant on an 800k mailboxes database.

I agree that latency is probably your problem here.

I'm wondering if fsync() latency on the frontends might be a factor given
that you report little disk I/O on the mupdate master (IOPS are much more
important than Kps, but I'm sure that you already know that). The update
process will only be as fast as its weakest link, and you stated earlier:

> When we spec'ed out our servers, we didn't put much I/O capacity into
> the front-end servers -- just a pair of mirrored 10k disks doing the OS,
> the logging, the mailboxes.db, and all the webmail action going on in
> another solaris zone on the same hardware.

No mention of battery backed write cache there, which tends to be fairly
critical for anything involving fsync(). There is an easy way to find out:

  skiplist_unsafe: 0
     If enabled, this option forces the skiplist cyrusdb backend to
     not sync writes to the disk. Enabling this option is NOT RECOMMENDED.

You can ignore the scary warning (at least for test purposes) on murder
frontends, given that it is just a readonly replica of the mupdate master.

I hope that this isn't a complete red herring. It just struck me that it
would be a really easy test to make.

-- 
David Carter                             Email: David.Carter@xxxxxxxxxxxxx
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
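
P.S. If you would rather measure the fsync() latency suggested above before touching imapd.conf, something like the following rough Python sketch could be run against the filesystem that holds mailboxes.db on a frontend. It is only an illustration, not anything that ships with Cyrus: the function names (measure_fsync_latency, summarize), the 512-byte payload and the write count are all my own assumptions, and it makes no attempt to mimic the skiplist backend's real write pattern. Without a battery backed write cache I would expect each fsync() on a 10k disk to cost several milliseconds, but that is a guess until measured.

```python
import os
import sys
import tempfile
import time


def measure_fsync_latency(directory, writes=50, payload=b"x" * 512):
    """Time individual fsync() calls on a scratch file in `directory`.

    Hypothetical helper: the payload size and write count are arbitrary
    and do not reproduce the skiplist backend's actual I/O pattern.
    Returns a list of per-call latencies in seconds.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the write out to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (seconds) to min/median/max in ms."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "min_ms": ordered[0] * 1000,
        "median_ms": ordered[len(ordered) // 2] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Pass the directory to test; defaults to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(summarize(measure_fsync_latency(target)))
```

Run it with the directory to test as the only argument, ideally while the webmail zone is busy, since that is when the shared mirrored disks would be under contention. A median in the low milliseconds multiplied across a large batch of mailbox updates would at least be consistent with the latency theory.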